Retail Reference Architecture
with MongoDB
Antoine Girbal
Principal Solutions Engineer, MongoDB Inc.
@antoinegirbal
Introduction
4
• Retail is way too broad to tackle with one solution
• Retail data maps well to the document model
• Retail needs agility, performance and scaling
• Many (e)retailers are already using MongoDB
• Let's define the best ways and places for it!
Retail solution
5
• Holds complex JSON structures
• Dynamic Schema for Agility
• Complex querying and in-place updating
• Secondary, compound and geo indexing
• Full consistency, durability, atomic operations
• Near linear scaling via sharding
• Overall, MongoDB is a unique fit!
MongoDB is a great fit
6
MongoDB Strategic Advantages
Horizontally Scalable
-Sharding
Agile
Flexible
High Performance &
Strong Consistency
Application
Highly
Available
-Replica Sets
{ customer: "roger",
date: new Date(),
comment: "Spirited Away",
tags: ["Tezuka", "Manga"]}
7
build your data to fit your application
Relational MongoDB
{ customer_id : 1,
name : "Mark Smith",
city : "San Francisco",
orders: [ {
order_number : 13,
store_id : 10,
date: "2014-01-03",
products: [
{SKU: 24578234,
Qty: 3,
Unit_price: 350},
{SKU: 98762345,
Qty: 1,
Unit_price: 110}
]
},
{ <...> }
]
}
CustomerID First Name Last Name City
0 John Doe New York
1 Mark Smith San Francisco
2 Jay Black Newark
3 Meagan White London
4 Edward Danields Boston
Order Number Store ID Product Customer ID
10 100 Tablet 0
11 101 Smartphone 0
12 101 Dishwasher 0
13 200 Sofa 1
14 200 Coffee table 1
15 201 Suit 2
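The mapping from the two relational tables to one customer document can be sketched in plain JavaScript. This is only an illustration: the helper name `embedOrders` and the camelCase row fields are assumptions, and the rows are a subset of the tables above.

```javascript
// A few rows copied from the relational tables on the slide.
const customers = [
  { customerId: 0, firstName: "John", lastName: "Doe", city: "New York" },
  { customerId: 1, firstName: "Mark", lastName: "Smith", city: "San Francisco" },
];
const orders = [
  { orderNumber: 10, storeId: 100, product: "Tablet", customerId: 0 },
  { orderNumber: 11, storeId: 101, product: "Smartphone", customerId: 0 },
  { orderNumber: 13, storeId: 200, product: "Sofa", customerId: 1 },
];

// Hypothetical helper: fold each customer's order rows into one document,
// so the application reads a customer and their orders in a single lookup.
function embedOrders(customerRows, orderRows) {
  return customerRows.map((c) => ({
    customer_id: c.customerId,
    name: `${c.firstName} ${c.lastName}`,
    city: c.city,
    orders: orderRows
      .filter((o) => o.customerId === c.customerId)
      .map((o) => ({
        order_number: o.orderNumber,
        store_id: o.storeId,
        product: o.product,
      })),
  }));
}

const customerDocs = embedOrders(customers, orders);
console.log(JSON.stringify(customerDocs[1], null, 2));
```

The join happens once, when the document is built, instead of on every read.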
8
Notions
RDBMS MongoDB
Database Database
Table Collection
Row Document
Column Field
Retail Components Overview
10
Information
Management
Merchandising
Content
Inventory
Customer
Channel
Sales &
Fulfillment
Insight
Social
Architecture Overview
Customer
Channels
Amazon
Ebay
…
Stores
POS
Kiosk
…
Mobile
Smartphone
Tablet
Website
Contact
Center
API
Data and
Service
Integration
Social
Facebook
Twitter
…
Data
Warehouse
Analytics
Supply Chain
Management
System
Suppliers
3rd Party
In Network
Web
Servers
Application
Servers
11
Commerce Functional Components
Information
Layer
Look & Feel
Navigation
Customization
Personalization
Branding
Promotions
Chat
Ads
Customer's
Perspective
Research
Browse
Search
Select
Shopping Cart
Purchase
Checkout
Receive
Track
Use
Feedback
Maintain
Dialog
Assist
Market / Offer
Guide
Offer
Semantic
Search
Recommend
Rule-based
Decisions
Pricing
Coupons
Sell / Fulfill
Orders
Payments
Fraud
Detection
Fulfillment
Business Rules
Insight
Session
Capture
Activity
Monitoring
Customer Enterprise
Information
Management
Merchandising
Content
Inventory
Customer
Channel
Sales &
Fulfillment
Insight
Social
Merchandising
13
Merchandising
Merchandising
MongoDB
Variant
Hierarchy
Pricing
Promotions
Ratings & Reviews
Calendar
Semantic Search
Item
Localization
14
• Single view of a product, one central catalog service
• Read volume high and sustained, 100k reads / s
• Write volume spikes up during catalog update
• Advanced indexing and querying
• Geographical distribution and low latency
• No need for a cache layer, CDN for assets
Merchandising - principles
15
Merchandising - requirements
Requirement | Example Challenge | MongoDB
Single view of product | Blended description and hierarchy of product to ensure availability on all channels | Flexible document-oriented storage
High sustained read volume with low latency | Constant querying from online users and sales associates, requiring immediate response | Fast indexed querying, replication allows local copy of catalog, sharding for scaling
Spiky and real-time write volume | Bulk update of full catalog without impacting production, real-time touch update | Fast in-place updating, real-time indexing, sharding for scaling
Advanced querying | Find product based on color, size, description | Ad-hoc querying on any field, advanced secondary and compound indexing
16
Merchandising - Product Page
Product
images
General
Information
List of
Variants
External
Information
Localized
Description
17
> db.item.findOne()
{ _id: "301671", // main item id
department: "Shoes",
category: "Shoes/Women/Pumps",
brand: "Guess",
thumbnail: "http://cdn…/pump.jpg",
image: "http://cdn…/pump1.jpg", // larger version of thumbnail
title: "Evening Platform Pumps",
description: "These evening platform pumps put the perfect finishing touches on your most glamorous night-on-the-town outfit",
shortDescription: "Evening Platform Pumps",
style: "Designer",
type: "Platform",
rating: 4.5, // user rating
lastUpdated: Date("2014/04/01"), // last update time
… }
Merchandising - Item Model
18
• Get item by id
db.item.findOne( { _id: "301671" } )
• Get items from product ids
db.item.find( { _id: { $in: ["301671", "301672" ] } } )
• Get items by department
db.item.find({ department: "Shoes" })
• Get items by category prefix
db.item.find( { category: /^Shoes\/Women/ } )
• Indices
productId, department, category, lastUpdated
Merchandising - Item Definition
19
> db.variant.findOne()
{
_id: "730223104376", // the sku
itemId: "301671", // references item id
thumbnail: "http://cdn…/pump-red.jpg", // variant specific
image: "http://cdn…/pump-red.jpg",
size: 6.0,
color: "Red",
width: "B",
heelHeight: 5.0,
lastUpdated: Date("2014/04/01"), // last update time
…
}
Merchandising – Variant Model
20
• Get variant from SKU
db.variant.find( { _id: "730223104376" } )
• Get all variants for a product, sorted by SKU
db.variant.find( { itemId: "301671" } ).sort( { _id: 1 } )
• Indices
itemId, lastUpdated
Merchandising – Variant Model
22
Per store Pricing could result in billions of documents,
unless you build it in a modular way
Price: {
_id: "sku730223104376_store123",
currency: "USD",
price: 89.95,
lastUpdated: Date("2014/04/01"), // last update time
…
}
_id: concatenation of item and store.
Item: can be an item id or sku
Store: can be a store group or store id.
Indices: lastUpdated
Merchandising – per store Pricing
23
• Get all prices for a given item
db.prices.find( { _id: /^p301671_/ } )
• Get all prices for a given sku (price could be at item level)
db.prices.find( { _id: { $in: [ /^sku730223104376_/, /^p301671_/ ] } } )
• Get minimum and maximum prices for a sku
db.prices.aggregate( [ { $match: { _id: /^sku730223104376_/ } },
{ $group: { _id: 1, min: { $min: "$price" }, max: { $max: "$price" } } } ] )
• Get price for a sku and store id (returns up to 4 prices)
db.prices.find( { _id: { $in: [ "sku730223104376_store1234",
"sku730223104376_sgroup0",
"p301671_store1234",
"p301671_sgroup0"] } }, { price: 1 } )
Merchandising – per store Pricing
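The last query returns up to four price documents, and the application still has to choose one. A minimal sketch of that choice follows, assuming the most specific key wins (sku before item, store before store group). That precedence and the helper names are assumptions; the slides do not state them.

```javascript
// Hypothetical helper: candidate price _ids, most specific first.
// The precedence order is an assumption, not something the slides state.
function priceIdCandidates(sku, itemId, storeId, storeGroup) {
  return [
    `sku${sku}_${storeId}`,
    `sku${sku}_${storeGroup}`,
    `p${itemId}_${storeId}`,
    `p${itemId}_${storeGroup}`,
  ];
}

// Pick the first candidate that is present among the documents
// returned by the $in query.
function resolvePrice(candidates, priceDocs) {
  const byId = new Map(priceDocs.map((d) => [d._id, d]));
  for (const id of candidates) {
    if (byId.has(id)) return byId.get(id);
  }
  return null;
}

const ids = priceIdCandidates("730223104376", "301671", "store1234", "sgroup0");
// Sample query result: only two of the four candidates exist (made-up prices).
const found = [
  { _id: "p301671_sgroup0", price: 89.95 },
  { _id: "sku730223104376_sgroup0", price: 84.95 },
];
const best = resolvePrice(ids, found);
console.log(best);
```

Storing prices only at the levels where they differ (item or sku, store or store group) is what keeps the collection far below one document per sku per store.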
26
Merchandising – Browse and Search products
Browse by
category
Special
Lists
Filter by
attributes
Lists hundreds
of item
summaries
Ideally a single query is issued to the database
to obtain all items and metadata to display
27
The previous page presents many challenges:
• Response within milliseconds for hundreds of items
• Faceted search on many attributes: category, brand, …
• Attributes at the variant level: color, size, etc, and the
variation's image should be shown
• thousands of variants for an item, need to de-duplicate
• Efficient sorting on several attributes: price, popularity
• Pagination feature which requires deterministic ordering
Merchandising – Browse and Search products
28
Merchandising – Browse and Search products
Hundreds
of sizes
One Item
Dozens of
colors
A single item may have thousands of variants
29
Merchandising – Browse and Search products
Images of the matching
variants are displayed
Hierarchy
Sort
parameter
Faceted
Search
30
Merchandising – Traditional Architecture
Relational DB
System of Records
Full Text Search
Engine
Indexing
#1 obtain
search
results IDs
Application
Cache
#2 obtain
objects by
ID
Pre-joined
into objects
31
The traditional architecture issues:
• 3 different systems to maintain: RDBMS, Search
engine, Caching layer
• search returns a list of IDs to be looked up in the cache,
increases latency of response
• RDBMS schema is complex and static
• The search index is expensive to update
• Setup does not allow efficient pagination
Merchandising – Traditional Architecture
32
MongoDB Data Store
Merchandising - Architecture
Summaries
Items
Pricing
Promotions
Variants
Ratings &
Reviews
#1 Obtain
results
33
The summary relies on the following parameters:
• department e.g. "Shoes"
• An indexed attribute
– Category path, e.g. "Shoes/Women/Pumps"
– Price range
– List of Item Attributes, e.g. Brand = Guess
– List of Variant Attributes, e.g. Color = red
• A non-indexed attribute
– List of Item Secondary Attributes, e.g. Style = Designer
– List of Variant Secondary Attributes, e.g. heel height = 4.0
• Sorting, e.g. Price Low to High
Merchandising – Summary Model
34
> db.summaries.findOne()
{ "_id": "p39",
"title": "Evening Platform Pumps 39",
"department": "Shoes", "category": "Shoes/Women/Pumps",
"thumbnail": "http://cdn…/pump-small-39.jpg", "image":
"http://cdn…/pump-39.jpg",
"price": 145.99,
"rating": 0.95,
"attrs": [ { "brand" : "Guess"}, … ],
"sattrs": [ { "style" : "Designer"} , { "type" : "Platform"}, …],
"vars": [
{ "sku": "sku2441",
"thumbnail": "http://cdn…/pump-small-39.jpg.Blue",
"image": "http://cdn…/pump-39.jpg.Blue",
"attrs": [ { "size": 6.0 }, { "color": "Blue" }, …],
"sattrs": [ { "width" : "B"} , { "heelHeight" : 5.0 }, …],
}, … Many more skus …
] }
Merchandising – Summary Model
35
• Get summary from item id
db.summaries.find({ _id: "p301671" })
• Get summary's specific variation from SKU
db.summaries.find( { "vars.sku": "730223104376" }, { "vars.$": 1 } )
• Get summary by department, sorted by rating
db.summaries.find( { department: "Shoes" } ).sort( { rating: 1 } )
• Get summary with mix of parameters
db.summaries.find( { department : "Shoes" ,
"vars.attrs" : { "color" : "Gray"} ,
"category" : /^Shoes\/Women/ ,
"price" : { "$gte" : 65.99 , "$lte" : 180.99 } } )
Merchandising - Summary Model
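A mixed-parameter filter like the last one is usually assembled from whatever the shopper selected. The following is a rough sketch of such a builder; `buildSummaryFilter` and its option names are hypothetical, and it handles a single variant attribute only.

```javascript
// Escape regex metacharacters so a category path is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical builder: returns a filter shaped like the
// "mix of parameters" query on the slide.
function buildSummaryFilter({ department, categoryPrefix, minPrice, maxPrice, variantAttr }) {
  const filter = { department };
  if (variantAttr) {
    // Matches summaries where some variant carries this exact attribute.
    filter["vars.attrs"] = { [variantAttr.name]: variantAttr.value };
  }
  if (categoryPrefix) {
    // An anchored prefix regex can be served by an index on category.
    filter.category = new RegExp("^" + escapeRegex(categoryPrefix));
  }
  if (minPrice !== undefined || maxPrice !== undefined) {
    filter.price = {};
    if (minPrice !== undefined) filter.price.$gte = minPrice;
    if (maxPrice !== undefined) filter.price.$lte = maxPrice;
  }
  return filter;
}

const filter = buildSummaryFilter({
  department: "Shoes",
  categoryPrefix: "Shoes/Women",
  minPrice: 65.99,
  maxPrice: 180.99,
  variantAttr: { name: "color", value: "Gray" },
});
console.log(filter);
```

The resulting object could be passed to `db.summaries.find(...)` as is.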
36
Merchandising – Summary Model
• The following indices are used:
– department + attrs + category + _id
– department + vars.attrs + category + _id
– department + category + _id
– department + price + _id
– department + rating + _id
• _id used for pagination
• Can take advantage of index intersection
• With several attributes specified (e.g. color=red
and size=6), which one is looked up?
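Because every index above ends with _id, a page can resume exactly after the last document of the previous page. The sketch below shows one way to do that for a price sort; the helper names are hypothetical, and the in-memory function only mimics what the database would return.

```javascript
// Filter for the page that follows (lastPrice, lastId) when the query is
// sorted by { price: 1, _id: 1 }. The _id tie-breaker makes the order
// deterministic even when many items share a price.
function nextPageFilter(baseFilter, lastPrice, lastId) {
  return {
    $and: [
      baseFilter,
      {
        $or: [
          { price: { $gt: lastPrice } },
          { price: lastPrice, _id: { $gt: lastId } },
        ],
      },
    ],
  };
}

// In-memory stand-in for find(filter).sort({ price: 1, _id: 1 }).limit(n).
function pageInMemory(docs, pageSize, last) {
  const sorted = [...docs].sort(
    (a, b) => a.price - b.price || (a._id < b._id ? -1 : a._id > b._id ? 1 : 0)
  );
  const rest = last
    ? sorted.filter(
        (d) => d.price > last.price || (d.price === last.price && d._id > last._id)
      )
    : sorted;
  return rest.slice(0, pageSize);
}

// Made-up summaries, two of them with the same price.
const docs = [
  { _id: "p1", price: 10 },
  { _id: "p2", price: 10 },
  { _id: "p3", price: 5 },
  { _id: "p4", price: 20 },
];
const page1 = pageInMemory(docs, 2);
const page2 = pageInMemory(docs, 2, page1[page1.length - 1]);
const mongoFilter = nextPageFilter({ department: "Shoes" }, 10, "p1");
console.log(page1, page2, JSON.stringify(mongoFilter));
```

Unlike skip-based paging, this approach does not rescan the documents of earlier pages.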
37
Facet samples:
{ "_id" : "Accessory Type=Hosiery" , "count" : 14}
{ "_id" : "Ladder Material=Steel" , "count" : 2}
{ "_id" : "Gold Karat=14k" , "count" : 10138}
{ "_id" : "Stone Color=Clear" , "count" : 1648}
{ "_id" : "Metal=White gold" , "count" : 10852}
Single operations to insert / update:
db.facet.update( { _id: "Accessory Type=Hosiery" },
{ $inc: { count: 1 } }, true, false)
The facet with lowest count is the most restrictive…
It should come first in the query!
Merchandising – Facet
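Choosing the most restrictive facet can be sketched as a sort on the stored counts. The helper name is hypothetical, the counts are the samples above, and treating an unknown facet as least restrictive is an assumption.

```javascript
// Facet counts as stored in the facet collection (sample values from the slide).
const facetCounts = new Map([
  ["Gold Karat=14k", 10138],
  ["Stone Color=Clear", 1648],
  ["Metal=White gold", 10852],
]);

// Hypothetical helper: order the requested facets so that the one with the
// lowest count (the most restrictive) comes first in the query.
// Facets without a known count are treated as least restrictive (assumption).
function orderByRestrictiveness(requested, counts) {
  const countOf = (facet) =>
    counts.has(facet) ? counts.get(facet) : Number.MAX_SAFE_INTEGER;
  return [...requested].sort((a, b) => countOf(a) - countOf(b));
}

const ordered = orderByRestrictiveness(
  ["Metal=White gold", "Gold Karat=14k", "Stone Color=Clear"],
  facetCounts
);
console.log(ordered);
```

The first entry of `ordered` would be the attribute looked up through the index, with the remaining attributes applied as filters on the matching documents.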
38
Merchandising – Query stats
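Department | Category | Price | Primary attribute | Average time (ms) | 90th (ms) | 95th (ms)
1 | 0 | 0 | 0 | 2 | 3 | 3
1 | 1 | 0 | 0 | 1 | 2 | 2
1 | 0 | 1 | 0 | 1 | 2 | 3
1 | 1 | 1 | 0 | 1 | 2 | 2
1 | 0 | 0 | 1 | 0 | 1 | 2
1 | 1 | 0 | 1 | 0 | 1 | 1
1 | 0 | 1 | 1 | 1 | 2 | 2
1 | 1 | 1 | 1 | 0 | 1 | 1
1 | 0 | 0 | 2 | 1 | 3 | 3
1 | 1 | 0 | 2 | 0 | 2 | 2
1 | 0 | 1 | 2 | 10 | 20 | 35
1 | 1 | 1 | 2 | 0 | 1 | 1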
Inventory
42
Inventory – Traditional Architecture
Relational DB
System of Records
Nightly
Batches
Analytics,
Aggregations,
Reports
Caching
Layer
Field Inventory
Internal &
External Apps
Point-in-time
Loads
43
Opportunities Missed
• Can’t reliably detect availability
• Can't redirect purchasers to in-store pickup
• Can’t do intra-day replenishment
• Degraded customer experience
• Higher internal expense
44
Inventory – Principles
• Single view of the inventory
• Used by most services and channels
• Read dominated workload
• Local, real-time writes
• Bulk writes for refresh
• Geographically distributed
• Horizontally scalable
45
Inventory – Requirements
Requirement | Challenge | MongoDB
Single view of inventory | Ensure availability of inventory information on all channels and services | Developer-friendly, document-oriented storage
High volume, low latency reads | Anytime, anywhere access to inventory data without overloading the system of record | Fast, indexed reads; local reads; horizontal scaling
Bulk updates, intra-day deltas | Provide window-in-time consistency for highly available services | Bulk writes; fast, in-place updates; horizontal scaling
Rapid application development cycles | Deliver new services rapidly to capture new opportunities | Flexible schema; rich query language; agile-friendly iterations
46
Inventory – Target Architecture
Relational DB
System of Records
Analytics,
Aggregations,
Reports
Field Inventory
Internal &
External Apps
Inventory
Assortments
Shipments
Audits
Products
Stores
Point-in-time
Loads
Nightly
Refresh
Real-time
Updates
47
Horizontal Scaling
Inventory – Technical Decisions
Store
Inventory
Schema
Indexing
48
Inventory – Collections
Stores Inventory
Products
Audits
Assortments
Shipments
49
Stores – Sample Document
> db.stores.findOne()
{
"_id" : ObjectId("53549fd3e4b0aaf5d6d07f35"),
"className" : "catalog.Store",
"storeId" : "store0",
"name" : "Bessemer store",
"address" : {
"addr1" : "1st Main St",
"city" : "Bessemer",
"state" : "AL",
"zip" : "12345",
50
Stores – Sample Queries
• Get a store by storeId
db.stores.find({ "storeId" : "store0" })
• Get a store by zip code
db.stores.find({ "address.zip" : "12345" })
51
What’s near me?
52
Stores – Sample Geo Queries
• Get nearby stores sorted by distance
db.runCommand({
geoNear : "stores",
near : {
type : "Point",
coordinates : [-82.8006, 40.0908] },
maxDistance : 10000.0,
spherical : true
})
53
Stores – Sample Geo Queries
• Get the five nearest stores within 10 km
db.stores.find({
location : {
$near : {
$geometry : {
type : "Point",
coordinates : [-82.80, 40.09] },
$maxDistance : 10000 } }
}).limit(5)
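To make the semantics of the query above concrete, here is a rough in-memory equivalent: compute the great-circle distance to each store, keep those within 10 km, sort by distance and take the first five. The store coordinates are invented for illustration, and the haversine formula stands in for the database's own spherical calculation.

```javascript
const EARTH_RADIUS_M = 6371000; // mean Earth radius in metres

// Great-circle distance between two [longitude, latitude] points (haversine).
function haversineMeters([lon1, lat1], [lon2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

// In-memory stand-in for $near with $maxDistance followed by limit(n).
function nearestStores(stores, point, maxDistance, limit) {
  return stores
    .map((s) => ({ ...s, distance: haversineMeters(point, s.location) }))
    .filter((s) => s.distance <= maxDistance)
    .sort((a, b) => a.distance - b.distance)
    .slice(0, limit);
}

// Hypothetical stores; coordinates are made up for this example.
const stores = [
  { storeId: "storeC", location: [-83.8, 40.09] },
  { storeId: "storeB", location: [-82.85, 40.09] },
  { storeId: "storeA", location: [-82.8, 40.09] },
];
const nearby = nearestStores(stores, [-82.8, 40.09], 10000, 5);
console.log(nearby.map((s) => `${s.storeId}: ${Math.round(s.distance)} m`));
```

With a 2dsphere index the database avoids this full scan, which is what makes the query practical on a large store collection.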
54
Stores – Indices
• { "storeId" : 1 }
• { "name" : 1 }
• { "address.zip" : 1 }
• { "location" : "2dsphere" }
55
Inventory – Sample Document
> db.inventory.findOne()
{
"_id": "5354869f300487d20b2b011d",
"storeId": "store0",
"location": [-86.95444, 33.40178],
"productId": "p0",
"vars": [
{ "sku": "sku1", "q": 14 },
{ "sku": "sku3", "q": 7 },
{ "sku": "sku7", "q": 32 },
{ "sku": "sku14", "q": 65 },
...
56
Inventory – Sample Queries
• Get all items in a store
db.inventory.find({ storeId : "store100" })
• Get quantity for an item at a store
db.inventory.find({
"storeId" : "store100",
"productId" : "p200"
})
57
Inventory – Sample Queries
• Get quantity for a sku at a store
db.inventory.find(
{
"storeId" : "store100",
"productId" : "p200",
"vars.sku" : "sku11736"
},
{ "vars.$" : 1 }
)
58
Inventory – Sample Update
• Increment / decrement inventory for an item at
a store
db.inventory.update(
{
"storeId" : "store100",
"productId" : "p200",
"vars.sku" : "sku11736"
},
{ "$inc" : { "vars.$.q" : 20 } }
)
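When the same update is used to decrement stock, it may be worth guarding against going below zero. One possible shape is sketched here: the filter only matches when the sku still has enough quantity, so the decrement and the check happen in one atomic operation. The helper names are hypothetical, and the in-memory function only mimics the effect for illustration.

```javascript
// Hypothetical builder: filter and update for a guarded decrement.
// $elemMatch ensures the same array element satisfies both the sku and the
// quantity condition, and "vars.$" then points at that element.
function buildSafeDecrement(storeId, productId, sku, qty) {
  return {
    filter: {
      storeId,
      productId,
      vars: { $elemMatch: { sku, q: { $gte: qty } } },
    },
    update: { $inc: { "vars.$.q": -qty } },
  };
}

// In-memory stand-in for what the guarded update does to one document.
// Returns true when the decrement was applied, false when stock was too low.
function applyDecrement(doc, sku, qty) {
  const variant = doc.vars.find((v) => v.sku === sku && v.q >= qty);
  if (!variant) return false;
  variant.q -= qty;
  return true;
}

const spec = buildSafeDecrement("store100", "p200", "sku11736", 3);
const doc = {
  storeId: "store100",
  productId: "p200",
  vars: [{ sku: "sku11736", q: 5 }],
};
const first = applyDecrement(doc, "sku11736", 3); // enough stock
const second = applyDecrement(doc, "sku11736", 3); // only 2 left, refused
console.log(JSON.stringify(spec), first, second, doc.vars[0].q);
```

An application would pass `spec.filter` and `spec.update` to `db.inventory.update(...)` and treat a result with zero modified documents as "not enough stock".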
59
Inventory – Sample Aggregations
• Aggregate total quantity for a product
db.inventory.aggregate( [
{ $match : { productId : "p200" } },
{ $unwind : "$vars" },
{ $group : {
_id : "result",
count : { $sum : "$vars.q" } } } ] )
{ "_id" : "result", "count" : 101752 }
60
Inventory – Sample Aggregations
• Count the SKUs in stock for a store
db.inventory.aggregate( [
{ $match : { storeId : "store100" } },
{ $unwind : "$vars" },
{ $match : { "vars.q" : { $gt : 0 } } },
{ $group : {
_id : "result",
count : { $sum : 1 } } } ] )
{ "_id" : "result", "count" : 29347 }
61
Inventory – Sample Aggregations
• Aggregate total quantity for a store
db.inventory.aggregate( [
{ $match : { storeId : "store100" } },
{ $unwind : "$vars" },
{ $group : {
_id : "result",
count : { $sum : "$vars.q" } } } ] )
{ "_id" : "result", "count" : 29347 }
63
64
Inventory – Sample Geo-Query
• Get inventory for an item near a point
db.runCommand( {
geoNear : "inventory",
near : {
type : "Point",
coordinates : [-82.8006, 40.0908] },
maxDistance : 10000.0,
spherical : true,
limit : 10,
query : { "productId" : "p200",
"vars.sku" : "sku11736" } } )
65
Inventory – Sample Geo-Query
• Get closest store with available sku
db.runCommand( {
geoNear : "inventory",
near : {
type : "Point",
coordinates : [-82.800672, 40.090844] },
maxDistance : 10000.0,
spherical : true,
limit : 1,
query : {
productId : "p200",
vars : {
$elemMatch : { sku : "sku11736",
q : { $gt : 0 } } } } } )
66
Inventory – Sample Geo-Aggregation
• Get count of inventory for an item near a point
db.inventory.aggregate( [
{ $geoNear: {
near : { type : "Point",
coordinates : [-82.800672, 40.090844] },
distanceField: "distance",
maxDistance: 10000.0, spherical : true,
query: { productId : "p200",
vars : { $elemMatch : { sku : "sku11736",
q : {$gt : 0} } } },
includeLocs: "dist.location",
num: 5 } },
{ $unwind: "$vars" },
{ $match: { "vars.sku" : "sku11736" } },
{ $group: { _id: "result", count: {$sum: "$vars.q"} } }])
67
Inventory – Sample Indices
• { storeId : 1 }
• { productId : 1, storeId : 1 }
• Why not "vars.sku"?
– { productId : 1, storeId : 1, "vars.sku" : 1 }
• { productId : 1, location : "2dsphere" }
68
Horizontal Scaling
Inventory – Technical Decisions
Store
Inventory
Schema
Indexing
69
Shard East
Shard Central
Shard West
East DC
Inventory – Sharding Topology
West DC Central DC
Legacy
Inventory
Primary
Primary
Primary
70
Inventory – Shard Key
• Choose shard key
– { productId : 1, storeId : 1 }
• Set up sharding
– sh.enableSharding("inventoryDB")
– sh.shardCollection(
"inventoryDB.inventory",
{ productId : 1, storeId : 1 } )
71
Inventory – Shard Tags
• Set up shard tags
– sh.addShardTag("shard0000", "west")
– sh.addShardTag("shard0001", "central")
– sh.addShardTag("shard0002", "east")
• Set up tag ranges
– Add new field: region (tag ranges are defined on shard key fields, so region must be part of the shard key)
– sh.addTagRange("inventoryDB.inventory",
{ region : 0 }, { region : 100}, "west" )
– sh.addTagRange("inventoryDB.inventory",
{ region : 100 }, { region : 200 }, "central" )
– sh.addTagRange("inventoryDB.inventory",
{ region : 200 }, { region : 300 }, "east" )
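How a document's region value maps to a tag can be sketched as a simple range lookup. Tag ranges include their lower bound and exclude their upper bound, so region 100 belongs to "central". The helper name is hypothetical.

```javascript
// The three tag ranges from the slide: [min, max) per tag.
const tagRanges = [
  { min: 0, max: 100, tag: "west" },
  { min: 100, max: 200, tag: "central" },
  { min: 200, max: 300, tag: "east" },
];

// Hypothetical helper: which tag would a given region value fall under?
// Returns null when the value is outside every range.
function tagForRegion(region) {
  const range = tagRanges.find((r) => region >= r.min && region < r.max);
  return range ? range.tag : null;
}

console.log(tagForRegion(42), tagForRegion(100), tagForRegion(299), tagForRegion(300));
```

Chunks whose range falls outside every tag range are not pinned to any tagged shard.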
Insight
87
Insight
Insight
MongoDB
Advertising metrics
Clickstream
Recommendations
Session Capture
Activity Logging
Geo Tracking
Product Analytics
Customer Insight
Application Logs
88
Many user activities can be of interest:
• Search
• Product view, like or wish
• Shopping cart add / remove
• Sharing on social network
• Ad impression, Clickstream
Activity Logging – Data of interest
89
Will be used to compute:
• Product Map (relationships, etc)
• User Preferences
• Recommendations
• Trends …
Activity Logging – Data of interest
90
Activity logging - Architecture
MongoDB
HVDF
API
Activity Logging
User History
External
Analytics:
Hadoop,
Spark,
Storm,
…
User Preferences
Recommendations
Trends
Product Map
Apps
Internal
Analytics:
Aggregation,
MR
All user activity
is recorded
MongoDB –
Hadoop
Connector
Personalization
91
Activity Logging
92
• store and manage an incoming stream of data
samples
– High arrival rate of data from many sources
– Variable schema of arriving data
– control retention period of data
• compute derivative data sets based on these
samples
– Aggregations and statistics based on data
– Roll-up data into pre-computed reports and summaries
• low latency access to up-to-date data (user history)
– Flexible indexing of raw and derived data sets
– Rich querying based on time + meta-data fields in samples
Activity Logging – Problem statement
93
Activity logging - Requirements
Requirement | MongoDB
Ingestion of 100ks of writes / sec | Fast C++ process, multi-threads, multi-locks. Horizontal scaling via sharding. Sequential IO via time partitioning.
Flexible schema | Dynamic schema, each document is independent. Data is stored in the same format and size as it is inserted.
Fast querying on varied fields, sorting | Secondary B-tree indexes can look up and sort the data in milliseconds.
Easy clean up of old data | Deletes are typically as expensive as inserts. Time partitioning makes deletes free, since a whole partition is dropped at once.
94
Activity Logging using HVDF
HVDF (High Volume Data Feed):
• Open source reference implementation of high
volume writing with MongoDB
https://github.com/10gen-labs/hvdf
• Rest API server written in Java with most
popular libraries
• Public project, issues can be logged
https://jira.mongodb.org/browse/HVDF
• Can be run as-is, or customized as needed
95
Feed
High volume data feed architecture
Channel
Sample Sample Sample Sample
Source
Source
Processor
Inline
Processing
Batch
Processing
Stream
Processing
Grouping by Feed
and Channel
Sources send
samples
Processors generate
derivative Channels
96
HVDF -- High Volume Data Feed engine
HVDF – Reference implementation
REST
Service API
Processor
Plugins
Inline
Batch
Stream
Channel Data Storage
Raw
Channel
Data
Aggregated
Rollup T1
Aggregated
Rollup T2
Query Processor Streaming spout
Custom Stream
Processing Logic
Incoming Sample Stream
POST /feed/channel/data
GET /feed/channel/data?time=XXX&range=YYY
Real-time Queries
97
{ _id: ObjectId(),
geoCode: 1, // used to localize write operations
sessionId: "2373BB…",
device: { id: "1234",
type: "mobile/iphone",
userAgent: "Chrome/34.0.1847.131"
},
userId: "u123",
type: "VIEW|CART_ADD|CART_REMOVE|ORDER|…", // type of activity
itemId: "301671",
sku: "730223104376",
order: { id: "12520185",
… },
location: [ -86.95444, 33.40178 ],
tags: [ "smartphone", "iphone", … ], // associated tags
timeStamp: Date("2014/04/01 …")
}
User Activity - Model
98
Dynamic schema for sample data
Sample 1
{
deviceId: XXXX,
time: Date(…),
type: "VIEW",
…
}
Channel
Sample 2
{
deviceId: XXXX,
time: Date(…),
type: "CART_ADD",
cartId: 123, …
}
Sample 3
{
deviceId: XXXX,
time: Date(…),
type: "FB_LIKE"
}
Each sample
can have
variable fields
99
Channels are sharded
Shard
Shard
Shard
Shard
Shard
Shard Key:
Customer_id
Sample
{
customer_id: XXXX,
time: Date(…),
type: "VIEW",
}
Channel
You choose how
to partition
samples
Samples can
have dynamic
schema
Scale
horizontally by
adding shards
Each shard is
highly available
100
Channels are time partitioned
Channel
Sample Sample Sample Sample Sample Sample Sample Sample
- 2 days - 1 Day Today
Partitioning
keeps indexes
manageable
This is where all
of the writes
happen
Older partitions
are read only for
best possible
concurrency
Queries are routed
only to needed
partitions
Partition 1 Partition 2 Partition N
Each partition is
a separate
collection
Efficient and
space reclaiming
purging of old
data
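Time partitioning can be sketched as a mapping from a sample's timestamp to a collection name. The naming scheme below (channel name plus day number since the Unix epoch, in UTC) and the helper names are assumptions chosen for illustration; they are not taken from HVDF.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical naming scheme: one collection per channel per UTC day.
function partitionName(channel, date) {
  const dayNumber = Math.floor(date.getTime() / DAY_MS);
  return `${channel}_${dayNumber}`;
}

// Partitions a time-bound query has to visit; all others are skipped.
function partitionsForRange(channel, from, to) {
  const names = [];
  const firstDay = Math.floor(from.getTime() / DAY_MS);
  const lastDay = Math.floor(to.getTime() / DAY_MS);
  for (let day = firstDay; day <= lastDay; day++) {
    names.push(`${channel}_${day}`);
  }
  return names;
}

const writeTarget = partitionName("activity", new Date(Date.UTC(2014, 3, 1)));
const readTargets = partitionsForRange(
  "activity",
  new Date(Date.UTC(2014, 3, 1, 10)),
  new Date(Date.UTC(2014, 3, 3, 1))
);
console.log(writeTarget, readTargets);
```

Purging a day of data then amounts to dropping one collection, which reclaims its space without issuing per-document deletes.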
101
Dynamic queries on Channels
Channel
Sample Sample Sample Sample
App
App
App
Indexes
Queries Pipelines Map-Reduce
Create custom
indexes on
Channels
Use full mongodb
query language to
access samples
Use mongodb
aggregation
pipelines to access
samples
Use mongodb
inline map-reduce
to access samples
Full access to
field, text, and geo
indexing
102
North America - West
North America - East
Europe
Geographically distributed system
Channel
Sample Sample Sample Sample
Source
Source
Source
Source
Source
Source
Sample
Sample
Sample
Sample
Geo shards per
location
Clients write
local nodes
Single view of
channel available
globally
103
Insight
104
Insight – Useful Data
Useful data for better shopping:
• User history (e.g. recently seen products)
• User statistics (e.g. total purchases, visits)
• User interests (e.g. likes videogames and SciFi)
• User social network
105
Insight – Useful Data
Useful data for selling more:
• Cross-selling: people who bought this item had
tendency to buy those other items (e.g. iPhone,
then bought iPhone case)
• Up-selling: people who looked at this item
eventually bought those items (alternative product
that may be better)
106
• Get the recent activity for a user, to populate the "recently
viewed" list
db.activities.find({ userId: "u123", time: { $gt: DATE }}).
sort({ time: -1 }).limit(100)
• Get the recent activity for a product, to populate the "N users
bought this in the past N hours" list
db.activities.find({ itemId: "301671", time: { $gt: DATE }}).
sort({ time: -1 }).limit(100)
• Indices: time, userId + time, deviceId + time, itemId + time
• All queries should be time bound, since this is a lot of data!
Insight – User History
107
• Get the recent number of views, purchases, etc for a user
db.activities.aggregate([
{ $match: { userId: "u123", time: { $gt: DATE } }},
{ $group: { _id: "$type", count: {$sum: 1} } }])
• Get the total recent sales for a user
db.activities.aggregate([
{ $match: { userId: "u123", time: { $gt: DATE }, type: "ORDER" }},
{ $group: { _id: "result", count: {$sum: "$totalPrice"} } }])
• Get the recent number of views, purchases, etc for an item
db.activities.aggregate([
{ $match: { itemId: "301671", time: { $gt: DATE } }},
{ $group: { _id: "$type", count: {$sum: 1} } }])
• Those aggregations are very fast, real-time
Insight – User Stats
108
• Number of activities per unique visitor for the past hour. Calculation of
uniques is hard for any system!
var oneHourAgo = new Date(Date.now() - 60 * 60 * 1000)
db.activities.aggregate([
{ $match: { time: { $gt: oneHourAgo } }},
{ $group: { _id: "$userId", count: {$sum: 1} } }], { allowDiskUse: true })
• The aggregation above can have issues (final grouping on a single shard, result
not persisted). Map Reduce is a better alternative here
var map = function() { emit(this.userId, 1); }
var reduce = function(key, values) { return Array.sum(values); }
db.activities.mapReduce(map, reduce,
{ query: { time: { $gt: oneHourAgo } },
out: { replace: "lastHourUniques", sharded: true } })
db.lastHourUniques.find({ _id: "u123" }) // number of activities for a user
db.lastHourUniques.count() // total uniques
Insight – User Stats
109
User Activity – Items bought together
Time to cross-sell!
110
Let's simplify each activity recorded as the following:
{ userId: "u123", type: "order", itemId: 2, time: DATE }
{ userId: "u123", type: "order", itemId: 3, time: DATE }
{ userId: "u234", type: "order", itemId: 7, time: DATE }
Calculate items bought by a user with Map Reduce:
- Match activities of type "order" for the past 2 weeks
- map: emit the document by userId
- reduce: push all itemId in a list
- Output looks like { _id: "u123", items: [2, 3, 8] }
User Activity – Items bought together
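The first job can be sketched in plain JavaScript to show the shape of its output. This is an in-memory illustration rather than the actual map-reduce job: the time window is left out, the function name is hypothetical, and de-duplicating items per user is an assumption.

```javascript
// Simplified activities, as on the slide (time omitted for brevity).
const activities = [
  { userId: "u123", type: "order", itemId: 2 },
  { userId: "u123", type: "order", itemId: 3 },
  { userId: "u234", type: "order", itemId: 7 },
  { userId: "u123", type: "view", itemId: 9 }, // ignored: not an order
];

// "map" groups order activities by userId, "reduce" collects the itemIds.
function itemsByUser(acts) {
  const grouped = new Map();
  for (const a of acts) {
    if (a.type !== "order") continue;
    if (!grouped.has(a.userId)) grouped.set(a.userId, new Set());
    grouped.get(a.userId).add(a.itemId);
  }
  // One output document per user, items sorted for readability.
  return [...grouped].map(([userId, items]) => ({
    _id: userId,
    items: [...items].sort((x, y) => x - y),
  }));
}

const userItems = itemsByUser(activities);
console.log(JSON.stringify(userItems));
```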
111
Then run a 2nd mapreduce job from the previous output to compute
the number of occurrences of each item combination:
- query: go over all documents (1 document per userId)
- map: emit every combination of 2 items, starting with lowest
itemId
- reduce: sum up the total.
- output looks like { _id: { a: 2, b: 3 } , count: 36 }
User Activity – Items bought together
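The second job can be illustrated the same way. The sketch below emits every pair of items per user with the lowest itemId first and sums the occurrences; the input documents and the function name are made up for the example.

```javascript
// Output of the first job: one document per user (sample data).
const userItems = [
  { _id: "u123", items: [2, 3, 8] },
  { _id: "u234", items: [3, 2] },
  { _id: "u345", items: [7] }, // a single item produces no pair
];

// "map" emits each 2-item combination (lowest itemId first),
// "reduce" sums how many users bought that combination.
function countPairs(docs) {
  const counts = new Map();
  for (const { items } of docs) {
    const sorted = [...new Set(items)].sort((x, y) => x - y);
    for (let i = 0; i < sorted.length; i++) {
      for (let j = i + 1; j < sorted.length; j++) {
        const key = `${sorted[i]}:${sorted[j]}`;
        counts.set(key, (counts.get(key) || 0) + 1);
      }
    }
  }
  return [...counts].map(([key, count]) => {
    const [a, b] = key.split(":").map(Number);
    return { _id: { a, b }, count };
  });
}

const combinations = countPairs(userItems);
console.log(JSON.stringify(combinations));
```

Ordering each pair by itemId is what keeps (2, 3) and (3, 2) from being counted as two different combinations.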
112
Then obtain the most popular combinations per item:
- Index created on { "_id.a": 1, count: 1 } and { "_id.b": 1, count: 1 }
- Query with a threshold:
- db.combinations.find( { "_id.a": 2, count: { $gt: 10 }} ).sort({ count: -1 })
- db.combinations.find( { "_id.b": 2, count: { $gt: 10 }} ).sort({ count: -1 })
Later we can create a more compact recommendation collection
that includes popular combinations with weights, like:
{ itemId: 2, recom: [ { itemId: 32, weight: 36},
{ itemId: 158, weight: 23}, … ] }
User Activity – Items bought together
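Turning the combinations into the compact recommendation documents could look roughly like this. It is a sketch under assumptions: the function name is hypothetical, the combination counts are sample values, and each pair is credited to both of its items.

```javascript
// Sample output of the combinations job.
const combinations = [
  { _id: { a: 2, b: 32 }, count: 36 },
  { _id: { a: 2, b: 158 }, count: 23 },
  { _id: { a: 1, b: 2 }, count: 4 }, // below the threshold, dropped
];

// Keep combinations above the threshold and list, for each item, the items
// it was bought with, heaviest weight first.
function buildRecommendations(combos, threshold) {
  const byItem = new Map();
  const add = (itemId, otherId, weight) => {
    if (!byItem.has(itemId)) byItem.set(itemId, []);
    byItem.get(itemId).push({ itemId: otherId, weight });
  };
  for (const { _id, count } of combos) {
    if (count <= threshold) continue;
    add(_id.a, _id.b, count);
    add(_id.b, _id.a, count);
  }
  return [...byItem].map(([itemId, recom]) => ({
    itemId,
    recom: recom.sort((x, y) => y.weight - x.weight),
  }));
}

const recommendations = buildRecommendations(combinations, 10);
console.log(JSON.stringify(recommendations[0]));
```

A product page then needs a single lookup by itemId to display its cross-sell list.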
113
User Activity – Hadoop integration
EDW
Management & Monitoring
Security & Auditing
RDBMS
CRM, ERP, Collaboration, Mobile, BI
OS & Virtualization, Compute, Storage, Network
RDBMS
Applications
Infrastructure
Data Management
Operational Analytical
114
Commerce
Applications
powered by
Analysis
powered by
• Products & Inventory
• Recommended products
• Customer profile
• Session management
• Elastic pricing
• Recommendation models
• Predictive analytics
• Clickstream history
MongoDB
Connector for
Hadoop
115
Connector Overview
Data
Read/Write
MongoDB
Read/Write
BSON
Tools
MapReduce
Pig
Hive
Spark
Platforms
Apache Hadoop
Cloudera CDH
Hortonworks HDP
Amazon EMR
116
Connector Features and
Functionality
• Open-source on github
https://github.com/mongodb/mongo-hadoop
• Computes splits to read data
– Single Node, Replica Sets, Sharded Clusters
• Mappings for Pig and Hive
– MongoDB as a standard data source/destination
• Support for
– Filtering data with MongoDB queries
– Authentication
– Reading from Replica Set tags
– Appending to existing collections
117
MapReduce Configuration
• MongoDB input
– mongo.job.input.format = com.mongodb.hadoop.MongoInputFormat
– mongo.input.uri = mongodb://mydb:27017/db1.collection1
• MongoDB output
– mongo.job.output.format = com.mongodb.hadoop.MongoOutputFormat
– mongo.output.uri = mongodb://mydb:27017/db1.collection2
• BSON input/output
– mongo.job.input.format = com.mongodb.hadoop.BSONFileInputFormat
– mapred.input.dir = hdfs:///tmp/database.bson
– mongo.job.output.format = com.mongodb.hadoop.BSONFileOutputFormat
– mapred.output.dir = hdfs:///tmp/output.bson
118
Pig Mappings
• Input: BSONLoader and MongoLoader
data = LOAD 'mongodb://mydb:27017/db.collection'
using com.mongodb.hadoop.pig.MongoLoader;
• Output: BSONStorage and MongoInsertStorage
STORE records INTO 'hdfs:///output.bson'
using com.mongodb.hadoop.pig.BSONStorage;
119
Hive Support
CREATE TABLE mongo_users (id int, name string, age int)
STORED BY "com.mongodb.hadoop.hive.MongoStorageHandler"
WITH SERDEPROPERTIES("mongo.columns.mapping" =
"_id,name,age") TBLPROPERTIES("mongo.uri" =
"mongodb://host:27017/test.users")
• Access collections as Hive tables
• Use with MongoStorageHandler or
BSONStorageHandler
Thank You!
Antoine Girbal
Principal Solutions Engineer, MongoDB Inc.
@antoinegirbal
Retail Reference Architecture

More Related Content

What's hot

Data Quality Patterns in the Cloud with Azure Data Factory
Data Quality Patterns in the Cloud with Azure Data FactoryData Quality Patterns in the Cloud with Azure Data Factory
Data Quality Patterns in the Cloud with Azure Data FactoryMark Kromer
 
Data Catalog in Denodo Platform 7.0: Creating a Data Marketplace with Data Vi...
Data Catalog in Denodo Platform 7.0: Creating a Data Marketplace with Data Vi...Data Catalog in Denodo Platform 7.0: Creating a Data Marketplace with Data Vi...
Data Catalog in Denodo Platform 7.0: Creating a Data Marketplace with Data Vi...Denodo
 
Data quality and bi
Data quality and biData quality and bi
Data quality and bijeffd00
 
Project Presentation on Data WareHouse
Project Presentation on Data WareHouseProject Presentation on Data WareHouse
Project Presentation on Data WareHouseAbhi Bhardwaj
 
Formalize Data Governance with Policies and Procedures
Formalize Data Governance with Policies and ProceduresFormalize Data Governance with Policies and Procedures
Potential of AI (Generative AI) in Business: Learnings and Insights
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 

Retail Reference Architecture

  • 1. Retail Reference Architecture with MongoDB Antoine Girbal Principal Solutions Engineer, MongoDB Inc. @antoinegirbal
  • 3. 4 • it is way too broad to tackle with one solution • data maps so well to the document model • needs for agility, performance and scaling • Many (e)retailers are already using MongoDB • Let's define the best ways and places for it! Retail solution
  • 4. 5 • Holds complex JSON structures • Dynamic Schema for Agility • complex querying and in-place updating • Secondary, compound and geo indexing • full consistency, durability, atomic operations • Near linear scaling via sharding • Overall, MongoDB is a unique fit! MongoDB is a great fit
  • 5. 6 MongoDB Strategic Advantages Horizontally Scalable -Sharding Agile Flexible High Performance & Strong Consistency Application Highly Available -Replica Sets { customer: "roger", date: new Date(), comment: "Spirited Away", tags: ["Tezuka", "Manga"]}
  • 6. 7 build your data to fit your application Relational MongoDB { customer_id : 1, name : "Mark Smith", city : "San Francisco", orders: [ { order_number : 13, store_id : 10, date: "2014-01-03", products: [ {SKU: 24578234, Qty: 3, Unit_price: 350}, {SKU: 98762345, Qty: 1, Unit_price: 110} ] }, { <...> } ] } CustomerID First Name Last Name City 0 John Doe New York 1 Mark Smith San Francisco 2 Jay Black Newark 3 Meagan White London 4 Edward Danields Boston Order Number Store ID Product Customer ID 10 100 Tablet 0 11 101 Smartphone 0 12 101 Dishwasher 0 13 200 Sofa 1 14 200 Coffee table 1 15 201 Suit 2
  • 7. 8 Notions RDBMS MongoDB Database Database Table Collection Row Document Column Field
  • 9. 10 Information Management Merchandising Content Inventory Customer Channel Sales & Fulfillment Insight Social Architecture Overview Customer Channels Amazon Ebay … Stores POS Kiosk … Mobile Smartphone Tablet Website Contact Center API Data and Service Integration Social Facebook Twitter … Data Warehouse Analytics Supply Chain Management System Suppliers 3rd Party In Network Web Servers Application Servers
  • 10. 11 Commerce Functional Components Information Layer Look & Feel Navigation Customization Personalization Branding Promotions Chat Ads Customer's Perspective Research Browse Search Select Shopping Cart Purchase Checkout Receive Track Use Feedback Maintain Dialog Assist Market / Offer Guide Offer Semantic Search Recommend Rule-based Decisions Pricing Coupons Sell / Fulfill Orders Payments Fraud Detection Fulfillment Business Rules Insight Session Capture Activity Monitoring Customer Enterprise Information Management Merchandising Content Inventory Customer Channel Sales & Fulfillment Insight Social
  • 13. 14 • Single view of a product, one central catalog service • Read volume high and sustained, 100k reads / s • Write volume spikes up during catalog update • Advanced indexing and querying • Geographical distribution and low latency • No need for a cache layer, CDN for assets Merchandising - principles
  • 14. 15 Merchandising - requirements
      Requirement | Example Challenge | MongoDB
      Single-view of product | Blended description and hierarchy of product to ensure availability on all channels | Flexible document-oriented storage
      High sustained read volume with low latency | Constant querying from online users and sales associates, requiring immediate response | Fast indexed querying, replication allows local copy of catalog, sharding for scaling
      Spiky and real-time write volume | Bulk update of full catalog without impacting production, real-time touch update | Fast in-place updating, real-time indexing, sharding for scaling
      Advanced querying | Find product based on color, size, description | Ad-hoc querying on any field, advanced secondary and compound indexing
  • 15. 16 Merchandising - Product Page Product images General Information List of Variants External Information Localized Description
  • 16. 17 Merchandising - Item Model
      > db.item.findOne()
      {
        _id: "301671",                                // main item id
        department: "Shoes",
        category: "Shoes/Women/Pumps",
        brand: "Guess",
        thumbnail: "http://cdn…/pump.jpg",
        image: "http://cdn…/pump1.jpg",               // larger version of thumbnail
        title: "Evening Platform Pumps",
        description: "These evening platform pumps put the perfect finishing touches on your most glamorous night-on-the-town outfit",
        shortDescription: "Evening Platform Pumps",
        style: "Designer",
        type: "Platform",
        rating: 4.5,                                  // user rating
        lastUpdated: ISODate("2014-04-01T00:00:00Z"), // last update time
        …
      }
  • 17. 18 Merchandising - Item Definition
      • Get item by id
        db.item.findOne( { _id: "301671" } )
      • Get items from product ids
        db.item.find( { _id: { $in: [ "301671", "301672" ] } } )
      • Get items by department
        db.item.find( { department: "Shoes" } )
      • Get items by category prefix
        db.item.find( { category: /^Shoes\/Women/ } )
      • Indices: productId, department, category, lastUpdated
  • 18. 19 Merchandising – Variant Model
      > db.variant.findOne()
      {
        _id: "730223104376",                          // the sku
        itemId: "301671",                             // references item id
        thumbnail: "http://cdn…/pump-red.jpg",        // variant specific
        image: "http://cdn…/pump-red.jpg",
        size: 6.0,
        color: "Red",
        width: "B",
        heelHeight: 5.0,
        lastUpdated: ISODate("2014-04-01T00:00:00Z"), // last update time
        …
      }
  • 19. 20 Merchandising – Variant Model
      • Get variant from SKU
        db.variant.find( { _id: "730223104376" } )
      • Get all variants for a product, sorted by SKU
        db.variant.find( { itemId: "301671" } ).sort( { _id: 1 } )
      • Indices: itemId, lastUpdated
  • 20. 22 Merchandising – per store Pricing
      Per store pricing could result in billions of documents, unless you build it in a modular way
      Price:
      {
        _id: "sku730223104376_store123",
        currency: "USD",
        price: 89.95,
        lastUpdated: ISODate("2014-04-01T00:00:00Z"), // last update time
        …
      }
      _id: concatenation of item and store.
      Item: can be an item id or sku
      Store: can be a store group or store id.
      Indices: lastUpdated
  • 21. 23 Merchandising – per store Pricing
      • Get all prices for a given item
        db.prices.find( { _id: /^p301671_/ } )
      • Get all prices for a given sku (price could be at item level)
        db.prices.find( { _id: { $in: [ /^sku730223104376_/, /^p301671_/ ] } } )
      • Get minimum and maximum prices for a sku
        db.prices.aggregate( [ { $match: { _id: /^sku730223104376_/ } }, { $group: { _id: 1, min: { $min: "$price" }, max: { $max: "$price" } } } ] )
      • Get price for a sku and store id (returns up to 4 prices)
        db.prices.find( { _id: { $in: [ "sku730223104376_store1234", "sku730223104376_sgroup0", "p301671_store1234", "p301671_sgroup0" ] } }, { price: 1 } )
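
The last lookup above returns up to four price documents, and the slides do not say how the application picks one. A minimal sketch in plain JavaScript, assuming the `_id` layout shown on the slide and assuming the most specific price wins (sku before item, store before store group). The function names `priceIdCandidates` and `pickPrice` are hypothetical.

```javascript
// Build the candidate _id values for a price lookup, most specific first.
// Assumption: ids are "<item>_<store>", where item is "sku<sku>" or "p<itemId>"
// and store is "store<storeId>" or "sgroup<groupId>", as in the slide examples.
function priceIdCandidates(sku, itemId, storeId, storeGroupId) {
  const items = ["sku" + sku, "p" + itemId];
  const stores = ["store" + storeId, "sgroup" + storeGroupId];
  const ids = [];
  for (const item of items) {
    for (const store of stores) {
      ids.push(item + "_" + store);
    }
  }
  return ids;
}

// Pick the most specific price among the documents returned by
// db.prices.find({ _id: { $in: candidates } }). Returns null if none matched.
// Assumption: precedence follows the order of the candidate list.
function pickPrice(candidates, docs) {
  const byId = new Map(docs.map((d) => [d._id, d]));
  for (const id of candidates) {
    if (byId.has(id)) return byId.get(id);
  }
  return null;
}

const candidates = priceIdCandidates("730223104376", "301671", "1234", "0");
// Pretend the query returned only the two group-level prices.
const found = [
  { _id: "p301671_sgroup0", price: 89.95 },
  { _id: "sku730223104376_sgroup0", price: 84.95 },
];
const best = pickPrice(candidates, found);
console.log(candidates, best);
```

Resolving precedence in the application keeps the database work to a single `$in` query on `_id`, which is always indexed.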
  • 22. 26 Merchandising – Browse and Search products Browse by category Special Lists Filter by attributes Lists hundreds of item summaries Ideally a single query is issued to the database to obtain all items and metadata to display
  • 23. 27 The previous page presents many challenges: • Response within milliseconds for hundreds of items • Faceted search on many attributes: category, brand, … • Attributes at the variant level: color, size, etc, and the variation's image should be shown • thousands of variants for an item, need to de-duplicate • Efficient sorting on several attributes: price, popularity • Pagination feature which requires deterministic ordering Merchandising – Browse and Search products
  • 24. 28 Merchandising – Browse and Search products Hundreds of sizes One Item Dozens of colors A single item may have thousands of variants
  • 25. 29 Merchandising – Browse and Search products Images of the matching variants are displayed Hierarchy Sort parameter Faceted Search
  • 26. 30 Merchandising – Traditional Architecture Relational DB System of Records Full Text Search Engine Indexing #1 obtain search results IDs Application Cache #2 obtain objects by ID Pre-joined into objects
  • 27. 31 The traditional architecture issues: • 3 different systems to maintain: RDBMS, Search engine, Caching layer • search returns a list of IDs to be looked up in the cache, increases latency of response • RDBMS schema is complex and static • The search index is expensive to update • Setup does not allow efficient pagination Merchandising – Traditional Architecture
  • 28. 32 MongoDB Data Store Merchandising - Architecture Summaries Items Pricing Promotions Variants Ratings & Reviews #1 Obtain results
  • 29. 33 The summary relies on the following parameters: • department e.g. "Shoes" • An indexed attribute – Category path, e.g. "Shoes/Women/Pumps" – Price range – List of Item Attributes, e.g. Brand = Guess – List of Variant Attributes, e.g. Color = red • A non-indexed attribute – List of Item Secondary Attributes, e.g. Style = Designer – List of Variant Secondary Attributes, e.g. heel height = 4.0 • Sorting, e.g. Price Low to High Merchandising – Summary Model
  • 30. 34 > db.summaries.findOne() { "_id": "p39", "title": "Evening Platform Pumps 39", "department": "Shoes", "category": "Shoes/Women/Pumps", "thumbnail": "http://cdn…/pump-small-39.jpg", "image": "http://cdn…/pump-39.jpg", "price": 145.99, "rating": 0.95, "attrs": [ { "brand" : "Guess"}, … ], "sattrs": [ { "style" : "Designer"} , { "type" : "Platform"}, …], "vars": [ { "sku": "sku2441", "thumbnail": "http://cdn…/pump-small-39.jpg.Blue", "image": "http://cdn…/pump-39.jpg.Blue", "attrs": [ { "size": 6.0 }, { "color": "Blue" }, …], "sattrs": [ { "width" : "B"} , { "heelHeight" : 5.0 }, …], }, … Many more skus … ] } Merchandising – Summary Model
  • 31. 35 Merchandising - Summary Model
      • Get summary from item id
        db.summaries.find( { _id: "p301671" } )
      • Get summary's specific variation from SKU
        db.summaries.find( { "vars.sku": "730223104376" }, { "vars.$": 1 } )
      • Get summary by department, sorted by rating
        db.summaries.find( { department: "Shoes" } ).sort( { rating: 1 } )
      • Get summary with mix of parameters
        db.summaries.find( { department: "Shoes", "vars.attrs": { "color": "Gray" }, "category": /^Shoes\/Women/, "price": { "$gte": 65.99, "$lte": 180.99 } } )
  • 32. 36 Merchandising – Summary Model • The following indices are used: – department + attrs + category + _id – department + vars.attrs + category + _id – department + category + _id – department + price + _id – department + rating + _id • _id used for pagination • Can take advantage of index intersection • With several attributes specified (e.g. color=red and size=6), which one is looked up?
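
The slide only says that `_id` is used for pagination. One common way to get the deterministic ordering it asks for is range-based ("keyset") pagination, sketched here in plain JavaScript. The function name `summaryPage`, the highest-rating-first sort and the page size are assumptions, not taken from the slides.

```javascript
// Build filter, sort and limit for one page of summaries in a department,
// ordered by rating (highest first) with _id as a deterministic tie-breaker.
// `last` is the final document of the previous page, or null for the first page.
// Assumption: this scheme is illustrative; the slides do not show the real one.
function summaryPage(department, last, pageSize) {
  const filter = { department: department };
  if (last) {
    filter.$or = [
      { rating: { $lt: last.rating } },                // strictly lower rating
      { rating: last.rating, _id: { $gt: last._id } }, // same rating, later _id
    ];
  }
  return { filter: filter, sort: { rating: -1, _id: 1 }, limit: pageSize };
}

const first = summaryPage("Shoes", null, 100);
const next = summaryPage("Shoes", { _id: "p39", rating: 0.95 }, 100);
console.log(JSON.stringify(first));
console.log(JSON.stringify(next));
```

In the shell this would be used as `db.summaries.find(next.filter).sort(next.sort).limit(next.limit)`, which matches the shape of the department + rating + _id index and avoids large `skip()` offsets.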
  • 33. 37 Merchandising – Facet
      Facet samples:
      { "_id" : "Accessory Type=Hosiery" , "count" : 14}
      { "_id" : "Ladder Material=Steel" , "count" : 2}
      { "_id" : "Gold Karat=14k" , "count" : 10138}
      { "_id" : "Stone Color=Clear" , "count" : 1648}
      { "_id" : "Metal=White gold" , "count" : 10852}
      Single operations to insert / update:
      db.facet.update( { _id: "Accessory Type=Hosiery" }, { $inc: { count: 1 } }, true, false )
      The facet with lowest count is the most restrictive… It should come first in the query!
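
The rule "lowest count first" can be sketched as a small ordering step that runs before the query is built. This is plain JavaScript under stated assumptions: the function name `orderFacets` is hypothetical, the counts are the sample values from the slide, and facets with an unknown count are placed last.

```javascript
// Order the requested facets so the one with the lowest count comes first.
// `counts` mirrors the facet collection: "Name=Value" -> count.
// Assumption: a facet missing from `counts` is treated as least selective.
function orderFacets(requested, counts) {
  const countOf = (facet) =>
    counts.has(facet) ? counts.get(facet) : Number.MAX_SAFE_INTEGER;
  return requested.slice().sort((a, b) => countOf(a) - countOf(b));
}

const facetCounts = new Map([
  ["Metal=White gold", 10852],
  ["Gold Karat=14k", 10138],
  ["Stone Color=Clear", 1648],
]);
const ordered = orderFacets(
  ["Metal=White gold", "Gold Karat=14k", "Stone Color=Clear"],
  facetCounts
);
console.log(ordered);
```

Here "Stone Color=Clear" comes first because it matches the fewest documents, so it would be the attribute looked up in the index.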
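  • 34. 38 Merchandising – Query stats
      Department | Category | Price | Primary attribute | Time: Average (ms) | Time: 90th (ms) | Time: 95th (ms)
      1 | 0 | 0 | 0 | 2 | 3 | 3
      1 | 1 | 0 | 0 | 1 | 2 | 2
      1 | 0 | 1 | 0 | 1 | 2 | 3
      1 | 1 | 1 | 0 | 1 | 2 | 2
      1 | 0 | 0 | 1 | 0 | 1 | 2
      1 | 1 | 0 | 1 | 0 | 1 | 1
      1 | 0 | 1 | 1 | 1 | 2 | 2
      1 | 1 | 1 | 1 | 0 | 1 | 1
      1 | 0 | 0 | 2 | 1 | 3 | 3
      1 | 1 | 0 | 2 | 0 | 2 | 2
      1 | 0 | 1 | 2 | 10 | 20 | 35
      1 | 1 | 1 | 2 | 0 | 1 | 1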
  • 36. 42 Inventory – Traditional Architecture Relational DB System of Records Nightly Batches Analytics, Aggregations, Reports Caching Layer Field Inventory Internal & External Apps Point-in-time Loads
  • 37. 43 Opportunities Missed • Can’t reliably detect availability • Can't redirect purchasers to in-store pickup • Can’t do intra-day replenishment • Degraded customer experience • Higher internal expense
  • 38. 44 Inventory – Principles • Single view of the inventory • Used by most services and channels • Read dominated workload • Local, real-time writes • Bulk writes for refresh • Geographically distributed • Horizontally scalable
  • 39. 45 Inventory – Requirements
      Requirement | Challenge | MongoDB
      Single view of inventory | Ensure availability of inventory information on all channels and services | Developer-friendly, document-oriented storage
      High volume, low latency reads | Anytime, anywhere access to inventory data without overloading the system of record | Fast, indexed reads; Local reads; Horizontal scaling
      Bulk updates, intra-day deltas | Provide window-in-time consistency for highly available services | Bulk writes; Fast, in-place updates; Horizontal scaling
      Rapid application development cycles | Deliver new services rapidly to capture new opportunities | Flexible schema; Rich query language; Agile-friendly iterations
  • 40. 46 Inventory – Target Architecture Relational DB System of Records Analytics, Aggregations, Reports Field Inventory Internal & External Apps Inventory Assortments Shipments Audits Products Stores Point-in-time Loads Nightly Refresh Real-time Updates
  • 41. 47 Horizontal Scaling Inventory – Technical Decisions Store Inventory Schema Indexing
  • 42. 48 Inventory – Collections Stores Inventory Products Audits Assortments Shipments
  • 43. 49 Stores – Sample Document
      > db.stores.findOne()
      {
        "_id" : ObjectId("53549fd3e4b0aaf5d6d07f35"),
        "className" : "catalog.Store",
        "storeId" : "store0",
        "name" : "Bessemer store",
        "address" : {
          "addr1" : "1st Main St",
          "city" : "Bessemer",
          "state" : "AL",
          "zip" : "12345",
  • 44. 50 Stores – Sample Queries • Get a store by storeId db.stores.find({ "storeId" : "store0" }) • Get a store by zip code db.stores.find({ "address.zip" : "12345" })
  • 46. 52 Stores – Sample Geo Queries • Get nearby stores sorted by distance db.runCommand({ geoNear : "stores", near : { type : "Point", coordinates : [-82.8006, 40.0908] }, maxDistance : 10000.0, spherical : true })
  • 47. 53 Stores – Sample Geo Queries • Get the five nearest stores within 10 km db.stores.find({ location : { $near : { $geometry : { type : "Point", coordinates : [-82.80, 40.09] }, $maxDistance : 10000 } } }).limit(5)
  • 48. 54 Stores – Indices • { "storeId" : 1 } • { "name" : 1 } • { "address.zip" : 1 } • { "location" : "2dsphere" }
  • 49. 55 Inventory – Sample Document
      > db.inventory.findOne()
      {
        "_id": "5354869f300487d20b2b011d",
        "storeId": "store0",
        "location": [-86.95444, 33.40178],
        "productId": "p0",
        "vars": [
          { "sku": "sku1", "q": 14 },
          { "sku": "sku3", "q": 7 },
          { "sku": "sku7", "q": 32 },
          { "sku": "sku14", "q": 65 },
          ...
  • 50. 56 Inventory – Sample Queries • Get all items in a store db.inventory.find({ storeId : "store100" }) • Get quantity for an item at a store db.inventory.find({ "storeId" : "store100", "productId" : "p200" })
  • 51. 57 Inventory – Sample Queries • Get quantity for a sku at a store db.inventory.find( { "storeId" : "store100", "productId" : "p200", "vars.sku" : "sku11736" }, { "vars.$" : 1 } )
  • 52. 58 Inventory – Sample Update • Increment / decrement inventory for an item at a store db.inventory.update( { "storeId" : "store100", "productId" : "p200", "vars.sku" : "sku11736" }, { "$inc" : { "vars.$.q" : 20 } } )
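
The `$inc` above adjusts stock unconditionally. When decrementing for a sale, one possible guard against negative quantities is to make the filter require enough stock on the matching variant. This is a sketch, not something shown on the slides: the function name `reserveStock` and the guard itself are assumptions, while the field names follow the sample inventory document.

```javascript
// Build the filter and update documents for a guarded stock decrement.
// Assumption: the $elemMatch guard is an illustrative addition; the slide
// shows only the plain positional $inc.
function reserveStock(storeId, productId, sku, qty) {
  if (!Number.isInteger(qty) || qty <= 0) {
    throw new RangeError("qty must be a positive integer");
  }
  return {
    filter: {
      storeId: storeId,
      productId: productId,
      // Match only if this sku has at least `qty` units on hand.
      vars: { $elemMatch: { sku: sku, q: { $gte: qty } } },
    },
    // "vars.$" refers to the array element matched by $elemMatch above.
    update: { $inc: { "vars.$.q": -qty } },
  };
}

const op = reserveStock("store100", "p200", "sku11736", 2);
console.log(JSON.stringify(op));
```

In the shell this would run as `db.inventory.update(op.filter, op.update)`. A result reporting zero modified documents then means the sku was missing or had too little stock, and the check and the decrement happen in one atomic single-document operation.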
  • 53. 59 Inventory – Sample Aggregations • Aggregate total quantity for a product db.inventory.aggregate( [ { $match : { productId : "p200" } }, { $unwind : "$vars" }, { $group : { _id : "result", count : { $sum : "$vars.q" } } } ] ) { "_id" : "result", "count" : 101752 }
  • 54. 60 Inventory – Sample Aggregations • Count the skus in stock at a store db.inventory.aggregate( [ { $match : { storeId : "store100" } }, { $unwind : "$vars" }, { $match : { "vars.q" : { $gt : 0 } } }, { $group : { _id : "result", count : { $sum : 1 } } } ] ) { "_id" : "result", "count" : 29347 }
  • 55. 61 Inventory – Sample Aggregations • Aggregate total quantity for a store db.inventory.aggregate( [ { $match : { storeId : "store100" } }, { $unwind : "$vars" }, { $group : { _id : "result", count : { $sum : "$vars.q" } } } ] ) { "_id" : "result", "count" : 29347 }
  • 56. 63
  • 57. 64 Inventory – Sample Geo-Query • Get inventory for an item near a point db.runCommand( { geoNear : "inventory", near : { type : "Point", coordinates : [-82.8006, 40.0908] }, maxDistance : 10000.0, spherical : true, limit : 10, query : { "productId" : "p200", "vars.sku" : "sku11736" } } )
  • 58. 65 Inventory – Sample Geo-Query • Get closest store with available sku db.runCommand( { geoNear : "inventory", near : { type : "Point", coordinates : [-82.800672, 40.090844] }, maxDistance : 10000.0, spherical : true, limit : 1, query : { productId : "p200", vars : { $elemMatch : { sku : "sku11736", q : { $gt : 0 } } } } } )
  • 59. 66 Inventory – Sample Geo-Aggregation • Get count of inventory for an item near a point db.inventory.aggregate( [ { $geoNear: { near : { type : "Point", coordinates : [-82.800672, 40.090844] }, distanceField: "distance", maxDistance: 10000.0, spherical : true, query: { productId : "p200", vars : { $elemMatch : { sku : "sku11736", q : {$gt : 0} } } }, includeLocs: "dist.location", num: 5 } }, { $unwind: "$vars" }, { $match: { "vars.sku" : "sku11736" } }, { $group: { _id: "result", count: {$sum: "$vars.q"} } }])
  • 60. 67 Inventory – Sample Indices • { storeId : 1 } • { productId : 1, storeId : 1 } • Why not "vars.sku"? – { productId : 1, storeId : 1, "vars.sku" : 1 } • { productId : 1, location : "2dsphere" }
  • 61. 68 Horizontal Scaling Inventory – Technical Decisions Store Inventory Schema Indexing
  • 62. 69 Inventory – Sharding Topology Shard East Shard Central Shard West East DC West DC Central DC Legacy Inventory Primary Primary Primary
  • 63. 70 Inventory – Shard Key • Choose shard key – { productId : 1, storeId : 1 } • Set up sharding – sh.enableSharding("inventoryDB") – sh.shardCollection( "inventoryDB.inventory", { productId : 1, storeId : 1 } )
  • 64. 71 Inventory – Shard Tags • Set up shard tags – sh.addShardTag("shard0000", "west") – sh.addShardTag("shard0001", "central") – sh.addShardTag("shard0002", "east") • Set up tag ranges – Add new field: region – sh.addTagRange("inventoryDB.inventory", { region : 0 }, { region : 100}, "west" ) – sh.addTagRange("inventoryDB.inventory", { region : 100 }, { region : 200 }, "central" ) – sh.addTagRange("inventoryDB.inventory", { region : 200 }, { region : 300 }, "east" )
  • 66. 87 Insight Insight MongoDB Advertising metrics Clickstream Recommendations Session Capture Activity Logging Geo Tracking Product Analytics Customer Insight Application Logs
  • 67. 88 Many user activities can be of interest: • Search • Product view, like or wish • Shopping cart add / remove • Sharing on social network • Ad impression, Clickstream Activity Logging – Data of interest
  • 68. 89 Will be used to compute: • Product Map (relationships, etc) • User Preferences • Recommendations • Trends … Activity Logging – Data of interest
  • 69. 90 Activity logging - Architecture MongoDB HVDF API Activity Logging User History External Analytics: Hadoop, Spark, Storm, … User Preferences Recommendations Trends Product Map Apps Internal Analytics: Aggregation, MR All user activity is recorded MongoDB – Hadoop Connector Personalization
  • 71. 92 • store and manage an incoming stream of data samples – High arrival rate of data from many sources – Variable schema of arriving data – control retention period of data • compute derivative data sets based on these samples – Aggregations and statistics based on data – Roll-up data into pre-computed reports and summaries • low latency access to up-to-date data (user history) – Flexible indexing of raw and derived data sets – Rich querying based on time + meta-data fields in samples Activity Logging – Problem statement
  • 72. 93 Activity logging - Requirements
      Requirement | MongoDB
      Ingestion of 100ks of writes / sec | Fast C++ process, multi-threads, multi-locks. Horizontal scaling via sharding. Sequential IO via time partitioning.
      Flexible schema | Dynamic schema, each document is independent. Data is stored in the same format and size as it is inserted.
      Fast querying on varied fields, sorting | Secondary Btree indexes can lookup and sort the data in milliseconds.
      Easy clean up of old data | Deletes are typically as expensive as inserts. Getting free deletes via time partitioning.
  • 73. 94 Activity Logging using HVDF HVDF (High Volume Data Feed): • Open source reference implementation of high volume writing with MongoDB https://github.com/10gen-labs/hvdf • Rest API server written in Java with most popular libraries • Public project, issues can be logged https://jira.mongodb.org/browse/HVDF • Can be run as-is, or customized as needed
  • 74. 95 Feed High volume data feed architecture Channel Sample Sample Sample Sample Source Source Processor Inline Processing Batch Processing Stream Processing Grouping by Feed and Channel Sources send samples Processors generate derivative Channels
  • 75. 96 HVDF -- High Volume Data Feed engine HVDF – Reference implementation REST Service API Processor Plugins Inline Batch Stream Channel Data Storage Raw Channel Data Aggregated Rollup T1 Aggregated Rollup T2 Query Processor Streaming spout Custom Stream Processing Logic Incoming Sample Stream POST /feed/channel/data GET /feed/channel/data?time=XXX&range=YYY Real-time Queries
  • 76. 97 User Activity - Model
      {
        _id: ObjectId(),
        geoCode: 1,                                      // used to localize write operations
        sessionId: "2373BB…",
        device: { id: "1234", type: "mobile/iphone", userAgent: "Chrome/34.0.1847.131" },
        userId: "u123",
        type: "VIEW|CART_ADD|CART_REMOVE|ORDER|…",       // type of activity
        itemId: "301671",
        sku: "730223104376",
        order: { id: "12520185", … },
        location: [ -86.95444, 33.40178 ],
        tags: [ "smartphone", "iphone", … ],             // associated tags
        timeStamp: new Date("2014/04/01 …")
      }
  • 77. 98 Dynamic schema for sample data Sample 1 { deviceId: XXXX, time: Date(…), type: "VIEW", … } Channel Sample 2 { deviceId: XXXX, time: Date(…), type: "CART_ADD", cartId: 123, … } Sample 3 { deviceId: XXXX, time: Date(…), type: "FB_LIKE" } Each sample can have variable fields
  • 78. 99 Channels are sharded Shard Shard Shard Shard Shard Shard Key: Customer_id Sample { customer_id: XXXX, time: Date(…), type: "VIEW" } Channel You choose how to partition samples Samples can have dynamic schema Scale horizontally by adding shards Each shard is highly available
  • 79. 100 Channels are time partitioned Channel Sample Sample Sample Sample Sample Sample Sample Sample - 2 days - 1 Day Today Partitioning keeps indexes manageable This is where all of the writes happen Older partitions are read only for best possible concurrency Queries are routed only to needed partitions Partition 1 Partition 2 Partition N Each partition is a separate collection Efficient and space reclaiming purging of old data
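
The routing described above (one collection per time partition, queries sent only to the partitions they need) can be sketched in plain JavaScript. The naming scheme `<channel>_<partition number>` and the function names are hypothetical. They are not HVDF's actual scheme, which the slides do not show.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Name of the collection that holds samples for a given timestamp.
// Assumption: partitions are fixed-length periods counted from the Unix epoch.
function partitionName(channel, timeMs, periodMs) {
  return channel + "_" + Math.floor(timeMs / periodMs);
}

// Names of all partitions a time-bound query has to visit, oldest first.
function partitionsForRange(channel, fromMs, toMs, periodMs) {
  const names = [];
  const first = Math.floor(fromMs / periodMs);
  const last = Math.floor(toMs / periodMs);
  for (let p = first; p <= last; p++) {
    names.push(channel + "_" + p);
  }
  return names;
}

const now = Date.UTC(2014, 3, 1, 12, 0, 0); // 2014-04-01T12:00:00Z
const current = partitionName("activity", now, DAY_MS);
const recent = partitionsForRange("activity", now - 2 * DAY_MS, now, DAY_MS);
console.log(current, recent);
```

With this layout all writes land in `current`, a query over the last two days touches the three collections in `recent`, and purging old data becomes a collection drop instead of a per-document delete.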
  • 80. 101 Dynamic queries on Channels Channel Sample Sample Sample Sample App App App Indexes Queries Pipelines Map-Reduce Create custom indexes on Channels Use full mongodb query language to access samples Use mongodb aggregation pipelines to access samples Use mongodb inline map-reduce to access samples Full access to field, text, and geo indexing
  • 81. 102 North America - West North America - East Europe Geographically distributed system Channel Sample Sample Sample Sample Source Source Source Source Source Source Sample Sample Sample Sample Geo shards per location Clients write local nodes Single view of channel available globally
  • 83. 104 Insight – Useful Data Useful data for better shopping: • User history (e.g. recently seen products) • User statistics (e.g. total purchases, visits) • User interests (e.g. likes videogames and SciFi) • User social network
  • 84. 105 Insight – Useful Data Useful data for selling more: • Cross-selling: people who bought this item had tendency to buy those other items (e.g. iPhone, then bought iPhone case) • Up-selling: people who looked at this item eventually bought those items (alternative product that may be better)
  • 85. 106 • Get the recent activity for a user, to populate the "recently viewed" list db.activities.find({ userId: "u123", time: { $gt: DATE }}). sort({ time: -1 }).limit(100) • Get the recent activity for a product, to populate the "N users bought this in the past N hours" list db.activities.find({ itemId: "301671", time: { $gt: DATE }}). sort({ time: -1 }).limit(100) • Indices: time, userId + time, deviceId + time, itemId + time • All queries should be time bound, since this is a lot of data! Insight – User History
  • 86. 107 Insight – User Stats
      • Get the recent number of views, purchases, etc for a user
        db.activities.aggregate( [ { $match: { userId: "u123", time: { $gt: DATE } } }, { $group: { _id: "$type", count: { $sum: 1 } } } ] )
      • Get the total recent sales for a user
        db.activities.aggregate( [ { $match: { userId: "u123", time: { $gt: DATE }, type: "ORDER" } }, { $group: { _id: "result", count: { $sum: "$totalPrice" } } } ] )
      • Get the recent number of views, purchases, etc for an item
        db.activities.aggregate( [ { $match: { itemId: "301671", time: { $gt: DATE } } }, { $group: { _id: "$type", count: { $sum: 1 } } } ] )
      • Those aggregations are very fast, real-time
  • 87. 108 Insight – User Stats
      • Number of activities for unique visitors for the past hour. Calculation of uniques is hard for any system!
        db.activities.aggregate( [ { $match: { time: { $gt: NOW-1H } } }, { $group: { _id: "$userId", count: { $sum: 1 } } } ], { allowDiskUse: true } )
      • Aggregation above can have issues (single shard final grouping, result not persisted). Map Reduce is a better alternative here
        var map = function() { emit(this.userId, 1); }
        var reduce = function(key, values) { return Array.sum(values); }
        db.activities.mapReduce( map, reduce, { query: { time: { $gt: NOW-1H } }, out: { replace: "lastHourUniques", sharded: true } } )
        db.lastHourUniques.find( { _id: "u123" } ) // number of activities for a user
        db.lastHourUniques.count() // total uniques
  • 88. 109 User Activity – Items bought together Time to cross-sell!
  • 89. 110 User Activity – Items bought together
      Let's simplify each activity recorded as the following:
      { userId: "u123", type: "order", itemId: 2, time: DATE }
      { userId: "u123", type: "order", itemId: 3, time: DATE }
      { userId: "u234", type: "order", itemId: 7, time: DATE }
      Calculate items bought by a user with Map Reduce:
      - Match activities of type "order" for the past 2 weeks
      - map: emit the document by userId
      - reduce: push all itemId in a list
      - Output looks like { _id: "u123", items: [2, 3, 8] }
  • 90. 111 Then run a 2nd mapreduce job from the previous output to compute the number of occurrences of each item combination: - query: go over all documents (1 document per userId) - map: emit every combination of 2 items, starting with lowest itemId - reduce: sum up the total. - output looks like { _id: { a: 2, b: 3 } , count: 36 } User Activity – Items bought together
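
The two map-reduce jobs described on these slides can be simulated in memory to make the data flow concrete. This is plain JavaScript, not the actual `mapReduce` calls. The function names, the `"a:b"` key format and the sample activities are assumptions, the two-week time filter is left out, and repeat purchases of one item by the same user are de-duplicated, which the slides do not specify.

```javascript
// Job 1: group the ordered itemIds by user (one entry per userId).
function itemsByUser(activities) {
  const byUser = new Map();
  for (const a of activities) {
    if (a.type !== "order") continue; // match only order activities
    if (!byUser.has(a.userId)) byUser.set(a.userId, new Set());
    byUser.get(a.userId).add(a.itemId);
  }
  return byUser;
}

// Job 2: emit every combination of 2 items, lowest itemId first, and sum them.
function pairCounts(byUser) {
  const counts = new Map();
  for (const items of byUser.values()) {
    const sorted = Array.from(items).sort((x, y) => x - y);
    for (let i = 0; i < sorted.length; i++) {
      for (let j = i + 1; j < sorted.length; j++) {
        const key = sorted[i] + ":" + sorted[j]; // stands in for { a: .., b: .. }
        counts.set(key, (counts.get(key) || 0) + 1);
      }
    }
  }
  return counts;
}

const activities = [
  { userId: "u123", type: "order", itemId: 2 },
  { userId: "u123", type: "order", itemId: 3 },
  { userId: "u123", type: "order", itemId: 8 },
  { userId: "u234", type: "order", itemId: 3 },
  { userId: "u234", type: "order", itemId: 2 },
  { userId: "u234", type: "view", itemId: 8 },
];
const counts = pairCounts(itemsByUser(activities));
console.log(counts);
```

Sorting each user's items before pairing is what guarantees that a pair is always counted under the same key, whatever order the items were bought in.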
  • 91. 112 User Activity – Items bought together
      Then obtain the most popular combinations per item:
      - Index created on { "_id.a": 1, count: 1 } and { "_id.b": 1, count: 1 }
      - Query with a threshold:
      - db.combinations.find( { "_id.a": 2, count: { $gt: 10 } } ).sort( { count: -1 } )
      - db.combinations.find( { "_id.b": 2, count: { $gt: 10 } } ).sort( { count: -1 } )
      Later we can create a more compact recommendation collection that includes popular combinations with weights, like:
      { itemId: 2, recom: [ { itemId: 32, weight: 36 }, { itemId: 158, weight: 23 }, … ] }
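
Building the compact recommendation documents from the combination counts might look like the following. This is a plain JavaScript sketch: the function name `buildRecommendations`, the `maxPerItem` cap and the sample counts are assumptions, while the input and output shapes follow the slides.

```javascript
// Turn pair counts { _id: { a, b }, count } into one document per item:
// { itemId, recom: [ { itemId, weight }, ... ] } sorted by weight, highest first.
// Assumption: pairs at or below `minCount` are dropped, as in count: { $gt: 10 }.
function buildRecommendations(combinations, minCount, maxPerItem) {
  const byItem = new Map();
  const add = (itemId, otherId, weight) => {
    if (!byItem.has(itemId)) byItem.set(itemId, []);
    byItem.get(itemId).push({ itemId: otherId, weight: weight });
  };
  for (const c of combinations) {
    if (c.count <= minCount) continue;
    add(c._id.a, c._id.b, c.count); // the pair recommends b for a ...
    add(c._id.b, c._id.a, c.count); // ... and a for b
  }
  const docs = [];
  for (const [itemId, recom] of byItem) {
    recom.sort((x, y) => y.weight - x.weight);
    docs.push({ itemId: itemId, recom: recom.slice(0, maxPerItem) });
  }
  return docs;
}

const combos = [
  { _id: { a: 2, b: 32 }, count: 36 },
  { _id: { a: 2, b: 158 }, count: 23 },
  { _id: { a: 2, b: 7 }, count: 4 },
];
const recs = buildRecommendations(combos, 10, 5);
console.log(JSON.stringify(recs));
```

With one document per item, a product page needs a single lookup by `itemId` instead of two threshold queries on the combinations collection.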
  • 92. 113 User Activity – Hadoop integration EDW Management&Monitoring Security&Auditing RDBM S CRM, ERP, Collaboration, Mobile, BI OS & Virtualization, Compute, Storage, Network RDBMS Applications Infrastructure Data Management Operational Analytical
  • 93. 114 Commerce Applications powered by Analysis powered by • Products & Inventory • Recommended products • Customer profile • Session management • Elastic pricing • Recommendation models • Predictive analytics • Clickstream history MongoDB Connector for Hadoop
  • 95. 116 Connector Features and Functionality • Open-source on github https://github.com/mongodb/mongo-hadoop • Computes splits to read data – Single Node, Replica Sets, Sharded Clusters • Mappings for Pig and Hive – MongoDB as a standard data source/destination • Support for – Filtering data with MongoDB queries – Authentication – Reading from Replica Set tags – Appending to existing collections
  • 96. 117 MapReduce Configuration
      • MongoDB input
        – mongo.job.input.format = com.mongodb.hadoop.MongoInputFormat
        – mongo.input.uri = mongodb://mydb:27017/db1.collection1
      • MongoDB output
        – mongo.job.output.format = com.mongodb.hadoop.MongoOutputFormat
        – mongo.output.uri = mongodb://mydb:27017/db1.collection2
      • BSON input/output
        – mongo.job.input.format = com.mongodb.hadoop.BSONFileInputFormat
        – mapred.input.dir = hdfs:///tmp/database.bson
        – mongo.job.output.format = com.mongodb.hadoop.BSONFileOutputFormat
        – mapred.output.dir = hdfs:///tmp/output.bson
  • 97. 118 Pig Mappings • Input: BSONLoader and MongoLoader data = LOAD 'mongodb://mydb:27017/db.collection' using com.mongodb.hadoop.pig.MongoLoader • Output: BSONStorage and MongoInsertStorage STORE records INTO 'hdfs:///output.bson' using com.mongodb.hadoop.pig.BSONStorage
  • 98. 119 Hive Support CREATE TABLE mongo_users (id int, name string, age int) STORED BY "com.mongodb.hadoop.hive.MongoStorageHandler" WITH SERDEPROPERTIES("mongo.columns.mapping" = "_id,name,age") TBLPROPERTIES("mongo.uri" = "mongodb://host:27017/test.users") • Access collections as Hive tables • Use with MongoStorageHandler or BSONStorageHandler
  • 99. Thank You! Antoine Girbal Principal Solutions Engineer, MongoDB Inc. @antoinegirbal

Editor's Notes

  1. How does eventual consistency fit into the idea of inventory? Is something in stock or out of stock? Items on hand matters at order time. What about at buying time? Are we pitching this as the system of record for inventory or as a single view on top of multiple, discrete inventory systems?
  2. In a single view sort of application, where we’re designing for many use-cases instead of a single application, how do we handle schema design trade-offs?
  3. Challenges for every service/component: schema, indexing, sharding. Most important criteria: user-facing latency, linear scaling of services.
  4. Not shown on this slide: Audit collection. Assortments: list of items in an order that a shop is going to make (backorder?). Shipments: going to stores. One sku per item; fast reading / writing to support updating the inventory in real time. Make this like slide 30: drop the fields, just show the collection relations.
  5. Fix stream box. Add validator box.