This document summarizes a MongoDB webinar on advanced schema design patterns. It introduces common schema design patterns like attribute, subset, computed, and approximation patterns. It discusses how to use these patterns to address issues like large documents with many fields, working sets that don't fit in RAM, high CPU usage from repeated calculations, and changing schemas over time. The webinar provides examples of each pattern and encourages learning a common vocabulary for designing MongoDB schemas by applying these reusable patterns.
Advanced Schema Design Patterns for MongoDB Systems
1. October 16, 2017 | MongoDB Webinar
Advanced Schema Design Patterns
2. #MDBlocal
Who Am I?
{
  "name": "Daniel Coupal",
  "jobs_at_MongoDB": [
    { "job": "Senior Curriculum Engineer", "from": new Date("2016-11") },
    { "job": "Senior Technical Service Engineer", "from": new Date("2013-11") }
  ],
  "previous_jobs": [
    "Consultant",
    "Developer",
    "Manager Quality & Tools Team",
    "Manager Software Team",
    "Tools Developer"
  ],
  "likes": [ "food", "beers", "movies", "MongoDB" ]
}
3.
Pattern
The "Gang of Four": a design pattern systematically names, explains, and evaluates an important and recurring design in object-oriented systems.
MongoDB systems can also be built using their own patterns.
4.
Why this Talk?
• Enable teams to use a common methodology and vocabulary when designing schemas for MongoDB
• Give you the ability to model schemas using building blocks
• Less art and more methodology
5.
Why do we Create Models?
Ensure:
• Good performance
• Scalability
despite constraints:
• Hardware
  • RAM faster than disk
  • Disk cheaper than RAM
  • Network latency
  • Reduce costs $$$
• Database server
  • Maximum size for a document
  • Atomicity of a write
• Data set
  • Size of data
6.
However …
• Don't over-design!
• Design for:
  • Performance
  • Scalability
  • Simplicity
7.
WMDB - World Movie Database
Any events, characters and entities depicted in this presentation are fictional. Any resemblance or similarity to reality is entirely coincidental.
8.
WMDB - World Movie Database
First iteration
3 collections:
A. movies
B. moviegoers
C. screenings
9.
Mission Possible
Our mission, should we decide to accept it, is to fix this solution, so it can perform well and scale.
As always, should I or anyone in the audience do it without training, WMDB will disavow any knowledge of our actions.
This tape will self-destruct in five seconds. Good luck!
10.
Categories of Patterns
• Frequency of Access
  • Subset ✓
  • Approximation ✓
• Grouping
  • Computed ✓
  • Overflow
  • Bucket
• Representation
  • Attribute ✓
  • Schema Versioning ✓
  • Document Versioning
  • Tree
  • Pre-Allocation
11.
Issue #1: Big Documents, Many Fields and Many Indexes
{
  title: "Moonlight",
  ...
  release_USA: "2016/09/02",
  release_Mexico: "2017/01/27",
  release_France: "2017/02/01",
  release_Festival_Mill_Valley: "2017/10/10"
}
Would need the following indexes:
{ release_USA: 1 }
{ release_Mexico: 1 }
{ release_France: 1 }
...
{ release_Festival_Mill_Valley: 1 }
...
12.
Pattern #1: Attribute
{
  title: "Moonlight",
  ...
  release_USA: "2016/09/02",
  release_Mexico: "2017/01/27",
  release_France: "2017/02/01",
  release_Festival_Mill_Valley: "2017/10/10"
}
13.
Attribute Pattern
Problem:
• Lots of similar fields
• The fields share a common characteristic, and you want to search across them together
• Fields present in only a small subset of documents
Use cases:
• Product attributes like 'color', 'size', 'dimensions', ...
• Release dates of a movie in different countries, festivals
14.
Attribute Pattern - Solution
Solution:
• Field pairs in an array
Benefits:
• Allows for a non-deterministic list of attributes
• Easy to index:
  { "releases.location": 1, "releases.date": 1 }
• Easy to extend with a qualifier, for example:
  { descriptor: "price", qualifier: "euros", value: NumberDecimal("100.00") }
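As a rough illustration of the "field pairs in an array" idea, the sketch below folds the release_<location> fields of the Moonlight example into a releases array whose location and date keys match the index shown on the slide. It is plain JavaScript that runs without a database; the function name toAttributePattern is an assumption made for this example, not something from the talk.

```javascript
// Sketch only: fold every "release_<location>" field into one "releases" array.
// toAttributePattern is a hypothetical helper name, not part of MongoDB.
function toAttributePattern(movie) {
  const prefix = "release_";
  const rest = {};
  const releases = [];
  for (const [key, value] of Object.entries(movie)) {
    if (key.startsWith(prefix)) {
      // "release_USA" becomes { location: "USA", date: ... }
      releases.push({ location: key.slice(prefix.length), date: value });
    } else {
      rest[key] = value;
    }
  }
  return { ...rest, releases };
}

const moonlight = {
  title: "Moonlight",
  release_USA: "2016/09/02",
  release_Mexico: "2017/01/27",
  release_France: "2017/02/01",
  release_Festival_Mill_Valley: "2017/10/10",
};

const converted = toAttributePattern(moonlight);
console.log(JSON.stringify(converted, null, 2));
```

With documents in this shape, the single compound index from the slide can serve a query on any location, for example one that uses $elemMatch on releases, instead of one index per release field.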
15.
Issue #2: Working Set doesn't fit in RAM
Possible solutions:
A. Reduce the size of your working set
B. Add more RAM per machine
C. Start sharding or add more shards
16.
WMDB - World Movie Database
First iteration
3 collections:
A. movies
B. moviegoers
C. screenings
17.
Pattern #2: Subset
In this example, we can:
• Limit the list of actors and crew to 20
• Limit the embedded reviews to the top 20
• …
18.
Subset Pattern
Problem:
• There is a 1-N or N-N relationship, and only a few of the related documents always need to be shown
• Only infrequently do you need to pull all of the dependent documents
Use cases:
• Main actors of a movie
• List of reviews or comments
19.
Subset Pattern - Solution
Solution:
• Keep duplicates of a small subset of fields in the main collection
Benefits:
• Allows for fast data retrieval and a reduced working set size
• One query brings all the information needed for the "main page"
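The split described above might look like the following sketch: the movie document keeps only the top 20 reviews, while every review also lives in its own document. The field names top_reviews, movie_id, rating and reviewer, the _id value, and ranking by rating are all assumptions for illustration; the slides do not say how "top" is decided.

```javascript
// Sketch only: keep a small subset of reviews in the movie document and
// the complete list in a separate collection-shaped array.
// Field names and the "rank by rating" rule are assumptions.
function splitReviews(movie, reviews, limit = 20) {
  // Copy before sorting so the caller's array is left untouched.
  const ranked = [...reviews].sort((a, b) => b.rating - a.rating);
  const movieDoc = { ...movie, top_reviews: ranked.slice(0, limit) };
  const reviewDocs = reviews.map((r) => ({ movie_id: movie._id, ...r }));
  return { movieDoc, reviewDocs };
}

// Hypothetical data: 25 reviews with ratings from 1 to 5.
const reviews = Array.from({ length: 25 }, (_, i) => ({
  reviewer: `user${i}`,
  rating: (i % 5) + 1,
}));

const { movieDoc, reviewDocs } = splitReviews(
  { _id: "m1", title: "Moonlight" },
  reviews
);
console.log(movieDoc.top_reviews.length, reviewDocs.length);
```

The embedded copy is a duplicate, so it has to be refreshed when reviews change; how often is a consistency decision for the application.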
21.
Issue #3: … caused by repeated calculations
{
  title: "Your Name",
  ...
  viewings: 5000,
  viewers: 385000,
  revenues: 5074800
}
22.
Pattern #3: Computed
For example:
• Apply a sum, count, ...
• Roll up data by minute, hour, day
• As long as you don't mess with your source, you can recreate the rollups
23.
Computed Pattern
Problem:
• There is data that needs to be computed
• The same calculations would happen over and over
• Reads outnumber writes:
  • Example: 1K writes per hour vs 1M reads per hour
Use cases:
• Have revenues per movie showing, want to display sums
• Time series data, Event Sourcing
24.
Computed Pattern - Solution
Solution:
• Apply a computation or operation on data and store the result
Benefits:
• Avoid re-computing the same thing over and over
• Replaces a view
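A minimal sketch of "compute once, store the result", using the viewings, viewers and revenues fields from the "Your Name" example. The screening documents, their viewers and revenue fields, and both function names are hypothetical. The first function rebuilds the totals from the source data; the second applies one new screening to totals that are already stored.

```javascript
// Sketch only: derive per-movie totals from screening documents.
// The screening shape ({ viewers, revenue }) is an assumption.
function computeTotals(screenings) {
  return screenings.reduce(
    (totals, s) => ({
      viewings: totals.viewings + 1,
      viewers: totals.viewers + s.viewers,
      revenues: totals.revenues + s.revenue,
    }),
    { viewings: 0, viewers: 0, revenues: 0 }
  );
}

// Apply a single new screening to totals stored in the movie document,
// so reads never have to repeat the sum.
function applyScreening(movie, screening) {
  return {
    ...movie,
    viewings: movie.viewings + 1,
    viewers: movie.viewers + screening.viewers,
    revenues: movie.revenues + screening.revenue,
  };
}

// Hypothetical source data.
const screenings = [
  { viewers: 100, revenue: 1200 },
  { viewers: 80, revenue: 960 },
  { viewers: 120, revenue: 1500 },
];

const movie = { title: "Your Name", ...computeTotals(screenings) };
const updated = applyScreening(movie, { viewers: 50, revenue: 600 });
console.log(movie, updated);
```

Because the source screenings are kept untouched, the stored totals can always be recreated with computeTotals if they drift.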
25.
Issue #4: Lots of Writes
(Chart of writes by source: web page counters, updates on movie data, screenings, other)
26.
Issue #4: … for non-critical data
27.
Pattern #4: Approximation
• Only increment once in X iterations
• Increment by X
28.
(Chart of writes by source: web page counters, updates on movie data, screenings, other)
29.
Approximation Pattern
Problem:
• Data is difficult to calculate correctly
• May be too expensive to update the document every time to keep an exact count
• No one gives a damn if the number is exact
Use cases:
• Population of a country
• Web site visits
30.
Approximation Pattern - Solution
Solution:
• Fewer, stronger writes
Benefits:
• Fewer writes, reducing contention on some documents
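One way to read "only increment once in X iterations, increment by X" is the deterministic sketch below: hits are tallied in application memory and a single write of size X is issued every X hits. The class name ApproximateCounter and the write callback are assumptions; in a real deployment the callback might issue an $inc update, and a variant that draws a random number on each hit would avoid keeping local state.

```javascript
// Sketch only: batch X hits into one write of size X.
// ApproximateCounter and its write callback are hypothetical names.
class ApproximateCounter {
  constructor(step, write) {
    this.step = step;   // X: hits per write
    this.write = write; // called with the increment to persist
    this.pending = 0;   // hits seen since the last write
  }

  hit() {
    this.pending += 1;
    if (this.pending >= this.step) {
      this.write(this.step);
      this.pending = 0;
    }
  }
}

// Stand-in for the database: count writes and the stored total.
let stored = 0;
let writes = 0;
const counter = new ApproximateCounter(10, (increment) => {
  stored += increment;
  writes += 1;
});

for (let i = 0; i < 1005; i += 1) {
  counter.hit();
}
console.log({ stored, writes }); // 1005 hits, 100 writes, stored total 1000
```

The stored value can lag the true count by up to X - 1 hits per process, which is the trade the pattern accepts for non-critical data.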
31.
Issue #5: Need to change the list of fields in the documents
• Keeping track of the schema version of a document
32.
Pattern #5: Schema Versioning
Add a field to track the schema version number, per document.
The field does not have to exist for version 1.
33.
Schema Versioning Pattern
Problem:
• Updating the schema of a database is:
  • Not atomic
  • Long operation
• May not want to update all documents, only do it on updates
Use cases:
• Practically any database that will go to production
34.
Schema Versioning Pattern - Solution
Solution:
• Have a field keeping track of the schema version
Benefits:
• Don't need to update all the documents at once
• May not have to update documents until their next modification
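A sketch of how an application could upgrade documents lazily, when it next touches them. The field name schema_version, the migrations table, and the specific version 1 to version 2 change (folding release_<location> fields into a releases array) are all assumptions chosen for illustration. A document without the field is treated as version 1, as the slides suggest.

```javascript
// Sketch only: upgrade a document to the current schema on access.
// schema_version and the migration below are hypothetical.
const CURRENT_SCHEMA_VERSION = 2;

// One migration function per source version.
const migrations = {
  // Version 1 -> 2: move flat release_<location> fields into an array.
  1: (doc) => {
    const rest = {};
    const releases = [];
    for (const [key, value] of Object.entries(doc)) {
      if (key.startsWith("release_")) {
        releases.push({ location: key.slice("release_".length), date: value });
      } else {
        rest[key] = value;
      }
    }
    return { ...rest, releases };
  },
};

function upgrade(doc) {
  // A missing version field means the document predates versioning.
  let version = doc.schema_version === undefined ? 1 : doc.schema_version;
  let current = doc;
  while (version < CURRENT_SCHEMA_VERSION) {
    current = { ...migrations[version](current), schema_version: version + 1 };
    version += 1;
  }
  return current;
}

const oldDoc = { title: "Moonlight", release_USA: "2016/09/02" };
const newDoc = upgrade(oldDoc);
console.log(newDoc);
```

A document already at the current version is returned unchanged, so the application can call upgrade on every read and write the result back only when it modifies the document anyway.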
35.
Aspect of Patterns: Consistency
• How duplication is handled
  A. Update both source and target in real time
  B. Update target from source at regular intervals. Examples:
    • Most popular items => update nightly
    • Revenues from a movie => update every hour
    • Last 10 reviews => update hourly? daily?
36.
Other Patterns
• Bucket
  • Grouping documents together, to have fewer documents
• Document Versioning
  • Tracking of content changes in a document
• Outlier
  • Avoid letting a few documents drive the design and impact performance for all
• Tree(s)
• Pre-allocation
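Of the patterns listed here, Bucket is the simplest to sketch: many small documents are grouped into fewer, larger ones. The example below groups screenings into one document per movie per day. The screening shape, the field names, and the per-day granularity are assumptions; the talk only describes the pattern in one line.

```javascript
// Sketch only: group screening documents into one bucket per movie per day.
// Field names and the daily bucket size are assumptions.
function bucketByDay(screenings) {
  const buckets = new Map();
  for (const s of screenings) {
    const day = s.time.slice(0, 10); // "YYYY-MM-DD" prefix of an ISO timestamp
    const key = `${s.movie_id}|${day}`;
    if (!buckets.has(key)) {
      buckets.set(key, { movie_id: s.movie_id, day, count: 0, screenings: [] });
    }
    const bucket = buckets.get(key);
    bucket.count += 1;
    bucket.screenings.push({ time: s.time, viewers: s.viewers });
  }
  return [...buckets.values()];
}

// Hypothetical data: four screenings across two movies and two days.
const rawScreenings = [
  { movie_id: "m1", time: "2017-10-16T19:00", viewers: 100 },
  { movie_id: "m1", time: "2017-10-16T21:30", viewers: 80 },
  { movie_id: "m1", time: "2017-10-17T19:00", viewers: 120 },
  { movie_id: "m2", time: "2017-10-16T20:00", viewers: 60 },
];

const bucketed = bucketByDay(rawScreenings);
console.log(JSON.stringify(bucketed, null, 2));
```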
38.
Takeaways
• Simple grouping from tables to collections is not optimal
• Learn a common vocabulary for designing schemas with MongoDB
• Use patterns as "plug-and-play" for your future designs
  • Attribute
  • Subset
  • Computed
  • Approximation
  • Schema Versioning
39.
References for complete Solutions
A full design example for a given problem:
• E-commerce site
• Content Management System
• Social Networking
• Single view
• …
40.
How Can I Learn More About Schema Design?
• More patterns in a follow-up to this presentation
• MongoDB in-person training courses on Schema Design
• Upcoming online course at MongoDB University:
  • https://university.mongodb.com
  • M220 Data Modeling
41.
Thank You for using MongoDB!
daniel.coupal@mongodb.com