As a software adventurer, Charles “Indy” Sarrazin has guided numerous customers through the MongoDB world, using his extensive knowledge to make sure they always get the most out of their databases.
Let us embark on a journey inside the Document Model, where we will identify, analyze and fix anti-patterns. I will also provide you with tools and migration strategies to guide you towards the Temple of Lost Performance!
Be warned, though! You might want to learn about design patterns beforehand to survive this exhilarating trial!
14. The Fauna
The Anti-Pattern
§ Access patterns actually differ by document type
§ Each document type depends on a specific index
§ No common access patterns
The Actual Reason
§ While indexes improve reads, they can slow down writes, since they must be maintained on every write
§ A single collection can have at most 64 indexes
§ Unless you use partial or sparse indexes, null or absent values are still indexed
15. The Fauna
Takeaways
§ Documents with different access patterns or business logic should be stored in separate collections
§ As a temporary measure, you can rely on partial indexes to reduce index size and the performance impact on writes
§ Spending just a little time on schema design pays off
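The partial-index workaround from the takeaways can be sketched as a raw `createIndexes` command document. This is a minimal sketch under stated assumptions: the `animals` collection, the `type` discriminator field and the `wingspan` field are hypothetical, and the command is only built here, not sent to a server.

```python
# Minimal sketch: build a createIndexes command for a partial index.
# Hypothetical schema: a shared "animals" collection where a "type" field
# tells document kinds apart, and only "bird" documents are queried on "wingspan".


def partial_index_command(collection, field, type_value):
    """Return a createIndexes command that indexes only one document type."""
    return {
        "createIndexes": collection,
        "indexes": [
            {
                "key": {field: 1},
                "name": f"{field}_1_{type_value}_only",
                # Only documents matching this filter are added to the index,
                # so other document types neither grow it nor update it on writes.
                "partialFilterExpression": {"type": type_value},
            }
        ],
    }


cmd = partial_index_command("animals", "wingspan", "bird")
# With pymongo, a roughly equivalent call would be:
#   db.animals.create_index([("wingspan", 1)],
#                           partialFilterExpression={"type": "bird"})
```

Note that the query planner only uses a partial index when the query predicate implies the filter expression, so queries would need to include the condition, e.g. `{"type": "bird", "wingspan": {"$gt": 1}}`.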
17. The Squashed Database
Symptoms
§ High IOPS (random reads/writes)
§ Low throughput
§ High yields and/or nReturned
§ High index size
18. The Squashed Database
The Anti-Pattern
§ Flat documents stored in separate collections
§ Only using root-level fields and no hierarchy
The Actual Reason
§ To parse a flat document, MongoDB reads each field sequentially, whereas grouping related fields into sub-documents lets it skip over whole groups
§ Normalization also means redundant data (the keys that express relations)
§ Data needs to be consolidated using JOINs ($lookup)
19. The Squashed Database
Takeaways
§ Simply transposing your data model from an RDBMS to MongoDB won’t help you scale up
§ Consider grouping data from multiple tables in a single collection by embedding the relations (1:1, 1:n) when the data volume is reasonable
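Embedding 1:n relations instead of transposing tables can be sketched as a small transformation over normalized rows. This is a minimal sketch: the `customers`/`addresses` tables and their fields are hypothetical, and the `$lookup` stage is included only to show the join that embedding removes from every read.

```python
# Minimal sketch: fold normalized (RDBMS-style) rows into embedded documents.
# The "customers"/"addresses" tables and their fields are hypothetical.
from collections import defaultdict

customers = [
    {"customer_id": 1, "name": "Marion"},
    {"customer_id": 2, "name": "Henry"},
]
addresses = [
    {"address_id": 10, "customer_id": 1, "city": "Cairo"},
    {"address_id": 11, "customer_id": 1, "city": "Venice"},
    {"address_id": 12, "customer_id": 2, "city": "London"},
]


def embed(parents, children, key, field):
    """Embed each child row (1:n) into its parent document under `field`."""
    by_parent = defaultdict(list)
    for child in children:
        child = dict(child)         # copy so the input rows are left untouched
        parent_id = child.pop(key)  # the foreign key is redundant once embedded
        by_parent[parent_id].append(child)
    return [{**parent, field: by_parent.get(parent[key], [])} for parent in parents]


docs = embed(customers, addresses, "customer_id", "addresses")

# What every read would otherwise need with separate collections:
lookup_pipeline = [
    {"$lookup": {"from": "addresses", "localField": "customer_id",
                 "foreignField": "customer_id", "as": "addresses"}}
]
```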
21. $project the Elephant
Symptoms
§ High read IOPS
§ High cache activity (bytes read into cache)
§ High number of yields when reading a single document
§ Slow indexed queries when reading a single document
§ Result size much smaller than the document size
§ Generally, large documents (> 200 KB)
22. $project the Elephant
The Anti-Pattern
§ Using large documents (> 100 KB) while only projecting a few fields
The Actual Reason
§ Documents are the base unit of transfer from disk to memory
§ Even when only a single field is used, the whole document is loaded from disk into the WiredTiger cache
23. $project the Elephant
Takeaways
§ Use smaller documents containing the most frequently accessed data
§ Store less frequently accessed data in another collection
Also known as the Subset Pattern
https://www.mongodb.com/blog/post/building-with-patterns-the-subset-pattern
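The Subset Pattern can be sketched as splitting one large document into a small, frequently read part and a bulky remainder destined for another collection. This is a minimal sketch: the field names and the `HOT_FIELDS` choice are hypothetical, and JSON length is used only as a rough stand-in for BSON size.

```python
# Minimal sketch of the Subset Pattern: split one large document into a small
# "hot" document (frequently read) and a "cold" one for another collection.
# Field names and the HOT_FIELDS choice are hypothetical.
import json

HOT_FIELDS = {"title", "price", "top_reviews"}


def split_document(doc, hot_fields=HOT_FIELDS):
    """Return (hot, cold); both keep _id so they can be matched up later."""
    hot = {"_id": doc["_id"]}
    cold = {"_id": doc["_id"]}
    for name, value in doc.items():
        if name != "_id":
            (hot if name in hot_fields else cold)[name] = value
    return hot, cold


product = {
    "_id": 42,
    "title": "Crystal Skull",
    "price": 100,
    "top_reviews": ["Shiny"],
    "all_reviews": ["Shiny", "Heavy", "Spooky"] * 1000,
    "manual": "x" * 50_000,
}
hot, cold = split_document(product)
# JSON length is only a rough proxy for the BSON size loaded into the cache.
print(len(json.dumps(hot)), len(json.dumps(cold)))
```

Reads that only need the title, price and top reviews then load the small document into the cache instead of the whole product.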
25. The Single-Person Bridge
Symptoms
§ Some updates seem to take a long time
§ MongoDB logs show writeConflicts>0 for these updates
§ The application seems to perform write operations sequentially
26. The Single-Person Bridge
The Anti-Pattern
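§ Simulating a SQL sequence by using a counter document and findAndModify (findOneAndUpdate)
The Actual Reason
§ As WiredTiger uses document-level concurrency control, concurrent updates to a single document conflict with each other: only one can proceed at a time and the others are retried (writeConflicts), so writes to that document are effectively serialized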
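The counter-document anti-pattern can be sketched as the raw `findAndModify` command each writer sends before inserting. This is a minimal sketch: the `counters` collection and the `order_seq` counter name are hypothetical, and the commands are only built, not executed.

```python
# Minimal sketch of the anti-pattern: each insert first fetches the next value
# from one shared counter document. "counters" and "order_seq" are hypothetical.


def next_sequence_command(counter_id):
    """Raw findAndModify command that atomically increments and returns a counter."""
    return {
        "findAndModify": "counters",
        "query": {"_id": counter_id},
        "update": {"$inc": {"seq": 1}},
        "new": True,     # return the document as it is after the increment
        "upsert": True,  # create the counter document on first use
    }


# Every writer builds the same query, so every call updates the same document:
# concurrent calls conflict with each other and end up running one at a time.
commands = [next_sequence_command("order_seq") for _ in range(3)]

# pymongo equivalent (ReturnDocument is imported from pymongo):
#   db.counters.find_one_and_update({"_id": "order_seq"}, {"$inc": {"seq": 1}},
#                                   upsert=True, return_document=ReturnDocument.AFTER)
```

Where a gap-free sequence is not a business requirement, relying on the default `ObjectId` `_id` generated by the driver avoids this shared hot document altogether.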
29. Sorted Monkeys
Symptoms
§ Very high Oplog churn (Oplog GB/hour)
§ Short Oplog window with the default Oplog size
§ The Oplog must be sized very large compared to the data size to keep operations safe (target Oplog window > 3 days)
30. Sorted Monkeys
The Anti-Pattern
§ Using $push (with $each) on big arrays (> 20 entries) together with:
§ The $sort modifier
§ The $slice modifier
The Actual Reason
§ Oplog entries must be idempotent, so such a $push is not replicated as-is: it is replaced with a $set statement that rewrites the full array
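The size difference described above can be sketched by computing, in plain Python, what a `$push` with `$each`/`$sort`/`$slice` produces and the full-array `$set` that, per the slide, gets replicated. This is a minimal sketch: the `scores` field is hypothetical, the exact oplog entry format varies between MongoDB versions, and JSON length only approximates BSON size.

```python
# Minimal sketch: what $push with $each/$sort/$slice computes, and the
# full-array $set that, per the slide, ends up in the oplog.
# The "scores" field and its entries are hypothetical.
import json


def apply_push_sort_slice(array, new_items, sort_key, limit):
    """Mimic {$push: {f: {$each: new_items, $sort: {sort_key: -1}, $slice: limit}}}."""
    merged = array + new_items
    merged.sort(key=lambda item: item[sort_key], reverse=True)
    return merged[:limit]


scores = [{"player": f"p{i}", "score": i} for i in range(50)]
new_entry = {"player": "indy", "score": 99}

# The update the application sends: small, with one new element.
update = {"$push": {"scores": {"$each": [new_entry], "$sort": {"score": -1}, "$slice": 50}}}

# What is replicated instead: a $set carrying the whole resulting array.
new_scores = apply_push_sort_slice(scores, [new_entry], "score", 50)
oplog_like = {"$set": {"scores": new_scores}}

print(len(json.dumps(update)), len(json.dumps(oplog_like)))
```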
31. Sorted Monkeys
Takeaways
§ Only rely on the $slice and $sort modifiers when manipulating small arrays
§ You can rely on in-memory or application-level sorts for medium-sized result sets
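An application-level sort can be sketched as appending with a plain `$push` and ranking the entries after reading them. This is a minimal sketch: the `scores` field and entries are hypothetical, and this trades oplog size for an array that is no longer trimmed on write, so it still needs a bound.

```python
# Minimal sketch of an application-level sort: the stored array is appended to
# with a plain $push (no $sort/$slice), and ranking happens after the read.
# The "scores" field and entries are hypothetical.
import heapq


def plain_push(entry):
    """Update document without the $sort/$slice modifiers."""
    return {"$push": {"scores": entry}}


def top_n(entries, n, key="score"):
    """Return the n highest entries, best first, computed in the application."""
    return heapq.nlargest(n, entries, key=lambda entry: entry[key])


stored = [{"player": "a", "score": 3}, {"player": "b", "score": 7}, {"player": "c", "score": 5}]
```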
32. The Tree in the House
a.k.a. “Push until the End”
33. The Tree in the House
Symptoms
§ Your application worked fine at first
§ After a while, some updates fail with:
Resulting document after update is larger than 16777216
34. The Tree in the House
The Anti-Pattern
§ Using unbounded arrays to store data (e.g. audit logs tracing document updates)
The Actual Reason
§ MongoDB documents are limited to 16 MB (16,777,216 bytes)
§ Depending on the relationship’s cardinality, you might reach the maximum document size if you are not careful
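A quick back-of-the-envelope calculation shows how fast an unbounded array reaches the limit. This is a rough sketch: the 200-byte entry size and 500 appends per day are hypothetical inputs, and a realistic `entry_bytes` should also include BSON per-element overhead such as the array index keys.

```python
# Rough arithmetic: how long until an unbounded array hits the 16 MB limit.
# The entry size and write rate below are hypothetical inputs.
MAX_BSON_SIZE = 16 * 1024 * 1024  # 16,777,216 bytes, the limit in the error message


def max_entries(entry_bytes, base_bytes=0):
    """Upper bound on array entries that fit next to base_bytes of other fields."""
    return (MAX_BSON_SIZE - base_bytes) // entry_bytes


def days_until_full(entry_bytes, entries_per_day, base_bytes=0):
    """Whole days of appends before the document can no longer grow."""
    return max_entries(entry_bytes, base_bytes) // entries_per_day


# e.g. 200-byte audit-log entries appended 500 times a day
print(max_entries(200), days_until_full(200, 500))
```

With these inputs the bound is 83,886 entries, which is reached after about 167 days: long enough for the application to “work fine for a while”.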
35. The Tree in the House
Takeaways
§ For 1:n relationships, you need to consider cardinality
§ Differentiate 1-to-few (< 10k array elements) from 1-to-zillions
§ Consider using the Subset, Outlier or Bucket patterns
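The Bucket pattern can be sketched as an upsert that appends to a bucket with free room and lets MongoDB create a new bucket when all existing ones are full. This is a minimal sketch: the `audit_buckets` collection, the `doc_id`/`count`/`entries` fields and the bucket size of 100 are hypothetical, and an in-memory function mimics the upsert so the behavior can be checked without a server.

```python
# Minimal sketch of the Bucket Pattern for an audit log: entries go into
# bucket documents holding at most BUCKET_SIZE entries each.
# Collection/field names ("doc_id", "count", "entries") are hypothetical.
BUCKET_SIZE = 100


def bucket_push(doc_id, entry, bucket_size=BUCKET_SIZE):
    """Filter/update pair for update_one(..., upsert=True): append to a bucket
    that still has room, or create a new bucket when all are full."""
    filter_ = {"doc_id": doc_id, "count": {"$lt": bucket_size}}
    update = {"$push": {"entries": entry}, "$inc": {"count": 1}}
    return filter_, update


def simulate_bucket_push(buckets, doc_id, entry, bucket_size=BUCKET_SIZE):
    """In-memory stand-in for the upsert above, applied to a list of buckets."""
    for bucket in buckets:
        if bucket["doc_id"] == doc_id and bucket["count"] < bucket_size:
            bucket["entries"].append(entry)
            bucket["count"] += 1
            return
    # No bucket matched: the upsert inserts a new one from the equality fields.
    buckets.append({"doc_id": doc_id, "count": 1, "entries": [entry]})


buckets = []
for i in range(250):
    simulate_bucket_push(buckets, "order-1", {"change": i})
# With pymongo: db.audit_buckets.update_one(*bucket_push("order-1", entry), upsert=True)
```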
37. Considerations
§ Availability
§ Can your business afford scheduled downtime?
§ Do you need to keep multiple versions of your app online?
§ Performance
§ How does the migration affect performance?
§ Rollback Strategy
§ How do we go back if we run into a problem?
§ Risk
§ What is the impact of a failed migration?
40. Blue/Green
[Figure: Principles]
Pros
§ Always available
§ Easy rollback: point the router back to the previous version
Cons
§ You need to be able to sync the two DBs
§ Use Change Streams for this
§ You need double the hardware or resources
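Syncing the two databases with change streams can be sketched as replaying each change event onto the other side. This is a minimal sketch under stated assumptions: the “green” database is a dict keyed by `_id`, `transform` is a hypothetical hook for converting documents to the new schema, and the stream is assumed to be opened with `full_document="updateLookup"` so update events carry the whole document. A production sync would also persist the resume token to restart safely.

```python
# Minimal sketch: replay change stream events from the "blue" database onto
# the "green" one. The green side is a dict keyed by _id, and `transform`
# (hypothetical) converts a document to the new schema.


def apply_change(target, event, transform=lambda doc: doc):
    """Apply one change event; assumes the stream was opened with
    full_document="updateLookup" so update events carry the whole document."""
    op = event["operationType"]
    doc_id = event["documentKey"]["_id"]
    if op in ("insert", "replace", "update"):
        doc = event.get("fullDocument")
        if doc is not None:  # None if the document was deleted before the lookup
            target[doc_id] = transform(doc)
    elif op == "delete":
        target.pop(doc_id, None)
    # Other event types (drop, rename, invalidate, ...) are ignored in this sketch.


# With pymongo, the events would come from something like:
#   with blue_db.customers.watch(full_document="updateLookup") as stream:
#       for event in stream:
#           apply_change(green, event)

green = {}
apply_change(green, {"operationType": "insert", "documentKey": {"_id": 1},
                     "fullDocument": {"_id": 1, "name": "Marion"}})
apply_change(green, {"operationType": "update", "documentKey": {"_id": 1},
                     "fullDocument": {"_id": 1, "name": "Marion Ravenwood"}})
```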
41. Y-Write
[Figure: Principles]
Pros
§ Always available
§ Easy rollback: stop writing to the new schema
§ Legacy applications can still read from the old schema
Cons
§ You need to be able to sync the two DBs
§ Write logic needs to be centralized and migrated before read logic
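A Y-write can be sketched as a single, centralized write path that always writes the old schema and, while enabled, the new one too. This is a minimal sketch: the two stores are dicts standing in for the databases, and the schema change (splitting `name`) and the `YWriter` class are hypothetical. Documents written before the Y-write was enabled would still need a separate backfill.

```python
# Minimal sketch of a Y-write: one centralized write path that writes the old
# schema and, while enabled, the new one. Stores are dicts standing in for the
# two databases; the schema change itself (split "name") is hypothetical.


def to_new_schema(doc):
    """Hypothetical v2 shape: "name" split into first and last name."""
    first, _, last = doc["name"].partition(" ")
    return {"_id": doc["_id"], "first_name": first, "last_name": last}


class YWriter:
    def __init__(self, old_store, new_store):
        self.old_store = old_store
        self.new_store = new_store
        self.write_new = True  # rollback: set to False to stop writing the new schema

    def save(self, doc):
        # Legacy readers keep reading the old schema, so it is always written.
        self.old_store[doc["_id"]] = doc
        if self.write_new:
            self.new_store[doc["_id"]] = to_new_schema(doc)


old, new = {}, {}
writer = YWriter(old, new)
writer.save({"_id": 1, "name": "Marion Ravenwood"})
```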
42. Read & Upgrade
[Figure: Principles]
Pros
§ Always available
§ Good performance
Cons
§ You need to consider backward and forward schema compatibility
§ Schema upgrades are part of the application logic
§ Requires a deprecation roadmap to remove legacy code
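Read & Upgrade can be sketched as a read path that upgrades outdated documents on the fly and writes them back. This is a minimal sketch: the store is a dict standing in for a collection, and the `schema_version` field and v1 to v2 change (adding a `contacts` array while keeping `phone`, so older application versions keep working) are hypothetical.

```python
# Minimal sketch of Read & Upgrade: documents are upgraded lazily, the first
# time the application reads them. The store is a dict standing in for a
# collection; the v1 -> v2 change (adding a "contacts" array) is hypothetical.
CURRENT_VERSION = 2


def upgrade(doc):
    """Hypothetical v1 -> v2 upgrade. It only adds fields (the old "phone" stays),
    so application versions that still expect v1 keep working."""
    doc = dict(doc)
    phone = doc.get("phone")
    doc["contacts"] = [{"type": "phone", "value": phone}] if phone else []
    doc["schema_version"] = 2
    return doc


def read_and_upgrade(store, doc_id):
    doc = store[doc_id]
    if doc.get("schema_version", 1) < CURRENT_VERSION:
        doc = upgrade(doc)
        store[doc_id] = doc  # write back so later reads skip the upgrade
    return doc


store = {7: {"_id": 7, "name": "Sallah", "phone": "555-0101"}}
```

Against a real collection, the write-back would typically be a `replace_one` whose filter also matches the version that was read, so that a document upgraded concurrently by another process is not overwritten.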
43. Ensuring backward compatibility
Do
§ Insert data in existing collections
§ Add new fields
§ Create a new collection/database
Don’t
§ Rename/remove fields
§ Remove data
§ Change field type or format
§ Remove/rename a collection/database
46. Key takeaways
Regularly reassess your hypotheses
§ Your access patterns will change over time
§ Check your actual access patterns
47. Key takeaways
MongoDB provides flexible migration options
§ You can combine both online and offline schema migrations
§ Consider your development lifecycle and your release schedule to choose your migration strategy
§ Use $jsonSchema to handle schema validation or check migration status
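The same `$jsonSchema` document can serve both purposes: as a collection validator and, wrapped in `$nor`, as a query for documents that have not been migrated yet. This is a minimal sketch: the `customers` collection and the v2 rules (`schema_version`, `contacts`) are hypothetical, and the command and filter are only built here.

```python
# Minimal sketch: one $jsonSchema document used both as a collection validator
# and to find documents not yet migrated. The "customers" collection and the
# v2 rules below are hypothetical.
v2_schema = {
    "bsonType": "object",
    "required": ["schema_version", "contacts"],
    "properties": {
        "schema_version": {"bsonType": "int", "minimum": 2},
        "contacts": {"bsonType": "array"},
    },
}

# Validation: "moderate" checks inserts and updates to documents that are already
# valid, so documents that have not been migrated yet can still be updated.
coll_mod = {
    "collMod": "customers",
    "validator": {"$jsonSchema": v2_schema},
    "validationLevel": "moderate",
}

# Migration status: documents that do NOT match the v2 schema yet.
not_migrated = {"$nor": [{"$jsonSchema": v2_schema}]}

# With pymongo:
#   db.command(coll_mod)
#   remaining = db.customers.count_documents(not_migrated)
```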