The Cassandra community has consistently requested that we cover C* schema design concepts. This presentation goes in depth on the following topics:
- Schema design
- Best Practices
- Capacity Planning
- Real World Examples
5. Data Model
[Figure: rows, each holding columns (name/value pairs)]
Columns sorted by comparator:
  row 356: name = Paul, group = 34567, sex = male
  row 54: name = kim, group = 34566, sex = female
  further columns shown: status = single, zip = 94538
Composite columns, sorted by composite comparators:
  row population: US:CA:Fremont = 54353, US:CA:Hayward = 34343, US:CA:San Jose = 987556
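A minimal sketch of why sorted composite column names are useful, assuming the three population columns from the figure. Plain Python tuples stand in for composite names and `slice_by_prefix` is a hypothetical helper, not a Cassandra API: because the names are kept sorted, all columns sharing a prefix sit next to each other and can be read as one contiguous slice.

```python
from bisect import bisect_left

# One row ("population") whose column names are composites of
# (country, state, city). Values are taken from the figure.
population = {
    ("US", "CA", "Fremont"): 54353,
    ("US", "CA", "Hayward"): 34343,
    ("US", "CA", "San Jose"): 987556,
}


def slice_by_prefix(row, prefix):
    """Return (name, value) pairs whose composite name starts with prefix."""
    names = sorted(row)                 # the comparator keeps names ordered
    start = bisect_left(names, prefix)  # jump to the first candidate
    result = []
    for name in names[start:]:
        if name[:len(prefix)] != prefix:
            break                       # left the contiguous prefix range
        result.append((name, row[name]))
    return result


california = slice_by_prefix(population, ("US", "CA"))
```

Here `california` holds all three cities in sorted order, while a prefix such as `("US", "NY")` yields an empty list.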
6. Do your Homework
① Understand your application requirements
② Identify your access patterns
③ Model around these access patterns
④ Denormalization is your new friend but…
⑤ Benchmark – Avoid Surprises
8. Edge Services Data Model
ROWID (Script_location_version)   ALLOCATION   ACTIVE (YES OR NO)   SCRIPT
/xyz/jkl_1                        000          yes                  text
/xyl/jkl_2                        111          yes                  text
/xyl/jkl_3                        222          yes                  text
[Diagram label: EDGE SERVICE CLUSTER]
9. Edge Service Anti patterns
• High concurrency: Edge servers auto scale
• Range scans: Read all data
• Large payload: ~1MB of data
Result: very high read latency / unstable Cassandra
14. RDBMS -> CASSANDRA
user: id (primary key), name, alias, email
movie: id (primary key), title, description
user_movie_rating: id (primary key), userId (foreign key), movieId (foreign key), rating
Relationships: user 1 to ∞ user_movie_rating; movie 1 to ∞ user_movie_rating
Queries
Get email of userid 123
Get title and description of movieId 222
List all movie names and corresponding ratings for userId 123
List all users and corresponding rating for movieId 222
15. CASSANDRA MODEL
user
  row 123: name = Nitish Korla, alias = buckwild, email = nk@netflix.com
movie
  row 223: title = Find Nemo, description = Good luck with that
ratingsByUser (row key: userId; columns: movieId:rating, movieId:title)
  row 123: 222:rating = 4, 222:title = rockstar, 534:rating = 2, 534:title = Finding Nemo, 888:rating = 1, 888:title = Top Guns
ratingsByMovie (row key: movieId; column name: userId; value: rating)
  row 222: 334 = 2, 455 = 5, 544 = 1, 633 = 2, 789 = 2, 999 = 3
userId: Sequence?
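The four queries from the RDBMS slide can each be answered from a single row once the data is denormalized this way. The sketch below uses plain Python dicts as stand-ins for the column families (no Cassandra driver is involved, and the function names are hypothetical); the sample values are the ones shown on the slide.

```python
# Sketch only: each outer key is a row key, each inner dict is that row's columns.
user = {123: {"name": "Nitish Korla", "alias": "buckwild", "email": "nk@netflix.com"}}
movie = {223: {"title": "Find Nemo", "description": "Good luck with that"}}

# ratingsByUser: row key = userId, composite column name = (movieId, field).
ratings_by_user = {
    123: {
        (222, "rating"): 4, (222, "title"): "rockstar",
        (534, "rating"): 2, (534, "title"): "Finding Nemo",
        (888, "rating"): 1, (888, "title"): "Top Guns",
    }
}

# ratingsByMovie: row key = movieId, column name = userId, value = rating.
ratings_by_movie = {222: {334: 2, 455: 5, 544: 1, 633: 2, 789: 2, 999: 3}}


def email_of(user_id):
    return user[user_id]["email"]


def movie_details(movie_id):
    row = movie[movie_id]
    return row["title"], row["description"]


def ratings_for_user(user_id):
    """(title, rating) pairs read from one row, ordered by movieId."""
    row = ratings_by_user[user_id]
    movie_ids = sorted({movie_id for movie_id, _ in row})
    return [(row[(m, "title")], row[(m, "rating")]) for m in movie_ids]


def ratings_for_movie(movie_id):
    """(userId, rating) pairs read from one row, ordered by userId."""
    return sorted(ratings_by_movie[movie_id].items())
```

The movie title is duplicated into `ratings_by_user` on purpose: that is the denormalization that lets the third query avoid a join.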
17. Viewing History
Row key: Subscriber_id
Column name format: <Timeuuid> : <movieid> (e.g. 1234454545 : 5466)
Column value: Playback/Bookmark related serialized data
[Diagram: a column family with several numeric row keys, each row holding many columns with names such as 3454545_5, 3454546_5, 3454547_5, 3454555_9 and JSON values]
18. Viewing History compression
Row key: Subscriber_id
Column name format: <Timeuuid>_<movieid> (e.g. 1234454545_5466, 1234454546_5466, 1234454547_5466, 1234454548_5466)
Column value: Playback/Bookmark related serialized data
Steps (numbered 1 to 4 on the slide):
• Re-sort by movie id: Movie_id:[{playbackevent1,playbackevent2 ...... } ]
• Compress data
• Store in separate column family
Results:
• Reduced data size by 7 times
• Operational processes improved by 10 times
• Money saved: $,$$$,$$$
• Improvement in app read latency
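The re-sort and compress steps can be sketched as follows. This is an illustration under assumptions, not the production code: the event fields (`ts`, `bookmark`), the JSON layout, and the choice of `zlib` are all hypothetical, and the sample events are synthetic.

```python
import json
import zlib
from collections import defaultdict


def compress_history(columns):
    """columns: iterable of (timestamp, movie_id, event_dict) from one wide row.

    Groups events by movie id, serializes them as JSON, then compresses.
    """
    by_movie = defaultdict(list)
    for ts, movie_id, event in sorted(columns, key=lambda c: c[0]):
        by_movie[movie_id].append({"ts": ts, **event})
    # JSON object keys must be strings, so movie ids are stringified.
    grouped = {str(m): events for m, events in sorted(by_movie.items())}
    payload = json.dumps(grouped, separators=(",", ":"))
    return zlib.compress(payload.encode("utf-8"))


def decompress_history(blob):
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Synthetic sample: 200 playback events alternating between two movies.
columns = [
    (1234454546 + i, 5466 if i % 2 == 0 else 5467, {"bookmark": i * 30})
    for i in range(200)
]
blob = compress_history(columns)
restored = decompress_history(blob)
```

The resulting blob would be stored as a single value in the separate column family; the compression ratio depends entirely on the real payloads, so no figure is implied here.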
19. Think Data Archival
• Data stores in Netflix grow exponentially
• Have a process in place to archive data
– DSE
– Moving to a separate column family
– Moving to a separate cluster (non SSD)
– Setting the right expectations w.r.t. latencies with historical data
• Cassandra TTLs
21. read-modify-write pattern
• Data read and written back (even if data was not modified)
• Large BLOBs
Cassandra under IO pressure
Peak traffic – compaction yet to run – high read latency
22. read-modify-write pattern
• Do you really need to read the data?
• Avoid the write if data has not changed: every write leads to SSTable creation (immutable SSTables created at the backend)
• Write with a new row key (limits SSTable scans). TTL data
• If a batch process, throttle the write rate to let compactions catch up
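One way to avoid writing unchanged data is to remember a digest of the last payload written per row and skip the write when the digest matches. The sketch below assumes the digests are cached in application memory and that `write_fn` is whatever function actually performs the write; both are hypothetical names introduced for illustration.

```python
import hashlib


class SkipUnchangedWriter:
    """Wraps a write function and skips writes whose payload is unchanged."""

    def __init__(self, write_fn):
        self._write_fn = write_fn
        self._last_digest = {}  # row key -> digest of last written payload

    def write(self, row_key, payload: bytes) -> bool:
        """Write payload unless it equals the last one; return True if written."""
        digest = hashlib.sha256(payload).digest()
        if self._last_digest.get(row_key) == digest:
            return False  # nothing changed, so no new data reaches the backend
        self._write_fn(row_key, payload)
        self._last_digest[row_key] = digest
        return True


writes = []
writer = SkipUnchangedWriter(lambda key, value: writes.append((key, value)))
first = writer.write("row-a", b"payload-1")
repeat = writer.write("row-a", b"payload-1")
changed = writer.write("row-a", b"payload-2")
```

In this run only the first and third calls reach `write_fn`; the repeated payload is dropped before it can add to compaction load.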
24. Observations
• Cassandra scales linearly without any noticeable degradation to the running cluster
• Self-healing: minimal operational noise
• Developers
– Mindset needs to shift from normalization to denormalization
– Need to have a reasonable understanding of Cassandra architecture
– Enjoy the schema change flexibility. No more DDL locks / DBA dependency
Start with a live example, then use it as a segue to cover some best practices.
RDBMS background: Keyspace -> DB, CF -> Table.
A row groups columns. Each column is a triplet (name, value, timestamp).
Column naming is not necessary / could be different. The column comparator specifies the sorting, so there is no need to stick to certain rules.
Name -> sorted. Timestamp -> conflict resolution.
Rows are indexed.
Columns are sorted based on the comparator you specify, so use it to your benefit.
Keep column names short as they are repeated.
Column size = 15 bytes + size of name + size of value
Don’t store empty columns if there is no need (schema-free design).
COMPOSITE COLUMNS
• custom inverted search indexes: when you want more control over the CF layout than a secondary index
• a replacement for super columns: both a means to offset some of the worst performance penalties associated with such, as well as to extend the model to provide an arbitrary level of nesting
• grouping otherwise static skinny rows into wider rows for greater efficiency
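The column size rule above lends itself to a quick capacity estimate. The sketch below applies only that rule (15 bytes + name + value); it ignores row-level overhead, indexes, replication, and compression, and the longer column name is a made-up example for comparison.

```python
COLUMN_OVERHEAD_BYTES = 15  # per-column overhead quoted in the notes


def column_size(name: bytes, value: bytes) -> int:
    """Estimated size of one column: 15 bytes + name + value."""
    return COLUMN_OVERHEAD_BYTES + len(name) + len(value)


def columns_size(columns: dict) -> int:
    """Estimated size of a set of columns given as {name: value}."""
    return sum(column_size(name, value) for name, value in columns.items())


short = column_size(b"zip", b"94538")               # 15 + 3 + 5
verbose = column_size(b"postal_zip_code", b"94538")  # 15 + 15 + 5
saved_per_million = (verbose - short) * 1_000_000
```

With these inputs the short name costs 23 bytes per column and the verbose one 35, so the shorter name saves 12,000,000 bytes per million columns before replication.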
Cassandra is for point queries. Still OK for a small set of rows.
When API servers autoscale or there is a new push, they need to read the majority of rows in the scripts column family.
Simple but powerful concept, based on the premise that rows are indexed and point lookups are faster. Create another column family and store the list of all required row IDs for faster lookup.
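A minimal sketch of that idea, with plain dicts standing in for the two column families. The names `script_index`, `active_scripts`, `load_scripts`, and `add_script` are hypothetical; the script rows reuse the values from the Edge Services slide.

```python
# 'scripts' is the data column family; 'script_index' is the extra column
# family whose single row lists every row id the edge servers must load.
scripts = {
    "/xyz/jkl_1": {"alloc": "000", "active": "yes", "script": "text"},
    "/xyl/jkl_2": {"alloc": "111", "active": "yes", "script": "text"},
    "/xyl/jkl_3": {"alloc": "222", "active": "yes", "script": "text"},
}
script_index = {"active_scripts": ["/xyz/jkl_1", "/xyl/jkl_2", "/xyl/jkl_3"]}


def load_scripts(index_key="active_scripts"):
    """Read the index row, then fetch each script row by key (no range scan)."""
    row_ids = script_index[index_key]
    return {row_id: scripts[row_id] for row_id in row_ids}


def add_script(row_id, columns, index_key="active_scripts"):
    """Write the data row and keep the index row in step with it."""
    scripts[row_id] = columns
    if row_id not in script_index[index_key]:
        script_index[index_key].append(row_id)


loaded = load_scripts()
```

The cost of this design is that every writer must update the index row together with the data row.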
A wide row can reside on only one node, and that can create hot spots.
Sharding: application logic / buckets.
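Bucketing in application logic could look like the sketch below, which spreads one logical wide row over several physical row keys. The bucket count, the `base:bucket` key format, and the use of CRC32 are assumptions made for illustration.

```python
import zlib

NUM_BUCKETS = 16  # hypothetical; pick based on expected row width


def bucketed_row_key(base_key: str, column_name: str,
                     num_buckets: int = NUM_BUCKETS) -> str:
    """Row key for a column: the logical key plus a stable bucket suffix."""
    bucket = zlib.crc32(column_name.encode("utf-8")) % num_buckets
    return f"{base_key}:{bucket}"


def all_bucket_keys(base_key: str, num_buckets: int = NUM_BUCKETS) -> list:
    """Every physical row key to read when the whole logical row is needed."""
    return [f"{base_key}:{bucket}" for bucket in range(num_buckets)]


key = bucketed_row_key("subscriber-42", "1234454545_5466")
```

Writes go to the single key returned by `bucketed_row_key`, while a full read fans out over `all_bucket_keys`, trading one hot row for several smaller ones.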
20% performance loss due to parsing. 1.2 netty protocol.
One-to-one mapping doesn’t work.
Fifth normal form deals with cases where information can be reconstructed from smaller pieces of information that can be maintained with less redundancy. Second, third, and fourth normal forms also serve this purpose, but fifth normal form generalizes to cases not covered by the others (multi-valued dependencies).
Sequence in Cassandra?? Index lookup. Denormalization.
We don’t have linear growth.
TTL is a fascinating feature, coming from an Oracle background.
Viewing history data:
• Wide row implementation, compressed data
• Stored till perpetuity
• Some rows have ~20M of data (and growing)
• App code paginates through columns (a good thing)
Capacity consideration.
Cassandra housekeeping (more data -> repairs/bootstraps).
We don’t have linear growth. TTL is a fascinating feature, coming from an Oracle background.
The read is going to drive the latency of the overall request.
architecture to reap the benefits of distributed computing / high performance
2 digest queries / 1 complete data response. The optimization is only on the bandwidth.
The number of replicas contacted depends on the consistency level specified.
Hinted handoff, read repair, anti-entropy node repair.
Don’t expect Cassandra to act as a load balancer.
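The link between consistency level and replica count can be sketched as below. This is a simplified model: it covers only ONE, TWO, THREE, QUORUM, and ALL, ignores datacenter-aware levels and any extra replicas contacted for read repair, and `read_plan` is a hypothetical helper that assumes one full data request with digest requests for the remaining replicas.

```python
def required_replicas(consistency_level: str, replication_factor: int) -> int:
    """Replicas that must answer a read at the given consistency level."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    level = consistency_level.upper()
    if level in fixed:
        needed = fixed[level]
    elif level == "QUORUM":
        needed = replication_factor // 2 + 1  # strict majority of replicas
    elif level == "ALL":
        needed = replication_factor
    else:
        raise ValueError(f"unsupported consistency level: {consistency_level}")
    if needed > replication_factor:
        raise ValueError("consistency level needs more replicas than exist")
    return needed


def read_plan(consistency_level: str, replication_factor: int) -> dict:
    """Split the required responses into one data read plus digest reads."""
    needed = required_replicas(consistency_level, replication_factor)
    return {"data_requests": 1, "digest_requests": needed - 1}


plan_all_rf3 = read_plan("ALL", 3)
```

With a replication factor of 3, `ALL` gives one data request and two digest requests, which matches the "2 digest queries / 1 complete data response" case in the notes, while `QUORUM` needs only two replicas in total.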
Commit log for durability: sequential write.
Memtable: no disk access (no reads or seeks).
SSTables are written sequentially to disk.
The operational design integrates nicely with the operating system page cache. Because Cassandra does not modify the data, dirty pages that would have to be flushed are not even generated.