Apache Cassandra, part 2 – data model example, machinery
V. Data model example - Twissandra
Twissandra Use Cases: get the friends of a username; get the followers of a username; get a timeline of a specific user’s tweets; create a tweet; create a user; add friends to a user.
Twissandra – DB User: table User (id, user_name, password)
Twissandra - DB Followers: tables User (id, user_name, password) and Followers (user_id, follower_id); Followers links User to User
Twissandra - DB Following: tables User (id, user_name, password) and Following (user_id, following_id); Following links User to User
Twissandra – DB Tweets: tables User (id, user_name, password) and Tweet (id, user_id, body, timestamp)
Twissandra column families: User, Username, Friends, Followers, Tweet, Userline, Timeline
Twissandra – Users CF <<CF>> User: <<RowKey>> userid; columns username, password. <<CF>> Username: <<RowKey>> username; column userid.
Twissandra – Friends and Followers CFs <<CF>> Friends: <<RowKey>> userid; columns friendid, timestamp. <<CF>> Followers: <<RowKey>> userid; columns followerid, timestamp.
Twissandra – Tweet CF <<CF>> Tweet: <<RowKey>> tweetid; columns userid, body, timestamp.
Twissandra – Userline and Timeline CFs <<CF>> Userline: <<RowKey>> userid; columns timestamp, tweetid. <<CF>> Timeline: <<RowKey>> userid; columns timestamp, tweetid.
Cassandra QL – User creation
BEGIN BATCH
  INSERT INTO User (KEY, username, password) VALUES ('id', 'konstantin', '******')
  INSERT INTO Username (KEY, userid) VALUES ('konstantin', 'id')
APPLY BATCH
Cassandra QL – following a friend
BEGIN BATCH
  INSERT INTO Friends (KEY, friendid) VALUES ('userid', 'friendid')
  INSERT INTO Followers (KEY, followerid) VALUES ('friendid', 'userid')
APPLY BATCH
Cassandra QL – Tweet creation
BEGIN BATCH
  INSERT INTO Tweet (KEY, userid, body, timestamp) VALUES ('tweetid', 'userid', '@ericflo thanks for Twissandra, it helps!', 123656459847)
  INSERT INTO Userline (KEY, 123656459847) VALUES ('userid', 'tweetid')
  INSERT INTO Timeline (KEY, 123656459847) VALUES ('userid', 'tweetid')
  ...
  INSERT INTO Timeline (KEY, 123656459847) VALUES ('followerid', 'tweetid')
  ...
APPLY BATCH
Cassandra QL – Getting user tweets
SELECT * FROM Userline WHERE KEY = 'userid'
SELECT * FROM Tweet WHERE KEY IN ('tweetid1', 'tweetid2', 'tweetid3', ..., 'tweetidn')
Cassandra QL – Getting user timeline
SELECT * FROM Timeline WHERE KEY = 'userid'
SELECT * FROM Tweet WHERE KEY IN ('tweetid1', 'tweetid2', 'tweetid3', ..., 'tweetidn')
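The write fan-out and the two-step read shown in the Cassandra QL slides can be sketched in plain Python. This is only an in-memory illustration under stated assumptions: the dictionaries stand in for the Tweet, Userline, Timeline and Followers column families, and post_tweet and get_timeline are hypothetical helper names, not part of Twissandra or of any Cassandra client.

```python
from collections import defaultdict

# In-memory stand-ins for the column families (hypothetical, not a Cassandra client).
tweets = {}                     # Tweet:     tweetid -> {userid, body, timestamp}
userline = defaultdict(dict)    # Userline:  userid  -> {timestamp: tweetid}
timeline = defaultdict(dict)    # Timeline:  userid  -> {timestamp: tweetid}
followers = defaultdict(set)    # Followers: userid  -> {followerid, ...}


def post_tweet(tweet_id, user_id, body, timestamp):
    """Store the tweet once, then fan its id out to the author and every follower."""
    tweets[tweet_id] = {"userid": user_id, "body": body, "timestamp": timestamp}
    userline[user_id][timestamp] = tweet_id
    timeline[user_id][timestamp] = tweet_id
    for follower_id in followers[user_id]:
        timeline[follower_id][timestamp] = tweet_id


def get_timeline(user_id, limit=20):
    """Two-step read: tweet ids from Timeline, then tweet bodies from Tweet."""
    row = timeline[user_id]
    newest_first = sorted(row, reverse=True)[:limit]
    return [tweets[row[ts]]["body"] for ts in newest_first]


followers["alice"].add("bob")
post_tweet("t1", "alice", "first", 100)
post_tweet("t2", "alice", "second", 200)
print(get_timeline("bob"))
```

The design choice this illustrates is that the cost is paid at write time (one Timeline insert per follower) so that reading a timeline touches a single row plus one multi-key lookup.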
Design patterns. Materialized View: create a second column family to serve an additional query. Valueless Column: store the values in the column names. Aggregate Key: use a composite key when you need to find a sub-item.
Indexes <<CF>> Item_Properties: <<RowKey>> item_id; columns property_name, property_value. <<CF>> Container_Items: <<RowKey>> container_id; columns item_id, insertion_timestamp.
Indexes <<CF>> Container_Items_Property_Index: <<RowKey>> container_id + property_name; columns composite(property_value, item_id, entry_timestamp), item_id. Comparator: compositecomparer.CompositeType
Problem with eventual consistency: when a value is updated, the new value must be added to the index and the old value removed. However, eventual consistency and the lack of transactions make it impossible to do this atomically.
Solution <<CF>> Container_Item_Property_Index_Entries: <<RowKey>> container_id + item_id + property_name; columns entry_timestamp, property_value.
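One plausible way to use the entries column family is sketched below. The slides only show the layout, so the update procedure here is an assumption: every index column that is written is also logged under (container_id, item_id, property_name), and a later update reads that log to find and remove the stale index columns. The dictionaries are in-memory stand-ins and set_property and find_items are hypothetical names.

```python
from collections import defaultdict

# In-memory stand-ins for the three column families (hypothetical).
item_properties = defaultdict(dict)  # item_id -> {property_name: property_value}
index = defaultdict(dict)            # (container_id, property_name) -> {(value, item_id, ts): item_id}
index_entries = defaultdict(dict)    # (container_id, item_id, property_name) -> {ts: value}


def set_property(container_id, item_id, name, value, ts):
    """Update a property and keep the index in step, using the entries log."""
    entries_key = (container_id, item_id, name)
    index_key = (container_id, name)
    # Remove the index column for every previously logged value of this property.
    for old_ts, old_value in list(index_entries[entries_key].items()):
        index[index_key].pop((old_value, item_id, old_ts), None)
        del index_entries[entries_key][old_ts]
    # Write the new index column, log it, and store the property itself.
    index[index_key][(value, item_id, ts)] = item_id
    index_entries[entries_key][ts] = value
    item_properties[item_id][name] = value


def find_items(container_id, name, value):
    """Look up item ids by property value through the index row."""
    columns = index[(container_id, name)]
    return sorted({item for (v, item, _ts) in columns if v == value})


set_property("c1", "i1", "color", "red", 1)
set_property("c1", "i2", "color", "red", 2)
set_property("c1", "i1", "color", "blue", 3)
print(find_items("c1", "color", "red"), find_items("c1", "color", "blue"))
```

Because the old value is recovered from the entries log rather than from the item row, the cleanup does not depend on reading a possibly stale property value first.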
VI. Architecture
Partitioners Partitioners decide where a key maps onto the ring. Key 1 Key 2 Key 3 Key 4
Partitioners: RandomPartitioner, OrderPreservingPartitioner, ByteOrderedPartitioner, CollatingOrderPreservingPartitioner
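How a key maps onto the ring can be sketched as follows. This is a simplified illustration in the spirit of RandomPartitioner (an MD5 hash reduced to a token), not Cassandra's actual code; token_for, owner and the four-node ring are hypothetical.

```python
import hashlib
from bisect import bisect_left

TOKEN_SPACE = 2 ** 127  # size of the token space assumed for this sketch


def token_for(key):
    """Hash a row key to a token (simplified, MD5-based)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % TOKEN_SPACE


def owner(ring, token):
    """Return the node owning a token.

    ring is a list of (node_token, node_name) sorted by token. A node owns the
    range that ends at its own token, so the owner is the first node whose
    token is >= the key's token, wrapping around past the last node.
    """
    tokens = [t for t, _ in ring]
    i = bisect_left(tokens, token)
    return ring[i % len(ring)][1]


step = TOKEN_SPACE // 4
ring = [(0, "A"), (step, "B"), (2 * step, "C"), (3 * step, "D")]
for key in ("Key 1", "Key 2", "Key 3", "Key 4"):
    print(key, "->", owner(ring, token_for(key)))
```

An order-preserving partitioner would replace token_for with a function that keeps keys in their natural order, which allows range scans but can lead to uneven load.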
Replication: controlled by the replication_factor setting in the keyspace definition. The actual placement of replicas in the cluster is determined by the replica placement strategy.
Placement Strategies SimpleStrategy - returns the nodes that are next to each other on the ring.
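The SimpleStrategy idea of taking neighbouring nodes on the ring can be sketched like this. The ring, its small integer tokens and the function name simple_strategy_replicas are hypothetical and chosen for readability; real tokens are much larger.

```python
from bisect import bisect_left


def simple_strategy_replicas(ring, token, replication_factor):
    """Pick the node that owns the token, then the next nodes clockwise.

    ring is a list of (node_token, node_name) sorted by token.
    """
    tokens = [t for t, _ in ring]
    start = bisect_left(tokens, token) % len(ring)
    count = min(replication_factor, len(ring))  # never more replicas than nodes
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


ring = [(0, "A"), (25, "B"), (50, "C"), (75, "D")]
print(simple_strategy_replicas(ring, 60, 3))
```

The placement ignores racks and data centers entirely, which is why the topology-aware strategies exist.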
Placement Strategies OldNetworkTopologyStrategy - places one replica in a different data center while placing the others on different racks in the current data center.
Placement Strategies NetworkTopologyStrategy - Allows you to configure the number of replicas per data center as specified in the strategy_options.
Snitches give Cassandra information about the network topology of the cluster. Endpoint snitch – gives information about network topology. Dynamic snitch – monitors read latencies.
Endpoint Snitch Implementations SimpleSnitch (default) - can be efficient for locating nodes in clusters limited to a single data center.
Endpoint Snitch Implementations RackInferringSnitch - extrapolates the topology of the network by analyzing IP addresses. In the same rack: 192.168.191.71 and 192.168.191.21. In the same datacenter: 192.168.191.71 and 192.168.171.21. In different datacenters: 192.78.19.71 and 192.18.11.21.
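The RackInferringSnitch rule can be sketched in a few lines: the second octet of the address is read as the data center and the third octet as the rack. The function names infer_location and relation are hypothetical; the addresses are the ones from the slide.

```python
def infer_location(ip):
    """Return (datacenter, rack) inferred from the 2nd and 3rd octets."""
    octets = ip.split(".")
    return octets[1], octets[2]


def relation(ip_a, ip_b):
    """Describe how two nodes relate according to the inferred topology."""
    dc_a, rack_a = infer_location(ip_a)
    dc_b, rack_b = infer_location(ip_b)
    if dc_a != dc_b:
        return "different datacenters"
    if rack_a != rack_b:
        return "same datacenter"
    return "same rack"


print(relation("192.168.191.71", "192.168.191.21"))
print(relation("192.168.191.71", "192.168.171.21"))
print(relation("192.78.19.71", "192.18.11.21"))
```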
Endpoint Snitch Implementations PropertyFileSnitch - determines the location of nodes by referring to a user-defined description of the network details located in the property file cassandra-topology.properties.
Memtables, SSTables, Commit Logs. Commit Log: sequential writes only. Memtable. SSTable: indexes.
Write properties: no reads; no seeks; fast; atomic within a ColumnFamily; always writable.
Read properties: reads multiple SSTables; slower than writes (but still fast); seeks can be mitigated with more RAM; scales to billions of rows.
Commit Log durability: the durability settings mirror PostgreSQL's. Periodic sync of the commit log: carries some risk of data loss. Batch sync of the commit log: a write is acknowledged only once the commit log has been flushed to disk; a separate device for the commit log is strongly recommended in this mode.
Gossip protocol: intra-ring communication; runs periodically; failure detection, hinted handoffs and node exchange.
Gossip protocol org.apache.cassandra.gms.Gossiper Has the list of nodes that are alive and dead Chooses a random node and starts “chat” with it. One gossip round requires three messages Failure detection uses a suspicion level to decide whether the node is alive or dead
Hinted handoff Write Hint Cassandra is always available for write
Consistency level
Tombstones: data is not deleted immediately; deleted values are marked with tombstones. Tombstones are dropped during the next compaction. GCGraceSeconds – the number of seconds the server waits before garbage-collecting a tombstone.
Compaction: merging SSTables into one (merging keys, combining columns, creating a new index). Main aims: free up space; reduce the number of required seeks.
Compaction. Minor: triggered when at least N SSTables have been flushed to disk (N is tunable, 4 by default); merges SSTables of similar size. Major: merges all SSTables; run manually through nodetool compact; discards tombstones.
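The merge step of compaction can be sketched as below. This is a minimal model under stated assumptions, not Cassandra's implementation: an SSTable is a dictionary of row key to columns, each column holds (value, timestamp) with timestamps in seconds, a value of None stands for a tombstone, and compact is a hypothetical function name.

```python
TOMBSTONE = None  # marker used in this sketch for a deleted column


def compact(sstables, now, gc_grace_seconds):
    """Merge SSTables into one, keeping the newest version of each column.

    Tombstones older than gc_grace_seconds are discarded; rows left without
    any column are dropped. Keys come out sorted, as in an SSTable.
    """
    merged = {}
    for table in sstables:
        for key, columns in table.items():
            row = merged.setdefault(key, {})
            for name, (value, ts) in columns.items():
                # The column with the highest timestamp wins.
                if name not in row or ts > row[name][1]:
                    row[name] = (value, ts)

    result = {}
    for key in sorted(merged):
        live = {
            name: (value, ts)
            for name, (value, ts) in merged[key].items()
            if not (value is TOMBSTONE and now - ts >= gc_grace_seconds)
        }
        if live:
            result[key] = live
    return result


older = {"k1": {"a": ("1", 10), "b": ("2", 10)}, "k2": {"a": ("x", 5)}}
newer = {"k1": {"a": ("9", 20)}, "k2": {"a": (TOMBSTONE, 15)}}
print(compact([older, newer], now=1000, gc_grace_seconds=100))
```

Keeping a tombstone until the grace period has passed gives replicas that missed the delete time to learn about it before the marker disappears.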
Replica synchronization Anti-entropy Read repair
Anti-entropy: during major compaction a node exchanges Merkle trees (hashes of its data) with other nodes. If the trees don’t match, the data is repaired. Nodes maintain a timestamp index and exchange only the most recent updates.
Read repair: during a read operation, replicas with stale values are brought up to date. Weak consistency level (ONE): after the data is returned. Strong consistency level (QUORUM, ALL): before the data is returned.
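The core of read repair can be sketched as follows. This is a simplified, synchronous illustration: each replica is modelled as a dictionary of key to (value, timestamp), the name read_with_repair is hypothetical, and the distinction between repairing before or after the response is returned is left out.

```python
def read_with_repair(replicas, key):
    """Read a key from all replicas, return the newest value, fix stale copies."""
    responses = [(replica, replica.get(key)) for replica in replicas]
    found = [cell for _, cell in responses if cell is not None]
    if not found:
        return None
    newest = max(found, key=lambda cell: cell[1])  # highest timestamp wins
    for replica, cell in responses:
        if cell is None or cell[1] < newest[1]:
            replica[key] = newest  # bring the stale or missing copy up to date
    return newest[0]


r1 = {"k": ("old", 1)}
r2 = {"k": ("new", 2)}
r3 = {}
print(read_with_repair([r1, r2, r3], "k"))
```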
Bloom filters A bit array Test whether value is a member of set Reduce disk access (improve performance)
Bloom filters. On write: several hashes are generated per key; the bit for each hash is set. On read: hashes are generated for the key; if all bits for these hashes are set, the key may exist in the SSTable; if at least one bit is unset, the key has never been written to the SSTable.
Bloom filters [Figure: read and write paths; Key1 and Key2 are each hashed by Hash1, Hash2 and Hash3 into a bit array in front of the SSTable]
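The write and read steps described for Bloom filters can be sketched as a small class. This is a generic Bloom filter, not Cassandra's implementation; the class name, the default sizes, and the use of SHA-256 with an index prefix to derive several hash positions are assumptions made for the sketch.

```python
import hashlib


class BloomFilter:
    """A bit array plus several hash functions per key."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, key):
        # Derive num_hashes bit positions by prefixing the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        """On write: set the bit for each hash of the key."""
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        """On read: False means never written; True means possibly present."""
        return all(self.bits[pos] for pos in self._positions(key))


bloom = BloomFilter()
bloom.add("Key1")
print(bloom.might_contain("Key1"))
```

A False answer lets the read path skip the SSTable entirely; a True answer still requires a disk lookup because false positives are possible.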

 

Apache Cassandra, part 2 – data model example, machinery

  • 1. Apache Cassandra, part 2 – data model example, machinery
  • 2. V. Data model example - Twissandra
  • 3. Twissandra use cases: get the friends of a username; get the followers of a username; get a timeline of a specific user’s tweets; create a tweet; create a user; add friends to a user
  • 4. Twissandra – DB User: table User (id, user_name, password)
  • 5. Twissandra – DB Followers: table User (id, user_name, password), shown twice and linked through table Followers (user_id, follower_id)
  • 6. Twissandra – DB Following: table User (id, user_name, password), shown twice and linked through table Following (user_id, following_id)
  • 7. Twissandra – DB Tweets: table User (id, user_name, password) and table Tweet (id, user_id, body, timestamp)
  • 8. Twissandra column families: User, Username, Friends, Followers, Tweet, Userline, Timeline
  • 9. Twissandra – Users CF: <<CF>> User, <<RowKey>> userid, columns username and password; <<CF>> Username, <<RowKey>> username, column userid
  • 10. Twissandra – Friends and Followers CFs: <<CF>> Friends, <<RowKey>> userid, entries friendid and timestamp; <<CF>> Followers, <<RowKey>> userid, entries followerid and timestamp
  • 11. Twissandra – Tweet CF: <<CF>> Tweet, <<RowKey>> tweetid, columns userid, body and timestamp
  • 12. Twissandra – Userline and Timeline CFs: <<CF>> Userline, <<RowKey>> userid, entries timestamp and tweetid; <<CF>> Timeline, <<RowKey>> userid, entries timestamp and tweetid
  • 13. Cassandra QL – User creation (BATCH): BEGIN BATCH INSERT INTO User (KEY, username, password) VALUES ('id', 'konstantin', '******') INSERT INTO Username (KEY, userid) VALUES ('konstantin', 'id') APPLY BATCH
  • 14. Cassandra QL – Following a friend (BATCH): BEGIN BATCH INSERT INTO Friends (KEY, friendid) VALUES ('userid', 'friendid') INSERT INTO Followers (KEY, userid) VALUES ('friendid', 'userid') APPLY BATCH
  • 15. Cassandra QL – Tweet creation (BATCH): BEGIN BATCH INSERT INTO Tweet (KEY, userid, body, timestamp) VALUES ('tweetid', 'userid', '@ericflo thanks for Twissandra, it helps!', 123656459847) INSERT INTO Userline (KEY, 123656459847) VALUES ('userid', 'tweetid') INSERT INTO Timeline (KEY, 123656459847) VALUES ('userid', 'tweetid') … INSERT INTO Timeline (KEY, 123656459847) VALUES ('followerid', 'tweetid') … APPLY BATCH
  • 16. Cassandra QL – Getting user tweets: SELECT * FROM Userline WHERE KEY = 'userid'; then SELECT * FROM Tweet WHERE KEY IN ('tweetid1', 'tweetid2', 'tweetid3', …, 'tweetidn')
  • 17. Cassandra QL – Getting user timeline: SELECT * FROM Timeline WHERE KEY = 'userid'; then SELECT * FROM Tweet WHERE KEY IN ('tweetid1', 'tweetid2', 'tweetid3', …, 'tweetidn')
  • 18. Design patterns. Materialized View: create a second column family to represent additional queries. Valueless Column: use column names for values. Aggregate Key: if you need to find a sub-item, use a composite key.
  • 19. Indexes: <<CF>> Item_Properties, <<RowKey>> item_id, entries property_name and property_value; <<CF>> Container_Items, <<RowKey>> container_id, entries item_id and insertion_timestamp
  • 20. Indexes: <<CF>> Container_Items_Property_Index, <<RowKey>> container_id + property_name, entries composite(property_value, item_id, entry_timestamp) and item_id. Comparator: compositecomparer.CompositeType
  • 21. Problem with eventual consistency: when we update a value, we should add the new value to the index and remove the old one. However, eventual consistency and the lack of transactions make this impossible.
  • 22. Solution: <<CF>> Container_Item_Property_Index_Entries, <<RowKey>> container_id + item_id + property_name, entries entry_timestamp and property_value
  • 24. Partitioners: partitioners decide where a key maps onto the ring. (Diagram: Key 1, Key 2, Key 3 and Key 4 placed on the ring.)
  • 25. Partitioners: RandomPartitioner, OrderPreservingPartitioner, ByteOrderedPartitioner, CollatingOrderPreservingPartitioner
  • 26. Replication: replication is controlled by the replication_factor setting in the keyspace definition. The actual placement of replicas in the cluster is determined by the replica placement strategy.
  • 27. Placement Strategies: SimpleStrategy returns the nodes that are next to each other on the ring.
  • 28. Placement Strategies: OldNetworkTopologyStrategy places one replica in a different data center and the others on different racks in the current data center.
  • 29. Placement Strategies: NetworkTopologyStrategy lets you configure the number of replicas per data center, as specified in strategy_options.
  • 30. Snitches: give Cassandra information about the network topology of the cluster. Endpoint snitch: gives information about network topology. Dynamic snitch: monitors read latencies.
  • 31. Endpoint Snitch Implementations: SimpleSnitch (default) can be efficient for locating nodes in clusters limited to a single data center.
  • 32. Endpoint Snitch Implementations: RackInferringSnitch extrapolates the topology of the network by analyzing IP addresses. 192.168.191.71 and 192.168.191.21 are in the same rack; 192.168.191.71 and 192.168.171.21 are in the same datacenter; 192.78.19.71 and 192.18.11.21 are in different datacenters.
  • 33. Endpoint Snitch Implementations: PropertyFileSnitch determines the location of nodes by referring to a user-defined description of the network in the property file cassandra-topology.properties.
  • 37. Write properties: no reads; no seeks; fast; atomic within a ColumnFamily; always writable
  • 38. Write/Read properties. Read properties: reads multiple SSTables; slower than writes (but still fast); seeks can be mitigated with more RAM; scales to billions of rows
  • 39. Commit Log durability: the durability settings mirror those of PostgreSQL. Periodic sync of the commit log: carries a risk of data loss. Batch sync of the commit log: a write is acknowledged only once the commit log has been flushed to disk; a separate device for the commit log is strongly recommended in this case.
  • 40. Gossip protocol: intra-ring communication; runs periodically; used for failure detection, hinted handoffs and node exchange
  • 41. Gossip protocol: org.apache.cassandra.gms.Gossiper keeps the list of nodes that are alive and dead. It chooses a random node and starts a “chat” with it. One gossip round requires three messages. Failure detection uses a suspicion level to decide whether a node is alive or dead.
  • 42. Hinted handoff: Cassandra is always available for writes. (Diagram labels: Write, Hint.)
  • 44. Tombstones: data is not deleted immediately; deleted values are marked with tombstones. Tombstones are discarded during the next compaction. GCGraceSeconds: the number of seconds the server waits before garbage-collecting a tombstone.
  • 45. Compaction: merging SSTables into one by merging keys, combining columns and creating a new index. Main aims: free up space; reduce the number of required seeks.
  • 46. Compaction. Minor: triggered when at least N SSTables have been flushed to disk (N is tunable, 4 by default); merges SSTables of similar size. Major: merges all SSTables; run manually through nodetool compact; discards tombstones.
  • 48. Anti-entropy: during major compaction the node exchanges Merkle trees (hashes of its data) with other nodes. If the trees don’t match, the replicas are repaired. Nodes maintain a timestamp index and exchange only the most recent updates.
  • 49. Read repair: during a read operation, replicas with stale values are brought up to date. Weak consistency level (ONE): after the data is returned. Strong consistency level (QUORUM, ALL): before the data is returned.
  • 50. Bloom filters: a bit array; tests whether a value is a member of a set; reduces disk access (improves performance)
  • 51. Bloom filters. On write: several hashes are generated per key, and the bit for each hash is set. On read: the same hashes are generated for the key; if all of their bits are set, the key may exist in the SSTable; if at least one bit is unset, the key was never written to the SSTable.
  • 52. Bloom filters (diagram: on the read and write paths, Key1 and Key2 pass through Hash1, Hash2 and Hash3 into a bit array that sits in front of the SSTable)
  • 53. Resources: Home of the Apache Cassandra project (http://cassandra.apache.org/); Apache Cassandra Wiki (http://wiki.apache.org/cassandra/); Documentation provided by DataStax (http://www.datastax.com/docs/0.8/); Good explanation of creating secondary indexes (http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html); Eben Hewitt, “Cassandra: The Definitive Guide”, O’Reilly, 2010, ISBN: 978-1-449-39041-9
  • 54. Authors: Lev Sivashov, lsivashov@gmail.com; Andrey Lomakin, lomakin.andrey@gmail.com, Twitter: @Andrey_Lomakin, LinkedIn: http://www.linkedin.com/in/andreylomakin; Artem Orobets, enisher@gmail.com, Twitter: @Dr_EniSh; Anton Veretennik, tennik@gmail.com
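
The Bloom filter write and read paths from slides 50 to 52 can be sketched in a few lines of Python. This is a toy illustration only: the class name, the bit array size and the salted SHA-256 hashing are assumptions made for the example, not Cassandra's actual implementation.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a bit array plus several hash positions per key."""

    def __init__(self, size_bits=64, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = [0] * size_bits

    def _positions(self, key):
        # Assumption for this sketch: derive each hash by salting SHA-256
        # with the hash index. Cassandra uses its own hash functions.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        # Write path: set the bit for every hash of the key.
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # Read path: any unset bit means the key was never added;
        # all bits set means the key may be present (false positives happen).
        return all(self.bits[pos] for pos in self._positions(key))
```

A reader would call `might_contain(key)` before touching an SSTable and skip the disk access whenever it returns `False`; a `True` result still requires the actual lookup.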

Editor's notes

  1. Endpoint snitch can be wrapped with a dynamic snitch, which will monitor read latencies and avoid reading from hosts that have slowed (due to compaction, for instance)
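
The idea in this note, tracking read latencies and steering reads away from hosts that have slowed down, can be illustrated with a small Python sketch. `LatencyAwareSelector`, its moving-average scoring and the `alpha` parameter are hypothetical names and choices made for the example; they do not reproduce how Cassandra's dynamic snitch is implemented.

```python
class LatencyAwareSelector:
    """Hypothetical helper: orders replicas by a smoothed read latency."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha  # weight of the newest sample in the moving average
        self.scores = {}    # host -> smoothed latency in milliseconds

    def record_read(self, host, latency_ms):
        # Exponential moving average, so a recent slowdown raises the score quickly.
        previous = self.scores.get(host)
        if previous is None:
            self.scores[host] = float(latency_ms)
        else:
            self.scores[host] = self.alpha * latency_ms + (1 - self.alpha) * previous

    def order_replicas(self, replicas):
        # Hosts without samples score 0.0, so they are tried first and get measured.
        return sorted(replicas, key=lambda host: self.scores.get(host, 0.0))


selector = LatencyAwareSelector()
selector.record_read("node-a", 10)
selector.record_read("node-b", 2)
selector.record_read("node-b", 100)  # node-b slows down, e.g. during compaction
preferred = selector.order_replicas(["node-b", "node-a"])
```

After the slow sample, `node-b` scores higher than `node-a`, so `preferred` lists `node-a` first.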