Apache Cassandra, part 2 – data model example, machinery


The aim of this presentation is to give an enterprise architect enough information to decide whether Cassandra should be the project's data store. The presentation describes the nuances of Cassandra's architecture and ways to design data and work with it.


  1. Apache Cassandra, part 2 – data model example, machinery
  2. V. Data model example - Twissandra
  3. Twissandra Use Cases
      Get the friends of a username
      Get the followers of a username
      Get a timeline of a specific user's tweets
      Create a tweet
      Create a user
      Add friends to a user
  4. Twissandra – DB User
      User (id, user_name, password)
  5. Twissandra - DB Followers
      User (id, user_name, password)
      Followers (user_id, follower_id), linking User to User
  6. Twissandra - DB Following
      User (id, user_name, password)
      Following (user_id, following_id), linking User to User
  7. Twissandra – DB Tweets
      User (id, user_name, password)
      Tweet (id, user_id, body, timestamp)
  8. Twissandra column families
      User
      Username
      Friends, Followers
      Tweet
      Userline
      Timeline
  9. Twissandra – Users CF
      <<CF>> User: <<RowKey>> userid; columns: username, password
      <<CF>> Username: <<RowKey>> username; columns: userid
  10. Twissandra – Friends and Followers CFs
      <<CF>> Friends: <<RowKey>> userid; column name: friendid; column value: timestamp
      <<CF>> Followers: <<RowKey>> userid; column name: followerid; column value: timestamp
  11. Twissandra – Tweet CF
      <<CF>> Tweet: <<RowKey>> tweetid; columns: userid, body, timestamp
  12. Twissandra – Userline and Timeline CFs
      <<CF>> Userline: <<RowKey>> userid; column name: timestamp; column value: tweetid
      <<CF>> Timeline: <<RowKey>> userid; column name: timestamp; column value: tweetid
  13. Cassandra QL – User creation
      BEGIN BATCH
      INSERT INTO User (KEY, username, password) VALUES ('id', 'konstantin', '******')
      INSERT INTO Username (KEY, userid) VALUES ('konstantin', 'id')
      APPLY BATCH
  14. Cassandra QL – following a friend
      BEGIN BATCH
      INSERT INTO Friends (KEY, friendid) VALUES ('userid', 'friendid')
      INSERT INTO Followers (KEY, userid) VALUES ('friendid', 'userid')
      APPLY BATCH
  15. Cassandra QL – Tweet creation
      BEGIN BATCH
      INSERT INTO Tweet (KEY, userid, body, timestamp) VALUES ('tweetid', 'userid', '@ericflo thanks for Twissandra, it helps!', 123656459847)
      INSERT INTO Userline (KEY, 123656459847) VALUES ('userid', 'tweetid')
      INSERT INTO Timeline (KEY, 123656459847) VALUES ('userid', 'tweetid')
      ...
      INSERT INTO Timeline (KEY, 123656459847) VALUES ('followerid', 'tweetid')
      ...
      APPLY BATCH
  16. Cassandra QL – Getting user tweets
      SELECT * FROM Userline WHERE KEY = 'userid'
      SELECT * FROM Tweet WHERE KEY IN ('tweetid1', 'tweetid2', 'tweetid3', ..., 'tweetidn')
  17. Cassandra QL – Getting user timeline
      SELECT * FROM Timeline WHERE KEY = 'userid'
      SELECT * FROM Tweet WHERE KEY IN ('tweetid1', 'tweetid2', 'tweetid3', ..., 'tweetidn')
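
The batches and queries on slides 13 to 17 can be traced with a small in-memory sketch. It is not Twissandra's code: plain Python dictionaries stand in for the column families, and the function names (`create_user`, `follow`, `create_tweet`, `get_userline`, `get_timeline`) are hypothetical. It only shows how one tweet is written into several denormalized rows so that a timeline read needs a single row lookup.

```python
from collections import defaultdict

# In-memory stand-ins for the column families on the slides.
# Each maps a row key to {column name: column value}.
user = {}                      # userid -> {"username": ..., "password": ...}
username = {}                  # username -> {"userid": ...}
friends = defaultdict(dict)    # userid -> {friendid: timestamp}
followers = defaultdict(dict)  # userid -> {followerid: timestamp}
tweet = {}                     # tweetid -> {"userid", "body", "timestamp"}
userline = defaultdict(dict)   # userid -> {timestamp: tweetid}
timeline = defaultdict(dict)   # userid -> {timestamp: tweetid}


def create_user(userid, name, password):
    # Two writes: the User row and the reverse lookup by name.
    user[userid] = {"username": name, "password": password}
    username[name] = {"userid": userid}


def follow(userid, friendid, ts):
    # Both directions are stored so each can be read with one row lookup.
    friends[userid][friendid] = ts
    followers[friendid][userid] = ts


def create_tweet(tweetid, userid, body, ts):
    tweet[tweetid] = {"userid": userid, "body": body, "timestamp": ts}
    userline[userid][ts] = tweetid
    timeline[userid][ts] = tweetid
    # Fan out: copy the tweet id into every follower's timeline row.
    for followerid in list(followers[userid]):
        timeline[followerid][ts] = tweetid


def _bodies(row):
    # Columns are ordered by timestamp; return tweet bodies newest first.
    return [tweet[tid]["body"] for _, tid in sorted(row.items(), reverse=True)]


def get_userline(userid):
    return _bodies(userline[userid])


def get_timeline(userid):
    return _bodies(timeline[userid])
```

The fan-out loop in `create_tweet` is the price of the Timeline column family: writes grow with the number of followers, while reads stay a single-row fetch followed by a multi-get on Tweet.
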
  18. Design patterns
      Materialized View: create a second column family to serve additional queries
      Valueless Column: use column names to hold values
      Aggregate Key: if you need to find a sub-item, use a composite key
  19. Indexes
      <<CF>> Item_Properties: <<RowKey>> item_id; column name: property_name; column value: property_value
      <<CF>> Container_Items: <<RowKey>> container_id; column name: item_id; column value: insertion_timestamp
  20. Indexes
      <<CF>> Container_Items_Property_Index: <<RowKey>> container_id + property_name; column name: composite(property_value, item_id, entry_timestamp); column value: item_id
      Comparator: compositecomparer.CompositeType
  21. Problem with eventual consistency
      When we update a value, we should add the new value to the index and remove the old one.
      However, eventual consistency and the lack of transactions make it impossible to do this atomically
  22. Solution
      <<CF>> Container_Item_Property_Index_Entries: <<RowKey>> container_id + item_id + property_name; column name: entry_timestamp; column value: property_value
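
The slides name the extra column family but do not spell out how it is used. One plausible reading, sketched below under that assumption, is that every index write is also logged in Container_Item_Property_Index_Entries, so an update can look up which index columns earlier writes created and remove them without reading the index itself. The function names and the dictionary layout are hypothetical.

```python
from collections import defaultdict

# (container_id, property_name) -> {(property_value, item_id, entry_ts): item_id}
index = defaultdict(dict)
# (container_id, item_id, property_name) -> {entry_ts: property_value}
index_entries = defaultdict(dict)


def set_property(container_id, item_id, name, value, ts):
    """Assumed update flow: clean up old index columns, then write new ones."""
    entries_row = index_entries[(container_id, item_id, name)]
    index_row = index[(container_id, name)]

    # The entries row lists every index column previously written for this
    # item and property, so stale columns can be removed by exact name.
    for old_ts, old_value in entries_row.items():
        index_row.pop((old_value, item_id, old_ts), None)
    entries_row.clear()

    # Write the new index column and log it in the entries row.
    index_row[(value, item_id, ts)] = item_id
    entries_row[ts] = value


def find_items(container_id, name, value):
    """Return the ids of items whose indexed property equals value."""
    row = index[(container_id, name)]
    return sorted({item for (v, item, _ts) in row if v == value})
```

Because the timestamp is part of the index column name, two concurrent updates create two distinct columns instead of overwriting each other, and whichever update runs next can remove both.
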
  23. VI. Architecture
  24. Partitioners
      Partitioners decide where a key maps onto the ring.
      [Figure: Key 1, Key 2, Key 3 and Key 4 placed on the ring]
  25. Partitioners
      RandomPartitioner
      OrderPreservingPartitioner
      ByteOrderedPartitioner
      CollatingOrderPreservingPartitioner
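
As a rough illustration of how a partitioner maps a key onto the ring, the sketch below hashes the key with MD5, which is the idea behind RandomPartitioner, and picks the first node whose token is greater than or equal to the key's token, wrapping around at the end. The `Ring` class, the token values and the reduction modulo 2**127 are simplifying assumptions, not Cassandra's exact arithmetic.

```python
import bisect
import hashlib

TOKEN_SPACE = 2 ** 127  # assumed size of the token range


def md5_token(key: bytes) -> int:
    """Hash a row key to a token (simplified RandomPartitioner-style)."""
    return int.from_bytes(hashlib.md5(key).digest(), "big") % TOKEN_SPACE


class Ring:
    def __init__(self, node_tokens):
        # node_tokens: {node name: token}; stored sorted by token.
        self._ring = sorted((token, node) for node, token in node_tokens.items())

    def owner_of_token(self, token: int) -> str:
        # A node owns the range (previous token, own token].
        i = bisect.bisect_left(self._ring, (token, ""))
        return self._ring[i % len(self._ring)][1]  # wrap past the last token

    def owner(self, key: bytes) -> str:
        return self.owner_of_token(md5_token(key))
```

An order-preserving partitioner would replace `md5_token` with a function that keeps the key's own sort order, which allows range scans but makes hot spots more likely.
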
  26. Replication
      Replication is controlled by the replication_factor setting in the keyspace definition.
      The actual placement of replicas in the cluster is determined by the replica placement strategy.
  27. Placement Strategies
      SimpleStrategy: returns the nodes that are next to each other on the ring.
  28. Placement Strategies
      OldNetworkTopologyStrategy: places one replica in a different data center and the others on different racks in the current data center.
  29. Placement Strategies
      NetworkTopologyStrategy: lets you configure the number of replicas per data center, as specified in strategy_options.
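
The difference between the strategies can be sketched in a few lines. Both functions below walk the ring clockwise from the node that owns the key; the first takes the next nodes regardless of location, the second fills a per-data-center quota. The function names and arguments are hypothetical, and the second function deliberately ignores racks, which the real NetworkTopologyStrategy also takes into account.

```python
def simple_strategy(ring_nodes, primary_index, replication_factor):
    """ring_nodes: node names in token order; returns the replica nodes."""
    n = len(ring_nodes)
    count = min(replication_factor, n)
    # The owner plus the next nodes clockwise on the ring.
    return [ring_nodes[(primary_index + i) % n] for i in range(count)]


def network_topology_strategy(ring_nodes, datacenter_of, primary_index,
                              replicas_per_dc):
    """Rack-unaware simplification: fill each data center's replica quota."""
    remaining = dict(replicas_per_dc)
    chosen = []
    n = len(ring_nodes)
    for i in range(n):
        node = ring_nodes[(primary_index + i) % n]
        dc = datacenter_of[node]
        if remaining.get(dc, 0) > 0:
            chosen.append(node)
            remaining[dc] -= 1
    return chosen
```

Here `replicas_per_dc` plays the role of strategy_options, for example `{"DC1": 2, "DC2": 1}`.
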
  30. Snitches
      Give Cassandra information about the network topology of the cluster
      Endpoint snitch: gives information about network topology.
      Dynamic snitch: monitors read latencies
  31. Endpoint Snitch Implementations
      SimpleSnitch (default): can be efficient for locating nodes in clusters limited to a single data center.
  32. Endpoint Snitch Implementations
      RackInferringSnitch: extrapolates the topology of the network by analyzing IP addresses.
      [Figure: example addresses in the same rack, in the same datacenter, and in different datacenters]
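
The IP-based inference can be sketched as follows, assuming the convention RackInferringSnitch is documented to use: the second octet of a node's IPv4 address identifies the data center and the third octet identifies the rack. The function names are hypothetical.

```python
import ipaddress


def infer_location(ip: str):
    """Return (datacenter, rack) taken from the 2nd and 3rd octets."""
    octets = ipaddress.IPv4Address(ip).packed  # four raw bytes
    return octets[1], octets[2]


def proximity(ip_a: str, ip_b: str) -> str:
    dc_a, rack_a = infer_location(ip_a)
    dc_b, rack_b = infer_location(ip_b)
    if dc_a != dc_b:
        return "different datacenters"
    if rack_a != rack_b:
        return "same datacenter"
    return "same rack"
```

For example, `proximity("10.20.30.1", "10.20.31.7")` compares data center 20 with data center 20 and rack 30 with rack 31, so the two nodes are treated as being in the same data center but on different racks.
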
  33. Endpoint Snitch Implementations
      PropertyFileSnitch: determines the location of nodes by referring to a user-defined description of the network details located in the property file
  34-36. Memtables, SSTables, Commit Logs
      Commit Log: durability; sequential writes only
      Memtable: no disk access, batched writes
      SSTable: becomes read-only; indexes
  37. Write properties
      No reads
      No seeks
      Fast
      Atomic within ColumnFamily
      Always writable
  38. Write/Read properties
      Read properties:
      Read multiple SSTables
      Slower than writes (but still fast)
      Seeks can be mitigated with more RAM
      Scales to billions of rows
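
The write and read properties follow from how the three structures cooperate. The toy `Store` class below is a hypothetical sketch, not Cassandra's storage engine: a write appends to a commit log and updates an in-memory memtable, a full memtable is flushed to an immutable SSTable, and a read has to consult the memtable and every SSTable.

```python
class Store:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only, stands in for sequential disk writes
        self.memtable = {}     # key -> (timestamp, value), in memory only
        self.sstables = []     # flushed tables, oldest first, never modified
        self.memtable_limit = memtable_limit

    def write(self, key, value, ts):
        # Durability first: append to the log, no disk read or seek needed.
        self.commit_log.append((key, value, ts))
        current = self.memtable.get(key)
        if current is None or ts >= current[0]:
            self.memtable[key] = (ts, value)
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # The memtable becomes a sorted, read-only SSTable; the log entries
        # covering it are no longer needed for recovery.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()

    def read(self, key):
        # A read may touch the memtable and several SSTables.
        tables = [self.memtable, *self.sstables]
        versions = [t[key] for t in tables if key in t]
        if not versions:
            return None
        return max(versions, key=lambda v: v[0])[1]  # newest timestamp wins
```

The more SSTables accumulate, the more places `read` has to look, which is why reads are slower than writes and why compaction and Bloom filters matter.
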
  39. Commit Log durability
      Durability settings mirror those of PostgreSQL.
      Periodic sync of the commit log: carries some risk of data loss.
      Batch sync of the commit log: a write is acknowledged only once the commit log has been flushed to disk. A separate device for the commit log is strongly recommended in this case.
  40. Gossip protocol
      Intra-ring communication
      Runs periodically
      Failure detection, hinted handoffs and exchange of node state
  41. Gossip protocol
      org.apache.cassandra.gms.Gossiper
      Keeps the list of nodes that are alive and dead
      Chooses a random node and starts a “chat” with it. One gossip round requires three messages
      Failure detection uses a suspicion level to decide whether a node is alive or dead
  42. Hinted handoff
      [Figure: a write and the hint stored for an unavailable replica]
      Cassandra is always available for writes
  43. Consistency level
  44. Tombstones
      The data is not deleted immediately
      Deleted values are marked with a tombstone
      Tombstones are removed during the next compaction
      GCGraceSeconds: the number of seconds the server waits before garbage-collecting a tombstone
  45. Compaction
      Merging SSTables into one:
      merging keys
      combining columns
      creating a new index
      Main aims:
      Free up space
      Reduce the number of required seeks
  46. Compaction
      Minor:
      Triggered when at least N SSTables have been flushed to disk (N is tunable, 4 by default)
      Merges SSTables of similar size
      Major:
      Merges all SSTables
      Run manually through nodetool compact
      Discards tombstones
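
Tombstones and compaction fit together as in the sketch below. It assumes each SSTable is a dictionary of key to (timestamp in seconds, value) and uses a hypothetical `TOMBSTONE` marker for deletions: tables are merged with the newest timestamp winning, and a tombstone is dropped only once it is older than the grace period.

```python
TOMBSTONE = object()  # hypothetical marker stored in place of a deleted value


def compact(sstables, now, gc_grace_seconds):
    """Merge SSTables into one, discarding expired tombstones."""
    merged = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            # Merge keys: keep only the newest version of each one.
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    return {
        key: (ts, value)
        for key, (ts, value) in sorted(merged.items())
        # A tombstone survives until the grace period has passed, so that
        # replicas which missed the delete can still learn about it.
        if not (value is TOMBSTONE and now - ts >= gc_grace_seconds)
    }
```

The result is a single table with one version per key, which frees the space held by overwritten values and leaves reads with fewer tables to check.
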
  47. Replica synchronization
      Anti-entropy
      Read repair
  48. Anti-entropy
      During major compaction the node exchanges Merkle trees (hashes of its data) with other nodes
      If the trees don't match, the differing data is repaired
      Nodes maintain a timestamp index and exchange only the most recent updates
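
The reason for exchanging Merkle trees rather than the data itself can be shown with a small sketch. It assumes both replicas hold the same number of rows in the same order and hashes individual rows, whereas Cassandra builds its trees over token ranges; the function names are hypothetical. Equal roots prove the replicas agree after comparing a single hash, and only on a mismatch are the leaves inspected.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(rows):
    """Return the tree as a list of levels: leaf hashes first, root last."""
    level = [_h(row) for row in rows]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # duplicate the last hash on odd levels
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_rows(rows_a, rows_b):
    """Indexes of rows that differ; empty when the roots already match."""
    levels_a, levels_b = merkle_levels(rows_a), merkle_levels(rows_b)
    if levels_a[-1] == levels_b[-1]:
        return []  # one comparison is enough when replicas agree
    return [i for i, (x, y) in enumerate(zip(levels_a[0], levels_b[0])) if x != y]
```

A fuller implementation would descend from the root through the mismatching branches instead of jumping straight to the leaves, so that only the hashes along differing paths have to be exchanged.
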
  49. Read repair
      During a read operation, replicas with stale values are brought up to date
      Weak consistency level (ONE): after the data is returned
      Strong consistency level (QUORUM, ALL): before the data is returned
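
A minimal sketch of read repair, assuming each replica is a dictionary of key to (timestamp, value) and using a hypothetical function name. It repairs stale replicas before returning, which corresponds to the strong-consistency case on the slide; at a weak level the same repair would run in the background after the response is sent.

```python
def read_with_repair(replicas, key):
    """Read key from all replicas, repair stale ones, return the newest value."""
    responses = [(replica, replica.get(key)) for replica in replicas]
    versions = [version for _, version in responses if version is not None]
    if not versions:
        return None
    newest = max(versions, key=lambda version: version[0])
    for replica, version in responses:
        # A replica is stale if it lacks the key or holds an older timestamp.
        if version is None or version[0] < newest[0]:
            replica[key] = newest
    return newest[1]
```
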
  50. Bloom filters
      A bit array
      Tests whether a value is a member of a set
      Reduces disk access (improves performance)
  51. Bloom filters
      On write:
      several hashes are generated per key
      the bit for each hash is set
      On read:
      hashes are generated for the key
      if all bits for these hashes are set, the key may exist in the SSTable
      if at least one bit is not set, the key has never been written to the SSTable
  52. Bloom filters
      [Figure: on the write and read paths, Key1 and Key2 are hashed by Hash1, Hash2 and Hash3 to positions in the SSTable's bit array]
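
The write and read steps above translate almost directly into code. The sketch below is a generic Bloom filter, not Cassandra's implementation: the array size, the number of hashes and the use of salted SHA-256 as the hash family are arbitrary choices made for illustration.

```python
import hashlib


class BloomFilter:
    def __init__(self, size=64, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [0] * size

    def _positions(self, key: str):
        # Derive several hash values per key by salting with the hash index.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str):
        # On write: set the bit for each hash of the key.
        for position in self._positions(key):
            self.bits[position] = 1

    def might_contain(self, key: str) -> bool:
        # On read: any unset bit proves the key was never added;
        # all bits set means the key may be present (false positives happen).
        return all(self.bits[position] for position in self._positions(key))
```

A `False` answer lets a read skip the SSTable entirely, while a `True` answer still requires checking the SSTable itself.
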
  53. Resources
      Home of the Apache Cassandra project
      Apache Cassandra Wiki
      Documentation provided by DataStax
      A good explanation of creating secondary indexes
      Eben Hewitt, “Cassandra: The Definitive Guide”, O'Reilly, 2010, ISBN: 978-1-449-39041-9
  54. Authors
      Lev Sivashov
      Andrey Lomakin, Twitter: @Andrey_Lomakin
      Artem Orobets, enisher@gmail.com, Twitter: @Dr_EniSh
      Anton Veretennik