The aim of this presentation is to give an enterprise architect enough information to decide whether Cassandra should be the project's data store. The presentation describes the nuances of the Cassandra architecture and the ways to design data and work with it.
3. Twissandra Use Cases: Get the friends of a username. Get the followers of a username. Get a timeline of a specific user's tweets. Create a tweet. Create a user. Add friends to a user.
13. Cassandra QL – User creation BATCH BEGIN BATCH INSERT INTO User (KEY, username, password) VALUES ('id', 'konstantin', '******') INSERT INTO Username (KEY, userid) VALUES ('konstantin', 'id') APPLY BATCH
14. Cassandra QL – following a friend BATCH BEGIN BATCH INSERT INTO Friends (KEY, friendid) VALUES ('userid', 'friendid') INSERT INTO Followers (KEY, userid) VALUES ('friendid', 'userid') APPLY BATCH
15. Cassandra QL – Tweet creation BATCH BEGIN BATCH INSERT INTO Tweet (KEY, userid, body, timestamp) VALUES ('tweetid', 'userid', '@ericflo thanks for Twissandra, it helps!', 123656459847) INSERT INTO Userline (KEY, 123656459847) VALUES ('userid', 'tweetid') INSERT INTO Timeline (KEY, 123656459847) VALUES ('userid', 'tweetid') …….. INSERT INTO Timeline (KEY, 123656459847) VALUES ('followerid', 'tweetid') …… APPLY BATCH
16. Cassandra QL – Getting user tweets SELECT * FROM Userline WHERE KEY = 'userid' SELECT * FROM Tweet WHERE KEY IN ('tweetid1', 'tweetid2', 'tweetid3', …., 'tweetidn')
17. Cassandra QL – Getting user timeline SELECT * FROM Timeline WHERE KEY = 'userid' SELECT * FROM Tweet WHERE KEY IN ('tweetid1', 'tweetid2', 'tweetid3', …., 'tweetidn')
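The write fan-out and the two-step read on slides 15 to 17 can be sketched with plain Python dictionaries standing in for the column families. This is only an in-memory illustration, not Cassandra client code; the function names `post_tweet` and `get_timeline` and the sample ids are assumptions made for the example.

```python
from collections import defaultdict

# Column families modelled as dictionaries (illustrative only).
tweet = {}                      # tweetid -> {"userid", "body", "timestamp"}
userline = defaultdict(dict)    # userid -> {timestamp: tweetid}, the user's own tweets
timeline = defaultdict(dict)    # userid -> {timestamp: tweetid}, tweets the user sees
followers = defaultdict(set)    # userid -> ids of the users who follow that user


def post_tweet(tweetid, userid, body, timestamp):
    """Write the tweet once, then fan its id out to every timeline that shows it."""
    tweet[tweetid] = {"userid": userid, "body": body, "timestamp": timestamp}
    userline[userid][timestamp] = tweetid
    timeline[userid][timestamp] = tweetid
    for follower in followers[userid]:
        timeline[follower][timestamp] = tweetid


def get_timeline(userid, limit=20):
    """Read tweet ids from Timeline (newest first), then fetch the tweet rows."""
    newest_first = sorted(timeline[userid].items(), reverse=True)[:limit]
    return [tweet[tweetid] for _, tweetid in newest_first]


followers["alice"].add("bob")
post_tweet("t1", "alice", "first", 100)
post_tweet("t2", "alice", "second", 200)
print([t["body"] for t in get_timeline("bob")])
```

The design choice shown is the one the slides make: the tweet body is stored once, and each timeline row holds only timestamp-to-id pairs, so a read is one row lookup followed by one multi-key fetch.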
18. Design patterns Materialized View: create a second column family to serve additional queries. Valueless Column: use column names to hold the values. Aggregate Key: if you need to find a sub-item, use a composite key.
21. Problem with eventual consistency When we update a value, we should add the new value to the index and remove the old one. However, eventual consistency and the lack of transactions make it impossible to do both as a single atomic step.
26. Replication Replication is controlled by the replication_factor setting in the keyspace definition. The actual placement of replicas in the cluster is determined by the replica placement strategy.
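As an illustration of what a placement strategy does, here is a minimal sketch of ring-order placement (the behaviour usually described for SimpleStrategy): the first replica goes to the node that owns the row's token, and the remaining replicas go to the next nodes clockwise. The ring, the integer tokens and the function name `replicas_for` are made up for the example.

```python
from bisect import bisect_left

# Hypothetical ring: (token, node) pairs sorted by token.
RING = [(0, "node-a"), (25, "node-b"), (50, "node-c"), (75, "node-d")]


def replicas_for(key_token, replication_factor):
    """Return the nodes holding a row: the token owner plus the next nodes clockwise."""
    tokens = [token for token, _ in RING]
    # The owner is the first node whose token is >= the key's token, wrapping around.
    start = bisect_left(tokens, key_token) % len(RING)
    count = min(replication_factor, len(RING))
    return [RING[(start + i) % len(RING)][1] for i in range(count)]


print(replicas_for(30, 3))   # owner is node-c, followed by node-d and node-a
print(replicas_for(90, 2))   # wraps around the ring to node-a, then node-b
```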
30. Snitches Give Cassandra information about the network topology of the cluster. Endpoint snitch: provides information about the network topology. Dynamic snitch: monitors read latencies.
31. Endpoint Snitch Implementations SimpleSnitch (default) - can be efficient for locating nodes in clusters limited to a single data center.
32. Endpoint Snitch Implementations RackInferringSnitch - extrapolates the topology of the network by analyzing IP addresses. 192.168.191.71 and 192.168.191.21 are in the same rack. 192.168.191.71 and 192.168.171.21 are in the same datacenter. 192.78.19.71 and 192.18.11.21 are in different datacenters.
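The address pairs above follow from the rule RackInferringSnitch applies: the second octet of the IP address identifies the datacenter and the third octet identifies the rack. A minimal sketch of that rule, with function names chosen for the example:

```python
def infer_location(ip):
    """Return (datacenter, rack) taken from the second and third octets of an IPv4 address."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"not an IPv4 address: {ip!r}")
    return octets[1], octets[2]


def relation(ip_a, ip_b):
    """Describe how close two nodes are according to the inferred topology."""
    dc_a, rack_a = infer_location(ip_a)
    dc_b, rack_b = infer_location(ip_b)
    if dc_a != dc_b:
        return "different datacenters"
    if rack_a != rack_b:
        return "same datacenter"
    return "same rack"


print(relation("192.168.191.71", "192.168.191.21"))
print(relation("192.168.191.71", "192.168.171.21"))
print(relation("192.78.19.71", "192.18.11.21"))
```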
33. Endpoint Snitch Implementations PropertyFileSnitch - determines the location of nodes by referring to a user-defined description of the network details located in the property file cassandra-topology.properties.
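A sketch of how such a file can be read, assuming the usual `IP=datacenter:rack` line format with an optional `default` entry. The sample addresses, the datacenter and rack names, and the helper names `parse_topology` and `locate` are invented for the example.

```python
# Hypothetical contents of cassandra-topology.properties.
SAMPLE = """\
# IP=datacenter:rack
192.168.191.71=DC1:RAC1
192.168.171.21=DC1:RAC2
192.78.19.71=DC2:RAC1
default=DC1:RAC1
"""


def parse_topology(text):
    """Parse 'address=datacenter:rack' lines into a dict, skipping comments and blanks."""
    topology = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        address, _, location = line.partition("=")
        datacenter, _, rack = location.partition(":")
        topology[address.strip()] = (datacenter.strip(), rack.strip())
    return topology


def locate(topology, ip):
    """Look up a node, falling back to the 'default' entry for unknown addresses."""
    return topology.get(ip, topology.get("default"))


topology = parse_topology(SAMPLE)
print(locate(topology, "192.78.19.71"))
print(locate(topology, "10.0.0.1"))
```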
37. Write properties: No reads. No seeks. Fast. Atomic within ColumnFamily. Always writable.
38. Write/Read properties Read properties: Read multiple SSTables. Slower than writes (but still fast). Seeks can be mitigated with more RAM. Scales to billions of rows.
39. Commit Log durability Durability settings mirror those of PostgreSQL. Periodic sync of the commit log: carries a risk of data loss. Batch sync of the commit log: a write is acknowledged only after the commit log has been flushed to disk. A separate device for the commit log is strongly recommended in this case.
40. Gossip protocol Intra-ring communication. Runs periodically. Used for failure detection, hinted handoffs and exchange of node information.
41. Gossip protocol org.apache.cassandra.gms.Gossiper Has the list of nodes that are alive and dead. Chooses a random node and starts a "chat" with it. One gossip round requires three messages. Failure detection uses a suspicion level to decide whether the node is alive or dead.
44. Tombstones The data is not deleted immediately. Deleted values are marked with a tombstone. Tombstones are discarded during a later compaction. GCGraceSeconds: the number of seconds the server waits before garbage-collecting a tombstone.
45. Compaction Merging SSTables into one: merging keys, combining columns, creating a new index. Main aims: free up space and reduce the number of required seeks.
46. Compaction Minor: triggered when at least N SSTables have been flushed to disk (N is tunable, 4 by default); merges SSTables of similar size. Major: merges all SSTables; run manually through nodetool compact; discards tombstones.
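The merge step, together with the tombstone rule from slide 44, can be sketched as follows. This is a simplified model that works on whole keys rather than individual columns; the `TOMBSTONE` marker and the `compact` function are assumptions made for the example, not Cassandra internals.

```python
TOMBSTONE = None  # marker stored in place of a deleted value (illustrative)


def compact(sstables, now, gc_grace_seconds):
    """Merge SSTables into one: newest timestamp wins, expired tombstones are dropped.

    Each SSTable is a dict: key -> (value, timestamp_in_seconds).
    """
    merged = {}
    for table in sstables:
        for key, (value, timestamp) in table.items():
            if key not in merged or timestamp > merged[key][1]:
                merged[key] = (value, timestamp)
    # A tombstone is discarded only once it is older than the grace period.
    return {
        key: (value, timestamp)
        for key, (value, timestamp) in sorted(merged.items())
        if not (value is TOMBSTONE and now - timestamp > gc_grace_seconds)
    }


old = {"a": ("1", 10), "b": ("2", 10)}
new = {"b": (TOMBSTONE, 20), "c": ("3", 30)}
print(compact([old, new], now=1000, gc_grace_seconds=100))  # tombstone for "b" expired
print(compact([old, new], now=50, gc_grace_seconds=100))    # tombstone for "b" still kept
```

Keeping the tombstone until the grace period has passed gives replicas that missed the delete time to learn about it before the marker disappears.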
48. Anti-entropy During major compaction the node exchanges Merkle trees (hashes of its data) with other nodes. If the trees don't match, the data is repaired. Nodes maintain a timestamp index and exchange only the most recent updates.
49. Read repair During a read operation, replicas with stale values are brought up to date. Weak consistency level (ONE): after the data is returned. Strong consistency level (QUORUM, ALL): before the data is returned.
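A minimal sketch of the repair step itself, with each replica modelled as a dictionary. It does not model the timing difference between consistency levels (repair before or after the response); the function name `read_with_repair` is an assumption for the example.

```python
def read_with_repair(replicas, key):
    """Return the newest value for key and overwrite stale replicas with it.

    Each replica is a dict: key -> (value, timestamp).
    """
    answers = [replica[key] for replica in replicas if key in replica]
    if not answers:
        return None
    newest = max(answers, key=lambda pair: pair[1])
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest      # bring the stale or missing copy up to date
    return newest[0]


r1 = {"x": ("old", 1)}
r2 = {"x": ("new", 2)}
r3 = {}
print(read_with_repair([r1, r2, r3], "x"))
print(r1, r3)
```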
50. Bloom filters A bit array Test whether value is a member of set Reduce disk access (improve performance)
51. Bloom filters On write: several hashes are generated per key; the bit for each hash is set. On read: hashes are generated for the key; if all bits for these hashes are set, the key may exist in the SSTable; if at least one bit is not set, the key has never been written to the SSTable.
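The write and read steps above can be sketched as a small Bloom filter. The bit-array size, the number of hashes and the salted SHA-256 scheme are choices made for this example, not the parameters Cassandra uses.

```python
import hashlib


class BloomFilter:
    """A bit array plus several hash functions; answers 'maybe present' or 'never added'."""

    def __init__(self, size=1024, hash_count=3):
        self.size = size
        self.hash_count = hash_count
        self.bits = [False] * size

    def _positions(self, key):
        # Derive several bit positions per key by salting one hash function.
        for i in range(self.hash_count):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        """On write: set the bit for every hash of the key."""
        for position in self._positions(key):
            self.bits[position] = True

    def might_contain(self, key):
        """On read: False means the key was never added; True means it may have been."""
        return all(self.bits[position] for position in self._positions(key))


sstable_filter = BloomFilter()
sstable_filter.add("userid")
print(sstable_filter.might_contain("userid"))
```

A `False` answer lets the read skip the SSTable entirely, which is where the saving in disk access comes from; a `True` answer still requires checking the SSTable because false positives are possible.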
53. Resources Home of Apache Cassandra Project: http://cassandra.apache.org/ Apache Cassandra Wiki: http://wiki.apache.org/cassandra/ Documentation provided by DataStax: http://www.datastax.com/docs/0.8/ Good explanation of creating secondary indexes: http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html Eben Hewitt, "Cassandra: The Definitive Guide", O'Reilly, 2010, ISBN: 978-1-449-39041-9
An endpoint snitch can be wrapped with a dynamic snitch, which monitors read latencies and avoids reading from hosts that have slowed down (due to compaction, for instance).