Akka Distributed Data is useful when building distributed systems that focus on high availability rather than strong consistency. The module uses Conflict-Free Replicated Data Types (CRDTs) to replicate data across nodes without conflicts.
This talk gives an introduction to distributed systems, data replication, and how CRDTs work. It serves as a basis for explaining Akka Distributed Data and the typical use cases for which it is suited.
16. CAP theorem
Consistency
Clients have the same view of the same data
Availability
Clients can always read and write
Partition tolerance
System continues to function despite physical network partitions
47. Eventual consistency
• Tolerates N node failures out of N+1 nodes
• Only 1 node needs to be available
• Updates are observed eventually
48. Strong vs eventual consistency
• Strong consistency
  • Consistent state at any time
  • Operations require a quorum
  • Higher latency
• Eventual consistency
  • Possibly stale state at a given time
  • High availability
  • Lower latency
49. Strong eventual consistency (SEC)
• Type of eventual consistency
• Guarantees that N nodes that have received the same updates, in any order, will be in the same state
• Usage
  • Cassandra, DynamoDB, Riak, CouchDB, Voldemort
• Needs a non-conflicting merge algorithm
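The convergence guarantee above can be illustrated with a grow-only set, whose merge is plain set union. This is a minimal sketch in plain Scala, not Akka's implementation; the names `SecSketch`, `GSet` and `replay` are chosen for illustration only.

```scala
object SecSketch {
  // Grow-only set: the only update is "add", and merge is set union.
  final case class GSet[A](elements: Set[A] = Set.empty[A]) {
    def add(a: A): GSet[A] = GSet(elements + a)
    def merge(that: GSet[A]): GSet[A] = GSet(elements ++ that.elements)
  }

  // Merge a sequence of received updates into an initially empty replica.
  def replay(updates: Seq[GSet[String]]): GSet[String] =
    updates.foldLeft(GSet[String]())((acc, u) => acc.merge(u))

  // Three updates, each shipped as a one-element state.
  val updates: Seq[GSet[String]] =
    Seq("a", "b", "c").map(x => GSet[String]().add(x))

  val replicaA: GSet[String] = replay(updates)            // original order
  val replicaB: GSet[String] = replay(updates.reverse)    // reversed order
  val replicaC: GSet[String] = replay(updates ++ updates) // with duplicates
}
```

All three replicas end up with the same elements, regardless of delivery order or duplicated messages, because set union never produces a conflict.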
56. What is a CRDT?
Conflict-free replicated data type
• Operation-based CRDTs (CmRDTs)
  • Broadcast update operations
  • Operations are not idempotent (require causal-order delivery)
• State-based CRDTs (CvRDTs)
  • Broadcast full or delta state
  • Merge must be commutative, associative, and idempotent
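The three merge requirements for state-based CRDTs can be checked on a small grow-only counter. The following is a hedged sketch in plain Scala that assumes one slot per node and a per-slot maximum as the merge; `GCounterSketch` and its node names are hypothetical and this is not Akka's own `GCounter` class.

```scala
object GCounterSketch {
  // State-based grow-only counter: one monotonically increasing slot per node.
  final case class GCounter(slots: Map[String, Long] = Map.empty) {
    def increment(node: String, n: Long = 1L): GCounter = {
      require(n >= 0, "a GCounter can only grow")
      GCounter(slots.updated(node, slots.getOrElse(node, 0L) + n))
    }

    // The counter value is the sum over all node slots.
    def value: Long = slots.values.sum

    // Merge keeps the per-node maximum, which makes it
    // commutative, associative and idempotent.
    def merge(that: GCounter): GCounter =
      GCounter((slots.keySet ++ that.slots.keySet).map { k =>
        k -> math.max(slots.getOrElse(k, 0L), that.slots.getOrElse(k, 0L))
      }.toMap)
  }

  val a: GCounter = GCounter().increment("node-a", 2)
  val b: GCounter = GCounter().increment("node-b", 3)
  val c: GCounter = GCounter().increment("node-a", 1).increment("node-c", 5)
}
```

Because each node only ever increases its own slot, taking the maximum never loses an increment, and merging the same state twice changes nothing.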
61. What is Akka Distributed Data?
• Uses CvRDT to replicate data across cluster
• Replication via gossip protocol
• Data kept in memory
• Every node has all data
63. Replicator
• Actor to interact with CRDT data
• Get replicator via Akka Extension
val replicator = DistributedData(context.system).replicator
64. Update
• Pure modify function to update the CRDT
• Choose consistency level
replicator ! Update(key, consistency, request)(modify)
65. Get
def receive = {
  case GetFromCache(key) =>
    replicator ! Get(key, consistency, request)
  case g @ GetSuccess(key, request) =>
    // CRDT by key found
    val currentValue = g.get(key).value
  case GetFailure(key, request) =>
    // Failed to retrieve data based on consistency level
  case NotFound(key, request) =>
    // Key not found
}
66. Consistency
• Specify how many nodes must respond successfully to write or read data
• Request-based
• Consistency levels
  • ReadLocal, WriteLocal
  • ReadMajority, WriteMajority
  • ReadFrom, WriteFrom
  • ReadAll, WriteAll
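The effect of combining read and write levels can be reasoned about with simple arithmetic. The sketch below assumes a majority means N/2 + 1 nodes out of N, and that a read is guaranteed to see a preceding write when the two node sets must overlap. `QuorumSketch`, `majority` and `overlaps` are illustrative helpers, not part of the Akka API.

```scala
object QuorumSketch {
  // Majority of n replicas: more than half of them.
  def majority(n: Int): Int = n / 2 + 1

  // A read set and a write set must share at least one node
  // when their sizes add up to more than the number of nodes.
  def overlaps(readFrom: Int, writeTo: Int, n: Int): Boolean =
    readFrom + writeTo > n
}
```

In a five-node cluster, `majority(5)` is 3, so a majority write followed by a majority read overlaps in at least one node. A local write followed by a local read (`overlaps(1, 1, 5)`) gives no such guarantee, while a local read after a write to all nodes (`overlaps(1, 5, 5)`) does.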
67. Subscribe
Receive Changed notifications with the updated data
replicator ! Subscribe(key, actorRef)
def receive: Receive = {
  case c @ Changed(key) =>
    val currentValue = c.get(key).value
}
68. Data types
• Counters: GCounter, PNCounter
• Registers: LWWRegister, Flag
• Sets: GSet, ORSet
• Maps: ORMap, ORMultiMap, LWWMap, PNCounterMap
• Values in data types must be serializable
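As an illustration of how a register type resolves concurrent writes, here is a minimal last-writer-wins sketch in plain Scala. It assumes that the highest timestamp wins and that ties are broken in favour of the lexicographically smallest node name; both the tie-break rule and the names `LwwSketch` and `LwwRegister` are choices made for this example rather than a copy of Akka's `LWWRegister`.

```scala
object LwwSketch {
  // Last-writer-wins register: keeps the value with the highest timestamp.
  final case class LwwRegister[A](value: A, timestamp: Long, node: String) {
    def merge(that: LwwRegister[A]): LwwRegister[A] =
      if (that.timestamp > timestamp) that
      else if (that.timestamp < timestamp) this
      // Equal timestamps: deterministic tie-break on the node name (assumption).
      else if (that.node.compareTo(node) < 0) that
      else this
  }

  val r1: LwwRegister[String] = LwwRegister("red", 10L, "node-a")
  val r2: LwwRegister[String] = LwwRegister("blue", 20L, "node-b")
  val r3: LwwRegister[String] = LwwRegister("green", 20L, "node-a")
}
```

The deterministic tie-break matters: without it, two replicas merging the same pair of registers in opposite order could keep different values.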
69. Custom data type
• Extend the ReplicatedData trait
• Implement the merge function
• Must be serializable
def merge(that: T): T
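A custom data type could look roughly like the two-phase set below. To keep the sketch runnable without Akka on the classpath, it declares a local stand-in trait with the same `merge` shape as on the slide; in a real project the class would extend Akka's `ReplicatedData` and would also need a serializer. `CustomTypeSketch` and `TwoPhaseSet` are illustrative names.

```scala
object CustomTypeSketch {
  // Local stand-in for Akka's ReplicatedData trait (assumption: same shape).
  trait ReplicatedData {
    type T <: ReplicatedData
    def merge(that: T): T
  }

  // Two-phase set: an element can be added and later removed,
  // but a removed element can never be added again.
  final case class TwoPhaseSet(
      adds: Set[String] = Set.empty,
      removals: Set[String] = Set.empty
  ) extends ReplicatedData {
    type T = TwoPhaseSet

    def add(e: String): TwoPhaseSet = copy(adds = adds + e)
    def remove(e: String): TwoPhaseSet = copy(removals = removals + e)
    def elements: Set[String] = adds.diff(removals)

    // Merging two grow-only sets by union keeps the merge
    // commutative, associative and idempotent.
    override def merge(that: TwoPhaseSet): TwoPhaseSet =
      TwoPhaseSet(adds ++ that.adds, removals ++ that.removals)
  }
}
```

Building the type from two grow-only sets means the merge laws follow from set union, so no extra conflict handling is needed.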
71. Delta CRDTs
• Sends only the delta of the state
• The full state is still replicated occasionally
  • When a node joins
  • After a network partition
• Support for causal consistency
• Since Akka 2.5.0
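The idea of shipping a delta instead of the full state can be sketched for a grow-only counter. This plain Scala example assumes the counter state is a map from node name to count and that a delta is simply the slots changed by an update; `DeltaSketch` and its values are hypothetical and do not mirror Akka's `DeltaReplicatedData` API.

```scala
object DeltaSketch {
  type State = Map[String, Long]

  // Same merge for full states and deltas: per-node maximum.
  def merge(a: State, b: State): State =
    (a.keySet ++ b.keySet).map { k =>
      k -> math.max(a.getOrElse(k, 0L), b.getOrElse(k, 0L))
    }.toMap

  // Returns the new full state and the delta containing only the changed slot.
  def increment(state: State, node: String, n: Long): (State, State) = {
    val updated = state.getOrElse(node, 0L) + n
    (state.updated(node, updated), Map(node -> updated))
  }

  val sender: State = Map("node-a" -> 7L, "node-b" -> 4L)
  val receiver: State = Map("node-a" -> 7L, "node-b" -> 4L, "node-c" -> 2L)

  val (fullState, delta) = increment(sender, "node-a", 1L)

  val viaDelta: State = merge(receiver, delta)
  val viaFullState: State = merge(receiver, fullState)
}
```

The receiver reaches the same state either way, but the delta carries one slot instead of the whole map, which is where the bandwidth saving comes from.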
72. Delta CRDTs
• Supported built-in data types
  • GCounter, PNCounter
  • GSet, ORSet
  • Ensures causal consistency (if required)
• Custom data types
  • Implement methods of trait DeltaReplicatedData
  • Use trait RequiresCausalDeliveryOfDeltas to ensure causal consistency
73. Durable storage
• Configuration to store data on disk
akka.cluster.distributed-data.durable.keys = ["key1", "durable*"]
• Update is flushed to disk before UpdateSuccess is sent
• Replicator sends StoreFailure if the write to disk failed
74. Limitations
• Eventually consistent
• Not suitable for Big Data
  • Data kept in memory on every node
  • Maximum of 100,000 top-level entries
• Replicating the entire state to a new node can take several seconds
75. Use Cases
• Key value store
• Service discovery
• Shopping cart
• Distributing state across Akka cluster
76. Learn more
• Akka Distributed Data Samples
• The Final Causal Frontier, talk by Sean Cribbs
• Eventually Consistent Data Structures, talk by Sean Cribbs
• Strong Eventual Consistency and Conflict-free Replicated Data Types, talk by Marc Shapiro
• A comprehensive study of Convergent and Commutative Replicated Data Types, paper by Marc Shapiro et al.
• Delta State Replicated Data Types, paper by Paulo Sérgio Almeida et al.
• Please stop calling databases CP or AP, blog post by Martin Kleppmann