The Road 
to Akka Cluster 
and Beyond… 
Jonas Bonér 
CTO Typesafe 
@jboner
What is a 
Distributed 
System? 
and Why would You Need one?
Distributed 
Computing 
is the New 
normal 
you already have a 
distributed system, 
WHETHER 
you want it or not 
Mobile 
NOSQL Databases 
SQL Replication 
Cloud & REST Services
What is the 
essence of 
distributed 
computing?
It’s to try to overcome 
1. Information travels at the speed of light 
2. Independent things fail independently
Why do we need it? 
Elasticity 
When you outgrow 
the resources of 
a single node 
Availability 
Providing resilience if 
one node fails 
Rich stateful clients
So, what’s the problem? 
It is still 
Very Hard
The network is 
Inherently 
Unreliable
You can’t tell the DIFFERENCE 
Between a 
Slow NODE 
and a 
Dead NODE
Fallacies 
1. The network is reliable 
2. Latency is zero 
3. Bandwidth is infinite 
4. The network is secure 
5. Topology doesn't change 
6. There is one administrator 
7. Transport cost is zero 
8. The network is homogeneous 
Peter Deutsch’s 
8 Fallacies 
of 
Distributed 
Computing
So, oh yes… 
It is still 
Very Hard
Graveyard of distributed systems 
1. Guaranteed Delivery 
2. Synchronous RPC 
3. Distributed Objects 
4. Distributed Shared Mutable State 
5. Serializable Distributed Transactions
General strategies 
WHICH Requires 
SHARE NOTHING 
Designs 
Asynchronous 
Message-Passing 
Location 
Transparency 
Isolation 
& Containment
theoretical 
Models
A model for distributed Computation 
Should 
Allow 
explicit 
reasoning 
about 
1. Concurrency 
2. Distribution 
3. Mobility 
Carlos Varela 2013
Lambda Calculus 
Alonzo Church 1930 
state 
Immutable state 
Managed through functional application 
Referentially transparent 
order 
β-reduction—can be performed in any order 
Normal order 
Applicative order 
Call-by-name order 
Call-by-value order 
Call-by-need order 
Even in parallel 
Supports Concurrency 
No model for Distribution 
No model for Mobility
Von Neumann machine 
John von Neumann 1945 
[Diagram: Memory, Control Unit, Arithmetic Logic Unit, Accumulator, Input, Output] 
state 
Mutable state 
In-place updates 
order 
Total order 
List of instructions 
Array of memory 
No model for Concurrency 
No model for Distribution 
No model for Mobility
transactions 
Jim Gray 1981 
state 
Isolation of updates 
Atomicity 
order 
Serializability 
Disorder across transactions 
Illusion of order within transactions 
Works well for Concurrency 
Does not work well for Distribution
actors 
Carl Hewitt 1973 
state 
Share nothing 
Atomicity within the actor 
order 
Async message passing 
Non-determinism in message delivery 
Great model for Concurrency 
Great model for Distribution 
Great model for Mobility
other interesting models 
That are 
suitable for 
distributed 
systems 
1. Pi Calculus 
2. Ambient Calculus 
3. Join Calculus
The state of the Art
Impossibility 
Theorems
Impossibility of Distributed Consensus with One Faulty Process 
FLP 
Fischer, Lynch, Paterson 1985 
Consensus is impossible 
“The FLP result shows that in an asynchronous setting, where only one processor might crash, there is no distributed algorithm that solves the consensus problem” - The Paper Trail
“These results do not show that such problems cannot be “solved” in practice; rather, they point up the need for more refined models of distributed computing” - FLP paper
CAP Theorem 
Linearizability is impossible 
Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services 
Conjecture by Eric Brewer 2000 
Proof by Lynch & Gilbert 2002
linearizability 
“Under linearizable consistency, all 
operations appear to have executed 
atomically in an order that is consistent with 
the global real-time ordering of operations.” 
Herlihy & Wing 1991 
Less formally: 
A read will return the last completed 
write (made on any replica)
dissecting CAP 
1. Very influential—but very NARROW scope 
2. “[CAP] has lead to confusion and misunderstandings 
regarding replica consistency, transactional isolation and 
high availability” - Bailis et al. in HAT paper 
3. Linearizability is very often NOT required 
4. Ignores LATENCY—but in practice latency & partitions are 
deeply related 
5. Partitions are RARE—so why sacrifice C or A ALL the time? 
6. NOT black and white—can be fine-grained and dynamic
dissecting CAP 
1. Very influential—but very NARROW scope 
2. “[CAP] has lead to confusion and misunderstandings 
regarding replica consistency, transactional isolation and 
high availability” - Bailis et.al in HAT paper 
3. Linearizability is very often NOT required 
4. Ignores LATENCY—but in practice latency & partitions are 
deeply related 
5. Partitions are RARE—so why sacrifice C or A ALL the time? 
6. NOT black and white—can be fine-grained and dynamic 
7. Read ‘CAP Twelve Years Later’ - Eric Brewer
consensus 
“The problem of reaching agreement 
among remote processes is one of the 
most fundamental problems in distributed 
computing and is at the core of many 
algorithms for distributed data processing, 
distributed file management, and fault-tolerant 
distributed applications.” 
Fischer, Lynch & Paterson 1985
Consistency models 
Strong 
Weak 
Eventual
Time & 
Order
Last write wins 
global clock 
timestamp
lamport clocks 
logical clock 
causal consistency 
Leslie Lamport 1978 
1. When a process does work, increment the counter 
2. When a process sends a message, include the counter 
3. When a message is received, merge the counter 
(set the counter to max(local, received) + 1)
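As a rough Scala sketch of these three rules (illustration only; the class name and API here are invented for this example and are not Akka’s implementation): 

final case class LamportClock(counter: Long = 0L) { 
  // Rule 1: local work increments the counter 
  def tick: LamportClock = copy(counter = counter + 1) 
  // Rule 2: a sender includes `counter` in the outgoing message 
  // Rule 3: a receiver merges the received counter with its own 
  def merge(received: Long): LamportClock = 
    copy(counter = math.max(counter, received) + 1) 
} 

Lamport clocks give a causal (partial) ordering of events, not global time.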
vector clocks 
Extends 
lamport clocks 
Colin Fidge 1988 
1. Each node owns and increments its own Lamport Clock 
[node -> lamport clock] 
2. Always keep the full history of all increments 
3. Merge by calculating the max—monotonic merge
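A minimal vector clock sketch along the same lines (illustrative; not the VectorClock class that Akka Cluster actually uses): 

final case class VClock(entries: Map[String, Long] = Map.empty) { 
  // 1. Each node increments only its own Lamport clock 
  def tick(node: String): VClock = 
    copy(entries.updated(node, entries.getOrElse(node, 0L) + 1)) 
  // 2. The map itself is the full history of all increments 
  // 3. Monotonic merge: entry-wise max over the union of nodes 
  def merge(that: VClock): VClock = 
    VClock((entries.keySet ++ that.entries.keySet).map { n => 
      n -> math.max(entries.getOrElse(n, 0L), that.entries.getOrElse(n, 0L)) 
    }.toMap) 
  // Happened-before: pointwise <= with at least one strict < 
  def before(that: VClock): Boolean = { 
    val nodes = entries.keySet ++ that.entries.keySet 
    val pairs = nodes.toSeq.map(n => 
      (entries.getOrElse(n, 0L), that.entries.getOrElse(n, 0L))) 
    pairs.forall { case (a, b) => a <= b } && pairs.exists { case (a, b) => a < b } 
  } 
} 

If neither clock is before the other, the two updates are concurrent and need conflict resolution.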
Quorum 
Strict majority vote 
Sloppy partial vote 
• Most use R + W > N ⇒ R & W overlap 
• If N / 2 + 1 is still alive ⇒ all good 
• Most use N = 3
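The R + W > N rule in concrete numbers (REPL-style sketch; R = replicas read, W = replicas written, N = replication factor): 

def quorumOverlap(r: Int, w: Int, n: Int): Boolean = r + w > n 

quorumOverlap(r = 2, w = 2, n = 3) // true: every read quorum intersects every write quorum 
quorumOverlap(r = 1, w = 1, n = 3) // false: a read may miss the latest write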
failure 
Detection
Failure detection 
Formal model 
Strong completeness (Everyone knows) 
Every crashed process is eventually suspected by every correct process 
Weak completeness (Someone knows) 
Every crashed process is eventually suspected by some correct process 
Strong accuracy (No false positives) 
No correct process is suspected ever 
Weak accuracy (Some false positives) 
Some correct process is never suspected
Accrual Failure detector 
Hayashibara et. al. 2004 
Not YES or NO 
Keeps history of 
heartbeat statistics 
Decouples monitoring 
from interpretation 
Calculates a likelihood 
(phi value) 
that the process is down 
Takes network hiccups into account 
phi = -log10(1 - F(timeSinceLastHeartbeat)) 
F is the cumulative distribution function of a normal distribution with mean 
and standard deviation estimated from historical heartbeat inter-arrival times
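A back-of-the-envelope Scala sketch of that formula (the class here is invented for illustration and fits a plain normal CDF; Akka’s actual detector, akka.remote.PhiAccrualFailureDetector, differs in details): 

final class PhiAccrualSketch(interArrivalsMillis: Seq[Double]) { 
  private val mean = interArrivalsMillis.sum / interArrivalsMillis.size 
  private val stdDev = math.sqrt( 
    interArrivalsMillis.map(x => (x - mean) * (x - mean)).sum / interArrivalsMillis.size) 

  // Abramowitz & Stegun approximation of the error function 
  private def erf(z: Double): Double = { 
    val t = 1.0 / (1.0 + 0.3275911 * math.abs(z)) 
    val poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 + 
      t * (-1.453152027 + t * 1.061405429)))) 
    val e = 1.0 - poly * math.exp(-z * z) 
    if (z >= 0) e else -e 
  } 

  // F: CDF of a normal distribution fitted to the heartbeat history 
  private def cdf(x: Double): Double = 
    0.5 * (1.0 + erf((x - mean) / (stdDev * math.sqrt(2.0)))) 

  // phi = -log10(1 - F(timeSinceLastHeartbeat)) 
  def phi(timeSinceLastHeartbeatMillis: Double): Double = 
    -math.log10(1.0 - cdf(timeSinceLastHeartbeatMillis)) 
} 

The longer a heartbeat is overdue relative to the observed history, the higher phi climbs; the application picks the threshold at which it starts suspecting the node.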
SWIM Failure detector 
Das et al. 2002 
Separates heartbeats from cluster dissemination 
Quarantine: suspected ⇒ time window ⇒ faulty 
Delegated heartbeat to bridge network splits
byzantine Failure detector 
Liskov et al. 1999 
Supports 
misbehaving 
processes 
Omission failures 
Crash failures, failing to receive a request, 
or failing to send a response 
Commission failures 
Processing a request incorrectly, corrupting 
local state, and/or sending an incorrect or 
inconsistent response to a request 
Very expensive, not practical
replication
Types of replication 
Active (Push) vs Passive (Pull) 
Asynchronous vs Synchronous
master/slave Replication
Tree replication
master/master Replication
buddy Replication
analysis of 
replication 
consensus 
strategies 
Ryan Barrett 2009
Strong 
Consistency
Distributed 
transactions 
Strike Back
Highly Available Transactions 
Peter Bailis et. al. 2013 
Executive Summary 
• Most SQL DBs do not provide 
Serializability, but weaker guarantees— 
for performance reasons 
• Some weaker transaction guarantees are 
possible to implement in a HA manner 
• What transaction semantics can be 
provided with HA? 
HAT NOT 
CAP
HAT 
UnAvailable 
• Serializable 
• Snapshot Isolation 
• Repeatable Read 
• Cursor Stability 
• etc. 
Highly Available 
• Read Committed 
• Read Uncommitted 
• Read Your Writes 
• Monotonic Atomic View 
• Monotonic Read/Write 
• etc.
Other scalable or 
Highly Available 
Transactional Research 
Bolt-On Consistency Bailis et al. 2013 
Calvin Thomson et al. 2012 
Spanner (Google) Corbett et al. 2012
consensus 
Protocols
Events 
1. Request(v) 
2. Decide(v) 
Specification 
Properties 
1. Termination: every process eventually decides on a value v 
2. Validity: if a process decides v, then v was proposed by 
some process 
3. Integrity: no process decides twice 
4. Agreement: no two correct processes decide differently
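The events and properties above amount to an interface roughly like this (a hypothetical sketch, not any particular library’s API; the four properties constrain implementations, not the signature): 

trait Consensus[V] { 
  // Request(v): propose a value 
  def request(v: V): Unit 
  // Decide(v): invoked exactly once per process (Integrity), 
  // with the same proposed value everywhere (Agreement, Validity) 
  def onDecide(callback: V => Unit): Unit 
}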
Consensus Algorithms 
VR Oki & Liskov 1988 
Paxos Lamport 1989 
ZAB Reed & Junqueira 2008 
Raft Ongaro & Ousterhout 2013
Event 
Log
Immutability 
“Immutability Changes Everything” - Pat Helland 
Immutable Data 
Share Nothing Architecture 
Is the path towards 
TRUE Scalability
Think In Facts 
"The database is a cache of a subset of the log” - Pat Helland 
Never delete data 
Knowledge only grows 
Append-Only Event Log 
Use Event Sourcing and/or CQRS
Aggregate Roots 
Can wrap multiple Entities 
Aggregate Root is the Transactional Boundary 
Strong Consistency Within Aggregate 
Eventual Consistency Between Aggregates 
No limit to scalability
eventual 
Consistency
Dynamo 
Vogels et al. 2007 
Very influential 
Popularized 
• Eventual consistency 
• Epidemic gossip 
• Consistent hashing 
• Hinted handoff 
• Read repair 
• Anti-Entropy w/ Merkle trees
Consistent Hashing 
Karger et al. 1997 
Supports elasticity—easier to scale up and down 
Avoids hotspots 
Enables partitioning and replication 
Only K/N keys need to be remapped when adding or removing a node (K=#keys, N=#nodes)
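A toy hash ring in Scala to make the K/N property concrete (illustration only; production rings, Dynamo included, add virtual nodes per physical node to smooth the key distribution): 

import scala.collection.immutable.TreeMap 

final class HashRing(nodes: Set[String]) { 
  private def h(s: String): Int = s.hashCode & Int.MaxValue // non-negative hash 
  private val ring: TreeMap[Int, String] = 
    TreeMap(nodes.toSeq.map(n => h(n) -> n): _*) 

  // Walk clockwise: first node at or after the key's position, wrapping around 
  def nodeFor(key: String): String = { 
    val it = ring.iteratorFrom(h(key)) 
    if (it.hasNext) it.next()._2 else ring.head._2 
  } 
} 

Adding a node inserts one point on the ring, so only the keys between the new node and its predecessor move; all other keys keep hashing to the same node.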
How eventual is Eventual consistency? 
How consistent is Eventual consistency? 
PBS 
Probabilistically Bounded Staleness 
Peter Bailis et al. 2012
epidemic 
Gossip
Node ring & Epidemic Gossip 
CHORD 
Stoica et al. 2001 
[Diagram: a ring of member nodes, with gossip spreading between random peers]
Benefits of Epidemic Gossip 
Decentralized P2P 
No SPOF or SPOB 
Very Scalable 
Fully Elastic 
Requires minimal administration 
Often used with VECTOR CLOCKS
Some Standard Optimizations to Epidemic Gossip 
1. Separation of failure detection heartbeat and dissemination of data - Das et al. 2002 (SWIM) 
2. Push/Pull gossip - Khambatti et al. 2003 
a. Hash and compare data 
b. Use single hash or Merkle Trees
disorderly 
Programming
ACID 2.0 
Associative 
Batch-insensitive 
(grouping doesn't matter) 
a+(b+c)=(a+b)+c 
Commutative 
Order-insensitive 
(order doesn't matter) 
a+b=b+a 
Idempotent 
Retransmission-insensitive 
(duplication does not matter) 
a+a=a 
Eventually Consistent
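Set union satisfies all three laws, which makes grow-only sets the simplest example of this style. A quick REPL-style check: 

val a = Set(1); val b = Set(2); val c = Set(3) 
(a ++ (b ++ c)) == ((a ++ b) ++ c) // true: Associative 
(a ++ b) == (b ++ a)               // true: Commutative 
(a ++ a) == a                      // true: Idempotent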
Convergent & Commutative Replicated Data Types 
Shapiro et al. 2011 
CRDT 
Join Semilattice 
Monotonic merge function 
Data types 
Counters 
Registers 
Sets 
Maps 
Graphs
2 TYPES of CRDTs 
CvRDT 
Convergent 
State-based 
Self-contained, holds all history 
CmRDT 
Commutative 
Ops-based 
Needs a reliable broadcast channel
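A minimal CvRDT sketch: a grow-only counter (G-Counter) whose merge is the entry-wise max, i.e. the join of the semilattice (illustrative only; Akka’s experimental work in this area is the akka-crdt project listed in the references): 

final case class GCounter(state: Map[String, Long] = Map.empty) { 
  // Each replica increments only its own slot 
  def increment(node: String): GCounter = 
    copy(state.updated(node, state.getOrElse(node, 0L) + 1)) 
  // The counter's value is the sum over all replicas 
  def value: Long = state.values.sum 
  // Join: entry-wise max (associative, commutative, idempotent) 
  def merge(that: GCounter): GCounter = 
    GCounter((state.keySet ++ that.state.keySet).map { n => 
      n -> math.max(state.getOrElse(n, 0L), that.state.getOrElse(n, 0L)) 
    }.toMap) 
} 

Because merge is a semilattice join, replicas can exchange their full state in any order, any number of times, and still converge on the same value.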
CALM theorem 
Hellerstein et. al. 2011 
Consistency As Logical Monotonicity 
Bloom Language 
Distributed Logic 
Datalog/Dedalus 
Monotonic functions 
Just add facts to the system 
Model state as Lattices 
Similar to CRDTs (without the scope problem) 
Compiler help to detect & 
encapsulate non-monotonicity
The Akka Way
Akka CLUSTER EXTENSIONS 
Akka CLUSTER 
Akka REMOTE 
Akka IO 
Akka Actors
What is Akka CLUSTER all about? 
• Cluster Membership 
• Leader & Singleton 
• Cluster Sharding 
• Clustered Routers (adaptive, consistent hashing, …) 
• Clustered Supervision and Deathwatch 
• Clustered Pub/Sub 
• and more
cluster membership in Akka 
• Dynamo-style master-less decentralized P2P 
• Epidemic Gossip—Node Ring 
• Vector Clocks for causal consistency 
• Fully elastic with no SPOF or SPOB 
• Very scalable—2400 nodes (on GCE) 
• High throughput—1000 nodes in 4 min (on GCE)
Gossip State 
Is a CRDT 
case class Gossip( 
  members: SortedSet[Member],  // Ordered node ring 
  seen: Set[Member],           // Seen set for convergence 
  unreachable: Set[Member],    // Unreachable set 
  version: VectorClock)        // Version 
GOSSIPING 
1. Picks random node with older/newer version 
2. Gossips in a request/reply fashion 
3. Updates internal state and adds itself to the ‘seen’ set
Cluster 
Convergence 
Reached when: 
1. All nodes are represented in the seen set 
2. No members are unreachable, or 
3. All unreachable members have status down or exiting
BIASED 
GOSSIP 
80% bias to nodes not in seen table 
Up to 400 nodes, then reduced
PUSH/PULL 
GOSSIP 
Variation 
case class Status(version: VectorClock)
LEADER 
ROLE 
Any node can 
be the leader 
1. No election, but deterministic 
2. Can change after cluster convergence 
3. Leader has special duties
Node Lifecycle in Akka
Failure Detection 
Hashes the node ring 
Picks 5 nodes (to increase likelihood of bridging racks and data centers) 
Request/Reply heartbeat 
Used by: 
Cluster Membership 
Remote Death Watch 
Remote Supervision
Failure Detection 
Is an Accrual Failure Detector 
Does not help much in practice 
Need to add delay to deal with Garbage Collection 
[Charts: the expected heartbeat inter-arrival distribution (“Instead of this”) vs. the long-tailed one seen in practice (“It often looks like this”)]
Network Partitions (Split Brain) 
• Failure Detector can mark an unavailable member Unreachable 
• If one node is Unreachable then no cluster Convergence 
• This means that the Leader can no longer perform its duties 
• Member can come back from Unreachable—Else: 
• The node needs to be marked as Down—either through: 
1. auto-down 
2. Manual down
Potential FUTURE Optimizations 
• Vector Clock HISTORY pruning 
• Delegated heartbeat 
• “Real” push/pull gossip 
• More out-of-the-box auto-down patterns
Akka Modules For Distribution 
Akka Cluster 
Akka Remote 
Akka HTTP 
Akka IO 
Clustered Singleton 
Clustered Routers 
Clustered Pub/Sub 
Cluster Client 
Consistent Hashing
…and 
Beyond
Akka & The Road Ahead 
Akka HTTP (Akka 2.4) 
Akka Streams (Akka 2.4) 
Akka CRDT (?) 
Akka Raft (?)
Eager 
for more?
Try AKKA out 
akka.io
Early Registration 
ends tomorrow 
Join us at 
React Conf 
San Francisco 
Nov 18-21 
reactconf.com
References 
• General Distributed Systems 
• Summary of network reliability post-mortems—more terrifying than the most horrifying Stephen King novel: http://aphyr.com/posts/288-the-network-is-reliable 
• A Note on Distributed Computing: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.7628 
• On the problems with RPC: http://steve.vinoski.net/pdf/IEEE-Convenience_Over_Correctness.pdf 
• 8 Fallacies of Distributed Computing: https://blogs.oracle.com/jag/resource/Fallacies.html 
• 6 Misconceptions of Distributed Computing: www.dsg.cs.tcd.ie/~vjcahill/sigops98/papers/vogels.ps 
• Distributed Computing Systems—A Foundational Approach: http://www.amazon.com/Programming-Distributed-Computing-Systems-Foundational/dp/0262018985 
• Introduction to Reliable and Secure Distributed Programming: http://www.distributedprogramming.net/ 
• Nice short overview on Distributed Systems: http://book.mixu.net/distsys/ 
• Meta list of distributed systems readings: https://gist.github.com/macintux/6227368
References 
• Actor Model 
• Great discussion between Erik Meijer & Carl Hewitt on the essence of the Actor Model: http://channel9.msdn.com/Shows/Going+Deep/Hewitt-Meijer-and-Szyperski-The-Actor-Model-everything-you-wanted-to-know-but-were-afraid-to-ask 
• Carl Hewitt’s 1973 paper defining the Actor Model: http://worrydream.com/refs/Hewitt-ActorModel.pdf 
• Gul Agha’s Doctoral Dissertation: https://dspace.mit.edu/handle/1721.1/6952
References 
• FLP 
• Impossibility of Distributed Consensus with One Faulty Process: http://cs-www.cs.yale.edu/homes/arvind/cs425/doc/fischer.pdf 
• A Brief Tour of FLP: http://the-paper-trail.org/blog/a-brief-tour-of-flp-impossibility/ 
• CAP 
• Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services: http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf 
• You Can’t Sacrifice Partition Tolerance: http://codahale.com/you-cant-sacrifice-partition-tolerance/ 
• Linearizability: A Correctness Condition for Concurrent Objects: http://courses.cs.vt.edu/~cs5204/fall07-kafura/Papers/TransactionalMemory/Linearizability.pdf 
• CAP Twelve Years Later: How the "Rules" Have Changed: http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed 
• Consistency vs. Availability: http://www.infoq.com/news/2008/01/consistency-vs-availability
References 
• Time & Order 
• Post on the problems with Last Write Wins in Riak: http://aphyr.com/posts/285-call-me-maybe-riak 
• Time, Clocks, and the Ordering of Events in a Distributed System: http://research.microsoft.com/en-us/um/people/lamport/pubs/time-clocks.pdf 
• Vector Clocks: http://zoo.cs.yale.edu/classes/cs426/2012/lab/bib/fidge88timestamps.pdf 
• Failure Detection 
• Unreliable Failure Detectors for Reliable Distributed Systems: http://www.cs.utexas.edu/~lorenzo/corsi/cs380d/papers/p225-chandra.pdf 
• The ϕ Accrual Failure Detector: http://ddg.jaist.ac.jp/pub/HDY+04.pdf 
• SWIM Failure Detector: http://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf 
• Practical Byzantine Fault Tolerance: http://www.pmg.lcs.mit.edu/papers/osdi99.pdf
References 
• Transactions 
• Jim Gray’s classic book: http://www.amazon.com/Transaction-Processing-Concepts-Techniques-Management/dp/1558601902 
• Highly Available Transactions: Virtues and Limitations: http://www.bailis.org/papers/hat-vldb2014.pdf 
• Bolt on Consistency: http://db.cs.berkeley.edu/papers/sigmod13-bolton.pdf 
• Calvin: Fast Distributed Transactions for Partitioned Database Systems: http://cs.yale.edu/homes/thomson/publications/calvin-sigmod12.pdf 
• Spanner: Google's Globally-Distributed Database: http://research.google.com/archive/spanner.html 
• Life beyond Distributed Transactions: an Apostate’s Opinion: https://cs.brown.edu/courses/cs227/archives/2012/papers/weaker/cidr07p15.pdf 
• Immutability Changes Everything—Pat Helland’s talk at Ricon: http://vimeo.com/52831373 
• Unshackle Your Domain (Event Sourcing): http://www.infoq.com/presentations/greg-young-unshackle-qcon08 
• CQRS: http://martinfowler.com/bliki/CQRS.html
References 
• Consensus 
• Paxos Made Simple: http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf 
• Paxos Made Moderately Complex: http://www.cs.cornell.edu/courses/cs7412/2011sp/paxos.pdf 
• A simple totally ordered broadcast protocol (ZAB): labs.yahoo.com/files/ladis08.pdf 
• In Search of an Understandable Consensus Algorithm (Raft): https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf 
• Replication strategy comparison diagram: http://snarfed.org/transactions_across_datacenters_io.html 
• Distributed Snapshots: Determining Global States of Distributed Systems: http://www.cs.swarthmore.edu/~newhall/readings/snapshots.pdf
References 
• Eventual Consistency 
• Dynamo: Amazon’s Highly Available Key-value Store: http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf 
• Consistency vs. Availability: http://www.infoq.com/news/2008/01/consistency-vs-availability 
• Consistent Hashing and Random Trees: http://thor.cs.ucsb.edu/~ravenben/papers/coreos/kll+97.pdf 
• PBS: Probabilistically Bounded Staleness: http://pbs.cs.berkeley.edu/
References 
• Epidemic Gossip 
• Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications: http://pdos.csail.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf 
• Gossip-style Failure Detector: http://www.cs.cornell.edu/home/rvr/papers/GossipFD.pdf 
• GEMS: http://www.hcs.ufl.edu/pubs/GEMS2005.pdf 
• Efficient Reconciliation and Flow Control for Anti-Entropy Protocols: http://www.cs.cornell.edu/home/rvr/papers/flowgossip.pdf 
• 2400 Akka nodes on GCE: http://typesafe.com/blog/running-a-2400-akka-nodes-cluster-on-google-compute-engine 
• Starting 1000 Akka nodes in 4 min: http://typesafe.com/blog/starting-up-a-1000-node-akka-cluster-in-4-minutes-on-google-compute-engine 
• Push Pull Gossiping: http://khambatti.com/mujtaba/ArticlesAndPapers/pdpta03.pdf 
• SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol: http://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf
References 
• Conflict-Free Replicated Data Types (CRDTs) 
• A comprehensive study of Convergent and Commutative Replicated Data Types: http://hal.upmc.fr/docs/00/55/55/88/PDF/techreport.pdf 
• Mark Shapiro talks about CRDTs at Microsoft: http://research.microsoft.com/apps/video/dl.aspx?id=153540 
• Akka CRDT project: https://github.com/jboner/akka-crdt 
• CALM 
• Dedalus: Datalog in Time and Space: http://db.cs.berkeley.edu/papers/datalog2011-dedalus.pdf 
• CALM: http://www.cs.berkeley.edu/~palvaro/cidr11.pdf 
• Logic and Lattices for Distributed Programming: http://db.cs.berkeley.edu/papers/UCB-lattice-tr.pdf 
• Bloom Language website: http://bloom-lang.net 
• Joe Hellerstein talks about CALM: http://vimeo.com/53904989
References 
• Akka Cluster 
• My Akka Cluster Implementation Notes: https://gist.github.com/jboner/7692270 
• Akka Cluster Specification: http://doc.akka.io/docs/akka/snapshot/common/cluster.html 
• Akka Cluster Docs: http://doc.akka.io/docs/akka/snapshot/scala/cluster-usage.html 
• Akka Failure Detector Docs: http://doc.akka.io/docs/akka/snapshot/scala/remoting.html#Failure_Detector 
• Akka Roadmap: https://docs.google.com/a/typesafe.com/document/d/18W9-fKs55wiFNjXL9q50PYOnR7-nnsImzJqHOPPbM4E/mobilebasic?pli=1&hl=en_US 
• Where Akka Came From: http://letitcrash.com/post/40599293211/where-akka-came-from
any Questions?
The Road 
to Akka Cluster 
and Beyond… 
Jonas Bonér 
CTO Typesafe 
@jboner

Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 

The Road to Akka Cluster and Beyond

  • 1. The Road to Akka Cluster and Beyond… Jonas Bonér CTO Typesafe @jboner
  • 2. What is a Distributed System?
  • 3. What is a Distributed System? and Why would You Need one?
  • 4. Distributed Computing is the New normal
  • 5. Distributed Computing is the New normal you already have a distributed system, WHETHER you want it or not
  • 6. Distributed Computing is the New normal you already have a distributed system, WHETHER you want it or not Mobile NOSQL Databases SQL Replication Cloud & REST Services
  • 7. What is the essence of distributed computing?
  • 8. What is the essence of distributed computing? It’s to try to overcome 1. Information travels at the speed of light 2. Independent things fail independently
  • 9. Why do we need it?
  • 10. Why do we need it? Elasticity When you outgrow the resources of a single node
  • 11. Why do we need it? Elasticity When you outgrow the resources of a single node Availability Providing resilience if one node fails
  • 12. Why do we need it? Elasticity When you outgrow the resources of a single node Availability Providing resilience if one node fails Rich stateful clients
  • 13. So, what’s the problem?
  • 14. So, what’s the problem? It is still Very Hard
  • 15. The network is Inherently Unreliable
  • 16. You can’t tell the DIFFERENCE Between a Slow NODE and a Dead NODE
  • 17. Fallacies Peter Deutsch’s 8 Fallacies of Distributed Computing
  • 18. Fallacies 1. The network is reliable 2. Latency is zero 3. Bandwidth is infinite 4. The network is secure 5. Topology doesn't change 6. There is one administrator 7. Transport cost is zero 8. The network is homogeneous Peter Deutsch’s 8 Fallacies of Distributed Computing
  • 20. So, oh yes… It is still Very Hard
  • 21. Graveyard of distributed systems 1. Guaranteed Delivery 2. Synchronous RPC 3. Distributed Objects 4. Distributed Shared Mutable State 5. Serializable Distributed Transactions
  • 22. General strategies Divide & Conquer Partition for scale Replicate for resilience
  • 23. General strategies WHICH Requires SHARE NOTHING Designs Asynchronous Message-Passing
  • 24. General strategies WHICH Requires SHARE NOTHING Designs Asynchronous Message-Passing Location Transparency ISolation & Containment
  • 26. A model for distributed Computation Should Allow explicit reasoning about 1. Concurrency 2. Distribution 3. Mobility Carlos Varela 2013
  • 27.
  • 28. Lambda Calculus Alonzo Church 1930
  • 29. Lambda Calculus Alonzo Church 1930 state Immutable state Managed through functional application Referentially transparent
  • 30. Lambda Calculus Alonzo Church 1930 order β-reduction—can be performed in any order Normal order Applicative order Call-by-name order Call-by-value order Call-by-need order state Immutable state Managed through functional application Referentially transparent
  • 31. Lambda Calculus Alonzo Church 1930 order β-reduction—can be performed in any order, even in parallel Normal order Applicative order Call-by-name order Call-by-value order Call-by-need order state Immutable state Managed through functional application Referentially transparent
  • 32. Lambda Calculus Alonzo Church 1930 order β-reduction—can be performed in any order, even in parallel Normal order Applicative order Call-by-name order Call-by-value order Call-by-need order state Immutable state Managed through functional application Referentially transparent Supports Concurrency
  • 33. Lambda Calculus Alonzo Church 1930 order β-reduction—can be performed in any order, even in parallel Normal order Applicative order Call-by-name order Call-by-value order Call-by-need order state Immutable state Managed through functional application Referentially transparent Supports Concurrency No model for Distribution
  • 34. Lambda Calculus Alonzo Church 1930 order β-reduction—can be performed in any order, even in parallel Normal order Applicative order Call-by-name order Call-by-value order Call-by-need order state Immutable state Managed through functional application Referentially transparent Supports Concurrency No model for Distribution No model for Mobility
  • 35.
  • 36. Von neumann machine John von Neumann 1945 Memory Control Unit Arithmetic Logic Unit Accumulator Input Output
  • 37. Von neumann machine John von Neumann 1945
  • 38. Von neumann machine John von Neumann 1945 state Mutable state In-place updates
  • 39. Von neumann machine John von Neumann 1945 order Total order List of instructions Array of memory state Mutable state In-place updates
  • 40. Von neumann machine John von Neumann 1945 order Total order List of instructions Array of memory state Mutable state In-place updates No model for Concurrency
  • 41. Von neumann machine John von Neumann 1945 order Total order List of instructions Array of memory state Mutable state In-place updates No model for Concurrency No model for Distribution
  • 42. Von neumann machine John von Neumann 1945 order Total order List of instructions Array of memory state Mutable state In-place updates No model for Concurrency No model for Distribution No model for Mobility
  • 43.
  • 45. transactions Jim Gray 1981 state Isolation of updates Atomicity
  • 46. transactions Jim Gray 1981 order Serializability Disorder across transactions Illusion of order within transactions state Isolation of updates Atomicity
  • 47. transactions Jim Gray 1981 order Serializability Disorder across transactions Illusion of order within transactions state Isolation of updates Atomicity Concurrency Works Well
  • 48. transactions Jim Gray 1981 order Serializability Disorder across transactions Illusion of order within transactions state Isolation of updates Atomicity Concurrency Works Well Distribution Does Not Work Well
  • 49.
  • 51. actors Carl HEWITT 1973 state Share nothing Atomicity within the actor
  • 52. actors Carl HEWITT 1973 order Async message passing Non-determinism in message delivery state Share nothing Atomicity within the actor
  • 53. actors Carl HEWITT 1973 order Async message passing Non-determinism in message delivery state Share nothing Atomicity within the actor Great model for Concurrency
  • 54. actors Carl HEWITT 1973 order Async message passing Non-determinism in message delivery state Share nothing Atomicity within the actor Great model for Concurrency Great model for Distribution
  • 55. actors Carl HEWITT 1973 order Async message passing Non-determinism in message delivery state Share nothing Atomicity within the actor Great model for Concurrency Great model for Distribution Great model for Mobility
  • 56. other interesting models That are suitable for distributed systems 1. Pi Calculus 2. Ambient Calculus 3. Join Calculus
  • 57. state of the The Art
  • 59. Impossibility of Distributed Consensus with One Faulty Process
  • 60. Impossibility of Distributed Consensus with One Faulty Process FLP Fischer Lynch Paterson 1985
  • 61. Impossibility of Distributed Consensus with One Faulty Process FLP Fischer Lynch Paterson 1985 Consensus is impossible
  • 62. Impossibility of Distributed Consensus with One Faulty Process FLP “The FLP result shows that in an asynchronous setting, where only one processor might crash, there is no distributed algorithm that solves the consensus problem” - The Paper Trail Fischer Lynch Paterson 1985 Consensus is impossible
  • 63. Impossibility of Distributed Consensus with One Faulty Process FLP Fischer Lynch Paterson 1985
  • 64. Impossibility of Distributed Consensus with One Faulty Process FLP “These results do not show that such problems cannot be “solved” in practice; rather, they point up the need for more refined models of distributed computing” - FLP paper Fischer Lynch Paterson 1985
  • 65.
  • 68. Conjecture by Eric Brewer 2000 Proof by Lynch & Gilbert 2002 Linearizability is impossible CAP Theorem
  • 69. Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services Conjecture by Eric Brewer 2000 Proof by Lynch & Gilbert 2002 Linearizability is impossible CAP Theorem
  • 71. linearizability “Under linearizable consistency, all operations appear to have executed atomically in an order that is consistent with the global real-time ordering of operations.” Herlihy & Wing 1991
  • 72. linearizability “Under linearizable consistency, all operations appear to have executed atomically in an order that is consistent with the global real-time ordering of operations.” Herlihy & Wing 1991 Less formally: A read will return the last completed write (made on any replica)
  • 74. dissecting CAP 1. Very influential—but very NARROW scope
  • 75. dissecting CAP 1. Very influential—but very NARROW scope 2. “[CAP] has led to confusion and misunderstandings regarding replica consistency, transactional isolation and high availability” - Bailis et al. in HAT paper
  • 76. dissecting CAP 1. Very influential—but very NARROW scope 2. “[CAP] has led to confusion and misunderstandings regarding replica consistency, transactional isolation and high availability” - Bailis et al. in HAT paper 3. Linearizability is very often NOT required
  • 77. dissecting CAP 1. Very influential—but very NARROW scope 2. “[CAP] has led to confusion and misunderstandings regarding replica consistency, transactional isolation and high availability” - Bailis et al. in HAT paper 3. Linearizability is very often NOT required 4. Ignores LATENCY—but in practice latency & partitions are deeply related
  • 78. dissecting CAP 1. Very influential—but very NARROW scope 2. “[CAP] has led to confusion and misunderstandings regarding replica consistency, transactional isolation and high availability” - Bailis et al. in HAT paper 3. Linearizability is very often NOT required 4. Ignores LATENCY—but in practice latency & partitions are deeply related 5. Partitions are RARE—so why sacrifice C or A ALL the time?
  • 79. dissecting CAP 1. Very influential—but very NARROW scope 2. “[CAP] has led to confusion and misunderstandings regarding replica consistency, transactional isolation and high availability” - Bailis et al. in HAT paper 3. Linearizability is very often NOT required 4. Ignores LATENCY—but in practice latency & partitions are deeply related 5. Partitions are RARE—so why sacrifice C or A ALL the time? 6. NOT black and white—can be fine-grained and dynamic
  • 80. dissecting CAP 1. Very influential—but very NARROW scope 2. “[CAP] has led to confusion and misunderstandings regarding replica consistency, transactional isolation and high availability” - Bailis et al. in HAT paper 3. Linearizability is very often NOT required 4. Ignores LATENCY—but in practice latency & partitions are deeply related 5. Partitions are RARE—so why sacrifice C or A ALL the time? 6. NOT black and white—can be fine-grained and dynamic 7. Read ‘CAP Twelve Years Later’ - Eric Brewer
  • 82. consensus “The problem of reaching agreement among remote processes is one of the most fundamental problems in distributed computing and is at the core of many algorithms for distributed data processing, distributed file management, and fault-tolerant distributed applications.” Fischer, Lynch & Paterson 1985
  • 86. Consistency models Strong Weak Eventual
  • 88. Last write wins global clock timestamp
  • 89. Last write wins global clock timestamp
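A last-write-wins register is the simplest use of such a timestamp: on merge, keep the value with the higher clock. A minimal Scala sketch (illustrative, not tied to any particular store):

  // Last-write-wins register sketch: each write carries a wall-clock
  // timestamp and merge keeps the later one. Simple, but it silently
  // drops concurrent writes and trusts wall clocks.
  final case class LWWRegister[A](value: A, timestamp: Long) {
    def merge(that: LWWRegister[A]): LWWRegister[A] =
      if (that.timestamp > timestamp) that else this
  }

  object LWWDemo extends App {
    val r1 = LWWRegister("red", timestamp = 1000L)
    val r2 = LWWRegister("blue", timestamp = 1005L)
    println(r1.merge(r2).value) // blue wins on wall-clock time alone
  }

The hazard is visible in the demo: the later wall-clock time wins even if that write was not causally later, which is what the logical clocks below address.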
  • 90. lamport clocks logical clock causal consistency Leslie lamport 1978
  • 91. lamport clocks logical clock causal consistency Leslie lamport 1978 1. When a process does work, increment the counter
  • 92. lamport clocks logical clock causal consistency Leslie lamport 1978 1. When a process does work, increment the counter 2. When a process sends a message, include the counter
  • 93. lamport clocks logical clock causal consistency Leslie lamport 1978 1. When a process does work, increment the counter 2. When a process sends a message, include the counter 3. When a message is received, merge the counter (set the counter to max(local, received) + 1)
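The three rules above fit in a few lines. A minimal sketch in Scala (illustrative, not Akka's implementation):

  // Lamport clock sketch following the three rules on the slides.
  final class LamportClock {
    private var counter: Long = 0L
    def tick(): Long = { counter += 1; counter }  // 1. local work
    def send(): Long = tick()                     // 2. stamp outgoing message
    def receive(received: Long): Long = {         // 3. merge on receive
      counter = math.max(counter, received) + 1
      counter
    }
    def time: Long = counter
  }

  object LamportDemo extends App {
    val a = new LamportClock; val b = new LamportClock
    a.tick()             // a works: a = 1
    val stamp = a.send() // a sends: a = 2, message carries 2
    b.receive(stamp)     // b merges: max(0, 2) + 1 = 3
    println(s"a=${a.time}, b=${b.time}") // a=2, b=3
  }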
  • 94. vector clocks Extends lamport clocks colin fidge 1988
  • 95. vector clocks Extends lamport clocks colin fidge 1988 1. Each node owns and increments its own Lamport Clock
  • 96. vector clocks Extends lamport clocks colin fidge 1988 1. Each node owns and increments its own Lamport Clock [node -> lamport clock]
  • 97. vector clocks Extends lamport clocks colin fidge 1988 1. Each node owns and increments its own Lamport Clock [node -> lamport clock]
  • 98. vector clocks Extends lamport clocks colin fidge 1988 1. Each node owns and increments its own Lamport Clock [node -> lamport clock] 2. Always keep the full history of all increments
  • 99. vector clocks Extends lamport clocks colin fidge 1988 1. Each node owns and increments its own Lamport Clock [node -> lamport clock] 2. Always keep the full history of all increments 3. Merges by calculating the max—monotonic merge
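In Scala the above is a map of node to counter with a pointwise-max merge; two clocks where neither happened before the other are concurrent, which is how conflicts are detected. A minimal sketch (illustrative only):

  final case class VClock(entries: Map[String, Long] = Map.empty) {
    // The owning node increments its own counter.
    def increment(node: String): VClock =
      VClock(entries.updated(node, entries.getOrElse(node, 0L) + 1))

    // Monotonic merge: pointwise maximum of the two histories.
    def merge(that: VClock): VClock =
      VClock((entries.keySet ++ that.entries.keySet).iterator.map { n =>
        n -> math.max(entries.getOrElse(n, 0L), that.entries.getOrElse(n, 0L))
      }.toMap)

    // this happened before that iff no entry is ahead and the clocks differ.
    def happenedBefore(that: VClock): Boolean =
      entries.forall { case (n, c) => c <= that.entries.getOrElse(n, 0L) } && this != that

    def concurrentWith(that: VClock): Boolean =
      !happenedBefore(that) && !that.happenedBefore(this) && this != that
  }

  object VClockDemo extends App {
    val a = VClock().increment("A") // a write on node A
    val b = VClock().increment("B") // an independent write on node B
    println(a.concurrentWith(b))    // true: a conflict to resolve
  }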
  • 100. Quorum
  • 102. Quorum Strict majority vote Sloppy partial vote
  • 103. Quorum Strict majority vote Sloppy partial vote • Most use R + W > N ⇒ R & W overlap
  • 104. Quorum Strict majority vote Sloppy partial vote • Most use R + W > N ⇒ R & W overlap • If N / 2 + 1 is still alive ⇒ all good
  • 105. Quorum Strict majority vote Sloppy partial vote • Most use R + W > N ⇒ R & W overlap • If N / 2 + 1 is still alive ⇒ all good • Most use N ⩵ 3
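The overlap guarantee is plain arithmetic: with R + W > N, any R replicas and any W replicas must share at least one member. A tiny sketch:

  object QuorumDemo extends App {
    // R + W > N forces every read quorum to intersect every write quorum.
    def overlaps(n: Int, r: Int, w: Int): Boolean = r + w > n

    println(overlaps(n = 3, r = 2, w = 2)) // true: the common majority setup
    println(overlaps(n = 3, r = 1, w = 1)) // false: a read can miss the latest write
  }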
  • 108. Failure detection Formal model Strong completeness
  • 109. Failure detection Formal model Strong completeness Every crashed process is eventually suspected by every correct process
  • 110. Failure detection Formal model Strong completeness Everyone knows Every crashed process is eventually suspected by every correct process
  • 111. Failure detection Formal model Strong completeness Everyone knows Every crashed process is eventually suspected by every correct process Weak completeness
  • 112. Failure detection Formal model Strong completeness Everyone knows Every crashed process is eventually suspected by every correct process Weak completeness Every crashed process is eventually suspected by some correct process
  • 113. Failure detection Formal model Strong completeness Everyone knows Every crashed process is eventually suspected by every correct process Weak completeness Someone knows Every crashed process is eventually suspected by some correct process
  • 114. Failure detection Formal model Strong completeness Everyone knows Every crashed process is eventually suspected by every correct process Weak completeness Someone knows Every crashed process is eventually suspected by some correct process Strong accuracy
  • 115. Failure detection Formal model Strong completeness Everyone knows Every crashed process is eventually suspected by every correct process Weak completeness Someone knows Every crashed process is eventually suspected by some correct process Strong accuracy No correct process is suspected ever
  • 116. Failure detection Strong completeness Every crashed process is eventually suspected by every correct process Weak completeness Every crashed process is eventually suspected by some correct process Strong accuracy No false positives No correct process is suspected ever Formal model Everyone knows Someone knows
  • 117. Failure detection Strong completeness Every crashed process is eventually suspected by every correct process Weak completeness Every crashed process is eventually suspected by some correct process Strong accuracy No false positives No correct process is suspected ever Weak accuracy Formal model Everyone knows Someone knows
  • 118. Failure detection Strong completeness Every crashed process is eventually suspected by every correct process Weak completeness Every crashed process is eventually suspected by some correct process Strong accuracy No false positives No correct process is suspected ever Weak accuracy Some correct process is never suspected Formal model Everyone knows Someone knows
  • 119. Failure detection Strong completeness Every crashed process is eventually suspected by every correct process Weak completeness Every crashed process is eventually suspected by some correct process Strong accuracy No false positives No correct process is suspected ever Weak accuracy Some false positives Some correct process is never suspected Formal model Everyone knows Someone knows
  • 120. Accrual Failure detector Hayashibara et. al. 2004
  • 121. Accrual Failure detector Hayashibara et. al. 2004 Keeps history of heartbeat statistics
  • 122. Accrual Failure detector Hayashibara et. al. 2004 Keeps history of heartbeat statistics Decouples monitoring from interpretation
  • 123. Accrual Failure detector Hayashibara et. al. 2004 Keeps history of heartbeat statistics Decouples monitoring from interpretation Calculates a likelihood (phi value) that the process is down
  • 124. Accrual Failure detector Hayashibara et. al. 2004 Not YES or NO Keeps history of heartbeat statistics Decouples monitoring from interpretation Calculates a likelihood (phi value) that the process is down
  • 125. Accrual Failure detector Hayashibara et. al. 2004 Not YES or NO Keeps history of heartbeat statistics Decouples monitoring from interpretation Calculates a likelihood (phi value) that the process is down Takes network hiccups into account
  • 126. Accrual Failure detector Hayashibara et. al. 2004 Not YES or NO Keeps history of heartbeat statistics Decouples monitoring from interpretation Calculates a likelihood (phi value) that the process is down Takes network hiccups into account phi = -log10(1 - F(timeSinceLastHeartbeat)) F is the cumulative distribution function of a normal distribution with mean and standard deviation estimated from historical heartbeat inter-arrival times
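A sketch of that formula in Scala, assuming heartbeat inter-arrival times are roughly normal. The error function uses the Abramowitz & Stegun approximation, and the class name is hypothetical; Akka's real detector adds warm-up samples and other safeguards:

  // Phi accrual sketch: phi = -log10(1 - F(timeSinceLastHeartbeat)),
  // with F the normal CDF fitted to historical inter-arrival times.
  final class PhiAccrualSketch(history: Vector[Double]) {
    require(history.nonEmpty, "needs at least one inter-arrival sample")
    private val mean = history.sum / history.size
    private val stdDev = {
      val variance = history.map(x => (x - mean) * (x - mean)).sum / history.size
      math.max(math.sqrt(variance), 1e-3) // guard against zero variance
    }

    // Abramowitz & Stegun 7.1.26 approximation of erf.
    private def erf(z: Double): Double = {
      val t = 1.0 / (1.0 + 0.3275911 * math.abs(z))
      val poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
        - 0.284496736) * t + 0.254829592) * t
      val y = 1.0 - poly * math.exp(-z * z)
      if (z >= 0) y else -y
    }

    private def cdf(x: Double): Double =
      0.5 * (1.0 + erf((x - mean) / (stdDev * math.sqrt(2.0))))

    def phi(timeSinceLastHeartbeat: Double): Double =
      -math.log10(math.max(1.0 - cdf(timeSinceLastHeartbeat), 1e-12))
  }

  object PhiDemo extends App {
    val fd = new PhiAccrualSketch(Vector(1000.0, 1010.0, 990.0, 1005.0)) // ms
    println(fd.phi(1000.0)) // near the mean: low phi, probably alive
    println(fd.phi(5000.0)) // far overdue: high phi, probably down
  }

Callers compare phi against a configurable threshold (Akka's cluster failure detector defaults to a threshold of around 8) rather than demanding a hard yes or no.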
  • 127. SWIM Failure detector das et. al. 2002
  • 128. SWIM Failure detector das et. al. 2002 Separates heartbeats from cluster dissemination
  • 129. SWIM Failure detector das et. al. 2002 Separates heartbeats from cluster dissemination Quarantine: suspected ⇒ time window ⇒ faulty
  • 130. SWIM Failure detector das et. al. 2002 Separates heartbeats from cluster dissemination Quarantine: suspected ⇒ time window ⇒ faulty Delegated heartbeat to bridge network splits
  • 131. byzantine Failure detector liskov et. al. 1999
  • 132. byzantine Failure detector liskov et. al. 1999 Supports misbehaving processes
  • 133. byzantine Failure detector liskov et. al. 1999 Supports misbehaving processes Omission failures
  • 134. byzantine Failure detector liskov et. al. 1999 Supports misbehaving processes Omission failures Crash failures, failing to receive a request, or failing to send a response
  • 135. byzantine Failure detector liskov et. al. 1999 Supports misbehaving processes Omission failures Crash failures, failing to receive a request, or failing to send a response Commission failures
  • 136. byzantine Failure detector liskov et. al. 1999 Supports misbehaving processes Omission failures Crash failures, failing to receive a request, or failing to send a response Commission failures Processing a request incorrectly, corrupting local state, and/or sending an incorrect or inconsistent response to a request
  • 137. byzantine Failure detector liskov et. al. 1999 Supports misbehaving processes Omission failures Crash failures, failing to receive a request, or failing to send a response Commission failures Processing a request incorrectly, corrupting local state, and/or sending an incorrect or inconsistent response to a request Very expensive, not practical
  • 139. Types of replication Active (Push) vs Passive (Pull) Asynchronous vs Synchronous
  • 145. analysis of replication consensus strategies Ryan Barrett 2009
  • 148. Highly Available Transactions Peter Bailis et. al. 2013 HAT NOT CAP
  • 149. Highly Available Transactions Peter Bailis et. al. 2013 Executive Summary HAT NOT CAP
  • 150. Highly Available Transactions Peter Bailis et. al. 2013 Executive Summary • Most SQL DBs do not provide Serializability, but weaker guarantees— for performance reasons HAT NOT CAP
  • 151. Highly Available Transactions Peter Bailis et. al. 2013 Executive Summary • Most SQL DBs do not provide Serializability, but weaker guarantees— for performance reasons • Some weaker transaction guarantees are possible to implement in a HA manner HAT NOT CAP
  • 152. Highly Available Transactions Peter Bailis et. al. 2013 Executive Summary • Most SQL DBs do not provide Serializability, but weaker guarantees— for performance reasons • Some weaker transaction guarantees are possible to implement in a HA manner • What transaction semantics can be provided with HA? HAT NOT CAP
  • 153. HAT
  • 154. UnAvailable • Serializable • Snapshot Isolation • Repeatable Read • Cursor Stability • etc. Highly Available • Read Committed • Read Uncommitted • Read Your Writes • Monotonic Atomic View • Monotonic Read/Write • etc. HAT
  • 155. Other scalable or Highly Available Transactional Research
  • 156. Other scalable or Highly Available Transactional Research Bolt-On Consistency Bailis et. al. 2013
  • 157. Other scalable or Highly Available Transactional Research Bolt-On Consistency Bailis et. al. 2013 Calvin Thompson et. al. 2012
  • 158. Other scalable or Highly Available Transactional Research Bolt-On Consistency Bailis et. al. 2013 Calvin Thompson et. al. 2012 Spanner (Google) Corbett et. al. 2012
  • 162. Events 1. Request(v) 2. Decide(v) Specification Properties
  • 163. Events 1. Request(v) 2. Decide(v) Specification Properties 1. Termination: every process eventually decides on a value v
  • 164. Events 1. Request(v) 2. Decide(v) Specification Properties 1. Termination: every process eventually decides on a value v 2. Validity: if a process decides v, then v was proposed by some process
  • 165. Events 1. Request(v) 2. Decide(v) Specification Properties 1. Termination: every process eventually decides on a value v 2. Validity: if a process decides v, then v was proposed by some process 3. Integrity: no process decides twice
  • 166. Events 1. Request(v) 2. Decide(v) Specification Properties 1. Termination: every process eventually decides on a value v 2. Validity: if a process decides v, then v was proposed by some process 3. Integrity: no process decides twice 4. Agreement: no two correct processes decide differently
  • 169. Consensus Algorithms VR Oki & liskov 1988 CAP
  • 170. Consensus Algorithms VR Oki & liskov 1988 CAP Paxos Lamport 1989
  • 171. Consensus Algorithms VR Oki & liskov 1988 Paxos Lamport 1989 ZAB reed & junquiera 2008 CAP
  • 172. Consensus Algorithms VR Oki & liskov 1988 Paxos Lamport 1989 ZAB reed & junquiera 2008 Raft ongaro & ousterhout 2013 CAP
  • 174. Immutability “Immutability Changes Everything” - Pat Helland Immutable Data Share Nothing Architecture
  • 175. Immutability “Immutability Changes Everything” - Pat Helland Immutable Data Share Nothing Architecture Is the path towards TRUE Scalability
  • 176. Think In Facts "The database is a cache of a subset of the log” - Pat Helland
  • 177. Think In Facts "The database is a cache of a subset of the log” - Pat Helland Never delete data Knowledge only grows Append-Only Event Log Use Event Sourcing and/or CQRS
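A minimal event-sourcing sketch: state is a fold over an append-only log of immutable facts (the event names are illustrative):

  sealed trait Event
  final case class Deposited(amount: Long) extends Event
  final case class Withdrawn(amount: Long) extends Event

  object EventLogDemo extends App {
    // Never delete: knowledge only grows, the log is the source of truth.
    val log: Vector[Event] = Vector(Deposited(100), Withdrawn(30), Deposited(5))

    // “The database is a cache of a subset of the log”: derive state on demand.
    val balance = log.foldLeft(0L) {
      case (acc, Deposited(a)) => acc + a
      case (acc, Withdrawn(a)) => acc - a
    }
    println(balance) // 75
  }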
  • 178. Aggregate Roots Can wrap multiple Entities Aggregate Root is the Transactional Boundary
  • 179. Aggregate Roots Can wrap multiple Entities Aggregate Root is the Transactional Boundary Strong Consistency Within Aggregate Eventual Consistency Between Aggregates
  • 180. Aggregate Roots Can wrap multiple Entities Aggregate Root is the Transactional Boundary Strong Consistency Within Aggregate Eventual Consistency Between Aggregates No limit to scalability
  • 182. Dynamo Very influential CAP Vogels et al. 2007
  • 183. Dynamo Very influential CAP Vogels et al. 2007 Popularized • Eventual consistency • Epidemic gossip • Consistent hashing • Hinted handoff • Read repair • Anti-Entropy W/ Merkle trees
  • 184. Consistent Hashing Karger et. al. 1997
  • 185. Consistent Hashing Karger et al. 1997 Supports elasticity—easier to scale up and down Avoids hotspots Enables partitioning and replication
  • 186. Consistent Hashing Karger et al. 1997 Supports elasticity—easier to scale up and down Avoids hotspots Enables partitioning and replication Only K/N keys need to be remapped when adding or removing a node (K=#keys, N=#nodes)
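A consistent hash ring sketch in Scala 2.13 with virtual nodes; the hash function and vnode count are arbitrary choices. The demo checks the K/N claim by counting how many of 1000 keys move when a fourth node joins (roughly a quarter):

  import scala.collection.immutable.SortedMap
  import scala.util.hashing.MurmurHash3

  // Nodes (via virtual nodes) are placed on a ring; a key belongs to the
  // first node clockwise from its hash.
  final class HashRing(nodes: Set[String], vnodes: Int = 16) {
    require(nodes.nonEmpty)
    private def h(s: String): Int = MurmurHash3.stringHash(s)

    private val ring: SortedMap[Int, String] =
      SortedMap.from(for {
        node <- nodes.toSeq
        i    <- 0 until vnodes
      } yield h(s"$node#$i") -> node)

    def nodeFor(key: String): String = {
      val k = h(key)
      ring.iteratorFrom(k).nextOption().map(_._2).getOrElse(ring.head._2)
    }
  }

  object HashRingDemo extends App {
    val before = new HashRing(Set("A", "B", "C"))
    val after  = new HashRing(Set("A", "B", "C", "D"))
    val keys   = (1 to 1000).map(i => s"key-$i")
    val moved  = keys.count(k => before.nodeFor(k) != after.nodeFor(k))
    println(s"moved $moved of ${keys.size} keys") // roughly K/N, about 250
  }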
  • 188. How eventual is Eventual consistency?
  • 189. How eventual is How consistent is Eventual consistency?
  • 190. How eventual is How consistent is Eventual consistency? PBS Probabilistically Bounded Staleness Peter Bailis et. al 2012
  • 191. How eventual is How consistent is Eventual consistency? PBS Probabilistically Bounded Staleness Peter Bailis et. al 2012
  • 193. Node ring & Epidemic Gossip CHORD Stoica et al 2001
  • 194. Node ring & Epidemic Gossip CHORD Stoica et al 2001 (diagram: ring of member nodes)
  • 195. Node ring & Epidemic Gossip CHORD Stoica et al 2001 (diagram: ring of member nodes)
  • 196. Node ring & Epidemic Gossip CHORD Stoica et al 2001 (diagram: ring of member nodes)
  • 197. Node ring & Epidemic Gossip CHORD Stoica et al 2001 CAP (diagram: ring of member nodes)
  • 198. Benefits of Epidemic Gossip Decentralized P2P No SPOF or SPOB Very Scalable Fully Elastic Requires minimal administration Often used with VECTOR CLOCKS
  • 199. Some Standard Optimizations to Epidemic Gossip 1. Separation of failure detection heartbeat and dissemination of data - DAS et. al. 2002 (SWIM) 2. Push/Pull gossip - Khambatti et. al 2003 1. Hash and compare data 2. Use single hash or Merkle Trees
  • 202. ACID 2.0 Associative Batch-insensitive (grouping doesn't matter) a+(b+c)=(a+b)+c
  • 203. ACID 2.0 Associative Batch-insensitive (grouping doesn't matter) a+(b+c)=(a+b)+c Commutative Order-insensitive (order doesn't matter) a+b=b+a
  • 204. ACID 2.0 Associative Batch-insensitive (grouping doesn't matter) a+(b+c)=(a+b)+c Commutative Order-insensitive (order doesn't matter) a+b=b+a Idempotent Retransmission-insensitive (duplication does not matter) a+a=a
  • 205. ACID 2.0 Associative Batch-insensitive (grouping doesn't matter) a+(b+c)=(a+b)+c Commutative Order-insensitive (order doesn't matter) a+b=b+a Idempotent Retransmission-insensitive (duplication does not matter) a+a=a Eventually Consistent
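Set union exhibits all three properties at once, which is why replicas exchanging such updates converge no matter how messages are grouped, reordered, or duplicated:

  object Acid2Demo extends App {
    val a = Set(1); val b = Set(2); val c = Set(3)

    assert(((a union b) union c) == (a union (b union c))) // Associative
    assert((a union b) == (b union a))                     // Commutative
    assert((a union a) == a)                               // Idempotent

    println("union is ACI: replicas converge regardless of delivery order")
  }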
  • 206. Convergent & Commutative Replicated Data Types Shapiro et. al. 2011
  • 207. Convergent & Commutative Replicated Data Types Shapiro et. al. 2011 CRDT
  • 208. Convergent & Commutative Replicated Data Types Shapiro et. al. 2011 CRDT Join Semilattice Monotonic merge function
  • 209. Convergent & Commutative Replicated Data Types Shapiro et. al. 2011 Data types Counters Registers Sets Maps Graphs CRDT Join Semilattice Monotonic merge function
  • 210. Convergent & Commutative Replicated Data Types Shapiro et. al. 2011 Data types Counters Registers Sets Maps Graphs CRDT CAP Join Semilattice Monotonic merge function
  • 211. 2 TYPES of CRDTs CvRDT Convergent State-based CmRDT Commutative Ops-based
  • 212. 2 TYPES of CRDTs CvRDT Convergent State-based CmRDT Commutative Ops-based Self contained, holds all history
  • 213. 2 TYPES of CRDTs CvRDT Convergent State-based CmRDT Commutative Ops-based Self contained, holds all history Needs a reliable broadcast channel
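A minimal state-based (CvRDT) grow-only counter sketch: per-node counts with a pointwise-max merge, i.e. the monotonic join of a semilattice. Illustrative, not the akka-crdt implementation:

  final case class GCounter(counts: Map[String, Long] = Map.empty) {
    def increment(node: String, by: Long = 1): GCounter =
      GCounter(counts.updated(node, counts.getOrElse(node, 0L) + by))

    def value: Long = counts.values.sum

    // Monotonic merge: least upper bound in the lattice.
    def merge(that: GCounter): GCounter =
      GCounter((counts.keySet ++ that.counts.keySet).iterator.map { n =>
        n -> math.max(counts.getOrElse(n, 0L), that.counts.getOrElse(n, 0L))
      }.toMap)
  }

  object GCounterDemo extends App {
    val a = GCounter().increment("A").increment("A") // replica A counted 2
    val b = GCounter().increment("B")                // replica B counted 1
    println(a.merge(b).value) // 3
    println(b.merge(a).value) // 3: merge order does not matter
  }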
  • 214. CALM theorem Hellerstein et. al. 2011 Consistency As Logical Monotonicity
  • 215. CALM theorem Hellerstein et. al. 2011 Consistency As Logical Monotonicity Bloom Language Compiler help to detect & encapsulate non-monotonicity
  • 216. CALM theorem Hellerstein et. al. 2011 Consistency As Logical Monotonicity Bloom Language Distributed Logic Datalog/Dedalus Monotonic functions Just add facts to the system Model state as Lattices Similar to CRDTs (without the scope problem) Compiler help to detect & encapsulate non-monotonicity
  • 219. Akka IO Akka Actors
  • 220. Akka REMOTE Akka IO Akka Actors
  • 221. Akka CLUSTER Akka REMOTE Akka IO Akka Actors
  • 222. Akka CLUSTER EXTENSIONS Akka CLUSTER Akka REMOTE Akka IO Akka Actors
  • 223. What is Akka CLUSTER all about? • Cluster Membership • Leader & Singleton • Cluster Sharding • Clustered Routers (adaptive, consistent hashing, …) • Clustered Supervision and Deathwatch • Clustered Pub/Sub • and more
  • 225. cluster membership in Akka • Dynamo-style master-less decentralized P2P
  • 226. cluster membership in Akka • Dynamo-style master-less decentralized P2P • Epidemic Gossip—Node Ring
  • 227. cluster membership in Akka • Dynamo-style master-less decentralized P2P • Epidemic Gossip—Node Ring • Vector Clocks for causal consistency
  • 228. cluster membership in Akka • Dynamo-style master-less decentralized P2P • Epidemic Gossip—Node Ring • Vector Clocks for causal consistency • Fully elastic with no SPOF or SPOB
  • 229. cluster membership in Akka • Dynamo-style master-less decentralized P2P • Epidemic Gossip—Node Ring • Vector Clocks for causal consistency • Fully elastic with no SPOF or SPOB • Very scalable—2400 nodes (on GCE)
  • 230. cluster membership in Akka • Dynamo-style master-less decentralized P2P • Epidemic Gossip—Node Ring • Vector Clocks for causal consistency • Fully elastic with no SPOF or SPOB • Very scalable—2400 nodes (on GCE) • High throughput—1000 nodes in 4 min (on GCE)
  • 231. Gossip State GOSSIPING case class Gossip( members: SortedSet[Member], seen: Set[Member], unreachable: Set[Member], version: VectorClock)
  • 232. Gossip State GOSSIPING Is a CRDT case class Gossip( members: SortedSet[Member], seen: Set[Member], unreachable: Set[Member], version: VectorClock)
  • 233. Gossip State GOSSIPING Is a CRDT Ordered node ring case class Gossip( members: SortedSet[Member], seen: Set[Member], unreachable: Set[Member], version: VectorClock)
  • 234. Gossip State GOSSIPING Is a CRDT Ordered node ring case class Gossip( members: SortedSet[Member], seen: Set[Member], unreachable: Set[Member], version: VectorClock) Seen set for convergence
  • 235. Gossip State GOSSIPING Is a CRDT Ordered node ring case class Gossip( members: SortedSet[Member], seen: Set[Member], unreachable: Set[Member], version: VectorClock) Seen set for convergence Unreachable set
  • 236. Gossip State GOSSIPING Is a CRDT Ordered node ring case class Gossip( members: SortedSet[Member], seen: Set[Member], unreachable: Set[Member], version: VectorClock) Seen set for convergence Unreachable set Version
  • 237. GOSSIPING Gossip State Is a CRDT case class Gossip( members: SortedSet[Member], seen: Set[Member], unreachable: Set[Member], version: VectorClock) Ordered node ring Seen set for convergence Unreachable set Version 1. Picks random node with older/newer version
  • 238. GOSSIPING Gossip State Is a CRDT case class Gossip( members: SortedSet[Member], seen: Set[Member], unreachable: Set[Member], version: VectorClock) Ordered node ring Seen set for convergence Unreachable set Version 1. Picks random node with older/newer version 2. Gossips in a request/reply fashion
  • 239. GOSSIPING Gossip State Is a CRDT case class Gossip( members: SortedSet[Member], seen: Set[Member], unreachable: Set[Member], version: VectorClock) Ordered node ring Seen set for convergence Unreachable set Version 1. Picks random node with older/newer version 2. Gossips in a request/reply fashion 3. Updates internal state and adds itself to the ‘seen’ set
  • 241. Cluster Convergence Reached when: 1. All nodes are represented in the seen set 2. No members are unreachable, or 3. All unreachable members have status down or exiting
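Those three conditions translate directly into a predicate. A simplified sketch modeled on the Gossip state above, with an illustrative Member type standing in for Akka's internals:

  // Hypothetical stand-ins; Akka's real types carry more information.
  final case class Member(address: String, status: String) // "up", "down", "exiting", ...

  final case class GossipView(
    members: Set[Member],  // an ordered ring in the real Gossip state
    seen: Set[String],     // addresses that have seen the current version
    unreachable: Set[Member]
  ) {
    // Convergence: everyone has seen this version, and any unreachable
    // member is already down or exiting.
    def converged: Boolean =
      members.forall(m => seen.contains(m.address)) &&
        unreachable.forall(m => m.status == "down" || m.status == "exiting")
  }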
  • 243. BIASED GOSSIP 80% bias to nodes not in seen table Up to 400 nodes, then reduced
  • 246. PUSH/PULL GOSSIP Variation case class Status(version: VectorClock)
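The point of the Status variation, sketched with a plain number standing in for the VectorClock: peers exchange only versions first and ship full gossip state only when one side is behind (a real vector-clock comparison can also come back concurrent, in which case both sides exchange and merge):

  final case class StatusMsg(version: Long) // stand-in for Status(version: VectorClock)

  object PushPullDemo extends App {
    def onStatus(localVersion: Long, msg: StatusMsg): String =
      if (msg.version < localVersion) "push: reply with our full gossip state"
      else if (msg.version > localVersion) "pull: ask the peer for its state"
      else "in sync: reply with nothing, bandwidth saved"

    println(onStatus(5L, StatusMsg(3L))) // peer is behind
    println(onStatus(5L, StatusMsg(7L))) // we are behind
    println(onStatus(5L, StatusMsg(5L))) // nothing to ship
  }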
  • 248. LEADER ROLE Any node can be the leader
  • 249. LEADER ROLE Any node can be the leader 1. No election, but deterministic
  • 250. LEADER ROLE Any node can be the leader 1. No election, but deterministic 2. Can change after cluster convergence
  • 251. LEADER ROLE Any node can be the leader 1. No election, but deterministic 2. Can change after cluster convergence 3. Leader has special duties
  • 254. Failure Detection Hashes the node ring Picks 5 nodes Request/Reply heartbeat
  • 255. Failure Detection Hashes the node ring Picks 5 nodes (to increase likelihood of bridging racks and data centers) Request/Reply heartbeat
  • 256. Failure Detection Hashes the node ring Picks 5 nodes (to increase likelihood of bridging racks and data centers) Request/Reply heartbeat Used by Cluster Membership Remote Death Watch Remote Supervision
  • 257. Failure Detection Is an Accrual Failure Detector
  • 258. Failure Detection Is an Accrual Failure Detector Does not help much in practice
  • 259. Failure Detection Is an Accrual Failure Detector Does not help much in practice Need to add delay to deal with Garbage Collection
  • 260. Failure Detection Is an Accrual Failure Detector Does not help much in practice Need to add delay to deal with Garbage Collection Instead of this (chart: expected heartbeat arrivals)
  • 261. Failure Detection Is an Accrual Failure Detector Does not help much in practice Need to add delay to deal with Garbage Collection It often looks like this (chart: heartbeat arrivals disrupted by GC pauses)
  • 263. Network Partitions • Failure Detector can mark an unavailable member Unreachable
  • 264. Network Partitions • Failure Detector can mark an unavailable member Unreachable • If one node is Unreachable then no cluster Convergence
  • 265. Network Partitions • Failure Detector can mark an unavailable member Unreachable • If one node is Unreachable then no cluster Convergence • This means that the Leader can no longer perform its duties
  • 266. Network Partitions Split Brain • Failure Detector can mark an unavailable member Unreachable • If one node is Unreachable then no cluster Convergence • This means that the Leader can no longer perform its duties
  • 267. Network Partitions Split Brain • Failure Detector can mark an unavailable member Unreachable • If one node is Unreachable then no cluster Convergence • This means that the Leader can no longer perform its duties • Member can come back from Unreachable—Else:
  • 268. Network Partitions Split Brain • Failure Detector can mark an unavailable member Unreachable • If one node is Unreachable then no cluster Convergence • This means that the Leader can no longer perform its duties • Member can come back from Unreachable—Else: • The node needs to be marked as Down—either through:
  • 269. Network Partitions Split Brain • Failure Detector can mark an unavailable member Unreachable • If one node is Unreachable then no cluster Convergence • This means that the Leader can no longer perform its duties • Member can come back from Unreachable—Else: • The node needs to be marked as Down—either through: 1. auto-down 2. Manual down
  • 271. Potential FUTURE Optimizations • Vector Clock HISTORY pruning
  • 272. Potential FUTURE Optimizations • Vector Clock HISTORY pruning • Delegated heartbeat
  • 273. Potential FUTURE Optimizations • Vector Clock HISTORY pruning • Delegated heartbeat • “Real” push/pull gossip
  • 274. Potential FUTURE Optimizations • Vector Clock HISTORY pruning • Delegated heartbeat • “Real” push/pull gossip • More out-of-the-box auto-down patterns
  • 275. Akka Modules For Distribution
  • 276. Akka Modules For Distribution Akka Cluster Akka Remote Akka HTTP Akka IO
  • 277. Akka Modules For Distribution Akka Cluster Akka Remote Akka HTTP Akka IO Clustered Singleton Clustered Routers Clustered Pub/Sub Cluster Client Consistent Hashing
  • 279. Akka & The Road Ahead Akka HTTP Akka Streams Akka CRDT Akka Raft
  • 280. Akka & The Road Ahead Akka HTTP Akka Streams Akka CRDT Akka Raft Akka 2.4
  • 281. Akka & The Road Ahead Akka HTTP Akka Streams Akka CRDT Akka Raft Akka 2.4 Akka 2.4
  • 282. Akka & The Road Ahead Akka HTTP Akka Streams Akka CRDT Akka Raft Akka 2.4 Akka 2.4 ?
  • 283. Akka & The Road Ahead Akka HTTP Akka Streams Akka CRDT Akka Raft Akka 2.4 Akka 2.4 ? ?
  • 285. Try AKKA out akka.io
  • 286. Join us at React Conf San Francisco Nov 18-21 reactconf.com
  • 287. Early Registration ends tomorrow Join us at React Conf San Francisco Nov 18-21 reactconf.com
  • 288. References • General Distributed Systems • Summary of network reliability post-mortems—more terrifying than the most horrifying Stephen King novel: http://aphyr.com/posts/288-the-network-is-reliable • A Note on Distributed Computing: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.7628 • On the problems with RPC: http://steve.vinoski.net/pdf/IEEE-Convenience_Over_Correctness.pdf • 8 Fallacies of Distributed Computing: https://blogs.oracle.com/jag/resource/Fallacies.html • 6 Misconceptions of Distributed Computing: www.dsg.cs.tcd.ie/~vjcahill/sigops98/papers/vogels.ps • Distributed Computing Systems—A Foundational Approach: http://www.amazon.com/Programming-Distributed-Computing-Systems-Foundational/dp/0262018985 • Introduction to Reliable and Secure Distributed Programming: http://www.distributedprogramming.net/ • Nice short overview on Distributed Systems: http://book.mixu.net/distsys/ • Meta list of distributed systems readings: https://gist.github.com/macintux/6227368
  • 289. References • Actor Model • Great discussion between Erik Meijer & Carl Hewitt on the essence of the Actor Model: http://channel9.msdn.com/Shows/Going+Deep/Hewitt-Meijer-and-Szyperski-The-Actor-Model-everything-you-wanted-to-know-but-were-afraid-to-ask • Carl Hewitt’s 1973 paper defining the Actor Model: http://worrydream.com/refs/Hewitt-ActorModel.pdf • Gul Agha’s Doctoral Dissertation: https://dspace.mit.edu/handle/1721.1/6952
  • 290. References • FLP • Impossibility of Distributed Consensus with One Faulty Process: http://cs-www.cs.yale.edu/homes/arvind/cs425/doc/fischer.pdf • A Brief Tour of FLP: http://the-paper-trail.org/blog/a-brief-tour-of-flp-impossibility/ • CAP • Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services: http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf • You Can’t Sacrifice Partition Tolerance: http://codahale.com/you-cant-sacrifice-partition-tolerance/ • Linearizability: A Correctness Condition for Concurrent Objects: http://courses.cs.vt.edu/~cs5204/fall07-kafura/Papers/TransactionalMemory/Linearizability.pdf • CAP Twelve Years Later: How the "Rules" Have Changed: http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed • Consistency vs. Availability: http://www.infoq.com/news/2008/01/consistency-vs-availability
  • 291. References • Time & Order • Post on the problems with Last Write Wins in Riak: http://aphyr.com/posts/285-call-me-maybe-riak • Time, Clocks, and the Ordering of Events in a Distributed System: http://research.microsoft.com/en-us/um/people/lamport/pubs/time-clocks.pdf • Vector Clocks: http://zoo.cs.yale.edu/classes/cs426/2012/lab/bib/fidge88timestamps.pdf • Failure Detection • Unreliable Failure Detectors for Reliable Distributed Systems: http://www.cs.utexas.edu/~lorenzo/corsi/cs380d/papers/p225-chandra.pdf • The ϕ Accrual Failure Detector: http://ddg.jaist.ac.jp/pub/HDY+04.pdf • SWIM Failure Detector: http://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf • Practical Byzantine Fault Tolerance: http://www.pmg.lcs.mit.edu/papers/osdi99.pdf
  • 292. References • Transactions • Jim Gray’s classic book: http://www.amazon.com/Transaction-Processing-Concepts-Techniques-Management/dp/1558601902 • Highly Available Transactions: Virtues and Limitations: http://www.bailis.org/papers/hat-vldb2014.pdf • Bolt on Consistency: http://db.cs.berkeley.edu/papers/sigmod13-bolton.pdf • Calvin: Fast Distributed Transactions for Partitioned Database Systems: http://cs.yale.edu/homes/thomson/publications/calvin-sigmod12.pdf • Spanner: Google's Globally-Distributed Database: http://research.google.com/archive/spanner.html • Life beyond Distributed Transactions: an Apostate’s Opinion: https://cs.brown.edu/courses/cs227/archives/2012/papers/weaker/cidr07p15.pdf • Immutability Changes Everything—Pat Helland's talk at Ricon: http://vimeo.com/52831373 • Unshackle Your Domain (Event Sourcing): http://www.infoq.com/presentations/greg-young-unshackle-qcon08 • CQRS: http://martinfowler.com/bliki/CQRS.html
  • 293. References • Consensus • Paxos Made Simple: http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf • Paxos Made Moderately Complex: http://www.cs.cornell.edu/courses/cs7412/2011sp/paxos.pdf • A simple totally ordered broadcast protocol (ZAB): labs.yahoo.com/files/ladis08.pdf • In Search of an Understandable Consensus Algorithm (Raft): https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf • Replication strategy comparison diagram: http://snarfed.org/transactions_across_datacenters_io.html • Distributed Snapshots: Determining Global States of Distributed Systems: http://www.cs.swarthmore.edu/~newhall/readings/snapshots.pdf
  • 294. References • Eventual Consistency • Dynamo: Amazon’s Highly Available Key-value Store: http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf • Consistency vs. Availability: http://www.infoq.com/news/2008/01/consistency-vs-availability • Consistent Hashing and Random Trees: http://thor.cs.ucsb.edu/~ravenben/papers/coreos/kll+97.pdf • PBS: Probabilistically Bounded Staleness: http://pbs.cs.berkeley.edu/
  • 295. References • Epidemic Gossip • Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications: http://pdos.csail.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf • Gossip-style Failure Detector: http://www.cs.cornell.edu/home/rvr/papers/GossipFD.pdf • GEMS: http://www.hcs.ufl.edu/pubs/GEMS2005.pdf • Efficient Reconciliation and Flow Control for Anti-Entropy Protocols: http://www.cs.cornell.edu/home/rvr/papers/flowgossip.pdf • 2400 Akka nodes on GCE: http://typesafe.com/blog/running-a-2400-akka-nodes-cluster-on-google-compute-engine • Starting 1000 Akka nodes in 4 min: http://typesafe.com/blog/starting-up-a-1000-node-akka-cluster-in-4-minutes-on-google-compute-engine • Push Pull Gossiping: http://khambatti.com/mujtaba/ArticlesAndPapers/pdpta03.pdf • SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol: http://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf
  • 296. References • Conflict-Free Replicated Data Types (CRDTs) • A comprehensive study of Convergent and Commutative Replicated Data Types: http://hal.upmc.fr/docs/00/55/55/88/PDF/techreport.pdf • Mark Shapiro talks about CRDTs at Microsoft: http://research.microsoft.com/apps/video/dl.aspx?id=153540 • Akka CRDT project: https://github.com/jboner/akka-crdt • CALM • Dedalus: Datalog in Time and Space: http://db.cs.berkeley.edu/papers/datalog2011-dedalus.pdf • CALM: http://www.cs.berkeley.edu/~palvaro/cidr11.pdf • Logic and Lattices for Distributed Programming: http://db.cs.berkeley.edu/papers/UCB-lattice-tr.pdf • Bloom Language website: http://bloom-lang.net • Joe Hellerstein talks about CALM: http://vimeo.com/53904989
  • 297. References • Akka Cluster • My Akka Cluster Implementation Notes: https://gist.github.com/jboner/7692270 • Akka Cluster Specification: http://doc.akka.io/docs/akka/snapshot/common/cluster.html • Akka Cluster Docs: http://doc.akka.io/docs/akka/snapshot/scala/cluster-usage.html • Akka Failure Detector Docs: http://doc.akka.io/docs/akka/snapshot/scala/remoting.html#Failure_Detector • Akka Roadmap: https://docs.google.com/a/typesafe.com/document/d/18W9-fKs55wiFNjXL9q50PYOnR7-nnsImzJqHOPPbM4E/mobilebasic?pli=1&hl=en_US • Where Akka Came From: http://letitcrash.com/post/40599293211/where-akka-came-from
  • 299. The Road to Akka Cluster and Beyond… Jonas Bonér CTO Typesafe @jboner