REPLICATION
IN THE WILD
Ensar Basri Kahveci
REPLICATION
- Putting a data set into multiple nodes.
- Each replica has a full copy.
- A few reasons for replication:
  - Performance
  - Availability and fault tolerance
- Mostly used with partitioning.
NOTHING FOR FREE!
- Very easy to do when the data is immutable.
- Problems start when we have multiple copies
of the data and we want to update them.
- Two main difficulties:
  - Handling updates
  - Handling failures
The dangers of replication and a solution
- Gray et al. [1] classify replication models by 2 parameters:
  - Where to make updates: primary copy or update anywhere
  - When to make updates: eagerly or lazily
WHERE: PRIMARY COPY
- There is a single replica managing the updates.
- Concurrency control is easy.
- No conflicts and no conflict-handling logic.
- Updates are executed on the primary and
secondaries in the same order.
- When the primary fails, a new primary is elected.
- Ensuring a single and good primary is hard.
WHERE: UPDATE ANYWHERE
- Each replica can initiate a transaction to make
an update.
- Complex concurrency control.
- Deadlocks or conflicts are possible.
- In practice, there is also the multi-leader variant.
WHEN: EAGER REPLICATION
- Synchronously updates all replicas as part of
one atomic transaction.
- Provides strong consistency.
- Not very flexible. Degree of availability can
degrade on node failures.
- Consensus algorithms.
WHEN: LAZY REPLICATION
- Updates each replica with a separate
transaction.
- Updates can execute quite fast.
- Degree of availability is high.
- Eventual consistency.
- Data copies can diverge.
- Data loss or conflicts can occur.
WHERE? / WHEN?
| WHEN? \ WHERE? | PRIMARY COPY | UPDATE ANYWHERE |
| EAGER | strong consistency, simple concurrency, slow, inflexible | strong consistency, complex concurrency, slow, expensive, deadlocks |
| LAZY | fast, eventual consistency, simple concurrency, inconsistency | fast, available, flexible, eventual consistency, inconsistency, conflicts |
WHERE? / WHEN?
| WHEN? \ WHERE? | PRIMARY COPY | UPDATE ANYWHERE |
| EAGER | Multi-Paxos [5], etcd and Consul (Raft) [6], ZooKeeper (Zab) [7], Kafka | Paxos [5], Hazelcast Cluster State Change [12] |
| LAZY | Hazelcast, MongoDB, Elasticsearch, Redis | Dynamo [4], Cassandra, Riak |
PRIMARY COPY + EAGER REPLICATION
- When the primary fails, secondaries are
guaranteed to be up to date.
- Raft, Kafka etc.
- Majority approach can be used.
- In Kafka, an in-sync replica (ISR) set is maintained. [11]
- Secondaries can be used for reads.
UPDATE ANYWHERE + EAGER REPLICATION
- Each replica can initiate a new transaction.
- Concurrent transactions can compete with
each other.
- Possibility of deadlocks.
- In the basic Paxos algorithm, there is no
designated leader.
PRIMARY COPY + LAZY REPLICATION
- The primary copy can execute updates fast.
- Secondaries can fall behind the primary. This is
called replication lag.
- It can lead to data loss during leader failover, but
still no conflicts.
- Secondaries can be used for reads.
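
The replication lag and failover data loss described on this slide can be sketched with a toy model. This is only an illustration under simplifying assumptions: the class names `LazyPrimary` and `Secondary`, the `replicate` batch step, and the in-memory logs are all hypothetical and do not correspond to any of the systems named in this deck.

```python
class Secondary:
    def __init__(self):
        self.log = []


class LazyPrimary:
    """Toy primary that acknowledges writes locally and ships them later."""

    def __init__(self, secondaries):
        self.log = []
        self.secondaries = secondaries
        self.shipped = 0  # number of log entries already sent to secondaries

    def write(self, entry):
        # The write is acknowledged before any secondary has seen it.
        self.log.append(entry)
        return len(self.log)

    def replicate(self, max_entries):
        # Background step: send the next batch, in log order, to every secondary.
        batch = self.log[self.shipped:self.shipped + max_entries]
        for secondary in self.secondaries:
            secondary.log.extend(batch)
        self.shipped += len(batch)

    def lag(self):
        # Entries the primary has acknowledged but not yet shipped.
        return len(self.log) - self.shipped


secondary = Secondary()
primary = LazyPrimary([secondary])
for entry in ["a=1", "b=2", "c=3"]:
    primary.write(entry)
primary.replicate(max_entries=2)

# If the primary crashes now and the secondary is promoted,
# the unshipped suffix of the primary's log is lost.
lost = primary.log[len(secondary.log):]
print("replication lag:", primary.lag())
print("lost on failover:", lost)
```

Because every secondary applies the same prefix of the primary's log in the same order, the copies never conflict; they can only be stale.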
UPDATE ANYWHERE + LAZY REPLICATION
- Dynamo-style [4] highly available databases.
- Quorums
- Concurrent updates are first-class citizens.
- Possibility of conflicts
  - Avoiding, discarding, detecting & resolving conflicts
- Eventual convergence
  - Write repair, read repair and anti-entropy
QUORUMS
- W + R > N (W: write quorum size, R: read quorum size, N: number of replicas)
  - W = 3, R = 1, N = 3
  - W = 1, R = 3, N = 3
  - W = 2, R = 2, N = 3
- If W or R is not met, consistency may be broken.
- Sloppy quorums and hinted handoff.
- Even if W and R are met, consistency can still be broken.
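
Why W + R > N matters can be checked by brute force: the condition holds exactly when every possible write set shares at least one replica with every possible read set. The sketch below is only an illustration; the function name `quorums_always_overlap` is made up for this example.

```python
from itertools import combinations


def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """Check that every write set of size w intersects every read set of size r."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


for w, r in [(3, 1), (1, 3), (2, 2), (1, 1)]:
    print(
        f"N=3 W={w} R={r}: W+R>N is {w + r > 3}, "
        f"overlap guaranteed: {quorums_always_overlap(3, w, r)}"
    )
```

Overlap only guarantees that a read contacts at least one replica that received the latest acknowledged write. It does not cover the cases the slide warns about, such as sloppy quorums or writes that are concurrent with each other.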
CONFLICT-FREE REPLICATED DATA TYPES (CRDTs)
- Special data types that achieve strong
eventual consistency and monotonicity [2]
- No conflicts
- Merge function has 3 properties:
  - Commutative: A + B = B + A
  - Associative: A + (B + C) = (A + B) + C
  - Idempotent: A + A = A
- Riak Data Types [3]
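
A grow-only counter (G-Counter) is one of the simplest data types whose merge function has these three properties. The sketch below is a minimal illustration, not the Riak implementation; the class name `GCounter` and the replica ids are assumptions made for this example.

```python
class GCounter:
    """Grow-only counter: one non-negative slot per replica, merged by element-wise max."""

    def __init__(self, counts=None):
        self.counts = dict(counts or {})

    def increment(self, replica_id, amount=1):
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[replica_id] = self.counts.get(replica_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max is commutative, associative and idempotent.
        keys = self.counts.keys() | other.counts.keys()
        return GCounter(
            {k: max(self.counts.get(k, 0), other.counts.get(k, 0)) for k in keys}
        )

    def __eq__(self, other):
        return isinstance(other, GCounter) and self.counts == other.counts


a, b, c = GCounter(), GCounter(), GCounter()
a.increment("n1", 2)
b.increment("n2", 3)
c.increment("n3", 1)

print(a.merge(b) == b.merge(a))                    # commutative
print(a.merge(b.merge(c)) == a.merge(b).merge(c))  # associative
print(a.merge(a) == a)                             # idempotent
print(a.merge(b).merge(c).value())
```

Since the merge result does not depend on the order or the number of times states are exchanged, replicas that have seen the same set of updates end up in the same state.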
DISCARDING CONFLICTS: LAST WRITE WINS
- When 2 updates are concurrent, define an
arbitrary order among them.
- i.e., pretend that one of them is more recent.
- Attach a timestamp to each write.
- Cassandra uses physical timestamps [8], [9]
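
A last-write-wins register can be sketched in a few lines. This is an illustration only and not Cassandra's implementation: the `Versioned` type, the integer timestamps, and the tie-break on replica id and then value are assumptions chosen so that the merge is deterministic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int   # assumed to come from the writer's physical clock
    replica_id: str  # hypothetical tie-breaker for equal timestamps


def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Keep the write with the larger (timestamp, replica_id, value) key."""
    key_a = (a.timestamp, a.replica_id, a.value)
    key_b = (b.timestamp, b.replica_id, b.value)
    return a if key_a >= key_b else b


# Two concurrent writes to the same key on different replicas.
w1 = Versioned("cart=[book]", timestamp=1000, replica_id="n1")
w2 = Versioned("cart=[pen]", timestamp=1001, replica_id="n2")

winner = lww_merge(w1, w2)
print(winner.value)                  # the other write is silently discarded
print(lww_merge(w2, w1) == winner)   # same result regardless of merge order
```

The merge converges, but only by dropping one of the two writes. If the clocks of the writers are skewed, the write that is kept is not necessarily the one that happened last.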
DETECTING CONFLICTS: VECTOR CLOCKS
- In the Dynamo paper [4], each update is done
against a particular version of a data entry.
- Multiple versions of a data entry can exist together.
- Vector clocks [10] are used to track causality.
- If the system can determine the authoritative version itself:
syntactic reconciliation
- If the system cannot reconcile multiple versions, the client must do it:
semantic reconciliation
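
The causality check behind this distinction can be sketched as a comparison of two vector clocks. This is a minimal illustration under stated assumptions: clocks are plain dictionaries from node name to counter, and the node names `sx`, `sy`, `sz` and the helper names `compare` and `merge_clocks` are hypothetical.

```python
def compare(a: dict, b: dict) -> str:
    """Return how clock a relates to clock b: equal, before, after or concurrent."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"   # b supersedes a: syntactic reconciliation
    if b_le_a:
        return "after"    # a supersedes b: syntactic reconciliation
    return "concurrent"   # neither supersedes: needs semantic reconciliation


def merge_clocks(a: dict, b: dict) -> dict:
    """Clock for a reconciled version: element-wise maximum of both clocks."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


d1 = {"sx": 1}
d2 = {"sx": 2}            # written on top of d1
d3 = {"sx": 2, "sy": 1}   # written on top of d2 via node sy
d4 = {"sx": 2, "sz": 1}   # written on top of d2 via node sz

print(compare(d1, d2))
print(compare(d3, d2))
print(compare(d3, d4))
print(merge_clocks(d3, d4))
```

A version whose clock is "before" another can be dropped safely. Two "concurrent" versions must both be kept until something with knowledge of the data merges them.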
RESOLVING CONFLICTS AND EVENTUAL CONVERGENCE
- Write repair
- Read repair
- Anti-entropy
- Merkle trees
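
How Merkle trees help anti-entropy can be sketched as follows: two replicas hash their key ranges into a tree, compare root hashes, and descend only into subtrees that differ. This is an illustration under simplifying assumptions (a power-of-two number of key-range buckets, SHA-256 hashes, hypothetical helper names `build_levels` and `differing_buckets`), not the scheme of any particular database.

```python
import hashlib


def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(buckets):
    """Return tree levels from leaf hashes (index 0) up to the single root hash."""
    n = len(buckets)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two number of buckets")
    levels = [[sha(b) for b in buckets]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([sha(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def differing_buckets(a_levels, b_levels):
    """Descend from the root, following only subtrees whose hashes differ."""
    top = len(a_levels) - 1
    suspects = [0] if a_levels[top][0] != b_levels[top][0] else []
    for depth in range(top - 1, -1, -1):
        suspects = [
            child
            for node in suspects
            for child in (2 * node, 2 * node + 1)
            if a_levels[depth][child] != b_levels[depth][child]
        ]
    return suspects


replica_a = [b"k1=v1", b"k2=v2", b"k3=v3", b"k4=v4"]
replica_b = [b"k1=v1", b"k2=v2", b"k3=STALE", b"k4=v4"]

out_of_sync = differing_buckets(build_levels(replica_a), build_levels(replica_b))
print("buckets to repair:", out_of_sync)
```

When the replicas agree, a single root comparison is enough. When they differ, only the hashes along the differing paths need to be exchanged rather than the whole data set.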
RECAP
- We apply replication to make our systems
performant and fault tolerant.
- Replication suffers from core problems of
distributed systems.
- We can build many replication protocols that
vary on the 2 dimensions we discussed.
- No silver bullet. It is a world of trade-offs.
REFERENCES
[1] Gray, Jim, et al. "The dangers of replication and a solution." ACM SIGMOD Record 25.2 (1996): 173-182.
[2] Shapiro, Marc, et al. "Conflict-free replicated data types." Symposium on Self-Stabilizing Systems. Springer, Berlin, Heidelberg, 2011.
[3] http://docs.basho.com/riak/kv/2.2.0/learn/concepts/crdts/
[4] DeCandia, Giuseppe, et al. "Dynamo: Amazon's highly available key-value store." ACM SIGOPS Operating Systems Review 41.6 (2007): 205-220.
[5] Lamport, Leslie. "Paxos made simple." ACM Sigact News 32.4 (2001): 18-25.
[6] Ongaro, Diego, and John K. Ousterhout. "In Search of an Understandable Consensus Algorithm." USENIX Annual Technical Conference. 2014.
[7] Hunt, Patrick, et al. "ZooKeeper: Wait-free Coordination for Internet-scale Systems." USENIX annual technical conference. Vol. 8. 2010.
[8] http://www.datastax.com/dev/blog/why-cassandra-doesnt-need-vector-clocks
[9] https://aphyr.com/posts/299-the-trouble-with-timestamps
[10] Raynal, Michel, and Mukesh Singhal. "Logical time: Capturing causality in distributed systems." Computer 29.2 (1996): 49-56.
[11] http://kafka.apache.org/documentation.html#replication
[12] http://docs.hazelcast.org/docs/latest/manual/html-single/index.html#managing-cluster-and-member-states
THANKS! Any questions?

Introduction to IEEE STANDARDS and its different types.pptx
 
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxthe ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 

Replication Models and Consistency Trade-Offs

  • 2. REPLICATION - Putting a data set into multiple nodes. - Each replica has a full copy. - A few reasons for replication: - Performance - Availability and fault tolerance - Mostly used with partitioning.
  • 3. NOTHING FOR FREE! - Very easy to do when the data is immutable. - Problems start when we have multiple copies of the data and we want to update them. - Two main difficulties - Handling updates - Handling failures
  • 4. The dangers of replication and a solution - Gray et al. [1] classify replication models by 2 parameters: - Where to make updates: primary copy or update anywhere - When to make updates: eagerly or lazily
  • 5. WHERE: PRIMARY COPY - There is a single replica managing the updates. - Concurrency control is easy. - No conflicts and no conflict-handling logic. - Updates are executed on the primary and secondaries in the same order. - When the primary fails, a new primary is elected. - Ensuring a single and good primary is hard.
  • 6. WHERE: UPDATE ANYWHERE - Each replica can initiate a transaction to make an update. - Complex concurrency control. - Deadlocks or conflicts are possible. - In practice, there is also multi-leader.
  • 7. WHEN: EAGER REPLICATION - Synchronously updates all replicas as part of one atomic transaction. - Provides strong consistency. - Not very flexible. Degree of availability can degrade on node failures. - Consensus algorithms.
  • 8. WHEN: LAZY REPLICATION - Updates each replica with a separate transaction. - Updates can execute quite fast. - Degree of availability is high. - Eventual consistency. - Data copies can diverge. - Data loss or conflicts can occur.
  • 9. WHERE? WHEN?
      - EAGER + PRIMARY COPY: strong consistency, simple concurrency, slow, inflexible
      - EAGER + UPDATE ANYWHERE: strong consistency, complex concurrency, slow, expensive, deadlocks
      - LAZY + PRIMARY COPY: fast, eventual consistency, simple concurrency, inconsistency
      - LAZY + UPDATE ANYWHERE: fast, available, flexible, eventual consistency, inconsistency, conflicts
  • 10. WHERE? WHEN?
      - EAGER + PRIMARY COPY: Multi Paxos [5], etcd and Consul (RAFT) [6], Zookeeper (Zab) [7], Kafka
      - EAGER + UPDATE ANYWHERE: Paxos [5], Hazelcast Cluster State Change [12]
      - LAZY + PRIMARY COPY: Hazelcast, MongoDB, ElasticSearch, Redis
      - LAZY + UPDATE ANYWHERE: Dynamo [4], Cassandra, Riak
  • 11. PRIMARY COPY + EAGER REPLICATION - When the primary fails, secondaries are guaranteed to be up to date. - Raft, Kafka etc. - Majority approach can be used. - In Kafka, in-sync-replica set is maintained. [11] - Secondaries can be used for reads.
  • 12. UPDATE ANYWHERE + EAGER REPLICATION - Each replica can initiate a new transaction. - Concurrent transactions can compete with each other. - Possibility of deadlocks. - In the basic Paxos algorithm, there is no designated leader.
  • 13. PRIMARY COPY + LAZY REPLICATION - The primary copy can execute updates fast. - Secondaries can fall behind the primary. This is called replication lag. - It can lead to data loss during leader failover, but still no conflicts. - Secondaries can be used for reads.
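Replication lag and the resulting data loss on failover can be illustrated with a toy model. This is a minimal sketch, not how any particular database works: the `Primary` and `Secondary` classes, the `pull` method, and `replication_lag` are all hypothetical names introduced for illustration.

```python
from typing import List


class Primary:
    def __init__(self) -> None:
        self.log: List[str] = []

    def write(self, entry: str) -> None:
        # Lazy replication: the write is acknowledged before any secondary has it.
        self.log.append(entry)


class Secondary:
    def __init__(self) -> None:
        self.log: List[str] = []

    def pull(self, primary: Primary, max_entries: int) -> None:
        # Copy at most max_entries not-yet-replicated entries, in primary order.
        start = len(self.log)
        self.log.extend(primary.log[start:start + max_entries])


def replication_lag(primary: Primary, secondary: Secondary) -> int:
    """Number of acknowledged entries the secondary has not applied yet."""
    return len(primary.log) - len(secondary.log)


primary, secondary = Primary(), Secondary()
for entry in ["a", "b", "c"]:
    primary.write(entry)
secondary.pull(primary, max_entries=2)

lag = replication_lag(primary, secondary)
# If the primary fails now and the secondary is promoted, this suffix is lost.
lost_on_failover = primary.log[len(secondary.log):]
print(lag, lost_on_failover)
```

Because the secondary only ever applies a prefix of the primary's log, it can be stale but never in conflict, which matches the "data loss but no conflicts" point above.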
  • 14. UPDATE ANYWHERE + LAZY REPLICATION - Dynamo-style [4] highly available databases. - Quorums - Concurrent updates are first-class citizens. - Possibility of conflicts - Avoiding, discarding, detecting & resolving conflicts - Eventual convergence - Write repair, read repair and anti-entropy
  • 15. QUORUMS - W + R > N - W = 3, R = 1, N = 3 - W = 1, R = 3, N = 3 - W = 2, R = 2, N = 3 - If W or R is not met, consistency may be broken. - Sloppy quorums and hinted handoff. - Even if W and R are met, it can still be broken.
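The condition W + R > N guarantees that every read set shares at least one replica with every write set. The sketch below checks this by brute force for small clusters; `quorums_overlap` is a hypothetical helper name, and the model ignores failures, sloppy quorums, and concurrent writes.

```python
from itertools import combinations


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True if every write set of size w intersects every read set of size r."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# The three configurations from the slide, plus two that violate W + R > N.
for w, r in [(3, 1), (1, 3), (2, 2), (1, 1), (2, 1)]:
    print(f"N=3 W={w} R={r} overlap={quorums_overlap(3, w, r)}")
```

Overlap alone only ensures that a read contacts at least one replica holding the latest acknowledged write. It does not rule out the other anomalies the slide alludes to.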
  • 16. Conflict-free replicated data types (CRDTs) - Special data types that achieve strong eventual consistency and monotonicity [2] - No conflicts - Merge function has 3 properties: - Commutative: A + B = B + A - Associative: A + (B + C) = (A + B) + C - Idempotent: A + A = A - Riak Data Types [3]
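A grow-only counter (G-Counter) is one of the simplest state-based CRDTs and makes the three merge properties concrete. The following is a minimal sketch; the function names and replica ids ("r1", "r2", "r3") are assumptions chosen for illustration, not the API of Riak or any other system.

```python
from typing import Dict

# Replica id -> number of increments that originated at that replica.
GCounter = Dict[str, int]


def increment(counter: GCounter, replica_id: str, amount: int = 1) -> GCounter:
    """Return a new counter with replica_id's own slot increased."""
    updated = dict(counter)
    updated[replica_id] = updated.get(replica_id, 0) + amount
    return updated


def merge(a: GCounter, b: GCounter) -> GCounter:
    """Element-wise maximum: commutative, associative and idempotent."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def value(counter: GCounter) -> int:
    return sum(counter.values())


a = increment({}, "r1", 2)   # replica r1 counted 2
b = increment({}, "r2", 3)   # replica r2 counted 3
c = increment(a, "r3")       # r3 saw r1's state, then counted 1

assert merge(a, b) == merge(b, a)                      # commutative
assert merge(a, merge(b, c)) == merge(merge(a, b), c)  # associative
assert merge(a, a) == a                                # idempotent
print(value(merge(merge(a, b), c)))
```

Since merge order and repetition do not matter, replicas can exchange states in any order and as often as they like and still converge.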
  • 17. DISCARDING CONFLICTS: LAST WRITE WINS - When 2 updates are concurrent, define an arbitrary order among them. - i.e., pretend that one of them is more recent. - Attach a timestamp to each write. - Cassandra uses physical timestamps [8], [9]
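Last write wins can be sketched as a merge that keeps the write with the larger timestamp. This is an illustrative model only: the `Write` class and the use of `replica_id` as a tie-breaker are assumptions, not a description of how Cassandra breaks ties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    value: str
    timestamp: float  # physical clock reading taken at the writer
    replica_id: str   # hypothetical tie-breaker for equal timestamps


def lww_merge(a: Write, b: Write) -> Write:
    """Keep the write with the larger (timestamp, replica_id); discard the other."""
    return max(a, b, key=lambda w: (w.timestamp, w.replica_id))


# y is issued later in real time, but its writer's clock runs behind.
x = Write("cart=[book]", timestamp=100.0, replica_id="r1")
y = Write("cart=[pen]", timestamp=99.5, replica_id="r2")

winner = lww_merge(x, y)
print(winner.value)
```

The example shows the risk of physical timestamps: with skewed clocks, the write that really happened later is silently discarded.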
  • 18. DETECTING CONFLICTS: VECTOR CLOCKS - In Dynamo paper [4], each update is done against a particular version of a data entry. - Multiple versions of a data entry can exist together. - Vector clocks [10] are used to track causality. - The system can determine the authoritative version: syntactic reconciliation - The system cannot reconcile multiple versions: semantic reconciliation
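Causality tracking with vector clocks comes down to a component-wise comparison. In the sketch below, the function names and the node names Sx, Sy and Sz are hypothetical; it only classifies two versions and leaves the actual reconciliation to the caller.

```python
from typing import Dict

# Node id -> number of updates coordinated by that node.
VectorClock = Dict[str, int]


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if a has seen every update that b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    a_ge, b_ge = descends(a, b), descends(b, a)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a supersedes b"  # syntactic reconciliation: keep a
    if b_ge:
        return "b supersedes a"  # syntactic reconciliation: keep b
    return "concurrent"          # needs semantic reconciliation


def merge_clocks(a: VectorClock, b: VectorClock) -> VectorClock:
    """Clock for a reconciled version: element-wise maximum of both inputs."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


d1 = {"Sx": 1}
d2 = {"Sx": 2}            # second update through Sx
d3 = {"Sx": 2, "Sy": 1}   # update of d2 handled by Sy
d4 = {"Sx": 2, "Sz": 1}   # update of d2 handled by Sz, unaware of d3

print(compare(d2, d1))
print(compare(d3, d4))
```

When two versions are concurrent, the client (or application logic) merges their values and writes back under a clock that descends from both, for example `merge_clocks(d3, d4)` with the coordinating node's entry incremented.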
  • 19. Resolving conflicts and EVENTUAL CONVERGENCE - Write repair - Read repair - Anti-entropy - Merkle trees
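Anti-entropy with Merkle trees lets two replicas find out which key ranges differ by exchanging hashes instead of the data itself. The sketch below is a simplification under stated assumptions: the key ranges, the `key=value` leaf encoding and all function names are hypothetical, and it compares only the root hash per range rather than descending the tree to locate individual keys.

```python
import hashlib
from typing import Dict, List, Tuple

KeyRange = Tuple[str, str]  # half-open range [low, high)


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    """Hash leaves pairwise, level by level, until one root hash remains."""
    if not leaves:
        return h(b"")
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def range_root(store: Dict[str, str], key_range: KeyRange) -> bytes:
    low, high = key_range
    leaves = [f"{k}={store[k]}".encode() for k in sorted(store) if low <= k < high]
    return merkle_root(leaves)


def ranges_to_sync(a: Dict[str, str], b: Dict[str, str],
                   ranges: List[KeyRange]) -> List[KeyRange]:
    """Key ranges whose root hashes differ between the two replicas."""
    return [r for r in ranges if range_root(a, r) != range_root(b, r)]


replica_a = {"apple": "1", "kiwi": "2", "pear": "3"}
replica_b = {"apple": "1", "kiwi": "2", "pear": "9"}  # diverged on "pear"
ranges = [("a", "h"), ("h", "p"), ("p", "{")]          # "{" sorts after "z"

print(ranges_to_sync(replica_a, replica_b, ranges))
```

Only the ranges reported as different need to be transferred and repaired, which keeps background synchronization cheap when replicas are mostly in sync.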
  • 20. Recap - We apply replication to make our systems performant and fault tolerant. - Replication suffers from core problems of distributed systems. - We can build many replication protocols that vary on the 2 dimensions we discussed. - No silver bullet. It is a world of trade-offs.
  • 21. REFERENCES
      [1] Gray, Jim, et al. "The dangers of replication and a solution." ACM SIGMOD Record 25.2 (1996): 173-182.
      [2] Shapiro, Marc, et al. "Conflict-free replicated data types." Symposium on Self-Stabilizing Systems. Springer, Berlin, Heidelberg, 2011.
      [3] http://docs.basho.com/riak/kv/2.2.0/learn/concepts/crdts/
      [4] DeCandia, Giuseppe, et al. "Dynamo: Amazon's highly available key-value store." ACM SIGOPS Operating Systems Review 41.6 (2007): 205-220.
      [5] Lamport, Leslie. "Paxos made simple." ACM SIGACT News 32.4 (2001): 18-25.
      [6] Ongaro, Diego, and John K. Ousterhout. "In Search of an Understandable Consensus Algorithm." USENIX Annual Technical Conference. 2014.
      [7] Hunt, Patrick, et al. "ZooKeeper: Wait-free Coordination for Internet-scale Systems." USENIX Annual Technical Conference. Vol. 8. 2010.
      [8] http://www.datastax.com/dev/blog/why-cassandra-doesnt-need-vector-clocks
      [9] https://aphyr.com/posts/299-the-trouble-with-timestamps
      [10] Raynal, Michel, and Mukesh Singhal. "Logical time: Capturing causality in distributed systems." Computer 29.2 (1996): 49-56.
      [11] http://kafka.apache.org/documentation.html#replication
      [12] http://docs.hazelcast.org/docs/latest/manual/html-single/index.html#managing-cluster-and-member-states