A talk given at JDay Lviv 2015 in Ukraine; originally developed by Yoav Abrahami, and based on the works of Kyle "Aphyr" Kingsbury:
Consistency, availability and partition tolerance: these seemingly innocuous concepts have been giving engineers and researchers of distributed systems headaches for over 15 years. But despite how important they are to the design and architecture of modern software, they are still poorly understood by many engineers.
This session covers the definition and practical ramifications of the CAP theorem; you may think that this has nothing to do with you because you "don't work on distributed systems", or possibly that it doesn't matter because you "run over a local network." Yet even traditional enterprise CRUD applications must obey the laws of physics, which are exactly what the CAP theorem describes. Know the rules of the game and they'll serve you well, or ignore them at your own peril...
5. By Example
• I want this book!
– I add it to the cart
– Then continue browsing
• There’s only one copy in stock!
6. By Example
• I want this book!
– I add it to the cart
– Then continue browsing
• There’s only one copy in stock!
• … and someone else just bought it.
8. Consistency: Defined
• In a consistent system: all participants see the same value at the same time
• “Do you have this book in stock?”
9. Consistency: Defined
• If our book store is an inconsistent system:
– Two customers may buy the book
– But there’s only one item in inventory!
• We’ve just violated a business constraint.
11. Availability: Defined
• An available system:
– Is reachable
– Responds to requests (within SLA)
• Availability does not guarantee success!
– The operation may fail
– “This book is no longer available”
12. Availability: Defined
• What if the system is unavailable?
– I complete the checkout
– And click on “Pay”
– And wait
– And wait some more
– And…
• Did I purchase the book or not?!
14. Partition Tolerance: Defined
• Partition: one or more nodes are unreachable
• No practical system runs on a single node
• So all systems are susceptible!
15. “The Network is Reliable”
• All four failure modes happen in an IP network: drop, delay, duplicate, reorder
• To a client, delays and drops are the same
• Perfect failure detection is provably impossible1!
1 “Impossibility of Distributed Consensus with One Faulty Process”, Fischer, Lynch and Paterson
16. Partition Tolerance: Reified
• External causes:
– Bad network config
– Faulty equipment
– Scheduled maintenance
• Even software causes partitions:
– Bad network config.
– GC pauses
– Overloaded servers
• Plenty of war stories!
– Netflix
– Twilio
– GitHub
– Wix :-)
• Some hard numbers1:
– 5.2 failed devices/day
– 59K lost packets/day
– Adding redundancy only improves by 40%
1 “Understanding Network Failures in Data Centers: Measurement, Analysis, and Implications”, Gill et al
18. In Pictures
• Let’s consider a simple system:
– Service A writes values
– Service B reads values
– Values are replicated between nodes
• These are “ideal” systems
– Bug-free, predictable
19. In Pictures
• “Sunny day scenario”:
– A writes a new value V1
– The value is replicated to node 2
– B reads the new value
20. In Pictures
• What happens if the network drops?
– A writes a new value V1
– Replication fails
– B still sees the old value
– The system is inconsistent
21. In Pictures
• Possible mitigation: synchronous replication
– A writes a new value V1
– Cannot replicate, so the write is rejected
– Both A and B still see V0
– The system is logically unavailable
23. The network is not reliable
• Distributed systems must handle partitions
• Any modern system runs on more than one node…
• … and is therefore distributed
• Ergo, you have to choose:
– Consistency over availability
– Availability over consistency
24. Granularity
• Real systems comprise many operations
– “Add book to cart”
– “Pay for the book”
• Each has different properties
• It’s a spectrum, not a binary choice!
[Spectrum from Consistency to Availability: Checkout sits near the consistency end, Shopping Cart near the availability end]
25. CAP IN THE REAL WORLD
Kyle “Aphyr” Kingsbury: breaking consistency guarantees since 2013
26. PostgreSQL
• Traditional RDBMS
– Transactional
– ACID compliant
• Primarily a CP system
– Writes against a master node
• “Not a distributed system”
– Except with a client at play!
27. PostgreSQL
• Writes are a simplified 2PC:
– Client votes to commit
– Server validates transaction
– Server stores changes
– Server acknowledges commit
– Client receives acknowledgement
28. PostgreSQL
• But what if the ack is never received?
• The commit is already stored…
• … but the client has no indication!
• The system is in an inconsistent state
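To make the failure mode concrete, here is a minimal JDBC sketch of the client’s view (the connection URL, table and values are invented for illustration): if the acknowledgement is lost mid-commit, the catch block cannot tell whether the transaction was durably applied.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class LostAckDemo {
    public static void main(String[] args) throws SQLException {
        // Hypothetical connection details, for illustration only.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/shop", "shop", "secret")) {
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPDATE inventory SET stock = stock - 1 WHERE book_id = ?")) {
                ps.setLong(1, 42L);
                ps.executeUpdate();
            }
            try {
                conn.commit();
                // Ack received: the write definitely happened.
            } catch (SQLException e) {
                // Ack lost (e.g. the network dropped mid-commit): the server may
                // or may not have durably stored the transaction, and the client
                // cannot tell the difference from here.
            }
        }
    }
}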
29. PostgreSQL
• Let’s experiment!
• 5 clients write to a PostgreSQL instance
• We then drop the server from the network
• Results:
– 1000 writes
– 950 acknowledged
– 952 survivors
30. So what can we do?
1. Accept false-negatives
– May not be acceptable for your use case!
2. Use idempotent operations
3. Apply unique transaction IDs
– Query state after the partition is resolved (see the sketch below)
• These strategies apply to any RDBMS
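One possible shape for strategy 3, assuming a hypothetical orders table keyed by a client-generated transaction ID; once the partition resolves, the client asks the server whether the in-doubt write actually made it.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.UUID;

// Hypothetical schema: CREATE TABLE orders (txn_id text PRIMARY KEY, book_id bigint);
public class TaggedWrite {
    static String buyBook(Connection conn, long bookId) throws SQLException {
        conn.setAutoCommit(false);
        String txnId = UUID.randomUUID().toString();   // unique per logical operation
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO orders (txn_id, book_id) VALUES (?, ?)")) {
            ps.setString(1, txnId);
            ps.setLong(2, bookId);
            ps.executeUpdate();
        }
        conn.commit();                                  // the ack may be lost right here
        return txnId;
    }

    // Run on a fresh connection after the partition is resolved.
    static boolean wasApplied(Connection conn, String txnId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT 1 FROM orders WHERE txn_id = ?")) {
            ps.setString(1, txnId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();                       // true: the write survived
            }
        }
    }
}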
31. MongoDB
• A document-oriented database
• Availability/scale via replica sets
– Client writes to a master node
– Master replicates writes to n replicas
• User-selectable consistency guarantees
32. MongoDB
• When a partition occurs:
– If the master is in the minority, it is demoted
– The majority promotes a new master…
– … selected by the highest optime
33. MongoDB
• The cluster “heals” after partition resolution:
– The “old” master rejoins the cluster
– Acknowledged minority writes are reverted!
34. MongoDB
• Let’s experiment!
• Set up a 5-node MongoDB cluster
• 5 clients write to the cluster
• We then partition the cluster
• … and restore it to see what happens
35. MongoDB
• With write concern unacknowledged:
– Server does not ack writes (except TCP)
– The default prior to November 2012
• Results:
– 6000 writes
– 5700 acknowledged
– 3319 survivors
– 42% data loss!
36. MongoDB
• With write concern acknowledged:
– Server acknowledges writes (after store)
– The default guarantee
• Results:
– 6000 writes
– 5900 acknowledged
– 3692 survivors
– 37% data loss!
37. MongoDB
• With write concern replica acknowledged:
– Client specifies minimum replicas
– Server acks after writes to replicas
• Results:
– 6000 writes
– 5695 acknowledged
– 3768 survivors
– 33% data loss!
38. MongoDB
• With write concern majority:
– For an n-node cluster, requires acknowledgement from a majority (more than n/2) of nodes
– Also called “quorum”
• Results:
– 6000 writes
– 5700 acknowledged
– 5701 survivors
– No data loss
39. So what can we do?
1. Keep calm and carry on
– As Aphyr puts it, “not all applications need consistency”
– Have a reliable backup strategy
– … and make sure you drill restores!
2. Use write concern majority
– And take the performance hit
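For reference, opting into the majority write concern with the MongoDB Java driver looks roughly like this; the database, collection and document are made up, and driver APIs have shifted between versions, so treat it as a sketch.
import com.mongodb.WriteConcern;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class MajorityWriteDemo {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> orders = client
                    .getDatabase("shop")
                    .getCollection("orders")
                    .withWriteConcern(WriteConcern.MAJORITY);  // wait for a quorum
            // insertOne only returns once a majority of replica set members
            // have acknowledged the write: better durability, higher latency.
            orders.insertOne(new Document("bookId", 42).append("qty", 1));
        }
    }
}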
40. The prime suspects
• Aphyr’s Jepsen tests include:
– Redis
– Riak
– Zookeeper
– Kafka
– Cassandra
– RabbitMQ
– etcd (and consul)
– ElasticSearch
• If you’re considering them, go read his posts
• In fact, go read his posts regardless
http://aphyr.com/tags/jepsen
42. Immutable Data
• Immutable (adj.): “Unchanging over time or unable to be changed.”
• Meaning:
– No deletes
– No updates
– No merge conflicts
– Replication is trivial
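As a toy illustration of the idea (all names hypothetical): a cart modelled as an append-only log of immutable events never needs an update or a delete, and the current state is derived by folding over the recorded facts.
import java.time.Instant;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class CartEventLog {
    // An immutable fact about the past; its fields never change once created.
    record CartEvent(String cartId, String bookId, int delta, Instant at) {}

    private final List<CartEvent> log = new CopyOnWriteArrayList<>();

    void append(CartEvent event) {
        log.add(event);                       // append-only: no update, no delete
    }

    // The current quantity is derived by summing the relevant events.
    long quantityOf(String cartId, String bookId) {
        return log.stream()
                .filter(e -> e.cartId().equals(cartId) && e.bookId().equals(bookId))
                .mapToLong(CartEvent::delta)
                .sum();
    }
}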
43. Idempotence
• An idempotent operation:
– Can be applied one or more times with the same effect
• Enables retries
• Not always possible
– Side-effects are key
– Consider: payments
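A toy contrast between a non-idempotent and an idempotent operation (store and method names are invented for illustration); only the second can be blindly retried after an ambiguous failure.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class IdempotenceDemo {
    private final Map<String, Integer> stock = new ConcurrentHashMap<>();

    // NOT idempotent: a retried request decrements the inventory twice.
    void decrementStock(String bookId) {
        stock.merge(bookId, -1, Integer::sum);
    }

    // Idempotent: setting an absolute value can be retried safely.
    void setStock(String bookId, int quantity) {
        stock.put(bookId, quantity);
    }

    public static void main(String[] args) {
        IdempotenceDemo demo = new IdempotenceDemo();
        demo.setStock("cap-book", 1);
        demo.decrementStock("cap-book");
        demo.decrementStock("cap-book");            // a retry: applied twice
        System.out.println(demo.stock);             // {cap-book=-1}
        demo.setStock("cap-book", 0);
        demo.setStock("cap-book", 0);               // a retry: no harm done
        System.out.println(demo.stock);             // {cap-book=0}
    }
}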
44. Eventual Consistency
• A design which prefers availability
• … but guarantees that clients will eventually see consistent reads
• Consider git:
– Always available locally
– Converges via push/pull
– Human conflict resolution
45. Eventual Consistency
• The system expects data to diverge
• … and includes mechanisms to regain convergence
– Partial ordering to minimize conflicts
– A merge function to resolve conflicts
46. Vector Clocks
• A technique for partial ordering
• Each node has a logical clock
– The clock increases on every write
– Track the last observed clocks for each item
– Include this vector on replication
• When neither the observed nor the inbound vector descends from the other, we have a conflict
• This lets us know when history diverged
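A minimal vector clock sketch (node identifiers and method names are illustrative, not any particular library's API), showing the comparison that detects diverged histories.
import java.util.HashMap;
import java.util.Map;

public class VectorClock {
    private final Map<String, Long> counters = new HashMap<>();

    // Increment this node's logical clock on every local write.
    void tick(String nodeId) {
        counters.merge(nodeId, 1L, Long::sum);
    }

    long get(String nodeId) {
        return counters.getOrDefault(nodeId, 0L);
    }

    // True if this clock has seen at least everything the other clock has,
    // i.e. this[i] >= other[i] for every node i.
    boolean descendsFrom(VectorClock other) {
        for (Map.Entry<String, Long> e : other.counters.entrySet()) {
            if (get(e.getKey()) < e.getValue()) return false;
        }
        return true;
    }

    // Neither clock descends from the other: the histories diverged.
    boolean conflictsWith(VectorClock other) {
        return !this.descendsFrom(other) && !other.descendsFrom(this);
    }

    public static void main(String[] args) {
        VectorClock a = new VectorClock();
        VectorClock b = new VectorClock();
        a.tick("node1");                            // write seen only by node1
        b.tick("node2");                            // concurrent write seen only by node2
        System.out.println(a.conflictsWith(b));     // true: we have a conflict
    }
}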
47. CRDTs
• Commutative Replicated Data Types1
• A CRDT is a data structure that:
– Eventually converges to a consistent state
– Guarantees no conflicts on replication
1 “A comprehensive study of Convergent and Commutative Replicated Data Types”, Shapiro et al
48. CRDTs
• CRDTs provide specialized semantics:
– G-Counter: Monotonically increasing counter (see the sketch after this list)
– PN-Counter: Also supports decrements
– G-Set: A set that only supports adds
– 2P-Set: Supports removals, but each element can be removed only once
• OR-Sets are particularly useful
– Keeps track of both additions and removals
– Can be used for shopping carts
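Finally, a sketch of the G-Counter mentioned above (the API is illustrative): each node increments only its own slot, and merge takes the per-slot maximum, so replicas converge regardless of the order in which they exchange state.
import java.util.HashMap;
import java.util.Map;

public class GCounter {
    private final String nodeId;
    private final Map<String, Long> counts = new HashMap<>();

    GCounter(String nodeId) {
        this.nodeId = nodeId;
    }

    // Each node only ever increments its own slot.
    void increment() {
        counts.merge(nodeId, 1L, Long::sum);
    }

    // The counter's value is the sum over all slots.
    long value() {
        return counts.values().stream().mapToLong(Long::longValue).sum();
    }

    // Merge is commutative, associative and idempotent: take the max per slot.
    void merge(GCounter other) {
        other.counts.forEach((node, count) -> counts.merge(node, count, Math::max));
    }

    public static void main(String[] args) {
        GCounter a = new GCounter("node-a");
        GCounter b = new GCounter("node-b");
        a.increment();
        a.increment();                       // two increments observed by node-a
        b.increment();                       // one increment observed by node-b
        a.merge(b);
        b.merge(a);                          // replicate state in both directions
        System.out.println(a.value() + " " + b.value());   // 3 3: converged
    }
}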
50. WE’RE DONE HERE!
Thank you for listening
tomer@tomergabel.com
@tomerg
http://il.linkedin.com/in/tomergabel
Aphyr’s “Call Me Maybe” blog posts:
http://aphyr.com/tags/jepsen