2. Durability
[Diagram: a client issues set(university, UPC) to a Data Store holding (university, KTH); the store replies Ack, and a subsequent get(university) returns UPC.]
3. Durability
[Diagram: set(university, UPC) to a Data Store backed by commodity, non-volatile storage; the store replies Ack.]
4. Durability
[Diagram: set(myKey, U) to a Data Store on commodity hardware; the store replies Ack.]
5. Durability
[Diagram: disk access time = seek time + rotational time + transfer time; both writes and reads against the disk are SLOW.]
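The three terms above can be put into rough numbers. A back-of-the-envelope sketch (the 9 ms seek, 7200 RPM, and 150 MB/s transfer figures are illustrative assumptions, not from the slides):

```python
# Rough disk access-time model: seek + rotational + transfer (illustrative numbers).
AVG_SEEK_MS = 9.0        # assumed average seek time
RPM = 7200               # assumed spindle speed
TRANSFER_MB_S = 150.0    # assumed sequential transfer rate

# Average rotational latency is half a revolution.
rotational_ms = 0.5 * (60_000.0 / RPM)   # about 4.17 ms at 7200 RPM

def access_time_ms(size_kb: float) -> float:
    """Time to service one random I/O of `size_kb` kilobytes."""
    transfer_ms = (size_kb / 1024.0) / TRANSFER_MB_S * 1000.0
    return AVG_SEEK_MS + rotational_ms + transfer_ms

# A small 4 KB random write pays ~13 ms of mechanical latency
# before almost any data moves -- hence "SLOW".
print(round(access_time_ms(4), 2))
```

The point of the slide: for small writes the seek and rotational terms dominate, so random disk I/O is bounded by mechanics, not bandwidth.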
6. Cache in memory
[Diagram: an in-memory cache of objects sits in front of the primary copy; reads become fast, writes stay slow, and keeping the cached objects consistent with the primary copy is the open question.]
7. Cache in memory
[Diagram: look-aside caching. Application servers read Obj A from the memcache servers; a cache miss falls through to the MySQL servers, after which the application sets Obj A in the cache; an update of Obj A goes to MySQL and deletes Obj A from the cache. Drawbacks: stale data, spending resources on cache maintenance, complicates development, and writes are still slow.]
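The look-aside pattern in the diagram can be sketched in a few lines; here plain dicts stand in for the memcache and MySQL tiers, and the function names are ours:

```python
# Look-aside caching: a read tries the cache first; a miss falls through to the
# database and populates the cache; a write updates the database and deletes
# (rather than updates) the cached copy, so the next read re-fetches fresh data.
cache = {}   # stands in for the memcache tier
db = {}      # stands in for the MySQL tier

def read(key):
    if key in cache:             # cache hit: fast path
        return cache[key]
    value = db.get(key)          # cache miss: slow path to the database
    if value is not None:
        cache[key] = value       # populate for later readers
    return value

def write(key, value):
    db[key] = value              # writes always hit the database (still slow)
    cache.pop(key, None)         # invalidate to avoid serving stale data

write("ObjA", 1)
assert read("ObjA") == 1         # miss, filled from the database
write("ObjA", 2)                 # update invalidates the cached copy
assert read("ObjA") == 2         # next read sees the fresh value
```

Deleting instead of updating on write is what makes the scheme simple, but it is also why the slide lists stale data and wasted cache-miss work as drawbacks.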
9. Approaches towards durability
• Periodic snapshots (State A → snapshot → State B → snapshot): data loss of everything after the last snapshot
• Synchronous logging (log every write before acking): slow
• Asynchronous logging (buffer logs, flush later): data loss
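The difference between the last two approaches is whether the log entry reaches stable storage before the client is acknowledged. A minimal sketch against a local file (file name and helper names are ours):

```python
import os
import tempfile

# Synchronous logging: fsync before acknowledging -> durable but slow.
# Asynchronous logging: buffered write, ack immediately -> fast, but a crash
# can lose whatever the OS had not flushed yet.

def log_sync(f, entry: bytes) -> None:
    f.write(entry)
    f.flush()
    os.fsync(f.fileno())   # force the entry to the device before we ack

def log_async(f, entry: bytes) -> None:
    f.write(entry)         # sits in user/OS buffers; durable only eventually

path = os.path.join(tempfile.mkdtemp(), "wal.log")
with open(path, "ab") as f:
    log_sync(f, b"set university UPC\n")
    log_async(f, b"set myKey U\n")
```

As the speaker notes later point out, even an fsync'd write may still sit in the disk's own write cache unless that cache is disabled, so "synchronous" is weaker than it looks.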
11. Project Goals
• Durable writes
• Low latency
• Availability: able to recover quickly
• Cheap, commodity hardware
12. Target systems
• Data is big = many machines
• Read dominant workload
• Simple key-value store
• Small writes
– Example: Facebook
• Terabytes of data = 2000 memcache servers
• Write/read ratio < 6%
• Memcache is a key-value store
• Status updates, photo tags, profile updates, etc.
18. Design Decision
[Diagram: the client's write is Acked by the Database only after the log entry is written.]
Two problems:
1) Logging: synchronous logging is slow; asynchronous logging risks data loss.
2) Data availability: addressed by replication.
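The design decision the slide points at, which is to acknowledge a write only once the log entry sits in the memory of several machines, can be sketched as follows (the replica list and function names are our assumptions):

```python
# Replicate-then-ack: a write is acknowledged once every replica holds the log
# entry in memory. Durability comes from machines failing independently, not
# from a synchronous disk write on the critical path.
REPLICATION_FACTOR = 3

# One in-memory log per machine; in the real system these live on
# separate servers and the appends below are network RPCs.
replicas = [[] for _ in range(REPLICATION_FACTOR)]

def durable_write(entry) -> str:
    acks = 0
    for log in replicas:
        log.append(entry)       # entry now survives any single-machine crash
        acks += 1
    if acks == REPLICATION_FACTOR:
        return "Ack"            # client unblocks; persisting to disk is async
    raise RuntimeError("replication failed")

assert durable_write(("university", "UPC")) == "Ack"
assert all(log[-1] == ("university", "UPC") for log in replicas)
```

This matches the speaker notes: if some process crashes, another machine still holds the entry and can persist it later.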
27. Log Server
[Diagram: the Log Server pipeline. A Receiver accepts incoming log entries (… 3, 2, 1), a Persister appends them to disk with sequential writes (avoiding seek time), and a Reader streams them back in order.]
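The Persister avoids seek time by only ever appending: the file position always moves forward, and the Reader replays the entries front to back. A minimal append-only log sketch (the class and file names are ours):

```python
import os
import tempfile

class AppendOnlyLog:
    """Sequential writes only: entries are appended at the tail, so the disk
    head never seeks between writes -- the point of the Persister."""

    def __init__(self, path: str):
        self.path = path
        self.f = open(path, "ab")

    def persist(self, entry: bytes) -> None:
        self.f.write(entry + b"\n")      # sequential append, no seek
        self.f.flush()
        os.fsync(self.f.fileno())        # entry is on stable storage

    def replay(self):
        """Reader: scan the log front to back, returning entries in order."""
        with open(self.path, "rb") as r:
            return [line.rstrip(b"\n") for line in r]

log = AppendOnlyLog(os.path.join(tempfile.mkdtemp(), "segment.log"))
for i in (1, 2, 3):
    log.persist(b"entry-%d" % i)
assert log.replay() == [b"entry-1", b"entry-2", b"entry-3"]
```

Because appends are sequential, the per-write cost is dominated by transfer time rather than the seek and rotational terms from the earlier slide.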
28. Forming storage units
1. Query ZooKeeper
2. Get the list of servers (ID1, ID2, ID3)
3. Leader sends a request to the members
4. Leader sends the list of members
5. Upload the storage-unit data
6. Start the service
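The six steps above can be simulated without a real ZooKeeper deployment; in this sketch a plain dict plays the role of ZooKeeper's registry of live servers, and all names are our assumptions:

```python
# Simulation of forming a storage unit. In the real system, steps 1-2 are
# ZooKeeper queries and steps 3-6 are RPCs from the leader to the members.
zookeeper = {"/servers": ["ID1", "ID2", "ID3", "ID4"]}   # registered log servers

def form_storage_unit(replication_factor: int = 3) -> dict:
    servers = zookeeper["/servers"]           # 1-2: query, get the server list
    members = servers[:replication_factor]    # 3: leader requests these members
    return {"members": members, "data": {}}   # 4: leader sends the member list

def start_unit(unit: dict, initial_data: dict) -> dict:
    unit["data"].update(initial_data)         # 5: upload the storage-unit data
    unit["running"] = True                    # 6: start the service
    return unit

unit = start_unit(form_storage_unit(), {"university": "UPC"})
assert unit["members"] == ["ID1", "ID2", "ID3"]
assert unit["running"] and unit["data"]["university"] == "UPC"
```

Picking the first three registered servers is only a stand-in for whatever membership policy the leader actually applies.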
29. Storage System
[Diagram: the storage system as a whole: several clients, ZooKeeper, and four stable storage units.]
30. Failover
[Diagram: a client in front of two stable storage units. The first unit's log servers ID 1, ID 2, ID 3 carry 50%, 20%, 30% of its load; the second unit's ID 4, ID 5, ID 6 carry 40%, 45%, 20%.]
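The failover slides amount to the client re-routing a failed server's share of the load onto the survivors. One plausible policy, sketched with the weights shown on the slide (the proportional-redistribution rule itself is our assumption):

```python
# Client-side failover: each stable storage unit spreads load across its log
# servers by weight; when a server fails, its share is redistributed among
# the survivors in proportion to their original weights.
weights = {"ID1": 0.50, "ID2": 0.20, "ID3": 0.30}   # load shares from the slide

def redistribute(weights: dict, failed: str) -> dict:
    """Drop `failed` and renormalize the survivors' weights to sum to 1."""
    survivors = {s: w for s, w in weights.items() if s != failed}
    total = sum(survivors.values())
    return {s: w / total for s, w in survivors.items()}

after = redistribute(weights, "ID1")     # ID1 crashes; ID2/ID3 absorb its 50%
assert abs(after["ID2"] - 0.4) < 1e-9    # 0.20 / 0.50
assert abs(after["ID3"] - 0.6) < 1e-9    # 0.30 / 0.50
```

Because the survivors' shares are renormalized rather than recomputed, no coordination with the other storage unit is needed during failover.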
33. Evaluation
• Throughput and latency of stable storage unit
– Log entry sizes
– Replication factors
• Comparison with a WAL to the local disk
Periodic snapshots degrade performance at the time of the snapshot and generate load spikes on the machine.
Important not to try to be all things to all people: clients might be demanding 8 different things; doing 6 of them is easy; handling 7 of them requires real thought; dealing with all 8 usually results in a worse system (more complex, and it compromises other clients in trying to satisfy everyone). E.g. Facebook: 800 memcache servers in 2008, 2000 now, with a write ratio < 6%. Updates are small (tag, add friend, new ads, status, profile update, sharing).
After the log is replicated in the memory of several machines, an ack is sent to the client. If some of the processes crash, some other process on another machine will still persist the data. Several replicas provide better availability of the data at recovery time, and aggregating the read bandwidth of the servers accelerates recovery.
Adding a replica doesn't introduce a bottleneck and does not impact throughput.
Scalability
Replication factor of three
A common approach is a WAL to the local disk; Redis is an example of a popular in-memory database that uses a WAL to disk. To guarantee the durability of every log entry, it should be written to disk upon every write operation. Even when the log is written to disk there is no guarantee that it is persisted, because by default the disk caches are enabled. Process crash (1.7), also power outage (49); no availability if the server is down. Ours is a factor of 4 better than disk with the cache disabled. Saturation can be prevented.