2. Durability
[Diagram: a client issues set(university, UPC) to a Data Store holding (university, KTH); the store replies Ack, and a subsequent get(university) returns UPC.]
3. Durability
[Diagram: set(university, UPC) to a Data Store backed by commodity, non-volatile storage; the store replies Ack.]
4. Durability
[Diagram: set(myKey, U) to a Data Store on commodity hardware; the store replies Ack.]
5. Durability
[Diagram: disk access time = seek time + rotational time + transfer time; both writes and reads against the disk are SLOW.]
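The three terms above can be put into rough numbers. A back-of-the-envelope sketch (the 9 ms seek, 7200 RPM, and 150 MB/s transfer figures are illustrative assumptions, not from the slides):

```python
# Rough disk access-time model: seek + rotational + transfer (illustrative numbers).
AVG_SEEK_MS = 9.0        # assumed average seek time
RPM = 7200               # assumed spindle speed
TRANSFER_MB_S = 150.0    # assumed sequential transfer rate

# Average rotational latency is half a revolution.
rotational_ms = 0.5 * (60_000.0 / RPM)   # about 4.17 ms at 7200 RPM

def access_time_ms(size_kb: float) -> float:
    """Time to service one random I/O of `size_kb` kilobytes."""
    transfer_ms = (size_kb / 1024.0) / TRANSFER_MB_S * 1000.0
    return AVG_SEEK_MS + rotational_ms + transfer_ms

# A small 4 KB random write pays ~13 ms of mechanical latency
# before almost any data moves -- hence "SLOW".
print(round(access_time_ms(4), 2))
```

The point of the slide: for small writes the seek and rotational terms dominate, so random disk I/O is bounded by mechanics, not bandwidth.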
6. Cache in memory
[Diagram: an in-memory cache of objects sits in front of the primary copy; reads become fast, writes stay slow, and keeping the cached objects consistent with the primary copy is the open question.]
7. Cache in memory
[Diagram: look-aside caching. Application servers read Obj A from the memcache servers; a cache miss falls through to the MySQL servers, after which the application sets Obj A in the cache; an update of Obj A goes to MySQL and deletes Obj A from the cache. Drawbacks: stale data, spending resources on cache maintenance, complicates development, and writes are still slow.]
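The look-aside pattern in the diagram can be sketched in a few lines; here plain dicts stand in for the memcache and MySQL tiers, and the function names are ours:

```python
# Look-aside caching: a read tries the cache first; a miss falls through to the
# database and populates the cache; a write updates the database and deletes
# (rather than updates) the cached copy, so the next read re-fetches fresh data.
cache = {}   # stands in for the memcache tier
db = {}      # stands in for the MySQL tier

def read(key):
    if key in cache:             # cache hit: fast path
        return cache[key]
    value = db.get(key)          # cache miss: slow path to the database
    if value is not None:
        cache[key] = value       # populate for later readers
    return value

def write(key, value):
    db[key] = value              # writes always hit the database (still slow)
    cache.pop(key, None)         # invalidate to avoid serving stale data

write("ObjA", 1)
assert read("ObjA") == 1         # miss, filled from the database
write("ObjA", 2)                 # update invalidates the cached copy
assert read("ObjA") == 2         # next read sees the fresh value
```

Deleting instead of updating on write is what makes the scheme simple, but it is also why the slide lists stale data and wasted cache-miss work as drawbacks.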
9. Approaches towards durability
• Periodic snapshots (State A → snapshot → State B → snapshot): data loss of everything after the last snapshot
• Synchronous logging (log every write before acking): slow
• Asynchronous logging (buffer logs, flush later): data loss
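The difference between the last two approaches is whether the log entry reaches stable storage before the client is acknowledged. A minimal sketch against a local file (file name and helper names are ours):

```python
import os
import tempfile

# Synchronous logging: fsync before acknowledging -> durable but slow.
# Asynchronous logging: buffered write, ack immediately -> fast, but a crash
# can lose whatever the OS had not flushed yet.

def log_sync(f, entry: bytes) -> None:
    f.write(entry)
    f.flush()
    os.fsync(f.fileno())   # force the entry to the device before we ack

def log_async(f, entry: bytes) -> None:
    f.write(entry)         # sits in user/OS buffers; durable only eventually

path = os.path.join(tempfile.mkdtemp(), "wal.log")
with open(path, "ab") as f:
    log_sync(f, b"set university UPC\n")
    log_async(f, b"set myKey U\n")
```

As the speaker notes later point out, even an fsync'd write may still sit in the disk's own write cache unless that cache is disabled, so "synchronous" is weaker than it looks.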
11. Project Goals
• Durable writes
• Low latency
• Availability: able to recover quickly
• Cheap, commodity hardware
12. Target systems
• Data is big = many machines
• Read dominant workload
• Simple key-value store
• Small writes
– Example: Facebook
• Terabytes of data = 2000 memcache servers
• Write/read ratio < 6%
• Memcache is a key-value store
• Status updates, photo tags, profile updates, etc.
18. Design Decision
[Diagram: the client's write is Acked by the Database only after the log entry is written.]
Two problems:
1) Logging: synchronous logging is slow; asynchronous logging risks data loss.
2) Data availability: addressed by replication.
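The design decision the slide points at, which is to acknowledge a write only once the log entry sits in the memory of several machines, can be sketched as follows (the replica list and function names are our assumptions):

```python
# Replicate-then-ack: a write is acknowledged once every replica holds the log
# entry in memory. Durability comes from machines failing independently, not
# from a synchronous disk write on the critical path.
REPLICATION_FACTOR = 3

# One in-memory log per machine; in the real system these live on
# separate servers and the appends below are network RPCs.
replicas = [[] for _ in range(REPLICATION_FACTOR)]

def durable_write(entry) -> str:
    acks = 0
    for log in replicas:
        log.append(entry)       # entry now survives any single-machine crash
        acks += 1
    if acks == REPLICATION_FACTOR:
        return "Ack"            # client unblocks; persisting to disk is async
    raise RuntimeError("replication failed")

assert durable_write(("university", "UPC")) == "Ack"
assert all(log[-1] == ("university", "UPC") for log in replicas)
```

This matches the speaker notes: if some process crashes, another machine still holds the entry and can persist it later.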
27. Log Server
[Diagram: the Log Server pipeline. A Receiver accepts incoming log entries (… 3, 2, 1), a Persister appends them to disk with sequential writes (avoiding seek time), and a Reader streams them back in order.]
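The Persister avoids seek time by only ever appending: the file position always moves forward, and the Reader replays the entries front to back. A minimal append-only log sketch (the class and file names are ours):

```python
import os
import tempfile

class AppendOnlyLog:
    """Sequential writes only: entries are appended at the tail, so the disk
    head never seeks between writes -- the point of the Persister."""

    def __init__(self, path: str):
        self.path = path
        self.f = open(path, "ab")

    def persist(self, entry: bytes) -> None:
        self.f.write(entry + b"\n")      # sequential append, no seek
        self.f.flush()
        os.fsync(self.f.fileno())        # entry is on stable storage

    def replay(self):
        """Reader: scan the log front to back, returning entries in order."""
        with open(self.path, "rb") as r:
            return [line.rstrip(b"\n") for line in r]

log = AppendOnlyLog(os.path.join(tempfile.mkdtemp(), "segment.log"))
for i in (1, 2, 3):
    log.persist(b"entry-%d" % i)
assert log.replay() == [b"entry-1", b"entry-2", b"entry-3"]
```

Because appends are sequential, the per-write cost is dominated by transfer time rather than the seek and rotational terms from the earlier slide.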
28. Forming storage units
1. Query ZooKeeper
2. Get the list of servers (ID1, ID2, ID3)
3. Leader sends a request to the members
4. Leader sends the list of members
5. Upload the storage-unit data
6. Start the service
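The six steps above can be simulated without a real ZooKeeper deployment; in this sketch a plain dict plays the role of ZooKeeper's registry of live servers, and all names are our assumptions:

```python
# Simulation of forming a storage unit. In the real system, steps 1-2 are
# ZooKeeper queries and steps 3-6 are RPCs from the leader to the members.
zookeeper = {"/servers": ["ID1", "ID2", "ID3", "ID4"]}   # registered log servers

def form_storage_unit(replication_factor: int = 3) -> dict:
    servers = zookeeper["/servers"]           # 1-2: query, get the server list
    members = servers[:replication_factor]    # 3: leader requests these members
    return {"members": members, "data": {}}   # 4: leader sends the member list

def start_unit(unit: dict, initial_data: dict) -> dict:
    unit["data"].update(initial_data)         # 5: upload the storage-unit data
    unit["running"] = True                    # 6: start the service
    return unit

unit = start_unit(form_storage_unit(), {"university": "UPC"})
assert unit["members"] == ["ID1", "ID2", "ID3"]
assert unit["running"] and unit["data"]["university"] == "UPC"
```

Picking the first three registered servers is only a stand-in for whatever membership policy the leader actually applies.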
29. Storage System
[Diagram: the storage system as a whole: several clients, ZooKeeper, and four stable storage units.]
30. Failover
[Diagram: a client in front of two stable storage units. The first unit's log servers ID 1, ID 2, ID 3 carry 50%, 20%, 30% of its load; the second unit's ID 4, ID 5, ID 6 carry 40%, 45%, 20%.]
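The failover slides amount to the client re-routing a failed server's share of the load onto the survivors. One plausible policy, sketched with the weights shown on the slide (the proportional-redistribution rule itself is our assumption):

```python
# Client-side failover: each stable storage unit spreads load across its log
# servers by weight; when a server fails, its share is redistributed among
# the survivors in proportion to their original weights.
weights = {"ID1": 0.50, "ID2": 0.20, "ID3": 0.30}   # load shares from the slide

def redistribute(weights: dict, failed: str) -> dict:
    """Drop `failed` and renormalize the survivors' weights to sum to 1."""
    survivors = {s: w for s, w in weights.items() if s != failed}
    total = sum(survivors.values())
    return {s: w / total for s, w in survivors.items()}

after = redistribute(weights, "ID1")     # ID1 crashes; ID2/ID3 absorb its 50%
assert abs(after["ID2"] - 0.4) < 1e-9    # 0.20 / 0.50
assert abs(after["ID3"] - 0.6) < 1e-9    # 0.30 / 0.50
```

Because the survivors' shares are renormalized rather than recomputed, no coordination with the other storage unit is needed during failover.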
33. Evaluation
• Throughput and latency of stable storage unit
– Log entry sizes
– Replication factors
• Comparison with a WAL to the local disk
Periodic snapshots degrade performance at the time of the snapshot and generate load spikes on the machine.
Important not to try to be all things to all people: clients might be demanding 8 different things; doing 6 of them is easy; handling 7 of them requires real thought; dealing with all 8 usually results in a worse system (more complex, and it compromises other clients in trying to satisfy everyone). E.g. Facebook: 800 memcache servers in 2008, 2000 now, with a write ratio < 6%. Updates are small (tag, add friend, new ads, status, profile update, sharing).
After the log is replicated in the memory of several machines, an ack is sent to the client. If some of the processes crash, some other process on another machine will still persist the data. Several replicas provide better availability of the data at recovery time, and aggregating the read bandwidth of the servers accelerates recovery.
Adding a replica doesn't introduce a bottleneck and does not impact throughput.
Scalability
Replication factor of three
A common approach is a WAL to the local disk; Redis is an example of a popular in-memory database that uses a WAL to disk. To guarantee the durability of every log entry, it should be written to disk upon every write operation. Even when the log is written to disk there is no guarantee that it is persisted, because by default the disk caches are enabled. Process crash (1.7), also power outage (49); no availability if the server is down. Ours is a factor of 4 better than disk with the cache disabled. Saturation can be prevented.