PaxosStore is a high-availability storage system developed to support the comprehensive business of WeChat. It employs a combinational design in the storage layer, engaging multiple storage engines constructed for different storage models. A distinctive feature of PaxosStore is that it extracts the Paxos-based distributed consensus protocol into a middleware that is universally accessible to the underlying multi-model storage engines. This facilitates tuning, maintaining, scaling, and extending the storage engines. In our engineering experience, implementing a practical consistent read/write protocol is far more complex than its theory suggests. To tackle this engineering complexity, we propose a layered design of the Paxos-based storage protocol stack, where PaxosLog, the key data structure used in the protocol, is devised to bridge the programming-oriented consistent read/write to the storage-oriented Paxos procedure. Additionally, we present Paxos-based optimizations that make fault tolerance more efficient.
PaxosStore is open source:
https://github.com/tencent/paxosstore
For details of the design, please refer to our VLDB 2017 paper:
http://www.vldb.org/pvldb/vol10/p1730-lin.pdf
Video of the presentation is also available:
https://youtu.be/5zNRfuaCgBI
6. Storage Protocol Stack
The protocol stack has three layers:
- Consistent Read/Write: data access based on PaxosLog
- PaxosLog: each log entry is determined by Paxos
- Paxos: determining a value with consensus
PaxosStore implements the Paxos procedure using semi-symmetric message passing (read our paper for details):
- Prepare phase: making a preliminary agreement
- Accept phase: reaching the eventual consensus
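The two phases can be illustrated from an acceptor's point of view. This is a minimal Python sketch of classic Paxos, not PaxosStore's actual C++ implementation; the class and method names are ours:

```python
class Acceptor:
    """One Paxos acceptor: remembers its promise and its last accepted value."""

    def __init__(self):
        self.promised = 0      # highest proposal number promised so far
        self.accepted = None   # (proposal_no, value) last accepted, if any

    def prepare(self, n):
        """Prepare phase: promise to ignore proposals numbered below n,
        and report any previously accepted value to the proposer."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n, value):
        """Accept phase: accept unless a higher-numbered promise was made."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False
```

A value is chosen once a majority of acceptors accept the same proposal; a proposer that sees a previously accepted value in the prepare responses must re-propose that value.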
7. Storage Protocol Stack
A PaxosLog is a sequence of entries. Each entry consists of:
- Request ID: timestamp (16 bits), request seq. (16 bits), client ID (32 bits)
- Promise No.
- Proposal No.
- Value
- Proposer ID
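The three Request ID fields fit in a single 64-bit word. A packing sketch follows; the field order within the word is our assumption for illustration, not taken from the paper:

```python
def pack_request_id(timestamp, seq, client_id):
    """Pack the Request ID fields into one 64-bit integer
    (assumed layout: timestamp | request seq. | client ID)."""
    assert 0 <= timestamp < 1 << 16   # timestamp: 16 bits
    assert 0 <= seq < 1 << 16         # request seq.: 16 bits
    assert 0 <= client_id < 1 << 32   # client ID: 32 bits
    return (timestamp << 48) | (seq << 32) | client_id

def unpack_request_id(rid):
    """Recover (timestamp, seq, client_id) from the packed word."""
    return rid >> 48, (rid >> 32) & 0xFFFF, rid & 0xFFFFFFFF
```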
8. Storage Protocol Stack
[Figure: a data object (value r, indexed by its Data Key) paired with its PaxosLog; entries ⋯, i−2, i−1, and i are chosen, while entry i+1 is still pending.]
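The relation between chosen and pending entries and the data object can be sketched as follows (hypothetical names, not PaxosStore's API; a pending gap blocks the application of later entries):

```python
class PaxosLogStore:
    """A per-key PaxosLog whose chosen entries are applied, in log order,
    to the data object (sketch)."""

    def __init__(self):
        self.log = {}       # entry index -> value, recorded once chosen
        self.applied = 0    # highest contiguously applied entry index
        self.data = None    # current value of the data object

    def choose(self, index, value):
        """Record a chosen entry, then apply any contiguous run of
        chosen entries; a pending (missing) entry blocks later ones."""
        self.log[index] = value
        while self.applied + 1 in self.log:
            self.applied += 1
            self.data = self.log[self.applied]
```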
9. Storage Protocol Stack
[Figure: PaxosLog-as-Value (for key-value storage); the Data Key maps directly to a PaxosLog whose latest entries (r_i, r_{i+1}) carry the value itself.]
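A rough sketch of the PaxosLog-as-Value idea: since each key-value write carries the whole value, an older entry becomes useless once a newer one is chosen, so the log can be truncated aggressively. We simplify here to keeping only the latest chosen entry; the class name is ours:

```python
class PaxosLogAsValue:
    """Simplified PaxosLog-as-Value: the log under a key retains only the
    latest chosen entry, which doubles as the key's current value."""

    def __init__(self):
        self.entry_id = 0
        self.value = None

    def choose(self, entry_id, value):
        # a newer chosen entry supersedes (and garbage-collects) the old one
        if entry_id > self.entry_id:
            self.entry_id, self.value = entry_id, value

    def read(self):
        return self.value
```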
10. Storage Protocol Stack
Consistent Read. For a data object r:
1) the system reads its value from any of the up-to-date replicas of r, and
2) these up-to-date replicas must dominate (i.e., form a majority of) the total replicas of r.
For read-frequent data, these criteria are likely to be satisfied already. Under data contention, a trial Paxos procedure is used to synchronize the replicas; its log entries do not correspond to any substantive write operation.
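Reading "dominate" as a majority condition, the consistent-read precondition can be sketched as a check over per-replica versions (the function and its argument are our illustration):

```python
def can_read_directly(replica_versions):
    """Consistent-read precondition (sketch): the replicas holding the
    newest version must form a majority of all replicas of the object."""
    latest = max(replica_versions)
    up_to_date = sum(1 for v in replica_versions if v == latest)
    return 2 * up_to_date > len(replica_versions)
```

When the check fails, a trial Paxos procedure can bring the lagging replicas up to date before the read is served.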
11. Storage Protocol Stack
Consistent Write: relying on the Paxos procedures
- Liveness
- PaxosLog-entry batched applying
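One way to picture PaxosLog-entry batched applying, under our own simplifying assumption that the object is whole-value so a contiguous run of chosen entries collapses into a single storage write:

```python
def batch_apply(store, key, chosen_entries):
    """Apply a contiguous run of chosen PaxosLog entries with one storage
    write instead of one write per entry (sketch). Returns the number of
    storage writes issued."""
    if not chosen_entries:
        return 0
    store[key] = chosen_entries[-1]   # last chosen value wins
    return 1
```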
18. Data Recovery
When recovery starts, the strategy is chosen as follows:
- If incremental PaxosLog entries exist: recover through the PaxosLog.
- Otherwise, if the data object is append-only: recover through delta updates of the data image.
- Otherwise: recover through the whole data image.
Recovery time decreases from whole-image recovery toward PaxosLog-based recovery.
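The recovery decision on this slide can be sketched as a small selector (the function and strategy names are illustrative):

```python
def recovery_strategy(has_incremental_log, append_only):
    """Pick the cheapest applicable recovery source (sketch)."""
    if has_incremental_log:
        return "paxoslog"        # replay the missing PaxosLog entries
    if append_only:
        return "delta-updates"   # ship only the delta of the data image
    return "whole-image"         # fall back to copying the full data image
```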
Lazy Recovery
Obsolete data replicas are not recovered immediately upon node recovery, but are recovered when they are subsequently accessed.
- Failover reads
- De-duplicated processing
19. Implementation
• Use coroutine to program asynchronous procedure in the
synchronous paradigm
Search Repository https://github.com/Tencent/libco
Much more efficient than Boost.Coroutine, while easy to use
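The pseudo-synchronous style that libco enables in C/C++ can be illustrated in Python with asyncio; this is an analogy only, not libco's API:

```python
import asyncio

async def rpc(x):
    """Stand-in for an asynchronous network call."""
    await asyncio.sleep(0)
    return x + 1

async def handler():
    # Reads like straight-line synchronous code; each await yields to the
    # event loop instead of registering a callback.
    a = await rpc(1)
    b = await rpc(a)
    return b

print(asyncio.run(handler()))  # -> 3
```

libco achieves the same effect for blocking socket calls by hooking them and switching coroutines under the hood, so existing synchronous-looking code becomes asynchronous.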
20. Failure Recovery in WeChat Production
• Read/write ratio is 15:1 on average
[Figure: throughput over time; a failure happens at 14:20 and the node resumes at 15:27, after which throughput is restored to 95% of normal within 3 minutes.]
21. Summary
• What is covered in the paper
– The design of PaxosStore, with emphasis on the construction of the consistent read/write protocol
– The fault-tolerance scheme and data recovery strategies
– Pragmatic optimizations drawn from our engineering practice
• Key lessons learned
– Apart from faults and failures, system overload is also a critical factor that affects system availability
o In particular, the potential avalanche effect caused by overload must be given enough attention when designing the system's fault-tolerance scheme.
– Use coroutines and socket hooks to program asynchronous procedures in a pseudo-synchronous style
o This helps eliminate error-prone function callbacks and simplifies the implementation of asynchronous logic.