PaxosStore: High-availability Storage Made Practical in WeChat

PaxosStore is a high-availability storage system developed to support the full range of WeChat's business. Its storage layer follows a combinational design that accommodates multiple storage engines built for different storage models. A distinguishing feature of PaxosStore is that it extracts the Paxos-based distributed consensus protocol into a middleware layer that is universally accessible to the underlying multi-model storage engines, which makes the engines easier to tune, maintain, scale, and extend. Our engineering experience shows that implementing a practical consistent read/write protocol is far more complex than its theory suggests. To tame this complexity, we propose a layered design of the Paxos-based storage protocol stack, in which PaxosLog, the key data structure of the protocol, bridges the programming-oriented consistent read/write interface and the storage-oriented Paxos procedure. We also present Paxos-based optimizations that make fault tolerance more efficient.

PaxosStore is open source:
https://github.com/tencent/paxosstore

For details of the design, please refer to our VLDB 2017 paper:
http://www.vldb.org/pvldb/vol10/p1730-lin.pdf

Video of the presentation is also available:
https://youtu.be/5zNRfuaCgBI

PaxosStore: High-availability Storage Made Practical in WeChat

  1. PaxosStore: High-availability Storage Made Practical in WeChat • Powered by the CohAna Engine
  2. WeChat, the new way to connect: Chat, Moments, Contacts, Search, Pay. 800 million monthly active users.
  3. Applications (frontend) → Services (backend) → Storage (PaxosStore).
  4. Evolution of the storage system in WeChat: 1st generation (2011–2015), based on the quorum protocol (NWR); 2nd generation (2015–now), based on the Paxos algorithm.
  5. PaxosStore architecture: Application Clients → Programming Model (Key-Value, Table, Queue, Set, ...) → Consensus Layer (Paxos-based storage protocol) → Storage Layer (Bitcask, Main/Delta Table, LSM-tree, ...). Design goals: effective and efficient consensus guarantees, elasticity under dynamic workloads, cross-datacenter fault tolerance.
  6. Storage protocol stack: Consistent Read/Write (data access based on PaxosLog) → PaxosLog (each log entry is determined by Paxos) → Paxos (determining a value by consensus). PaxosStore implements the Paxos procedure using semi-symmetric message passing (see our paper for details). Prepare phase: making a preliminary agreement. Accept phase: reaching the eventual consensus. (A sketch of the acceptor side of the two phases follows below.)
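To make the two phases concrete, here is a minimal C++ sketch of the standard acceptor-side prepare/accept logic. The names (AcceptorState, OnPrepare, OnAccept) are illustrative and not taken from the PaxosStore codebase.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Illustrative acceptor state for a single Paxos instance.
struct AcceptorState {
    uint64_t promised_no = 0;                 // highest proposal number promised
    uint64_t accepted_no = 0;                 // proposal number of the accepted value
    std::optional<std::string> accepted_val;  // value accepted so far, if any
};

// Prepare phase: promise not to accept proposals numbered below n, and report
// any value already accepted so the proposer can adopt it.
bool OnPrepare(AcceptorState& s, uint64_t n,
               uint64_t& out_accepted_no, std::optional<std::string>& out_val) {
    if (n <= s.promised_no) return false;  // reject stale proposal
    s.promised_no = n;
    out_accepted_no = s.accepted_no;
    out_val = s.accepted_val;
    return true;
}

// Accept phase: accept the value unless a higher-numbered promise was made.
bool OnAccept(AcceptorState& s, uint64_t n, const std::string& value) {
    if (n < s.promised_no) return false;
    s.promised_no = n;
    s.accepted_no = n;
    s.accepted_val = value;
    return true;  // the value is chosen once a majority of acceptors accept it
}
```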
  7. Storage protocol stack: structure of a PaxosLog entry. A PaxosLog is a sequence of entries; each entry holds a promise number, a proposal number, the value, the proposer ID, and a request ID composed of a timestamp (16 bits), a request sequence number (16 bits), and a client ID (32 bits). (A struct sketch of this layout follows below.)
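A rough C++ rendering of the entry layout listed on this slide; the type and field names (RequestID, PaxosLogEntry) are hypothetical, chosen only to mirror the fields above.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Illustrative layout of the 64-bit request ID described on the slide:
// timestamp (16 bits) | request sequence (16 bits) | client ID (32 bits).
struct RequestID {
    uint16_t timestamp;
    uint16_t request_seq;
    uint32_t client_id;

    uint64_t Pack() const {
        return (uint64_t(timestamp) << 48) |
               (uint64_t(request_seq) << 32) |
               uint64_t(client_id);
    }
};

// One PaxosLog entry: Paxos bookkeeping alongside the value itself.
struct PaxosLogEntry {
    uint64_t promise_no;   // highest proposal number this replica promised
    uint64_t proposal_no;  // proposal number under which `value` was accepted
    std::string value;     // the agreed (or still-pending) value
    uint64_t proposer_id;  // which node proposed the value
    RequestID request_id;  // identifies the client request (e.g., for de-duplication)
};

// A PaxosLog is an ordered sequence of such entries.
using PaxosLog = std::vector<PaxosLogEntry>;
```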
  8. Storage protocol stack: PaxosLog and the data object. A data object with key r is paired with its PaxosLog; entries ..., i−2, i−1, i are chosen and applied to the data object, while entry i+1 is still pending.
  9. Storage protocol stack: PaxosLog-as-Value (for key-value storage). Each entry's value r_i is the complete value for the data key, so the PaxosLog itself serves as the storage of the data and only the latest entries (r_i and the pending r_{i+1}) need to be kept.
  10. Storage protocol stack: consistent read. For a data object r: (1) the system reads its value from any of the up-to-date replicas of r, and (2) those up-to-date replicas must dominate the total set of replicas of r, i.e., form a majority. For read-mostly data these criteria are usually satisfied. Under data contention, a trial Paxos procedure, one that does not correspond to any substantive write operation, is used to bring the replicas back in sync. (A sketch of this read path follows below.)
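A minimal sketch of this read path. The replica bookkeeping (ReplicaView, chosen_index) and the RunTrialPaxos hook are hypothetical stand-ins, not PaxosStore's actual data structures.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Illustrative replica view: the latest chosen PaxosLog index each replica
// reports, plus its current value.
struct ReplicaView {
    uint64_t chosen_index;
    std::string value;
};

// Hypothetical hook: run a Paxos round that proposes no substantive write,
// only to force lagging replicas to catch up to `index` for `key`.
bool RunTrialPaxos(const std::string& key, uint64_t index);

// Consistent read sketch: answer directly only if the replicas holding the
// newest chosen entry form a majority; otherwise sync via trial Paxos.
bool ConsistentRead(const std::string& key,
                    const std::vector<ReplicaView>& replicas,
                    std::string& out_value) {
    uint64_t max_index = 0;
    for (const auto& r : replicas)
        max_index = std::max(max_index, r.chosen_index);

    size_t up_to_date = 0;
    for (const auto& r : replicas)
        if (r.chosen_index == max_index) ++up_to_date;

    if (up_to_date * 2 > replicas.size()) {  // up-to-date replicas dominate
        for (const auto& r : replicas) {
            if (r.chosen_index == max_index) {
                out_value = r.value;
                return true;
            }
        }
    }
    // Contended or lagging: sync replicas with a trial Paxos round, then retry.
    return RunTrialPaxos(key, max_index);
}
```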
  11. Storage protocol stack: consistent write. Writes rely on the Paxos procedures; the design also addresses liveness and uses PaxosLog-entry batched applying. (A hypothetical batching sketch follows below.)
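The slide does not spell out the batching mechanism, so the following is a hypothetical sketch of one plausible reading: concurrent writes to the same key are queued and committed through a single Paxos round as one PaxosLog entry. CommitEntryViaPaxos and WriteBatcher are assumed names, not PaxosStore APIs.

```cpp
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical hook: run one Paxos round that chooses `batch` as the value
// of the next PaxosLog entry for `key`.
bool CommitEntryViaPaxos(const std::string& key,
                         const std::vector<std::string>& batch);

// Batched applying sketch: writers enqueue their updates; one Paxos round
// commits the whole pending batch as a single PaxosLog entry.
class WriteBatcher {
public:
    explicit WriteBatcher(std::string key) : key_(std::move(key)) {}

    void Enqueue(std::string update) {
        std::lock_guard<std::mutex> lk(mu_);
        pending_.push_back(std::move(update));
    }

    bool Flush() {
        std::vector<std::string> batch;
        {
            std::lock_guard<std::mutex> lk(mu_);
            batch.swap(pending_);  // take the current batch atomically
        }
        if (batch.empty()) return true;
        return CommitEntryViaPaxos(key_, batch);  // one consensus round per batch
    }

private:
    std::string key_;
    std::mutex mu_;
    std::vector<std::string> pending_;
};
```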
  12. Deployment & fault tolerance: six nodes across three datacenters (N_A and N_D in Datacenter 1, N_B and N_E in Datacenter 2, N_C and N_F in Datacenter 3), linked by Paxos; motivated by failures in WeChat production.
  13. Deployment & fault tolerance: the nodes are organized into mini-clusters spanning the three datacenters.
  14. Deployment & fault tolerance: the data hosted by N_A is replicated across datacenters.
  15. Deployment & fault tolerance: (diagram of the same six-node, three-datacenter deployment).
  16. Deployment & fault tolerance: queries routed across the deployment.
  17. Deployment & fault tolerance: (diagram of the same six-node, three-datacenter deployment).
  18. Data recovery. Recovery follows a flowchart: when recovery starts, if incremental PaxosLog entries exist, recover through the PaxosLog; otherwise, if the data object is append-only, recover through delta updates of the data image; otherwise, recover through the whole data image (recovery time decreases along the chart). Lazy recovery: obsolete data replicas are not recovered immediately upon node recovery but when they are subsequently accessed, combined with failover reads and de-duplicated processing. (A sketch of the path-selection logic follows below.)
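The flowchart's decision logic can be written down directly. This is a minimal sketch with hypothetical names (ReplicaState, ChooseRecoveryPath), not code from PaxosStore.

```cpp
// The three recovery paths from the slide, cheapest first.
enum class RecoveryPath { kPaxosLog, kDeltaImage, kWholeImage };

// Illustrative view of a replica's recovery-relevant state.
struct ReplicaState {
    bool incremental_log_entries_exist;  // peers still hold the missing entries
    bool data_object_is_append_only;     // deltas of the data image suffice
};

// Mirrors the slide's flowchart: prefer replaying PaxosLog entries, then
// delta updates of the data image, then transferring the whole data image.
RecoveryPath ChooseRecoveryPath(const ReplicaState& s) {
    if (s.incremental_log_entries_exist) return RecoveryPath::kPaxosLog;
    if (s.data_object_is_append_only)    return RecoveryPath::kDeltaImage;
    return RecoveryPath::kWholeImage;
}
```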
  19. Implementation: coroutines are used to program asynchronous procedures in the synchronous paradigm. The coroutine library is open source at https://github.com/Tencent/libco and is much more efficient than Boost.Coroutine while remaining easy to use. (A usage sketch follows below.)
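As a rough illustration of the pseudo-synchronous style, here is a minimal sketch using libco's public API (co_create, co_resume, co_enable_hook_sys, co_eventloop, as found in the libco repository); the request-handling body is invented for the example.

```cpp
#include <cstdio>
#include "co_routine.h"  // libco header

// Coroutine body: with the system-call hook enabled, blocking socket calls
// made inside this function yield to the event loop instead of blocking the
// thread, so asynchronous I/O reads like synchronous code.
static void* HandleRequest(void* arg) {
    co_enable_hook_sys();  // hook read/write/connect etc. for this coroutine
    // ... synchronous-looking socket code would go here ...
    printf("handling request %ld\n", reinterpret_cast<long>(arg));
    return nullptr;
}

int main() {
    stCoRoutine_t* co = nullptr;
    co_create(&co, nullptr, HandleRequest, reinterpret_cast<void*>(1L));
    co_resume(co);  // start the coroutine; it runs until it blocks or returns

    // Drive hooked I/O: the loop resumes coroutines when their fds are ready
    // (runs indefinitely in this sketch).
    co_eventloop(co_get_epoll_ct(), nullptr, nullptr);
    return 0;
}
```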
  20. Failure recovery in WeChat production: the read/write ratio is 15:1 on average. In the incident shown, a failure happens at 14:20, the node resumes at 15:27, and throughput is restored to 95% of normal within 3 minutes.
  21. Summary
     • What the paper covers:
       – The design of PaxosStore, with emphasis on the construction of the consistent read/write protocol
       – The fault-tolerance scheme and data recovery strategies
       – Pragmatic optimizations drawn from our engineering practice
     • Key lessons learned:
       – Apart from faults and failures, system overload is also a critical factor affecting system availability; in particular, the potential avalanche effect caused by overload deserves close attention when designing the fault-tolerance scheme.
       – Using coroutines and socket hooks to program asynchronous procedures in a pseudo-synchronous style eliminates error-prone function callbacks and simplifies the implementation of asynchronous logic.
  22. Thank you all! https://github.com/tencent/paxosstore
Uploaded by lqmike, Sep. 13, 2017
