PaxosStore: High-availability Storage Made Practical in WeChat
• Powered by the CohAna Engine
WeChat: the new way to connect
Chat, Moments, Contacts, Search, Pay
800 million monthly active users
Applications (frontend)
Services (backend)
Storage: PaxosStore
Evolution of Storage System in WeChat
1st generation (2011–2015): based on the quorum protocol (NWR)
2nd generation (2015–now): based on the Paxos algorithm
PaxosStore
Application Clients
Programming Model: Key-Value, Table, Queue, Set, ...
Consensus Layer: Paxos-based Storage Protocol
Storage Layer: Bitcask, Main/Delta Table, LSM-tree, ...
• Effective & efficient consensus guarantee
• Elastic for dynamic workload
• Cross-datacenter fault tolerance
Storage Protocol Stack
• Consistent Read/Write: data access based on PaxosLog
• PaxosLog: each log entry is determined by Paxos
• Paxos: determining value with consensus
PaxosStore implements the Paxos procedure using semi-symmetry message passing (read our paper for details)
• Prepare phase: making a preliminary agreement
• Accept phase: reaching the eventual consensus
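The two phases can be illustrated with a minimal sketch of classic single-value Paxos. This is the textbook procedure, not PaxosStore's semi-symmetry message passing variant (which the slides defer to the paper); the names `Acceptor`, `on_prepare`, `on_accept` and `propose` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    """State one replica keeps for a single log entry (classic Paxos, illustrative)."""
    promise_no: int = 0                   # highest proposal number promised so far
    accepted_no: int = 0                  # proposal number of the accepted value, if any
    accepted_value: Optional[str] = None

    def on_prepare(self, n: int) -> Tuple[bool, int, Optional[str]]:
        # Prepare phase: promise to ignore proposals numbered below n.
        if n > self.promise_no:
            self.promise_no = n
            return True, self.accepted_no, self.accepted_value
        return False, self.accepted_no, self.accepted_value

    def on_accept(self, n: int, value: str) -> bool:
        # Accept phase: accept unless a higher-numbered promise was made.
        if n >= self.promise_no:
            self.promise_no = n
            self.accepted_no = n
            self.accepted_value = value
            return True
        return False


def propose(acceptors: List[Acceptor], n: int, value: str) -> Optional[str]:
    """Run both phases; return the chosen value, or None if no majority agrees."""
    majority = len(acceptors) // 2 + 1
    replies = [a.on_prepare(n) for a in acceptors]
    granted = [(no, v) for ok, no, v in replies if ok]
    if len(granted) < majority:
        return None
    # If some acceptor already accepted a value, adopt the highest-numbered one.
    _, prior_value = max(granted, key=lambda r: r[0])
    if prior_value is not None:
        value = prior_value
    acks = sum(a.on_accept(n, value) for a in acceptors)
    return value if acks >= majority else None
```

Once a value has been chosen, a later proposal with a higher number ends up re-proposing that same value, which is what lets each log entry settle on exactly one value.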
PaxosLog: Entry, Entry, ⋯
Entry: Promise No., Proposal No., Value, Proposer ID, Request ID
Request ID: Timestamp (16 bits), Request Seq. (16 bits), Client ID (32 bits)
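The three Request ID fields add up to 64 bits, so the ID fits in a single integer. The sketch below shows one possible packing; the bit order (client ID in the high bits, then timestamp, then sequence number) and the function names are assumptions for illustration, since the slide gives only the field widths.

```python
from typing import Tuple


def pack_request_id(client_id: int, timestamp: int, seq: int) -> int:
    """Pack the three fields into one 64-bit integer (field order is an assumption)."""
    if not 0 <= client_id < (1 << 32):
        raise ValueError("client_id must fit in 32 bits")
    if not 0 <= timestamp < (1 << 16):
        raise ValueError("timestamp must fit in 16 bits")
    if not 0 <= seq < (1 << 16):
        raise ValueError("seq must fit in 16 bits")
    return (client_id << 32) | (timestamp << 16) | seq


def unpack_request_id(request_id: int) -> Tuple[int, int, int]:
    """Inverse of pack_request_id: returns (client_id, timestamp, seq)."""
    return request_id >> 32, (request_id >> 16) & 0xFFFF, request_id & 0xFFFF
```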
[Figure: the PaxosLog of a data object 𝑟, addressed by its data key, with entries 𝑖 + 1, 𝑖, 𝑖 − 1, 𝑖 − 2, ⋯ marked as pending or chosen.]
PaxosLog-as-Value (for key-value storage)
[Figure: a data key mapped to a PaxosLog that holds the entries r_{i+1} and r_i.]
Consistent Read
For a data object 𝑟,
1) the system reads its value from any of the up-to-date replicas of 𝑟, and
2) these up-to-date replicas must form a majority of all replicas of 𝑟.
For read-frequent data, these criteria are likely to be satisfied.
Under data contention, use a trial Paxos procedure to sync the replicas; a trial procedure does not correspond to any substantive write operation.
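The read criteria can be sketched as a small check over the versions reported by the replicas. This is an illustrative reading of the slide, not PaxosStore's code: `readable_version` is a hypothetical helper, versions are modeled as plain integers, and a `None` result stands for the case where the system would fall back to a trial Paxos procedure.

```python
from typing import List, Optional


def readable_version(reported_versions: List[int], total_replicas: int) -> Optional[int]:
    """Return the newest version if the replicas holding it form a majority
    of all replicas of the data object; otherwise return None."""
    if not reported_versions:
        return None
    newest = max(reported_versions)
    up_to_date = sum(1 for v in reported_versions if v == newest)
    # Unreachable replicas still count toward the total.
    if up_to_date > total_replicas // 2:
        return newest
    return None
```

With three replicas, two replicas at the newest version are enough to serve the read from either of them, while a single replica ahead of the others is not.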
Consistent Write
Relying on the Paxos procedures
• Liveness
• PaxosLog-entry batched applying
Deployment & Fault Tolerance
Failures in WeChat Production
[Figure: nodes 𝑁𝐴 and 𝑁𝐷 in Datacenter 1, 𝑁𝐵 and 𝑁𝐸 in Datacenter 2, 𝑁𝐶 and 𝑁𝐹 in Datacenter 3, with Paxos running between nodes across the three datacenters.]
[Figure sequence over the same three-datacenter deployment; labels added on successive slides: mini-cluster (one per datacenter); data hosted by 𝑁𝐴; queries.]
Data Recovery
Recovery starts:
• Incremental PaxosLog entries exist?
– Yes: recover through PaxosLog
– No: data object is append-only?
o Yes: recover through delta updates of data image
o No: recover through whole data image
Recovery time decreases in the order: whole data image, delta updates of data image, PaxosLog.
Lazy Recovery
Obsolete data replicas are not recovered immediately upon node recovery, but recovered when they are subsequently accessed.
• Failover reads
• De-duplicated processing
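One reading of the recovery decision flow above can be written as a small function. The enum and function names are hypothetical, and the two boolean inputs stand for the two questions in the flowchart; how PaxosStore actually evaluates them is not stated on the slide.

```python
from enum import Enum


class Recovery(Enum):
    PAXOS_LOG = "recover through PaxosLog"
    DELTA_UPDATES = "recover through delta updates of data image"
    WHOLE_IMAGE = "recover through whole data image"


def choose_recovery(has_incremental_log_entries: bool, is_append_only: bool) -> Recovery:
    """Pick a recovery path by asking the flowchart's two questions in order."""
    if has_incremental_log_entries:
        return Recovery.PAXOS_LOG
    if is_append_only:
        return Recovery.DELTA_UPDATES
    return Recovery.WHOLE_IMAGE
```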
Implementation
• Use coroutines to program asynchronous procedures in the synchronous paradigm
libco repository: https://github.com/Tencent/libco
Much more efficient than Boost.Coroutine, while easy to use
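To illustrate what "asynchronous procedures in the synchronous paradigm" looks like, here is a minimal sketch using Python's standard `asyncio` coroutines. It is an analogy only: libco is a C++ library with its own API, and the function names, replica names and delays below are made up for the example.

```python
import asyncio
from typing import Dict, List


async def send_to_replica(name: str, delay: float) -> str:
    # Stand-in for a network round trip; a real system would do socket I/O here.
    await asyncio.sleep(delay)
    return f"ack from {name}"


async def replicate(replicas: Dict[str, float]) -> List[str]:
    # Reads top to bottom like synchronous code, with no callbacks,
    # yet the sends run concurrently while each one waits on I/O.
    pending = [send_to_replica(name, delay) for name, delay in replicas.items()]
    return await asyncio.gather(*pending)
```

Calling `asyncio.run(replicate({"N_A": 0.02, "N_B": 0.01, "N_C": 0.03}))` returns the acknowledgements in the order the replicas were listed, regardless of which reply arrived first.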
Failure Recovery in WeChat Production
• Read/Write ratio is 15:1 on average
Failure happens at 14:20; node resumes at 15:27
Restored to 95% of normal throughput within 3 minutes
Summary
• What is covered in the paper
– The design of PaxosStore, with emphasis on the construction of the consistent read/write protocol
– Fault-tolerant scheme and data recovery strategies
– Pragmatic optimizations that come from our engineering practice
• Key lessons learned
– Apart from faults and failures, system overload is also a critical factor that affects system availability
o In particular, the potential avalanche effect caused by overload deserves close attention when designing the system's fault-tolerant scheme.
– Use coroutines and socket hooks to program asynchronous procedures in a pseudo-synchronous style
o This helps eliminate error-prone function callbacks and simplifies the implementation of asynchronous logic.
Thank You ALL!
https://github.com/tencent/paxosstore
