Durability for Memory-Based Key-Value Stores

Kiarash Rezahanjani
July 4, 2012

Durability
[Diagram: the data store holds (university, KTH). A client sends set(university, UPC) and receives an Ack; a later get(university) returns UPC.]

Durability
[Diagram: set(university, UPC) is sent to a data store running on commodity hardware with non-volatile storage, and the client receives an Ack.]

Durability
[Diagram: set(myKey, U) is sent to a data store running on commodity hardware, and the client receives an Ack.]

Durability
Disk reads and writes are slow: each random access costs seek time + rotational time + transfer time.

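To make the sum concrete, here is the arithmetic for a hypothetical 7,200 rpm disk. The 9 ms seek time and 100 MB/s transfer rate are assumed example figures, not measurements from this work, and the average rotational latency is taken as half a revolution.

```python
def rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: on average the platter must turn half a revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2


def transfer_time_ms(size_bytes: int, mb_per_s: float) -> float:
    """Time to move size_bytes at a sustained rate of mb_per_s (10**6 bytes/s)."""
    return size_bytes / (mb_per_s * 1_000_000) * 1000


def access_time_ms(seek_ms: float, rpm: float, size_bytes: int, mb_per_s: float) -> float:
    """Random access time = seek time + rotational latency + transfer time."""
    return seek_ms + rotational_latency_ms(rpm) + transfer_time_ms(size_bytes, mb_per_s)


if __name__ == "__main__":
    # Assumed example disk: 9 ms average seek, 7,200 rpm, 100 MB/s transfer rate.
    t = access_time_ms(seek_ms=9.0, rpm=7200, size_bytes=200, mb_per_s=100.0)
    print(f"random 200-byte write: {t:.2f} ms")
```

For a 200-byte write this comes to about 13 ms, almost all of it mechanical delay; the transfer itself takes microseconds, which is why small random writes to disk are slow.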
Cache in memory
[Diagram: reads are fast because they are served from cached objects in memory; writes are slow because they go to the primary copy of the objects.]
Consistency?

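Cache in memory
[Diagram: application servers in front of memcache servers and MySQL servers. To update ObjA, the application updates it in MySQL and deletes it from memcache. The next read of ObjA is a cache miss, so ObjA is read from MySQL and set again in memcache.]
• Stale data
• Spending resources
• Complicates development
• Writes are still slow
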
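The look-aside flow in the diagram can be sketched with two dictionaries standing in for MySQL and memcache. This is a hypothetical single-process toy model, not Facebook's or memcache's code; it only shows that every update costs a slow database write plus a cache delete, and that the next read misses.

```python
from __future__ import annotations


class LookAsideCache:
    """Hypothetical toy model: `db` plays the MySQL servers, `cache` the memcache servers."""

    def __init__(self) -> None:
        self.db: dict[str, str] = {}
        self.cache: dict[str, str] = {}
        self.misses = 0

    def read(self, key: str) -> str | None:
        if key in self.cache:
            return self.cache[key]          # fast path: served from memory
        self.misses += 1                    # cache miss: fall back to the database
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value         # set the object back into the cache
        return value

    def update(self, key: str, value: str) -> None:
        self.db[key] = value                # slow write to the primary copy
        self.cache.pop(key, None)           # delete the cached copy so it cannot go stale


if __name__ == "__main__":
    c = LookAsideCache()
    c.update("ObjA", "v1")
    c.read("ObjA")                          # miss, filled from the database
    c.read("ObjA")                          # hit
    c.update("ObjA", "v2")
    print(c.read("ObjA"), c.misses)         # miss again after the delete
```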
Memory-Based Databases
[Diagram: reads and writes both go to the primary copy of the objects, which is held in memory, with a backup.]
• No inconsistency, no stale data
• Reads are fast
• Durability?
• Write latency?

Approaches towards durability
• Periodic snapshots (a snapshot of state A, later one of state B): data loss
• Synchronous logging (every write is logged before it is acknowledged): slow
• Asynchronous logging (logs are written in the background): data loss

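The data-loss window of asynchronous logging can be illustrated with a toy logger that buffers entries in memory and flushes them to "disk" (a Python list here) in batches. `BatchingLogger`, `batch_size`, and `crash()` are hypothetical names for this sketch. With `batch_size=1` it behaves like synchronous logging: nothing is lost, but every entry costs a disk write.

```python
from __future__ import annotations


class BatchingLogger:
    """Hypothetical toy log: `disk` models durable storage, `buffer` volatile memory."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self.buffer: list[str] = []
        self.disk: list[str] = []
        self.disk_writes = 0

    def append(self, entry: str) -> None:
        # The caller is acknowledged as soon as append() returns.
        self.buffer.append(entry)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.disk.extend(self.buffer)   # one (simulated) disk write per batch
            self.disk_writes += 1
            self.buffer.clear()

    def crash(self) -> list[str]:
        """Simulate a power failure: everything still in memory is lost."""
        lost = self.buffer[:]
        self.buffer.clear()
        return lost


if __name__ == "__main__":
    for size in (1, 4):
        log = BatchingLogger(batch_size=size)
        for i in range(10):
            log.append(f"set k{i}")
        lost = log.crash()
        print(f"batch_size={size}: disk writes={log.disk_writes}, lost={len(lost)}")
```

Running it shows the trade-off: the batched logger performs 2 disk writes instead of 10, but loses the 2 entries still in memory when it crashes.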
Approaches towards durability
[Diagram: the data is copied to several replicas.]
• Replication: expensive
• Catastrophic failure: all gone

Project Goals
• Durable writes
• Low latency
• Availability: able to recover quickly
• Cheap commodity hardware

Target systems
•   Big data, spread over many machines
•   Read-dominant workload
•   Simple key-value store
•   Small writes
    – Example: Facebook
       •   Terabytes of data across 2,000 memcache servers
       •   Write/read ratio < 6%
       •   Memcache is a key-value store
       •   Status updates, photo tags, profile updates, etc.

Solution

Design decisions
Periodic snapshot vs. message logging

Design decisions
Local disk vs. remote location

Design decisions
Remote file server vs. local disks of the database cluster

Design Decision
[Diagram: the database, acting as a client of the log storage, receives a write, sends a log entry to remote storage, and receives an Ack.]

Design Decision
[Diagram: write → database client → log entry to remote storage → Ack.]
Two problems:
1) Synchronous logging is a must; asynchronous logging leads to data loss.
2) Data availability: addressed with replication.

Replication
[Diagram: log entries are replicated across several log servers, and the client receives an Ack.]

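Replication
Broadcast vs. chain replication
[Diagram, broadcast: the log entry goes to a master, which replicates it to the slaves and returns the Ack.]
[Diagram, chain replication: the log entry enters the chain at the head, and the Ack is returned from the tail.]
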
Replication
Broadcast
[Diagram: the log entry goes to a master, which replicates it to three slaves and returns the Ack.]

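A minimal sketch of broadcast replication, assuming in-process objects instead of networked log servers: the master appends the entry locally, sends a copy to every slave, and acknowledges the client only after all slaves confirm. `LogReplica` and `BroadcastMaster` are hypothetical names, not the project's API.

```python
from __future__ import annotations


class LogReplica:
    """Hypothetical in-process stand-in for a log server."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.entries: list[bytes] = []

    def append(self, entry: bytes) -> bool:
        self.entries.append(entry)          # a list stands in for the on-disk log
        return True                         # acknowledgement back to the sender


class BroadcastMaster(LogReplica):
    def __init__(self, name: str, slaves: list[LogReplica]) -> None:
        super().__init__(name)
        self.slaves = slaves
        self.messages_sent = 0

    def write(self, entry: bytes) -> bool:
        self.append(entry)
        acks = []
        for slave in self.slaves:           # the master sends one copy per slave
            self.messages_sent += 1
            acks.append(slave.append(entry))
        return all(acks)                    # Ack the client once every slave has the entry
```

Note that the master sends one copy per slave, so its outgoing traffic grows with the replication factor.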
Replication
Chain replication
[Diagram: the log entry enters the chain at the head, and the Ack is returned from the tail.]

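Chain Replication
[Diagram: the database client receives a write and sends the log entry into a chain of three log servers; the Ack returns to the database client.]
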
Chain Replication
• Synchronous logging abstraction
• Low latency
• Available logs
[Diagram: the chain of three log servers forms a stable storage unit behind the database client.]

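Chain replication can be sketched in the same hypothetical in-process style: the client sends the entry to the head, each server stores it and forwards it to its successor, and the tail calls the client's acknowledgement callback. The callback stands in for the network message from the tail; `ChainNode` and `build_chain` are illustrative names, not the project's API.

```python
from __future__ import annotations

from typing import Callable, Optional


class ChainNode:
    """One log server in a chain; `successor` is None for the tail."""

    def __init__(self, name: str, successor: Optional[ChainNode] = None) -> None:
        self.name = name
        self.successor = successor
        self.entries: list[bytes] = []

    def write(self, entry: bytes, ack: Callable[[bytes], None]) -> None:
        self.entries.append(entry)              # store locally (a list stands in for disk)
        if self.successor is not None:
            self.successor.write(entry, ack)    # forward exactly one copy down the chain
        else:
            ack(entry)                          # the tail acknowledges the client directly


def build_chain(names: list[str]) -> ChainNode:
    """Link the servers head -> ... -> tail and return the head."""
    if not names:
        raise ValueError("a chain needs at least one server")
    successor: Optional[ChainNode] = None
    for name in reversed(names):
        successor = ChainNode(name, successor)
    return successor


if __name__ == "__main__":
    acked: list[bytes] = []
    head = build_chain(["head", "middle", "tail"])
    head.write(b"set(university, UPC)", acked.append)
    print(acked)                                # the tail's Ack reached the client
```

Because the tail only acknowledges entries that have passed through every server, an Ack means the entry is on all replicas, and each server forwards exactly one copy regardless of chain length.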
Log Server
[Diagram: a log server and its log.]

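Log Server
[Diagram: inside a log server, a Receiver takes incoming numbered log entries, a Persister writes them to disk, and a Reader serves them back. The Persister uses sequential writes to avoid seek time.]
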
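The Receiver/Persister split can be sketched with a queue and one background thread. The receiver only enqueues and returns; the persister drains whatever has accumulated, appends it to a single file, and calls `os.fsync` once per batch, so the disk sees sequential appends instead of a seek per entry. The length-prefixed record format, the batching policy, and the `LogServer` class are assumptions for illustration, and the Reader is omitted.

```python
from __future__ import annotations

import os
import queue
import tempfile
import threading


class LogServer:
    """Hypothetical sketch: a receiver queue feeding a single persister thread."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.inbox: queue.Queue[bytes | None] = queue.Queue()
        self.persisted = 0
        self._thread = threading.Thread(target=self._persist_loop, daemon=True)
        self._thread.start()

    def receive(self, entry: bytes) -> None:
        self.inbox.put(entry)                   # receiver: hand off and return immediately

    def close(self) -> None:
        self.inbox.put(None)                    # sentinel: stop after draining the queue
        self._thread.join()

    def _persist_loop(self) -> None:
        with open(self.path, "ab") as f:        # append-only file, so writes are sequential
            while True:
                batch = [self.inbox.get()]      # block until at least one entry arrives
                while not self.inbox.empty():   # then take everything else that is waiting
                    batch.append(self.inbox.get_nowait())
                done = None in batch
                entries = [e for e in batch if e is not None]
                for e in entries:
                    f.write(len(e).to_bytes(4, "big") + e)  # length-prefixed record
                f.flush()
                os.fsync(f.fileno())            # one durable disk write per batch
                self.persisted += len(entries)
                if done:
                    return


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        server = LogServer(os.path.join(d, "log.bin"))
        for i in range(1000):
            server.receive(f"set k{i}".encode())
        server.close()
        print("persisted", server.persisted)
```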
Forming storage units
[Diagram: ZooKeeper and log servers ID1, ID2, ID3.]
1. Query ZooKeeper
2. Get the list of servers
3. Leader sends a request
4. Leader sends the list of members
5. Upload the storage unit data
6. Start the service

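The six steps can be walked through with a hypothetical in-memory registry standing in for ZooKeeper (no real ZooKeeper client API is used). How the leader is chosen is not shown on the slide, so here it is simply passed in, and a unit size of three is assumed to match the three IDs in the diagram.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class Registry:
    """Hypothetical in-memory stand-in for ZooKeeper."""

    def __init__(self) -> None:
        self.free_servers: list[str] = []
        self.units: dict[str, list[str]] = {}


@dataclass
class Server:
    server_id: str
    members: list[str] = field(default_factory=list)
    running: bool = False

    def on_join_request(self, leader_id: str) -> bool:
        return True                             # step 3: accept (a real server could refuse)

    def on_member_list(self, members: list[str]) -> None:
        self.members = members                  # step 4: learn who is in the unit

    def start(self) -> None:
        self.running = True                     # step 6: start serving


def form_storage_unit(leader: Server, servers: dict[str, Server], registry: Registry,
                      unit_id: str, size: int = 3) -> list[str] | None:
    # Steps 1-2: query the registry for servers not yet in a unit.
    candidates = [s for s in registry.free_servers if s != leader.server_id]
    chosen = [leader.server_id]
    for sid in candidates:                      # step 3: leader asks servers to join
        if len(chosen) == size:
            break
        if servers[sid].on_join_request(leader.server_id):
            chosen.append(sid)
    if len(chosen) < size:
        return None                             # not enough free servers yet
    for sid in chosen:                          # step 4: send the member list
        servers[sid].on_member_list(chosen)
    registry.units[unit_id] = chosen            # step 5: publish the unit's data
    registry.free_servers = [s for s in registry.free_servers if s not in chosen]
    for sid in chosen:
        servers[sid].start()                    # step 6: start the service
    return chosen
```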
Storage System
[Diagram: ZooKeeper, three clients, and four stable storage units.]

Failover
[Diagram: a client and two stable storage units, one with log servers ID 1 (50%), ID 2 (20%), ID 3 (30%) and one with ID 4 (40%), ID 5 (45%), ID 6 (20%).]

Evaluation
• Throughput and latency of the stable storage unit
  – Log entry sizes
  – Replication factors
• Comparison with write-ahead logging (WAL) to a local disk

Single synchronous client
Replication factor of 3

Entry size (bytes)   Latency (ms)   Throughput (entries/sec)
200                  0.45           2,200
1024                 0.62           1,600
4096                 0.99           1,000

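With a single synchronous client, the next entry is sent only after the previous one is acknowledged, so throughput should be roughly 1000 / latency (ms). A quick cross-check of the table, added here as editorial arithmetic rather than part of the original evaluation:

```python
def closed_loop_throughput(latency_ms: float) -> float:
    """Entries/sec for one client that waits for each Ack before sending the next entry."""
    return 1000.0 / latency_ms


# Measured latencies from the table above: entry size (bytes) -> latency (ms).
measured = {200: 0.45, 1024: 0.62, 4096: 0.99}

for size, lat in measured.items():
    print(f"{size:>5} B: predicted {closed_loop_throughput(lat):,.0f} entries/sec")
```

The predicted 2,222, 1,613 and 1,010 entries/sec match the reported figures after rounding, which suggests the single client is latency-bound rather than limited by the storage unit.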
Throughput vs. Latency
[Figure: latency (ms) vs. throughput (entries/sec) at replication factor 3, for entry sizes of 5 B, 200 B, 1 KB, 4 KB, and 10 KB; points labeled 5000, 14000, 28000, and 34000 entries/sec.]

Additional replica
[Figure: latency (microseconds) vs. throughput (entries/sec) for an entry size of 200 bytes, comparing replication factors RF 2 and RF 3.]

Sustained load

Resource utilization

• Throughput of 6,000 entries/sec
• Log entries of 200 bytes
  – CPU utilization = 9%
  – Bandwidth = 29 Mb/s
  – Dedicated disk
  – Small memory requirement

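The raw payload rate behind these numbers is simple arithmetic, assuming Mb/s means megabits per second: 6,000 entries/sec of 200 bytes is 9.6 Mb/s of log data. The measured 29 Mb/s is about three times that; whether the difference comes from replication traffic, protocol overhead, or both is not stated on the slide.

```python
def payload_mbps(entries_per_sec: float, entry_bytes: int) -> float:
    """Raw log payload in megabits per second (10**6 bits), excluding protocol overhead."""
    return entries_per_sec * entry_bytes * 8 / 1_000_000


rate = payload_mbps(6_000, 200)
measured_mbps = 29  # bandwidth reported on the slide
print(f"payload: {rate:.1f} Mb/s, measured/payload: {measured_mbps / rate:.1f}x")
```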
Summary
• Durable writes
• Low latency
• High availability
• Scalable
• No additional resources
• Avoid dependencies

 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 

Presentation

  • 1. Durability for Memory-Based Key-Value Stores. Kiarash Rezahanjani, July 4, 2012
  • 2. Durability: the data store holds (university, KTH). The client sends set(university, UPC) and receives an Ack; a later get(university) returns UPC.
  • 3. Durability: data store on commodity hardware with non-volatile storage; set(university, UPC), then Ack
  • 4. Durability: data store on commodity hardware; set(myKey, U), then Ack
  • 5. Durability: disk writes and reads are SLOW, because every access costs seek time + rotational time + transfer time.
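  • 6. Cache in memory: reads are fast because they are served from cached objects in memory; writes are slow because they go to the primary copy of the objects. Consistency?
  • 7. Cache in memory (application servers, Memcache servers, MySQL servers): to update Obj A, the application updates it in MySQL and deletes it from Memcache; the next read of Obj A is a cache miss, so the application reads Obj A from MySQL and sets it in Memcache. Drawbacks: stale data, spending resources, complicated development, and writes are still slow.
  • 8. Memory-Based Databases: the primary copy of objects lives in memory and serves both writes and reads, so there is no inconsistency and no stale data, and reads are fast. Open questions: durability (backup) and write latency.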
  • 9. Approaches towards durability: periodic snapshots (of state A, then of state B) risk data loss between snapshots; synchronous logging (log every write before acknowledging it) is slow; asynchronous logging risks data loss.
  • 10. Approaches towards durability: keeping the data on replicas is expensive, and a catastrophic failure means all copies are gone.
  • 11. Project Goals: durable writes • low latency • availability (able to recover quickly) • cheap, commodity hardware
  • 12. Target systems • Data is big = many machines • Read-dominant workload • Simple key-value store • Small writes – Example: Facebook • Terabytes of data = 2000 memcache servers • Write/read ratio < 6% • Memcache is a key-value store • Status update, tag photo, profile update, etc.
  • 13. Solution
  • 14. Design decisions: periodic snapshot vs. message logging
  • 15. Design decisions: local disk vs. remote location
  • 16. Design decisions: remote file server vs. local disks of the database cluster
  • 17. Design Decision: the database client writes to a log in remote storage and receives an Ack.
  • 18. Design Decision: two problems. 1) Synchronous logging is a must, because asynchronous logging can lose data. 2) Data availability, addressed by replication.
  • 19. Replication: the log is replicated to several log servers before the Ack.
  • 20. Replication: broadcast (a master sends the log to its slaves) vs. chain replication (the log passes from head to tail)
  • 21. Replication, broadcast: the master sends the log to each slave; Ack
  • 22. Replication, chain replication: the log is passed from the head toward the tail; Ack
  • 23. Replication, chain replication: log and Ack between head and tail
  • 24. Chain Replication: the database client's write is logged along a chain of log servers, then acknowledged.
  • 25. Chain Replication: a stable storage unit (the chain of log servers) gives the database client a synchronous-logging abstraction with low latency and available logs.
  • 27. Log Server: a Receiver, a Persister and a Reader. Entries are queued in order, and the Persister writes them sequentially to disk to avoid seek time.
  • 28. Forming storage units: 1. Query ZooKeeper. 2. Get the list of servers. 3. The leader sends a request. 4. The leader sends the list of members (ID1, ID2, ID3). 5. Upload the storage unit data. 6. Start the service.
  • 29. Storage System: several clients and stable storage units, coordinated through ZooKeeper
  • 30-32. Failover: a client and two stable storage units, one made of ID 1 (50%), ID 2 (20%) and ID 3 (30%), the other of ID 4 (40%), ID 5 (45%) and ID 6 (20%)
  • 33. Evaluation • Throughput and latency of the stable storage unit – Log entry sizes – Replication factors • Comparison with WAL to local disk
  • 34. Single synchronous client, replication factor of 3
      Entry size (bytes) | Latency (ms) | Throughput (entries/sec)
      200                | 0.45         | 2200
      1024               | 0.62         | 1600
      4096               | 0.99         | 1000
  • 35. Throughput vs. Latency, replication factor of 3 [Figure: Latency (ms) vs. Throughput (entries/sec) for entry sizes of 5 B, 200 B, 1 KB, 4 KB and 10 KB]
  • 36. Additional replica, entry size of 200 bytes [Figure: Latency (microseconds) vs. Throughput (entries/sec) for replication factors 3 and 2]
  • 39. Resource utilization • Throughput of 6,000 entries/sec • Log entries of 200 bytes – CPU utilization = 9% – Bandwidth = 29 Mb/s – Dedicated disk – Small memory requirement
  • 40. Summary • Durable writes • Low latency • High availability • Scalable • No additional resources • Avoids dependencies
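Slide 5 breaks the cost of a disk access into seek, rotational and transfer time. The arithmetic below is a rough sketch using assumed drive figures (7200 RPM, 9 ms average seek, 100 MB/s transfer rate). These values are illustrative and do not come from the slides. The sketch shows why a small random write is dominated by mechanical delays rather than by moving the data.

```python
# Rough disk-access estimate: access = seek + rotational delay + transfer.
# All drive parameters are illustrative assumptions, not measured values.

def disk_access_ms(seek_ms: float, rpm: int, size_bytes: int, transfer_mb_s: float) -> float:
    """Estimated time in ms to service one random access of `size_bytes`."""
    # On average the head waits half a revolution for the target sector.
    rotational_ms = 0.5 * 60_000 / rpm
    # Time to stream the payload once the head is in position.
    transfer_ms = size_bytes / (transfer_mb_s * 1_000_000) * 1000
    return seek_ms + rotational_ms + transfer_ms


if __name__ == "__main__":
    # A 200-byte log entry on a hypothetical 7200 RPM drive.
    t = disk_access_ms(seek_ms=9.0, rpm=7200, size_bytes=200, transfer_mb_s=100.0)
    print(f"{t:.2f} ms per random 200-byte write")
```

Under these assumptions a random 200-byte write costs roughly 13 ms. Almost all of that is seek and rotation, while the transfer itself takes about 2 microseconds. This is the overhead that the sequential writes on slide 27 aim to avoid.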
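Slide 7 shows the look-aside pattern used when Memcache sits in front of MySQL. The sketch below models it with two plain dictionaries, which stand in for the Memcache and MySQL servers; that substitution is an assumption made for illustration. A write updates the database and deletes the cached copy. A read that misses falls back to the database and then sets the value in the cache.

```python
# Look-aside caching as on slide 7 (illustrative sketch).
# `db` stands in for the MySQL servers and `cache` for the Memcache servers;
# both are plain dicts here, an assumption made for illustration.
from typing import Dict, Optional


class LookAsideStore:
    def __init__(self) -> None:
        self.db: Dict[str, str] = {}     # primary copy: slow writes
        self.cache: Dict[str, str] = {}  # in-memory copy: fast reads
        self.misses = 0

    def update(self, key: str, value: str) -> None:
        self.db[key] = value        # update Obj A in the database ...
        self.cache.pop(key, None)   # ... then delete the cached copy

    def read(self, key: str) -> Optional[str]:
        if key in self.cache:
            return self.cache[key]
        self.misses += 1            # cache miss: fall back to the database
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value  # set Obj A in the cache for later reads
        return value


store = LookAsideStore()
store.update("university", "KTH")
store.read("university")            # miss: loaded from db into the cache
store.update("university", "UPC")   # db updated, cached copy deleted
print(store.read("university"), store.misses)  # prints: UPC 2
```

The sketch is single-threaded. The stale-data problem on the slide appears when readers and writers interleave. For example, a reader that fetched the old value from the database might set it in the cache after a writer's delete.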
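Slide 9 contrasts synchronous and asynchronous logging. The following is a minimal sketch built on two hypothetical simplifications: "durable" storage is a Python list, and a crash just discards whatever has not been flushed yet. In synchronous mode, each entry is persisted before it is acknowledged. In asynchronous mode, the entry is acknowledged immediately and flushed later in batches, so a crash loses every entry that was acknowledged but not yet flushed.

```python
# Synchronous vs. asynchronous logging: what survives a crash?
# "Durable" storage is modelled as a list; a crash discards the in-memory buffer.


class Log:
    def __init__(self, synchronous: bool, batch_size: int = 4) -> None:
        self.synchronous = synchronous
        self.batch_size = batch_size
        self.buffer = []   # acknowledged but not yet persisted
        self.durable = []  # survives a crash
        self.acked = 0

    def append(self, entry: str) -> None:
        if self.synchronous:
            self.durable.append(entry)  # persist first (the slow path)
        else:
            self.buffer.append(entry)   # only buffer in memory
            if len(self.buffer) >= self.batch_size:
                self.flush()
        self.acked += 1                 # acknowledge the write to the client

    def flush(self) -> None:
        self.durable.extend(self.buffer)
        self.buffer.clear()

    def crash(self) -> int:
        """Drop the volatile buffer; return how many acknowledged writes were lost."""
        lost = len(self.buffer)
        self.buffer.clear()
        return lost


for mode in (True, False):
    log = Log(synchronous=mode)
    for i in range(10):
        log.append(f"set(key{i})")
    label = "sync" if mode else "async"
    print(f"{label}: acked={log.acked}, lost on crash={log.crash()}")
```

With a batch size of 4, both modes acknowledge all 10 writes, but the asynchronous log loses the last 2 when it crashes. Smaller batches narrow that window, but only synchronous logging closes it, at the cost of a persistence step on every write.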
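Slides 20 to 25 replicate each log entry along a chain of log servers. Below is a simplified single-process model of chain replication. The names `LogServer` and `Chain` are hypothetical and do not appear in the slides. A write enters at the head, and each server appends it to its in-memory log before forwarding it to its successor. The client gets the Ack only after the tail has stored the entry, so every acknowledged entry exists on every replica. Networking, failures and the disk persister are left out.

```python
# Simplified chain replication: head -> ... -> tail, Ack from the tail.
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LogServer:
    name: str
    successor: Optional[LogServer] = None
    log: list = field(default_factory=list)  # in-memory copy of the log

    def replicate(self, entry: str) -> str:
        self.log.append(entry)
        if self.successor is None:
            # The tail is the last replica to store the entry, so it acks.
            return f"ack from {self.name}"
        return self.successor.replicate(entry)


class Chain:
    def __init__(self, names: list[str]) -> None:
        self.servers = [LogServer(n) for n in names]
        # Link each server to the next one in the chain.
        for prev, nxt in zip(self.servers, self.servers[1:]):
            prev.successor = nxt

    def write(self, entry: str) -> str:
        # Writes always enter at the head of the chain.
        return self.servers[0].replicate(entry)


chain = Chain(["head", "middle", "tail"])  # replication factor of 3
print(chain.write("set(university, UPC)"))  # prints: ack from tail
```

Under broadcast (slide 21), the master would instead send each entry to every slave and wait for all of their acknowledgements. One commonly cited advantage of a chain is that each server forwards every entry only once, so the forwarding load is spread across the servers rather than concentrated on the master.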
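Slide 27 splits a log server into a Receiver, a Persister and a Reader. The Persister writes entries sequentially so that the disk does not pay a seek for each entry. The minimal sketch below covers the Receiver-to-Persister hand-off. It assumes a thread-safe queue between the two and a single append-only file; the function names and the sentinel are hypothetical, and the Reader is omitted.

```python
# Receiver -> queue -> Persister, with the Persister appending sequentially.
import queue
import tempfile
import threading
from pathlib import Path

_STOP = object()  # sentinel telling the persister to finish


def persister(q: queue.Queue, path: Path) -> None:
    """Append queued entries to one file, in arrival order, until _STOP arrives.

    Assumes a single receiver that enqueues _STOP last.
    """
    with path.open("a", encoding="utf-8") as f:
        done = False
        while not done:
            batch = [q.get()]  # block until at least one entry is queued
            while True:        # then drain everything already waiting
                try:
                    batch.append(q.get_nowait())
                except queue.Empty:
                    break
            if batch[-1] is _STOP:
                batch.pop()
                done = True
            f.write("".join(batch))  # one sequential append per batch
            f.flush()


def receiver(q: queue.Queue, entries: list) -> None:
    # Tag each entry with a sequence number so the order is preserved.
    for seq, payload in enumerate(entries, start=1):
        q.put(f"{seq}\t{payload}\n")
    q.put(_STOP)


path = Path(tempfile.mkdtemp()) / "log.txt"
q: queue.Queue = queue.Queue()
t = threading.Thread(target=persister, args=(q, path))
t.start()
receiver(q, ["set(a, 1)", "set(b, 2)", "set(c, 3)"])
t.join()
print(path.read_text(encoding="utf-8"))
```

Draining everything that is already queued and writing it in a single append lets the Persister group entries that arrive while a previous write is in progress. This is one simple way to keep the disk busy with large sequential writes.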
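On slide 34, a single synchronous client has only one entry in flight at any moment. Its throughput should therefore be roughly the reciprocal of its latency. The check below runs that calculation on the table's own figures.

```python
# One synchronous client waits for each Ack before sending the next entry,
# so throughput (entries/sec) should be close to 1000 / latency (ms).
measured = {  # entry size (bytes): (latency ms, throughput entries/sec), slide 34
    200: (0.45, 2200),
    1024: (0.62, 1600),
    4096: (0.99, 1000),
}

for size, (latency_ms, throughput) in measured.items():
    predicted = 1000 / latency_ms
    print(f"{size:>5} B: predicted {predicted:6.0f}/s, measured {throughput}/s")
```

The predictions (about 2222, 1613 and 1010 entries/sec) match the measured values to within roughly 1%. The latency and throughput columns are therefore consistent with each other.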
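Slide 39 reports 29 Mb/s of bandwidth at 6,000 entries/sec of 200 bytes each. The raw payload rate follows directly from those two numbers. The slide does not say how the rest of the traffic splits between replication, acknowledgements and protocol overhead, so the ratio printed below is only indicative.

```python
# Payload bandwidth implied by slide 39, compared with the measured figure.
entries_per_sec = 6_000
entry_bytes = 200
measured_mbps = 29  # reported on slide 39

payload_mbps = entries_per_sec * entry_bytes * 8 / 1_000_000
ratio = measured_mbps / payload_mbps
print(f"payload: {payload_mbps:.1f} Mb/s, measured/payload: {ratio:.1f}x")
```

The payload alone comes to 9.6 Mb/s, so the measured figure is about three times higher. One possible explanation, not confirmed by the slides, is that each entry crosses the network more than once in a replicated chain.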

Speaker notes

  1. Resume
  2. Periodic snapshots degrade performance while the snapshot is being taken and generate a load spike on the machine.
  3. It is important not to try to be all things to all people. Clients might demand 8 different things: doing 6 of them is easy, handling 7 of them requires real thought, and dealing with all 8 usually results in a worse system (more complex, and compromising other clients in trying to satisfy everyone). E.g. Facebook: 800 memcache servers in 2008, 2000 now; < 6% writes; updates are small (e.g. tag, add friend, new ads, status, profile update, sharing).
  4. After the log is replicated in the memory of several machines, an ack is sent to the client. If some of the processes crash, processes on other machines will still persist the data. Several replicas provide better availability of the data at recovery time, and the read bandwidth of the servers can be aggregated to accelerate recovery.
  5. Adding a replica doesn't introduce a bottleneck and does not impact throughput.
  6. Scalability
  7. Replication factor of three
  8. Common approach: WAL to local disk. Redis is an example of a popular in-memory database that uses WAL to disk. To guarantee the durability of every log entry, the entry must be written to disk on every write operation. Even when the log is written to disk, there is no guarantee that it is persisted, because disk caches are enabled by default. Process crash 1.7, also power outage 49; no availability if the server is down. Ours is a factor of 4 better than disk with the cache disabled. Saturation can be prevented.
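Note 8 describes write-ahead logging to local disk, where every write has to reach the disk before it is acknowledged. The minimal Python sketch below calls `os.fsync` after each append; the `append_durably` helper is hypothetical. As the note points out, the operating system may hand the data to the drive while the drive's own write cache is still volatile. Whether `fsync` also empties that cache depends on the operating system and the drive configuration.

```python
# Write-ahead log on local disk: flush and fsync every entry before acking.
import os
import tempfile


def append_durably(f, entry: str) -> None:
    """Append one log entry and ask the OS to force it to the device."""
    f.write(entry + "\n")
    f.flush()             # move Python's buffer into the OS page cache
    os.fsync(f.fileno())  # ask the OS to write the page cache to the device


path = os.path.join(tempfile.mkdtemp(), "wal.log")
with open(path, "a", encoding="utf-8") as wal:
    for entry in ["set(university, UPC)", "set(myKey, U)"]:
        append_durably(wal, entry)  # the write would be acknowledged only after this
```

Issuing one `fsync` per entry makes every write wait for the device. That per-write wait is the cost the note measures the replicated log against.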