SlideShare a Scribd company logo
1 of 26
안미진
RocksDB Compaction
Embedded Key-Value Store for Flash and RAM
Contents
1. RocksDB Architecture
2. Level Style Compaction
3. Universal Style Compaction
4. RocksDB Compaction
Overview
RocksDB Architecture
Active
Memtable
Read-Only
Memtable
Memory
Log
Log
SSTSSTSST
SSTSSTSST
Persistent Storage
Write Request
Read Request LSM Files
CompactionFlush
Switch Switch
RocksDB Architecture
Active
Memtable
Read-Only
Memtable
Memory
Log
Log
SSTSSTSST
SSTSSTSST
Persistent Storage
Write Request
Read Request LSM Files
CompactionFlush
Switch Switch
RocksDB Architecture
Active
Memtable
Read-Only
Memtable
Memory
Log
Log
SSTSSTSST
SSTSSTSST
Persistent Storage
Write Request
Read Request LSM Files
CompactionFlush
Switch Switch
RocksDB Architecture
Active
Memtable
Read-Only
Memtable
Memory
Log
Log
SSTSSTSST
SSTSSTSST
Persistent Storage
Write Request
LSM Files
CompactionFlush
Switch Switch
Read Request
RocksDB Architecture
Active
Memtable
Read-Only
Memtable
Memory
Log
Log
SSTSSTSST
SSTSSTSST
Persistent Storage
Write Request
LSM Files
CompactionFlush
Switch Switch
Read Request
RocksDB Architecture
Active
Memtable
(4MB)
Immutable
Memtable
Memory
Disk
Write
Level 0
(4 SSTfile)
Level 1
(10MB)
Level 2
(100MB)
. . .
. . . . . .
Info Log
MANIFEST
CURRENT
Compaction
Log
SSTfile
(2MB)
RocksDB Compaction
Multi-threaded compactions
• Background Multi-thread
→ periodically do the “compaction”
→ parallel compactions on different parts of the database
can occur simultaneously
• Merge SSTfiles to a bigger SSTfile
• Remove multiple copies of the same key
– Duplicate or overwritten keys
• Process deletions of keys
• Supports two different styles of compaction
– Tunable compaction to trade-off
Level Style Compaction
• level0_file_num_compaction_trigger
- Number of files to trigger level0 compaction
- Default : 1
Ex) candidate files size < the next file’s size (1% smaller)
→ include next file into this candidate set
• Level0_file_
- The minimum number of files in a single compaction
- Default : 2
• max_merge_width
- The maximum number of files in a single compaction
- Default : UINT_MAX
Compaction options
1. Level Style Compaction
• RocksDB default compaction style
• Stores data in multiple levels in the database
• More recent data → L0
The oldest data → Lmax
• Files in L0
- overlapping keys, sorted by flush time
Files in L1 and higher
- non-overlapping keys, sorted by key
• Each level is 10 times larger than the previous one
Inherited from LevelDB
Level Style Compaction
Compaction process
cache
log
level1
level2
level3
level0
① Pick one file from level N
② Compact it with all its overlapping
files from level N+1
③ Replace them with new files in
level N+1
Level 0 → Level 1 Compaction
• Level 0 → overlapping keys
• Compaction includes all files from L1
• All files from L1 are compacted with L0
• L0 → L1 compaction completion
L1 → L2 compaction start
• Single thread compaction → not good throughput
• Solution : Making the size of L0 similar to size of L1
Tricky Compaction
Level Style Compaction
Level Style Compaction
· Level score =
𝑐𝑢𝑟𝑟𝑒𝑛𝑡 𝑙𝑒𝑣𝑒𝑙 𝑠𝑖𝑧𝑒
max level size
· max file size
= target_file_size_base * target_file_size_multiplier
(Default=2MB) (Default=1)
· Overlapping range search
: Binary Search
Level Style
Flowchart
• Read : 128KB / Write : 512KB
Level Style Compaction
2. Universal Style Compaction
• For write-heavy workloads
→ Level Style Compaction may be bottlenecked on
disk throughput
• Stores all files in L0
• All files are arranged in time order
• Temporarily increase size amplification by a factor of
two
• Intended to decrease write amplification
• But, increase space amplification
Universal Style Compaction
① Pick up a few files that are chronologically adjacent to one
another
② Merge them
③ Replace them with a new file in level 0
Compaction process
Universal Style Compaction
Universal Style Compaction
Universal Style Compaction
Flowchart
Universal Style Compaction
• Read : 128KB / Write : 512KB
Universal Style Compaction
• size_ratio
- Percentage flexibility while comparing file size
- Default : 1
Ex) candidate set size < size of next file (1% smaller)
→ include next file in candidate set
• min_merge_width
- The minimum number of files in a single compaction
- Default : 2
• max_merge_width
- The maximum number of files in a single compaction
- Default : UINT_MAX
Compaction options
Universal Style Compaction
• max_size_amplification_percent
- The amount of additional storage needed to store a
single byte of data in the database
- Controls the amount of space amplification in the
database
- Does not determine when calls to Put & Delete are
stalled
- Determines when compaction is done
- Default : 200
Compaction options
Universal Style Compaction
• stop_style
- The algorithm used to stop picking files into a single
compaction run
- kCompactionStopStyleSimilarSize
→ Pick files of similar size
- kCompactionStopStyleTotalSize
→ total size of picked files > next files
- Default : kCompactionStopStyleTotalSize
Compaction options

More Related Content

What's hot

The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
Databricks
 

What's hot (20)

Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
 
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeAdaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Rds data lake @ Robinhood
Rds data lake @ Robinhood Rds data lake @ Robinhood
Rds data lake @ Robinhood
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Understanding and Improving Code Generation
Understanding and Improving Code GenerationUnderstanding and Improving Code Generation
Understanding and Improving Code Generation
 
PostgreSQL Deep Internal
PostgreSQL Deep InternalPostgreSQL Deep Internal
PostgreSQL Deep Internal
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka Streams
 
MyRocks introduction and production deployment
MyRocks introduction and production deploymentMyRocks introduction and production deployment
MyRocks introduction and production deployment
 
Druid deep dive
Druid deep diveDruid deep dive
Druid deep dive
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcached
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL Administration
 
MongoDB WiredTiger Internals
MongoDB WiredTiger InternalsMongoDB WiredTiger Internals
MongoDB WiredTiger Internals
 
Iceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data AnalyticsIceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data Analytics
 

Similar to RocksDB compaction

Computer Memory Hierarchy Computer Architecture
Computer Memory Hierarchy Computer ArchitectureComputer Memory Hierarchy Computer Architecture
Computer Memory Hierarchy Computer Architecture
Haris456
 
Storage talk
Storage talkStorage talk
Storage talk
christkv
 

Similar to RocksDB compaction (20)

Power of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data StructuresPower of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data Structures
 
The Power of the Log
The Power of the LogThe Power of the Log
The Power of the Log
 
Geek Sync | Guide to Understanding and Monitoring Tempdb
Geek Sync | Guide to Understanding and Monitoring TempdbGeek Sync | Guide to Understanding and Monitoring Tempdb
Geek Sync | Guide to Understanding and Monitoring Tempdb
 
Shignled disk
Shignled diskShignled disk
Shignled disk
 
Scylla Summit 2018: In-Memory Scylla - When Fast Storage is Not Fast Enough
Scylla Summit 2018: In-Memory Scylla - When Fast Storage is Not Fast EnoughScylla Summit 2018: In-Memory Scylla - When Fast Storage is Not Fast Enough
Scylla Summit 2018: In-Memory Scylla - When Fast Storage is Not Fast Enough
 
Extlect03
Extlect03Extlect03
Extlect03
 
Scaling ScyllaDB Storage Engine with State-of-Art Compaction
Scaling ScyllaDB Storage Engine with State-of-Art CompactionScaling ScyllaDB Storage Engine with State-of-Art Compaction
Scaling ScyllaDB Storage Engine with State-of-Art Compaction
 
Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...
 
Deep Dive - Maximising EC2 & EBS Performance
Deep Dive - Maximising EC2 & EBS PerformanceDeep Dive - Maximising EC2 & EBS Performance
Deep Dive - Maximising EC2 & EBS Performance
 
Some key value stores using log-structure
Some key value stores using log-structureSome key value stores using log-structure
Some key value stores using log-structure
 
Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...
Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...
Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...
 
Computer Memory Hierarchy Computer Architecture
Computer Memory Hierarchy Computer ArchitectureComputer Memory Hierarchy Computer Architecture
Computer Memory Hierarchy Computer Architecture
 
04 cache memory
04 cache memory04 cache memory
04 cache memory
 
Webinar: Understanding Storage for Performance and Data Safety
Webinar: Understanding Storage for Performance and Data SafetyWebinar: Understanding Storage for Performance and Data Safety
Webinar: Understanding Storage for Performance and Data Safety
 
Storage talk
Storage talkStorage talk
Storage talk
 
Cassandra TK 2014 - Large Nodes
Cassandra TK 2014 - Large NodesCassandra TK 2014 - Large Nodes
Cassandra TK 2014 - Large Nodes
 
Caches microP
Caches microPCaches microP
Caches microP
 
cache memory introduction, level, function
cache memory introduction, level, functioncache memory introduction, level, function
cache memory introduction, level, function
 
MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014
 
Memory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer OrganizationMemory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer Organization
 

More from MIJIN AN (7)

InnoDB Flushing and Checkpoints
InnoDB Flushing and CheckpointsInnoDB Flushing and Checkpoints
InnoDB Flushing and Checkpoints
 
Secondary Index Search in InnoDB
Secondary Index Search in InnoDBSecondary Index Search in InnoDB
Secondary Index Search in InnoDB
 
MySQL Space Management
MySQL Space ManagementMySQL Space Management
MySQL Space Management
 
MySQL Buffer Management
MySQL Buffer ManagementMySQL Buffer Management
MySQL Buffer Management
 
Group play service for Tizen
Group play service for TizenGroup play service for Tizen
Group play service for Tizen
 
MySQL with FaCE
MySQL with FaCEMySQL with FaCE
MySQL with FaCE
 
MySQL Hash Table
MySQL Hash TableMySQL Hash Table
MySQL Hash Table
 

Recently uploaded

%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
masabamasaba
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
shinachiaurasa2
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Recently uploaded (20)

Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
SHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions PresentationSHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions Presentation
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisions
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban
 

RocksDB compaction

Editor's Notes

  1. MANIFEST files will be formatted as a log all changes cause a state change (add or delete) will be appended to the log. A MANIFEST file lists the set of sorted tables that make up each level Informational messages are printed to files named LOG and LOG.old. CURRENT is a latest manifest file name of the text file