
RocksDB compaction

  1. 1. 안미진 RocksDB Compaction Embedded Key-Value Store for Flash and RAM
  2. 2. Contents 1. RocksDB Architecture 2. Level Style Compaction 3. Universal Style Compaction 4. RocksDB Compaction Overview
  3. 3. RocksDB Architecture [Figure labels: Write Request, Read Request, Memory, Active Memtable, Read-Only Memtable, Log, SST, LSM Files, Persistent Storage, Switch, Flush, Compaction]
  8. 8. RocksDB Architecture [Figure labels: Memory, Active Memtable (4MB), Immutable Memtable, Disk, Write, Level 0 (4 SSTfiles), Level 1 (10MB), Level 2 (100MB), ..., SSTfile (2MB), Info Log, MANIFEST, CURRENT, Compaction, Log]
  9. 9. RocksDB Compaction: Multi-threaded compactions • Background threads periodically run the “compaction” → compactions on different parts of the database can run in parallel • Merges SSTfiles into a bigger SSTfile • Removes multiple copies of the same key (duplicate or overwritten keys) • Processes deletions of keys • Supports two different styles of compaction – Tunable compaction to trade-off
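
The merge, de-duplication, and deletion handling listed on this slide can be illustrated with a small in-memory sketch. It is not RocksDB's implementation: the `(key, seq, value)` tuple layout, the `TOMBSTONE` marker, and the `compact` function are hypothetical names chosen for illustration, and the sketch assumes tombstones can be dropped outright, which is only safe when no older data for the key exists elsewhere.

```python
import heapq

TOMBSTONE = None  # hypothetical marker: a value of None means "key deleted"


def compact(runs):
    """Merge sorted runs of (key, seq, value) tuples into one sorted run.

    Each run must be sorted by key and hold at most one entry per key.
    Only the entry with the highest sequence number survives for each key,
    and keys whose newest entry is a tombstone are dropped.
    """
    # Order by key, then newest first, so the first entry seen per key wins.
    merged = heapq.merge(*runs, key=lambda e: (e[0], -e[1]))
    out = []
    last_key = object()  # sentinel that compares unequal to every real key
    for key, seq, value in merged:
        if key == last_key:
            continue  # older, overwritten copy of the same key
        last_key = key
        if value is not TOMBSTONE:
            out.append((key, seq, value))
    return out
```

For example, merging a newer run that overwrites `a` and deletes `b` with an older run holding `a`, `b`, and `c` leaves only the new `a` and the untouched `c`.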
  10. 10. Level Style Compaction: Compaction options • level0_file_num_compaction_trigger - Number of L0 files that triggers an L0 compaction - Default : 4 Ex) candidate files size < the next file’s size (1% smaller) → include next file into this candidate set • Level0_file_ - The minimum number of files in a single compaction - Default : 2 • max_merge_width - The maximum number of files in a single compaction - Default : UINT_MAX
  11. 11. 1. Level Style Compaction (inherited from LevelDB) • RocksDB default compaction style • Stores data in multiple levels in the database • The most recent data is in L0; the oldest data is in Lmax • Files in L0: overlapping keys, sorted by flush time • Files in L1 and higher: non-overlapping keys, sorted by key • Each level is 10 times larger than the previous one
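
The "10 times larger" rule can be made concrete with a few lines of arithmetic. This is a minimal sketch: the 10MB starting size for L1 is taken from the architecture slide, and the function and parameter names are hypothetical, not RocksDB option names.

```python
def level_target_sizes(l1_size_mb=10, multiplier=10, num_levels=4):
    """Return {level: target size in MB} for L1..L<num_levels>.

    Each level's target is `multiplier` times the previous level's target.
    """
    return {
        level: l1_size_mb * multiplier ** (level - 1)
        for level in range(1, num_levels + 1)
    }


sizes = level_target_sizes()
for level, size_mb in sizes.items():
    print(f"L{level}: {size_mb} MB")
```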
  12. 12. Level Style Compaction: Compaction process ① Pick one file from level N ② Compact it with all its overlapping files from level N+1 ③ Replace them with new files in level N+1 [Figure labels: cache, log, level0, level1, level2, level3]
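
The three steps on this slide can be sketched on toy data. Here a "file" is just a sorted list of `(key, value)` pairs and a level is a list of such files; `compact_one_file`, `key_range`, and `max_entries_per_file` are hypothetical names for illustration, and a linear scan stands in for whatever overlap search the real engine uses.

```python
def key_range(f):
    """Smallest and largest key of a file (a sorted list of (key, value))."""
    return f[0][0], f[-1][0]


def compact_one_file(level_n, level_n1, file_index=0, max_entries_per_file=2):
    """Compact one file from level N into level N+1; return both new levels."""
    # Step 1: pick one file from level N.
    picked = level_n[file_index]
    lo, hi = key_range(picked)

    # Step 2: find every level N+1 file whose key range overlaps [lo, hi].
    overlapping = [
        f for f in level_n1
        if not (key_range(f)[1] < lo or key_range(f)[0] > hi)
    ]
    untouched = [f for f in level_n1 if f not in overlapping]

    # Merge: apply older (N+1) data first so the newer level N data wins.
    merged = {}
    for f in overlapping:
        merged.update(f)
    merged.update(picked)
    entries = sorted(merged.items())

    # Step 3: replace the inputs with new, size-limited files in level N+1.
    new_files = [
        entries[i:i + max_entries_per_file]
        for i in range(0, len(entries), max_entries_per_file)
    ]
    new_level_n = level_n[:file_index] + level_n[file_index + 1:]
    new_level_n1 = sorted(untouched + new_files, key=lambda f: f[0][0])
    return new_level_n, new_level_n1
```

Files in level N+1 that do not overlap the picked file are carried over unchanged, which is what keeps a single compaction cheap relative to rewriting the whole level.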
  13. 13. Level 0 → Level 1 Compaction (tricky compaction) • Level 0 files have overlapping keys • So the compaction includes all files from L1: all L1 files are compacted with L0 • L1 → L2 compaction starts after L0 → L1 compaction completes • Single-threaded compaction → poor throughput • Solution : make the size of L0 similar to the size of L1
  14. 14. Level Style Compaction
  15. 15. Level Style Compaction
  16. 16. Level Style Flowchart · Level score = (current level size) / (max level size) · max file size = target_file_size_base (Default = 2MB) * target_file_size_multiplier (Default = 1) · Overlapping range search : Binary Search
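
The quantities on this slide can be sketched as follows. The two formulas are taken directly from the slide; the binary search over key ranges is one plausible way to do the overlap search on a level whose files are sorted and non-overlapping, and all function names here are hypothetical.

```python
import bisect


def level_score(current_level_size, max_level_size):
    """Score from the slide: current level size divided by max level size."""
    return current_level_size / max_level_size


def max_file_size(target_file_size_base=2 * 1024 * 1024,
                  target_file_size_multiplier=1):
    """Max file size in bytes, using the slide's formula and defaults."""
    return target_file_size_base * target_file_size_multiplier


def overlapping_files(files, smallest, largest):
    """Return the files overlapping the key range [smallest, largest].

    `files` is a list of (smallest_key, largest_key) tuples, sorted by key
    and non-overlapping, as in L1 and higher.
    """
    largest_keys = [f[1] for f in files]
    smallest_keys = [f[0] for f in files]
    # First file whose largest key is >= the start of the range.
    start = bisect.bisect_left(largest_keys, smallest)
    # One past the last file whose smallest key is <= the end of the range.
    end = bisect.bisect_right(smallest_keys, largest)
    return files[start:end]
```

A score above 1 means the level holds more data than its target size.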
  17. 17. • Read : 128KB / Write : 512KB Level Style Compaction
  18. 18. 2. Universal Style Compaction • For write-heavy workloads, Level Style Compaction may be bottlenecked on disk throughput • Stores all files in L0 • All files are arranged in time order • Intended to decrease write amplification • But increases space amplification: a compaction can temporarily increase size amplification by a factor of two
  19. 19. Universal Style Compaction: Compaction process ① Pick up a few files that are chronologically adjacent to one another ② Merge them ③ Replace them with a new file in level 0
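
A toy version of these three steps, assuming each file is a plain dict of key to value and the file list is ordered newest first. `merge_adjacent` and its parameters are hypothetical names, and choosing which files to pick is left to the caller.

```python
def merge_adjacent(files, start, count):
    """Merge `count` chronologically adjacent files beginning at `start`.

    `files` is ordered newest first. The merged file takes the place of
    its inputs, so the time ordering of the remaining files is preserved.
    """
    picked = files[start:start + count]
    merged = {}
    # Apply the oldest picked file first so newer values overwrite older ones.
    for f in reversed(picked):
        merged.update(f)
    new_file = dict(sorted(merged.items()))
    return files[:start] + [new_file] + files[start + count:]
```

Only adjacent files are merged because merging files that are not neighbours in time would let an older value end up in a "newer" position than a value it should be hidden by.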
  20. 20. Universal Style Compaction
  21. 21. Universal Style Compaction
  22. 22. Universal Style Compaction Flowchart
  23. 23. Universal Style Compaction • Read : 128KB / Write : 512KB
  24. 24. Universal Style Compaction: Compaction options • size_ratio - Percentage flexibility while comparing file sizes - Default : 1 Ex) if the candidate set's size is 1% smaller than the size of the next file → include the next file in the candidate set • min_merge_width - The minimum number of files in a single compaction - Default : 2 • max_merge_width - The maximum number of files in a single compaction - Default : UINT_MAX
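
How the three options on this slide interact can be sketched with a simplified picker. This is an illustration based on the slide's description, not RocksDB's actual picker: `pick_candidates` is a hypothetical function, file sizes are plain numbers ordered newest first, and `sys.maxsize` stands in for `UINT_MAX`.

```python
import sys


def pick_candidates(sizes, size_ratio=1, min_merge_width=2,
                    max_merge_width=sys.maxsize):
    """Return the indices of files to merge, or [] if no run qualifies.

    `sizes` lists file sizes newest first. Starting from each file in turn,
    the next (older) file joins the candidate set as long as the set's total
    size, padded by `size_ratio` percent, is at least the next file's size.
    """
    for start in range(len(sizes)):
        picked = [start]
        candidate_size = sizes[start]
        for nxt in range(start + 1, len(sizes)):
            if len(picked) >= max_merge_width:
                break
            if candidate_size * (100 + size_ratio) / 100 < sizes[nxt]:
                break  # next file is too large compared to the candidates
            picked.append(nxt)
            candidate_size += sizes[nxt]
        if len(picked) >= min_merge_width:
            return picked
    return []
```

With sizes `[1, 1, 2, 4, 100]` the first four files are picked, because each next file is no larger than everything picked so far, while the 100-unit file is left alone.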
  25. 25. Universal Style Compaction: Compaction options • max_size_amplification_percent - Size amplification is the amount of additional storage needed to store a single byte of data in the database - Controls the amount of space amplification in the database - Does not determine when calls to Put & Delete are stalled - Determines when compaction is done - Default : 200
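
A worked example of how a size amplification threshold could trigger compaction. The estimate used here, the total size of all newer files relative to the oldest file, is an assumption about how the check is computed and is not stated on the slide; both function names are hypothetical.

```python
def size_amplification_percent(sizes):
    """Estimate size amplification as a percentage.

    `sizes` lists file sizes newest first, so the last entry is the oldest
    file. Assumption: the oldest file approximates the live data, and all
    newer files count as extra space.
    """
    *newer, oldest = sizes
    return 100 * sum(newer) / oldest


def needs_full_compaction(sizes, max_size_amplification_percent=200):
    """True when the estimated amplification reaches the threshold."""
    if len(sizes) < 2:
        return False  # nothing to merge with
    return size_amplification_percent(sizes) >= max_size_amplification_percent
```

With the default of 200, newer files may together grow to twice the size of the oldest file before this check fires.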
  26. 26. Universal Style Compaction: Compaction options • stop_style - The algorithm used to stop picking files into a single compaction run - kCompactionStopStyleSimilarSize → pick files of similar size - kCompactionStopStyleTotalSize → total size of picked files > size of next file - Default : kCompactionStopStyleTotalSize

Editor's Notes

  • MANIFEST files are formatted as a log: every change that causes a state change (a file added or deleted) is appended to the log. A MANIFEST file lists the set of sorted tables that make up each level.
    Informational messages are printed to files named LOG and LOG.old.
    CURRENT is a text file that contains the name of the latest MANIFEST file.