Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Voldemort on Solid State Drives
1. Voldemort on Solid
State Drives
Vinoth Chandar, Lei Gao, Cuong Tran
Linkedin Corp, Mountain View, CA
2. What is Voldemort?
• Distributed key-value storage system
• Based on Amazon Dynamo
• Open source
• https://github.com/voldemort/voldemort
• Widely used
• Several TB of data @ Linkedin
• Eharmony, Nokia, JIVE Software etc.
• We are on SSD!
3. SSD Migration
• Improve average latency
• 20 ms to 2 ms
• Cluster expansion & data restoration
• Improved 10x
• 95th & 99th latencies shot up
• 30ms to 130ms and 240ms to 380ms
• GC Issues !!
Client
operations BDB Cleaner Retention Job
BDB Cache
App1 App2 .. AppN Server JVM Heap
Linux
4. Benchmarking for SSD
• Java based systems
• Need for End-End Correlation
• Linux page stealing
• Use AlwaysPreTouch
• Run Workload causing Fragmentation
• Promotion failure more likely
• Defragmentation costly
• Run Workload with High memory churn
• cost of CMS Initial Mark
• High CMS initiating fraction
• Factor in higher IO Parallelism
• Sequential IO benchmarks no longer use single thread
5. Takeaways
• GC impact significant
• Big heaps = Big pauses
• Longer runs to account for SSD GC
• SSD is log structured
• Worst case 2 ms
• Special attention to 95th & 99th
• Fragmentation eats up gains
Notes de l'éditeur
Voldemort eng at Lnkd . 10 sec
20 sec. Set theme for talk. Not about distributed systems aspects. Experiences with Single node performance issues during SSD migration.
1 min. multi tenant. TODO : say explicitly “focus on diagram at the bottom”
2 min “Tackling these GC issues, gave us some intuitions for benchmarking java based applications on SSD”. In end-end correlation, “First of all, given Big data on SSD is relatively new, we need tools to correlate patterns across the stack and make sense out of it. For example – young gen pauses due to linux page stealing. Line up page scan data and system time components in young gen pause from gc log”