High performance (as measured by sub-millisecond response time for queries) is a key characteristic of the Redis database, and it is one of the main reasons why Redis is the most popular key-value database in the world.
To keep improving performance across all Redis components, we developed a framework that automatically triggers performance tests, telemetry gathering, profiling, and data visualization on every code commit.
In this talk, we describe how this automation and “zero-touch” profiling scaled our ability to pursue performance regressions and to find opportunities to improve the efficiency of our code, helping us (as a company) shift from a reactive to a more proactive performance mindset.
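As a rough illustration of the per-commit loop (the suite name, the runner, and the tolerance are hypothetical stand-ins, not the framework's actual API), the "zero-touch" idea can be sketched as:

```python
# Hypothetical sketch of a commit-triggered performance check.
# run_suite() is a stand-in for launching a real benchmark suite
# against a build of the given commit; here it returns canned numbers.

def run_suite(commit_sha, suite):
    """Stand-in benchmark runner: returns ops/sec for a suite."""
    canned = {"memtier-1Mkeys-string-get": 180_000}
    return canned[suite]

def check_commit(commit_sha, baselines, tolerance=0.05):
    """Run every suite for a commit and flag any result that fell
    more than `tolerance` below its stored baseline."""
    regressions = {}
    for suite, baseline_ops in baselines.items():
        ops = run_suite(commit_sha, suite)
        if ops < baseline_ops * (1 - tolerance):
            regressions[suite] = (baseline_ops, ops)
    return regressions

baselines = {"memtier-1Mkeys-string-get": 200_000}
print(check_commit("abc123", baselines))
```

With the canned numbers above, the 180K ops/sec result is below the 5% tolerance band around the 200K baseline, so the suite is flagged.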
31. improved Redis performance by up to 4x![1]
[1] - https://redis.com/blog/redis-intel-performance-testing/
https://redis.com/blog/redis-7-geographic-commands/
32. > what we’ve gained (1/4)
● Dramatically reduced the feedback cycle (days -> 1 hour)
● Devs can easily add tests (> 300 full suites)
● Scaled to more, and more challenging, tests
● Performance is now everyone’s power/responsibility
33. > what we’ve gained (2/4)
● A/B-test new technologies and state-of-the-art HW/SW components
● Continuously up-to-date numbers for the use cases that matter
● Foster openness: community and cross-company efforts
34. > what we’ve gained (3/4)
● Able to commit to reducing overhead per operation in our cloud/SW
● Competitive advantage / leadership
● Shift from reactive to proactive (and predictive) performance work
● Reduced costs
○ manual detection is at least 17.5x more expensive than automated detection
35. > what we’ve gained (4/4)
● Ability to reproduce the performance of > 10 years of Redis development
■ “go back in time”
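The "go back in time" capability amounts to replaying one fixed benchmark suite against historical releases so the numbers stay comparable. A minimal sketch (the tag list and command template are illustrative, not the framework's actual config):

```python
# Illustrative sketch: build a replay plan that runs the same benchmark
# suite against a series of historical Redis release tags, oldest first.
# The tag list and the command shape are examples for the sketch only.

RELEASE_TAGS = ["2.6.0", "3.0.0", "4.0.0", "5.0.0", "6.0.0", "7.0.0"]

def replay_plan(suite, tags):
    """Return the (tag, command) pairs a runner would execute in order."""
    return [(tag, f"run-benchmark --suite {suite} --redis-version {tag}")
            for tag in tags]

for tag, cmd in replay_plan("string-get", RELEASE_TAGS):
    print(tag, "->", cmd)
```

Keeping the suite definition fixed while only the Redis version varies is what makes a decade of results directly comparable.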
37. > what’s next feature wise (1/2)
VISIBILITY for Points of Improvement
● Extend profiler daemon to bpf tooling, VTune
○ off-CPU analysis
○ threading/locking
○ vectorization reports
38. > what’s next feature wise (2/2)
● extend tools to further characterize the workload
○ mem-bound / CPU-bound
○ HW counters
■ stalls on memory
○ % time off-cpu
○ extend partner tooling and HW
■ beta versions of their next gen HW
■ multi-arch comparisons (we do it manually now)
○ extended io statistics
● include lower-overhead profilers/tracers
○ call count analysis
○ off cpu flame charts analysis
○ syscall
○ …
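The characterization above (mem-bound vs CPU-bound, memory stalls, % time off-CPU) can be sketched as a classifier over raw hardware counter readings, e.g. as reported by `perf stat`. The counter names and the stall threshold here are illustrative assumptions, not fixed rules:

```python
# Hedged sketch: coarse workload characterization from HW counters.
# The 0.4 stall threshold is an illustrative tuning knob, not a standard.

def characterize(cycles, stalled_cycles_backend, off_cpu_ns, wall_ns,
                 stall_threshold=0.4):
    """Return a coarse label plus the ratios it was derived from."""
    stall_ratio = stalled_cycles_backend / cycles   # share of cycles stalled on memory
    off_cpu_pct = 100.0 * off_cpu_ns / wall_ns      # % of wall time spent off-CPU
    label = "memory-bound" if stall_ratio > stall_threshold else "cpu-bound"
    return label, round(stall_ratio, 2), round(off_cpu_pct, 1)

# Example readings (made up for the sketch):
print(characterize(cycles=1_000_000_000,
                   stalled_cycles_backend=550_000_000,
                   off_cpu_ns=120_000_000,
                   wall_ns=1_000_000_000))
```

A high backend-stall ratio points at memory work (cache misses, bandwidth), while a large off-CPU share points at blocking (I/O, locks) rather than compute.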
39. > what’s next product wise
● Improve anomaly/regression detection
● Increase OSS / company adoption
○ expose data in the docs
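One common shape for improved anomaly/regression detection is a statistical check against recent history instead of a fixed threshold. A minimal sketch (window size and the `k` sigma factor are illustrative tuning knobs):

```python
# Sketch of a simple statistical anomaly check: flag a new benchmark
# result that falls more than k standard deviations below the mean of
# recent history. k=3 is an illustrative default, not a recommendation.

from statistics import mean, stdev

def is_anomaly(history, new_value, k=3.0):
    """True if new_value is more than k sigma below the historical mean."""
    mu, sigma = mean(history), stdev(history)
    return new_value < mu - k * sigma

history = [200_000, 201_500, 199_200, 200_800, 200_300]  # ops/sec
print(is_anomaly(history, 180_000))   # large drop vs. history
print(is_anomaly(history, 199_900))   # within normal run-to-run noise
```

Scaling the threshold by the historical standard deviation lets noisy suites tolerate more variance than stable ones, which cuts false positives compared to one global tolerance.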
40. Follow up links
● Redis Performance Group
● Redis Benchmarks Specification
● Making the fast faster blog