High performance (as measured by sub-millisecond response time for queries) is a key characteristic of the Redis database, and it is one of the main reasons why Redis is the most popular key-value database in the world.
To keep improving performance across all Redis components, we developed a framework that automatically triggers performance tests, telemetry gathering, profiling, and data visualization on every code commit.
In this talk, we describe how this automation and “zero-touch” profiling scaled our ability to pursue performance regressions and to find opportunities to improve the efficiency of our code, helping us (as a company) shift from a reactive to a more proactive performance mindset.
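As a rough illustration of the per-commit loop (the suite name, the runner, and the tolerance are hypothetical stand-ins, not the framework's actual API), the "zero-touch" idea can be sketched as:

```python
# Hypothetical sketch of a commit-triggered performance check.
# run_suite() is a stand-in for launching a real benchmark suite
# against a build of the given commit; here it returns canned numbers.

def run_suite(commit_sha, suite):
    """Stand-in benchmark runner: returns ops/sec for a suite."""
    canned = {"memtier-1Mkeys-string-get": 180_000}
    return canned[suite]

def check_commit(commit_sha, baselines, tolerance=0.05):
    """Run every suite for a commit and flag any result that fell
    more than `tolerance` below its stored baseline."""
    regressions = {}
    for suite, baseline_ops in baselines.items():
        ops = run_suite(commit_sha, suite)
        if ops < baseline_ops * (1 - tolerance):
            regressions[suite] = (baseline_ops, ops)
    return regressions

baselines = {"memtier-1Mkeys-string-get": 200_000}
print(check_commit("abc123", baselines))
```

With the canned numbers above, the 180K ops/sec result is below the 5% tolerance band around the 200K baseline, so the suite is flagged.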
31. improved Redis performance by up to 4x![1]
[1] - https://redis.com/blog/redis-intel-performance-testing/
https://redis.com/blog/redis-7-geographic-commands/
32. > what we’ve gained (1/4)
● Dramatically reduced the feedback cycle (days -> 1 hour)
● Devs can easily add tests (> 300 full suites)
● Scaled to more, and more challenging, tests
● Performance is now everyone’s power/responsibility
33. > what we’ve gained (2/4)
● A/B-test new technologies and state-of-the-art HW/SW components
● Continuously up-to-date numbers for the use cases that matter
● Foster openness: community and cross-company efforts
34. > what we’ve gained (3/4)
● Able to commit to reducing overhead per operation in our cloud/SW
● Competitive advantage / leadership
● Shift from reactive to proactive (and predictive) performance work
● Reduced costs
○ manual detection is at least 17.5x more expensive than automated detection
35. > what we’ve gained (4/4)
● Ability to reproduce the performance of > 10 years of Redis development
■ “go back in time”
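The "go back in time" capability amounts to replaying one fixed benchmark suite against historical releases so the numbers stay comparable. A minimal sketch (the tag list and command template are illustrative, not the framework's actual config):

```python
# Illustrative sketch: build a replay plan that runs the same benchmark
# suite against a series of historical Redis release tags, oldest first.
# The tag list and the command shape are examples for the sketch only.

RELEASE_TAGS = ["2.6.0", "3.0.0", "4.0.0", "5.0.0", "6.0.0", "7.0.0"]

def replay_plan(suite, tags):
    """Return the (tag, command) pairs a runner would execute in order."""
    return [(tag, f"run-benchmark --suite {suite} --redis-version {tag}")
            for tag in tags]

for tag, cmd in replay_plan("string-get", RELEASE_TAGS):
    print(tag, "->", cmd)
```

Keeping the suite definition fixed while only the Redis version varies is what makes a decade of results directly comparable.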
37. > what’s next feature wise (1/2)
VISIBILITY for Points of Improvement
● Extend profiler daemon to bpf tooling, VTune
○ off-CPU analysis
○ threading/locking
○ vectorization reports
38. > what’s next feature wise (2/2)
● extend tools to further characterize the workload
○ mem-bound / CPU-bound
○ HW counters
■ stalls on memory
○ % time off-cpu
○ extend partner tooling and HW
■ beta versions of their next gen HW
■ multi-arch comparisons (we do it manually now)
○ extended io statistics
● include lower-overhead profilers/tracers
○ call count analysis
○ off cpu flame charts analysis
○ syscall
○ …
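The characterization above (mem-bound vs CPU-bound, memory stalls, % time off-CPU) can be sketched as a classifier over raw hardware counter readings, e.g. as reported by `perf stat`. The counter names and the stall threshold here are illustrative assumptions, not fixed rules:

```python
# Hedged sketch: coarse workload characterization from HW counters.
# The 0.4 stall threshold is an illustrative tuning knob, not a standard.

def characterize(cycles, stalled_cycles_backend, off_cpu_ns, wall_ns,
                 stall_threshold=0.4):
    """Return a coarse label plus the ratios it was derived from."""
    stall_ratio = stalled_cycles_backend / cycles   # share of cycles stalled on memory
    off_cpu_pct = 100.0 * off_cpu_ns / wall_ns      # % of wall time spent off-CPU
    label = "memory-bound" if stall_ratio > stall_threshold else "cpu-bound"
    return label, round(stall_ratio, 2), round(off_cpu_pct, 1)

# Example readings (made up for the sketch):
print(characterize(cycles=1_000_000_000,
                   stalled_cycles_backend=550_000_000,
                   off_cpu_ns=120_000_000,
                   wall_ns=1_000_000_000))
```

A high backend-stall ratio points at memory work (cache misses, bandwidth), while a large off-CPU share points at blocking (I/O, locks) rather than compute.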
39. > what’s next product wise
● Improve anomaly/regression detection
● Increase OSS / company adoption
○ expose data in the docs
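One common shape for improved anomaly/regression detection is a statistical check against recent history instead of a fixed threshold. A minimal sketch (window size and the `k` sigma factor are illustrative tuning knobs):

```python
# Sketch of a simple statistical anomaly check: flag a new benchmark
# result that falls more than k standard deviations below the mean of
# recent history. k=3 is an illustrative default, not a recommendation.

from statistics import mean, stdev

def is_anomaly(history, new_value, k=3.0):
    """True if new_value is more than k sigma below the historical mean."""
    mu, sigma = mean(history), stdev(history)
    return new_value < mu - k * sigma

history = [200_000, 201_500, 199_200, 200_800, 200_300]  # ops/sec
print(is_anomaly(history, 180_000))   # large drop vs. history
print(is_anomaly(history, 199_900))   # within normal run-to-run noise
```

Scaling the threshold by the historical standard deviation lets noisy suites tolerate more variance than stable ones, which cuts false positives compared to one global tolerance.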
40. Follow up links
● Redis Performance Group
● Redis Benchmarks Specification
● Making the fast faster blog