This is the story of how we struggled to meet strict latency requirements in a service built with Scala and Netty, and how we eventually managed it.
The most common latency contributors are in-process locking, thread scheduling, I/O, algorithmic inefficiencies and, of course, the garbage collector.
I will share our experience of dealing with these causes and explain what you can do to keep them from affecting production.
1. On the way to low latency
Artem Orobets
Smartling Inc
2. Long story short
We realized that latency is important to us
Our fabulous architecture was supposed to work, but it didn’t
The issues that we faced along the way
3. Those guys consider 10µs latencies slow
We have only a 100ms threshold
We are not a trading company
4. Latencies of about 50ms are barely noticeable to humans
Trans-Atlantic Path 91 ms*
Trans-Pacific Path 141 ms*
From Earth to Mars 3-22 min
27. -XX:+PrintHeapAtGC
Heap after GC invocations=43363 (full 3):
par new generation total 59008K, used 1335K
eden space 52480K, 0%
from space 6528K, 20% used
to space 6528K, 0% used
concurrent mark-sweep generation total 2031616K, used 1830227K
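The figures in this heap summary can be cross-checked with a little arithmetic. The sketch below is illustrative only: the class and method names are hypothetical, and the constants are copied from the log above. It shows that the reported young generation total is eden plus one survivor space, and that the old generation is roughly 90% full after this collection.

```java
// Hypothetical helper; all sizes are in KB and copied from the heap summary above.
final class HeapLayout {
    static final long EDEN_K = 52480;
    static final long SURVIVOR_K = 6528;       // size of each of "from" and "to"
    static final long OLD_TOTAL_K = 2031616;
    static final long OLD_USED_K = 1830227;

    // Capacity the log reports for "par new generation": eden + one survivor space,
    // because only one survivor space holds objects at any time.
    static long reportedYoungK() {
        return EDEN_K + SURVIVOR_K;
    }

    // Memory actually reserved for the young generation: eden + both survivor spaces.
    static long reservedYoungK() {
        return EDEN_K + 2 * SURVIVOR_K;
    }

    // Old generation occupancy in percent.
    static double oldOccupancyPercent() {
        return 100.0 * OLD_USED_K / OLD_TOTAL_K;
    }

    public static void main(String[] args) {
        System.out.println(reportedYoungK());      // 59008
        System.out.println(reservedYoungK());      // 65536, i.e. 64 MB
        System.out.println(oldOccupancyPercent()); // a little over 90
    }
}
```

With so little headroom left in the old generation, how quickly objects get promoted out of the young generation becomes important.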
28. -XX:+PrintTenuringDistribution
Desired survivor size 3342336 bytes, new threshold 2 (max 2)
- age 1: 878568 bytes, 878568 total
- age 2: 1616 bytes, 880184 total
: 53829K->1380K(59008K), 0.0083140 secs]
1884058K->1831609K(2090624K), 0.0084006 secs]
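The numbers in this output can be reproduced by hand. The sketch below is a simplified model of the HotSpot heuristic, not the JVM's actual code: the class and method names are hypothetical, and it assumes the default -XX:TargetSurvivorRatio=50, which is consistent with the 6528K survivor space and the reported desired size. Object sizes are accumulated age by age, and the threshold becomes the first age at which the running total exceeds the desired survivor size, capped at the maximum.

```java
// Hypothetical, simplified model of how the tenuring threshold is chosen.
final class TenuringThreshold {

    // Desired survivor size = survivor capacity * TargetSurvivorRatio / 100.
    static long desiredSurvivorBytes(long survivorBytes, int targetSurvivorRatio) {
        return survivorBytes * targetSurvivorRatio / 100;
    }

    // bytesByAge[i] is the number of bytes occupied by objects of age i + 1.
    // Returns the first age whose running total exceeds desiredBytes,
    // capped at maxThreshold.
    static int newThreshold(long[] bytesByAge, long desiredBytes, int maxThreshold) {
        long total = 0;
        int age = 1;
        while (age <= bytesByAge.length) {
            total += bytesByAge[age - 1];
            if (total > desiredBytes) {
                break;
            }
            age++;
        }
        return Math.min(age, maxThreshold);
    }

    public static void main(String[] args) {
        long desired = desiredSurvivorBytes(6528L * 1024, 50);
        System.out.println(desired); // 3342336

        // Ages 1 and 2 from the log: 878568 bytes and 1616 bytes.
        long[] bytesByAge = {878568L, 1616L};
        System.out.println(newThreshold(bytesByAge, desired, 2)); // 2
    }
}
```

Here the survivors total 880184 bytes, well under the desired 3342336, so the threshold stays at its maximum of 2.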
31. Note: CMS collector on young generation uses the same algorithm as that of the parallel collector.
Java GC documentation at oracle.com
* http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html
33. Too many live objects during young gen GC
• Minimize survivors
• Watch the tenuring threshold; it might need tuning so that long-lived objects are tenured faster
• Reduce NewSize
• Reduce survivor spaces
41. Logging: Sync vs. Async
Sync:
98.85% <= 1 ms
99.95% <= 7 ms
99.98% <= 13 ms
99.99% <= 15 ms
100.00% <= 18 ms
1658 rps
Async:
98.47% <= 2 ms
99.95% <= 10 ms
99.98% <= 16 ms
99.99% <= 17 ms
100.00% <= 18 ms
769.05 rps
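For readers unfamiliar with the distinction, the following is a minimal sketch of what asynchronous logging means: the request thread only enqueues the message, and a background thread does the actual write. It is not the logging setup used in the service. The AsyncLogger class is hypothetical, and production logging frameworks ship their own asynchronous appenders.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical illustration of handing log writes off to a background thread.
final class AsyncLogger implements AutoCloseable {
    // Unique sentinel instance, compared by identity, that tells the writer to stop.
    private static final String STOP = new String("stop");

    // Unbounded queue for simplicity; a real logger would bound it and decide
    // whether to block or drop messages when it is full.
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Thread writer;

    // sink performs the slow write, for example to a file or a socket.
    AsyncLogger(Consumer<String> sink) {
        writer = new Thread(() -> {
            try {
                for (String msg = queue.take(); msg != STOP; msg = queue.take()) {
                    sink.accept(msg);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "async-logger");
        writer.setDaemon(true);
        writer.start();
    }

    // Called on the request path: only enqueues, never waits for the sink.
    void log(String message) {
        queue.add(message);
    }

    // Drains everything queued so far, then stops the writer thread.
    @Override
    public void close() {
        queue.add(STOP);
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A synchronous logger would call the sink directly inside log(), so any stall in the write would show up in the request latency.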
48. Nagle's algorithm
• the "small packet problem"
• TCP packets have a 40 byte header (20 bytes for TCP, 20 bytes for IPv4)
• combining a number of small outgoing messages and sending them all at once
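To put the header overhead in numbers, and to show where the switch lives, here is a small sketch using only the JDK. The class and method names are hypothetical. Socket.setTcpNoDelay(true) sets the standard TCP_NODELAY option, which disables Nagle's algorithm for that socket. In Netty 4 the corresponding setting is ChannelOption.TCP_NODELAY, though the slides do not say which Netty version the service ran on.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

// Hypothetical helper illustrating header overhead and the TCP_NODELAY switch.
final class NoDelayExample {
    // 20 bytes of TCP header + 20 bytes of IPv4 header, as on the slide.
    static final int HEADER_BYTES = 40;

    // Share of the bytes on the wire that is actual payload.
    static double payloadShare(int payloadBytes) {
        return (double) payloadBytes / (payloadBytes + HEADER_BYTES);
    }

    // Creates an unconnected socket, disables Nagle's algorithm on it,
    // reads the option back, and closes the socket again.
    static boolean noDelayCanBeEnabled() {
        try (Socket socket = new Socket()) {
            socket.setTcpNoDelay(true);
            return socket.getTcpNoDelay();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A 1-byte message is sent as a 41-byte packet.
        System.out.println(payloadShare(1));
        // A 1460-byte message uses the wire far more efficiently.
        System.out.println(payloadShare(1460));
        System.out.println(noDelayCanBeEnabled());
    }
}
```

Disabling Nagle's algorithm trades bandwidth efficiency for latency: small messages leave immediately instead of waiting to be combined.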
49. • Pauses ~100 ms every couple of hours
• During connection creation
• Doesn’t reproduce on a local setup