This presentation covers Java performance and the most effective ways to work with Java memory, including memory-saving techniques and ways to overcome memory barriers. It also debunks the most popular myths about boosting speed.
This presentation by Andrii Antilikatorov (Consultant, GlobalLogic) was delivered at GlobalLogic Java Conference #2 in Krakow on April 23, 2016.
11. 11
Serial GC
Min & max heap size: -Xms/-Xmx
Free space ratio in each generation:
MinHeapFreeRatio=?, MaxHeapFreeRatio=?
Young/Old generations ratio: NewRatio=?
Young generation min/max size: NewSize=?, MaxNewSize=?
Eden/Survivor ratio: SurvivorRatio=?
Set GC activity limit: UseGCOverheadLimit
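To check that options like these took effect, the running JVM can be asked which collectors and heap limits it ended up with. The sketch below is only an illustration; the class name GcSettingsSketch and the launch line `java -XX:+UseSerialGC -Xms256m -Xmx1g GcSettingsSketch` are assumptions, not part of the talk.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, used only for illustration.
class GcSettingsSketch {

    /** Names of the collectors the running JVM actually selected. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** Upper heap bound in bytes; this normally reflects -Xmx. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        System.out.println("Collectors: " + collectorNames());
        System.out.println("Max heap (bytes): " + maxHeapBytes());
        System.out.println("Current heap (bytes): " + Runtime.getRuntime().totalMemory());
    }
}
```

The collector names printed depend on the JVM and on the flags passed, so they are best compared across runs with different options.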
13. 13
Parallel GC
Supports automatic self-tuning for optimal performance
Memory fragmentation
Makes full use of multi-core CPU power
All options of Serial GC are applicable
Number of GC threads: ParallelGCThreads=?
Parallel compaction in Old Gen: UseParallelOldGC
Performance options: MaxGCPauseMillis=?, GCTimeRatio=?
Generation size increment: YoungGenerationSizeIncrement, TenuredGenerationSizeIncrement
Generation size decrease: AdaptiveSizeDecrementScaleFactor
14. 14
Concurrent Mark-Sweep GC
Works as Parallel GC in case of minor GC
Minimizes pauses, but sacrifices CPU and throughput
Consumes more memory (+20%)
Long pauses in case of concurrent mode failures
Works great with large data sets of long-lived objects
All options of Serial and Parallel GC are applicable
Major GC threshold: CMSInitiatingOccupancyFraction=?
16. 16
Garbage First (G1) GC
JEP 248: Make G1 the Default Garbage Collector on 32- and 64-bit server configurations starting from Java 9
Designed for systems where limiting latency is more important than maximizing throughput
More accurate pause prediction
No memory fragmentation
High CPU utilization
Number of GC threads and Marking threads: ParallelGCThreads=?, ConcGCThreads=?
Heap region size: G1HeapRegionSize=?
Pause minimization: MaxGCPauseMillis=?
Heap memory allocation threshold: InitiatingHeapOccupancyPercent=?
Options for real geeks: UnlockExperimentalVMOptions, AggressiveOpts
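The values the JVM finally uses for such flags can be read back at runtime. This is a minimal sketch that assumes a HotSpot-based JVM, where com.sun.management.HotSpotDiagnosticMXBean is available; the class name VmOptionSketch is made up for the example.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class; assumes a HotSpot-based JVM.
class VmOptionSketch {

    /** Current value of a -XX option, as the JVM reports it. */
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // getVMOption throws IllegalArgumentException for unknown option names.
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        String[] names = {"ParallelGCThreads", "ConcGCThreads", "MaxGCPauseMillis"};
        for (String name : names) {
            try {
                System.out.println(name + " = " + vmOption(name));
            } catch (IllegalArgumentException e) {
                System.out.println(name + " is not available on this JVM");
            }
        }
    }
}
```

Running it once with defaults and once with, for example, `-XX:MaxGCPauseMillis=100` shows whether the setting was picked up.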
19. 19
Memory Access
[Chart comparing Heap, Direct Memory Access (Off-Heap), Non-Direct ByteBuffer and Direct ByteBuffer. X axis: number of readings. Y axis: operations per second, in millions.]
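For reference, the two ByteBuffer flavours named on this slide are created as sketched below. This is not the benchmark behind the chart, only an illustration of the API; the class name BufferKindsSketch and the buffer size are assumptions.

```java
import java.nio.ByteBuffer;

// Hypothetical example class; not the benchmark from the talk.
class BufferKindsSketch {
    static final int LONGS = 1024; // arbitrary size chosen for the example

    /** Non-direct buffer: backed by a byte[] on the Java heap. */
    static ByteBuffer heapBuffer() {
        return ByteBuffer.allocate(LONGS * Long.BYTES);
    }

    /** Direct buffer: backed by native (off-heap) memory. */
    static ByteBuffer directBuffer() {
        return ByteBuffer.allocateDirect(LONGS * Long.BYTES);
    }

    /** Writes 0..LONGS-1 with absolute puts, then sums them with absolute gets. */
    static long writeAndSum(ByteBuffer buffer) {
        for (int i = 0; i < LONGS; i++) {
            buffer.putLong(i * Long.BYTES, i);
        }
        long sum = 0;
        for (int i = 0; i < LONGS; i++) {
            sum += buffer.getLong(i * Long.BYTES);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("heap isDirect: " + heapBuffer().isDirect());
        System.out.println("direct isDirect: " + directBuffer().isDirect());
        System.out.println("sums equal: " + (writeAndSum(heapBuffer()) == writeAndSum(directBuffer())));
    }
}
```

Both buffers expose the same read/write API; only where the bytes live differs. A meaningful speed comparison would need a proper harness such as JMH.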
23. 23
Direct Memory Alignment in Java
Memory alignment: type alignment, page size alignment, cache line alignment
“= new SomeObj()” is always type-aligned.
“Unsafe.allocateMemory” is always 8-byte aligned.
“ByteBuffer.allocateDirect” …
Memory is zeroed out automatically
Memory is page-aligned in JDK ≤ 1.6 and 8-byte aligned in JDK ≥ 1.7
Memory is freed as part of the ByteBuffer object GC
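Since a direct buffer is no longer page-aligned on newer JDKs, stricter alignment has to be arranged by hand. One possible way, sketched here under the assumption of Java 9 or later (for ByteBuffer.alignmentOffset) and a 64-byte cache line, is to over-allocate and slice. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical helper; assumes Java 9+ and a 64-byte cache line.
class DirectAlignmentSketch {
    static final int CACHE_LINE_BYTES = 64; // assumption, typical on x86-64

    /** Returns a direct buffer of exactly 'capacity' bytes whose first byte is 'alignment'-aligned. */
    static ByteBuffer allocateAligned(int capacity, int alignment) {
        // Over-allocate so an aligned window of 'capacity' bytes always fits.
        ByteBuffer raw = ByteBuffer.allocateDirect(capacity + alignment - 1);
        // Address of index 0 modulo the alignment.
        int misalignment = raw.alignmentOffset(0, alignment);
        int start = (alignment - misalignment) % alignment;
        raw.position(start);
        raw.limit(start + capacity);
        return raw.slice();
    }

    /** True if every byte of the buffer is zero. */
    static boolean isZeroed(ByteBuffer buffer) {
        for (int i = 0; i < buffer.capacity(); i++) {
            if (buffer.get(i) != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ByteBuffer aligned = allocateAligned(100, CACHE_LINE_BYTES);
        System.out.println("offset within cache line: " + aligned.alignmentOffset(0, CACHE_LINE_BYTES));
        System.out.println("zeroed: " + isZeroed(aligned));
    }
}
```

The slice keeps the original buffer reachable, so the native memory is released only after both become unreachable.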
25. 25
Comparing Aligned/Unaligned Access Performance
Type-aligned access provides better performance than unaligned access.
Memory access that spans 2 cache lines has far worse performance than aligned mid-cache line access.
Cache line access performance changes based on cache line location.
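Whether a given access falls into one of these cases is plain address arithmetic. The sketch below assumes a 64-byte cache line; the class and method names are invented for the example, and it measures nothing by itself.

```java
// Hypothetical helper; assumes a 64-byte cache line.
class CacheLineSketch {
    static final int CACHE_LINE_BYTES = 64; // assumption, typical on x86-64

    /** An access is type-aligned when the address is a multiple of the type size. */
    static boolean isTypeAligned(long address, int typeSize) {
        return address % typeSize == 0;
    }

    /** True when the first and last byte of the access land in different cache lines. */
    static boolean spansTwoCacheLines(long address, int typeSize) {
        long firstLine = address / CACHE_LINE_BYTES;
        long lastLine = (address + typeSize - 1) / CACHE_LINE_BYTES;
        return firstLine != lastLine;
    }

    public static void main(String[] args) {
        for (long address : new long[] {0, 4, 56, 60, 64}) {
            System.out.printf("long at %d: type-aligned=%b, spans two lines=%b%n",
                    address,
                    isTypeAligned(address, Long.BYTES),
                    spansTwoCacheLines(address, Long.BYTES));
        }
    }
}
```

For example, an 8-byte long at address 60 covers bytes 60 to 67 and therefore touches two 64-byte lines, while the same long at address 56 stays inside one.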
32. 32
Cost of Access Based on Cache Line Location
[Chart: relative cost of access by offset and number of pages.]
38. 38
Create an Instance Without Calling a Constructor
Need a “hack” to create a new instance of a Singleton
Need to avoid execution of heavy constructor logic
Custom serialization/deserialization.
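The code for this slide is not in the transcript. One common way to do it, assumed here, is sun.misc.Unsafe.allocateInstance, obtained through the private "theUnsafe" field. The classes AllocateInstanceSketch and Heavy are hypothetical.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical example; relies on the unsupported sun.misc.Unsafe API.
class AllocateInstanceSketch {

    /** Stand-in for a class with an expensive constructor. */
    static class Heavy {
        static int constructorCalls = 0;
        int value = 42; // field initializers run as part of the constructor

        Heavy() {
            constructorCalls++;
        }
    }

    /** Fetches the Unsafe singleton via reflection. */
    static Unsafe unsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    /** Allocates a Heavy without running its constructor or field initializers. */
    static Heavy allocateWithoutConstructor() {
        try {
            return (Heavy) unsafe().allocateInstance(Heavy.class);
        } catch (InstantiationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Heavy heavy = allocateWithoutConstructor();
        System.out.println("value = " + heavy.value);
        System.out.println("constructor calls = " + Heavy.constructorCalls);
    }
}
```

Because the constructor is skipped, the instance fields keep their default values (0, null, false), so the object may be in a state its class never expected.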
42. 42
Measure Shallow Size of an Object
‘sizeof’ is said to be unnecessary in Java because…
Sizes of data types are fixed for 32/64-bit platforms
Java VM’s GC does complete memory management
…and you can still ‘measure’ an object by serializing it to a byte stream and looking at its length…
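The serialization trick can be sketched as follows. Note that the result is the length of the serialized form, which includes stream headers and class descriptors, so it is not the shallow size of the object on the heap. The class name SerializedSizeSketch is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical helper illustrating the "serialize and look at the length" idea.
class SerializedSizeSketch {

    /** Number of bytes Java serialization produces for the given object. */
    static int serializedSize(Serializable object) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed (and flushed) here, so the count is complete.
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println("int[10]:   " + serializedSize(new int[10]) + " bytes");
        System.out.println("int[1000]: " + serializedSize(new int[1000]) + " bytes");
    }
}
```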
43. 43
Measure Shallow Size of an Object
Looking through all non-static fields
Calculating offset of the last field
Taking into account memory alignment
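The code these three callouts annotate is missing from the transcript. A rough sketch of the same steps, assuming sun.misc.Unsafe.objectFieldOffset is still usable on the running JVM (it is deprecated on recent JDKs) and that objects are aligned to 8 bytes, could look like this. The names ShallowSizeSketch and Sample are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import sun.misc.Unsafe;

// Hypothetical sketch; assumes a HotSpot JVM with 8-byte object alignment.
class ShallowSizeSketch {

    /** Sample class to measure. */
    static class Sample {
        int a;
        long b;
        byte c;
    }

    static Unsafe unsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    /** Estimates the shallow size of instances of 'type' in bytes. */
    static long shallowSize(Class<?> type) {
        Unsafe unsafe = unsafe();
        long lastOffset = 0;
        // Step 1: look through all non-static fields, including inherited ones.
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                if (!Modifier.isStatic(field.getModifiers())) {
                    // Step 2: track the offset of the last (highest) field.
                    lastOffset = Math.max(lastOffset, unsafe.objectFieldOffset(field));
                }
            }
        }
        // Step 3: round up to the next multiple of 8 to account for alignment.
        return (lastOffset / 8 + 1) * 8;
    }

    public static void main(String[] args) {
        System.out.println("Estimated shallow size of Sample: " + shallowSize(Sample.class));
    }
}
```

This is an estimate: the rounding relies on no field being wider than 8 bytes, and a class without instance fields is reported as 8 bytes, which is smaller than a real object header.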
44. 44
Measure Shallow Size of an Object
Getting data from class struct
Converting signed integer to long
Taking into account header size