JVM JIT for Dummies
    And the rest of you, too.
Intro
•   Charles Oliver Nutter
    •   “JRuby Guy”
    •   Sun Microsystems 2006-2009
    •   Engine Yard 2009-2012
    •   Red Hat 2012-
•   Primarily responsible for compiler, perf
    •   Looking inside JVM
What We Will Learn

• How the JVM’s JIT works
• Monitoring the JIT
• Finding problems
• Dumping assembly (don’t be scared!)
What We Won’t

• GC tuning
• GC monitoring with VisualVM
 • Google ‘visualgc’, it’s awesome
• OpenJDK internals
• JNI
Caveat

• Focusing on OpenJDK (Hotspot)
• Other JVMs will do things differently
 • But base principles usually apply
• Flags are specific to Hotspot
 • Internal, subject to change, etc
JIT

• Just-In-Time compilation
• Compiled when needed
 • Maybe immediately before execution
 • ...or when we decide it’s important
 • ...or never?
Mixed-Mode
• Interpreted
 • Bytecode-walking
 • Artificial stack machine
• Compiled
 • Direct native operations
 • Native register machine
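To make "bytecode-walking" on an "artificial stack machine" concrete, here is a minimal sketch of an interpreter loop. The three opcodes (PUSH, ADD, MUL) and the class name TinyStackMachine are made up for illustration; they are not real JVM bytecodes, and Hotspot's interpreter is far more elaborate.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical three-instruction stack machine; not real JVM bytecode.
final class TinyStackMachine {
    static final int PUSH = 0; // next slot holds the constant to push
    static final int ADD  = 1; // pop two values, push their sum
    static final int MUL  = 2; // pop two values, push their product

    // Walk the code array one instruction at a time, as an interpreter does.
    static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (pc < code.length) {
            int op = code[pc++];
            switch (op) {
                case PUSH: stack.push(code[pc++]); break;
                case ADD:  stack.push(stack.pop() + stack.pop()); break;
                case MUL:  stack.push(stack.pop() * stack.pop()); break;
                default:   throw new IllegalStateException("unknown opcode " + op);
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // (2 + 3) * 4
        int[] program = { PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL };
        System.out.println(run(program)); // prints 20
    }
}
```

Compiled code would do the same arithmetic directly in machine registers, with no dispatch loop and no operand stack in memory.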
Profiling
• Gather data about code while interpreting
 • Invariants (types, constants, nulls)
 • Statistics (branches, calls)
• Use that information to optimize
 • Educated guess
 • Guess can be wrong...
The Golden Rule of Optimization

  Don’t do unnecessary work.
Optimization
• Method inlining
• Loop unrolling
• Lock coarsening/eliding
• Dead code elimination
• Duplicate code elimination
• Escape analysis
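Dead code elimination and duplicate code elimination can be pictured as a before/after pair. This is a hand-written sketch, assuming made-up names (EliminationSketch, expensiveCheck); the JIT does this on its own internal representation, not on source. Note that javac already drops a branch guarded by a compile-time constant like the one here; the JIT applies the same idea to conditions it can only establish at run time.

```java
// Hand-written "before" and "after" versions of the same computation.
final class EliminationSketch {
    static final boolean DEBUG = false; // constant, so the branch below is dead

    // Before: a repeated subexpression and a branch that can never run.
    static int before(int x, int y) {
        int result = (x * y) + (x * y);   // x * y written twice
        if (DEBUG) {
            result += expensiveCheck(x);  // never reached
        }
        return result;
    }

    // After: roughly what is left once the unnecessary work is removed.
    static int after(int x, int y) {
        int product = x * y;              // computed once
        return product + product;
    }

    // Stand-in for work that is only needed when DEBUG is true.
    static int expensiveCheck(int x) {
        return x % 7;
    }

    public static void main(String[] args) {
        System.out.println(before(6, 7) + " " + after(6, 7)); // prints 84 84
    }
}
```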
Inlining?

• Combine caller and callee into one unit
 • e.g. based on profile
 • Perhaps with a guard/test
• Optimize as a whole
 • More code means better visibility
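The "guard/test" idea can be written out by hand. The sketch below assumes a hypothetical Op interface with two implementations; guarded() shows the shape an optimizer might give plain() if the profile had only ever seen Add. A real JIT emits machine code, and when the guard fails it may bail out of the compiled code instead of making the call.

```java
// Sketch of a guarded inline, written out by hand in Java.
final class GuardedInlineSketch {
    interface Op { int apply(int a, int b); }

    static final class Add implements Op {
        public int apply(int a, int b) { return a + b; }
    }

    static final class Mul implements Op {
        public int apply(int a, int b) { return a * b; }
    }

    // What the source says: a virtual call on every iteration.
    static int plain(Op op, int max) {
        int accum = 0;
        for (int i = 0; i < max; i++) {
            accum = op.apply(accum, i);
        }
        return accum;
    }

    // Same loop with Add.apply inlined behind a type check.
    static int guarded(Op op, int max) {
        int accum = 0;
        for (int i = 0; i < max; i++) {
            if (op.getClass() == Add.class) {
                accum = accum + i;          // guard passed: inlined body of Add.apply
            } else {
                accum = op.apply(accum, i); // guard failed: make the real call
            }
        }
        return accum;
    }

    public static void main(String[] args) {
        System.out.println(plain(new Add(), 1000) + " " + guarded(new Add(), 1000));
        System.out.println(plain(new Mul(), 5) + " " + guarded(new Mul(), 5));
    }
}
```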
Inlining
int addAll(int max) {
  int accum = 0;
  for (int i = 0; i < max; i++) {
    accum = add(accum, i);   // Only one target is ever seen
  }
  return accum;
}

int add(int a, int b) {
  return a + b;
}
Inlining

int addAll(int max) {
  int accum = 0;
  for (int i = 0; i < max; i++) {
    accum = accum + i;   // Don’t bother making the call
  }
  return accum;
}
Loop Unrolling

• Works for small, constant loops
• Avoid tests, branching
• Allow inlining a single call as many
Loop Unrolling
private static final String[] options =
                   { "yes", "no", "maybe"};
public void looper() {
    for (String option : options) {
        process(option);
    }
}
                  Small loop, constant stride, constant size
Loop Unrolling
private static final String[] options =
                   { "yes", "no", "maybe"};
public void looper() {
    process(options[0]);
    process(options[1]);       // Unrolled!
    process(options[2]);
}
Lock Coarsening
public void needsLocks() {
    for (String option : options) {
        process(option);           // Repeatedly locking
    }
}

private synchronized String process(String option) {
    // some wacky thread-unsafe code
}
Lock Coarsening
public void needsLocks() {
    synchronized (this) {          // Lock once
        for (String option : options) {
            // some wacky thread-unsafe code
        }
    }
}
Lock Eliding
public void overCautious() {
    List l = new ArrayList();
    // Synchronize on new Object,
    // but we know it never escapes this thread...
    synchronized (l) {
        for (String option : options) {
            l.add(process(option));
        }
    }
}
Lock Eliding
public void overCautious() {
    List l = new ArrayList();
    for (String option : options) {   // No need to lock
        l.add(
          /* process()’s code */);
    }
}
Escape Analysis
private static class Foo {
    public final String a;
    public final String b;

    Foo(String a, String b) {
        this.a = a;
        this.b = b;
    }
}
Escape Analysis
public void bar() {
    Foo f = new Foo("Hello", "JVM");
    baz(f);
}

// Same object all the way through;
// it never “escapes” these methods
public void baz(Foo f) {
    System.out.print(f.a);
    System.out.print(", ");
    quux(f);
}

public void quux(Foo f) {
    System.out.print(f.b);
    System.out.println('!');
}
Escape Analysis

public secret awesome inlinedBarBazQuux() {
    // Don’t bother allocating Foo object
    System.out.print("Hello");
    System.out.print(", ");
    System.out.print("JVM");
    System.out.println('!');
}
Escape Analysis

• A bit tweaky on Hotspot
 • All paths must inline
 • No external view of object
• JRockit was better here?
 • Now they can fix Hotspot!
Perf Sinks
• Memory accesses
 • By far the biggest expense
• Calls
 • Memory ref + branch kills pipeline
 • Call stack, register juggling costs
• Locks
Volatile?
• Each CPU maintains a memory cache
• Caches may be out of sync
 • If it doesn’t matter, no problem
 • If it does matter, threads disagree!
• Volatile forces synchronization of cache
 • Across cores and to main memory
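A small sketch of the "threads disagree" problem and the volatile fix: one thread spins on a flag that another thread flips. The class and method names are hypothetical. In Java memory model terms, the volatile write is guaranteed to become visible to later volatile reads; without the keyword, the worker would be allowed to keep seeing the old value, and the JIT could even hoist the read out of the loop.

```java
// Minimal visibility sketch: a worker spins until another thread flips a flag.
final class VolatileFlagSketch {
    // volatile: writes by one thread must become visible to reads by others.
    private static volatile boolean stop = false;

    // Returns true if the worker noticed the flag and exited.
    static boolean runOnce() {
        stop = false;
        Thread worker = new Thread(() -> {
            while (!stop) {
                // spin; every iteration performs a volatile read of stop
            }
        });
        worker.setDaemon(true);      // never keep the JVM alive for this thread
        worker.start();
        try {
            Thread.sleep(50);        // let the worker spin for a moment
            stop = true;             // volatile write
            worker.join(5000);       // give it up to five seconds to notice
        } catch (InterruptedException e) {
            stop = true;
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(runOnce() ? "worker saw the update" : "worker still spinning");
    }
}
```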
Call Site
• The place where you make a call
• Monomorphic (“one shape”)
 • Single target class
• Bimorphic (“two shapes”)
• Polymorphic (“many shapes”)
• Megamorphic (“you’re screwed”)
Blah.java
System.currentTimeMillis();   // static, monomorphic

List list1 = new ArrayList(); // constructor, monomorphic
List list2 = new LinkedList();

for (List list : new List[]{ list1, list2 }) {
  list.add("hello");          // bimorphic
}

for (Object obj : new Object[]{ "foo", list1, new Object() }) {
  obj.toString();             // polymorphic
}
Hotspot

• -client mode (C1) inlines, less aggressive
 • Fewer opportunities to optimize
• -server mode (C2) inlines aggressively
 • Based on richer runtime profiling
Tiered

• Increasing tiers of interp, C1, and C2
• Level 0 = Interpreter
• Level 1-3 = C1
• Level 4 = C2
• Kinda sorta works...
system ~/projects/javaone2012-jit $ (pickjdk 4 ; time jruby -e 1)
New JDK: jdk1.7.0_07.jdk

real	 0m1.251s
user	 0m2.128s
sys	 0m0.093s

system ~/projects/javaone2012-jit $ (pickjdk 5 ; time jruby -e 1)
New JDK: jdk1.8.0.jdk

real	 0m1.167s
user	 0m2.767s
sys	 0m0.143s

system ~/projects/javaone2012-jit $ (pickjdk 5 ; 
                      time jruby -J-XX:TieredStopAtLevel=1 -e 1)
New JDK: jdk1.8.0.jdk

real	 0m0.850s
user	 0m1.344s
sys	 0m0.114s
C2 Compiler
• Profile to find “hot spots”
 • Call sites
 • Branch statistics
 • Profile until 10k calls
• Inline mono/bimorphic calls
• Other mechanisms for polymorphic calls
Now it gets fun!
Monitoring the JIT

• Dozens of flags
• Reams of output
• Always evolving
• How can you understand it?
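Before reaching for flags, a program can also ask the JVM how much time the JIT has spent so far through the standard java.lang.management API. This is not from the talk, just a minimal sketch; the class name JitTimeSketch is made up, and the reported name and time depend on the JVM.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Reports how much time the JIT has spent compiling so far.
final class JitTimeSketch {
    static String describe() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            return "no JIT compiler in this JVM"; // e.g. interpreter-only mode
        }
        if (!jit.isCompilationTimeMonitoringSupported()) {
            return jit.getName() + ": compilation time not monitored";
        }
        return jit.getName() + ": " + jit.getTotalCompilationTime() + " ms compiling";
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) { // give the JIT something to do
            sum += i % 3;
        }
        System.out.println(sum);
        System.out.println(describe());
    }
}
```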
public class Accumulator {
  public static void main(String[] args) {
    int max = Integer.parseInt(args[0]);
    System.out.println(addAll(max));
  }

    static int addAll(int max) {
      int accum = 0;
      for (int i = 0; i < max; i++) {
        accum = add(accum, i);
      }
      return accum;
    }

    static int add(int a, int b) {
      return a + b;
    }
}
$ java -version
openjdk version "1.7.0-b147"
OpenJDK Runtime Environment (build 1.7.0-b147-20110927)
OpenJDK 64-Bit Server VM (build 21.0-b17, mixed mode)

$ javac Accumulator.java

$ java Accumulator 1000
499500
Print Compilation

• -XX:+PrintCompilation
• Print methods as they JIT
 • Class + name + size
$ java -XX:+PrintCompilation Accumulator 1000
     53    1             java.lang.String::hashCode (67 bytes)
499500


                     Where’s our code?
                         Remember...10k calls before JIT
10k loop, 10k calls to add


$ java -XX:+PrintCompilation Accumulator 10000
     53    1             java.lang.String::hashCode (67 bytes)
     64    2             Accumulator::add (4 bytes)
49995000




                               Hooray!
But what’s this?


$ java -XX:+PrintCompilation Accumulator 10000
     53    1             java.lang.String::hashCode (67 bytes)
     64    2             Accumulator::add (4 bytes)
49995000

                Class loading, security logic, other stuff...
Hotspot is making zombies?
1401   70         java.util.concurrent.ConcurrentHashMap::hash (49 bytes)
1412   71         java.lang.String::indexOf (7 bytes)
1420   72   !     java.io.BufferedReader::readLine (304 bytes)
1420   73         sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (543 bytes)
1422   42         java.util.zip.ZipCoder::getBytes (192 bytes)    made not entrant
1435   74     n   java.lang.Object::hashCode (0 bytes)
1443   29   !     sun.misc.URLClassPath$JarLoader::getResource (91 bytes)    made zombie
1443   25         sun.misc.URLClassPath::getResource (74 bytes)    made zombie
1443   36         sun.misc.URLClassPath::getResource (74 bytes)    made not entrant
1443   43         java.util.zip.ZipCoder::encoder (35 bytes)    made not entrant
1449   75         java.lang.String::endsWith (15 bytes)
1631    1 %       sun.misc.URLClassPath::getResource @ 39 (74 bytes)
1665   76         java.lang.ClassLoader::checkName (43 bytes)



                                    Not entrant? What the heck?
Optimistic Compilers

• Assume profile is accurate
• Aggressively optimize based on profile
• Bail out if we’re wrong
 • ...and hope that we’re usually right
Deoptimization

• Bail out of running code
• Monitoring flags describe process
 • “uncommon trap” - something’s changed
 • “not entrant” - don’t let new calls enter
 • “zombie” - on its way to deadness
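One way to watch a bail-out happen is to warm up a call site with a single receiver type and then hand it a different one. The sketch below uses made-up Shape classes. Running it with -XX:+PrintCompilation may show totalSides being compiled and later marked "made not entrant", but whether and when that line appears depends on the JVM version and flags.

```java
// Warm up a call site with one receiver type, then surprise it with another.
final class DeoptSketch {
    interface Shape { int sides(); }

    static final class Triangle implements Shape {
        public int sides() { return 3; }
    }

    static final class Square implements Shape {
        public int sides() { return 4; }
    }

    // The call to sides() is the call site being profiled.
    static int totalSides(Shape shape, int times) {
        int total = 0;
        for (int i = 0; i < times; i++) {
            total += shape.sides();
        }
        return total;
    }

    public static void main(String[] args) {
        int warm = 0;
        for (int i = 0; i < 20000; i++) {        // profile only ever sees Triangle
            warm += totalSides(new Triangle(), 10);
        }
        int cold = totalSides(new Square(), 10); // new type: the guess was wrong
        System.out.println(warm + " " + cold);   // prints 600000 40
    }
}
```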
JRuby red_black perf

[Chart: time axis from 0s to 4s. Callouts: “Most code not JITed yet” near the top of the scale, then “Back off” at two later points.]
No JIT At All?


• Code is too big
• Code isn’t called enough
That looks exciting!

1401   70            java.util.concurrent.ConcurrentHashMap::hash (49 bytes)
1412   71            java.lang.String::indexOf (7 bytes)
1420   72   !        java.io.BufferedReader::readLine (304 bytes)
1420   73            sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (543 bytes)
1422   42            java.util.zip.ZipCoder::getBytes (192 bytes)    made not entrant
1435   74     n      java.lang.Object::hashCode (0 bytes)
1443   29   !        sun.misc.URLClassPath$JarLoader::getResource (91 bytes)    made zombie
1443   25            sun.misc.URLClassPath::getResource (74 bytes)    made zombie
1443   36            sun.misc.URLClassPath::getResource (74 bytes)    made not entrant
1443   43            java.util.zip.ZipCoder::encoder (35 bytes)    made not entrant
1449   75            java.lang.String::endsWith (15 bytes)
1631    1 %          sun.misc.URLClassPath::getResource @ 39 (74 bytes)
1665   76            java.lang.ClassLoader::checkName (43 bytes)
Exception handling in here (boring!)

Exception Handling

• Unroll stack until someone stops us
 • Handler gets registered in JVM
 • Different treatment by JIT
• Inlined throw + catch = jump
 • If no stack trace, essentially free
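The "no stack trace" case can be arranged from plain Java: since Java 7, the protected four-argument constructor on Throwable and its standard subclasses lets an exception skip stack trace capture. A minimal sketch with made-up names, throwing and catching inside one method so the pair is a candidate for the "throw + catch = jump" treatment:

```java
// Exception that skips stack trace capture, used as cheap control flow.
final class StacklessSketch {
    static final class Found extends RuntimeException {
        final int index;

        Found(int index) {
            // message = null, cause = null,
            // enableSuppression = false, writableStackTrace = false
            super(null, null, false, false);
            this.index = index;
        }
    }

    // Throw and catch in the same method.
    static int indexOfNegative(int[] values) {
        try {
            for (int i = 0; i < values.length; i++) {
                if (values[i] < 0) {
                    throw new Found(i);
                }
            }
            return -1;
        } catch (Found f) {
            return f.index;
        }
    }

    // Number of recorded frames; zero because the trace is not writable.
    static int traceLength() {
        return new Found(0).getStackTrace().length;
    }

    public static void main(String[] args) {
        System.out.println(indexOfNegative(new int[] { 4, 7, -2, 9 })); // prints 2
        System.out.println(traceLength());                              // prints 0
    }
}
```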
What’s this “n” all about?

1401   70         java.util.concurrent.ConcurrentHashMap::hash (49 bytes)
1412   71         java.lang.String::indexOf (7 bytes)
1420   72   !     java.io.BufferedReader::readLine (304 bytes)
1420   73         sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (543 bytes)
1422   42         java.util.zip.ZipCoder::getBytes (192 bytes)    made not entrant
1435   74     n   java.lang.Object::hashCode (0 bytes)
1443   29   !     sun.misc.URLClassPath$JarLoader::getResource (91 bytes)    made zombie
1443   25         sun.misc.URLClassPath::getResource (74 bytes)    made zombie
1443   36         sun.misc.URLClassPath::getResource (74 bytes)    made not entrant
1443   43         java.util.zip.ZipCoder::encoder (35 bytes)    made not entrant
1449   75         java.lang.String::endsWith (15 bytes)
1631    1 %       sun.misc.URLClassPath::getResource @ 39 (74 bytes)
1665   76         java.lang.ClassLoader::checkName (43 bytes)
This method is native

And this one?

1401   70         java.util.concurrent.ConcurrentHashMap::hash (49 bytes)
1412   71         java.lang.String::indexOf (7 bytes)
1420   72   !     java.io.BufferedReader::readLine (304 bytes)
1420   73         sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (543 bytes)
1422   42         java.util.zip.ZipCoder::getBytes (192 bytes)    made not entrant
1435   74     n   java.lang.Object::hashCode (0 bytes)
1443   29   !     sun.misc.URLClassPath$JarLoader::getResource (91 bytes)    made zombie
1443   25         sun.misc.URLClassPath::getResource (74 bytes)    made zombie
1443   36         sun.misc.URLClassPath::getResource (74 bytes)    made not entrant
1443   43         java.util.zip.ZipCoder::encoder (35 bytes)    made not entrant
1449   75         java.lang.String::endsWith (15 bytes)
1631    1 %       sun.misc.URLClassPath::getResource @ 39 (74 bytes)
1665   76         java.lang.ClassLoader::checkName (43 bytes)



       Method has been replaced while running (OSR)
On-Stack Replacement
• Running method never exits?
• But it’s getting really hot?
 • Generally means loops, back-branching
• Compile and replace while running
• Not typically useful in large systems
 • Looks great on benchmarks!
public class Accumulator {
  public static void main(String[] args) {
    int max = Integer.parseInt(args[0]);
    System.out.println(addAll(max));
  }

    // addAll never exits... loops until end
    static int addAll(int max) {
      int accum = 0;
      for (int i = 0; i < max; i++) {
        accum = add(accum, i);
      }
      return accum;
    }

    static int add(int a, int b) {
      return a + b;
    }
}
system ~/projects/javaone2012-jit $ java -XX:+PrintCompilation Accumulator1 1000
     63    1             java.lang.String::hashCode (55 bytes)
499500

system ~/projects/javaone2012-jit $ java -XX:+PrintCompilation Accumulator1 10000
     63    1             java.lang.String::hashCode (55 bytes)
     74    2             Accumulator1::add (4 bytes)
49995000

system ~/projects/javaone2012-jit $ java -XX:+PrintCompilation Accumulator1 100000
     62    1             java.lang.String::hashCode (55 bytes)
     73    2             Accumulator1::add (4 bytes)
     74    1 %           Accumulator1::addAll @ 4 (23 bytes)
704982704
Millis from JVM start
1401   70         java.util.concurrent.ConcurrentHashMap::hash (49 bytes)
1412   71         java.lang.String::indexOf (7 bytes)
1420   72   !     java.io.BufferedReader::readLine (304 bytes)
1420   73         sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (543 bytes)
1422   42         java.util.zip.ZipCoder::getBytes (192 bytes)    made not entrant
1435   74     n   java.lang.Object::hashCode (0 bytes)
1443   29   !     sun.misc.URLClassPath$JarLoader::getResource (91 bytes)    made zombie
1443   25         sun.misc.URLClassPath::getResource (74 bytes)    made zombie
1443   36         sun.misc.URLClassPath::getResource (74 bytes)    made not entrant
1443   43         java.util.zip.ZipCoder::encoder (35 bytes)    made not entrant
1449   75         java.lang.String::endsWith (15 bytes)
1631    1 %       sun.misc.URLClassPath::getResource @ 39 (74 bytes)
1665   76         java.lang.ClassLoader::checkName (43 bytes)



       Sequence number of compilation
Compiling
1401   70         java.util.concurrent.ConcurrentHashMap::hash (49 bytes)
1412   71         java.lang.String::indexOf (7 bytes)
1420   72   !     java.io.BufferedReader::readLine (304 bytes)
1420   73         sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (543 bytes)
1435   74     n   java.lang.Object::hashCode (0 bytes)
1449   75         java.lang.String::endsWith (15 bytes)
1665   76         java.lang.ClassLoader::checkName (43 bytes)
Backing Off
1422   42         java.util.zip.ZipCoder::getBytes (192 bytes)    made not entrant
1443   29   !     sun.misc.URLClassPath$JarLoader::getResource (91 bytes)    made zombie
1443   25         sun.misc.URLClassPath::getResource (74 bytes)    made zombie
1443   36         sun.misc.URLClassPath::getResource (74 bytes)    made not entrant
1443   43         java.util.zip.ZipCoder::encoder (35 bytes)    made not entrant
OSR
1631    1 %       sun.misc.URLClassPath::getResource @ 39 (74 bytes)
system ~/projects/javaone2012-jit $ java -XX:+PrintCompilation -XX:+TieredCompilation Accumulator1 1000
      55   1       3       java.lang.String::charAt (29 bytes)
      57   2       3       java.lang.String::hashCode (55 bytes)
      57   3       3       java.lang.Object::<init> (1 bytes)
      57   4     n 0       java.lang.System::arraycopy (0 bytes)    (static)
      57   5       3       java.lang.String::indexOf (70 bytes)
      57   6       3       java.lang.String::length (6 bytes)
      58   7       3       java.lang.AbstractStringBuilder::ensureCapacityInternal (16 bytes)
      59   8       3       java.lang.String::equals (81 bytes)
...
      69  26       3       java.lang.Character::toLowerCase (6 bytes)
      69  27       3       java.lang.AbstractStringBuilder::append (48 bytes)
      70  28       3       java.lang.String::indexOf (7 bytes)
      72  29       4       java.lang.String::charAt (29 bytes)
      72  30       3       java.lang.StringBuilder::append (8 bytes)
      73  31       1       java.net.URL::getProtocol (5 bytes)
      73  32       3       java.lang.String::lastIndexOf (52 bytes)
      74  33       3       java.io.UnixFileSystem::normalize (75 bytes)
      75   1       3       java.lang.String::charAt (29 bytes)   made not entrant
      77  36     n 0       java.lang.Thread::currentThread (0 bytes)    (static)
      77  35       3       Accumulator1::add (4 bytes)
499500
                       Tier we’re at: the number just before the method name
                       Accumulator1::add: only called 1k times
Print Inlining

• -XX:+UnlockDiagnosticVMOptions
  -XX:+PrintInlining
• Display hierarchy of inlined methods
• Include reasons for not inlining
• More, better output on OpenJDK 7
$ java -XX:+UnlockDiagnosticVMOptions 
>      -XX:+PrintInlining 
>      Accumulator 10000
49995000
             Um...I don’t see anything inlining
public class Accumulator {
  public static void main(String[] args) {
    int max = Integer.parseInt(args[0]);
    System.out.println(addAll(max));
  }

    // Called only once
    static int addAll(int max) {
      int accum = 0;
      for (int i = 0; i < max; i++) {
        accum = add(accum, i);   // Called 10k times
      }
      return accum;
    }

    // JITs as expected, but makes no calls!
    static int add(int a, int b) {
      return a + b;
    }
}
static double addAllSqrts(int max) {
  double accum = 0;
  for (int i = 0; i < max; i++) {
    accum = addSqrt(accum, i);
  }
  return accum;
}

static double addSqrt(double a, int b) {
  return a + sqrt(b);
}

static double sqrt(int a) {
  return Math.sqrt(a);
}
$ java -XX:+UnlockDiagnosticVMOptions 
>       -XX:+PrintInlining 
>       -XX:+PrintCompilation 
>       Accumulator 10000
     53     1             java.lang.String::hashCode (67 bytes)
     65     2             Accumulator::addSqrt (7 bytes)
            @ 3   Accumulator::sqrt (6 bytes)    inline (hot)
              @ 2   java.lang.Math::sqrt (5 bytes)    (intrinsic)
     65     3             Accumulator::sqrt (6 bytes)
            @ 2   java.lang.Math::sqrt (5 bytes)    (intrinsic)
666616.4591971082

                                HOT HOT HOT!
$ java -XX:+UnlockDiagnosticVMOptions 
>       -XX:+PrintInlining 
>       -XX:+PrintCompilation 
>       Accumulator 10000
     53     1             java.lang.String::hashCode (67 bytes)
     65     2             Accumulator::addSqrt (7 bytes)
            @ 3   Accumulator::sqrt (6 bytes)    inline (hot)
              @ 2   java.lang.Math::sqrt (5 bytes)    (intrinsic)
     65     3             Accumulator::sqrt (6 bytes)
            @ 2   java.lang.Math::sqrt (5 bytes)    (intrinsic)
666616.4591971082

                                Calls treated specially by JIT
Intrinsic?

• Known to the JIT
 • Don’t inline bytecode
 • Do insert “best” native code
   • e.g. kernel-level memory operation
   • e.g. optimized sqrt in machine code
Common Intrinsics
• String#equals
• Most (all?) Math methods
• System.arraycopy
• Object#hashCode
• Object#getClass
• sun.misc.Unsafe methods
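From the Java side an intrinsic looks like any other method call; the difference is in what the JIT emits for it. The sketch below puts System.arraycopy and Math.sqrt, both on the list above, next to a hand-written copy loop. The class name is made up, and the JIT may well optimize the plain loop too, so this shows the calling pattern and not a performance claim.

```java
import java.util.Arrays;

// Two ways to copy an array; the first calls a method listed above as an intrinsic.
final class IntrinsicSketch {
    static int[] copyWithArraycopy(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    // Element-by-element copy written as ordinary bytecode.
    static int[] copyWithLoop(int[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] data = { 1, 2, 3, 4, 5 };
        boolean same = Arrays.equals(copyWithArraycopy(data), copyWithLoop(data));
        System.out.println(same + " " + Math.sqrt(16.0)); // prints true 4.0
    }
}
```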
LogCompilation

• -XX:+LogCompilation
• Dumps compiler events to hotspot.log
• Tons and tons of output
scopes_pcs_offset='1384' dependencies_offset='1576' handler_table_offset='1592' nul_chk_table_offset='1736'
oops_offset='992' method='org/jruby/lexer/yacc/ByteArrayLexerSource$ByteArrayCursor read ()I' bytes='49'
count='5296' backedge_count='1' iicount='10296' stamp='0.412'/>
<writer thread='4425007104'/>
<nmethod compile_id='21' compiler='C2' entry='4345862528' size='1152' address='4345862160'
relocation_offset='288' insts_offset='368' stub_offset='688' scopes_data_offset='840' scopes_pcs_offset='904'
dependencies_offset='1016' handler_table_offset='1032' oops_offset='784' method='org/jruby/lexer/yacc/
ByteArrayLexerSource forward (I)I' bytes='111' count='5296' backedge_count='1' iicount='10296' stamp='0.412'/>
<writer thread='4300214272'/>
<task_queued compile_id='22' method='org/jruby/lexer/yacc/ByteArrayLexerSource read ()I' bytes='10'
count='5000' backedge_count='1' iicount='10000' stamp='0.433' comment='count' hot_count='10000'/>
<writer thread='4426067968'/>
<nmethod compile_id='22' compiler='C2' entry='4345885984' size='1888' address='4345885584'
relocation_offset='288' insts_offset='400' stub_offset='912' scopes_data_offset='1104'
scopes_pcs_offset='1496' dependencies_offset='1704' handler_table_offset='1720' nul_chk_table_offset='1864'
oops_offset='1024' method='org/jruby/lexer/yacc/ByteArrayLexerSource read ()I' bytes='10' count='5044'
backedge_count='1' iicount='10044' stamp='0.435'/>
<writer thread='4300214272'/>
<task_queued compile_id='23' method='java/util/HashMap hash (I)I' bytes='23' count='5000' backedge_count='1'
iicount='10000' stamp='0.442' comment='count' hot_count='10000'/>
<writer thread='4425007104'/>
<nmethod compile_id='23' compiler='C2' entry='4345887808' size='440' address='4345887504'
relocation_offset='288' insts_offset='304' stub_offset='368' scopes_data_offset='392' scopes_pcs_offset='400'
dependencies_offset='432' method='java/util/HashMap hash (I)I' bytes='23' count='5039' backedge_count='1'
iicount='10039' stamp='0.442'/>
<writer thread='4300214272'/>
<dependency_failed type='abstract_with_unique_concrete_subtype' ctxk='org/jruby/lexer/yacc/LexerSource'
x='org/jruby/lexer/yacc/ByteArrayLexerSource' witness='org/jruby/lexer/yacc/InputStreamLexerSource'
stamp='0.456'/>
<dependency_failed type='abstract_with_unique_concrete_subtype' ctxk='org/jruby/lexer/yacc/LexerSource'
x='org/jruby/lexer/yacc/ByteArrayLexerSource' witness='org/jruby/lexer/yacc/InputStreamLexerSource'
stamp='0.456'/>
<dependency_failed type='abstract_with_unique_concrete_subtype' ctxk='org/jruby/lexer/yacc/LexerSource'
x='org/jruby/lexer/yacc/ByteArrayLexerSource' witness='org/jruby/lexer/yacc/InputStreamLexerSource'
stamp='0.456'/>
<dependency_failed type='abstract_with_unique_concrete_subtype' ctxk='org/jruby/lexer/yacc/LexerSource'
x='org/jruby/lexer/yacc/ByteArrayLexerSource' witness='org/jruby/lexer/yacc/InputStreamLexerSource'
stamp='0.456'/>
Worst XML Evar

• Relational structure in hierarchical form
• Hotspot guys can read it...I cannot
• <JDK>/hotspot/src/share/tools/LogCompilation
   • or http://github.com/headius/logc
No flags, like PrintCompilation

$ java -jar logc.jar hotspot.log
1    java.lang.String::hashCode (67 bytes)
2    Accumulator::addSqrt (7 bytes)
3    Accumulator::sqrt (6 bytes)
-i flag, PrintCompilation and PrintInlining
$ java -jar logc.jar -i hotspot.log
1    java.lang.String::hashCode (67 bytes)
2    Accumulator::addSqrt (7 bytes)
    @ 2 Accumulator::sqrt (6 bytes) (end time: 0.0660 nodes: 36)
      @ 2 java.lang.Math::sqrt (5 bytes)
3    Accumulator::sqrt (6 bytes)
    @ 2 java.lang.Math::sqrt (5 bytes)
8     sun.nio.cs.UTF_8$Encoder::encode (361 bytes)
6 uncommon trap null_check make_not_entrant
   @8 java/lang/String equals (Ljava/lang/Object;)Z
6 make_not_entrant
9     java.lang.String::equals (88 bytes)
10     java.util.LinkedList::indexOf (73 bytes)
Hotspot sees it’s 100% String
   10     java.util.LinkedList::indexOf (73 bytes)
        @ 52 java.lang.Object::equals (11 bytes)
          type profile java/lang/Object -> java/lang/String (100%)
        @ 52 java.lang.String::equals (88 bytes)
   11     java.lang.String::indexOf (87 bytes)
        @ 83 java.lang.String::indexOfSupplementary too big




                              Too big to inline! Could be bad?
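A type profile like the one above comes from a call site that only ever sees one receiver class. As a minimal sketch (the class `MonomorphicIndexOf` and its methods are hypothetical, not from the talk), the program below only ever passes `String` needles to `LinkedList.indexOf`, so the `equals` call inside it should only see `String` receivers. Whether Hotspot actually inlines it depends on the JVM version and flags.

```java
import java.util.LinkedList;
import java.util.List;

final class MonomorphicIndexOf {
    // Builds a linked list holding the strings "0", "1", ..., "n-1".
    static List<String> buildList(int n) {
        List<String> list = new LinkedList<>();
        for (int i = 0; i < n; i++) {
            list.add(Integer.toString(i));
        }
        return list;
    }

    // Calls indexOf repeatedly, always with a String needle, and sums the results.
    static long sumOfIndexes(List<String> list, int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            String needle = Integer.toString(i % list.size());
            sum += list.indexOf(needle);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Enough iterations to make indexOf hot on a typical Hotspot configuration.
        System.out.println(sumOfIndexes(buildList(16), 100_000));
    }
}
```

Running it with a compilation log enabled and feeding the log to logc with `-i` should show a similar inlining tree, though the exact output will differ between JVM builds.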
Tuning Inlining
• -XX:MaxInlineSize=35
 • Largest inlinable method (bytecode)
• -XX:InlineSmallCode=#
 • Largest inlinable compiled method
• -XX:FreqInlineSize=#
 • Largest frequently-called method...
Tuning Inlining

• -XX:MaxInlineLevel=9
 • How deep does the rabbit hole go?
• -XX:MaxRecursiveInlineLevel=#
 • Recursive inlining
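The "#" defaults vary by platform and JVM build, so it can help to ask the running JVM what it is using. A minimal sketch, assuming a Hotspot-based JVM that exposes `com.sun.management.HotSpotDiagnosticMXBean`; the class name `InlineFlags` and the "unavailable" fallback are illustrative choices, not part of the talk.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

final class InlineFlags {
    // The inlining-related flags named on the slides above.
    static final String[] FLAGS = {
        "MaxInlineSize", "InlineSmallCode", "FreqInlineSize",
        "MaxInlineLevel", "MaxRecursiveInlineLevel"
    };

    // Returns the current value of a flag, or "unavailable" if this JVM does not expose it.
    static String valueOf(String flag) {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "unavailable";
            }
            return bean.getVMOption(flag).getValue();
        } catch (RuntimeException e) {
            // getVMOption throws IllegalArgumentException for unknown flags.
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        for (String flag : FLAGS) {
            System.out.println(flag + " = " + valueOf(flag));
        }
    }
}
```

Since these flags are Hotspot internals, any of them may be renamed or removed in a given release, which is why the lookup falls back instead of failing.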
Did someone say
MACHINE CODE?!
The Red Pill

• Knowing code compiles is good
• Knowing code inlines is better
• Seeing the actual assembly is best!
Caveat


• I don’t really know assembly.
• But I fake it really well.
Print Assembly

• -XX:+PrintAssembly
• Google “hotspot printassembly”
• https://wikis.oracle.com/display/HotSpotInternals/PrintAssembly
• Assembly-dumping plugin for Hotspot
Alternative

• -XX:+PrintOptoAssembly
• Only in debug/fastdebug builds
• Not as pretty
Wednesday, July 27, 2011




    ~/oscon $ java -XX:+UnlockDiagnosticVMOptions \
    >              -XX:+PrintAssembly \
    >              Accumulator 10000
    OpenJDK 64-Bit Server VM warning: PrintAssembly is enabled;
    turning on DebugNonSafepoints to gain additional output
    Loaded disassembler from hsdis-amd64.dylib
    ...
Decoding compiled method 11343cbd0:
Code:
[Disassembling for mach='i386:x86-64']
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} 'add' '(II)I' in 'Accumulator'
  # parm0:    rsi        = int
  # parm1:    rdx        = int
  #           [sp+0x20] (sp of caller)
  11343cd00: push   %rbp
  11343cd01: sub    $0x10,%rsp
  11343cd05: nop                              ;*synchronization entry
                                              ; - Accumulator::add@-1 (line 16)
  11343cd06: mov    %esi,%eax
  11343cd08: add    %edx,%eax                 ;*iadd
                                              ; - Accumulator::add@2 (line 16)
  11343cd0a: add    $0x10,%rsp
  11343cd0e: pop    %rbp
  11343cd0f: test   %eax,-0x1303fd15(%rip)      # 1003fd000
                                              ;   {poll_return}
  11343cd15: retq
Woah there buddy...
x86_64 Assembly 101
add                      Two’s complement add
sub                      ...subtract
mov*                     Move data from a to b
jmp                      goto
je, jne, jl, jge, ...    Jump if ==, !=, <, >=, ...
push, pop                Call stack operations
call*, ret*              Call, return from subroutine
eax, ebx, esi, ...       32-bit registers
rax, rbx, rsi, ...       64-bit registers
Register Machine

• Instead of stack moves, we have “slots”
 • Move data into slots
 • Trigger operations that manipulate data
 • Get new data out of slots
• JVM stack, locals end up as register ops
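To make the contrast concrete, here is a toy sketch that evaluates `a + b` two ways: by walking a four-instruction bytecode program with an explicit operand stack, and as the single expression a compiler can map straight onto registers. The opcode values are the real JVM encodings for `iload_0`, `iload_1`, `iadd` and `ireturn`, but the `StackVsRegister` class and its interpreter loop are invented for illustration and are not how Hotspot's interpreter is written.

```java
final class StackVsRegister {
    // Real JVM opcode values for the four instructions used here.
    static final int ILOAD_0 = 0x1a;
    static final int ILOAD_1 = 0x1b;
    static final int IADD = 0x60;
    static final int IRETURN = 0xac;

    // Bytecode for a static method body equivalent to "return a + b;".
    static final int[] ADD_BYTECODE = { ILOAD_0, ILOAD_1, IADD, IRETURN };

    // Stack machine: every operation pushes to or pops from an operand stack.
    static int interpret(int[] code, int local0, int local1) {
        int[] stack = new int[2];
        int sp = 0;
        for (int pc = 0; pc < code.length; pc++) {
            switch (code[pc]) {
                case ILOAD_0:
                    stack[sp++] = local0;
                    break;
                case ILOAD_1:
                    stack[sp++] = local1;
                    break;
                case IADD: {
                    int right = stack[--sp];
                    int left = stack[--sp];
                    stack[sp++] = left + right;
                    break;
                }
                case IRETURN:
                    return stack[--sp];
                default:
                    throw new IllegalArgumentException("unsupported opcode " + code[pc]);
            }
        }
        throw new IllegalStateException("fell off the end without ireturn");
    }

    // Register machine view: once compiled, this is just a move and an add on two slots.
    static int compiled(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(interpret(ADD_BYTECODE, 2, 3));
        System.out.println(compiled(2, 3));
    }
}
```

Both paths compute the same value; the difference is how much bookkeeping surrounds the one add that matters.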
Native Stack?

• Native code has a stack too
 • Preserves registers from call to call
• Various calling conventions
 • Caller preserves registers?
 • Callee preserves registers?
Decoding compiled method 11343cbd0:           <= address of new compiled code
Code:
[Disassembling for mach='i386:x86-64']        <= architecture
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} 'add' '(II)I' in 'Accumulator'   <=   method, signature, class
  # parm0:    rsi       = int                 <=   first parm to method goes in rsi
  # parm1:    rdx       = int                 <=   second parm goes in rdx
  #           [sp+0x20] (sp of caller)        <=   caller’s pointer into native stack
11343cd00: push   %rbp
11343cd01: sub    $0x10,%rsp
11343cd05: nop                             ;*synchronization entry
                                           ; - Accumulator::add@-1 (line 16)
11343cd06: mov    %esi,%eax
11343cd08: add    %edx,%eax                ;*iadd
                                           ; - Accumulator::add@2 (line 16)
11343cd0a: add    $0x10,%rsp
11343cd0e: pop    %rbp
11343cd0f: test   %eax,-0x1303fd15(%rip)     # 1003fd000
                                           ;   {poll_return}
11343cd15: retq



11343cd00: push   %rbp
    rbp points at current stack frame, so we save it off.

11343cd01: sub    $0x10,%rsp
    Reserve 0x10 bytes of stack for this frame (both args arrive in registers).

11343cd05: nop                             ;*synchronization entry
                                           ; - Accumulator::add@-1 (line 16)
    Do nothing, e.g. to memory-align code.
    At the “-1” instruction of our add() method... i.e. here we go!

11343cd06: mov    %esi,%eax
    Move parm0 into eax.

11343cd08: add    %edx,%eax                ;*iadd
                                           ; - Accumulator::add@2 (line 16)
    Add parm0 and parm1, store result in eax.
    How nice, Hotspot shows us this is our “iadd” op!

11343cd0a: add    $0x10,%rsp
    Put stack pointer back where it was.

11343cd0e: pop    %rbp
    Restore rbp from stack.

11343cd0f: test   %eax,-0x1303fd15(%rip)     # 1003fd000
                                           ;   {poll_return}
    Poll a “safepoint”...give JVM a chance to GC, etc.

11343cd15: retq
    All done!
Things to Watch For

• CALL operations
 • Indicates something failed to inline
• LOCK operations
 • Cache-busting, e.g. volatility
CALL
  1134858f5: xchg    %ax,%ax
  1134858f7: callq   113414aa0   ; OopMap{off=316}
                                 ;*invokespecial addAsBignum
                                 ; - org.jruby.RubyFixnum::addFixnum@29 (line 348)
                                 ;   {optimized virtual_call}
  1134858fc: jmpq    11348586d




Ruby integer adds might overflow into Bignum, leading to
addAsBignum call. In this case, it’s never called, so Hotspot
          emits callq assuming we won’t hit it.
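The shape of that code can be sketched in plain Java. This is an illustration under assumptions, not JRuby's actual `RubyFixnum` source: `FixnumAddSketch` is a hypothetical class, and its `addFixnum` and `addAsBignum` methods are stand-ins that return a boxed `Long` on the fast path and a `BigInteger` on the rare overflow path.

```java
import java.math.BigInteger;

final class FixnumAddSketch {
    // Fast path: plain long addition, with a cheap overflow check.
    static Number addFixnum(long a, long b) {
        long result = a + b;
        // Overflow occurred iff both operands differ in sign from the result.
        if (((a ^ result) & (b ^ result)) < 0) {
            return addAsBignum(a, b);
        }
        return Long.valueOf(result);
    }

    // Slow path: rarely taken, so a JIT may leave it as an out-of-line call.
    static Number addAsBignum(long a, long b) {
        return BigInteger.valueOf(a).add(BigInteger.valueOf(b));
    }

    public static void main(String[] args) {
        System.out.println(addFixnum(1, 2));
        System.out.println(addFixnum(Long.MAX_VALUE, 1));
    }
}
```

If the overflow branch is never taken while profiling, leaving the slow path as a real call keeps the hot path small, which matches the `callq` seen above.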
LOCK
Code from a RubyBasicObject’s default constructor.
11345d823: mov    0x70(%r8),%r9d    ;*getstatic NULL_OBJECT_ARRAY
                                    ; - org.jruby.RubyBasicObject::<init>@5 (line 76)
                                    ; - org.jruby.RubyObject::<init>@2 (line 118)
                                    ; - org.jruby.RubyNumeric::<init>@2 (line 111)
                                    ; - org.jruby.RubyInteger::<init>@2 (line 95)
                                    ; - org.jruby.RubyFixnum::<init>@5 (line 112)
                                    ; - org.jruby.RubyFixnum::newFixnum@25 (line 173)
11345d827: mov    %r9d,0x14(%rax)
11345d82b: lock addl $0x0,(%rsp)    ;*putfield varTable
                                    ; - org.jruby.RubyBasicObject::<init>@8 (line 76)
                                    ; - org.jruby.RubyObject::<init>@2 (line 118)
                                    ; - org.jruby.RubyNumeric::<init>@2 (line 111)
                                    ; - org.jruby.RubyInteger::<init>@2 (line 95)
                                    ; - org.jruby.RubyFixnum::<init>@5 (line 112)
                                    ; - org.jruby.RubyFixnum::newFixnum@25 (line 173)


Why are we doing a volatile write in the constructor?
LOCK
public class RubyBasicObject ... {
    private static final boolean DEBUG = false;
    private static final Object[] NULL_OBJECT_ARRAY = new Object[0];

    // The class of this object
    protected transient RubyClass metaClass;

    // zeroed by jvm
    protected int flags;

    // variable table, lazily allocated as needed (if needed)
    private volatile Object[] varTable = NULL_OBJECT_ARRAY;



 Maybe it’s not such a good idea to pre-init a volatile?
LOCK

~/projects/jruby $ git log 2f935de1e40bfd8b29b3a74eaed699e519571046 -1 | cat
commit 2f935de1e40bfd8b29b3a74eaed699e519571046
Author: Charles Oliver Nutter <headius@headius.com>
Date:   Tue Jun 14 02:59:41 2011 -0500

    Do not eagerly initialize volatile varTable field in RubyBasicObject;
speeds object creation significantly.
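A minimal sketch of the idea behind that commit, using hypothetical names (`LazyVarTable`, `table`, `getVariable`, `setVariable`); the real JRuby change is not reproduced here. The volatile field is left at its default `null`, so the constructor performs no volatile store, and readers treat `null` as an empty table.

```java
final class LazyVarTable {
    private static final Object[] NULL_OBJECT_ARRAY = new Object[0];

    // No initializer: the JVM zeroes the field, so construction needs no volatile write.
    private volatile Object[] varTable;

    // Readers see a shared empty array until something is actually stored.
    Object[] table() {
        Object[] t = varTable;
        return t == null ? NULL_OBJECT_ARRAY : t;
    }

    Object getVariable(int index) {
        Object[] t = varTable;
        return (t == null || index >= t.length) ? null : t[index];
    }

    // Copy-on-write update; writers are serialized, readers only do a volatile read.
    synchronized void setVariable(int index, Object value) {
        Object[] t = varTable;
        int oldLength = (t == null) ? 0 : t.length;
        Object[] copy = new Object[Math.max(index + 1, oldLength)];
        if (t != null) {
            System.arraycopy(t, 0, copy, 0, oldLength);
        }
        copy[index] = value;
        varTable = copy; // the volatile write happens here, only when a variable is set
    }

    public static void main(String[] args) {
        LazyVarTable obj = new LazyVarTable();
        obj.setVariable(2, "x");
        System.out.println(obj.getVariable(2));
    }
}
```

Objects that never get instance variables never pay for the volatile store, which is the case that matters for something created as often as a Fixnum.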




   LEVEL UP!
What Have We Learned?

 • How Hotspot’s JIT works
 • How to monitor the JIT
 • How to find problems
 • How to fix problems we find
What We Missed

• Tuning GC settings in JVM
• Monitoring GC with VisualVM
 • Google ‘visualgc’...it’s awesome
You’re no dummy now!
          ;-)
Thank you!

• headius@headius.com, @headius
• http://blog.headius.com
• “java virtual machine specification”
• “jvm opcodes”

Atmosphere 2016 - Krzysztof Kaczmarek - Don't fear the brackets - Clojure in ...
 
Java
JavaJava
Java
 
Java Language fundamental
Java Language fundamentalJava Language fundamental
Java Language fundamental
 

JavaOne 2012 - JVM JIT for Dummies

  • 1. JVM JIT for Dummies And the rest of you, too.
  • 2. Intro • Charles Oliver Nutter • “JRuby Guy” • Sun Microsystems 2006-2009 • Engine Yard 2009-2012 • Red Hat 2012- • Primarily responsible for compiler, perf • Looking inside JVM
  • 3. What We Will Learn • How the JVM’s JIT works • Monitoring the JIT • Finding problems • Dumping assembly (don’t be scared!)
  • 4. What We Won’t • GC tuning • GC monitoring with VisualVM • Google ‘visualgc’, it’s awesome • OpenJDK internals • JNI
  • 5. Caveat • Focusing on OpenJDK (Hotspot) • Other JVMs will do things differently • But basic principles usually apply • Flags are specific to Hotspot • Internal, subject to change, etc
  • 6. JIT • Just-In-Time compilation • Compiled when needed • Maybe immediately before execution • ...or when we decide it’s important • ...or never?
  • 7. Mixed-Mode • Interpreted • Bytecode-walking • Artificial stack machine • Compiled • Direct native operations • Native register machine
  • 8. Profiling • Gather data about code while interpreting • Invariants (types, constants, nulls) • Statistics (branches, calls) • Use that information to optimize • Educated guess • Guess can be wrong...
  • 9. The Golden Rule of Optimization Don’t do unnecessary work.
  • 10. Optimization • Method inlining • Loop unrolling • Lock coarsening/eliding • Dead code elimination • Duplicate code elimination • Escape analysis
  • 11. Inlining? • Combine caller and callee into one unit • e.g. based on profile • Perhaps with a guard/test • Optimize as a whole • More code means better visibility
  • 12. Inlining
      int addAll(int max) {
        int accum = 0;
        for (int i = 0; i < max; i++) {
          accum = add(accum, i);
        }
        return accum;
      }
      int add(int a, int b) {
        return a + b;
      }
  • 13. Inlining
      int addAll(int max) {
        int accum = 0;
        for (int i = 0; i < max; i++) {
          accum = add(accum, i);
        }
        return accum;
      }
      int add(int a, int b) {
        return a + b;
      }
      Only one target is ever seen
  • 14. Inlining
      int addAll(int max) {
        int accum = 0;
        for (int i = 0; i < max; i++) {
          accum = accum + i;
        }
        return accum;
      }
      Don’t bother making the call
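The rewrite on slides 12 to 14 can be checked by hand. The sketch below puts the original and a hand-inlined version side by side; `InliningSketch` and `addAllInlined` are names made up for this illustration, and the real JIT performs the transformation on compiled code rather than on Java source.

```java
// Hypothetical illustration: addAllInlined is written by hand to mirror
// the result of inlining add() into addAll(); it is not JIT output.
class InliningSketch {
    static int add(int a, int b) {
        return a + b;
    }

    // Original shape: one call to add() per loop iteration.
    static int addAll(int max) {
        int accum = 0;
        for (int i = 0; i < max; i++) {
            accum = add(accum, i);
        }
        return accum;
    }

    // Hand-inlined shape: the body of add() replaces the call.
    static int addAllInlined(int max) {
        int accum = 0;
        for (int i = 0; i < max; i++) {
            accum = accum + i;
        }
        return accum;
    }

    public static void main(String[] args) {
        // Both versions compute the same sum.
        System.out.println(addAll(1000) == addAllInlined(1000));
    }
}
```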
  • 15. Loop Unrolling • Works for small, constant loops • Avoid tests, branching • Allow inlining a single call as many
  • 16. Loop Unrolling
      private static final String[] options = { "yes", "no", "maybe" };
      public void looper() {
        for (String option : options) {
          process(option);
        }
      }
      Small loop, constant stride, constant size
  • 17. Loop Unrolling
      private static final String[] options = { "yes", "no", "maybe" };
      public void looper() {
        process(options[0]);
        process(options[1]);
        process(options[2]);
      }
      Unrolled!
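A runnable version of the same idea, as a rough sketch: the loop and its hand-unrolled form produce identical results. `UnrollSketch` and its `process` stand-in are invented for this example, since the slide does not say what `process` does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of loop unrolling done by hand.
class UnrollSketch {
    private static final String[] OPTIONS = { "yes", "no", "maybe" };

    // Stand-in for the slide's process(); the real body is not shown there.
    private static String process(String option) {
        return option + "!";
    }

    // Rolled: one bounds test and branch per iteration.
    static List<String> rolled() {
        List<String> out = new ArrayList<>();
        for (String option : OPTIONS) {
            out.add(process(option));
        }
        return out;
    }

    // Unrolled: the three iterations are written out, with no loop test.
    static List<String> unrolled() {
        List<String> out = new ArrayList<>();
        out.add(process(OPTIONS[0]));
        out.add(process(OPTIONS[1]));
        out.add(process(OPTIONS[2]));
        return out;
    }
}
```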
  • 18. Lock Coarsening
      public void needsLocks() {
        for (String option : options) {
          process(option); // repeatedly locking
        }
      }
      private synchronized String process(String option) {
        // some wacky thread-unsafe code
      }
  • 19. Lock Coarsening
      public void needsLocks() {
        synchronized (this) { // lock once
          for (String option : options) {
            // some wacky thread-unsafe code
          }
        }
      }
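To make the before and after concrete, here is a small sketch with both shapes in one class. `CoarseningSketch` and the `StringBuilder` log are assumptions for the example; the point is only that the coarsened version takes the monitor once instead of once per element, with the same single-threaded result.

```java
// Hypothetical illustration of lock coarsening written out by hand.
class CoarseningSketch {
    private static final String[] OPTIONS = { "yes", "no", "maybe" };
    private final StringBuilder log = new StringBuilder();

    // Acquires and releases this object's monitor on every call.
    private synchronized void process(String option) {
        log.append(option).append(';');
    }

    // Fine-grained: three lock/unlock pairs, one per element.
    void fineGrained() {
        for (String option : OPTIONS) {
            process(option);
        }
    }

    // Coarsened: one lock/unlock pair around the whole loop,
    // with the body of process() inlined inside it.
    void coarsened() {
        synchronized (this) {
            for (String option : OPTIONS) {
                log.append(option).append(';');
            }
        }
    }

    synchronized String contents() {
        return log.toString();
    }
}
```

The trade-off is that the coarsened form holds the lock longer, so other threads wait longer for it.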
  • 20. Lock Eliding
      public void overCautious() {
        List l = new ArrayList();
        synchronized (l) { // synchronize on new Object
          for (String option : options) {
            l.add(process(option));
          }
        }
      }
      But we know it never escapes this thread...
  • 21. Lock Eliding
      public void overCautious() {
        List l = new ArrayList();
        for (String option : options) {
          l.add( /* process()’s code */);
        }
      }
      No need to lock
  • 22. Escape Analysis private static class Foo { public final String a; public final String b; Foo(String a, String b) { this.a = a; this.b = b; } }
  • 23. Escape Analysis
      public void bar() {
        Foo f = new Foo("Hello", "JVM");
        baz(f);
      }
      public void baz(Foo f) {
        System.out.print(f.a);
        System.out.print(", ");
        quux(f);
      }
      public void quux(Foo f) {
        System.out.print(f.b);
        System.out.println('!');
      }
      Same object all the way through; never “escapes” these methods
  • 24. Escape Analysis
      public secret awesome inlinedBarBazQuux() {
        System.out.print("Hello");
        System.out.print(", ");
        System.out.print("JVM");
        System.out.println('!');
      }
      Don’t bother allocating Foo object
  • 25. Escape Analysis • A bit tweaky on Hotspot • All paths must inline • No external view of object • JRockit was better here? • Now they can fix Hotspot!
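A compilable sketch of the same example, returning strings instead of printing so the two shapes can be compared. `EscapeSketch`, `greet`, `tail` and `scalarReplaced` are hypothetical names; the second method is a hand-written picture of the result once every path has inlined, not actual JIT output.

```java
// Hypothetical illustration: an object that never leaves its call chain
// can be replaced by its fields held in locals.
class EscapeSketch {
    private static final class Foo {
        final String a;
        final String b;

        Foo(String a, String b) {
            this.a = a;
            this.b = b;
        }
    }

    // Allocates a Foo that is only ever read by greet() and tail().
    static String withObject() {
        Foo f = new Foo("Hello", "JVM");
        return greet(f);
    }

    private static String greet(Foo f) {
        return f.a + ", " + tail(f);
    }

    private static String tail(Foo f) {
        return f.b + "!";
    }

    // Hand-written equivalent after inlining: no Foo is allocated,
    // its two fields have become plain local values.
    static String scalarReplaced() {
        String a = "Hello";
        String b = "JVM";
        return a + ", " + b + "!";
    }
}
```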
  • 26. Perf Sinks • Memory accesses • By far the biggest expense • Calls • Memory ref + branch kills pipeline • Call stack, register juggling costs • Locks
  • 27. Volatile? • Each CPU maintains a memory cache • Caches may be out of sync • If it doesn’t matter, no problem • If it does matter, threads disagree! • Volatile forces synchronization of cache • Across cores and to main memory
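A minimal sketch of the classic case where this matters: one thread spins on a flag that another thread sets. `VolatileSketch` and `stopsWorker` are names invented here. The Java language specifies `volatile` as a visibility guarantee between threads rather than in terms of caches, so the cache picture above is best read as a mental model.

```java
// Hypothetical illustration of a volatile stop flag.
class VolatileSketch {
    // volatile: a write by one thread is guaranteed to become visible
    // to subsequent reads of the field by other threads.
    private static volatile boolean stop;

    // Starts a spinning worker, asks it to stop, and reports whether
    // it finished within the timeout.
    static boolean stopsWorker() {
        stop = false;
        Thread worker = new Thread(() -> {
            while (!stop) {
                // busy-wait until the flag flips
            }
        });
        worker.setDaemon(true);
        worker.start();
        stop = true;
        try {
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive();
    }
}
```

Without `volatile` on the field, the worker would be permitted to keep spinning indefinitely, because nothing would oblige it to observe the write.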
  • 28. Call Site • The place where you make a call • Monomorphic (“one shape”) • Single target class • Bimorphic (“two shapes”) • Polymorphic (“many shapes”) • Megamorphic (“you’re screwed”)
  • 29. Blah.java
      System.currentTimeMillis(); // static, monomorphic
      List list1 = new ArrayList(); // constructor, monomorphic
      List list2 = new LinkedList();
      for (List list : new List[]{ list1, list2 }) {
        list.add("hello"); // bimorphic
      }
      for (Object obj : new Object[]{ "foo", list1, new Object() }) {
        obj.toString(); // polymorphic
      }
  • 30. Hotspot • -client mode (C1) inlines, less aggressive • Fewer opportunities to optimize • -server mode (C2) inlines aggressively • Based on richer runtime profiling
  • 31. Tiered • Increasing tiers of interp, C1, and C2 • Level 0 = Interpreter • Level 1-3 = C1 • Level 4 = C2 • Kinda sorta works...
  • 32. system ~/projects/javaone2012-jit $ (pickjdk 4 ; time jruby -e 1)
      New JDK: jdk1.7.0_07.jdk
      real 0m1.251s
      user 0m2.128s
      sys  0m0.093s
      system ~/projects/javaone2012-jit $ (pickjdk 5 ; time jruby -e 1)
      New JDK: jdk1.8.0.jdk
      real 0m1.167s
      user 0m2.767s
      sys  0m0.143s
      system ~/projects/javaone2012-jit $ (pickjdk 5 ; time jruby -J-XX:TieredStopAtLevel=1 -e 1)
      New JDK: jdk1.8.0.jdk
      real 0m0.850s
      user 0m1.344s
      sys  0m0.114s
  • 33. C2 Compiler • Profile to find “hot spots” • Call sites • Branch statistics • Profile until 10k calls • Inline mono/bimorphic calls • Other mechanisms for polymorphic calls
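What "inline a monomorphic call with a guard" might look like if written by hand is sketched below. All names (`GuardedInlineSketch`, `Shape`, `Square`, `Rect`) are made up, and the sketch assumes a profile that has only ever seen `Square` at the call site; the `else` branch stands in for the bail-out path.

```java
// Hypothetical illustration of a type-guarded inline at a call site.
class GuardedInlineSketch {
    interface Shape {
        double area();
    }

    static final class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    static final class Rect implements Shape {
        final double w, h;
        Rect(double w, double h) { this.w = w; this.h = h; }
        public double area() { return w * h; }
    }

    // What the source says: a virtual call on every element.
    static double total(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }

    // Hand-written picture of the guarded version.
    static double totalGuarded(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            if (s.getClass() == Square.class) {   // guard: cheap type test
                Square sq = (Square) s;
                sum += sq.side * sq.side;         // inlined body of Square.area()
            } else {
                sum += s.area();                  // guard failed: take the slow path
            }
        }
        return sum;
    }
}
```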
  • 34. Now it gets fun!
  • 35. Monitoring the JIT • Dozens of flags • Reams of output • Always evolving • How can you understand it?
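Besides command-line flags, the standard `java.lang.management` API exposes a coarse view of JIT activity from inside the process. This is not part of the talk; it is a small sketch using `CompilationMXBean`, with `JitTimeSketch` as a made-up class name.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper that reports which JIT is running and how long
// it has spent compiling so far.
class JitTimeSketch {
    static String describe() {
        // Returns null when the JVM has no compilation system.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            return "no JIT compiler reported";
        }
        if (!jit.isCompilationTimeMonitoringSupported()) {
            return jit.getName() + " (compilation time not monitored)";
        }
        return jit.getName() + ": " + jit.getTotalCompilationTime() + " ms spent compiling";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```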
  • 36. public class Accumulator { public static void main(String[] args) { int max = Integer.parseInt(args[0]); System.out.println(addAll(max)); } static int addAll(int max) { int accum = 0; for (int i = 0; i < max; i++) { accum = add(accum, i); } return accum; } static int add(int a, int b) { return a + b; } }
  • 37. $ java -version
      openjdk version "1.7.0-b147"
      OpenJDK Runtime Environment (build 1.7.0-b147-20110927)
      OpenJDK 64-Bit Server VM (build 21.0-b17, mixed mode)
      $ javac Accumulator.java
      $ java Accumulator 1000
      499500
  • 38. Print Compilation • -XX:+PrintCompilation • Print methods as they JIT • Class + name + size
  • 39. $ java -XX:+PrintCompilation Accumulator 1000 53 1 java.lang.String::hashCode (67 bytes) 499500
  • 40. $ java -XX:+PrintCompilation Accumulator 1000 53 1 java.lang.String::hashCode (67 bytes) 499500 Where’s our code?
  • 41. $ java -XX:+PrintCompilation Accumulator 1000 53 1 java.lang.String::hashCode (67 bytes) 499500 Where’s our code? Remember...10k calls before JIT
  • 42. 10k loop, 10k calls to add $ java -XX:+PrintCompilation Accumulator 10000 53 1 java.lang.String::hashCode (67 bytes) 64 2 Accumulator::add (4 bytes) 49995000 Hooray!
  • 43. But what’s this? $ java -XX:+PrintCompilation Accumulator 10000 53 1 java.lang.String::hashCode (67 bytes) 64 2 Accumulator::add (4 bytes) 49995000 Class loading, security logic, other stuff...
  • 44. Hotspot is making zombies? 1401 70 java.util.concurrent.ConcurrentHashMap::hash (49 bytes) 1412 71 java.lang.String::indexOf (7 bytes) 1420 72 ! java.io.BufferedReader::readLine (304 bytes) 1420 73 sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (543 bytes) 1422 42 java.util.zip.ZipCoder::getBytes (192 bytes) made not entrant 1435 74 n java.lang.Object::hashCode (0 bytes) 1443 29 ! sun.misc.URLClassPath$JarLoader::getResource (91 bytes) made zombie 1443 25 sun.misc.URLClassPath::getResource (74 bytes) made zombie 1443 36 sun.misc.URLClassPath::getResource (74 bytes) made not entrant 1443 43 java.util.zip.ZipCoder::encoder (35 bytes) made not entrant 1449 75 java.lang.String::endsWith (15 bytes) 1631 1 % sun.misc.URLClassPath::getResource @ 39 (74 bytes) 1665 76 java.lang.ClassLoader::checkName (43 bytes)
  • 45. Hotspot is making zombies? 1401 70 java.util.concurrent.ConcurrentHashMap::hash (49 bytes) 1412 71 java.lang.String::indexOf (7 bytes) 1420 72 ! java.io.BufferedReader::readLine (304 bytes) 1420 73 sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (543 bytes) 1422 42 java.util.zip.ZipCoder::getBytes (192 bytes) made not entrant 1435 74 n java.lang.Object::hashCode (0 bytes) 1443 29 ! sun.misc.URLClassPath$JarLoader::getResource (91 bytes) made zombie 1443 25 sun.misc.URLClassPath::getResource (74 bytes) made zombie 1443 36 sun.misc.URLClassPath::getResource (74 bytes) made not entrant 1443 43 java.util.zip.ZipCoder::encoder (35 bytes) made not entrant 1449 75 java.lang.String::endsWith (15 bytes) 1631 1 % sun.misc.URLClassPath::getResource @ 39 (74 bytes) 1665 76 java.lang.ClassLoader::checkName (43 bytes) Not entrant? What the heck?
  • 46. Optimistic Compilers • Assume profile is accurate • Aggressively optimize based on profile • Bail out if we’re wrong • ...and hope that we’re usually right
  • 47. Deoptimization • Bail out of running code • Monitoring flags describe process • “uncommon trap” - something’s changed • “not entrant” - don’t let new calls enter • “zombie” - on its way to deadness
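One way to try provoking this yourself is sketched below: warm up a call site with a single receiver type, then hand it a second one. `DeoptSketch` and its nested types are invented names. Running it with `-XX:+PrintCompilation` may show `made not entrant` for `run`, but whether and when that happens depends on the JVM version and flags.

```java
// Hypothetical experiment: change the receiver type at a hot call site.
class DeoptSketch {
    interface Op {
        int apply(int x);
    }

    static final class Inc implements Op {
        public int apply(int x) { return x + 1; }
    }

    static final class Dec implements Op {
        public int apply(int x) { return x - 1; }
    }

    // The op.apply(v) call site is what the profile watches.
    static int run(Op op, int times) {
        int v = 0;
        for (int i = 0; i < times; i++) {
            v = op.apply(v);
        }
        return v;
    }

    public static void main(String[] args) {
        // Phase 1: only Inc is ever seen, so the site looks monomorphic.
        int up = 0;
        for (int i = 0; i < 20_000; i++) {
            up += run(new Inc(), 10);
        }
        // Phase 2: a second type arrives, which can invalidate code
        // compiled under the "only Inc" assumption.
        int down = run(new Dec(), 10);
        System.out.println(up + " " + down);
    }
}
```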
  • 49. JRuby red_black perf (chart, time axis 0s to 4s) • Most code not JITed yet
  • 50. JRuby red_black perf (chart, time axis 0s to 4s) • Most code not JITed yet • Back off
  • 51. JRuby red_black perf (chart, time axis 0s to 4s) • Most code not JITed yet • Back off • Back off
  • 52. No JIT At All? • Code is too big • Code isn’t called enough
  • 53. That looks exciting! 1401 70 java.util.concurrent.ConcurrentHashMap::hash (49 bytes) 1412 71 java.lang.String::indexOf (7 bytes) 1420 72 ! java.io.BufferedReader::readLine (304 bytes) 1420 73 sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (543 bytes) 1422 42 java.util.zip.ZipCoder::getBytes (192 bytes) made not entrant 1435 74 n java.lang.Object::hashCode (0 bytes) 1443 29 ! sun.misc.URLClassPath$JarLoader::getResource (91 bytes) made zombie 1443 25 sun.misc.URLClassPath::getResource (74 bytes) made zombie 1443 36 sun.misc.URLClassPath::getResource (74 bytes) made not entrant 1443 43 java.util.zip.ZipCoder::encoder (35 bytes) made not entrant 1449 75 java.lang.String::endsWith (15 bytes) 1631 1 % sun.misc.URLClassPath::getResource @ 39 (74 bytes) 1665 76 java.lang.ClassLoader::checkName (43 bytes)
  • 54. Exception handling in here (boring!) 1401 70 java.util.concurrent.ConcurrentHashMap::hash (49 bytes) 1412 71 java.lang.String::indexOf (7 bytes) 1420 72 ! java.io.BufferedReader::readLine (304 bytes) 1420 73 sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (543 bytes) 1422 42 java.util.zip.ZipCoder::getBytes (192 bytes) made not entrant 1435 74 n java.lang.Object::hashCode (0 bytes) 1443 29 ! sun.misc.URLClassPath$JarLoader::getResource (91 bytes) made zombie 1443 25 sun.misc.URLClassPath::getResource (74 bytes) made zombie 1443 36 sun.misc.URLClassPath::getResource (74 bytes) made not entrant 1443 43 java.util.zip.ZipCoder::encoder (35 bytes) made not entrant 1449 75 java.lang.String::endsWith (15 bytes) 1631 1 % sun.misc.URLClassPath::getResource @ 39 (74 bytes) 1665 76 java.lang.ClassLoader::checkName (43 bytes)
  • 55. Exception Handling • Unroll stack until someone stops us • Handler gets registered in JVM • Different treatment by JIT • Inlined throw + catch = jump • If no stack trace, essentially free
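The "no stack trace" case can be set up explicitly since Java 7 through the four-argument `Throwable` constructor. The sketch below uses it for a control-flow exception; `StacklessSketch` and `Signal` are names made up for the example.

```java
// Hypothetical illustration of an exception that skips stack trace capture.
class StacklessSketch {
    static final class Signal extends RuntimeException {
        Signal(String message) {
            // cause = null, enableSuppression = false, writableStackTrace = false
            super(message, null, false, false);
        }
    }

    // Throws and catches a Signal, returning how many frames it recorded.
    static int traceDepth() {
        try {
            throw new Signal("stop");
        } catch (Signal s) {
            return s.getStackTrace().length;
        }
    }

    public static void main(String[] args) {
        System.out.println("frames recorded: " + traceDepth());
    }
}
```

Filling in the stack trace is the costly part of creating an exception, so skipping it leaves little more than the allocation and the jump.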
  • 56. What’s this “n” all about? 1401 70 java.util.concurrent.ConcurrentHashMap::hash (49 bytes) 1412 71 java.lang.String::indexOf (7 bytes) 1420 72 ! java.io.BufferedReader::readLine (304 bytes) 1420 73 sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (543 bytes) 1422 42 java.util.zip.ZipCoder::getBytes (192 bytes) made not entrant 1435 74 n java.lang.Object::hashCode (0 bytes) 1443 29 ! sun.misc.URLClassPath$JarLoader::getResource (91 bytes) made zombie 1443 25 sun.misc.URLClassPath::getResource (74 bytes) made zombie 1443 36 sun.misc.URLClassPath::getResource (74 bytes) made not entrant 1443 43 java.util.zip.ZipCoder::encoder (35 bytes) made not entrant 1449 75 java.lang.String::endsWith (15 bytes) 1631 1 % sun.misc.URLClassPath::getResource @ 39 (74 bytes) 1665 76 java.lang.ClassLoader::checkName (43 bytes)
  • 57. This method is native 1401 70 java.util.concurrent.ConcurrentHashMap::hash (49 bytes) 1412 71 java.lang.String::indexOf (7 bytes) 1420 72 ! java.io.BufferedReader::readLine (304 bytes) 1420 73 sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (543 bytes) 1422 42 java.util.zip.ZipCoder::getBytes (192 bytes) made not entrant 1435 74 n java.lang.Object::hashCode (0 bytes) 1443 29 ! sun.misc.URLClassPath$JarLoader::getResource (91 bytes) made zombie 1443 25 sun.misc.URLClassPath::getResource (74 bytes) made zombie 1443 36 sun.misc.URLClassPath::getResource (74 bytes) made not entrant 1443 43 java.util.zip.ZipCoder::encoder (35 bytes) made not entrant 1449 75 java.lang.String::endsWith (15 bytes) 1631 1 % sun.misc.URLClassPath::getResource @ 39 (74 bytes) 1665 76 java.lang.ClassLoader::checkName (43 bytes)
  • 58. And this one? 1401 70 java.util.concurrent.ConcurrentHashMap::hash (49 bytes) 1412 71 java.lang.String::indexOf (7 bytes) 1420 72 ! java.io.BufferedReader::readLine (304 bytes) 1420 73 sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (543 bytes) 1422 42 java.util.zip.ZipCoder::getBytes (192 bytes) made not entrant 1435 74 n java.lang.Object::hashCode (0 bytes) 1443 29 ! sun.misc.URLClassPath$JarLoader::getResource (91 bytes) made zombie 1443 25 sun.misc.URLClassPath::getResource (74 bytes) made zombie 1443 36 sun.misc.URLClassPath::getResource (74 bytes) made not entrant 1443 43 java.util.zip.ZipCoder::encoder (35 bytes) made not entrant 1449 75 java.lang.String::endsWith (15 bytes) 1631 1 % sun.misc.URLClassPath::getResource @ 39 (74 bytes) 1665 76 java.lang.ClassLoader::checkName (43 bytes) Method has been replaced while running (OSR)
  • 59. On-Stack Replacement • Running method never exits? • But it’s getting really hot? • Generally means loops, back-branching • Compile and replace while running • Not typically useful in large systems • Looks great on benchmarks!
  • 60. public class Accumulator {
        public static void main(String[] args) {
          int max = Integer.parseInt(args[0]);
          System.out.println(addAll(max));
        }
        static int addAll(int max) { // addAll never exits... loops until end
          int accum = 0;
          for (int i = 0; i < max; i++) {
            accum = add(accum, i);
          }
          return accum;
        }
        static int add(int a, int b) {
          return a + b;
        }
      }
  • 61. system ~/projects/javaone2012-jit $ java -XX:+PrintCompilation Accumulator1 1000 63 1 java.lang.String::hashCode (55 bytes) 499500 system ~/projects/javaone2012-jit $ java -XX:+PrintCompilation Accumulator1 10000 63 1 java.lang.String::hashCode (55 bytes) 74 2 Accumulator1::add (4 bytes) 49995000 system ~/projects/javaone2012-jit $ java -XX:+PrintCompilation Accumulator1 100000 62 1 java.lang.String::hashCode (55 bytes) 73 2 Accumulator1::add (4 bytes) 74 1 % Accumulator1::addAll @ 4 (23 bytes) 704982704
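The last result above is smaller than the one before it because the sum no longer fits in an `int`: 0 + 1 + ... + 99999 is 4,999,950,000, which wraps modulo 2^32 to 704,982,704. A quick sketch to confirm the arithmetic, with `OverflowSketch` as a made-up class name:

```java
// Hypothetical check of the int overflow seen in the 100000 run.
class OverflowSketch {
    // Same accumulation as addAll(): wraps silently on overflow.
    static int sumInt(int max) {
        int accum = 0;
        for (int i = 0; i < max; i++) {
            accum = accum + i;
        }
        return accum;
    }

    // Same loop with a 64-bit accumulator: no overflow at this size.
    static long sumLong(int max) {
        long accum = 0;
        for (int i = 0; i < max; i++) {
            accum = accum + i;
        }
        return accum;
    }

    public static void main(String[] args) {
        System.out.println(sumInt(100000));          // wrapped 32-bit result
        System.out.println(sumLong(100000));         // true sum
        System.out.println((int) sumLong(100000));   // true sum truncated to 32 bits
    }
}
```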
  • 62. Millis from JVM start 1401 70 java.util.concurrent.ConcurrentHashMap::hash (49 bytes) 1412 71 java.lang.String::indexOf (7 bytes) 1420 72 ! java.io.BufferedReader::readLine (304 bytes) 1420 73 sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (543 bytes) 1422 42 java.util.zip.ZipCoder::getBytes (192 bytes) made not entrant 1435 74 n java.lang.Object::hashCode (0 bytes) 1443 29 ! sun.misc.URLClassPath$JarLoader::getResource (91 bytes) made zombie 1443 25 sun.misc.URLClassPath::getResource (74 bytes) made zombie 1443 36 sun.misc.URLClassPath::getResource (74 bytes) made not entrant 1443 43 java.util.zip.ZipCoder::encoder (35 bytes) made not entrant 1449 75 java.lang.String::endsWith (15 bytes) 1631 1 % sun.misc.URLClassPath::getResource @ 39 (74 bytes) 1665 76 java.lang.ClassLoader::checkName (43 bytes) Sequence number of compilation
  • 63. Compiling 1401 70 java.util.concurrent.ConcurrentHashMap::hash (49 bytes) 1412 71 java.lang.String::indexOf (7 bytes) 1420 72 ! java.io.BufferedReader::readLine (304 bytes) 1420 73 sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (543 bytes) 1422 42 java.util.zip.ZipCoder::getBytes (192 bytes) made not entrant 1435 74 n java.lang.Object::hashCode (0 bytes) 1443 29 ! sun.misc.URLClassPath$JarLoader::getResource (91 bytes) made zombie 1443 25 sun.misc.URLClassPath::getResource (74 bytes) made zombie 1443 36 sun.misc.URLClassPath::getResource (74 bytes) made not entrant 1443 43 java.util.zip.ZipCoder::encoder (35 bytes) made not entrant 1449 75 java.lang.String::endsWith (15 bytes) 1631 1 % sun.misc.URLClassPath::getResource @ 39 (74 bytes) 1665 76 java.lang.ClassLoader::checkName (43 bytes)
  • 64. Backing Off 1401 70 java.util.concurrent.ConcurrentHashMap::hash (49 bytes) 1412 71 java.lang.String::indexOf (7 bytes) 1420 72 ! java.io.BufferedReader::readLine (304 bytes) 1420 73 sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (543 bytes) 1422 42 java.util.zip.ZipCoder::getBytes (192 bytes) made not entrant 1435 74 n java.lang.Object::hashCode (0 bytes) 1443 29 ! sun.misc.URLClassPath$JarLoader::getResource (91 bytes) made zombie 1443 25 sun.misc.URLClassPath::getResource (74 bytes) made zombie 1443 36 sun.misc.URLClassPath::getResource (74 bytes) made not entrant 1443 43 java.util.zip.ZipCoder::encoder (35 bytes) made not entrant 1449 75 java.lang.String::endsWith (15 bytes) 1631 1 % sun.misc.URLClassPath::getResource @ 39 (74 bytes) 1665 76 java.lang.ClassLoader::checkName (43 bytes)
  • 65. OSR 1401 70 java.util.concurrent.ConcurrentHashMap::hash (49 bytes) 1412 71 java.lang.String::indexOf (7 bytes) 1420 72 ! java.io.BufferedReader::readLine (304 bytes) 1420 73 sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (543 bytes) 1422 42 java.util.zip.ZipCoder::getBytes (192 bytes) made not entrant 1435 74 n java.lang.Object::hashCode (0 bytes) 1443 29 ! sun.misc.URLClassPath$JarLoader::getResource (91 bytes) made zombie 1443 25 sun.misc.URLClassPath::getResource (74 bytes) made zombie 1443 36 sun.misc.URLClassPath::getResource (74 bytes) made not entrant 1443 43 java.util.zip.ZipCoder::encoder (35 bytes) made not entrant 1449 75 java.lang.String::endsWith (15 bytes) 1631 1 % sun.misc.URLClassPath::getResource @ 39 (74 bytes) 1665 76 java.lang.ClassLoader::checkName (43 bytes)
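The columns walked through above (millis, sequence number, flags, method, size, status) are regular enough to parse. Below is a rough sketch for the non-tiered layout only; `CompilationLine` is a hypothetical class, the `s` and `b` flags are included on the assumption that HotSpot also emits them, and the format is internal and can change between releases.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for one line of non-tiered -XX:+PrintCompilation output:
//   millis  id  [flags]  Class::method  [@ bci]  (N bytes)  [status]
class CompilationLine {
    private static final Pattern LINE = Pattern.compile(
        "^\\s*(\\d+)\\s+(\\d+)\\s+((?:[%sbn!]\\s+)*)(\\S+)"
        + "(?:\\s+@\\s+(\\d+))?\\s+\\((\\d+) bytes\\)\\s*(.*)$");

    final long millis;    // milliseconds since JVM start
    final int id;         // sequence number of the compilation
    final String flags;   // e.g. "!" exception handlers, "n" native, "%" OSR
    final String method;  // Class::method
    final int osrBci;     // bytecode index after '@', or -1 if absent
    final int bytes;      // bytecode size of the method
    final String status;  // e.g. "made not entrant", or empty

    private CompilationLine(Matcher m) {
        millis = Long.parseLong(m.group(1));
        id = Integer.parseInt(m.group(2));
        flags = m.group(3).replaceAll("\\s+", "");
        method = m.group(4);
        osrBci = m.group(5) == null ? -1 : Integer.parseInt(m.group(5));
        bytes = Integer.parseInt(m.group(6));
        status = m.group(7).trim();
    }

    // Returns null for lines that do not match, such as program output
    // or the tiered layout with its extra level column.
    static CompilationLine parse(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? new CompilationLine(m) : null;
    }

    boolean isOsr() { return flags.indexOf('%') >= 0; }
    boolean isNative() { return flags.indexOf('n') >= 0; }
    boolean hasExceptionHandlers() { return flags.indexOf('!') >= 0; }
}
```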
  • 66. system ~/projects/javaone2012-jit $ java -XX:+PrintCompilation -XX:+TieredCompilation Accumulator1 1000
      55 1 3 java.lang.String::charAt (29 bytes)
      57 2 3 java.lang.String::hashCode (55 bytes)
      57 3 3 java.lang.Object::<init> (1 bytes)
      57 4 n 0 java.lang.System::arraycopy (0 bytes) (static)
      57 5 3 java.lang.String::indexOf (70 bytes)
      57 6 3 java.lang.String::length (6 bytes)
      58 7 3 java.lang.AbstractStringBuilder::ensureCapacityInternal (16 bytes)
      59 8 3 java.lang.String::equals (81 bytes)
      ...
      69 26 3 java.lang.Character::toLowerCase (6 bytes)
      69 27 3 java.lang.AbstractStringBuilder::append (48 bytes)
      70 28 3 java.lang.String::indexOf (7 bytes)
      72 29 4 java.lang.String::charAt (29 bytes)
      72 30 3 java.lang.StringBuilder::append (8 bytes)
      73 31 1 java.net.URL::getProtocol (5 bytes)
      73 32 3 java.lang.String::lastIndexOf (52 bytes)
      74 33 3 java.io.UnixFileSystem::normalize (75 bytes)
      75 1 3 java.lang.String::charAt (29 bytes) made not entrant
      77 36 n 0 java.lang.Thread::currentThread (0 bytes) (static)
      77 35 3 Accumulator1::add (4 bytes)
      499500
      Third column: tier we’re at. Accumulator1::add: only called 1k times
  • 67. Print Inlining • -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining • Display hierarchy of inlined methods • Include reasons for not inlining • More, better output on OpenJDK 7
  • 68. $ java -XX:+UnlockDiagnosticVMOptions > -XX:+PrintInlining > Accumulator 10000 49995000
  • 69. $ java -XX:+UnlockDiagnosticVMOptions > -XX:+PrintInlining > Accumulator 10000 49995000 Um...I don’t see anything inlining
  • 70. public class Accumulator {
        public static void main(String[] args) {
          int max = Integer.parseInt(args[0]);
          System.out.println(addAll(max));
        }
        static int addAll(int max) { // called only once
          int accum = 0;
          for (int i = 0; i < max; i++) {
            accum = add(accum, i);
          }
          return accum;
        }
        static int add(int a, int b) {
          return a + b;
        }
      }
  • 71. public class Accumulator {
        public static void main(String[] args) {
          int max = Integer.parseInt(args[0]);
          System.out.println(addAll(max));
        }
        static int addAll(int max) { // called only once
          int accum = 0;
          for (int i = 0; i < max; i++) {
            accum = add(accum, i);
          }
          return accum;
        }
        static int add(int a, int b) { // called 10k times
          return a + b;
        }
      }
  • 72. public class Accumulator {
        public static void main(String[] args) {
          int max = Integer.parseInt(args[0]);
          System.out.println(addAll(max));
        }
        static int addAll(int max) { // called only once
          int accum = 0;
          for (int i = 0; i < max; i++) {
            accum = add(accum, i);
          }
          return accum;
        }
        static int add(int a, int b) { // called 10k times; JITs as expected
          return a + b;
        }
      }
  • 73. public class Accumulator {
        public static void main(String[] args) {
          int max = Integer.parseInt(args[0]);
          System.out.println(addAll(max));
        }
        static int addAll(int max) { // called only once
          int accum = 0;
          for (int i = 0; i < max; i++) {
            accum = add(accum, i);
          }
          return accum;
        }
        static int add(int a, int b) { // called 10k times; JITs as expected
          return a + b; // but makes no calls!
        }
      }
  • 74. static double addAllSqrts(int max) {
        double accum = 0;
        for (int i = 0; i < max; i++) {
          accum = addSqrt(accum, i);
        }
        return accum;
      }
      static double addSqrt(double a, int b) {
        return a + sqrt(b);
      }
      static double sqrt(int a) {
        return Math.sqrt(a);
      }
  • 75. $ java -XX:+UnlockDiagnosticVMOptions
      > -XX:+PrintInlining
      > -XX:+PrintCompilation
      > Accumulator 10000
      53 1 java.lang.String::hashCode (67 bytes)
      65 2 Accumulator::addSqrt (7 bytes)
           @ 3 Accumulator::sqrt (6 bytes) inline (hot)
             @ 2 java.lang.Math::sqrt (5 bytes) (intrinsic)
      65 3 Accumulator::sqrt (6 bytes)
           @ 2 java.lang.Math::sqrt (5 bytes) (intrinsic)
      666616.4591971082
  • 76. $ java -XX:+UnlockDiagnosticVMOptions
      > -XX:+PrintInlining
      > -XX:+PrintCompilation
      > Accumulator 10000
      53 1 java.lang.String::hashCode (67 bytes)
      65 2 Accumulator::addSqrt (7 bytes)
           @ 3 Accumulator::sqrt (6 bytes) inline (hot)
             @ 2 java.lang.Math::sqrt (5 bytes) (intrinsic)
      65 3 Accumulator::sqrt (6 bytes)
           @ 2 java.lang.Math::sqrt (5 bytes) (intrinsic)
      666616.4591971082
      HOT HOT HOT!
  • 77. $ java -XX:+UnlockDiagnosticVMOptions
      > -XX:+PrintInlining
      > -XX:+PrintCompilation
      > Accumulator 10000
      53 1 java.lang.String::hashCode (67 bytes)
      65 2 Accumulator::addSqrt (7 bytes)
           @ 3 Accumulator::sqrt (6 bytes) inline (hot)
             @ 2 java.lang.Math::sqrt (5 bytes) (intrinsic)
      65 3 Accumulator::sqrt (6 bytes)
           @ 2 java.lang.Math::sqrt (5 bytes) (intrinsic)
      666616.4591971082
      Calls treated specially by JIT
  • 78. Intrinsic? • Known to the JIT • Don’t inline bytecode • Do insert “best” native code • e.g. kernel-level memory operation • e.g. optimized sqrt in machine code
  • 79. Common Intrinsics • String#equals • Most (all?) Math methods • System.arraycopy • Object#hashCode • Object#getClass • sun.misc.Unsafe methods
  • 80. LogCompilation • -XX:+LogCompilation • Dumps compiler events to hotspot.log • Tons and tons of output
  • 81. scopes_pcs_offset='1384' dependencies_offset='1576' handler_table_offset='1592' nul_chk_table_offset='1736' oops_offset='992' method='org/jruby/lexer/yacc/ByteArrayLexerSource$ByteArrayCursor read ()I' bytes='49' count='5296' backedge_count='1' iicount='10296' stamp='0.412'/> <writer thread='4425007104'/> <nmethod compile_id='21' compiler='C2' entry='4345862528' size='1152' address='4345862160' relocation_offset='288' insts_offset='368' stub_offset='688' scopes_data_offset='840' scopes_pcs_offset='904' dependencies_offset='1016' handler_table_offset='1032' oops_offset='784' method='org/jruby/lexer/yacc/ ByteArrayLexerSource forward (I)I' bytes='111' count='5296' backedge_count='1' iicount='10296' stamp='0.412'/> <writer thread='4300214272'/> <task_queued compile_id='22' method='org/jruby/lexer/yacc/ByteArrayLexerSource read ()I' bytes='10' count='5000' backedge_count='1' iicount='10000' stamp='0.433' comment='count' hot_count='10000'/> <writer thread='4426067968'/> <nmethod compile_id='22' compiler='C2' entry='4345885984' size='1888' address='4345885584' relocation_offset='288' insts_offset='400' stub_offset='912' scopes_data_offset='1104' scopes_pcs_offset='1496' dependencies_offset='1704' handler_table_offset='1720' nul_chk_table_offset='1864' oops_offset='1024' method='org/jruby/lexer/yacc/ByteArrayLexerSource read ()I' bytes='10' count='5044' backedge_count='1' iicount='10044' stamp='0.435'/> <writer thread='4300214272'/> <task_queued compile_id='23' method='java/util/HashMap hash (I)I' bytes='23' count='5000' backedge_count='1' iicount='10000' stamp='0.442' comment='count' hot_count='10000'/> <writer thread='4425007104'/> <nmethod compile_id='23' compiler='C2' entry='4345887808' size='440' address='4345887504' relocation_offset='288' insts_offset='304' stub_offset='368' scopes_data_offset='392' scopes_pcs_offset='400' dependencies_offset='432' method='java/util/HashMap hash (I)I' bytes='23' count='5039' backedge_count='1' iicount='10039' 
stamp='0.442'/> <writer thread='4300214272'/> <dependency_failed type='abstract_with_unique_concrete_subtype' ctxk='org/jruby/lexer/yacc/LexerSource' x='org/jruby/lexer/yacc/ByteArrayLexerSource' witness='org/jruby/lexer/yacc/InputStreamLexerSource' stamp='0.456'/> <dependency_failed type='abstract_with_unique_concrete_subtype' ctxk='org/jruby/lexer/yacc/LexerSource' x='org/jruby/lexer/yacc/ByteArrayLexerSource' witness='org/jruby/lexer/yacc/InputStreamLexerSource' stamp='0.456'/> <dependency_failed type='abstract_with_unique_concrete_subtype' ctxk='org/jruby/lexer/yacc/LexerSource' x='org/jruby/lexer/yacc/ByteArrayLexerSource' witness='org/jruby/lexer/yacc/InputStreamLexerSource' stamp='0.456'/> <dependency_failed type='abstract_with_unique_concrete_subtype' ctxk='org/jruby/lexer/yacc/LexerSource' x='org/jruby/lexer/yacc/ByteArrayLexerSource' witness='org/jruby/lexer/yacc/InputStreamLexerSource' stamp='0.456'/>
  • 83. Worst XML Evar • Relational structure in hierarchical form • Hotspot guys can read it...I cannot • <JDK>/hotspot/src/share/tools/LogCompilation • or http://github.com/headius/logc
  • 84. No flags, like PrintCompilation $ java -jar logc.jar hotspot.log 1 java.lang.String::hashCode (67 bytes) 2 Accumulator::addSqrt (7 bytes) 3 Accumulator::sqrt (6 bytes)
  • 85. -i flag, PrintCompilation and PrintInlining $ java -jar logc.jar -i hotspot.log 1 java.lang.String::hashCode (67 bytes) 2 Accumulator::addSqrt (7 bytes) @ 2 Accumulator::sqrt (6 bytes) (end time: 0.0660 nodes: 36) @ 2 java.lang.Math::sqrt (5 bytes) 3 Accumulator::sqrt (6 bytes) @ 2 java.lang.Math::sqrt (5 bytes)
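The Accumulator source is not shown on these slides, so what follows is only a guess at a class that could produce a log like the one above: a hot loop calls addSqrt, which calls sqrt, which calls Math.sqrt. Only the names addSqrt and sqrt come from the log; the method bodies, the addAll loop and the command-line argument are assumptions.

```java
// Hypothetical reconstruction; only the names addSqrt and sqrt appear in the log.
class Accumulator {
    public static void main(String[] args) {
        int max = args.length > 0 ? Integer.parseInt(args[0]) : 10000;
        System.out.println(addAll(max));
    }

    // Hot loop: calls addSqrt often enough for the JIT to notice it.
    static double addAll(int max) {
        double accum = 0;
        for (int i = 0; i < max; i++) {
            accum = addSqrt(accum, i);
        }
        return accum;
    }

    // Tiny method: a candidate for inlining into its caller.
    static double addSqrt(double a, int b) {
        return a + sqrt(b);
    }

    // Thin wrapper around Math.sqrt, which can be inlined in turn.
    static double sqrt(int a) {
        return Math.sqrt(a);
    }
}
```

Running something like `java -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation Accumulator 10000` and feeding the resulting log to logc should give output of the same shape, though compile ids and timings will differ.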
  • 87. 8 sun.nio.cs.UTF_8$Encoder::encode (361 bytes) 6 uncommon trap null_check make_not_entrant @8 java/lang/String equals (Ljava/lang/Object;)Z 6 make_not_entrant 9 java.lang.String::equals (88 bytes) 10 java.util.LinkedList::indexOf (73 bytes)
  • 88. Hotspot sees it’s 100% String 10 java.util.LinkedList::indexOf (73 bytes) @ 52 java.lang.Object::equals (11 bytes) type profile java/lang/Object -> java/lang/String (100%) @ 52 java.lang.String::equals (88 bytes) 11 java.lang.String::indexOf (87 bytes) @ 83 java.lang.String::indexOfSupplementary too big Too big to inline! Could be bad?
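A small sketch of the kind of code that leads to a 100% String profile like the one above: LinkedList.indexOf calls equals through an Object reference, but if every key passed in at run time is a String, the profile at that call site only ever records String. The class name, method name and iteration count below are invented for the example.

```java
import java.util.LinkedList;
import java.util.List;

// Hypothetical example: the list is typed as Object, but only Strings are used.
class MonomorphicIndexOf {
    // Repeated lookups make indexOf hot; the key is always a String at run time,
    // so the equals call inside indexOf sees a single receiver type.
    static int lookups(List<Object> list, Object key, int iterations) {
        int last = -1;
        for (int i = 0; i < iterations; i++) {
            last = list.indexOf(key);
        }
        return last;
    }

    public static void main(String[] args) {
        List<Object> list = new LinkedList<>();
        list.add("a");
        list.add("b");
        list.add("c");
        System.out.println(lookups(list, "c", 100_000));
    }
}
```

Passing a non-String key later would contradict the profile, which is the "guess can be wrong" case: the JVM has to fall back and recompile.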
  • 89. Tuning Inlining • -XX:MaxInlineSize=35 • Largest inlinable method (bytecode) • -XX:InlineSmallCode=# • Largest inlinable compiled method • -XX:FreqInlineSize=# • Largest frequently-called method...
  • 90. Tuning Inlining • -XX:MaxInlineLevel=9 • How deep does the rabbit hole go? • -XX:MaxRecursiveInlineLevel=# • Recursive inlining
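Before changing any of these, it can help to see what the running JVM is actually using. The sketch below reads the current values through the HotSpotDiagnosticMXBean in com.sun.management. It assumes a Hotspot-based JVM that exposes these flags; the class name InlineFlags and the "unavailable" fallback are made up for the example.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hotspot-specific sketch: prints the inlining flags named on the slides.
class InlineFlags {
    // Returns the flag's current value, or "unavailable" if this JVM
    // does not expose the bean or does not know the flag.
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption(name).getValue();
        } catch (RuntimeException e) {
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "MaxInlineSize", "InlineSmallCode", "FreqInlineSize",
            "MaxInlineLevel", "MaxRecursiveInlineLevel"
        };
        for (String name : names) {
            System.out.println(name + " = " + flagValue(name));
        }
    }
}
```

Defaults vary by JVM version and platform, so the printed values may not match the numbers on the slides.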
  • 93. The Red Pill • Knowing code compiles is good • Knowing code inlines is better • Seeing the actual assembly is best!
  • 94. Caveat • I don’t really know assembly. • But I fake it really well.
  • 95. Print Assembly • -XX:+PrintAssembly • Google “hotspot printassembly” • https://wikis.oracle.com/display/HotSpotInternals/PrintAssembly • Assembly-dumping plugin for Hotspot
  • 96. Alternative • -XX:+PrintOptoAssembly • Only in debug/fastdebug builds • Not as pretty
  • 97. Wednesday, July 27, 2011 ~/oscon $ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly Accumulator 10000 OpenJDK 64-Bit Server VM warning: PrintAssembly is enabled; turning on DebugNonSafepoints to gain additional output Loaded disassembler from hsdis-amd64.dylib ...
  • 98. Decoding compiled method 11343cbd0: Code: [Disassembling for mach='i386:x86-64'] [Entry Point] [Verified Entry Point] [Constants] # {method} 'add' '(II)I' in 'Accumulator' # parm0: rsi = int # parm1: rdx = int # [sp+0x20] (sp of caller) 11343cd00: push %rbp 11343cd01: sub $0x10,%rsp 11343cd05: nop ;*synchronization entry ; - Accumulator::add@-1 (line 16) 11343cd06: mov %esi,%eax 11343cd08: add %edx,%eax ;*iadd ; - Accumulator::add@2 (line 16) 11343cd0a: add $0x10,%rsp 11343cd0e: pop %rbp 11343cd0f: test %eax,-0x1303fd15(%rip) # 1003fd000 ; {poll_return} 11343cd15: retq
  • 100. x86_64 Assembly 101 • add: Two’s complement add • sub: ...subtract • mov*: Move data from a to b • jmp: goto • je, jne, jl, jge, ...: Jump if ==, !=, <, >=, ... • push, pop: Call stack operations • call*, ret*: Call, return from subroutine • eax, ebx, esi, ...: 32-bit registers • rax, rbx, rsi, ...: 64-bit registers
  • 101. Register Machine • Instead of stack moves, we have “slots” • Move data into slots • Trigger operations that manipulate data • Get new data out of slots • JVM stack, locals end up as register ops
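To make the contrast concrete, here is a toy sketch (not how Hotspot's interpreter is actually written) that evaluates the four bytecodes of a static add(int, int) method, iload_0, iload_1, iadd, ireturn, on an explicit operand stack. The compiled code on slide 98 does the same work with two register instructions, mov %esi,%eax and add %edx,%eax. The class and variable names are invented.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of stack-machine evaluation; names are hypothetical.
class StackMachineAdd {
    static int add(int a, int b) {
        int[] locals = {a, b};                 // local variable slots 0 and 1
        Deque<Integer> stack = new ArrayDeque<>();

        stack.push(locals[0]);                 // iload_0
        stack.push(locals[1]);                 // iload_1

        int right = stack.pop();               // iadd: pop two operands,
        int left = stack.pop();
        stack.push(left + right);              //       push their sum

        return stack.pop();                    // ireturn
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));
    }
}
```

Every push and pop here is a memory move that the register version avoids: the operands already sit in esi and edx, and the result is produced directly in eax.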
  • 102. Native Stack? • Native code has a stack too • Preserves registers from call to call • Various calling conventions • Caller preserves registers? • Callee preserves registers?
  • 103. Decoding compiled method 11343cbd0: <= address of new compiled code Code: [Disassembling for mach='i386:x86-64'] <= architecture [Entry Point] [Verified Entry Point] [Constants] # {method} 'add' '(II)I' in 'Accumulator' <= method, signature, class # parm0: rsi = int <= first parm to method goes in rsi # parm1: rdx = int <= second parm goes in rdx # [sp+0x20] (sp of caller) <= caller’s pointer into native stack
  • 104. push %rbp • rbp points at current stack frame, so we save it off.
  • 105. sub $0x10,%rsp • Two args, so we bump stack pointer by 0x10.
  • 106. nop • Do nothing, e.g. to memory-align code.
  • 107. ;*synchronization entry ; - Accumulator::add@-1 (line 16) • At the “-1” instruction of our add() method... i.e. here we go!
  • 108. mov %esi,%eax • Move parm0 (in esi) into eax.
  • 109. add %edx,%eax • Add parm0 and parm1, store result in eax.
  • 110. ;*iadd ; - Accumulator::add@2 (line 16) • How nice, Hotspot shows us this is our “iadd” op!
  • 111. add $0x10,%rsp • Put stack pointer back where it was.
  • 112. pop %rbp • Restore rbp from stack.
  • 113. test %eax,-0x1303fd15(%rip) # 1003fd000 ; {poll_return} • Poll a “safepoint”...give JVM a chance to GC, etc.
  • 114. retq • All done!
  • 115. Things to Watch For • CALL operations • Indicates something failed to inline • LOCK operations • Cache-busting, e.g. volatility
  • 116. CALL 1134858f5: xchg %ax,%ax 1134858f7: callq 113414aa0 ; OopMap{off=316} ;*invokespecial addAsBignum ; - org.jruby.RubyFixnum::addFixnum@29 (line 348) ; {optimized virtual_call} 1134858fc: jmpq 11348586d Ruby integer adds might overflow into Bignum, leading to addAsBignum call. In this case, it’s never called, so Hotspot emits callq assuming we won’t hit it.
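The shape of code behind a call like this can be sketched as follows: a fast path that adds two longs, and a rarely taken slow path that switches to arbitrary precision on overflow. This is an illustration only, not JRuby's actual addFixnum; the class name and the use of BigInteger are assumptions, and only the name addAsBignum is borrowed from the disassembly.

```java
import java.math.BigInteger;

// Illustrative sketch; not the real org.jruby.RubyFixnum implementation.
class FixnumAdd {
    static Number add(long a, long b) {
        long result = a + b;
        // Overflow happened if both operands differ in sign from the result.
        if (((a ^ result) & (b ^ result)) < 0) {
            return addAsBignum(a, b);   // slow path, almost never taken
        }
        return result;                  // fast path, boxed as a Long
    }

    // Slow path: redo the addition with arbitrary precision.
    static BigInteger addAsBignum(long a, long b) {
        return BigInteger.valueOf(a).add(BigInteger.valueOf(b));
    }

    public static void main(String[] args) {
        System.out.println(add(1, 2));
        System.out.println(add(Long.MAX_VALUE, 1));
    }
}
```

If the profile never sees the overflow branch, leaving it as an out-of-line call keeps the hot path small, which matches the callq in the listing above.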
  • 117. LOCK Code from a RubyBasicObject’s default constructor. 11345d823: mov 0x70(%r8),%r9d ;*getstatic NULL_OBJECT_ARRAY ; - org.jruby.RubyBasicObject::<init>@5 (line 76) ; - org.jruby.RubyObject::<init>@2 (line 118) ; - org.jruby.RubyNumeric::<init>@2 (line 111) ; - org.jruby.RubyInteger::<init>@2 (line 95) ; - org.jruby.RubyFixnum::<init>@5 (line 112) ; - org.jruby.RubyFixnum::newFixnum@25 (line 173) 11345d827: mov %r9d,0x14(%rax) 11345d82b: lock addl $0x0,(%rsp) ;*putfield varTable ; - org.jruby.RubyBasicObject::<init>@8 (line 76) ; - org.jruby.RubyObject::<init>@2 (line 118) ; - org.jruby.RubyNumeric::<init>@2 (line 111) ; - org.jruby.RubyInteger::<init>@2 (line 95) ; - org.jruby.RubyFixnum::<init>@5 (line 112) ; - org.jruby.RubyFixnum::newFixnum@25 (line 173) Why are we doing a volatile write in the constructor?
  • 118. LOCK public class RubyBasicObject ... { private static final boolean DEBUG = false; private static final Object[] NULL_OBJECT_ARRAY = new Object[0]; // The class of this object protected transient RubyClass metaClass; // zeroed by jvm protected int flags; // variable table, lazily allocated as needed (if needed) private volatile Object[] varTable = NULL_OBJECT_ARRAY; Maybe it’s not such a good idea to pre-init a volatile?
  • 119. LOCK ~/projects/jruby $ git log 2f935de1e40bfd8b29b3a74eaed699e519571046 -1 | cat commit 2f935de1e40bfd8b29b3a74eaed699e519571046 Author: Charles Oliver Nutter <headius@headius.com> Date: Tue Jun 14 02:59:41 2011 -0500 Do not eagerly initialize volatile varTable field in RubyBasicObject; speeds object creation significantly. LEVEL UP!
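The idea behind that commit can be sketched with two small classes. This is not the actual JRuby change, whose diff is not shown here; the class names and the null-check accessor are assumptions made for illustration.

```java
// Before: the field initializer runs in every constructor,
// so each new object pays for a volatile store.
class EagerObject {
    private static final Object[] NULL_OBJECT_ARRAY = new Object[0];

    private volatile Object[] varTable = NULL_OBJECT_ARRAY;

    Object[] getVarTable() {
        return varTable;
    }
}

// After (hypothetical): no initializer, so the field keeps the JVM's default
// null and the constructor performs no volatile store.
class LazyObject {
    private static final Object[] NULL_OBJECT_ARRAY = new Object[0];

    private volatile Object[] varTable;

    // Readers substitute the shared empty array when nothing was ever stored.
    Object[] getVarTable() {
        Object[] table = varTable;
        return table == null ? NULL_OBJECT_ARRAY : table;
    }
}
```

The cost moves from object creation, which happens constantly, to a null check on read, which is cheap and only matters for objects that actually use the table.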
  • 120. What Have We Learned? • How Hotspot’s JIT works • How to monitor the JIT • How to find problems • How to fix problems we find
  • 121. What We Missed • Tuning GC settings in JVM • Monitoring GC with VisualVM • Google ‘visualgc’...it’s awesome
  • 122. You’re no dummy now! ;-)
  • 123. Thank you! • headius@headius.com, @headius • http://blog.headius.com • “java virtual machine specification” • “jvm opcodes”
