Tiered Compilation in the JVM
Explore C1 and C2 JIT compilers, on-stack replacement, and deoptimization handling in the JVM's tiered compilation model.
Modern JVMs do not rely on a single JIT compiler. Instead, they use a multi-tier compilation system that balances compilation speed against code quality. The JVM starts executing your code as soon as possible using an interpreter, then progressively compiles hot methods through increasingly aggressive optimization levels. This tiered approach delivers fast startup and low latency during warmup while achieving peak performance for long-running hot paths.
This post covers how tiered compilation works, the roles of the C1 and C2 compilers, how on-stack replacement enables mid-execution transitions, and what happens when the JVM must deoptimize compiled code.
Introduction
Tiered compilation is the JVM’s answer to the trade-off between fast startup and peak long-term performance. HotSpot ships with two JIT compilers, a fast but lightly optimizing C1 (client compiler) and a slower but deeply optimizing C2 (server compiler), and tiered compilation uses both over a method’s lifetime. At startup the interpreter runs code immediately; once a method becomes warm, C1 compiles it quickly, usually with profiling instrumentation. As that profile accumulates data about hot call sites, branches, and loops, methods that stay hot are recompiled by C2 with aggressive inlining, loop unrolling, and escape analysis. If a speculative assumption in C2 code later fails, the method is deoptimized back to the interpreter and can climb the tiers again with a fresher profile.
Understanding tiered compilation matters for production tuning: the code cache size (ReservedCodeCacheSize) limits how much compiled code can be held at once; the per-tier thresholds (the Tier3* and Tier4* flags) control when promotion happens; and on-stack replacement (OSR) determines whether a method that is already running can move from interpreted to compiled code mid-execution. Getting these settings wrong can lead to repeated deoptimization, premature code cache exhaustion, or sub-optimal peak throughput. The sections below walk through the compilation pipeline, the thresholds, deoptimization mechanics, and the OSR entry points that make mid-execution transitions possible.
When to Use This Knowledge
Tiered compilation knowledge helps when you need to:
- Tune JVM startup performance — Understanding thresholds helps you trade off startup time against peak performance
- Diagnose warmup latency — P99 latency spikes during warmup often involve tier transitions or deoptimization
- Configure containerized workloads — CPU limits affect compilation throughput and make tier tuning more important
- Read JITWatch logs — Visualizing compilation across tiers reveals why certain methods compile when they do
When Not to Use
Do not tune tiered compilation unless you have measured that warmup latency or compilation overhead is a genuine problem. Default settings work well for most applications. Premature optimization of tier thresholds is a distraction from writing good application code.
The Four Compilation Tiers
Tiered compilation defines four compiled levels on top of the interpreter (level 0): three C1 levels that differ in how much profiling they do, and C2 (level 4). Most hot methods take the path 0 -> 3 -> 4; the other levels cover special cases. The invocation counts shown are HotSpot defaults:
flowchart TD
INT[Tier 0\nInterpreter] -->|~200 invocations| T3[Tier 3\nC1, full profiling]
T3 -->|~5,000 invocations| T4[Tier 4\nC2, full optimization]
INT -->|C2 queue is long| T2[Tier 2\nC1, counters only]
T2 --> T3
T3 -->|trivial method| T1[Tier 1\nC1, no profiling]
T4 -.->|Deoptimization| INT
T3 -.->|Deoptimization| INT
INT -.->|OSR| T3
T3 -.->|OSR| T4
Tier 0: Interpreted Code
At launch, all bytecode runs in the JVM interpreter. In tiered mode the interpreter mainly maintains invocation and loop back-edge counters; detailed type and branch profiles are mostly collected later, by tier 3 code. Interpretation has no compilation cost, so execution begins immediately.
Tier 1: C1 without Profiling
Tier 1 is C1 output with no profiling instrumentation. The policy uses it for trivial methods (small getters and setters, for example), where C2 could not produce meaningfully better code, and for methods that C2 fails to compile. Tier 1 is a terminal state: the method is not promoted further.
Tier 2: C1 with Limited Profiling
Tier 2 is C1 code that keeps only the invocation and back-edge counters. The policy chooses it instead of tier 3 when the C2 queue is long: the method runs in cheaper code while it waits, and is recompiled at tier 3 once C2 has capacity to consume a profile.
Tier 3: C1 with Full Profiling
Tier 3 is the usual first compiled level. With default settings a method is submitted after about 200 invocations (Tier3InvocationThreshold), or once it has at least 100 invocations and its invocations plus loop back-edges reach 2,000 (Tier3MinInvocationThreshold, Tier3CompileThreshold). Tier 3 code records receiver types, branch probabilities, and call site targets, which makes it noticeably slower than tier 1 code. This profiling data guides the tier 4 C2 compiler.
Tier 4: C2 with Full Optimization
Methods that reach about 5,000 invocations in tier 3 code (Tier4InvocationThreshold), or at least 600 invocations with invocations plus back-edges reaching 15,000, are handed to the C2 server compiler, which applies aggressive optimizations based on the detailed profiling data from tier 3. C2 compilation takes longer but produces significantly faster code. The thresholds are scaled up automatically when the compile queues are long, so observed promotion points vary from run to run.
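The promotion decision can be sketched as a pair of predicates over a method's invocation and back-edge counters. This is a simplified model, not HotSpot's source: the class and method names are made up for illustration, the constants are the defaults that recent HotSpot builds report through -XX:+PrintFlagsFinal (an assumption worth checking on your JVM), and the load-based scaling of thresholds is left out.

```java
// Simplified model of tier promotion; names are hypothetical, not HotSpot APIs.
final class TierPredicate {
    // Assumed defaults; verify with: java -XX:+PrintFlagsFinal -version
    static final long TIER3_INVOCATION = 200;
    static final long TIER3_MIN_INVOCATION = 100;
    static final long TIER3_COMPILE = 2_000;
    static final long TIER4_INVOCATION = 5_000;
    static final long TIER4_MIN_INVOCATION = 600;
    static final long TIER4_COMPILE = 15_000;

    // Interpreter -> tier 3: enough calls, or a moderate number of calls plus hot loops.
    static boolean readyForTier3(long invocations, long backEdges) {
        return invocations >= TIER3_INVOCATION
            || (invocations >= TIER3_MIN_INVOCATION
                && invocations + backEdges >= TIER3_COMPILE);
    }

    // Tier 3 -> tier 4: same shape of test with the higher tier 4 thresholds.
    static boolean readyForTier4(long invocations, long backEdges) {
        return invocations >= TIER4_INVOCATION
            || (invocations >= TIER4_MIN_INVOCATION
                && invocations + backEdges >= TIER4_COMPILE);
    }

    public static void main(String[] args) {
        System.out.println(readyForTier3(150, 0));      // false: too few calls, no loops
        System.out.println(readyForTier3(150, 1_900));  // true: loops push the combined count to 2,050
        System.out.println(readyForTier4(700, 10_000)); // false: combined count is only 10,700
    }
}
```

The second clause is why a method with a hot inner loop can be compiled long before it has been called 200 times.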
C1 vs. C2: The Client and Server Compilers
The JVM historically shipped with two separate JIT compilers — the fast but simple client compiler (C1) and the slow but aggressive server compiler (C2). Modern JVMs use both through tiered compilation.
| Aspect | C1 (Client) | C2 (Server) |
|---|---|---|
| Compilation speed | Fast | Noticeably slower, especially for large methods with deep inlining |
| Optimization level | Basic | Aggressive |
| Memory usage | Lower | Higher |
| Best for | Startup, short-lived apps | Long-running servers |
| Inline depth | Shallow | Deep |
| Escape analysis | None | Yes (enables scalar replacement and lock elision) |
| Loop optimizations | Minimal | Aggressive unrolling, vectorization |
What Makes C2 Slower but Faster
C2 takes longer to compile because it performs more analysis and applies more aggressive optimizations. The tradeoff is that the resulting native code runs significantly faster. For methods that execute millions of times, the extra compilation time pays for itself many times over. For methods that execute only a few thousand times, C1’s faster compilation delivers usable code sooner and the extra C2 effort would never be recovered.
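The break-even point behind this tradeoff is simple arithmetic. The sketch below uses made-up numbers (20 ms of extra compile time, 50 ns saved per call); they are assumptions chosen for illustration, not measurements of C1 or C2, and the class name is hypothetical.

```java
// Back-of-the-envelope break-even calculation with illustrative inputs.
final class BreakEven {
    // Number of calls after which the time saved per call has repaid the extra compile time.
    static long breakEvenCalls(long extraCompileNanos, long savedNanosPerCall) {
        if (savedNanosPerCall <= 0) {
            throw new IllegalArgumentException("optimized code must save time per call");
        }
        // Ceiling division: a partially repaid call still counts as one more call.
        return (extraCompileNanos + savedNanosPerCall - 1) / savedNanosPerCall;
    }

    public static void main(String[] args) {
        long calls = breakEvenCalls(20_000_000L, 50L); // 20 ms extra compile, 50 ns saved per call
        System.out.println(calls);                     // 400000
    }
}
```

Under these assumed numbers, a method called fewer than 400,000 times never repays the heavier compilation, which is the case the lower tiers exist to handle.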
On-Stack Replacement (OSR)
On-stack replacement is the mechanism that allows the JVM to swap running code while a method is actively executing. Without OSR, a hot loop that takes hours to complete would never be compiled until it returned — and by then it would be too late.
How OSR Works
- The JVM detects a hot loop through the back-edge counter while the method runs interpreted (or at a lower tier)
- The JVM compiles a special version of the method whose entry point is the loop header rather than the method start
- At a loop back-edge, the running frame checks whether an OSR version is ready and, if so, transfers control to it
- Local variables and held locks are copied from the old frame into the layout the compiled code expects
- Execution continues in compiled code seamlessly
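A minimal way to see these steps is a method that is called once but loops for a long time. The class below is a hypothetical example, not taken from any JDK sample; run it with -XX:+PrintCompilation and look for lines whose flags column contains %, which mark OSR compilations.

```java
// A single long-running invocation: only OSR can move this loop into compiled code.
final class OsrDemo {
    static long sum(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += i; // each iteration increments the loop's back-edge counter
        }
        return total;
    }

    public static void main(String[] args) {
        // sum is invoked exactly once, so its invocation counter never gets hot;
        // the back-edge counter is what triggers compilation here.
        System.out.println(sum(100_000_000));
    }
}
```

Because the method is entered only once, any compiled code that runs for this loop must have been entered through an OSR entry point.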
OSR Trigger Conditions
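In non-tiered mode, the OSR trigger is derived from CompileThreshold and the -XX:OnStackReplacePercentage flag:
OSR trigger = CompileThreshold * OnStackReplacePercentage / 100
= 1,500 * 933 / 100 = 13,995 (classic client compiler defaults; the server compiler defaults to CompileThreshold=10000 and a percentage of 140)
With tiered compilation enabled, which is the default, these two flags are largely ignored. OSR is instead driven by per-tier back-edge thresholds: -XX:Tier3BackEdgeThreshold (default 60,000) for a C1 OSR compilation and -XX:Tier4BackEdgeThreshold (default 40,000) for a C2 OSR compilation. A loop running in the interpreter is therefore typically OSR-compiled at tier 3 first and at tier 4 later.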
Deoptimization Handling
The JVM does not guarantee that compiled code remains valid forever. Runtime behavior sometimes invalidates the assumptions that guided optimization — new classes load, type profiles change, speculation fails. When this happens, the JVM deoptimizes: it marks the compiled code as non-entrant and falls back to interpreted execution (or recompiles at a lower tier).
Causes of Deoptimization
| Cause | Description | Example |
|---|---|---|
| Monomorphic to Polymorphic | A call site sees a receiver type the compiler assumed would not occur | Loading a plugin adds a second implementation of an interface that compiled code treated as having one |
| Assumption Violated | A compiled assumption proves false | Object o = couldBeNull(); o.toString(); the first null hits an uncommon trap |
| Unloaded or Redefined Class | A class the compiled code depends on is unloaded or redefined | Class unloading, or a Java agent retransforming a class |
| Code Cache Flushing | Cold compiled code is discarded to reclaim space (eviction rather than a failed assumption) | With -XX:+UseCodeCacheFlushing, old code is removed when the cache is nearly full |
| Untaken Branch | Execution reaches a branch the compiler pruned because profiling never saw it taken | A rarely used error path that first runs after warmup |
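The first row of the table can be reproduced with a small program. The sketch below is a hypothetical example: it warms a call site with a single receiver type and then introduces a second one. Whether and when the JVM deoptimizes depends on the JDK version and flags, so treat the behavior as something to observe with -XX:+PrintCompilation rather than a guarantee.

```java
// Warm a call site with one receiver type, then introduce a second type.
final class DeoptDemo {
    interface Shape { double area(); }

    static final class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    static final class Circle implements Shape {
        final double radius;
        Circle(double radius) { this.radius = radius; }
        public double area() { return Math.PI * radius * radius; }
    }

    // The s.area() call site is what the JIT profiles and speculates on.
    static double totalArea(Shape[] shapes) {
        double total = 0;
        for (Shape s : shapes) {
            total += s.area();
        }
        return total;
    }

    public static void main(String[] args) {
        Shape[] squares = { new Square(1), new Square(2), new Square(3) };
        double sink = 0;
        for (int i = 0; i < 200_000; i++) {
            sink += totalArea(squares);   // only Square has been seen so far
        }
        Shape[] mixed = { new Square(1), new Circle(1) };
        for (int i = 0; i < 200_000; i++) {
            sink += totalArea(mixed);     // Circle appears after warmup
        }
        System.out.println(sink);         // printed so the loops are not dead code
    }
}
```

If the JIT speculated on Square during the first loop, the log typically shows totalArea being made not entrant shortly after the second loop starts, followed by a fresh compilation.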
Deoptimization Process
When the JVM deoptimizes a method:
- The compiled code is marked as “not entrant”: new calls no longer enter it
- Frames already running that code are handled according to the cause: a frame that hits an uncommon trap is converted to an interpreter frame on the spot, while frames invalidated by a dependency change are converted when control next returns to them
- The interpreter takes over for new calls
- The JVM may recompile the method from the updated profile if it is still hot
- Once no thread has an activation of the old code, it can be reclaimed (older JDKs first mark it “zombie”; recent releases removed that state)
Reading Deoptimization Events
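With -XX:+PrintCompilation enabled, each line has the columns timestamp (ms), compile ID, attribute flags, tier, method, and bytecode size. Invalidated code is reported by repeating the line with a suffix:
<timestamp> <compile_id> <flags> <tier> <method> [@ <osr_bci>] (<size> bytes) made not entrant
The “made not entrant” suffix means new calls will no longer enter that compiled version; older JDKs also print “made zombie” once no activations remain. A % in the flags column marks an OSR compilation, and @ <osr_bci> gives the bytecode index of its loop entry. PrintCompilation does not report why code was invalidated; for that, use -Xlog:deoptimization=debug on recent JDKs or the uncommon_trap elements in the -XX:+LogCompilation output.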
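For quick log triage, lines like these can be parsed with a few lines of code. The sketch below assumes the column layout that recent HotSpot builds print in tiered mode (timestamp, compile ID, optional flag characters, tier, method); the class name and the sample line are illustrative, not captured output.

```java
// Minimal parser for one -XX:+PrintCompilation line; the layout is an assumption.
final class CompilationEvent {
    final int compileId;
    final boolean osr;
    final int tier;
    final String method;
    final boolean madeNotEntrant;

    private CompilationEvent(int compileId, boolean osr, int tier,
                             String method, boolean madeNotEntrant) {
        this.compileId = compileId;
        this.osr = osr;
        this.tier = tier;
        this.method = method;
        this.madeNotEntrant = madeNotEntrant;
    }

    static CompilationEvent parse(String line) {
        String[] t = line.trim().split("\\s+");
        int i = 2; // t[0] = timestamp, t[1] = compile ID
        StringBuilder flags = new StringBuilder();
        // Flag characters (% s ! b n) may be spread over several whitespace-separated tokens.
        while (i < t.length && t[i].matches("[%s!bn]+")) {
            flags.append(t[i++]);
        }
        if (i + 1 >= t.length) {
            throw new IllegalArgumentException("Unrecognized line: " + line);
        }
        int tier = Integer.parseInt(t[i]);
        String method = t[i + 1];
        return new CompilationEvent(Integer.parseInt(t[1]), flags.indexOf("%") >= 0,
                tier, method, line.contains("made not entrant"));
    }

    public static void main(String[] args) {
        CompilationEvent e = parse(
                "    135   23 %     4       Foo::loop @ 5 (30 bytes)   made not entrant");
        System.out.println(e.method + " tier=" + e.tier + " osr=" + e.osr
                + " notEntrant=" + e.madeNotEntrant);
    }
}
```

Counting events per method with madeNotEntrant set is a cheap first check for methods that are invalidated repeatedly.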
Tiered Compilation Flags
The JVM provides many flags to tune tiered compilation behavior:
| Flag | Default | Purpose |
|---|---|---|
| -XX:+TieredCompilation | true (Java 8+) | Enable tiered compilation |
| -XX:TieredStopAtLevel | 4 | Stop at a specific tier (1=C1 without profiling, 4=C2) |
| -XX:CompileThreshold | 10000 (server compiler) | Invocation threshold in non-tiered mode; largely ignored when tiered compilation is on |
| -XX:Tier3InvocationThreshold | 200 | Invocations before tier 3 (C1 full profiling) |
| -XX:Tier4InvocationThreshold | 5000 | Invocations before tier 4 (C2) |
| -XX:Tier3MinInvocationThreshold | 100 | Minimum invocations before the combined invocation + back-edge check can trigger tier 3 |
| -XX:CompileThresholdScaling | 1.0 | Multiplies all compilation thresholds; values above 1 delay compilation |
| -XX:+UseCodeCacheFlushing | true | Flush old compiled code from cache |
Defaults vary by JDK version and platform; java -XX:+PrintFlagsFinal -version prints the values in effect.
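The values actually in effect can also be read from inside a running JVM. The sketch below uses the HotSpotDiagnosticMXBean from the com.sun.management package, which is HotSpot-specific; the class name TierFlags and the choice of flags are just for illustration, and a flag that does not exist on the running JVM is reported as unavailable.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Reads tiered-compilation flags from the running JVM (HotSpot only).
final class TierFlags {
    static String readFlag(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption(name).getValue();
        } catch (RuntimeException e) {
            // Unknown flag name, or the bean is not provided by this JVM.
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        String[] flags = {
            "TieredCompilation", "TieredStopAtLevel",
            "Tier3InvocationThreshold", "Tier4InvocationThreshold",
            "ReservedCodeCacheSize"
        };
        for (String flag : flags) {
            System.out.println(flag + " = " + readFlag(flag));
        }
    }
}
```

Logging these values at startup makes it easy to confirm that a container image or launcher script passed the flags you intended.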
Stopping at a Lower Tier
For testing or specific workloads, you can stop tiered compilation at a specific level:
# Stop at tier 3: C1 only, but still paying for profiling that C2 never uses (mostly useful for experiments)
java -XX:+TieredCompilation -XX:TieredStopAtLevel=3 -cp . MyApp
# Stop at tier 1: C1 without profiling (the usual C1-only setup: fast startup, lower peak)
java -XX:+TieredCompilation -XX:TieredStopAtLevel=1 -cp . MyApp
# Disable tiered, use interpreter + C2 only (slow startup, high peak)
java -XX:-TieredCompilation -cp . MyApp
Production Failure Scenarios
Tiered Compilation Causing P99 Latency Spikes
The promotion from tier 3 to tier 4 can cause latency spikes if many methods become hot at once during a load spike. The C2 compile queue saturates, newly hot methods keep running slower tier 2 or tier 3 code while they wait, and latency rises for requests that hit those methods. Solutions:
- Use -XX:CompileThresholdScaling to raise or lower all tier thresholds together, and measure the effect on warmup
- Increase -XX:CICompilerCount (the default is derived from the available CPU count)
- Prewarm the application by hitting known hot paths before production traffic
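Prewarming, the last item above, can be as simple as calling the hot path in a loop before the service reports itself ready. This is a minimal sketch: the Prewarm class and the stand-in hot path are hypothetical, and the iteration count is an assumption to be tuned against compilation logs.

```java
import java.util.function.IntUnaryOperator;

// Calls a hot path repeatedly so it is compiled before real traffic arrives.
final class Prewarm {
    static int warmUp(IntUnaryOperator hotPath, int iterations) {
        int sink = 0;
        for (int i = 0; i < iterations; i++) {
            sink += hotPath.applyAsInt(i);
        }
        return sink; // returned so the JIT cannot discard the calls as dead code
    }

    public static void main(String[] args) {
        // Stand-in for a real request handler; replace with representative inputs.
        int sink = warmUp(x -> Integer.bitCount(x) * 31, 20_000);
        System.out.println("warm-up done, sink=" + sink);
    }
}
```

Warm-up is only as good as its inputs: if the warm-up data exercises different types or branches than production traffic, the JIT builds a misleading profile and deoptimizes once real requests arrive.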
Deoptimization Loops
If a call site oscillates between monomorphic and megamorphic, the JVM compiles, deoptimizes, recompiles, deoptimizes — wasting CPU and causing inconsistent latency. This “heartbeat” deoptimization pattern typically occurs with classloading patterns where a plugin or dynamic proxy changes type profiles periodically.
Code Cache Pressure from Multiple Tiers
Because tiered compilation keeps multiple compiled versions of the same method (at different tiers) in the code cache simultaneously, the cache fills faster than with a single compiler. Combined with code cache flushing, this can cause unnecessary deoptimizations. Size the code cache appropriately with -XX:ReservedCodeCacheSize.
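Code cache occupancy can be sampled from inside the JVM through the standard memory pool beans. The sketch below matches pools by name, which is an assumption: HotSpot has used names such as "Code Cache", "CodeCache", and "CodeHeap '...'" depending on version and on whether the cache is segmented.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Sums the used bytes of all memory pools that look like code cache segments.
final class CodeCacheUsage {
    static boolean isCodeCachePool(MemoryPoolMXBean pool) {
        String name = pool.getName();
        return pool.getType() == MemoryType.NON_HEAP
            && (name.contains("CodeHeap") || name.contains("CodeCache")
                || name.contains("Code Cache"));
    }

    static long usedBytes() {
        long used = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is no longer valid
            if (usage != null && isCodeCachePool(pool)) {
                used += usage.getUsed();
            }
        }
        return used;
    }

    public static void main(String[] args) {
        System.out.println("code cache used: " + usedBytes() / 1024 + " KiB");
    }
}
```

Exporting this number as a metric alongside the configured ReservedCodeCacheSize shows how much headroom remains before flushing starts.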
Trade-Off Table
| Scenario | Tiered Off | Tiered On (Default) |
|---|---|---|
| Startup time | Slow (interpreted until C2 finishes) | Fast (C1 kicks in after ~200 invocations) |
| Peak performance | High (C2) | High (C2 for truly hot code) |
| Memory overhead | Lower (one version per method) | Higher (multiple versions) |
| CPU usage at startup | Lower | Higher (more compilation) |
| P99 latency during warmup | Higher | Lower with C1 |
| Steady-state CPU usage | Lower | Moderate (maintenance compilations) |
Implementation Snippets
Inspect Tier Transitions
java -XX:+PrintCompilation \
-XX:+UnlockDiagnosticVMOptions \
-XX:+LogCompilation \
-XX:LogFile=compilation.log \
-cp . MyApp
Force Methods to a Specific Tier
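HotSpot has no supported jcmd command for forcing an individual method to a given tier (the WhiteBox API used by the JDK's own tests can do it). What is supported is steering the compiler at startup with -XX:CompileCommand, for example excluding a method from compilation:
java -XX:CompileCommand=exclude,<class>::<method> -cp . MyApp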
Monitor Compilation Queue
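# Total JIT compilation time is shown in jconsole on the VM Summary tab
jconsole
# Per-method compilation events, sampled every 1000 ms
jstat -printcompilation <pid> 1000
# Methods currently waiting in the compile queues
jcmd <pid> Compiler.queue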
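The same total is available programmatically through the standard CompilationMXBean, which is convenient for exporting as a metric. This is a minimal sketch with a hypothetical class name; it returns -1 when the JVM has no JIT or does not support timing.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Reports cumulative JIT compilation time for the running JVM.
final class JitTime {
    static long totalCompilationMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1; // no JIT compiler, or timing is not supported
        }
        return jit.getTotalCompilationTime();
    }

    public static void main(String[] args) {
        System.out.println("JIT time so far: " + totalCompilationMillis() + " ms");
    }
}
```

Sampling this value periodically and plotting the difference between samples shows when compilation activity settles, which is a practical definition of "warmed up".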
Profile OSR Frequency
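With -XX:+PrintCompilation, OSR compilations are marked with % in the flags column, and @ <bci> gives the bytecode index of the loop entry. An illustrative line (timestamp, compile ID, flags, tier, method, OSR bytecode index, size):
120 12 % 3 Foo::loop @ 42 (88 bytes)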
Observability Checklist
When observing tiered compilation in production:
- Track methods per tier — How many methods are at each compilation tier?
- Monitor OSR frequency — Frequent OSR indicates startup or loop hotness issues
- Watch deoptimization rate — High deopt rate suggests unstable type profiles
- Check compilation queue depth — Saturation causes warmup latency
- Measure code cache usage — With tiered compilation, multiple versions coexist
- Compare tier transition times — When do methods move from tier 1 to tier 4?
- Review zombie code age — Stale compiled code that never gets evicted wastes cache space
Security Notes
Tiered compilation produces native code that is stored in the code cache, a region of memory that must be executable and that the JIT must be able to write to. On platforms that enforce W^X, such as macOS on Apple silicon, the JVM switches between write access and execute access instead of holding both at once. Because the JIT generates and patches machine code inside the JVM process, a JIT bug can corrupt memory or break Java's type-safety guarantees, and such vulnerabilities have historically been severe.
Keep your JVM updated. The JIT compilers are complex, and compiler bugs have been the subject of security fixes in past JDK updates. Use a supported, patched JVM version in production.
Common Pitfalls / Anti-Patterns
- Assuming C2 is always better than C1: For methods that run only a few thousand times, the C2 compilation cost can exceed the time saved by the better code. C1 is the better choice for these methods.
- Ignoring the compilation queue: The JIT has a limited number of compiler threads (controlled by -XX:CICompilerCount). If thousands of methods become hot at once (a burst load pattern), the queue backs up and methods wait to compile.
- Setting thresholds too aggressively: Lowering thresholds far below the defaults increases CPU usage at startup, compiles methods from thinner profiles, and can push more useful compiled code out of the cache.
- Misunderstanding tier promotion: Methods do not pass through every tier. The common path is tier 0 -> 3 -> 4; trivial methods stop at tier 1, and methods pass through tier 2 when the C2 queue is long. Check actual tier paths in JIT logs.
- Treating OSR code as steady state: OSR carries local variables and locks into the compiled frame, but the OSR version is compiled for entry at one particular loop, so it can be optimized differently from the normal compilation of the same method. A single long loop in main is a poor predictor of how the method performs when called normally.
Quick Recap Checklist
- Tiered compilation uses multiple levels: interpreter (tier 0) -> C1 (tiers 1-3) -> C2 (tier 4); the common path is 0 -> 3 -> 4
- C1 compiles fast with low optimization; C2 compiles slow with aggressive optimization
- Tier 3 (C1 with profiling) kicks in at ~200 invocations for fast startup
- Tier 4 (C2) kicks in at ~5,000 invocations for peak performance
- On-stack replacement (OSR) swaps running code mid-execution
- Deoptimization marks compiled code invalid and falls back to interpreter
- Several compiled versions of a method can coexist in the code cache (higher memory usage)
- -XX:+TieredCompilation is on by default in Java 8+
- -XX:TieredStopAtLevel=N stops tiering at a specific level
- Tiered compilation spends extra compilation work and code cache space to get both fast warmup and high peak performance
Interview Questions
C2 produces significantly better optimized code, but it takes much longer to compile. For a method that runs millions of times, the extra compilation time pays off. For a method that runs a few thousand times and is never called again, it is wasted: the compilation can cost more than all the time the method ever spends executing. Tiered compilation solves this by using C1 to get native code quickly during warmup, then promoting only the hottest methods to C2 for peak performance. This delivers both fast startup and high throughput without forcing you to choose between them.
"Made not entrant" marks compiled code as inactive: new calls to that method will no longer use this compiled version. This happens when a method is deoptimized (an assumption was violated) or when a newer compiled version replaces it, for example tier 3 code after the tier 4 version is installed. Activations of the old code that are still valid may finish running in it. "Made zombie", printed by older JDKs, means no activations remain and the code can be reclaimed. Deoptimization itself, where an active compiled frame is converted to an interpreter frame mid-execution, is not reported by PrintCompilation; it shows up in deoptimization logging or the LogCompilation output. Some deoptimization during warmup is normal; the same method being invalidated repeatedly indicates a compiled path whose assumptions keep failing.
The JVM keeps invocation and back-edge counters for each method, and tier 3 code keeps updating them. When the invocation count reaches the tier 4 threshold (default 5,000, controlled by -XX:Tier4InvocationThreshold), or the combined invocation and back-edge count reaches -XX:Tier4CompileThreshold, the JVM queues the method for C2 compilation. The profiling data collected at tier 3 guides C2's optimization: type profiles, branch probabilities, and receiver histograms help C2 make better inlining and specialization decisions. If the method stops being called while it waits, the task can be dropped from the queue as stale, and the method may never reach C2.
Deoptimization occurs when compiled code's assumptions are violated at runtime. Common causes include a call site seeing a receiver type the compiler assumed would not occur, a newly loaded class invalidating a class hierarchy assumption, an assumption about nullness or array bounds proving false, and execution reaching a branch the compiler pruned as never taken. When the failure is likely to recur, the JVM marks the compiled code as not entrant, the affected frames continue in the interpreter, and new calls fall back to the interpreter (or a less-optimized compiled version). The JVM may then recompile from the updated profile if the code is still hot. A frame that hits an uncommon trap is converted by its own thread, whereas invalidation caused by class loading or redefinition has to be coordinated with the other threads, which can briefly pause them.
On-stack replacement (OSR) lets the JVM replace the code of a method while an invocation is still running, not just when a new invocation begins. This matters because normal compilation only affects future calls: a method that is called once and then loops for hours would stay in the interpreter for its whole run, no matter how hot the loop became. With OSR, the JVM detects the hot loop through its back-edge counter, compiles a version of the method that can be entered at the loop, and transfers control to it at a loop back-edge. The transfer itself is cheap, and the remaining iterations run as compiled code.
Further Reading
Compilation Queue Prioritization
Compile requests are queued separately for C1 and C2, and they are not served in FIFO order. When a compiler thread becomes free, the tiered policy picks the queued method with the highest event rate, meaning invocations plus loop back-edges per unit of time, so the methods that are hottest right now are compiled first. Tasks for methods that have stopped being called are dropped as stale instead of being compiled. In burst-load scenarios where thousands of methods become hot simultaneously, the queue can still saturate and methods wait, which is why prewarming your application before production traffic is a recognized optimization for tiered compilation workloads.
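The selection rule HotSpot's tiered policy applies, picking the queued method with the highest recent event rate, can be modeled in a few lines. This is a simplified sketch with hypothetical names (QueueSelection, Task, eventsPerMs); the real policy also removes stale tasks and applies further tie-breaking that is not shown here.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Simplified model: the next method to compile is the one with the highest event rate.
final class QueueSelection {
    static final class Task {
        final String method;
        final double eventsPerMs; // invocations + back-edges per millisecond

        Task(String method, double eventsPerMs) {
            this.method = method;
            this.eventsPerMs = eventsPerMs;
        }
    }

    static Optional<Task> selectNext(List<Task> queue) {
        return queue.stream().max(Comparator.comparingDouble((Task t) -> t.eventsPerMs));
    }

    public static void main(String[] args) {
        List<Task> queue = Arrays.asList(
                new Task("Parser::next", 1.5),
                new Task("Codec::encode", 9.0),
                new Task("Cache::get", 4.0));
        selectNext(queue).ifPresent(t -> System.out.println(t.method)); // Codec::encode
    }
}
```

Selecting by rate rather than arrival order is what lets a newly hot method overtake requests that have been queued for longer.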
C1 and C2 have separate compile queues. When picking the next method to compile, the policy does not estimate compilation cost or speedup; it selects the queued method with the highest recent rate of invocations and back-edges and discards tasks that have gone stale. The thresholds are also scaled by queue length, so a long queue makes further promotions harder to trigger. This keeps the queue from filling with cold methods during a burst, but under sustained high load the queue can still back up when compilation throughput cannot keep up with the rate of methods becoming hot.
OSR and deoptimization are separate mechanisms that interact at transition points. OSR replaces running interpreted or lower-tier compiled code with higher-tier compiled code mid-execution. Deoptimization replaces compiled code that has become invalid with lower-tier or interpreted code. An OSR transition can trigger deoptimization if the OSR compilation itself makes assumptions that are later violated — for example, if an OSR-compiled loop later receives a type at a call site that it did not anticipate. Conversely, deoptimization can trigger OSR — when a method deoptimizes, if it is still hot, the JVM queues it for recompilation at a lower tier, and that recompilation can use OSR to re-enter compiled code at a later loop back-edge. The two mechanisms are designed to work together to keep hot code compiled at the appropriate tier despite changing runtime conditions.
Tiered compilation can keep several compiled versions of the same method in the code cache at once: a tier 3 version that is still running while the tier 4 version is being compiled, one or more OSR versions for individual loops, and not-entrant code that still has activations on some thread's stack. Tier 3 code is also larger than unprofiled code because of its instrumentation. With a single compiler, fewer versions exist per method. When many methods are being compiled at multiple tiers, the code cache fills faster. The JVM manages this through code cache flushing (-XX:+UseCodeCacheFlushing), which discards cold compiled methods when the cache gets close to full. However, a flushed method that becomes hot again runs interpreted until it is recompiled, which shows up as a performance dip.
Tier 1 is C1 code with no profiling; it is used for trivial methods, where nothing would be gained from a profile, and it is a terminal state. Tier 2 is C1 code with only invocation and back-edge counters; the policy uses it while the C2 queue is long so that a method does not sit in slow, fully instrumented code while waiting. Tier 3 is C1 code with full profiling, entered by default after about 200 invocations; it collects type profiles, receiver type histograms, and branch data, the same data C2 will use for aggressive optimization. Each level balances compilation time and instrumentation overhead against code quality. The usual chain 0 -> 3 -> 4 means the JVM invests compilation time progressively as a method proves its importance through sustained hotness.
C2 compilation takes time and the queue may be backed up, so a method can lose its hotness while it waits. The policy tracks how recently each queued method was active and removes tasks that have gone stale, so the method simply keeps running its existing tier 2 or tier 3 code (or the interpreter). With default settings the tier 4 threshold (about 5,000 invocations, scaled up when the queue is long) is high enough that most methods reaching C2 are genuinely hot. If a method stops being hot after a deoptimization, it is not recompiled by C2 and stays at a lower tier or runs interpreted. This prevents spending expensive C2 cycles on code that is unlikely to recover its compilation cost.
HotSpot's tiered policy is not selected with a simple, throughput, or latency switch; to confirm which flags a given JVM accepts, check the output of java -XX:+PrintFlagsFinal -version. The supported ways to shift the balance are the threshold flags and the stop level. -XX:CompileThresholdScaling multiplies every threshold, so values below 1 compile sooner (faster warmup, more compilation CPU and code cache use) and values above 1 compile later. -XX:TieredStopAtLevel=1 gives up C2 entirely, which suits short-lived processes where fast response matters more than peak throughput. Long-running batch or server workloads that warm up fully are usually best left on the defaults.
OSR compilation must produce code that can be entered at an arbitrary point mid-method, not just at the method entry. The JIT must take the local variable values and held locks at the OSR trigger point (a loop back-edge) and construct a compiled version that can resume from that exact state. This requires special handling in the compiled code: an OSR entry point that receives the interpreter's state and transforms it into the compiled stack layout. Regular compilation starts at the method entry, where only the arguments are live. The extra constraint is why OSR-compiled code can be less optimized than regularly compiled code: the JIT has less freedom to reorder and restructure around the entry loop.
When a C2-compiled method is deoptimized (assumptions violated, class unloaded, type profile changed), the JVM marks the compiled code as non-entrant. New calls fall back to the interpreter. If the method is still hot (invocation counter still above the tier 3 threshold), the JVM queues it for recompilation — typically back to tier 3, where the profiling data collected before the deoptimization can be reused. If the method is less hot but still relevant, it may be recompiled at tier 2 or even tier 1 with more conservative assumptions. The key point is that C2 deoptimization is not the end — the method gets another chance at a lower optimization level with updated profile data. Each deoptimization cycle teaches the JIT more about the method's actual runtime behavior, often leading to a more stable compiled version on subsequent compilations.
In tiered compilation, lower thresholds cause more methods to be compiled, and compiled sooner, so the code cache fills faster than in non-tiered mode. A hot method typically occupies space for its tier 3 version and then its tier 4 version, plus any OSR versions, until the superseded code is reclaimed. For this reason HotSpot's default -XX:ReservedCodeCacheSize is 240 MB when tiered compilation is enabled, compared with 48 MB without it; large applications that still report a full code cache may need more. Lowering thresholds aggressively (for example with -XX:CompileThresholdScaling below 1) adds to the pressure because more marginal methods get compiled. The tradeoff is between warmup speed (lower thresholds) and code cache pressure (more compiled code resident at once).
Tiered compilation reduces P99 latency during warmup compared to non-tiered because C1 kicks in after roughly 200 invocations and compiles quickly, instead of methods staying interpreted until C2 finishes. Early requests therefore run on compiled code, even if it is not fully optimized. Without tiered compilation, requests during warmup run interpreted code for much longer, causing high P99. Tiered compilation has its own warmup costs: tier 3 code carries profiling overhead, and if many methods become hot simultaneously (during a load spike) the C2 queue saturates and requests keep running the slower tier 2 or tier 3 code until C2 catches up. Promotion itself does not send a method back to the interpreter; the old version keeps running until the new one is installed. Deoptimizations after C2 compilation can still cause brief dips.
A common worry is that performance regresses while a method waits for C2. In HotSpot the tier 3 version is not invalidated when the method is queued for C2; it keeps running until the C2 code is installed. The cost of a slow or backed-up C2 compile is therefore extra time spent in tier 3 code, which is slower than unprofiled code because of its instrumentation. The policy limits this by compiling at tier 2 instead of tier 3 when the C2 queue is long. Very large methods are a separate case: by default HotSpot does not compile methods above a bytecode size limit (-XX:-DontCompileHugeMethods lifts it), so they stay interpreted. -XX:TieredStopAtLevel applies to the whole JVM, not to individual methods; setting it to 1 avoids C2 entirely at the cost of peak performance.
In non-tiered mode the regular compilation threshold is CompileThreshold (10,000 on the server compiler, 1,500 on the client compiler), and OSR has its own trigger: CompileThreshold * OnStackReplacePercentage / 100. With the classic client defaults that is 1,500 * 933 / 100 = 13,995; the server compiler uses a percentage of 140. The OSR trigger is compared against back-edge counters (loop iterations) rather than method invocations. With tiered compilation enabled these flags are largely ignored and OSR is driven by -XX:Tier3BackEdgeThreshold (default 60,000) and -XX:Tier4BackEdgeThreshold (default 40,000). Either way the trigger is well above the invocation threshold, so OSR targets long-running loops (which benefit most from compilation) rather than short loops that would finish before compilation could help.
CPU limits affect JIT compilation throughput because compilation requires CPU cycles. When the container has limited CPUs, the compiler threads (controlled by -XX:CICompilerCount, whose default is derived from the CPU count the JVM detects) compete with application threads for CPU time. Under CPU throttling, the compilation queue backs up, warmup takes longer, and tier promotion slows down. Adding compiler threads does not add CPU, so raising -XX:CICompilerCount helps only when cores are actually idle. Options that reduce compilation work are more reliable: -XX:TieredStopAtLevel=1 keeps methods at C1 and avoids C2 entirely at the cost of peak performance, and prewarming before taking traffic moves the cost out of the request path.
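A quick check of what the JVM believes about its CPU budget is often the first debugging step in a container. The sketch below uses only Runtime.availableProcessors(), which container-aware JDKs report according to the container's CPU limit on Linux; the class name is hypothetical.

```java
// Prints the CPU count the JVM detected, which also feeds compiler thread sizing.
final class CompilerCpuBudget {
    static int visibleCpus() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("CPUs visible to the JVM: " + visibleCpus());
    }
}
```

If this prints the host's core count inside a container with a much smaller CPU limit, the JVM will size its compiler threads for hardware it cannot actually use.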
The interpreter is where tier promotion starts. While executing bytecode, it increments invocation and back-edge counters, and those counters decide when a method is first compiled. Detailed profiling (branch taken/not-taken counts, receiver types at virtual and interface call sites) is collected mainly by tier 3 code and stored in a per-method profile. When the method is compiled by C2, this data guides optimization decisions: which branches to prioritize, which call sites to inline, which types to assume. The profile survives tier transitions, and after a deoptimization it keeps being updated, so the next compilation sees what went wrong. The interpreter is not just a slow fallback; it is part of the feedback loop that enables intelligent compilation decisions.
-XX:TieredStopAtLevel=N stops tiered promotion at level N: 1 stops at C1 without profiling, 2 at C1 with counters only, 3 at C1 with full profiling, and 4 allows C2. Setting TieredStopAtLevel=1 gives the lowest compilation overhead: methods are compiled once by C1 and never promoted. The resulting code is less optimized than C2 output but is produced quickly. TieredStopAtLevel=3 is rarely useful outside experiments, because the code pays for profiling that no C2 compilation ever consumes. The tradeoff in both cases is peak performance: the application never reaches the optimization level that C2 provides. For long-running services and batch jobs that warm up fully, the default (level 4) gives peak performance. For short-lived processes or serverless functions, stopping at tier 1 often reaches its best performance soonest.
Conclusion
You now understand the four-tier compilation model from interpreted code through C1 to C2, and how OSR enables mid-execution transitions for long-running hot loops. Use this knowledge to configure tiered compilation for your workload profile — C1-first for fast startup, C2 for sustained throughput. Read GraalVM Native Image to explore the opposite end of the compilation spectrum: ahead-of-time compilation that eliminates JIT warmup entirely.