ZGC and Shenandoah: Ultra-Low Latency Garbage Collectors
How ZGC and Shenandoah achieve sub-millisecond pause times through concurrent operations and load barriers, without stopping your application.
G1 made GC pauses more manageable, but it still stops your application for certain phases. ZGC and Shenandoah take a fundamentally different approach: they aim for pauses that are effectively unmeasurable - measured in microseconds rather than milliseconds. They do this by doing almost all GC work concurrently with your application, including the compaction phase.
This post covers how ZGC and Shenandoah work, how they differ from G1, and when the trade-offs are worth it.
Introduction
ZGC and Shenandoah are the JVM’s answer to applications that cannot tolerate long GC pauses. They trade some throughput for the ability to keep pause times consistently below one millisecond, even under heavy GC load. G1 improved on older collectors by parallelizing pauses, but it still stops the world for young-generation and mixed collections. ZGC and Shenandoah instead perform nearly all GC work concurrently with the application threads, including the compaction phase that traditionally requires a stop-the-world pause.
The key enabling technology in both collectors is the load barrier: a small check injected wherever the application loads an object reference from the heap, so that the collector and the application agree on where each object lives even while mutator threads are actively modifying the heap. This allows the collector to move objects (evacuate them) without stopping the application. ZGC achieves this with colored pointers (plus an optional generational mode from Java 21), while Shenandoah uses Brooks-style forwarding pointers and a single heap view. This post covers how both work, their performance characteristics, and when the trade-offs justify choosing one over G1 or a parallel collector.
When to Use This Knowledge
Use when:
- Your latency SLA is measured in milliseconds or microseconds
- Your application cannot tolerate GC pauses at all (trading systems, real-time gaming, control systems)
- You have very large heaps (100GB+) and need consistent performance
- You are running on Java 11+ and need the lowest possible pause times
Do not use when:
- Throughput is your primary metric (batch workloads, ETL)
- You are on Java 8 without a JDK build that includes the Shenandoah backport (ZGC requires Java 11+)
- Your latency SLA is in the hundreds of milliseconds and G1 already meets it
- You have strict memory constraints and cannot afford the overhead
When NOT to Use
ZGC and Shenandoah are overkill in several common scenarios. The CPU and memory overhead only makes sense when sub-millisecond pauses actually matter.
For short-lived serverless functions, skip them. A Lambda that runs for 500ms does not need sub-millisecond pauses. G1 pauses in that environment are measured in milliseconds at most, and the 5-15% CPU overhead would hurt throughput for no gain. Cold start times for these collectors also work against you in serverless.
Memory-constrained containers without hard pause SLAs also do not benefit. Running in a container with 512MB or 1GB heap when your SLA is measured in seconds? The load barrier overhead is pure waste. Even G1 is probably overkill here. Parallel GC or Serial GC with correct heap sizing often performs better when latency is not the concern.
If your application does not have a latency problem, ZGC and Shenandoah will not help. GC logs showing G1 pauses averaging 50-100ms when your SLA is 500ms means you already have headroom. Profile first. Do not switch collectors to solve a problem that does not exist.
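The headroom check described above can be sketched in a few lines of Java. This is a minimal sketch, assuming you have already extracted pause durations (in milliseconds) from your GC logs; the class name, method names, and sample values are hypothetical, and the nearest-rank percentile is just one common definition.

```java
import java.util.Arrays;

/** Illustrative helper: compares a pause-time percentile against a latency SLA. */
final class PauseHeadroomSketch {
    /** Nearest-rank percentile of a non-empty sample array; p is in (0, 100]. */
    static double percentile(double[] samplesMs, double p) {
        double[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** True when the 99th percentile pause is already below the SLA. */
    static boolean hasHeadroom(double[] samplesMs, double slaMs) {
        return percentile(samplesMs, 99.0) < slaMs;
    }

    public static void main(String[] args) {
        double[] g1PausesMs = {52, 61, 70, 84, 98}; // made-up sample values
        System.out.println("p99 pause (ms): " + percentile(g1PausesMs, 99.0));
        System.out.println("Headroom against a 500 ms SLA: " + hasHeadroom(g1PausesMs, 500));
    }
}
```

If the check reports headroom, switching collectors is unlikely to buy anything; the percentile to compare against should match whatever your SLA is actually defined on.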
The Load Barrier Approach
Both ZGC and Shenandoah use a technique called a load barrier (also called a read barrier) to maintain correctness without stopping the world. The basic idea is that every load of an object reference from the heap - every time your code reads a reference-typed field or array element - goes through a small check. This check ensures the reference points at the object's current location, even if the collector is moving that object at the same moment.
graph LR
A["Application\nThread"] --> B["Heap Read"]
B --> C{"Load Barrier\nCheck"}
C -->|"Object OK"| D["Return\nReference"]
C -->|"Object being\nmoved"| E["Fix pointer\n atomically"]
E --> D
The load barrier is extremely fast - a handful of instructions. It adds some CPU overhead but eliminates the need for stop-the-world pauses for most GC work.
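To make the fast path and slow path in the diagram concrete, here is a toy model in Java. It is a sketch under heavy assumptions, not how HotSpot implements barriers: references are plain integers, the forwarding table is a hypothetical `HashMap`, and the "fix pointer atomically" step is modeled with `AtomicInteger.compareAndSet`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of a load barrier with self-healing; illustrative only. */
final class LoadBarrierSketch {
    /** A heap slot holding a reference (here: a fake integer address). */
    static final class Slot {
        final AtomicInteger ref;
        Slot(int ref) { this.ref = new AtomicInteger(ref); }
    }

    // Hypothetical forwarding table: old address -> new address of moved objects.
    private final Map<Integer, Integer> forwarding = new HashMap<>();

    /** Called by the "collector" when it moves an object. */
    void relocate(int from, int to) {
        forwarding.put(from, to);
    }

    /** The barrier: runs on every reference load from a slot. */
    int load(Slot slot) {
        int seen = slot.ref.get();
        Integer moved = forwarding.get(seen);
        if (moved == null) {
            return seen;                       // fast path: object has not moved
        }
        slot.ref.compareAndSet(seen, moved);   // slow path: heal the slot atomically
        return moved;                          // caller only ever sees the new address
    }

    public static void main(String[] args) {
        LoadBarrierSketch heap = new LoadBarrierSketch();
        Slot field = new Slot(100);
        heap.relocate(100, 200);
        System.out.println("Loaded reference: " + heap.load(field));
        System.out.println("Slot after healing: " + field.ref.get());
    }
}
```

Healing the slot means the slow path is paid at most once per stale reference; later loads from the same slot take the fast path again.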
ZGC (Z Garbage Collector)
ZGC was developed by Oracle and introduced in Java 11 as an experimental feature, becoming production-ready in Java 15. It is designed for very large heaps and very low latency.
How ZGC Works
ZGC divides the heap into regions called pages (not to be confused with OS pages). These are different from G1 regions - ZGC uses three size classes: small (2MB), medium (32MB), and large (2MB multiples, up to 16TB on 64-bit systems).
graph TB
subgraph ZGCHeap["ZGC Heap - Colored Pointers"]
Z1["Small Page\n(2MB)"]
Z2["Medium Page\n(32MB)"]
Z3["Large Page\n(N * 2MB)"]
end
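A small sketch can show how an allocation might be routed to one of the three page classes. The page sizes come from the text above; the object-size cutoffs (256KB for small pages, 4MB for medium pages) are assumptions based on commonly documented ZGC defaults and may differ between JDK versions, and all names here are hypothetical.

```java
/** Illustrative mapping from object size to ZGC page class; thresholds are assumptions. */
final class PageClassSketch {
    enum PageClass { SMALL, MEDIUM, LARGE }

    static final long KB = 1024;
    static final long MB = 1024 * KB;
    static final long SMALL_OBJECT_LIMIT = 256 * KB; // assumed cutoff for small pages
    static final long MEDIUM_OBJECT_LIMIT = 4 * MB;  // assumed cutoff for medium pages
    static final long GRANULE = 2 * MB;              // large pages are multiples of 2MB

    static PageClass classify(long objectBytes) {
        if (objectBytes <= SMALL_OBJECT_LIMIT) return PageClass.SMALL;
        if (objectBytes <= MEDIUM_OBJECT_LIMIT) return PageClass.MEDIUM;
        return PageClass.LARGE;
    }

    /** Size of the large page needed for one object: rounded up to a 2MB multiple. */
    static long largePageBytes(long objectBytes) {
        return ((objectBytes + GRANULE - 1) / GRANULE) * GRANULE;
    }

    public static void main(String[] args) {
        System.out.println(classify(100 * KB));
        System.out.println(classify(1 * MB));
        System.out.println(classify(5 * MB) + " page of " + largePageBytes(5 * MB) / MB + "MB");
    }
}
```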
ZGC uses colored pointers. A pointer to an object encodes information about the object: whether it is marked live, whether it is in a remapping set, and more. When you read a pointer, the load barrier checks the colors and handles any necessary fixups on the fly.
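The idea of carrying GC state in the pointer itself can be illustrated with plain bit manipulation. The layout below is a simplified assumption (44 address bits with three metadata bits directly above them); the real bit positions depend on the JDK version and on whether generational mode is used, and the class and method names are hypothetical.

```java
/** Toy colored-pointer encoding; the bit layout is a simplification, not ZGC's exact one. */
final class ColoredPointerSketch {
    static final int ADDRESS_BITS = 44;
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;

    // Assumed metadata bits stored just above the address bits.
    static final long MARKED0 = 1L << ADDRESS_BITS;
    static final long MARKED1 = 1L << (ADDRESS_BITS + 1);
    static final long REMAPPED = 1L << (ADDRESS_BITS + 2);
    static final long COLOR_MASK = MARKED0 | MARKED1 | REMAPPED;

    /** Combine an address with a color. */
    static long color(long address, long color) {
        return (address & ADDRESS_MASK) | color;
    }

    /** Strip the color to recover the address. */
    static long address(long pointer) {
        return pointer & ADDRESS_MASK;
    }

    /** The barrier's fast-path test: does the pointer carry the currently "good" color? */
    static boolean isGood(long pointer, long goodColor) {
        return (pointer & COLOR_MASK) == goodColor;
    }

    public static void main(String[] args) {
        long p = color(0x1234L, REMAPPED);
        System.out.println("address = 0x" + Long.toHexString(address(p)));
        System.out.println("good while REMAPPED is current: " + isGood(p, REMAPPED));
        System.out.println("good once MARKED0 is current: " + isGood(p, MARKED0));
    }
}
```

When the collector enters a new phase it changes which color counts as good, so every pointer that has not been touched since then fails the test once and is fixed up by the barrier's slow path.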
ZGC Phases
ZGC has three stop-the-world pauses per cycle, all very short:
- Pause Mark Start (Stop-The-World, microseconds): Briefly pauses to start marking from the roots.
- Pause Mark End (Stop-The-World, microseconds): Briefly pauses to finalize the marking phase.
- Pause Relocate Start (Stop-The-World, microseconds): Briefly pauses to start relocation once the pages to evacuate have been selected.
Everything else - concurrent marking, concurrent relocate, concurrent remap - runs while your application runs.
graph TB
A["Pause Mark Start\n(microseconds)"] --> B["Concurrent Mark\n(app running)"]
B --> C["Pause Mark End\n(microseconds)"]
C --> D["Pause Relocate Start\n(microseconds)"]
D --> E["Concurrent Relocate\n(app running)"]
E --> F["Concurrent Remap\n(app running)"]
F --> A
ZGC Key Characteristics
| Aspect | Behavior |
|---|---|
| Pause times | Sub-millisecond (typically under 1ms) |
| Throughput impact | Moderate - load barrier adds 5-10% CPU overhead |
| Heap scalability | Excellent - supports heaps up to 16TB |
| Compaction | Yes - concurrent relocation |
| Java version | Java 11+ (production in Java 15+) |
| NUMA awareness | Yes |
| Single-heap mode | Yes |
ZGC JVM Flags
-XX:+UseZGC
-Xmx64g -Xms64g # Works well with large heaps
-XX:ZCollectionInterval=120 # Set GC interval target (seconds)
-XX:+ZProactive # Enable proactive GC (default on)
Shenandoah
Shenandoah was developed by Red Hat and open-sourced before being adopted in OpenJDK. Unlike ZGC, Shenandoah is designed to work with a wide range of heap sizes, not just very large ones.
How Shenandoah Works
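Shenandoah uses a Brooks pointer - an extra level of indirection. In the original design, every object carries an extra word that points to its current location: to itself while it has not moved, and to the new copy once it has been evacuated. When Shenandoah moves an object, it updates the forwarding pointer in the old copy, so the old location points to the new one. The barrier checks the forwarding pointer and follows the redirect if needed. Since JDK 13, Shenandoah stores the forwarding pointer in the object header's mark word rather than in a dedicated word, but the indirection idea is the same.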
graph TB
subgraph ShenandoahObject["Object with Brooks Pointer"]
A["Old Location\n(Forwarding Pointer)"] -->|"redirect"| B["New Location\n(Actual Object)"]
end
This means Shenandoah can move objects without immediately updating references in other objects or thread stacks - the Brooks pointer handles the indirection until those references are fixed up in the later Update Refs phase. This is fundamentally different from how G1 or ZGC handle relocation.
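The forwarding-pointer idea can be modeled with ordinary Java objects. This is only a sketch of the classic Brooks scheme under simplifying assumptions: it is single-threaded, the `Obj` class and its `forwardee` field are hypothetical stand-ins for the extra word, and real evacuation installs the forwarding pointer with an atomic compare-and-swap.

```java
/** Toy model of Brooks-style forwarding pointers; illustrative and single-threaded. */
final class BrooksPointerSketch {
    static final class Obj {
        Obj forwardee = this; // points to itself until the object is moved
        int value;
        Obj(int value) { this.value = value; }
    }

    /** Every access first resolves the reference through the forwarding pointer. */
    static Obj resolve(Obj ref) {
        return ref.forwardee;
    }

    /** The "collector" copies the object and redirects the old copy to the new one. */
    static Obj evacuate(Obj old) {
        Obj copy = new Obj(old.value);
        old.forwardee = copy;
        return copy;
    }

    static int read(Obj ref) { return resolve(ref).value; }

    static void write(Obj ref, int v) { resolve(ref).value = v; }

    public static void main(String[] args) {
        Obj original = new Obj(1);
        Obj stale = original;          // a reference nobody has updated yet
        Obj copy = evacuate(original);
        write(stale, 42);              // lands in the new copy through the redirect
        System.out.println("read via stale reference: " + read(stale));
        System.out.println("read via new reference: " + read(copy));
    }
}
```

Because reads and writes through the stale reference both resolve to the new copy, the application never observes two diverging versions of the object.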
Shenandoah Phases
- Init Mark (Stop-The-World, short): Brief pause to start marking from roots.
- Concurrent Marking (Concurrent): Marks live objects across the heap while your application runs.
- Final Mark (Stop-The-World, short): Finalizes marking and prepares for evacuation.
- Concurrent Evacuation (Concurrent): Moves live objects to new locations using Brooks pointer indirection while your application runs.
- Concurrent Update Refs (Concurrent): Updates references to moved objects across the heap while your application runs.
graph TB
A["Init Mark\n(STW - short)"] --> B["Concurrent Mark\n(app running)"]
B --> C["Final Mark\n(STW - short)"]
C --> D["Concurrent Evacuate\n(app running)"]
D --> E["Concurrent Update Refs\n(app running)"]
E --> A
Shenandoah Key Characteristics
| Aspect | Behavior |
|---|---|
| Pause times | Sub-millisecond (typically under 1ms) |
| Throughput impact | Moderate - load barrier + Brooks pointer adds overhead |
| Heap scalability | Good - works well from small to large heaps |
| Compaction | Yes - concurrent evacuation |
| Java version | Java 12+ upstream (experimental; production in Java 15+); Java 8 and 11 via backport in some JDK builds |
| NUMA awareness | Limited |
| Heap efficiency | Lower overhead than ZGC on smaller heaps |
Shenandoah JVM Flags
-XX:+UseShenandoahGC
-Xmx32g -Xms32g
-XX:ShenandoahGCHeuristics=adaptive # Adaptive heuristics (default)
-XX:ShenandoahGCHeuristics=static # Static heuristics
-XX:ShenandoahGCHeuristics=compact # Run GC more aggressively
ZGC vs Shenandoah
| Aspect | ZGC | Shenandoah |
|---|---|---|
| Developer | Oracle | Red Hat / OpenJDK |
| Production-ready since | Java 15 | Java 15 (experimental since Java 12) |
| Pause times | Sub-ms (typically 0.5-1ms) | Sub-ms (typically 0.5-1ms) |
| Heap size | Best for very large heaps (16GB+) | Works across heap sizes |
| Throughput impact | Slightly lower overhead | Slightly higher overhead |
| NUMA awareness | Full | Limited |
| Compact on each GC | Yes | Yes |
| Brooks pointer | No | Yes |
Production Failure Scenarios
1. ZGC Allocation Stall
Symptom: Application stalls while ZGC waits for available memory.
Cause: ZGC cycles are not completing fast enough to keep up with allocation rate. Usually means you need more heap or a lower allocation rate.
Solution:
# Increase heap
-Xmx128g -Xms128g
# Reduce GC interval target
-XX:ZCollectionInterval=60
# Enable proactive GC
-XX:+ZProactive
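A rough back-of-the-envelope check helps decide whether more heap is the right fix. The sketch below assumes a deliberately simple model: the free part of the heap is consumed at a constant allocation rate, and no memory is reclaimed until the current cycle finishes. The class name, method names, and numbers are hypothetical; take real inputs from your GC logs.

```java
/** Illustrative headroom arithmetic for allocation stalls; a simplified model. */
final class AllocationHeadroomSketch {
    /** Seconds until the free part of the heap is consumed at a constant allocation rate. */
    static double secondsUntilFull(double heapGb, double liveGb, double allocGbPerSec) {
        return (heapGb - liveGb) / allocGbPerSec;
    }

    /** A stall is likely if the heap fills up before a GC cycle can finish. */
    static boolean stallLikely(double heapGb, double liveGb,
                               double allocGbPerSec, double cycleSeconds) {
        return secondsUntilFull(heapGb, liveGb, allocGbPerSec) < cycleSeconds;
    }

    public static void main(String[] args) {
        // Made-up numbers: 40GB live set, 2GB/s allocation rate, 15s per GC cycle.
        System.out.println("64GB heap, stall likely: " + stallLikely(64, 40, 2, 15));
        System.out.println("128GB heap, stall likely: " + stallLikely(128, 40, 2, 15));
    }
}
```

With these sample inputs the 64GB heap fills in 12 seconds, which is shorter than the assumed 15-second cycle, while the 128GB heap leaves 44 seconds of headroom.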
2. Shenandoah Heuristics Mismatch
Symptom: GC runs too frequently or not frequently enough.
Cause: The chosen heuristics does not match your workload. adaptive works well for most, but static or compact may be better for specific workloads.
Solution:
# Try different heuristics
-XX:ShenandoahGCHeuristics=adaptive
-XX:ShenandoahGCHeuristics=static
-XX:ShenandoahGCHeuristics=compact
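# Or set explicit free-heap thresholds (tuning flag names and availability vary by JDK build; verify with -XX:+PrintFlagsFinal)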
-XX:ShenandoahMinFreeThreshold=10
-XX:ShenandoahMaxFreeThreshold=30
3. ZGC Large Heap + NUMA Issues
Symptom: Performance drops on NUMA systems with very large heaps.
Cause: ZGC is NUMA-aware but may not perfectly balance allocations across nodes on startup.
Solution: Bind the JVM to specific NUMA nodes or use -XX:+UseNUMA to let ZGC handle it automatically.
Trade-off Table
| Configuration | Benefit | Trade-off |
|---|---|---|
| -XX:+UseZGC | Sub-ms pauses, scales to 16TB | Requires Java 11+, moderate CPU overhead |
| -XX:+UseShenandoahGC | Sub-ms pauses, works at any heap size | Brooks pointer overhead, slightly higher than ZGC |
| -XX:ZCollectionInterval=60 | Control GC frequency | May increase memory usage |
| -XX:+ZProactive | Proactive GC to prevent stalls | More GC cycles overall |
| -XX:ShenandoahGCHeuristics=adaptive | Auto-tune based on workload | Default for most cases |
Implementation Snippets
Enabling ZGC
java -XX:+UseZGC \
-Xmx64g -Xms64g \
-XX:ZCollectionInterval=120 \
-XX:+ZProactive \
-Xlog:gc*:file=zgc.log \
-jar myapp.jar
Enabling Shenandoah
java -XX:+UseShenandoahGC \
-Xmx32g -Xms32g \
-XX:ShenandoahGCHeuristics=adaptive \
-Xlog:gc*:file=shenandoah.log \
-jar myapp.jar
Checking ZGC/Shenandoah in JMX
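import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

public class UltraLowLatencyGC {
    public static void main(String[] args) {
        List<GarbageCollectorMXBean> gcs = ManagementFactory.getGarbageCollectorMXBeans();
        for (GarbageCollectorMXBean gc : gcs) {
            String name = gc.getName();
            // Bean names vary by JDK version (for example "ZGC Pauses" and "ZGC Cycles"),
            // so match on the collector name rather than on an exact string.
            if (name.contains("ZGC") || name.contains("Shenandoah")) {
                long count = gc.getCollectionCount();
                long timeMs = gc.getCollectionTime(); // accumulated time in milliseconds
                System.out.println("Collector: " + name);
                System.out.println("  Collections: " + count);
                System.out.println("  Time: " + timeMs + "ms");
                // Average per collection; for a "Cycles" bean this is cycle duration, not pause time.
                System.out.println("  Avg per collection: " +
                    (count > 0 ? timeMs * 1000.0 / count : 0) + "us");
            }
        }
    }
}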
Reading ZGC Logs
[2026-05-26T10:15:30.123+0000] GC(12345) Garbage Collection
Metadata GC: No
Pause Mark Start: 0.118ms
Concurrent Mark: 45.230ms
Pause Mark End: 0.089ms
Concurrent Reset: 0.023ms
Concurrent Relocate: 78.456ms
Total GC Time: 123.916ms
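ZGC logs break down each phase and show pause times at microsecond resolution. The excerpt above is simplified for readability; the exact layout of -Xlog:gc* output depends on the JDK version.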
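Pulling the pause figures out of a log programmatically makes it easier to alert on regressions. The sketch below assumes lines shaped like the simplified excerpt above (`Pause <phase>: <number>ms`); actual -Xlog:gc* output is laid out differently, so the pattern would need adapting, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative parser for "Pause <phase>: <n>ms" lines in the simplified format shown above. */
final class PauseLineParser {
    private static final Pattern PAUSE =
        Pattern.compile("^\\s*(Pause [^:]+):\\s*([0-9.]+)ms\\s*$");

    /** Returns phase name -> pause duration in milliseconds, in log order. */
    static Map<String, Double> pauses(String log) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (String line : log.split("\\R")) {
            Matcher m = PAUSE.matcher(line);
            if (m.matches()) {
                result.put(m.group(1), Double.parseDouble(m.group(2)));
            }
        }
        return result;
    }

    /** Longest pause in the map, or 0 if there are none. */
    static double maxPauseMs(Map<String, Double> pauses) {
        double max = 0;
        for (double ms : pauses.values()) {
            max = Math.max(max, ms);
        }
        return max;
    }

    public static void main(String[] args) {
        String log = "Pause Mark Start: 0.118ms\n"
                   + "Concurrent Mark: 45.230ms\n"
                   + "Pause Mark End: 0.089ms\n";
        Map<String, Double> found = pauses(log);
        System.out.println(found);
        System.out.println("Longest pause (ms): " + maxPauseMs(found));
    }
}
```

Only the lines that start with `Pause` count as stop-the-world time; the concurrent phases are skipped on purpose because they run alongside the application.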
Observability Checklist
- Enable GC logging: -Xlog:gc*:file=zgc.log or -Xlog:gc*:file=shenandoah.log
- Monitor pause times in GC logs - should be consistently under 1ms
- Track jstat -gc <pid> for collection counts and time
- Watch for allocation stalls (ZGC) or evacuation failures
- Monitor CPU usage - both collectors add 5-15% overhead compared to G1
- For ZGC: watch for ZCollectionInterval effectiveness
- For Shenandoah: experiment with different heuristics
Security Notes
- Load barrier overhead adds measurable CPU usage - factor this into resource planning
- GC logs at -Xlog:gc* level reveal collector timing patterns useful for profiling
- Large heaps mean longer dump times if you trigger a heap dump - plan accordingly
- Brooks pointer (Shenandoah) adds an extra indirection that tools may not show in object layouts
Common Pitfalls / Anti-Patterns
| Pitfall | What happens | Fix |
|---|---|---|
| Forgetting Java version | ZGC/Shenandoah not available | Requires Java 11+ (ZGC) or Java 12+ (Shenandoah) |
| Setting heap too small | Frequent allocation stalls | Size heap to handle your allocation rate |
| Wrong heuristics (Shenandoah) | Suboptimal GC behavior | Try adaptive, static, or compact |
| Ignoring CPU overhead | Application throttled under load | Both add 5-15% CPU overhead vs G1 |
| Expecting zero pauses | Some pauses are unavoidable | Both have brief stop-the-world phases |
Quick Recap Checklist
- ZGC = Oracle’s collector, sub-ms pauses, scales to 16TB, Java 11+
- Shenandoah = Red Hat’s collector, sub-ms pauses, works at any heap size, Brooks pointer
- Both use load barriers to eliminate most stop-the-world phases
- Moderate CPU overhead (5-15%) compared to G1
- ZGC uses colored pointers; Shenandoah uses Brooks pointer
- Pause times typically under 1ms even on large heaps
- Not a replacement for proper heap sizing and application-level optimization
- For Java 8, Shenandoah is available via backport but ZGC is not
Interview Questions
A load barrier is a small check that runs every time your application reads a heap reference. It verifies the object is in a consistent state and handles any necessary fixups on the fly - like redirecting to a new location if the object was moved. This lets the GC move objects while the application runs, because the application never sees a half-moved object. The overhead is just a handful of CPU instructions per reference read.
ZGC encodes metadata (mark state, remap info) directly into the pointer bits - the pointer itself carries the information. When you read an object reference, the load barrier checks the pointer's colors and follows redirects if needed. Shenandoah uses a Brooks pointer: every object has an extra word that acts as a forwarding pointer. When objects move, the old location keeps a pointer to the new location. Shenandoah's approach adds an extra indirection on every object access; ZGC's approach is more elegant but limited to 64-bit addressable space.
On large heaps, G1's incremental compaction still requires stop-the-world pauses that scale with heap size - especially during mixed collections that evacuate old regions. ZGC does almost no stop-the-world work; its pauses are microseconds regardless of heap size. ZGC can handle 100GB+ heaps with pause times under 1ms because pause time is tied to root scanning, not heap size.
ZGC and Shenandoah typically pause for 0.1-1ms, usually under 1ms even on large heaps. G1 on the same workload might see pauses of 50-500ms depending on heap size and collection choice. The difference is that ZGC and Shenandoah do not have stop-the-world phases for marking or compaction - just brief pauses for root scanning that do not scale with heap size.
CPU overhead. The load barrier in both collectors adds 5-15% more CPU usage compared to G1. On a machine with spare CPU, this is fine. On a CPU-bound workload, you may see lower overall throughput. The trade-off is: G1 gives you better throughput but occasional long pauses; ZGC/Shenandoah give you consistent low latency but use more CPU doing concurrent work.
ZCollectionInterval sets the target time between ZGC cycles (in seconds). Default is 0 (disabled). Setting it to 120 means ZGC targets a GC cycle every 120 seconds proactively, before memory runs out. This is useful for latency-sensitive workloads that prefer predictable small pauses over occasional larger ones. With ZProactive enabled (default), ZGC runs proactively anyway, so this flag is mainly for fine-tuning the interval.
Every heap read in Shenandoah goes through the Brooks pointer first. If an object is at its original location, the Brooks pointer points there directly. If it was moved, the Brooks pointer points to the new location, and the load barrier follows that redirect. This extra indirection adds latency on every object access, whereas ZGC's colored pointers can often be resolved without following redirects. Shenandoah's overhead is roughly 5-10% higher than ZGC on the same workload.
ZProactive (enabled by default) lets ZGC start GC cycles proactively, before memory pressure forces one, when it expects the impact on the application to be small. This helps prevent allocation stalls, where the application has to wait for free memory. The sub-millisecond pauses themselves come from ZGC's concurrent design; proactive cycles help it stay ahead of memory pressure rather than reacting to it.
On NUMA systems (multi-socket servers where memory has different latency depending on which CPU accesses it), ZGC tries to allocate objects on the NUMA node where the allocating thread runs. This reduces cross-NUMA memory access latency. ZGC is fully NUMA-aware; Shenandoah has limited support. For large-heap workloads on multi-socket servers, ZGC's NUMA awareness provides meaningful performance benefits.
G1's pause times scale with heap size because its stop-the-world phases (young GC, mixed GC) must process more objects as heap grows. ZGC's stop-the-world phases are limited to root scanning (thread stacks, registers), which is constant regardless of heap size. ZGC does all marking, relocation, and remapping concurrently. Even on 128GB heaps, ZGC pauses remain under 1ms because they only touch roots, not the heap itself.
When ZGC cannot reclaim memory as fast as the application allocates it, an allocating thread blocks until the collector has freed enough memory - this is called an allocation stall. Unlike normal ZGC pauses, which are brief, an allocation stall lasts as long as it takes the concurrent cycle to free memory, so it directly impacts application latency. Stalls happen when the heap is too small for the workload or when the allocation rate spikes unexpectedly. Increase heap, enable ZProactive, or reduce allocation rate to fix.
ZGC's stop-the-world pauses are Pause Mark Start, which briefly pauses to start marking from the roots; Pause Mark End, which briefly pauses to finalize marking; and Pause Relocate Start, which briefly pauses to start relocation. All three are very short (microseconds to sub-millisecond). Everything else - concurrent marking, relocation, remap - runs while the application runs. This is fundamentally different from G1, where object evacuation happens inside stop-the-world pauses.
Shenandoah was backported to Java 8 by Red Hat before being contributed to OpenJDK. ZGC required significant JVM internal changes that only landed in Java 11. The load barrier implementations differ - ZGC's colored pointers required changes to the JIT compiler and object layout that were not available in Java 8. If you need sub-ms pauses on Java 8, Shenandoah is your option; otherwise upgrade to Java 11+ for ZGC.
Remap in ZGC updates references to point to objects' new locations after relocation. Unlike Shenandoah which updates references as a separate concurrent phase, ZGC lazily remaps references the first time they are accessed after being relocated. This is possible because colored pointers encode whether an object has been remapped. The load barrier handles remapping on-the-fly, spreading the work across normal memory accesses rather than a dedicated GC phase.
ZGC divides the heap into small (2MB), medium (32MB), and large (2MB multiples, up to 16TB) pages. Small objects (up to 256KB by default) allocate in small pages. Medium objects (above 256KB and up to 4MB) use medium pages. Anything larger gets a large page of its own, sized to the next 2MB multiple. This three-tier approach reduces internal fragmentation compared to G1's fixed region size while maintaining efficient allocation.
Shenandoah evacuates live objects from regions concurrently using Brooks pointer indirection. During concurrent evacuation, application threads can read and write objects while the GC moves them. The Brooks pointer in the old location redirects to the new location. This means reads incur an extra indirection (the pointer check and potential follow), adding roughly 5-10% throughput overhead compared to no-load-barrier collectors. The trade-off is sub-millisecond pauses regardless of heap size, which is worth the overhead for latency-sensitive workloads.
ZGC stores its colors (metadata bits) in otherwise unused bits of a 64-bit pointer. In the original, non-generational design there are four of them: Marked0, Marked1, Remapped, and Finalizable. ZGC uses up to 44 low bits for the address itself, giving it addressability up to 16TB (2^44 bytes), which is also the largest heap ZGC supports. The colored pointer approach means no extra per-object forwarding word is needed - the reference itself carries the GC state, so ZGC's load barrier can tell from the pointer bits alone whether any fixup is required.
Shenandoah's Brooks pointer requires an extra indirection on every heap read - the load barrier checks the pointer, and if the object was moved, follows the redirect. ZGC stores colors in the pointer itself without indirection - the load barrier reads the pointer bits directly and follows the redirect only if necessary. ZGC's colored pointers typically succeed without a redirect, while Shenandoah's Brooks pointer always requires a follow even when no evacuation happened. Additionally, Shenandoah updates all references in a separate concurrent phase, while ZGC lazily remaps references on access.
Both ZGC and Shenandoah move objects concurrently, even while application threads hold references to them. The barriers make this safe: any load of a reference to an object that is being moved, or has already been moved, detects the move and yields the new address. A stale address is never used directly, because the barrier executes before the loaded reference is used. Some objects are not moved: ZGC does not relocate objects that live in large pages, and pinned objects may not be movable while they are pinned.
ZGC triggers cycles based on allocation rate and heap occupancy. With ZProactive enabled (default), ZGC periodically checks if a cycle is needed even before memory runs out. The ZCollectionInterval flag sets a target time between cycles. When heap occupancy exceeds a threshold (dynamically tuned), ZGC initiates a concurrent cycle. ZGC's proactive scheduling prevents allocation stalls by staying ahead of demand. Under extreme allocation pressure, ZGC can still experience allocation stalls, which are the most impactful latency events to avoid.
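The two triggers mentioned above, the interval timer and heap occupancy, can be combined into a toy predicate. This is a sketch under stated assumptions, not ZGC's actual heuristics, which also weigh allocation rate and other signals; the occupancy threshold, class name, and parameter names are hypothetical.

```java
/** Toy cycle-trigger predicate; not ZGC's real heuristics. */
final class CycleTriggerSketch {
    /**
     * @param secondsSinceLastCycle time since the previous cycle finished
     * @param intervalSeconds       timer target; 0 disables the timer, mirroring ZCollectionInterval's default
     * @param heapOccupancy         used fraction of the heap, between 0.0 and 1.0
     * @param occupancyThreshold    hypothetical occupancy fraction that triggers a cycle
     */
    static boolean shouldStartCycle(double secondsSinceLastCycle, double intervalSeconds,
                                    double heapOccupancy, double occupancyThreshold) {
        boolean timerElapsed = intervalSeconds > 0 && secondsSinceLastCycle >= intervalSeconds;
        boolean heapPressure = heapOccupancy >= occupancyThreshold;
        return timerElapsed || heapPressure;
    }

    public static void main(String[] args) {
        System.out.println("timer elapsed: " + shouldStartCycle(130, 120, 0.30, 0.70));
        System.out.println("heap pressure: " + shouldStartCycle(10, 0, 0.85, 0.70));
        System.out.println("neither: " + shouldStartCycle(10, 120, 0.30, 0.70));
    }
}
```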
Further Reading
- ZGC Project Page - OpenJDK ZGC official documentation and design documents
- Shenandoah GC Wiki - OpenJDK Shenandoah official documentation
- JEP 333: ZGC - A Scalable, Low-Latency Garbage Collector - Original Java Enhancement Proposal for ZGC
- Shenandoah: The Church of Optimization - Red Hat’s deep dive into Shenandoah performance characteristics
Conclusion
ZGC and Shenandoah achieve sub-millisecond pause times by doing almost all GC work concurrently with the application, including compaction. ZGC uses colored pointers (Oracle, Java 11+, best for 16GB+ heaps); Shenandoah uses Brooks pointer indirection (Red Hat, Java 8+, works across heap sizes). Both add 5-15% CPU overhead compared to G1 but eliminate stop-the-world pauses for marking and compaction — choose based on your heap size, Java version, and latency requirements.