ZGC and Shenandoah: Ultra-Low Latency Garbage Collectors

How ZGC and Shenandoah achieve sub-millisecond pause times through concurrent operations and load barriers, without stopping your application.

published: May 26, 2026 reading time: 21 min read author: GeekWorkBench

G1 made GC pauses more manageable, but it still stops your application for certain phases. ZGC and Shenandoah take a fundamentally different approach: they aim for pauses so short that they are measured in microseconds rather than milliseconds. They do this by doing almost all GC work concurrently with your application, including the compaction phase.

This post covers how ZGC and Shenandoah work, what makes them different from G1, and when the trade-offs are worth it.

Introduction

ZGC and Shenandoah are the JVM’s answer to applications that cannot tolerate GC pauses. They trade some throughput for the ability to keep pause times consistently below one millisecond, even under heavy GC load. G1 improved on older collectors by parallelizing its pauses, but it still stops the world for young-generation and mixed collections. ZGC and Shenandoah instead perform nearly all GC work concurrently with the application threads, including the compaction phase that traditionally requires a stop-the-world pause.

The key enabling technology in both collectors is the load barrier: a small check injected at reference loads that lets the garbage collector keep object references consistent even while mutator threads are actively modifying the heap. This allows the collector to move (evacuate) objects without stopping the application. ZGC achieves this with colored pointers, plus a generational mode in Java 21+, while Shenandoah uses a Brooks pointer and a single view of the heap. This post covers how both work, their performance characteristics, and when the trade-offs justify choosing one over G1 or a parallel collector.

When to Use This Knowledge

Use when:

Your latency SLA is measured in milliseconds or microseconds
Your application cannot tolerate GC pauses at all (trading systems, real-time gaming, control systems)
You have very large heaps (100GB+) and need consistent performance
You are running on Java 11+ and need the lowest possible pause times

Do not use when:

Throughput is your primary metric (batch workloads, ETL)
You are on Java 8 and cannot use a JDK build that includes the Shenandoah backport (ZGC requires Java 11+)
Your latency SLA is in the hundreds of milliseconds and G1 already meets it
You have strict memory constraints and cannot afford the overhead

When NOT to Use

ZGC and Shenandoah are overkill in several common scenarios. The CPU and memory overhead only makes sense when sub-millisecond pauses actually matter.

For short-lived serverless functions, skip them. A Lambda that runs for 500ms does not need sub-millisecond pauses. G1 pauses in that environment are measured in milliseconds at most, and the 5-15% CPU overhead would hurt throughput for no gain. Cold start times for these collectors also work against you in serverless.

Memory-constrained containers without hard pause SLAs also do not benefit. Running in a container with 512MB or 1GB heap when your SLA is measured in seconds? The load barrier overhead is pure waste. Even G1 is probably overkill here. Parallel GC or Serial GC with correct heap sizing often performs better when latency is not the concern.

If your application does not have a latency problem, ZGC and Shenandoah will not help. GC logs showing G1 pauses averaging 50-100ms when your SLA is 500ms means you already have headroom. Profile first. Do not switch collectors to solve a problem that does not exist.
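The headroom check described above is simple arithmetic over pause samples pulled from your GC logs. A minimal sketch, assuming you have already extracted pause durations in milliseconds; the class name `PauseHeadroom`, its methods, and the sample values are hypothetical and only for illustration:

```java
import java.util.Arrays;

// Hypothetical helper: compares a pause-time percentile against a latency SLA.
class PauseHeadroom {
    // Nearest-rank percentile over pause samples in milliseconds.
    static double percentile(double[] pausesMs, double p) {
        double[] sorted = pausesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    // True when the chosen percentile of observed pauses fits inside the SLA.
    static boolean meetsSla(double[] pausesMs, double p, double slaMs) {
        return percentile(pausesMs, p) <= slaMs;
    }

    public static void main(String[] args) {
        // Made-up G1 pause samples in the 50-100ms range.
        double[] g1Pauses = {50, 60, 70, 80, 100, 95, 55, 65, 75, 90};
        System.out.println("p99 pause (ms): " + percentile(g1Pauses, 99));
        System.out.println("meets 500ms SLA: " + meetsSla(g1Pauses, 99, 500));
    }
}
```

If the percentile you care about already sits well inside the SLA, switching collectors is unlikely to buy anything.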

The Load Barrier Approach

Both ZGC and Shenandoah use a technique called a load barrier (also called a read barrier) to maintain correctness without stopping the world. The basic idea is that every reference load from the heap - every time your code reads an object field that holds a reference - goes through a small check. This check ensures the reference points to the object's current location.

graph LR
    A["Application\nThread"] --> B["Heap Read"]
    B --> C{"Load Barrier\nCheck"}
    C -->|"Object OK"| D["Return\nReference"]
    C -->|"Object being\nmoved"| E["Fix pointer\n atomically"]
    E --> D

The load barrier is extremely fast - a handful of instructions. It adds some CPU overhead but eliminates the need for stop-the-world pauses for most GC work.
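To make the check concrete, here is a toy model of a load barrier in plain Java. It is a sketch under stated assumptions, not how HotSpot implements it: real barriers are emitted by the JIT as a few machine instructions. The `forwarding` table, `Field`, and `Obj` types are hypothetical stand-ins for GC-internal structures.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

// Toy model of a load barrier; every name here is hypothetical.
class LoadBarrierSketch {
    // A simulated heap object identified by its "address".
    static final class Obj {
        final long address;
        final String payload;
        Obj(long address, String payload) { this.address = address; this.payload = payload; }
    }

    // Forwarding table: old address -> relocated copy (filled in by the "GC").
    static final Map<Long, Obj> forwarding = new ConcurrentHashMap<>();

    // A reference field inside some object; AtomicReference allows healing via CAS.
    static final class Field {
        final AtomicReference<Obj> slot;
        Field(Obj initial) { slot = new AtomicReference<>(initial); }
    }

    // The barrier: conceptually runs on every reference load.
    static Obj load(Field f) {
        Obj seen = f.slot.get();
        if (seen == null) return null;
        Obj moved = forwarding.get(seen.address);
        if (moved == null) return seen;       // fast path: object was not relocated
        f.slot.compareAndSet(seen, moved);    // slow path: heal the field atomically
        return moved;                         // caller never sees the stale copy
    }

    public static void main(String[] args) {
        Obj old = new Obj(0x1000, "order-42");
        Field field = new Field(old);
        // The simulated GC relocates the object while the "application" keeps running.
        forwarding.put(old.address, new Obj(0x8000, old.payload));
        Obj result = load(field);
        System.out.println("loaded address: 0x" + Long.toHexString(result.address));
        System.out.println("field healed: " + (field.slot.get() == result));
    }
}
```

The healing step is why later loads of the same field take the fast path again: the fixup cost is paid once per stale reference.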

ZGC (Z Garbage Collector)

ZGC was developed by Oracle and introduced in Java 11 as an experimental feature, becoming production-ready in Java 15. It is designed for very large heaps and very low latency.

How ZGC Works

ZGC divides the heap into regions called pages (not to be confused with OS pages). Unlike G1's uniformly sized regions, ZGC pages come in three size classes: small (2MB), medium (32MB), and large (a multiple of 2MB, sized to hold a single large object). The heap as a whole can scale up to 16TB on 64-bit systems.

graph TB
    subgraph ZGCHeap["ZGC Heap - Colored Pointers"]
        Z1["Small Page\n(2MB)"]
        Z2["Medium Page\n(32MB)"]
        Z3["Large Page\n(N * 2MB)"]
    end

ZGC uses colored pointers. A pointer to an object encodes information about the object: whether it is marked live, whether it is in a remapping set, and more. When you read a pointer, the load barrier checks the colors and handles any necessary fixups on the fly.
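The idea of packing GC state into a pointer can be sketched with plain bit arithmetic on a `long`. This is an illustration only: the bit positions, the three color names, and the class `ColoredPointerSketch` are assumptions chosen for readability, not the exact layout HotSpot uses.

```java
// Illustrative colored-pointer encoding. Bit positions and names are assumptions
// for this sketch, not the exact layout HotSpot uses.
class ColoredPointerSketch {
    static final int ADDRESS_BITS = 44;                        // 2^44 bytes = 16TB
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;
    static final long MARKED0 = 1L << ADDRESS_BITS;            // color bits sit above the address
    static final long MARKED1 = 1L << (ADDRESS_BITS + 1);
    static final long REMAPPED = 1L << (ADDRESS_BITS + 2);
    static final long COLOR_MASK = MARKED0 | MARKED1 | REMAPPED;

    // Replace whatever color the pointer carries with the given one.
    static long withColor(long pointer, long color) {
        return (pointer & ADDRESS_MASK) | color;
    }

    static long address(long pointer) { return pointer & ADDRESS_MASK; }

    static long color(long pointer) { return pointer & COLOR_MASK; }

    // Load-barrier fast path: a pointer is "good" when its color matches the
    // color the collector currently considers valid.
    static boolean isGood(long pointer, long goodColor) {
        return color(pointer) == goodColor;
    }

    public static void main(String[] args) {
        long stale = withColor(0x123_4567_8000L, MARKED0);
        long goodColor = REMAPPED;                  // pretend the GC moved on to a new phase
        System.out.println("stale is good: " + isGood(stale, goodColor));
        long healed = withColor(stale, goodColor);  // a real slow path would also fix the address
        System.out.println("healed is good: " + isGood(healed, goodColor));
        System.out.println("address kept: " + (address(healed) == address(stale)));
    }
}
```

Because the test is a mask-and-compare on a value already in a register, the fast path needs no extra memory access.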

ZGC Phases

ZGC has three main pause types, all very short:

Pause Mark Start (Stop-The-World, microseconds): Briefly pauses to mark roots from thread stacks.
Pause Mark End (Stop-The-World, microseconds): Briefly pauses to finalize the marking phase.
Pause Relocate Start (Stop-The-World, microseconds): Briefly pauses to begin relocation before objects are moved concurrently.

Everything else - concurrent marking, concurrent relocate, concurrent remap - runs while your application runs.

graph TB
    A["Pause Mark Start\n(microseconds)"] --> B["Concurrent Mark\n(app running)"]
    B --> C["Pause Mark End\n(microseconds)"]
    C --> D["Pause Relocate Start\n(microseconds)"]
    D --> E["Concurrent Relocate\n(app running)"]
    E --> F["Concurrent Remap\n(app running)"]
    F --> A

ZGC Key Characteristics

Aspect	Behavior
Pause times	Sub-millisecond (typically under 1ms)
Throughput impact	Moderate - load barrier adds 5-10% CPU overhead
Heap scalability	Excellent - supports heaps up to 16TB
Compaction	Yes - concurrent relocation
Java version	Java 11+ (production in Java 15+)
NUMA awareness	Yes
Single-heap mode	Yes

ZGC JVM Flags

-XX:+UseZGC
-Xmx64g -Xms64g               # Works well with large heaps
-XX:ZCollectionInterval=120   # Set GC interval target (seconds)
-XX:+ZProactive               # Enable proactive GC (default on)

Shenandoah

Shenandoah was developed by Red Hat and open-sourced before being adopted in OpenJDK. Unlike ZGC, Shenandoah is designed to work with a wide range of heap sizes, not just very large ones.

How Shenandoah Works

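Shenandoah's original design uses a Brooks pointer - an extra indirection layer. Every object carries an extra word, the forwarding pointer, which initially points to the object itself. When Shenandoah moves an object, it updates the forwarding pointer in the old copy so that it points to the new copy. The load barrier reads through the forwarding pointer and follows the redirect if there is one. (Newer Shenandoah versions keep the forwarding pointer in the object header instead of a separate word, but the principle is the same.)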

graph TB
    subgraph ShenandoahObject["Object with Brooks Pointer"]
        A["Old Location\n(Forwarding Pointer)"] -->|"redirect"| B["New Location\n(Actual Object)"]
    end

This means Shenandoah can move objects without immediately updating references in other objects or thread stacks - the Brooks pointer handles the indirection until a later update-references phase rewrites them. This is fundamentally different from how G1 or ZGC handle relocation.
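The forwarding-pointer mechanism can be modeled in a few lines of Java. This is a simplified sketch, assuming a single `int` field as the object's payload; the names `Obj`, `forwardee`, `resolve`, and `evacuate` are hypothetical and do not correspond to Shenandoah's internal code.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy Brooks-pointer model; all names are hypothetical.
class BrooksPointerSketch {
    static final class Obj {
        // Forwarding pointer: refers to the object itself until it is evacuated.
        final AtomicReference<Obj> forwardee = new AtomicReference<>();
        final int value;
        Obj(int value) {
            this.value = value;
            forwardee.set(this);
        }
    }

    // Read barrier: always go through the forwarding pointer.
    static Obj resolve(Obj o) {
        return o.forwardee.get();
    }

    // Evacuation: copy the object, then install the forwarding pointer with CAS.
    // If another thread won the race, discard our copy and use theirs.
    static Obj evacuate(Obj from) {
        Obj copy = new Obj(from.value);
        if (from.forwardee.compareAndSet(from, copy)) {
            return copy;
        }
        return from.forwardee.get();
    }

    public static void main(String[] args) {
        Obj original = new Obj(7);
        System.out.println("before move, resolves to self: " + (resolve(original) == original));
        Obj moved = evacuate(original);
        System.out.println("after move, resolves to copy: " + (resolve(original) == moved));
        System.out.println("payload preserved: " + (resolve(original).value == 7));
    }
}
```

The compare-and-swap is what lets application threads and GC threads race to evacuate the same object safely: exactly one copy becomes the canonical one.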

Shenandoah Phases

Init Mark (Stop-The-World, short): Brief pause to start marking from roots.
Concurrent Marking (Concurrent): Marks live objects across the heap while your application runs.
Final Mark (Stop-The-World, short): Finalizes marking and prepares for evacuation.
Concurrent Evacuation (Concurrent): Moves live objects to new locations using Brooks pointer indirection.
Concurrent Update Refs (Concurrent): Updates references to moved objects across the heap.

graph TB
    A["Init Mark\n(STW - short)"] --> B["Concurrent Mark\n(app running)"]
    B --> C["Final Mark\n(STW - short)"]
    C --> D["Concurrent Evacuate\n(app running)"]
    D --> E["Concurrent Update Refs\n(app running)"]
    E --> A

Shenandoah Key Characteristics

Aspect	Behavior
Pause times	Sub-millisecond (typically under 1ms)
Throughput impact	Moderate - load barrier + Brooks pointer adds overhead
Heap scalability	Good - works well from small to large heaps
Compaction	Yes - concurrent evacuation
Java version	Java 12+ upstream (production in Java 15+); backports to Java 8 and 11 in some builds
NUMA awareness	Limited
Heap efficiency	Lower overhead than ZGC on smaller heaps

Shenandoah JVM Flags

-XX:+UseShenandoahGC
-Xmx32g -Xms32g
# Pass only one ShenandoahGCHeuristics value; the lines below are alternatives
-XX:ShenandoahGCHeuristics=adaptive   # Adaptive heuristics (default)
-XX:ShenandoahGCHeuristics=static     # Static heuristics
-XX:ShenandoahGCHeuristics=compact    # Run GC more aggressively

ZGC vs Shenandoah

Aspect	ZGC	Shenandoah
Developer	Oracle	Red Hat / OpenJDK
Production-ready since	Java 15	Java 15 (experimental since Java 12)
Pause times	Sub-ms (typically 0.5-1ms)	Sub-ms (typically 0.5-1ms)
Heap size	Best for very large heaps (16GB+)	Works across heap sizes
Throughput impact	Slightly lower overhead	Slightly higher overhead
NUMA awareness	Full	Limited
Compact on each GC	Yes	Yes
Brooks pointer	No	Yes

Production Failure Scenarios

1. ZGC Allocation Stall

Symptom: Application stalls while ZGC waits for available memory.

Cause: ZGC cycles are not completing fast enough to keep up with allocation rate. Usually means you need more heap or a lower allocation rate.

Solution:

# Increase heap
-Xmx128g -Xms128g

# Reduce GC interval target
-XX:ZCollectionInterval=60

# Enable proactive GC
-XX:+ZProactive
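The cause behind an allocation stall comes down to a race between allocation and GC cycle time. A back-of-the-envelope sketch, assuming a steady allocation rate and a fixed cycle duration (real workloads are burstier); the class `AllocationStallEstimate` and the numbers in `main` are made up for illustration:

```java
// Rough stall estimate; a deliberately simple model with hypothetical names.
class AllocationStallEstimate {
    // Seconds until the free heap is exhausted at a steady allocation rate.
    static double secondsUntilFull(double freeHeapMb, double allocRateMbPerSec) {
        return freeHeapMb / allocRateMbPerSec;
    }

    // A stall is likely when a GC cycle takes longer than the time left
    // before the heap fills up.
    static boolean stallLikely(double freeHeapMb, double allocRateMbPerSec,
                               double gcCycleSeconds) {
        return gcCycleSeconds > secondsUntilFull(freeHeapMb, allocRateMbPerSec);
    }

    public static void main(String[] args) {
        double allocRate = 2048;   // MB/s, made-up workload
        double cycle = 5;          // seconds per concurrent cycle, made-up
        System.out.println("8GB free:  stall likely = " + stallLikely(8192, allocRate, cycle));
        System.out.println("64GB free: stall likely = " + stallLikely(65536, allocRate, cycle));
    }
}
```

This is why both remedies work: a larger heap raises the time until full, and a lower allocation rate does the same from the other side.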

2. Shenandoah Heuristics Mismatch

Symptom: GC runs too frequently or not frequently enough.

Cause: The chosen heuristics does not match your workload. adaptive works well for most, but static or compact may be better for specific workloads.

Solution:

# Try different heuristics
-XX:ShenandoahGCHeuristics=adaptive
-XX:ShenandoahGCHeuristics=static
-XX:ShenandoahGCHeuristics=compact

# Or set explicit targets
-XX:ShenandoahMinFreeThreshold=10
-XX:ShenandoahMaxFreeThreshold=30

3. ZGC Large Heap + NUMA Issues

Symptom: Performance drops on NUMA systems with very large heaps.

Cause: ZGC is NUMA-aware but may not perfectly balance allocations across nodes on startup.

Solution: Bind the JVM to specific NUMA nodes or use -XX:+UseNUMA to let ZGC handle it automatically.

Trade-off Table

Configuration	Benefit	Trade-off
`-XX:+UseZGC`	Sub-ms pauses, scales to 16TB	Requires Java 11+, moderate CPU overhead
`-XX:+UseShenandoahGC`	Sub-ms pauses, works at any heap size	Brooks pointer overhead, slightly higher than ZGC
`-XX:ZCollectionInterval=60`	Control GC frequency	May increase memory usage
`-XX:+ZProactive`	Proactive GC to prevent stalls	More GC cycles overall
`-XX:ShenandoahGCHeuristics=adaptive`	Auto-tune based on workload	Default for most cases

Implementation Snippets

Enabling ZGC

java -XX:+UseZGC \
     -Xmx64g -Xms64g \
     -XX:ZCollectionInterval=120 \
     -XX:+ZProactive \
     -Xlog:gc*:file=zgc.log \
     -jar myapp.jar

Enabling Shenandoah

java -XX:+UseShenandoahGC \
     -Xmx32g -Xms32g \
     -XX:ShenandoahGCHeuristics=adaptive \
     -Xlog:gc*:file=shenandoah.log \
     -jar myapp.jar

Checking ZGC/Shenandoah in JMX

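import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

public class UltraLowLatencyGC {
    public static void main(String[] args) {
        List<GarbageCollectorMXBean> gcs = ManagementFactory.getGarbageCollectorMXBeans();
        for (GarbageCollectorMXBean gc : gcs) {
            // Bean names vary by JDK version (for example "ZGC Pauses" and
            // "ZGC Cycles" on recent JDKs), so match on a substring.
            String name = gc.getName();
            if (name.contains("ZGC") || name.contains("Shenandoah")) {
                long count = gc.getCollectionCount();   // -1 if undefined
                long timeMs = gc.getCollectionTime();   // cumulative, in milliseconds
                System.out.println("Collector: " + name);
                System.out.println("  Collections: " + count);
                System.out.println("  Time: " + timeMs + "ms");
                // This is an average pause only for beans that report pauses;
                // "Cycles" beans report concurrent cycle duration instead.
                double avgUs = count > 0 ? timeMs * 1000.0 / count : 0;
                System.out.println("  Avg time per collection: " + avgUs + "us");
            }
        }
    }
}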

Reading ZGC Logs

[2026-05-26T10:15:30.123+0000] GC(12345) Garbage Collection
  Metadata GC: No
  Pause Mark Start: 0.118ms
  Concurrent Mark: 45.230ms
  Pause Mark End: 0.089ms
  Concurrent Reset: 0.023ms
  Concurrent Relocate: 78.456ms
  Total GC Time: 123.916ms

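ZGC logs break down each phase and show microsecond-level pause times. The excerpt above is a simplified summary: actual -Xlog:gc* output prints one tagged line per phase rather than a single block.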
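Checking pause lines by eye does not scale, so a small parser helps. The sketch below targets the simplified excerpt format shown above (a phase name, a colon, and a duration in ms); real JVM log lines carry extra decorations, so the pattern would need adjusting. The class `PauseLogCheck` and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts "Pause ..." phases from the simplified log excerpt format.
class PauseLogCheck {
    private static final Pattern PAUSE =
        Pattern.compile("\\s*(Pause [A-Za-z ]+?):\\s*([0-9.]+)ms\\s*");

    // Returns phase name -> duration in milliseconds, in log order.
    static Map<String, Double> pausesMs(String log) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (String line : log.split("\\R")) {
            Matcher m = PAUSE.matcher(line);
            if (m.matches()) {
                out.put(m.group(1), Double.parseDouble(m.group(2)));
            }
        }
        return out;
    }

    // True when every recorded pause is below the given limit.
    static boolean allUnder(Map<String, Double> pauses, double limitMs) {
        return pauses.values().stream().allMatch(ms -> ms < limitMs);
    }

    public static void main(String[] args) {
        String log = "  Pause Mark Start: 0.118ms\n"
                   + "  Concurrent Mark: 45.230ms\n"
                   + "  Pause Mark End: 0.089ms\n"
                   + "  Concurrent Relocate: 78.456ms\n";
        Map<String, Double> pauses = pausesMs(log);
        System.out.println(pauses);
        System.out.println("all pauses under 1ms: " + allUnder(pauses, 1.0));
    }
}
```

Concurrent phases are deliberately ignored: they run alongside the application and do not count against a pause SLA.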

Observability Checklist

Enable GC logging: -Xlog:gc*:file=zgc.log or -Xlog:gc*:file=shenandoah.log
Monitor pause times in GC logs - should be consistently under 1ms
Track jstat -gc <pid> for collection counts and time
Watch for allocation stalls (ZGC) or evacuation failures
Monitor CPU usage - both collectors add 5-15% overhead compared to G1
For ZGC: watch for ZCollectionInterval effectiveness
For Shenandoah: experiment with different heuristics
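For in-process monitoring, the cumulative counters exposed through the standard `GarbageCollectorMXBean` API can be sampled periodically and diffed. A minimal sketch, assuming you poll from your own code; the class `GcDeltaSampler` and its method names are hypothetical:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

// Samples cumulative GC time per collector bean and reports the difference.
class GcDeltaSampler {
    // Snapshot of cumulative collection time (ms) keyed by bean name.
    static Map<String, Long> snapshot() {
        Map<String, Long> m = new HashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() returns -1 when undefined; clamp to 0.
            m.put(gc.getName(), Math.max(0L, gc.getCollectionTime()));
        }
        return m;
    }

    // GC time accumulated per collector between two snapshots.
    static Map<String, Long> delta(Map<String, Long> before, Map<String, Long> after) {
        Map<String, Long> d = new HashMap<>();
        after.forEach((name, t) -> d.put(name, t - before.getOrDefault(name, 0L)));
        return d;
    }

    public static void main(String[] args) {
        Map<String, Long> before = snapshot();
        // Generate some garbage so there is something to collect.
        for (int i = 0; i < 1_000; i++) {
            byte[] junk = new byte[64 * 1024];
            junk[0] = 1;
        }
        System.gc();
        Map<String, Long> after = snapshot();
        System.out.println("GC time delta (ms): " + delta(before, after));
    }
}
```

Diffing snapshots over a fixed interval gives a rate you can alert on, which the raw cumulative totals do not.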

Security Notes

Load barrier overhead adds measurable CPU usage - factor this into resource planning
GC logs at -Xlog:gc* level reveal collector timing patterns useful for profiling
Large heaps mean longer dump times if you trigger a heap dump - plan accordingly
Brooks pointer (Shenandoah) adds an extra indirection that tools may not show in object layouts

Common Pitfalls / Anti-Patterns

Pitfall	What happens	Fix
Forgetting Java version	ZGC/Shenandoah not available	Requires Java 11+ (ZGC) or Java 12+ (Shenandoah)
Setting heap too small	Frequent allocation stalls	Size heap to handle your allocation rate
Wrong heuristics (Shenandoah)	Suboptimal GC behavior	Try `adaptive`, `static`, or `compact`
Ignoring CPU overhead	Application throttled under load	Both add 5-15% CPU overhead vs G1
Expecting zero pauses	Some pauses are unavoidable	Both have brief stop-the-world phases

Quick Recap Checklist

ZGC = Oracle’s collector, sub-ms pauses, scales to 16TB, Java 11+
Shenandoah = Red Hat’s collector, sub-ms pauses, works at any heap size, Brooks pointer
Both use load barriers to eliminate most stop-the-world phases
Moderate CPU overhead (5-15%) compared to G1
ZGC uses colored pointers; Shenandoah uses Brooks pointer
Pause times typically under 1ms even on large heaps
Not a replacement for proper heap sizing and application-level optimization
For Java 8, Shenandoah is available via backport but ZGC is not

Interview Questions

1. What is a load barrier and how does it enable concurrent GC?

A load barrier is a small check that runs every time your application reads a heap reference. It verifies the object is in a consistent state and handles any necessary fixups on the fly - like redirecting to a new location if the object was moved. This lets the GC move objects while the application runs, because the application never sees a half-moved object. The overhead is just a handful of CPU instructions per reference read.

2. What is the difference between ZGC's colored pointers and Shenandoah's Brooks pointer?

ZGC encodes metadata (mark state, remap info) directly into the pointer bits - the pointer itself carries the information. When you read an object reference, the load barrier checks the pointer's colors and follows redirects if needed. Shenandoah uses a Brooks pointer: every object has an extra word that acts as a forwarding pointer. When objects move, the old location keeps a pointer to the new location. Shenandoah's approach adds an extra indirection on object access; ZGC's approach avoids that indirection but requires a 64-bit platform, since the metadata lives in otherwise unused pointer bits.

3. Why does ZGC perform better on very large heaps compared to G1?

On large heaps, G1's incremental compaction still requires stop-the-world pauses that scale with heap size - especially during mixed collections that evacuate old regions. ZGC does almost no stop-the-world work; its pauses are microseconds regardless of heap size. ZGC can handle 100GB+ heaps with pause times under 1ms because pause time is tied to root scanning, not heap size.

4. What are the pause times for ZGC and Shenandoah compared to G1?

ZGC and Shenandoah typically pause for 0.1-1ms, usually under 1ms even on large heaps. G1 on the same workload might see pauses of 50-500ms depending on heap size and collection choice. The difference is that ZGC and Shenandoah do not have stop-the-world phases for marking or compaction - just brief pauses for root scanning that do not scale with heap size.

5. What are the trade-offs of using ZGC or Shenandoah over G1?

CPU overhead. The load barrier in both collectors adds 5-15% more CPU usage compared to G1. On a machine with spare CPU, this is fine. On a CPU-bound workload, you may see lower overall throughput. The trade-off is: G1 gives you better throughput but occasional long pauses; ZGC/Shenandoah give you consistent low latency but use more CPU doing concurrent work.

6. What is ZCollectionInterval and when should you tune it?

ZCollectionInterval sets the target time between ZGC cycles (in seconds). Default is 0 (disabled). Setting it to 120 means ZGC targets a GC cycle every 120 seconds proactively, before memory runs out. This is useful for latency-sensitive workloads that prefer predictable small pauses over occasional larger ones. With ZProactive enabled (default), ZGC runs proactively anyway, so this flag is mainly for fine-tuning the interval.

7. Why is Shenandoah's Brooks pointer an indirection overhead?

Every heap read in Shenandoah goes through the Brooks pointer first. If an object is at its original location, the Brooks pointer points there directly. If it was moved, the Brooks pointer points to the new location, and the load barrier follows that redirect. This extra indirection adds latency on every object access, whereas ZGC's colored pointers can often be resolved without following redirects. Shenandoah's overhead is roughly 5-10% higher than ZGC on the same workload.

8. What does ZProactive do in ZGC?

ZProactive (enabled by default) tells ZGC to run GC cycles before memory actually runs out. This prevents allocation stalls where the application has to wait for free memory. With ZProactive, ZGC monitors memory pressure and initiates GC cycles when heap usage reaches a threshold, well before actual exhaustion. This is the key to ZGC's consistent sub-ms pauses - it stays ahead of memory pressure rather than reacting to it.

9. What is NUMA awareness in ZGC?

On NUMA systems (multi-socket servers where memory has different latency depending on which CPU accesses it), ZGC tries to allocate objects on the NUMA node where the allocating thread runs. This reduces cross-NUMA memory access latency. ZGC is fully NUMA-aware; Shenandoah has limited support. For large-heap workloads on multi-socket servers, ZGC's NUMA awareness provides meaningful performance benefits.

10. Why does ZGC scale better than G1 on large heaps?

G1's pause times scale with heap size because its stop-the-world phases (young GC, mixed GC) must process more objects as heap grows. ZGC's stop-the-world phases are limited to root scanning (thread stacks, registers), which is constant regardless of heap size. ZGC does all marking, relocation, and remapping concurrently. Even on 128GB heaps, ZGC pauses remain under 1ms because they only touch roots, not the heap itself.

11. What is the allocation stall in ZGC?

When ZGC cannot keep up with the allocation rate, a thread that tries to allocate finds no free memory and is blocked until the GC frees some - this is called an allocation stall. Unlike normal ZGC pauses, which are brief, an allocation stall can hold an application thread for as long as it takes the cycle to reclaim memory, so it directly impacts application latency. Stalls happen when the heap is too small for the workload or when the allocation rate spikes unexpectedly. Increase heap, enable ZProactive, or reduce allocation rate to fix.

12. What is the difference between ZGC's three pause types?

Pause Mark Start briefly pauses to mark roots from thread stacks. Pause Mark End briefly pauses to finalize marking. Pause Relocate Start briefly pauses to begin relocation. All three are very short (microseconds to sub-millisecond). Everything else - concurrent marking, relocation, remap - runs while the application runs. This is fundamentally different from G1, where object evacuation happens in stop-the-world pauses.

13. Why does Shenandoah work on Java 8 while ZGC requires Java 11+?

Shenandoah was backported to Java 8 by Red Hat before being contributed to OpenJDK. ZGC required significant JVM internal changes that only landed in Java 11. The load barrier implementations differ - ZGC's colored pointers required changes to the JIT compiler and object layout that were not available in Java 8. If you need sub-ms pauses on Java 8, Shenandoah is your option; otherwise upgrade to Java 11+ for ZGC.

14. What is the remap phase in ZGC?

Remap in ZGC updates references to point to objects' new locations after relocation. Unlike Shenandoah which updates references as a separate concurrent phase, ZGC lazily remaps references the first time they are accessed after being relocated. This is possible because colored pointers encode whether an object has been remapped. The load barrier handles remapping on-the-fly, spreading the work across normal memory accesses rather than a dedicated GC phase.

15. What is the relationship between ZGC's page sizes and object sizes?

ZGC divides the heap into small (2MB), medium (32MB), and large (a multiple of 2MB) pages. Small objects (up to 256KB) are allocated in small pages. Medium objects (up to 4MB) use medium pages. Anything larger gets a large page of its own. This three-tier approach reduces internal fragmentation compared to G1's fixed region size while maintaining efficient allocation.

16. How does Shenandoah's concurrent evacuation work and what is its impact on throughput?

Shenandoah evacuates live objects from regions concurrently using Brooks pointer indirection. During concurrent evacuation, application threads can read and write objects while the GC moves them. The Brooks pointer in the old location redirects to the new location. This means reads incur an extra indirection (the pointer check and potential follow), adding roughly 5-10% throughput overhead compared to no-load-barrier collectors. The trade-off is sub-millisecond pauses regardless of heap size, which is worth the overhead for latency-sensitive workloads.

17. What is ZGC's colored pointer encoding and how far can it address?

ZGC stores a few metadata bits (colors) alongside the address in each 64-bit pointer. In the original, non-generational design there are four: Marked0, Marked1, Remapped, and Finalizable. With up to 44 bits used for the address itself, ZGC can address up to 16TB (2^44 bytes), which matches its maximum supported heap size. The colored pointer approach means no extra per-object forwarding word is needed - the reference itself carries the GC state, which is what makes ZGC's load barrier so fast compared to Shenandoah's Brooks pointer.

18. Why does Shenandoah have higher throughput overhead than ZGC for the same workload?

Shenandoah's Brooks pointer requires an extra indirection on every heap read - the load barrier checks the pointer, and if the object was moved, follows the redirect. ZGC stores colors in the pointer itself without indirection - the load barrier reads the pointer bits directly and follows the redirect only if necessary. ZGC's colored pointers typically succeed without a redirect, while Shenandoah's Brooks pointer always requires a follow even when no evacuation happened. Additionally, Shenandoah updates all references in a separate concurrent phase, while ZGC lazily remaps references on access.

19. What are the constraints on object movement in ZGC and Shenandoah during concurrent phases?

Both ZGC and Shenandoah move objects concurrently, even while application threads hold references to them. The load barrier ensures that any reference loaded from the heap is checked, and redirected to the object's new address if it has moved, before the application uses it. When an application thread and a GC thread race to relocate the same object, the forwarding information is installed atomically so that only one copy wins. Pinned objects (for example, arrays held in a JNI critical section) cannot be moved while pinned, and very large objects are generally not relocated.

20. How does ZGC handle GC scheduling and what triggers a ZGC cycle?

ZGC triggers cycles based on allocation rate and heap occupancy. With ZProactive enabled (default), ZGC periodically checks if a cycle is needed even before memory runs out. The ZCollectionInterval flag sets a target time between cycles. When heap occupancy exceeds a threshold (dynamically tuned), ZGC initiates a concurrent cycle. ZGC's proactive scheduling prevents allocation stalls by staying ahead of demand. Under extreme allocation pressure, ZGC can still experience allocation stalls, which are the most impactful latency events to avoid.

Conclusion

ZGC and Shenandoah achieve sub-millisecond pause times by doing almost all GC work concurrently with the application, including compaction. ZGC uses colored pointers (Oracle, Java 11+, best for 16GB+ heaps); Shenandoah uses Brooks pointer indirection (Red Hat, Java 8+ via backport, works across heap sizes). Both add 5-15% CPU overhead compared to G1 but eliminate stop-the-world pauses for marking and compaction. Choose based on your heap size, Java version, and latency requirements.
