GC – 2 [ Garbage Collectors ]

Garbage collectors are the concrete implementations of garbage collection algorithms.

As Java has evolved, the JVM has gained many garbage collectors :

Serial Garbage Collector;
Parallel Garbage Collector;
CMS Garbage Collector;
G1 Garbage Collector;
Z Garbage Collector.

What You Need

About 20 minutes
A favorite text editor or IDE
Java 8 or later

1. Garbage Collectors’ Performance Metrics

1.1 Throughput

The percentage of the total running time that is spent in the application rather than in garbage collection : time spent in application / (time spent in application + time spent in garbage collection).

For instance, if the JVM has been running for 100 minutes and garbage collection has taken 1 minute of that, then the throughput = (100 – 1) / 100 = 99%.
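The arithmetic above can be written as a small helper. This is only a sketch of the formula; the class and method names are made up for illustration.

```java
class ThroughputExample {
    /** Throughput = time spent in the application / total time (application + GC). */
    static double throughput(double totalMinutes, double gcMinutes) {
        if (totalMinutes <= 0 || gcMinutes < 0 || gcMinutes > totalMinutes) {
            throw new IllegalArgumentException("invalid durations");
        }
        return (totalMinutes - gcMinutes) / totalMinutes;
    }

    public static void main(String[] args) {
        // The example from the text: 100 minutes of uptime, 1 minute of GC.
        System.out.printf("throughput = %.0f%%%n", throughput(100, 1) * 100);
    }
}
```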

1.2 Pause Time (STW time)

The amount of time the application’s worker threads are suspended while garbage collection is being performed.

For instance, if the STW time is 100 ms, then during those 100 ms no application threads run; only GC threads do.
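Both metrics can be roughly observed from inside a running JVM through the standard java.lang.management API. The sketch below sums the accumulated collection time reported by every collector and compares it with the JVM uptime. The class and method names are illustrative, and for concurrent collectors the reported time may include more than pure STW pauses, so treat the result as an approximation.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcTimeProbe {
    /** Sums the accumulated collection time (ms) of all collectors; -1 means "undefined" and is skipped. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Approximate throughput so far: share of the uptime not attributed to GC. */
    static double observedThroughput() {
        long uptime = Math.max(1, ManagementFactory.getRuntimeMXBean().getUptime());
        long gc = Math.min(totalGcMillis(), uptime);
        return (double) (uptime - gc) / uptime;
    }

    public static void main(String[] args) {
        System.out.println("GC time so far (ms) : " + totalGcMillis());
        System.out.println("approx. throughput  : " + observedThroughput());
    }
}
```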

1.3 Throughput VS Pause Time

Throughput and Pause Time are a pair of conflicting goals.

If you prioritize high throughput, garbage collection has to run less often, but each collection then has more work to do, so the JVM needs a longer pause to perform it.

Conversely, if you prioritize low pause time, each collection has to stay short, so garbage collection has to run more often, which in turn lowers throughput.

Most modern garbage collectors aim to keep pause time low while maintaining acceptable throughput.

2. Classic Garbage Collectors

Serial Garbage Collectors : Serial, Serial Old;
Parallel Garbage Collectors : ParNew, Parallel Scavenge, Parallel Old;
Concurrent Garbage Collectors : CMS, G1.

These garbage collectors are combined as follows :

Serial, ParNew and Parallel Scavenge are used for garbage collection of the new generation;
Serial Old, CMS and Parallel Old are used for garbage collection of the old generation. If CMS hits a concurrent mode failure, it falls back to Serial Old;
G1 is used for garbage collection of both the new and old generations.
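To see which combination a given JVM is actually running, one option is to list the registered collector beans. This is a minimal sketch using the standard java.lang.management API; the class name is illustrative. The names printed depend on the JVM and its version (on HotSpot, for example, G1 typically reports "G1 Young Generation" and "G1 Old Generation"), so treat any specific name as an assumption.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ActiveCollectors {
    /** Returns the names of the collectors registered in this JVM. */
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Usually one entry for the new generation and one for the old generation.
        System.out.println(names());
    }
}
```

Running it with different collector flags is a quick way to check which pairing a flag really selects.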

2.1 Serial And Serial Old

The Serial garbage collector is the default collector for the new generation in the HotSpot JVM in client mode.

It uses a copying algorithm to collect the new generation with a single thread.

It stops the world (STW) while it is collecting.

For the old generation, Serial Old is the default collector in the HotSpot JVM in client mode.

Like Serial, it collects with a single thread.

It also stops the world, but it uses a mark-compact algorithm instead of copying.

Although the serial garbage collector is rarely used on servers anymore, we can still enable it by running the JVM with the flag -XX:+UseSerialGC.

2.2 ParNew

If the Serial garbage collector is the single-threaded collector for the new generation, then ParNew can be considered its multi-threaded version.

Par stands for parallel and New means that it can only be used for garbage collection of the new generation.

ParNew is the better choice on machines with multiple CPUs; Serial is the better choice when there is only one CPU.

We can use the flag -XX:+UseParNewGC to enable the ParNew garbage collector (older JDKs only; ParNew was removed from recent JDKs together with CMS).

The flag -XX:ParallelGCThreads sets the number of threads used during the parallel phases of the garbage collectors.

The default value varies with the platform on which the JVM is running.

Normally it equals the number of logical CPUs when there are 8 or fewer, and a fraction (about 5/8) of them on larger machines.
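The default for -XX:ParallelGCThreads on HotSpot is commonly described as : all logical CPUs up to 8, plus about 5/8 of the CPUs beyond 8. The sketch below encodes that heuristic as an assumption; the real default can differ by platform, JDK version and container limits, and the class and method names are made up.

```java
class ParallelGcThreadsDefault {
    /** Approximates the commonly documented default : n for n <= 8, else 8 + (n - 8) * 5 / 8. */
    static int defaultThreads(int logicalCpus) {
        if (logicalCpus < 1) {
            throw new IllegalArgumentException("need at least one CPU");
        }
        return logicalCpus <= 8 ? logicalCpus : 8 + (logicalCpus - 8) * 5 / 8;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println(cpus + " logical CPUs -> about " + defaultThreads(cpus) + " parallel GC threads");
    }
}
```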

2.3 Parallel Scavenge And Parallel Old

Besides ParNew garbage collector, Parallel Scavenge is another classic garbage collector used for garbage collection of the new generation.

Unlike ParNew, Parallel Scavenge is a throughput-first garbage collector.

High throughput uses CPU time efficiently, so the program’s computing tasks complete as soon as possible.

It is therefore mainly suited to background computation rather than foreground user interaction.

Since JDK 1.6, the Parallel Old garbage collector has been available to pair with Parallel Scavenge.

Its main job is to collect the old generation in parallel.

Parallel Scavenge and Parallel Old are the default garbage collectors of the new and old generations in JDK 1.8.

Below are some JVM flags for configuring the Parallel Scavenge and Parallel Old garbage collectors :

FLAG	DESCRIPTION
-XX:+UseParallelGC	Enable Parallel Scavenge for the new generation. Setting this flag enables -XX:+UseParallelOldGC automatically.
-XX:+UseParallelOldGC	Enable Parallel Old for the old generation. Setting this flag enables -XX:+UseParallelGC automatically.
-XX:ParallelGCThreads	Set the number of threads used during the parallel phases of the garbage collectors. By default, it equals the number of logical CPUs on machines with up to 8 of them.
-XX:MaxGCPauseMillis	Set the target maximum pause time. The JVM makes its best effort to achieve it. Set this flag with care, because a very short pause target may lead to low throughput.
-XX:GCTimeRatio	Set the ratio between the time spent in GC and the time spent outside of GC. With a value of N, the fraction of time that may be spent in garbage collection is 1 / (1 + N). For example, -XX:GCTimeRatio=9 means that up to 10% of the total time may be spent in garbage collection. Its value lies between 0 and 100; the default is 99 (GC time should not exceed 1%).
-XX:+UseAdaptiveSizePolicy	Enable the adaptive size policy for Parallel Scavenge. In this mode, the JVM automatically adjusts the size of the new generation, the ratio of Eden to Survivor, the age at which surviving objects are promoted to the old generation, and so on. When manual GC tuning is difficult, we can set only the maximum heap size, GCTimeRatio and MaxGCPauseMillis, and let the JVM optimize the rest.
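The 1 / (1 + N) rule for -XX:GCTimeRatio is easy to misread, so here is the calculation as a small sketch. The class and method names are illustrative only.

```java
class GcTimeRatioExample {
    /** Fraction of total time the JVM may spend in GC for -XX:GCTimeRatio=N, i.e. 1 / (1 + N). */
    static double maxGcFraction(int gcTimeRatio) {
        if (gcTimeRatio < 0) {
            throw new IllegalArgumentException("ratio must not be negative");
        }
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        System.out.println("N = 9  -> " + maxGcFraction(9) * 100 + "% of time in GC at most");
        System.out.println("N = 99 -> " + maxGcFraction(99) * 100 + "% of time in GC at most");
    }
}
```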

2.4 CMS (Concurrent Mark Sweep)

CMS is a concurrent garbage collector in the HotSpot JVM; it allows GC threads to run at the same time as application threads.

Its full name is Concurrent Mark Sweep garbage collector.

It focuses on keeping the pause time of application threads during garbage collection as short as possible.

It is a good choice for applications that interact a lot with users, such as websites or B/S (browser/server) systems.

It uses a mark-sweep algorithm and it still has STW phases.

The whole CMS process is divided into 4 stages : initial mark, concurrent mark, remark and concurrent sweep.

STAGE	DESCRIPTION
Initial Mark	Mark the objects that GC roots reference directly. All application threads are suspended during this stage, but since few objects are directly linked to GC roots, it is relatively fast.
Concurrent Mark	Starting from the objects directly referenced by GC roots, traverse the entire object graph. This stage takes a long time, but it runs together with the application threads, so there is no STW.
Remark	Because the application threads and the GC threads run at the same time during the concurrent mark stage, the status of some marked objects may change, so the mark records for those objects have to be corrected. The pause in this stage is longer than in the initial mark stage, but much shorter than the duration of the concurrent mark stage.
Concurrent Sweep	Clean up the objects judged dead in the previous mark stages and release their memory. Since live objects do not need to be moved, this stage can also run at the same time as the application threads.
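The mark and sweep idea behind these stages can be illustrated with a toy, single-threaded simulation. This is not how CMS is implemented (real CMS marks concurrently and needs the remark stage to catch changes). The object model, class and method names are invented for the sketch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

class MarkSweepSketch {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    /** Mark : flag every object reachable from the GC roots as live. */
    static void mark(List<Obj> roots) {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.marked) continue;   // already visited
            o.marked = true;
            pending.addAll(o.refs);
        }
    }

    /** Sweep : drop unmarked objects without moving survivors; returns how many were reclaimed. */
    static int sweep(List<Obj> heap) {
        int before = heap.size();
        heap.removeIf(o -> !o.marked);
        heap.forEach(o -> o.marked = false);   // reset marks for the next cycle
        return before - heap.size();
    }

    /** Builds a tiny heap (a -> b reachable, c <-> d unreachable) and collects it. */
    static int demo() {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d");
        a.refs.add(b);
        c.refs.add(d);
        d.refs.add(c);   // an unreachable cycle is still garbage
        List<Obj> heap = new ArrayList<>(Arrays.asList(a, b, c, d));
        mark(Collections.singletonList(a));
        return sweep(heap);
    }

    public static void main(String[] args) {
        System.out.println("reclaimed objects : " + demo());
    }
}
```

Because survivors stay where they are, the freed slots are scattered across the heap, which is where the fragmentation problem of mark-sweep comes from.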

CMS has the following shortcomings :

Low throughput : the low pause time comes at the cost of lower throughput, because the concurrent GC threads take CPU time away from the application;
Unable to handle floating garbage : floating garbage is the garbage produced by application threads that keep running during the concurrent stages. It can only be collected at the next GC. Because of floating garbage, part of the memory has to be kept in reserve, which means that CMS cannot wait for the old generation to be full before collecting. If the reserved memory is not enough to hold the floating garbage, a Concurrent Mode Failure occurs, and the JVM temporarily falls back to Serial Old in place of CMS;
Memory fragmentation : because CMS uses a mark-sweep algorithm, memory fragmentation is inevitable. It does not use a mark-compact algorithm because moving all live objects to new addresses is not feasible while application threads run at the same time as GC threads.

Below are some classic JVM flags that we can configure for CMS :

FLAG	DESCRIPTION
-XX:+UseConcMarkSweepGC	Enable CMS for the old generation. It also automatically enables the ParNew garbage collector (-XX:+UseParNewGC) for the new generation, giving ParNew + CMS (with Serial Old as backup).
-XX:CMSInitiatingOccupancyFraction	Set a threshold for memory usage in the old generation. CMS is triggered when this threshold is reached. If memory usage grows slowly, a slightly larger value reduces how often CMS is triggered. If memory usage grows fast, a smaller value is better : CMS is triggered more often, but this avoids frequent fallbacks to Serial Old (Full GC) caused by CMS concurrent mode failures.
-XX:+UseCMSCompactAtFullCollection	After every Full GC, compact the memory space to reduce memory fragmentation, at the cost of a longer pause time.
-XX:CMSFullGCsBeforeCompaction	Set after how many Full GCs the memory space is compacted.
-XX:ParallelCMSThreads	Set the number of threads for CMS. By default, it equals (ParallelGCThreads + 3) / 4.
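The default for the CMS thread count is an integer division, which is worth spelling out. A minimal sketch of the formula from the table, with illustrative class and method names :

```java
class CmsThreadsDefault {
    /** Default CMS thread count as stated in the table : (ParallelGCThreads + 3) / 4, rounded down. */
    static int defaultCmsThreads(int parallelGcThreads) {
        if (parallelGcThreads < 1) {
            throw new IllegalArgumentException("need at least one parallel GC thread");
        }
        return (parallelGcThreads + 3) / 4;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 4, 8, 13}) {
            System.out.println("ParallelGCThreads=" + n + " -> CMS threads=" + defaultCmsThreads(n));
        }
    }
}
```

In effect, CMS uses roughly a quarter of the parallel GC threads, and always at least one.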

2.5 G1 (Garbage First)

The ideal garbage collector would have both low pause time and high throughput.

But in reality, low pause time and high throughput pull against each other : improving one tends to degrade the other.

From the garbage collector’s point of view, Parallel Scavenge + Parallel Old focus more on high throughput, and ParNew + CMS focus more on low pause time.

Is there a garbage collector that targets both high throughput and low pause time, striking a balance between the two?

This is where the G1 garbage collector comes in.

It is a server-style garbage collector, targeted at multi-processor machines with large memories.

It is fully supported in Oracle JDK 7 update 4 and later releases, and it has been the default garbage collector since JDK 9.

It is planned as the long term replacement for the Concurrent Mark-Sweep Collector (CMS).

Comparing G1 with CMS, there are differences that make G1 a better solution.

One of the differences is that G1 is a compacting collector.

Also, G1 offers more predictable garbage collection pauses than the CMS collector, and allows users to specify desired pause targets.

2.5.1 G1 Overview

The older garbage collectors (serial, parallel, CMS) all structure the heap into two sections: young generation and old generation of a fixed memory size.

All memory objects end up in one of these two sections.

The G1 collector takes a different approach.

The heap is one memory area split into many fixed sized regions.

Region size is chosen by the JVM at startup.

The JVM generally targets around 2000 regions, each between 1 MB and 32 MB in size.

In reality, these regions are mapped into logical representations of Eden, Survivor, and old generation spaces.

In addition, there is a fourth type of region known as Humongous regions.

These regions are designed to hold objects that are 50% the size of a standard region or larger.
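The region sizing and the humongous threshold can be sketched as follows. This is a simplified approximation, not the JVM’s actual ergonomics : it assumes the region size is the heap size divided by a target of 2048 regions, rounded down to a power of two and clamped to the 1 MB to 32 MB range. The class and method names are made up.

```java
class G1RegionSketch {
    static final long MB = 1024L * 1024L;
    static final long TARGET_REGIONS = 2048;   // "around 2000 regions"

    /** Approximate region size : heap / 2048, rounded down to a power of two, within [1 MB, 32 MB]. */
    static long regionSize(long heapBytes) {
        long raw = Math.max(heapBytes / TARGET_REGIONS, 1);
        long powerOfTwo = Long.highestOneBit(raw);
        return Math.min(32 * MB, Math.max(MB, powerOfTwo));
    }

    /** Per the text : an object is humongous when it is 50% of a region or larger. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    public static void main(String[] args) {
        long region = regionSize(8L * 1024 * MB);   // 8 GB heap
        System.out.println("region size (MB)    : " + region / MB);
        System.out.println("2 MB array humongous : " + isHumongous(2 * MB, region));
    }
}
```

An application that allocates many objects just over the threshold may benefit from a larger -XX:G1HeapRegionSize, since fewer of its objects would then count as humongous.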

G1 focuses on collecting the areas of memory with the most garbage first, which usually yields a large amount of free space, in order to optimize performance and reduce pause times.

This is why this method of garbage collection is called Garbage-First.

G1 uses a pause prediction model to meet a user-defined pause time target and selects the number of regions to collect based on the specified pause time target.
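The garbage-first idea can be illustrated with a greedy selection : rank regions by how much garbage they free per millisecond of predicted work, then add regions until the pause budget is used up. This is a deliberately simplified sketch, not G1’s real prediction model; the Region fields, the sample numbers and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GarbageFirstSketch {
    static final class Region {
        final String id;
        final long garbageMb;           // space that collecting this region would free
        final double predictedMillis;   // predicted cost of evacuating its live objects

        Region(String id, long garbageMb, double predictedMillis) {
            this.id = id;
            this.garbageMb = garbageMb;
            this.predictedMillis = predictedMillis;
        }

        double efficiency() {
            return garbageMb / predictedMillis;
        }
    }

    /** Picks the most efficient regions first and stops once the next one would exceed the target. */
    static List<Region> chooseCollectionSet(List<Region> candidates, double pauseTargetMillis) {
        List<Region> byEfficiency = new ArrayList<>(candidates);
        byEfficiency.sort((a, b) -> Double.compare(b.efficiency(), a.efficiency()));
        List<Region> chosen = new ArrayList<>();
        double budget = pauseTargetMillis;
        for (Region r : byEfficiency) {
            if (r.predictedMillis > budget) break;
            chosen.add(r);
            budget -= r.predictedMillis;
        }
        return chosen;
    }

    /** Hypothetical sample data, deliberately unsorted. */
    static List<Region> sampleRegions() {
        return Arrays.asList(
            new Region("D", 100, 100),
            new Region("B", 800, 80),
            new Region("A", 900, 40),
            new Region("C", 300, 50));
    }

    public static void main(String[] args) {
        for (Region r : chooseCollectionSet(sampleRegions(), 200)) {
            System.out.println(r.id + " frees " + r.garbageMb + " MB in about " + r.predictedMillis + " ms");
        }
    }
}
```

With a 200 ms target the sketch picks A, B and C and leaves out D, which would cost the most time for the least reclaimed space.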

G1 goes beyond the capabilities of both of the previous garbage collectors :

The CMS (Concurrent Mark Sweep) garbage collector does not do compaction.

The ParallelOld garbage collector performs only whole-heap compaction, which results in considerable pause times.

G1 has both concurrent (runs along with application threads, e.g., refinement, marking, cleanup) and parallel (multi-threaded, e.g., stop the world) phases.

However, full garbage collections are single threaded in JDK 9 and earlier (they run in parallel since JDK 10), and a properly tuned application should avoid full GCs anyway.

2.5.2 G1 Garbage Collection Steps

For the young generation in G1, the following can be said :

The heap is a single memory space split into approximately 2000 regions.
Minimum size is 1 MB and maximum size is 32 MB;
Young generation memory is composed of a set of non-contiguous regions.
This makes it easy to resize when needed;
Young generation garbage collections, or young GCs, are stop the world events.
All application threads are stopped for the operation;
The young GC is done in parallel using multiple threads;
Live objects are copied to new survivor or old generation regions.

[Figure : G1 young generation regions before and after a young GC]

G1 performs the following phases on the old generation of the heap :

PHASE	DESCRIPTION
1. Initial Mark	Mark survivor regions (root regions) which may have references to objects in old generation. This is a stop the world event. In the logs this is noted as GC pause (young)(initial-mark).
2. Root Region Scanning	Scan survivor regions for references into the old generation. This happens while the application continues to run. The phase must be completed before a young GC can occur.
3. Concurrent Marking	Find live objects over the entire heap. This happens while the application is running. If empty regions are found, they are removed immediately in the Remark phase. Also, accounting information that determines liveness is calculated.
4. Remark	Completes the marking of live objects in the heap. Empty regions are removed and reclaimed. Region liveness is now calculated for all regions. Uses an algorithm called snapshot-at-the-beginning (SATB) which is much faster than what was used in the CMS collector.
5. Cleanup	G1 selects the regions with the lowest liveness, which can be scrubbed or collected the fastest. It is an STW event.
6. Copying	G1 evacuates or copies live objects to new unused regions. This can be done with young generation regions which are logged as [GC pause (young)]. Or both young and old generation regions which are logged as [GC Pause (mixed)].

Below are a few key points about the G1 garbage collection on the old generation :

Concurrent Marking Phase : Liveness information is calculated concurrently while the application is running.
This liveness information identifies which regions will be best to reclaim during an evacuation pause.

Remark Phase : Completely empty regions are reclaimed.
Copying/Cleanup Phase : Old generation regions are selected based on their liveness.
Young generation and old generation are reclaimed at the same time.

2.5.3 G1 Flags

G1 has the following commonly used JVM flags :

FLAG	DESCRIPTION
-XX:+UseG1GC	Tells the JVM to use the G1 Garbage collector.
-XX:G1HeapRegionSize	Sets the size of a G1 region. The value will be a power of two and can range from 1MB to 32MB.
-XX:MaxGCPauseMillis	Sets a target for the maximum GC pause time. This is a soft goal, and the JVM will make its best effort to achieve it. Therefore, the pause time goal will sometimes not be met. The default value is 200 milliseconds.
-XX:ParallelGCThreads	Sets the number of STW worker threads. By default, it is the same as the number of logical processors, up to a value of 8. With more than 8 logical processors, it is approximately 5/8 of them.
-XX:ConcGCThreads	Sets the number of parallel marking threads. It is recommended to set its value to approximately 1/4 of the number of parallel garbage collection threads (ParallelGCThreads).
-XX:InitiatingHeapOccupancyPercent	Percentage of the (entire) heap occupancy to start a concurrent GC cycle. A value of 0 denotes ‘do constant GC cycles’. The default value is 45 (i.e., 45% full or occupied).
-XX:G1ReservePercent	Sets the percentage of heap that is reserved as a false ceiling to reduce the possibility of promotion failure. The default value is 10.

2.5.4 G1 Best Practices

The main design goal of G1 is to simplify JVM performance tuning.

In general, we only need the following three steps to use G1 :

Enable G1 garbage collector;
Set the maximum value of heap size;
Set the maximum GC pause time target.
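These three steps map onto three command-line flags. The sketch below only assembles such a command line; the heap size, the pause target and the main class name com.example.App are placeholders, and the helper itself is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class G1LaunchSketch {
    /** Builds a java command line that enables G1, caps the heap and sets a pause target. */
    static List<String> g1Command(String maxHeap, int maxPauseMillis, String mainClass) {
        return Arrays.asList(
            "java",
            "-XX:+UseG1GC",                              // step 1 : enable G1
            "-Xmx" + maxHeap,                            // step 2 : maximum heap size
            "-XX:MaxGCPauseMillis=" + maxPauseMillis,    // step 3 : pause time target
            mainClass);
    }

    public static void main(String[] args) {
        // Prints: java -XX:+UseG1GC -Xmx4g -XX:MaxGCPauseMillis=200 com.example.App
        System.out.println(String.join(" ", g1Command("4g", 200, "com.example.App")));
    }
}
```

The resulting list could be handed to a ProcessBuilder, or the printed line copied into a start script.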

There are a few best practices to follow when using G1 :

Do not set the young generation size : explicitly setting the young generation size via -Xmn or -XX:NewRatio overrides the default behavior of the G1 collector.
G1 will no longer respect the pause time target for collections;
Avoid evacuation failure : increase the heap size, increase -XX:G1ReservePercent, or increase the number of marking threads with -XX:ConcGCThreads;
Do not be too aggressive when setting -XX:MaxGCPauseMillis : consider setting a value that the JVM can achieve at least 90% of the time.
This means that for 90% of user requests, the GC pause will stay at or below your target.
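One way to find a value that the JVM can meet 90% of the time is to take the 90th percentile of the pause times already observed, for example from GC logs. The sketch below uses the nearest-rank method. The sample pause values are invented, and the class and method names are illustrative.

```java
import java.util.Arrays;

class PauseTargetSketch {
    /** Nearest-rank percentile of observed pause times, in milliseconds. */
    static long percentile(long[] pausesMillis, int pct) {
        if (pausesMillis.length == 0 || pct < 1 || pct > 100) {
            throw new IllegalArgumentException("need samples and 1 <= pct <= 100");
        }
        long[] sorted = pausesMillis.clone();
        Arrays.sort(sorted);
        int rank = (pct * sorted.length + 99) / 100;   // 1-based rank, rounded up
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        long[] observed = {120, 80, 95, 210, 150, 60, 180, 130, 90, 400};   // hypothetical samples
        System.out.println("p90 pause (ms) : " + percentile(observed, 90));
    }
}
```

For these samples the 90th percentile is 210 ms, so a target much below that would be missed more than 10% of the time under the same load.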

2.6 ZGC (Z Garbage Collector)

Java 11 introduced the Z Garbage Collector (ZGC) as an experimental garbage collector implementation.

Today, it’s not uncommon for applications to serve thousands or even millions of users concurrently.

Such applications need enormous amounts of memory, but managing all that memory may easily impact application performance.

ZGC aims to manage garbage collection for those multi-terabyte heaps with low pause times (<10ms).

Since JDK 15, ZGC is no longer experimental and is ready for production.

2.6.1 ZGC Overview

ZGC is a concurrent, single-generation, region-based, NUMA-aware, low-latency, compacting collector (a generational mode was added later, in JDK 21).

NUMA-aware means that a system or software (like a garbage collector) is designed to work efficiently on NUMA (Non-Uniform Memory Access) hardware.

In NUMA systems, memory is divided into regions attached to specific CPUs;
Accessing memory local to a CPU is faster than accessing memory attached to another CPU.

NUMA-aware software :

Optimizes memory allocation and access to prefer local memory;
Reduces latency and improves performance on multi-CPU servers.

ZGC divides memory into regions, also called ZPages, which can be dynamically created and destroyed.

ZPages also come in different sizes, all multiples of 2 MB : Small (2 MB), Medium (32 MB), Large (N * 2 MB).

A ZGC heap can contain many regions of each kind, and the medium and large regions are allocated contiguously.
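The three page classes can be sketched as a function from object size to page size. The object size thresholds used here (up to 256 KB for small pages, up to 4 MB for medium pages) are not stated in this article; they are an assumption based on how ZGC’s page classes are commonly documented. The class and method names are made up.

```java
class ZPageSketch {
    static final long KB = 1024L;
    static final long MB = 1024L * KB;

    /** Returns the size of the page class that would hold an object of the given size. */
    static long pageSizeFor(long objectBytes) {
        if (objectBytes <= 256 * KB) {
            return 2 * MB;                  // small page (assumed threshold)
        }
        if (objectBytes <= 4 * MB) {
            return 32 * MB;                 // medium page (assumed threshold)
        }
        long granule = 2 * MB;              // large page : N * 2 MB, rounded up
        return (objectBytes + granule - 1) / granule * granule;
    }

    public static void main(String[] args) {
        System.out.println("100 KB object -> " + pageSizeFor(100 * KB) / MB + " MB page");
        System.out.println("1 MB object   -> " + pageSizeFor(MB) / MB + " MB page");
        System.out.println("5 MB object   -> " + pageSizeFor(5 * MB) / MB + " MB page");
    }
}
```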

With other garbage collectors, when an application allocates a really big object, it often takes multiple GC cycles to free up enough contiguous space.

If no such space is available even after multiple GC cycles, the allocation fails with an OutOfMemoryError.

However, this particular use case is not an issue with the ZGC.

Since the physical heap regions of ZGC can map into a bigger heap address space, locating a bigger contiguous space is feasible.

2.6.2 ZGC Garbage Collection Steps

The GC cycle of ZGC is built around three short pauses, with the heavy work done concurrently between them :

Pause Mark Start : a short pause that starts marking.
ZGC then walks through the object graph to mark objects as live or garbage.
This phase also includes the remapping of live data.
It is one of the most heavy-duty workloads in the ZGC GC cycle.
Pause Mark End : a short synchronization pause of about 1 ms that ends marking.
It is followed by reference processing and the relocation set selection.
ZGC marks the regions it wants to compact.
Pause Relocate Start : ZGC triggers the actual region compaction.

2.6.3 ZGC Flags

ZGC has the following JVM tuning flags :

FLAG	DESCRIPTION
-XX:+UnlockExperimentalVMOptions -XX:+UseZGC	Enable ZGC from JDK 11 up to (but not including) JDK 15; the experimental options have to be unlocked.
-XX:+UseZGC	Enable ZGC from JDK 15 onward.
-Xmx	Set the max heap size. It is one of the most important options to tune in ZGC. Since ZGC is a concurrent collector, the max heap size must be chosen so that the heap can accommodate the live set of the application and still has enough headroom to service allocations while the GC is running.
-XX:ConcGCThreads	Set the number of concurrent GC threads. A higher value leaves less CPU time for the application. On the other hand, a lower value may leave the application struggling for memory, because it may generate garbage faster than ZGC can collect it.
-XX:+ZUncommit	Return unused memory to the operating system. By default, ZGC uncommits unused memory, returning it to the operating system. This is useful for applications and environments where memory footprint is a concern.
-XX:+UseLargePages	Enable large pages to improve application performance. Configuring large pages in the operating system typically requires root privileges.
-XX:+UseTransparentHugePages	Enable transparent huge pages.
-XX:+UseNUMA	ZGC is a NUMA-aware GC. Applications executing on the NUMA machine can result in a noticeable performance gain. By default, NUMA support is enabled for ZGC.

2.6.4 ZGC Best Practices

The main design goal of ZGC is to simplify JVM performance tuning.

In general, we only need the following two steps to use ZGC :

Enable ZGC garbage collector;
Set the maximum value of heap size.

2.7 G1 vs ZGC

In recent JDKs (e.g., JDK 21), the default garbage collector is G1 (Garbage-First Garbage Collector).

ZGC is not the default GC because :

ZGC is optimized for low latency and large heaps, but it may have more CPU and memory overhead than G1 in typical workloads;
G1 GC is more mature and broadly tested across many types of applications, making it a safer default for most users;
ZGC is still evolving and may not be ideal for all scenarios, especially small heaps or environments where ultra-low latency is not required.

G1 is the default because it offers a good balance of performance, predictability, and compatibility for most applications.

You should prefer ZGC over G1 GC when :

Your application requires very low GC pause times (typically less than 2ms);
You have a large heap size (hundreds of GB to several TB);
You run latency-sensitive workloads (e.g., real-time trading, gaming servers, large-scale web services);
You want concurrent GC with minimal impact on application threads.

So use ZGC for ultra-low latency and large heap applications.

For general-purpose or smaller heap apps, G1 is usually sufficient.

3. Guidelines for Selecting a Garbage Collector

Unless the application has rather strict pause-time requirements, first run the application and allow the VM to select a collector.

If necessary, adjust the heap size to improve performance.

If the performance still doesn’t meet the goals, then use the following guidelines as a starting point for selecting a collector :

GC	Serial / Concurrent / Parallel	New / Old Generation	Algorithm	Throughput / Pause Time	Use Case
Serial	Serial	New	Copy	Pause Time	single CPU, client mode, small memory footprint
Serial Old	Serial	Old	Mark Compact	Pause Time	single CPU, client mode, small memory footprint
ParNew	Parallel	New	Copy	Pause Time	multiple CPUs, server mode
CMS	Concurrent	Old	Mark Sweep	Pause Time	scenarios that need lots of interactions with the users
Parallel Scavenge	Parallel	New	Copy	Throughput	background operations without lots of interactions with users
Parallel Old	Parallel	Old	Mark Compact	Throughput	background operations without lots of interactions with users
G1	Parallel and Concurrent	New and Old	Copy and Mark Compact	Pause Time over Throughput	response time is more important than overall throughput and garbage collection pauses must be kept shorter
ZGC	Parallel and Concurrent	New and Old	Copy and Mark Compact	Throughput and Pause Time	response time is a high priority, and/or with a very large heap
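As a rough aid, the table can be condensed into a small decision helper. This is only a sketch of the guidelines above, not an official selection procedure : the thresholds (a pause goal of 2 ms or less, a heap of 100 GB or more) are illustrative assumptions, and every name is hypothetical.

```java
class CollectorChooser {
    enum Gc { SERIAL, PARALLEL, G1, ZGC }

    /** Suggests a collector from a few coarse inputs; a starting point only. */
    static Gc suggest(int cpus, long heapGb, boolean throughputFirst, long maxPauseMillis) {
        if (cpus <= 1) {
            return Gc.SERIAL;        // single CPU, small footprint
        }
        if (maxPauseMillis <= 2) {
            return Gc.ZGC;           // very strict pause goal (assumed threshold)
        }
        if (throughputFirst) {
            return Gc.PARALLEL;      // background work, pauses matter less
        }
        if (heapGb >= 100) {
            return Gc.ZGC;           // very large heap (assumed threshold)
        }
        return Gc.G1;                // balanced general-purpose choice
    }

    public static void main(String[] args) {
        System.out.println(suggest(8, 4, false, 200));     // G1
        System.out.println(suggest(32, 512, false, 200));  // ZGC
    }
}
```

Whatever such a helper suggests, the choice should still be validated by measuring the application under realistic load.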