GC – 1 [ Garbage Collection ]

Garbage Collection is another important part of JVM execution engine.

It tracks each and every object available in the JVM heap space and method area, then it removes unused ones.

An object is considered as garbage when it can no longer be reached from any pointer in the running program.

If the garbage in memory is not cleaned up in time, the memory space occupied by these garbage objects will be retained until the end of the application, and the reserved space cannot be used by other objects.

In this case, the memory will be used up sooner or later.

This is why garbage collection is required.

It helps to free useless objects and to clear record fragments in memory automatically.

Programmers do not need to handle manually memory allocation/deallocation.

But it might result in application stopping unpredictably.

What You Need

  • About 19 minutes
  • A favorite text editor or IDE
  • Java 8 or later

1. Garbage Detection Stage

Before performing garbage collection, it is first necessary to distinguish which objects are alive and which objects are dead in the memory.

During garbage collection, only objects identified as dead will have their occupied memory released.

This process is called the garbage marking phase and there are 2 algorithms for this.

1.1 Reference Counting Algorithm

Add a reference counter to the object.

When the object is referenced by another object, its counter is incremented by 1, and the counter is decremented by 1 when the reference is invalid.

Objects with a reference count of 0 can be garbage collected.

But in the case of circular references between two objects, the reference counter will never be 0, so they cannot be recycled, which leads to memory leak.

Because of this, the JVM does not use reference counting algorithms.

1.2 Reachability Analysis Algorithm

Searching GC Roots as the starting point, then from GC Roots, all the objects that can be reached are alive, and all the unreachable objects can be recycled.

Compared with the reference counting algorithm, the reachability analysis algorithm not only has the characteristics of simple implementation and efficient execution, but more importantly, it can effectively solve the problem of circular references in the reference counting algorithm.

The JVM uses this algorithm to determine whether the object can be recycled.

GC Roots generally include the following :

  • Objects referenced in the JVM stack;
  • Objects referenced in the native method stack;
  • Objects referenced by the static property of the class in the method area;
  • Objects referenced by the constants in the method area;
  • Objects held by synchronized locks.

In general, since JVM heap is the priciple target of garbage collection, so if an object has a reference to another object in the heap, but itself is not in the heap, then it can be considered as a gc root.

If the reachability analysis algorithm is to be used to determine whether the memory can be reclaimed, the analysis must be performed in a snapshot that can guarantee consistency, otherwise the accuracy of the analysis results cannot be guaranteed.

This is also an important reason why there is STW (Stop The World) when gc is running.

2. Garbage Collection Stage

After successfully distinguishing between live objects and recyclable objects in memory, the next task is to perform garbage collection.

There are currently three common garbage collection algorithms.

2.1 Mark-Sweep Algorithm

Mark : from each GC Roots, mark all the reachable objects (alive objects);

Sweep : recycle all the non-marked objects (non-reachable objects).

Mark-Sweep algorithm has below shortcomings :

  • It needs to traverse all the memory area twice (mark x 1 + sweep x 1), so the process is not very efficient;
  • Application is under STW (Stop The World) during the process beacause it has to give up cpu time to gc thread;
  • A large number of discontinuous memory fragments will be generated after gc, resulting in the inability to allocate memory for large objects;
  • A large number of discontinuous memory fragments will be generated after gc, resulting in the need of maintaining a free list in order to allocate memory to new objects;
  • To store the maintained free list, it also needs memory space.
    The so called ‘Sweep’ will not remove the object from memory but put its memory address into the free list, so new object can be allocated to its memory space.

2.2 Mark-Compact Algorithm

The final effect of the Mark-Compact algorithm is equivalent to performing memory defragmentation after the Mark-Sweep algorithm has been executed.

Therefore, it is also called Mark-Sweep-Compact algorithm.

Mark : from each GC Roots, mark all the reachable objects (alive objects);

Sweep : recycle all the non-marked objects (non-reachable objects);

Compact : move all alive objects at one end of memory.

In this way, this algorithm does not need to maintain a free list for memory allocation of new objects, only an available memory start address is sufficient.

Allocate memory for large objects is more possible than Mark-Sweep algorithm.

Mark-Compact algorithm has below shortcomings :

  • While moving the object, if the object is referenced by other objects, it also needs to adjust the address of the reference;
  • Application is under STW (Stop The World) during the process because it has to give up cpu time to gc thread.

2.3 Copy Algorithm

Divide the memory into two blocks of equal size, and use only one of them at a time.

During garbage collection, copy alive objects from in-use memory to unused memory block.

Then clean up the used memory space, in this way, the role of two memory blocks are switched.

This algorithm will not create fragments in memory, however its main disadvantage is that only half of the memory is used.

If there are lots of dead objects, then this algorithm is efficient, otherwise it needs to copy and move lots of alive objects from one memory block to another, which is not efficient.

For this reason, this algorithm is used for garbage collection of new generation, because in new generation, almost 70% – 99% of objects can be recycled during MinGC.

Instead of dividing the new generation into two equal-sized blocks, JVM divide new generation into a larger Eden space and two smaller Survivor spaces.

JVM uses the Eden space and one of the Survivors.

During a Minor GC, the live objects in Eden and one Survivor space are copied to the other Survivor space in a single operation.

Finally, Eden and the previously used Survivor space are cleared.

3. Object Finalization Mechanism

The objects’ finalization mechanism allows developers to provide custom processing logic before objects are destroyed.

For a recyclable object, before destroying it, gc will call its finalize() method, the thread in charge of executing this method is called finalizer.

The main purpose is to release resources used by object before it is removed from the memory.

However, try-finally mechanism can do better than finalize() method for this kind of job.

Because finalize() method has below disadvantages :

  • it may bring an object back to life;
  • its execution time is not guaranteed, it depends on the thread of gc, if no gc, this method may never be executed;
  • a heavy operations in this method may severely affects gc performance.

Below is an example of bring a recyclable object back to life :

 1public class FinalizeTest {
 2    private static Resurrected obj;
 3
 4    public static void main(String[] args) throws InterruptedException {
 5        obj = new Resurrected();
 6
 7        obj = null;
 8
 9        System.gc();
10
11        System.out.println("GC time");
12
13        Thread.sleep(2000);
14
15        if (obj == null) {
16            System.out.println("obj is dead");
17        } else {
18            System.out.println("obj is still alive");
19        }
20    }
21
22    private static class Resurrected {
23        protected void finalize() throws Throwable {
24            obj = this; // if comment this line, obj will be recycled after gc
25        }
26    }
27}

Below is the output of above code snippet :

GC time
obj is still alive

If we comment the line 24 and rerun above code snippet, the output is like below :

GC time
obj is dead

It is recommanded that we shall never put any code in finalize() method nor call it directly in our code.

It has to be noticed that finalize has been deprecated starting with Java 9, and will eventually be removed.

4. System.gc()

Explicitly calling System.gc() is known for being a bad practice.

Beacause it will trigger Full GC which recycle new and old generations and cause STW.

And also there is no guarantee that the actual GC will be triggered :

public class SystemGCTest {
    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            new MyClass(i);

            // If System.gc() is guaranted that the actual GC will be triggered
            // Then the result would have been like below

            // 0 is being created
            // 0 is getting garbage collected
            // 1 is being created
            // 1 is getting garbage collected
            // 2 is being created
            // 2 is getting garbage collected
            System.gc();
            // System.runFinalization();
        }
    }

    private static class MyClass {
        private int cid;

        MyClass(int cid) {
            this.cid = cid;
            System.out.printf("%s is being created\n", cid);
        }

        @Override
        protected void finalize() throws Throwable {
            System.out.printf("%s is getting garbage collected\n", cid);
        }
    }
}

Below is the output of above code snippet :

0 is being created
1 is being created
2 is being created
0 is getting garbage collected
1 is getting garbage collected
2 is getting garbage collected

We can prevent System.gc() from doing any work by using the -XX:DisableExplicitGC JVM flag.

An explicit call to the System.gc() is more a debugging practice for memory leak analysis than something we would like to keep in the production code.

Calling System.gc() and seeing heap space still being high might be an indication of a memory leak.

5. STW (Stop The World)

It refers to the pause of the application during the occurrence of the gc event.

In the reachability analysis algorithm, during the detection of gc roots, all java execution threads will be suspended.

Because the analysis work must be done in a snapshot that ensures consistency.

Consistency refers to the fact that the entire execution system appears to be frozen at a certain point in time.

Otherwise if the object reference relationship is constantly changing during the analysis process, the accuracy of the analysis results will not be guaranteed.

The suspended application is resumed after completing the gc.

We should do the best to reduce STW frequency.

STW has nothing to do with which garbage collector is used, all garbage collectors have this event.

STW is automatically initiated and completed in the background, it is invisible for users.

Below is an example to observe the existence of STW :

import java.util.ArrayList;
import java.util.List;

public class STWTest {
    public static void main(String[] args) {
        TimerThread timer = new TimerThread();
        timer.start();

        // normally if uncomment below gc() method
        // the output should be like :
        // 1.0 secs
        // 1.1 secs
        // 1.1 secs
        // 1.0 secs
        // 1.1 secs
        // 1.1 secs
        // 1.1 secs
        gc();
    }

    private static void gc() {
        List<byte[]> list = new ArrayList<>();
        while (true) {
            for (int i = 0; i < 10000; i++) {
                list.add(new byte[1024]);
            }

            list.clear();
            System.gc();
        }
    }

    private static class TimerThread extends Thread {
        @Override
        public void run() {
            long previous = System.currentTimeMillis();

            while (true) {
                sleep(1000);

                long current = System.currentTimeMillis();
                long duration = current - previous;
                previous = current;

                System.out.println(duration / 1000 + "." 
                + duration % 1000 + " secs");
            }
        }

        private void sleep(int millis) {
            try {
                Thread.currentThread().sleep(millis);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
}

Below is the output of above code snippet :

1.8 secs
1.226 secs
1.3 secs
1.3 secs
1.0 secs
1.7 secs
1.3 secs
1.0 secs
1.6 secs
1.9 secs
1.5 secs
1.27 secs
1.11 secs
1.0 secs
1.6 secs
1.3 secs

6. Concurrency And Parallelism

6.1 Concurrency In Operating System

In operating system, concurrency refers to the fact that, during a period of time, several programs are in the period between being started and running, and these programs are all running on the same processor.

Concurrency is not real simultaneous, the CPU divides a time period into several time segments, and then switches back and forth between these time segments.

Since the CPU processing speed is very fast, user can feel programs running at the same time.

6.2 Parallelism In Operating System

If there are more than one cpu, when one cpu executes a process, the other cpu can execute another process.

The two processes do not preempt cpu resources from each other, and can be performed at the same time, which is called parallelism.

The factor that determines parallelism is not necessarily the number of CPUs, but the number of cores of the CPU.

A CPU with multiple cores can also achieve parallelism.

Concurrency vs Parallelism In Operating System

CONCURRENCYPARALLELISM
Multiple things happened simultaneously at a period of timeMultiple things happened simultaneously at a point of time
Multiple concurrent tasks preempt each other for resourcesMultiple tasks in parallel do not preempt each other for resources
One CPU with one coreOne CPU with multi cores / Multiple CPUs with one or more cores

6.3 Concurrency In Garbage Collection

In garbage collection, concurrency means that the garbage collection threads and the user threads are executed at the same time, and the garbage collection thread will not stop the operation of the user program.

Concurrent garbage collectors are CMS, G1 …, but they still have STW.

6.4 Parallelism In Garbage Collection

It refers to multiple garbage collection threads working in parallel, but at this time the user thread is still in a waiting state.

Parallel garbage collectors are ParNew, Parallel Scavenge, Parallel Old …

Serial garbage collection refers to only one garbage collection thread is recycling, and at this time the user thread is still in a waiting state.

Serial garbage collectors are Serial, Serial Old …

Concurrency vs Parallelism vs Serial In Garbage Collection

CONCURRENCYPARALLELISMSERIAL
Garbage collection threads can be executed at the same time with user threadsMultiple garbage collection threads run at the same time to recycle garbageSingle garbage collection thread run every time to recycle garbage
CMS, G1ParNew, Parallel Scavenge, Parallel OldSerial, Serial Old
STWSTWSTW

7. Safe Point And Safe Region

7.1 Safe Point

When the program is executing, it is not possible to stop it and start gc in all places, only at specific positions to stop it and start gc.

These specific positions are called safe points.

The choice of the safe point is very important.

If it is not frequent, it may cause the gc to wait too long.

If it is too frequent, it may cause low performance at runtime.

In general, a big part of instructions are very fast to execute.

Some instructions with longer execution time should be selected as safe points, such as method calls, loop jumps and exception jumps, etc.

7.2 Safe Region

Some threads are in sleep or blocked state, and these threads cannot go to the safe points.

In this case, a safe region is required.

A safe region means that in a piece of code, the relationship of objects’ references will not change, and it is safe to start gc anywhere in this area.

The safe region is the extension of safe points.

8. Object References

If there is such a kind of objects, when the memory space is enough, it can be kept in the memory, and if the memory space is not enough, these objects are discarded without harm.

In java, there are 4 types of references : Strong, Soft, Weak, Phantom.

The strength of these four references decreases gradually.

Except strong reference, the other references can be found in java.lang.ref, the developers can use them directly in the program.

8.1 Strong Reference

The most traditional definition of reference, such as Object obj = new Object(), as long as a strong reference relationship still exists, the garbage collector will never reclaim the referenced object.

In java, when a new object is created using the new operator and assigned to a variable, the variable becomes a strong reference to the object.

It is the default reference type, more than 99% of ordinary systems are strong references.

If an object referenced by strong reference is still reachable, garbage collector will never recycle it.

Only strong reference can cause memory leak or OOM.

public class StrongReferenceTest {
    public static void main(String[] args) throws InterruptedException {
        Object obj = new Object();

        // uncomment below will make obj unreachable and recyclable by gc
        // obj = null;

        System.gc();

        Thread.currentThread().sleep(3000);

        if (obj == null) {
            System.out.println("obj does not exist anymore after gc");
        } else {
            System.out.println("obj is alive after gc");
        }
    }
}

Below is the output of above code snippet :

obj is alive after gc

8.2 Soft Reference

Soft references are used to describe objects that are still useful but not absolutely required.

A softly reachable object has no strong references pointing to it.

All soft references to softly-reachable objects are guaranteed to be cleared before a JVM throws an OutOfMemoryError.

Soft references’ typical use case is to implement memory-sensitive caches.

If there is still free memory, then temporarily keep the cache and clear it when the memory is low.

This ensures that the memory will not be exhausted while using the caches.

import java.lang.ref.SoftReference;

public class SoftReferenceTest {
    /**
     * -Xms10m -Xmx10m
     */
    public static void main(String[] args) throws InterruptedException {

        SoftReference<Person> p = new SoftReference<>(new Person("tom", 18));
        // Above line equals to below three lines
        // 1.declare person object; 2. associate with soft reference;
        // 3. disable strong reference)
        // Person tom = new Person("tom", 18);
        // SoftReference<Person> p = new SoftReference<>(tom);
        // tom = null;

        System.out.println(p.get());

        System.gc();
        Thread.currentThread().sleep(3000);

        System.out.println();
        System.out.println("After gc, when memory is still enough :");

        System.out.println(p.get());

        byte[] bigObject = new byte[1024 * 1024 * 7];
        byte[] notSoBigObject = new byte[512];

        System.out.println();
        System.out.println("After gc, when memory is not enough :");

        System.out.println(p.get());

    }

    private static class Person {
        private String name;
        private int age;
        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
        @Override
        public String toString() {
            return "Person [age=" + age + ", name=" + name + "]";
        }
    }
}

Below is the output of above code snippet :

Person [age=18, name=tom]

After gc, when memory is still enough :
Person [age=18, name=tom]

After gc, when memory is not enough :
null

8.3 Weak Reference

Weak reachability means that an object has neither strong nor soft references pointing to it.

Objects associated with weak references can only survive until the next garbage collection.

When the garbage collector works, objects associated with weak references will be reclaimed regardless of whether the memory space is sufficient.

import java.lang.ref.WeakReference;

public class WeakReferenceTest {
    public static void main(String[] args) throws InterruptedException {
        WeakReference<Person> p = new WeakReference<>(new Person("tom", 18));

        System.out.println(p.get());

        System.gc();

        Thread.currentThread().sleep(3000);

        System.out.println("After gc, no matter if memory is sufficient");

        System.out.println(p.get());
    }

    private static class Person {
        private String name;
        private int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public String toString() {
            return "Person [age=" + age + ", name=" + name + "]";
        }
    }
}

Below is the output of above code snippet :

Person [age=18, name=tom]
After gc, no matter if memory is sufficient
null

Most known use of these references is the WeakHashMap class.

It’s the implementation of the Map interface where every key is stored as a weak reference.

When the Garbage Collector removes a key, the entity associated with this key is deleted as well.

import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

public class WeakHashMapTest {
    public static void main(String[] args) throws InterruptedException {
        Map<Key, String> map = new HashMap<>();
        test(map);

        map = new WeakHashMap<>();
        test(map);
    }

    private static void test(Map<Key, String> map) throws InterruptedException {
        Key key = new Key();

        map.put(key, "hello world");

        key = null;

        System.gc();

        Thread.sleep(3000);

        System.out.println("After gc, element number in map is :");

        System.out.println(map.size());
    }

    private static class Key {

    }
}

Below is the output of above code snippet :

After gc, element number in map is :
1
After gc, element number in map is :
0

8.4 Phantom Reference

Whether an object has a phantom reference will not affect its lifetime at all, and it is impossible to obtain an instance of an object through a phantom reference.

Its purpose is that when the object is garbage collected, we can receive a system notification.

The phantom reference must be used together with the reference queue.

When the garbage collector is ready to recycle an object, if it finds that it still has a phantom reference, it will add the phantom reference to the reference queue after recycling the object to notify the application of the object’s recycling.

import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

public class PhantomReferenceTest {
    public static void main(String[] args) throws InterruptedException {
        Person tom = new Person("tom", 18);
        Person jerry = new Person("jerry", 16);

        ReferenceQueue<Person> queue = new ReferenceQueue<>();

        PhantomReference<Person> r1 = new PhantomReference<Person>(tom, queue);
        PhantomReference<Person> r2 = new PhantomReference<Person>(jerry, queue);

        System.out.println("Can not get tom through panthom reference : " + r1.get());
        System.out.println("Can not get jerry through panthom reference : " + r2.get());

        tom = null;
        jerry = null;

        System.gc();

        Thread.sleep(3000);

        Reference<? extends Person> ref;
        while ((ref = queue.poll()) != null) {
            if (r1 == ref) {
                System.out.println("tom has been recycled");
            }

            if (r2 == ref) {
                System.out.println("jerry has been recycled");
            }
        }
    }

    private static class Person {
        private String name;
        private int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public String toString() {
            return "Person [age=" + age + ", name=" + name + "]";
        }
    }
}

Below is the output of above code snippet :

Can not get tom through panthom reference : null
Can not get jerry through panthom reference : null
jerry has been recycled
tom has been recycled