Java Under the Hood - Dive Into JVM Memory Model and Garbage Collector

Written by abgarsimonean | Published 2026/01/15
Tech Story Tags: java | memory-management | garbage-collector | jvm | software-development | software-engineering | jdk | programming

TL;DR: Java abstracts away manual memory management, but the JVM still operates on a precise memory model. Local variables and method frames live on the stack, while actual objects live on the heap, which is divided into generations to make garbage collection efficient. Different reference types (strong, soft, weak, phantom) influence GC behavior, especially for caches and off-heap resources. Garbage Collection uses variations of mark-and-sweep to identify reachable objects and reclaim memory from the rest, sometimes compacting to avoid fragmentation. GC pauses are real and can impact latency, which is why selecting the right collector matters (G1, Parallel, ZGC, Shenandoah, etc.) and why JVM tuning exists. Understanding how memory works under the hood enables developers to diagnose leaks, reduce allocation pressure, tune pause times, and configure the JVM for different workloads — turning memory management from a black box into a performance tool.

I started my programming journey more than 8 years ago with a book on learning programming in Java (specifically Bruce Eckel's Thinking in Java). I chose Java because I was fortunate enough to have an older brother who already worked as a software engineer, and he explained to me in simple terms how to choose a programming language. At that time I was basically deciding between Java and C++. I eventually chose Java due to the simple fact that it allowed me to ignore memory management and focus on more conceptual OOP learning, and all in all Java had a much easier learning curve than C++.

Now, as my learning progressed, like any software developer I started to dive into a myriad of topics not related to Java - SQL, NoSQL, deployment, cloud, the Spring Framework, etc. In other words, I started preparing for a "real" world job, which I thought would require broad knowledge in multiple software domains, and as any developer reading this knows - this grind never stops. You master one skill - and 2 new skills appear. You read a book on architecture and come to the conclusion that you need to order 3 more to cover your newly identified knowledge gaps.

But recently I've rediscovered my love for learning the basics - the plain old Java language. The Java programming language is written by much smarter devs than you and me, and as such we've benefited greatly from the automatic memory management that it ships with. In this blog post I would like to revisit the underlying memory model that Java uses, as well as the inner workings of the Garbage Collector. Knowledge of the inner workings of these structures helps a lot with understanding multi-threading pitfalls, and gives one ideas on how to tune and improve large applications.

Memory model

In Java, memory is organised into two large spaces: the stack and the heap (there is also a small third segment of native memory called Metaspace, used to store class metadata). Please note that the sizes in the diagram are not to scale; in reality the heap space is much larger than the stack.

Stack memory

The stack memory is used by Java for static memory allocation and thread execution. Each called method pushes a new block (frame) onto the stack; that frame stores primitive values and references to objects created with the keyword new. Data on the stack is ordered and organised as a LIFO (Last In, First Out) structure. The stack shrinks and grows together with code execution, as the example below illustrates.
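A minimal sketch (the Person record here is just a hypothetical stand-in) of how nested calls map onto stack frames and where each value lives:

record Person(String name) {}            // hypothetical minimal type, used only for illustration

public class StackDemo {
    public static void main(String[] args) {
        int id = 42;                      // primitive value, stored directly in main()'s stack frame
        Person person = createPerson(id); // calling the method pushes a new frame for createPerson()
        System.out.println(person);
    }                                     // main()'s frame is popped when the method returns

    static Person createPerson(int id) {
        String name = "user-" + id;       // 'name' is a reference in this frame; the String object lives on the heap
        return new Person(name);          // the Person object is allocated on the heap,
                                          // only the reference travels back to the caller
    }                                     // this frame is popped as soon as createPerson() returns
}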

Some other features of stack memory include:

  • It grows and shrinks as new methods are called and returned, respectively.
  • Variables inside the stack block exist only as long as the method that created that block is running.
  • It’s automatically allocated and deallocated when the method finishes execution.
  • Access to this memory is fast when compared to heap memory.
  • This memory is thread-safe, as each thread operates on its own stack.
  • If this memory is full, Java throws java.lang.StackOverflowError.
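As a quick illustration of that last point, a runaway recursion with no base case exhausts a thread's stack almost immediately (a sketch only - the exact depth depends on -Xss and frame size):

public class Overflow {
    static long depth = 0;

    static void recurse() {
        depth++;
        recurse();                               // no base case: every call adds one more frame
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {         // thrown once the thread's stack is exhausted
            System.out.println("Stack overflowed at depth " + depth);
        }
    }
}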

Heap space

This region of memory stores the actual object instances. Local variables on the stack hold references that point to these objects. Consider the following line of code:

Person person = new Person();

The new keyword allocates space on the heap for a new Person object, constructs it, and returns a reference to it. This reference is then stored in the local variable person on the stack.

The heap is a single, shared memory area for the entire JVM process. All threads access the same heap, regardless of how many are running. As such, thread safety in this space is up to the application developer. The heap is divided into three big chunks to make garbage collection more efficient:

  • Young generation - new objects are created here. This space is further subdivided into

    • Eden space - where objects are created initially

    • Survivor space (S0 and S1) - objects that survive Garbage Collection cycles are moved here

      Most short-lived objects die in the young generation quickly, which is exactly what makes minor collections so efficient.

  • Old (Tenured) generation

    • Objects that survive multiple young-gen GCs are promoted to the Old Generation. This space stores long-lived objects like caches, collections that grow over time, singletons, and interned or persisted data.

  • Permanent Generation (PermGen) - in Java 7 and lower, this was used to store class metadata, method bytecode, static fields and interned Strings, but it has since been replaced by Metaspace
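On a modern JDK you can see these regions in a running JVM with the jcmd tool that ships with the JDK (the PID below is a placeholder for your process id):

jcmd <pid> GC.heap_info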

Metaspace

With Java 8, a new memory segment replaced PermGen: class metadata now lives in Metaspace, which uses native memory, not the heap.
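Metaspace grows as needed by default, but if runaway class loading is a concern it can be capped with dedicated flags (the values below are only examples):

java -XX:MetaspaceSize=128m -XX:MaxMetaspaceSize=256m -jar my-app.jar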

Object Reference types

Not many people know this, but Java actually supports multiple types of references that affect the way the garbage collector works. Let's consider each one.

Strong reference

This is the default and most common reference type, the one we are all used to. In the example above with the Person, we hold a strong reference to an object on the heap. An object on the heap is not garbage collected while there is a strong reference pointing to it, or while it is strongly reachable through a chain of strong references. In other words, any reference, unless specifically wrapped in another reference type, is considered strong.
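In code, the line between "alive" and "eligible for collection" is simply whether a strong reference still exists:

Person person = new Person();   // strong reference: the object cannot be reclaimed while 'person' points to it
person = null;                  // no strong references remain - the object becomes eligible for garbage collection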

Weak reference

A weak reference will most likely not survive the next pass of the Garbage Collector: once an object is only weakly reachable, the GC is free to reclaim it on its next cycle, regardless of how much free memory is available. A classic use case for weak references is cache-like maps whose entries should disappear as soon as the key is no longer used anywhere else. Weak references are initialised as follows:

WeakReference<Person> reference = new WeakReference<>(new Person());

There is actually a WeakHashMap in the official Java collections API that uses exactly this concept for its keys. Once a key of a WeakHashMap is garbage collected, the entire entry is removed from the map.
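A small sketch of that behaviour (System.gc() here is only a hint, so the outcome is very likely but not guaranteed):

import java.util.Map;
import java.util.WeakHashMap;

public class WeakMapDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Object, String> cache = new WeakHashMap<>();
        Object key = new Object();
        cache.put(key, "cached value");

        key = null;                         // drop the only strong reference to the key
        System.gc();                        // a suggestion, not a command - see the GC section below
        Thread.sleep(100);                  // give the collector a moment to run

        System.out.println(cache.size());   // very likely 0: the entry vanished together with its key
    }
}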

Soft reference

Soft references are useful in memory-sensitive situations. Objects referenced softly are only reclaimed when the JVM is under memory pressure. As long as there is enough free space available, the garbage collector leaves these objects untouched. Before the JVM ever throws an OutOfMemoryError, it is guaranteed to clear all soft-referenced objects. As the JavaDocs put it, “all soft references to softly-reachable objects are guaranteed to have been cleared before the virtual machine throws an OutOfMemoryError.”

Similarly to a weak reference, a soft reference is created as follows:

SoftReference<Person> reference = new SoftReference<>(new Person());

The difference is that soft references won't be cleared unless memory is running out, while weak references are most likely cleared on the very next pass of the GC once it reaches them. So soft = less aggressive, weak = more aggressive.
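The practical consequence is that a softly-cached value must always be re-checked before use; loadExpensiveData() below is a hypothetical helper standing in for whatever recomputation your application would do:

SoftReference<byte[]> cacheRef = new SoftReference<>(loadExpensiveData());

byte[] data = cacheRef.get();           // may be null if the GC reclaimed it under memory pressure
if (data == null) {
    data = loadExpensiveData();         // hypothetical helper: recompute or reload the value
    cacheRef = new SoftReference<>(data);
}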

Phantom reference

Phantom references are used to track the lifecycle of an object after it has been finalised and is about to be reclaimed by the garbage collector. Unlike soft or weak references, phantom references cannot be dereferenced and are used together with a ReferenceQueue to perform post-mortem cleanup, typically for native or off-heap resources.

Phantom references are used only together with a reference queue, since the .get() method of such references always returns null. They are considered preferable to finalisers.

ReferenceQueue<Person> queue = new ReferenceQueue<>();
Person p = new Person();
PhantomReference<Person> phantom = new PhantomReference<>(p, queue);
phantom.get(); // always returns null
// once 'p' is dropped and the object is collected, the GC enqueues 'phantom'
// onto 'queue', where cleanup code can poll for it

Phantom references are used when an object owns resources outside the Java heap (e.g., native memory, file handles, GPU buffers, sockets, mmap’d files). Those resources must be freed after the JVM is sure the object is truly unreachable.

Java’s GC only knows how to free heap objects — it has no idea how to free native/off-heap memory. Phantom references provide a reliable notification point.
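On JDK 9+ you rarely work with PhantomReference directly; the java.lang.ref.Cleaner API wraps the same phantom-reference-plus-queue machinery. A minimal sketch, assuming a hypothetical NativeBuffer that owns some off-heap allocation:

import java.lang.ref.Cleaner;

public class NativeBuffer implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    private final Cleaner.Cleanable cleanable;

    NativeBuffer(long address) {
        // the cleanup action must not capture 'this', otherwise the buffer could never become unreachable
        this.cleanable = CLEANER.register(this, new FreeAction(address));
    }

    @Override
    public void close() {
        cleanable.clean();               // deterministic release; the GC-triggered phantom path is only a safety net
    }

    private record FreeAction(long address) implements Runnable {
        @Override
        public void run() {
            // free the native memory behind 'address' here (e.g. via JNI or the Foreign Function & Memory API)
        }
    }
}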

Garbage Collection Process

So now that we’ve talked about where objects live and how references work, it’s time to look at the “invisible janitor” that actually frees memory for us: the Garbage Collector.

At a very high level, the GC does one thing - find objects that are no longer reachable from any running code, and reclaim their memory. The exact implementation depends on the chosen collector, but most modern HotSpot GCs are based on some variation of mark-and-sweep with generational optimisations.

Mark-and-sweep in practice

The classic mark-and-sweep algorithm works roughly like this:

  1. Stop the world

    The JVM briefly pauses all application threads. This is called a stop-the-world pause. During this time, only GC threads run, so your application doesn’t process requests, handle user input, etc.

  2. Find GC roots

    The JVM identifies a set of well-known entry points called GC roots. Examples include:

    • local variables on each thread's stack
    • static fields of loaded classes
    • JIT/compiler-internal references
    • JNI references, etc.
  3. Mark phase

    Starting from these roots, the GC traverses the object graph: it follows all references from the roots into the heap, "marks" every object it visits as reachable, and from each marked object recursively follows its outgoing references, marking those objects too. After this phase the GC knows which objects are still in use (i.e. marked) and which are effectively dead (unmarked).

  4. Sweep (and optionally compact) phase

    Once marking is done the GC scans through heap regions and reclaims memory for unmarked objects. Many collectors also compact memory: they move surviving objects together to reduce memory fragmentation and update references to them. Compaction is important because over time, without it, the heap becomes fragmented - lots of small free blocks scattered everywhere - making large allocations harder.

  5. Resume application threads

    After reclaiming and possibly compacting memory, the GC resumes all application threads. Program execution continues as if nothing happened – except with more free memory.

This is the basic mental model to keep in mind:

Stop → mark reachable → free unreachable → (maybe compact) → resume.

In reality, modern GCs do this in more sophisticated ways (incrementally, concurrently, in different regions of the heap), but they all revolve around the same core idea.
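To make the mark phase concrete, here is a toy reachability traversal over a made-up object graph - purely illustrative, and not how HotSpot actually implements marking:

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ToyMarkSweep {

    // A toy "heap object" that knows which other objects it references
    static class Node {
        final List<Node> references = new ArrayList<>();
    }

    // Mark phase: everything reachable from the GC roots ends up in 'marked';
    // whatever a sweep phase finds outside this set is garbage
    static Set<Node> mark(List<Node> gcRoots) {
        Set<Node> marked = new HashSet<>();           // Node defines no equals/hashCode, so this is identity-based
        Deque<Node> toVisit = new ArrayDeque<>(gcRoots);
        while (!toVisit.isEmpty()) {
            Node current = toVisit.pop();
            if (marked.add(current)) {                // first visit: mark it and follow its outgoing references
                toVisit.addAll(current.references);   // already-marked objects are skipped, so cycles terminate
            }
        }
        return marked;
    }
}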

Minor vs Major GC

Because the heap is split into young and old generations, GC also runs in two flavours:

  • Minor (young) GC

    • Only operates on the young generation (Eden + Survivor S0 and S1 spaces).
    • Very fast and frequent.
    • Most objects die here – which is exactly what generational GC is optimised for.

  • Major / Full GC

    • Includes the old generation (and sometimes the entire heap).
    • Much more expensive and can lead to noticeable pauses if not tuned well.
    • Usually happens less frequently, e.g., when the old generation fills up.

When people complain “GC pauses are killing my app”, they’re usually suffering from too many or too long old-gen collections.

You can’t really force GC

Java gives you methods like:

System.gc();
Runtime.getRuntime().gc();

But these are polite suggestions, not commands. The JVM is free to run a GC immediately, defer it, or even outright ignore your request (and some JVMs are configured to do exactly that in production). The reasoning is simple: the JVM has far more information than your application about heap state, allocation rates, and GC pressure. Manually sprinkling System.gc() calls almost always makes things worse, not better.
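In practice, the "configured to ignore it" part usually means the JVM was started with the flag that turns explicit GC calls into no-ops:

java -XX:+DisableExplicitGC -jar my-app.jar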

So as a rule of thumb - you don’t control when GC runs, you influence it indirectly through how many objects you create, how long they stay reachable, which data structures you use and how you configure/tune the collector.

GC is not free: stop-the-world and performance

Even with all the optimisations in modern JVMs, GC has a cost:

  • Stop-the-world pauses

    All collectors have at least some phases where application threads are paused. For many collectors, the young-gen collections are very short (sub-millisecond to a few milliseconds), but full GCs can be much longer if the heap is big and the configuration is poor.

  • CPU overhead

    GC threads are doing real work: scanning memory, marking, compacting, copying. On a high-throughput system, a non-trivial portion of CPU time can be spent inside the GC rather than your business logic.

  • Latency spikes

    Even rare but long pauses can be deadly for low-latency systems (trading platforms, real-time APIs, etc.). That’s why there is so much innovation around “low pause” collectors.

Modern GC algorithms in Java

Modern Java (especially JDK 11+ and 17+) ships with several different collectors, each with its own trade-offs. Very roughly:

  • Serial GC

    Single-threaded GC. It is simple, compact, great for small heaps or client-style applications, however it is not suited for large, multi-core servers.

  • Parallel GC

    Throughput-oriented collector, uses multiple threads for GC. Focuses on maximising total work done and is fine with longer pauses. As such it is good for batch processing where latency is less important than raw throughput.

  • G1 (Garbage-First) GC

    Region-based, mostly concurrent collector - default in many modern JDKs. Aims to provide predictable pause times by collecting “garbage-first” regions with the most reclaimable memory. Good general-purpose choice for server applications with medium to large heaps.

  • ZGC (Z Garbage Collector)

    Low-latency, region-based, heavily concurrent collector. Target: very short pauses (usually in the sub-millisecond to a few milliseconds range), even for very large heaps (tens or hundreds of GB). Great for services where consistent latency is more important than absolute throughput.

  • Shenandoah GC

    Another low-pause collector, with a design similar in spirit to ZGC (concurrent compaction, region-based). Also aims at very low and predictable pause times.

Historically there was also CMS (Concurrent Mark Sweep) and PermGen, but both are essentially legacy at this point (CMS is deprecated/removed, PermGen replaced by Metaspace).

When you're tuning a modern JVM, the question often boils down to: what's more important for my workload - maximum throughput or predictable low pauses? You choose a collector accordingly (e.g., Parallel vs G1 vs ZGC/Shenandoah), then tweak heap sizes and pause targets.

Configuring JVM Memory and Garbage Collector

Knowing how the JVM manages memory is great, but at some point you’ll want to take control: set heap sizes, choose a GC, and tune it for your specific workload. The good news is that most of this is done via JVM flags, so you can experiment easily.

1. Core memory settings: heap and stack

The two most important knobs are:

  • -Xms – initial heap size

  • -Xmx – maximum heap size

Example:

java -Xms1g -Xmx1g -jar my-app.jar

This starts the app with a fixed 1 GB heap (initial = max). Using the same value for -Xms and -Xmx is common for server apps because it avoids heap resizing during runtime.

You can use m or g suffixes:

java -Xms512m -Xmx2g -jar my-app.jar

This starts with 512 MB heap and allows it to grow up to 2 GB.


The stack size per thread is controlled by:

  • -Xss – stack size (default is usually enough unless you have deep recursion)

java -Xss512k -jar my-app.jar

Be careful: smaller stacks allow more threads but also make stack overflows easier to hit in recursive code.

2. Selecting a garbage collector

Modern JVMs pick a reasonable default GC (e.g., G1), but you can explicitly choose one if your workload has specific requirements.

Common options:

  • -XX:+UseParallelGC - Throughput-focused (batch jobs, heavy CPU work)
  • -XX:+UseG1GC - General-purpose server default (balanced pauses vs throughput)
  • -XX:+UseSerialGC - Single-threaded, simple GC (small apps, tools)

Low-latency collectors (JDK 11+ / 17+):

  • -XX:+UseZGC - Ultra low pause, large heaps
  • -XX:+UseShenandoahGC - Low pause, region-based

3. Basic GC tuning knobs

Once a GC is chosen, you can give it some hints. You usually don’t want to micro-tune everything, but a few options are very commonly used.

Target GC pause times (G1 / ZGC / Shenandoah)

For G1, you can specify a desired maximum pause time:

-XX:+UseG1GC
-XX:MaxGCPauseMillis=200

This says “please try to keep GC pauses under ~200 ms”. It’s not a hard guarantee, but G1 will aim for it by adjusting region sizes, concurrent work, etc.

For very low latency with ZGC, you typically just configure heap size and let it do its thing:

-XX:+UseZGC 
-Xms4g 
-Xmx4g

Young/old generation balance (simplified)

Older flags like -XX:NewRatio or explicit young-gen sizes still exist, but for G1 and newer collectors you often let the GC decide. If you really want to tweak:

-XX:+UseParallelGC
-XX:NewRatio=3  # Young gen ~1/4 of heap, old gen ~3/4

For G1, a more relevant knob is when it starts collecting the old generation:

-XX:+UseG1GC
-XX:InitiatingHeapOccupancyPercent=45

This means G1 starts concurrent marking when ~45% of the heap is occupied, rather than waiting for it to fill up more.

4. Enabling GC logging

You can’t tune what you can’t see. GC logs are essential to understanding what your GC is actually doing in production.

-Xlog:gc*:file=gc.log:time,uptime,level,tags

This prints detailed GC info to gc.log. You can later analyse this with tools like GCViewer or gceasy (or even simple grep/awk/Excel if you’re brave).

If you’re on an older JDK, you might still see:

-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-Xloggc:gc.log

You don’t have to memorise every single flag, but knowing the “big four”:

  • -Xms / -Xmx – heap size
  • -Xss – stack size
  • -XX:+UseXYZGC – GC type
  • -Xlog:gc* – GC logs

already puts you ahead of many Java developers. Combined with an understanding of the memory model and reference types, it gives you a solid foundation for diagnosing memory leaks, GC storms, and “mysterious” latency spikes in real-world systems.
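Put together, a typical server-style launch (all values here are placeholders to adapt to your own workload) might look like this:

java -Xms2g -Xmx2g -Xss1m -XX:+UseG1GC -Xlog:gc*:file=gc.log:time,uptime,level,tags -jar my-app.jar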

Conclusion

Knowing how memory is organised gives you a very real advantage when it comes to writing correct, efficient, and scalable software. Once you understand how the JVM allocates memory, how objects move through generations, and how the GC reclaims space, you can start making deliberate choices instead of accidental ones. This knowledge also opens the door to tuning the JVM itself. By selecting the right garbage collector and configuring heap sizes, pause targets, and GC behaviour, you can adapt the runtime to the specific needs of your application — whether that's low latency, high throughput, or simply better stability under load. With the right tools (profilers, GC logs, JFR, etc.), fixing issues like memory leaks and GC storms becomes not just possible but straightforward.

