
Summary

Reduce metaspace waste by dynamically merging and splitting metaspace chunks.

Goals

Improve the existing Metaspace chunk allocator to reduce Out-Of-Memory errors caused by metaspace clogging up with chunks of the "wrong" size.

Non-Goals

Success Metrics

Increased reuse of metaspace chunks. Fewer chunks resting in the freelist. Fewer Out-Of-Memory errors due to a filled-up Metaspace.

Motivation

Chunks of a given size cannot be reused as chunks of a different size. There are pathological allocation patterns which lead to the metaspace filling up with chunks of a single size. Although free, these chunks cannot be reused if chunks of a different size are needed.

Example: an application has a large number of class loaders, each one allocating only a few small classes. These class loaders will not need much metaspace, and with the current implementation will only be given small metaspace chunks. This leads to the metaspace being filled with small chunks. When those class loaders are unloaded, the small chunks are freed and added to the freelist.
Now a single class loader continues work and starts allocating medium-sized chunks. The small chunks will not be reused. With a limit in place (CompressedClassSpaceSize or MaxMetaspaceSize), the VM may hit an OOME from metaspace even though there are plenty of free chunks, because they are locked into the wrong size.

For a demonstration of this effect please see the two attached example programs:

Also note the attached output files. They show the output of the Example2 program with CompressedClassPointers enabled and a CompressedClassSpaceSize of 10M, running into an OOME. Both use SAP-internal metaspace statistic printouts, which were done at the point of the OOME. Most important is the "-- ChunkManager --" section, which shows how many chunks of which size are residing in the freelists.

Also note that without the patch, the VM manages to load ~1000 large classes before hitting OOM; with the patch, it manages to load ~3000 classes.

The printouts also show an ASCII-art metaspace map, another feature we added to our VM, which shows in the former case a lot of small chunks unused (lower-case "s"), in the latter case almost no unused chunks (all letters are uppercase). For the latter case, it also shows less fragmentation.

(Please note that we would be happy to contribute both the statistics printout and the metaspace map to the OpenJDK; however, for now they are not part of this JEP.)

Description

In order to enable small chunks to be reused as larger chunks, multiple neighboring smaller chunks can - if they are all free - be merged to form a larger chunk. Similarly, larger chunks can be split up into smaller chunks if small chunks are needed and only large chunks are available.
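The merge and split rules above can be illustrated with a toy model. The following is only a sketch under stated assumptions, not HotSpot code: the class ToySpace, the constants kSmall and kMedium, and the 4:1 size ratio are all hypothetical, and sizes are abstract units rather than real metaspace word sizes. It assumes chunks are aligned to their own size, so that a group of neighboring free small chunks can be replaced by one medium chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>

// Toy model only: sizes are abstract units, not HotSpot's chunk word sizes.
constexpr std::size_t kSmall = 1;
constexpr std::size_t kMedium = 4;  // hypothetical: one medium = four small

struct Chunk {
  std::size_t size;
  bool free;
};

class ToySpace {
 public:
  // Start with `mediums` free medium chunks laid out back to back.
  explicit ToySpace(std::size_t mediums) {
    for (std::size_t i = 0; i < mediums; ++i)
      chunks_[i * kMedium] = Chunk{kMedium, true};
  }

  // Returns the start offset of a chunk of `size`, or nullopt if none fits.
  std::optional<std::size_t> allocate(std::size_t size) {
    assert(size == kSmall || size == kMedium);
    if (auto start = take_free(size)) return start;
    if (size == kSmall) {
      // Split: no free small chunk exists, so carve up a free medium chunk.
      if (auto start = take_free(kMedium)) {
        chunks_[*start] = Chunk{kSmall, false};
        for (std::size_t i = 1; i < kMedium; ++i)
          chunks_[*start + i] = Chunk{kSmall, true};
        return start;
      }
    }
    return std::nullopt;
  }

  // Frees the chunk at `start`. If every small chunk in the surrounding
  // medium-aligned range is now free, merges them into one medium chunk.
  void release(std::size_t start) {
    auto it = chunks_.find(start);
    assert(it != chunks_.end() && !it->second.free);
    it->second.free = true;
    if (it->second.size != kSmall) return;
    const std::size_t base = start - start % kMedium;
    for (std::size_t i = 0; i < kMedium; ++i) {
      auto n = chunks_.find(base + i);
      if (n == chunks_.end() || !n->second.free || n->second.size != kSmall)
        return;  // a neighbor is still in use: no merge possible yet
    }
    for (std::size_t i = 0; i < kMedium; ++i) chunks_.erase(base + i);
    chunks_[base] = Chunk{kMedium, true};
  }

  // Number of free chunks of the given size (the "freelist" length).
  std::size_t free_count(std::size_t size) const {
    std::size_t n = 0;
    for (const auto& [start, c] : chunks_)
      if (c.free && c.size == size) ++n;
    return n;
  }

 private:
  // Marks the first free chunk of exactly `size` as used and returns it.
  std::optional<std::size_t> take_free(std::size_t size) {
    for (auto& [start, c] : chunks_) {
      if (c.free && c.size == size) {
        c.free = false;
        return start;
      }
    }
    return std::nullopt;
  }

  std::map<std::size_t, Chunk> chunks_;  // start offset -> chunk
};
```

In this model, a space consisting of one medium chunk can serve four small allocations through splitting. A medium allocation then fails until all four small chunks have been released, at which point they merge back and the medium request succeeds. A real implementation would need a cheaper way to find neighbors than a map lookup per unit.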

As already mentioned, a variant of this solution is already implemented as a patch internally at SAP. The following points describe this particular implementation and also serve as a proposal of how an implementation in the OpenJDK could work:

As a result, metaspace will now fill up with larger chunks where possible. This reduces the chance of situations where we need a larger chunk, but only smaller chunks are free.

This takes care of the reverse problem: metaspace is filled with large chunks, but smaller chunks are needed.

Alternatives

Testing

The implementation described above is tested, as part of the normal nightly tests run at SAP, with TCK, jtreg, and a large number of self-written regression tests, as well as a collection of benchmarks (SPECjvm98, SPECjvm2008, SPECjbb2005).

In addition, a small test case was developed to demonstrate the problem, which shows considerable improvement when running with the fix.

More tests are needed to stress every angle of metachunk allocation.

Risks and Assumptions

There is a performance overhead due to on-the-fly merging and splitting. We think this overhead is small - in our internal tests, its effects were not discernible. However, there may be pathological cases where these costs become larger.

The metaspace code will become more intricate with this fix, which carries the usual risk of introducing new errors. However, the code could be made simpler in other places - e.g. methods like "get_small_chunk_and_allocate" would not be needed anymore - which may offset the added complexity.

Dependencies

None known.