RFR(L): 8198423: Improve metaspace chunk allocation (was: Proposal for improvements to the metaspace chunk allocator) (original) (raw)

coleen.phillimore at oracle.com coleen.phillimore at oracle.com
Mon Mar 5 23:59:22 UTC 2018


Hi Thomas,

I've read through the new code.  I don't have any substantive comments.  Thank you for adding the functions.

Has this been tested on any 32 bit platforms?   I will sponsor this when you have another reviewer.

Thanks for taking on the metaspace!

Coleen

On 3/1/18 5:36 AM, Thomas Stüfe wrote:

Hi Coleen,

thanks a lot for the review and the sponsoring offer! New version (full): http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-03-01/webrev-full/webrev/ <http://cr.openjdk.java.net/%7Estuefe/webrevs/metaspace-coalescation/2018-03-01/webrev-full/webrev/> incremental: http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-03-01/webrev-incr/webrev/ <http://cr.openjdk.java.net/%7Estuefe/webrevs/metaspace-coalescation/2018-03-01/webrev-incr/webrev/> Please find remarks inline:

On Tue, Feb 27, 2018 at 11:22 PM, <coleen.phillimore at oracle.com_ _<mailto:coleen.phillimore at oracle.com>> wrote: Thomas, review comments: http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-02-26/webrev/src/hotspot/share/memory/metachunk.hpp.udiff.html <http://cr.openjdk.java.net/%7Estuefe/webrevs/metaspace-coalescation/2018-02-26/webrev/src/hotspot/share/memory/metachunk.hpp.udiff.html> +// ChunkIndex (todo: rename?) defines the type of chunk. Chunk types It's really both, isn't it?  The type is the index into the free list or in use lists.  The name seems fine. You are right. What I meant was that a lot of code needs to know about the different chunk sizes, but naming it "Index" and adding enum values like "NumberOfFreeLists" we expose implementation details no-one outside of SpaceManager and ChunkManager cares about (namely, the fact that these values are internally used as indices into arrays). A more neutral naming would be something like "enum ChunkTypes { spec,small, .... , NumberOfNonHumongousChunkTypes, NumberOfChunkTypes }. However, I can leave this out for a possible future cleanup. The change is big enough as it is. Can you add comments on the #endifs if the #ifdef is more than a couple 2-3 lines above (it's a nit that bothers me). +#ifdef ASSERT + // A 32bit sentinel for debugging purposes. +#define CHUNKSENTINEL 0x4d4554EF // "MET" +#define CHUNKSENTINELINVALID 0xFEEEEEEF + uint32t sentinel; +#endif + const ChunkIndex chunktype; + const bool isclass; + // Whether the chunk is free (in freelist) or in use by some class loader.  bool istaggedfree;  +#ifdef ASSERT + ChunkOrigin origin; + int usecount; +#endif + I removed the asserts completely, following your suggestion below that "origin" would be valuable in customer scenarios too. By that logic, the other members are valuable too: the sentinel is valuable when examining memory dumps to see the start of chunks, and the in-use counter is useful too. What do you think? So, I leave the members in - which, depending what the C++ compiler does to enums and bools, may cost up to 128bit additional header space. I think that is ok. In one of my earlier versions of this patch I hand-crafted the header using chars and bitfields to be as small as possible, but that seemed over-engineered. However, I left out any automatic verifications accessing these debug members. These are still only done in debug builds. It seems that if you could move origin and usecount into the ASSERT block above (maybe putting usecount before origin. http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-02-26/webrev/src/hotspot/share/memory/metaspace.cpp.udiff.html <http://cr.openjdk.java.net/%7Estuefe/webrevs/metaspace-coalescation/2018-02-26/webrev/src/hotspot/share/memory/metaspace.cpp.udiff.html> In takefromcommitted, can the allocation of padding chunks be its own function like addchunkstoaligment() lines 1574-1615? The function is too long now. I moved the padding chunk allocation into an own function as you suggested. I don't think coalescation is a word in English, at least my dictionary cannot find it.  Although it makes sense in the context, just distracting. I replaced "coalescation" with "chunk merging" throughout the code. Also less of a tongue breaker. + // Now check if in the coalescation area there are still life chunks. "live" chunks I guess.   A sentence you won't read often :). Now that I read it it almost sounded sinister :) Fixed. In freechunksget() can you handle the Humongous case first? The else for humongous chunk size is buried tons of lines below. Otherwise it might be helpful to the logic to make your addition to this function be a function you call like chunk = splitfromlargerfreechunk(); I did the latter. I moved the splitting of a larger chunk to an own function. This causes a slight logic change: the new function (ChunkManager::splitchunk()) splits an existing large free chunks into n smaller free chunks and adds them all back to the freelist - that includes the chunk we are about to return. That allows us to use the same exit path - which removes the chunk from the freelist and adjusts all counters - in the caller function "ChunkManager::freechunksget" instead of having to return in the middle of the function. To make the test more readable, I also remove the "test-that-free-chunks-are-optimally-merged" verification - which was quite lengthy - from VirtualSpaceNode::verify() to a new function, VirtualSpaceNode::verifyfreechunksareideallymerged(). You might want to keep the origin in product mode if it doesn't add to the chunk footprint.   Might help with customer debugging. See above Awesome looking test... Thanks, I was worried it would be too complicated. I changed it a bit because there were sporadic errors. Not a "real" error, just the test itself was faulty. The "metaspacesinuse" counter was slightly wrong in one corner case. I've read through most of this and thank you for adding this to at least partially solve the fragmentation problem.  The irony is that we templatized the Dictionary from CMS so that we could use it for Metaspace and that has splitting and coalescing but it seems this code makes more sense than adapting that code (if it's even possible). Well, it helps other metadata use cases too, no. Thank you for working on this.  I'll sponsor this for you. Coleen

Thanks again! I also updated my jdk-submit branch to include these latest changes; tests are still runnning. Kind Regards, Thomas On 2/26/18 9:20 AM, Thomas Stüfe wrote: Hi all, I know this patch is a bit larger, but may I please have reviews and/or other input? Issue: https://bugs.openjdk.java.net/browse/JDK-8198423 <https://bugs.openjdk.java.net/browse/JDK-8198423> Latest version: http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-02-26/webrev/ <http://cr.openjdk.java.net/%7Estuefe/webrevs/metaspace-coalescation/2018-02-26/webrev/> For those who followed the mail thread, this is the incremental diff to the last changes (included feedback Goetz gave me on- and off-list): http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalescation/2018-02-26/webrev-incr/webrev/ <http://cr.openjdk.java.net/%7Estuefe/webrevs/metaspace-coalescation/2018-02-26/webrev-incr/webrev/> Thank you! Kind Regards, Thomas Stuefe On Thu, Feb 8, 2018 at 12:58 PM, Thomas Stüfe <thomas.stuefe at gmail.com <mailto:thomas.stuefe at gmail.com>> wrote: Hi, We would like to contribute a patch developed at SAP which has been live in our VM for some time. It improves the metaspace chunk allocation: reduces fragmentation and raises the chance of reusing free metaspace chunks. The patch: http://cr.openjdk.java.net/~stuefe/webrevs/metaspace-coalesc <http://cr.openjdk.java.net/%7Estuefe/webrevs/metaspace-coalesc> ation/2018-02-05--2/webrev/ In very short, this patch helps with a number of pathological cases where metaspace chunks are free but cannot be reused because they are of the wrong size. For example, the metaspace freelist could be full of small chunks, which would not be reusable if we need larger chunks. So, we could get metaspace OOMs even in situations where the metaspace was far from exhausted. Our patch adds the ability to split and merge metaspace chunks dynamically and thus remove the "size-lock-in" problem. Note that there have been other attempts to get a grip on this problem, see e.g. "SpaceManager::getsmallchunksandallocate()". But arguably our patch attempts a more complete solution. In 2016 I discussed the idea for this patch with some folks off-list, among them Jon Matsimutso. He then did advice me to create a JEP. So I did: [1]. However, meanwhile changes to the JEP process were discussed [2], and I am not sure anymore this patch needs even needs a JEP. It may be moderately complex and hence carries the risk inherent in any patch, but its effects would not be externally visible (if you discount seeing fewer metaspace OOMs). So, I'd prefer to handle this as a simple RFE. -- How this patch works: 1) When a class loader dies, its metaspace chunks are freed and returned to the freelist for reuse by the next class loader. With the patch, upon returning a chunk to the freelist, an attempt is made to merge it with its neighboring chunks - should they happen to be free too - to form a larger chunk. Which then is placed in the free list. As a result, the freelist should be populated by larger chunks at the expense of smaller chunks. In other words, all free chunks should always be as "coalesced as possible". 2) When a class loader needs a new chunk and a chunk of the requested size cannot be found in the free list, before carving out a new chunk from the virtual space, we first check if there is a larger chunk in the free list. If there is, that larger chunk is chopped up into n smaller chunks. One of them is returned to the callers, the others are re-added to the freelist. (1) and (2) together have the effect of removing the size-lock-in for chunks. If fragmentation allows it, small chunks are dynamically combined to form larger chunks, and larger chunks are split on demand. -- What this patch does not: This is not a rewrite of the chunk allocator - most of the mechanisms stay intact. Specifically, chunk sizes remain unchanged, and so do chunk allocation processes (when do which class loaders get handed which chunk size). Almost everthing this patch does affects only internal workings of the ChunkManager. Also note that I refrained from doing any cleanups, since I wanted reviewers to be able to gauge this patch without filtering noise. Unfortunately this patch adds some complexity. But there are many future opportunities for code cleanup and simplification, some of which we already discussed in existing RFEs ([3], [4]). All of them are out of the scope for this particular patch. -- Details: Before the patch, the following rules held: - All chunk sizes are multiples of the smallest chunk size ("specialized chunks") - All chunk sizes of larger chunks are also clean multiples of the next smaller chunk size (e.g. for class space, the ratio of specialized/small/medium chunks is 1:2:32) - All chunk start addresses are aligned to the smallest chunk size (more or less accidentally, see metaspacereservealignment). The patch makes the last rule explicit and more strict: - All (non-humongous) chunk start addresses are now aligned to their own chunk size. So, e.g. medium chunks are allocated at addresses which are a multiple of medium chunk size. This rule is not extended to humongous chunks, whose start addresses continue to be aligned to the smallest chunk size. The reason for this new alignment rule is that it makes it cheap both to find chunk predecessors of a chunk and to check which chunks are free. When a class loader dies and its chunk is returned to the freelist, all we have is its address. In order to merge it with its neighbors to form a larger chunk, we need to find those neighbors, including those preceding the returned chunk. Prior to this patch that was not easy - one would have to iterate chunks starting at the beginning of the VirtualSpaceNode. But due to the new alignment rule, we now know where the prospective larger chunk must start - at the next lower larger-chunk-size-aligned boundary. We also know that currently a smaller chunk must start there (*). In order to check the free-ness of chunks quickly, each VirtualSpaceNode now keeps a bitmap which describes its occupancy. One bit in this bitmap corresponds to a range the size of the smallest chunk size and starting at an address aligned to the smallest chunk size. Because of the alignment rules above, such a range belongs to one single chunk. The bit is 1 if the associated chunk is in use by a class loader, 0 if it is free. When we have calculated the address range a prospective larger chunk would span, we now need to check if all chunks in that range are free. Only then we can merge them. We do that by querying the bitmap. Note that the most common use case here is forming medium chunks from smaller chunks. With the new alignment rules, the bitmap portion covering a medium chunk now always happens to be 16- or 32bit in size and is 16- or 32bit aligned, so reading the bitmap in many cases becomes a simple 16- or 32bit load. If the range is free, only then we need to iterate the chunks in that range: pull them from the freelist, combine them to one new larger chunk, re-add that one to the freelist. (*) Humongous chunks make this a bit more complicated. Since the new alignment rule does not extend to them, a humongous chunk could still straddle the lower or upper boundary of the prospective larger chunk. So I gave the occupancy map a second layer, which is used to mark the start of chunks. An alternative approach could have been to make humongous chunks size and start address always a multiple of the largest non-humongous chunk size (medium chunks). That would have caused a bit of waste per humongous chunk (<64K) in exchange for simpler coding and a simpler_ _occupancy map._ _--_ _The patch shows its best results in scenarios where a lot_ _of smallish_ _class loaders are alive simultaneously. When dying, they_ _leave continuous_ _expanses of metaspace covered in small chunks, which can_ _be merged nicely._ _However, if class loader life times vary more, we have_ _more interleaving of_ _dead and alive small chunks, and hence chunk merging does_ _not work as well_ _as it could._ _For an example of a pathological case like this see_ _example program: [5]_ _Executed like this: "java -XX:CompressedClassSpaceSize=10M_ _-cp test3_ _test3.Example2" the test will load 3000 small classes in_ _separate class_ _loaders, then throw them away and start loading large_ _classes. The small_ _classes will have flooded the metaspace with small chunks,_ _which are_ _unusable for the large classes. When executing with the_ _rather limited_ _CompressedClassSpaceSize=10M, we will run into an OOM_ _after loading about_ _800 large classes, having used only 40% of the class_ _space, the rest is_ _wasted to unused small chunks. However, with our patch the_ _example program_ _will manage to allocate ~2900 large classes before running_ _into an OOM, and_ _class space will show almost no waste._ _Do demonstrate this, add -Xlog:gc+metaspace+freelist._ _After running into_ _an OOM, statistics and an ASCII representation of the_ _class space will be_ _shown. The unpatched version will show large expanses of_ _unused small_ _chunks, the patched variant will show almost no waste._ _Note that the patch could be made more effective with a_ _different size_ _ratio between small and medium chunks: in class space,_ _that ratio is 1:16,_ _so 16 small chunks must happen to be free to form one_ _larger chunk. With a_ _smaller ratio the chance for coalescation would be larger._ _So there may be_ _room for future improvement here: Since we now can merge_ _and split chunks_ _on demand, we could introduce more chunk sizes._ _Potentially arriving at a_ _buddy-ish allocator style where we drop hard-wired chunk_ _sizes for a_ _dynamic model where the ratio between chunk sizes is_ _always 1:2 and we_ _could in theory have no limit to the chunk size? But this_ _is just a thought_ _and well out of the scope of this patch._ _--_ _What does this patch cost (memory):_ _- the occupancy bitmap adds 1 byte per 4K metaspace._ _- MetaChunk headers get larger, since we add an enum and_ _two bools to it._ _Depending on what the c++ compiler does with that, chunk_ _headers grow by_ _one or two MetaWords, reducing the payload size by that_ _amount._ _- The new alignment rules mean we may need to create_ _padding chunks to_ _precede larger chunks. But since these padding chunks are_ _added to the_ _freelist, they should be used up before the need for new_ _padding chunks_ _arises. So, the maximally possible number of unused_ _padding chunks should_ _be limited by design to about 64K._ _The expectation is that the memory savings by this patch_ _far outweighs its_ _added memory costs._ _.. (performance):_ _We did not see measurable drops in standard benchmarks_ _raising over the_ _normal noise. I also measured times for a program which_ _stresses metaspace_ _chunk coalescation, with the same result._ _I am open to suggestions what else I should measure,_ _and/or independent_ _measurements._ _--_ _Other details:_ _I removed SpaceManager::getsmallchunkandallocate() to_ _reduce_ _complexity somewhat, because it was made mostly obsolete_ _by this patch:_ _since small chunks are combined to larger chunks upon_ _return to the_ _freelist, in theory we should not have that many free_ _small chunks anymore_ _anyway. However, there may be still cases where we could_ _benefit from this_ _workaround, so I am asking your opinion on this one._ _About tests: There were two native tests -_ _ChunkManagerReturnTest and_ _TestVirtualSpaceNode (the former was added by me last_ _year) - which did not_ _make much sense anymore, since they relied heavily on_ _internal behavior_ _which was made unpredictable with this patch._ _To make up for these lost tests,  I added a new gtest_ _which attempts to_ _stress the many combinations of allocation pattern but_ _does so from a layer_ _above the old tests. It now uses Metaspace::allocate() and_ _friends. By_ _using that point as entry for tests, I am less dependent_ _on implementation_ _internals and still cover a lot of scenarios._ _--_ _Review pointers:_ _Good points to start are_ _- ChunkManager::returnsinglechunk() - specifically,_ _ChunkManager::attempttocoalescearoundchunk() - here we_ _merge chunks_ _upon return to the free list_ _- ChunkManager::freechunksget(): Here we now split large_ _chunks into_ _smaller chunks on demand_ _- VirtualSpaceNode::takefromcommitted() : chunks are_ _allocated_ _according to align rules now, padding chunks are handles_ _- The OccupancyMap class is the helper class implementing_ _the new_ _occupancy bitmap_ _The rest is mostly chaff: helper functions, added tests_ _and verifications._ _--_ _Thanks and Best Regards, Thomas_ _[1] https://bugs.openjdk.java.net/browse/JDK-8166690_ _<https://bugs.openjdk.java.net/browse/JDK-8166690> [2] http://mail.openjdk.java.net/pipermail/jdk-dev/2017-November <http://mail.openjdk.java.net/pipermail/jdk-dev/2017-November> /000128.html [3] https://bugs.openjdk.java.net/browse/JDK-8185034 <https://bugs.openjdk.java.net/browse/JDK-8185034> [4] https://bugs.openjdk.java.net/browse/JDK-8176808 <https://bugs.openjdk.java.net/browse/JDK-8176808> [5] https://bugs.openjdk.java.net/secure/attachment/63532/test3.zip <https://bugs.openjdk.java.net/secure/attachment/63532/test3.zip>



More information about the hotspot-dev mailing list