Epsilon GC JEP updates (was: Re: EpsilonGC and throughput.)
Aleksey Shipilev shade at redhat.com
Mon Mar 12 13:12:37 UTC 2018
Hey Thomas,
Updated the JEP here: https://bugs.openjdk.java.net/browse/JDK-8174901
On 01/08/2018 05:45 PM, Thomas Schatzl wrote:
I apologize for my somewhat inappropriate words; this was due to some frustration. I also apologize for the long delay, which was due to the winter holidays.
It took me quite some time to recover from this, and I am still slightly bitter. If we want contributions to OpenJDK, then we all have to understand that this kind of thing really dissuades people from contributing. I have gotten quite a few personal "raised eyebrows" replies about this, with one person outright calling the whole thing hostile.
I can imagine much less thick-skinned contributors just walking away. I know that wasn't the intent, but what matters here is not only the intent, but the appearance too (I have fallen into the same trap numerous times before, and learned to keep it together -- call this thread karmic justice).
It also helps if the JEP is written in a way that makes it interesting for the community to read and respond to. The less thinking a reader has to do to work out whether they are impacted, and whether and by how much it would simplify their own life or that of Java users in general, the more people will feel urged to get this in (or at least not be deterred).
Come to think of it, the public discussion this JEP is getting is positive, and there is no need to urge more people to get it in at this point. Lots of people have read it, and surprisingly many of them have tried the prototype, at least for fun, but also for performance work.
"Motivation" section:
JEP text: "Java implementations are well known for a broad choice of highly configurable GC implementations."
Potential answer to "Why should this work be done?". Or does the sentence indicate we need another GC because we already have so many, and another does not hurt? I am asking this in full seriousness, I really do not know. Or is this only an introductory sentence without meaning?
This statement underscores that there is no single all-purpose GC in OpenJDK.
"The variety of available collectors caters for different needs in the end, even if their configurability makes their functionality intersect. It is sometimes easier to maintain a separate implementation than to pile yet another configuration option onto an existing GC implementation."
Let's go into these benefits in more detail:
JEP text: "Performance testing. ..." Benefit. Maybe it would be useful to list a few of these performance artifacts here ("... , e.g. barrier code, concurrent threads").
I added some. We (in Shenandoah development land), and others (in Shenandoah/ZGC/Zing comparison land) have used Epsilon as the ultimate latency baseline. New text captures that bit:
"Having a GC that does almost nothing is a useful tool for differential performance analysis of other, real GCs. Having a no-op GC can help filter out GC-induced performance artifacts, like GC worker scheduling, GC barrier costs, GC cycles triggered at unfortunate times, locality changes, etc. Moreover, there are latency artifacts that are not GC-induced (e.g. scheduling hiccups, compiler transition hiccups, etc), and removing the GC-induced artifacts helps to put those in contrast. For example, having the no-op GC allows estimating the natural "background" latency baseline for low-latency GC work."
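As a sketch of how such a differential run might look (the Epsilon flags below match the experimental-flag gating used by the prototype; `benchmark.jar` and the heap sizing are hypothetical stand-ins for the workload under study):

```shell
# Baseline run: no-op GC, so any latency/throughput artifacts observed
# here are by construction not GC-induced.
java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -Xmx8g -jar benchmark.jar

# Same workload under a real collector; the delta against the baseline
# run isolates the GC-induced artifacts.
java -XX:+UseG1GC -Xmx8g -jar benchmark.jar
```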
An alternative could be a developer just nop'ing out the relevant GC interface section. That is somewhat cumbersome, but for how many users is this a problem? Spell that out in the appropriate Alternatives section.
Spelled out: "The developers might just no-op out the existing GC implementation to get a baseline implementation for testing. The problem with this is inconvenience: the developers would need to make sure such an implementation is still correct, that it performs well enough to be a good baseline, and that it is hooked up into the other runtime facilities (heap dumping, thread stack walking, MXBeans) that support the differential analysis. Implementations for other platforms would require much more work. Having a ready-to-go no-op implementation in the mainline solves this inconvenience."
JEP text: "Functional testing. For Java code testing, a way to establish a threshold for allocated memory is useful to assert memory pressure invariants. Today, we have to pick up the allocation data from MXBeans, or even resort to parsing GC logs. Having a GC that accepts only a bounded number of allocations, and fails on heap exhaustion, simplifies testing."
Benefit. For regression testing, in how many cases (or in what circumstances) do you think it is sufficient to get a fail/no-fail answer only? This seems to pass the work on to the dev upon a failure, leaving them to write another test that prints and monitors the memory usage increases over time anyway. Given that you already need to monitor memory usage, how much extra work is it to make the test fail when heap usage goes above a threshold?
I don't quite believe debugging the test like this would involve tracking the memory allocated so far, because that is not readily actionable. Even if it is, Epsilon prints messages when n% of the heap has been allocated, which actually improves the developer's experience, because as a dev I don't need to copy-paste MXBean blocks anymore.
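For contrast, the MXBean boilerplate being replaced looks roughly like the sketch below (the 10 MB array is a made-up stand-in for the code under test; note that TLAB-based allocation makes the reading only approximate, which is part of the inconvenience):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class AllocationCheck {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        long before = mem.getHeapMemoryUsage().getUsed();

        // The code under test: a hypothetical workload expected to stay
        // within its allocation budget.
        byte[] payload = new byte[10 * 1024 * 1024];

        long after = mem.getHeapMemoryUsage().getUsed();
        // Caveat: TLABs mean getUsed() does not track allocation exactly,
        // so this delta is only a rough bound.
        System.out.println("Heap usage delta: " + (after - before)
                + " bytes (payload length: " + payload.length + ")");
    }
}
```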
But what is more actionable is the actual heap dump. And this is where Epsilon comes in handy: you just set -Xmx1g -XX:+HeapDumpOnOutOfMemoryError, and run the test. If the test fails, you get a ready heap dump that tells you everything the test allocated to blow the allocation limit, down to every single object.
I added the example: "For example, knowing that the test should allocate no more than 1 GB of memory, we can configure the no-op GC with -Xmx1g, and let it crash with a heap dump if that constraint is violated."
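The invocation might look like this (flags as in the prototype's experimental gating; `AllocationBoundTest` is a hypothetical test class):

```shell
# Fail with a heap dump if the test ever allocates more than 1 GB total:
# Epsilon never reclaims memory, so cumulative allocation is capped by -Xmx.
java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC \
     -Xmx1g -XX:+HeapDumpOnOutOfMemoryError AllocationBoundTest
```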
"VM interface testing. For VM development purposes, having a simple GC helps to understand the absolute minimum required from the VM-GC interface to have a functional allocator. This serves as proof that the VM-GC interface is sane, which is important in lieu of JEP 304 ("Garbage Collector Interface")."
Benefit. Who are the (main) beneficiaries of that -- probably developers? For a developer, how big is that benefit if there are already 5 or 6 implementations of that interface?
The benefit is simple: for a no-op GC, the implemented GC interface should be in an epsilon-neighborhood of zero. By the way, it is not right now, in both the native and codegen parts: https://builds.shipilev.net/patch-openjdk-epsilon-jdk/
I added: "For a no-op GC, the interface should not need anything implemented, and a good interface means Epsilon's BarrierSet would just use the no-op barrier implementations from the default implementation".
"Last-drop performance improvements. For ultra-latency-sensitive applications, where developers are conscious about memory allocations and know the application memory footprint exactly, or even have (almost) completely garbage-free applications. In those applications, GC cycles may be considered an implementation bug that wastes CPU cycles for no good reason."
I split this section in three: "Extremely short lived jobs", "Last-drop latency improvements", and "Last-drop throughput improvements", because it seems more logical that way.
It may be useful to investigate the problem of these power users in more detail, and see if we could provide a (more?) complete solution for them.
The problem with this is that those power users consider whatever tricks they used to tame the misbehaving GC their competitive advantage (think HFT), and are not really inclined to share. (Some did not manage, and they moved away from Java, to our disadvantage.) The bits and pieces I got are "give us the no-op GC, and we shall figure out how to manage our memory ourselves, thank you very much". And in many cases, from the little glimpses shared behind the curtains, those guys seem to really know what they are doing. All I'm saying here is that we have no alternative but to take it on faith, which I am willing to do.
"Extremely short lived jobs are one example of this."
I do not understand the use of Epsilon in such a use case. The alternative I can see would be to restart the VM after every short-lived job (something for the Alternatives section). That seems strange to me; depending on the definition of a "short-lived job", particularly if nothing survives the execution of that job, a GC will be extremely fast.
This relies on the presupposition that GC is fast because the heap is a graveyard. That is not always the case.
I have demonstrated one example: http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2017-December/021042.html
Added: "A short-lived job might rely on exiting quickly to free its resources (e.g. heap memory). In this case, accepting a GC cycle that futilely cleans up the heap is a waste of time, because the heap would be freed on exit anyway. Note that the GC cycle might take a while, because it depends on the amount of live data in the heap, which can be a lot."
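A sketch of the short-lived-job setup (the sizing and class name are hypothetical; the point is that the heap is released wholesale on process exit, never by a cycle):

```shell
# Size the heap to the job's known footprint; Epsilon never runs a GC
# cycle, and the OS reclaims the whole heap when the process exits.
java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC \
     -Xms4g -Xmx4g -XX:+AlwaysPreTouch ShortLivedJob
```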
"There are also cases when restarting the JVM -- letting load balancers figure out failover -- is sometimes a better recovery strategy than accepting a GC cycle."
I really can't find a good example, particularly in the situation described so far with these short-lived jobs, where a GC (on an almost empty heap) is not at least as fast as a restart.
Again, this relies on the presupposition that GC is fast because the heap is a graveyard. Again, that is not always the case.
In the real-world systems I know of, the latency of a node restart does not matter as much: it is more important to reliably fail the node and let the balancer act. In other words, the "global" detect-and-evade latency is more important than the "local" restart latency.
Accepting the GC cycle makes the availability logic harder: you now have to disambiguate between a normal 100 ms execution of the business logic and the first 100 ms of a multi-second GC pause. That probably means the timeout has to be several sigmas larger than the usual business logic wait time, which prolongs the recovery. Instead, you might just crash, and let the balancer figure out where to restart the processing right away.
I understand this goes against our intuition about how systems should be built. Of course, we want our GCs to never push users to come up with these contraptions, but the sad reality is that they are doing so, because the world, and the GC implementations in this world, are not perfect.
And you would not find that in our spectrum of well-behaved workloads. Talking to people who maintain large real-world systems can be sobering for understanding what they have to deal with. For example, one customer asked me to come up with this contraption for their high-availability in-memory grid -- they are ready for the JVM to crash, and in fact they would like it to crash instead of stalling! https://bugs.openjdk.java.net/browse/JDK-8181143
It would make for a very good paragraph explaining this use case in the alternatives section.
Another problem with these two sentences for me (and I am by no means a "FaaS power user") is that I believe waiting for the VM to crash/shut down to steer the load balancers is not a good strategy. Maybe you can give some more information about this use case?
The JEP does not advocate using this strategy. It just reports what users are doing in the wild. So discussion of it seems to fall outside the scope of the JEP.
In the earlier email I directly asked for performance numbers in order to streamline this discussion, and given that you are a well-known performance and benchmarking guru (afaik you were a "R"eviewer long before I joined), it seemed a logical request. If you can't find numbers, there is also a reference I got ("Barriers, Friendlier Still" or so, from Blackburn et al., I think), which is also mentioned, iirc, in the very good Jones GC book. "Real" newbies I would just ask to perform this test.
See here: https://shipilev.net/jvm-anatomy-park/13-intergenerational-barriers/
I considered it bad taste to link my blog from the JEP, but I can do it anyway.
"Alternatives" section: Again, try to make this alternatives review balanced, and put it in the context of the users the benefit is for.
I have rewritten that part with most of the things we have discussed, and tried to explain why those alternatives are not exactly better. I might still be missing some salient alternatives, because I ran out of steam...
Thanks, -Aleksey