JEP 248: Make G1 the Default Garbage Collector
Erik Österlund erik.osterlund at lnu.se
Tue Jun 2 13:39:25 UTC 2015
Hi Charlie,
So in summary, what you are saying is that G1 is in general, without GC fine-tuning, a better default GC than ParallelGC for larger heaps, because by design it considers more QoS aspects than ParallelGC does and has a lot of ergonomics. That is attractive for a default GC when it’s unknown which specific QoS matters to the user and how best to tune for it. The exception is the hypothetical, unlikely event that ParallelGC with its defaults happens, by mere accident, to behave as if it had been finely tuned for an application whose specific properties ultimately result in better latencies with ParallelGC.
I can see where this is going… ;)
Thanks, /Erik
On 02/06/15 14:29, charlie hunt <charlie.hunt at oracle.com> wrote:
Hi Erik,
Let’s pull out a couple of your questions here and see if I can offer some answers.
> I do not see why this (latency requirement uncertainty) specifically would be a problem for this particular transition into using G1 more instead of ParallelGC. Let’s focus only on the narrow scope of transitioning application contexts from ParallelGC to G1 only for “larger” heaps. Is there any application context then where G1 has worse latency than ParallelGC?

My observations have been that it is not a question of the size of the Java heap, although folks often refer to it in this way. It is more about the combination of the amount of live data, the amount of available space between the live data and the Java heap, and the object lifetimes.

There are Java apps out there for which Parallel GC will likely offer lower latency than G1. These are apps where Parallel GC’s young generation is configured (either by manually tuning GC, or because the defaults hit this situation by mere accident, which would be unusual) in a way that old generation collection can be avoided and many of the objects allocated die young, i.e. there is not a humongous amount of objects sloshing around between survivor spaces, and few, if any, are promoted to old gen. Again, to reiterate, to do this with Parallel GC will likely require a GC tuning effort. Yet there may be an app, or some small number of apps, for which Parallel GC with its JVM defaults fits what I just described. But, then again, the context of this JEP is before GC tuning.

To up-level this a bit: with a given GC, it is generally accepted that as one performance attribute (throughput, latency or memory footprint) is emphasized, there is a sacrifice in one or two of the others. I think you are kind of saying this here too. But I thought it was worth mentioning specifically. I’ll come back to this a bit later.

> Another concern Jenny mentioned where G1 could perform worse was JVM start up time. Again, I have a hard time imagining a /server application/ with an explicitly specified “large” heap where anyone would care too much about this. Am I wrong?

If we are talking about the difference in time to start the JVM relative to the time it takes to generally initiate a Java app with a large Java heap, then yes, I don’t think it would be much of a concern. This is not to be confused with whether someone is concerned about an application with a large Java heap taking a long time to initiate. That is a different story. ;-)

> And here G1 was designed, correct me if I’m wrong, to be not necessarily the best at anything, but pretty good at everything (latencies, performance and memory footprints). This sounds to me like a reasonable choice for default application contexts where it’s not known if the user cares about this or that QoS.

This is pretty much the conclusion that I have arrived at. IMO, G1, due to its ergonomics, offers a larger population of applications a “happy medium” of tradeoffs between the three performance attributes than Parallel GC does in the absence of further tuning.

thanks,
charlie

On Jun 1, 2015, at 6:16 PM, Erik Österlund <erik.osterlund at lnu.se> wrote:
Hi Charlie,

On 01/06/15 22:51, charlie hunt <charlie.hunt at oracle.com> wrote:

> Hi Erik,
>
> HotSpot already does some of these ergonomics today for both GC and the JIT compiler, for instance depending on whether the JVM sees less than 2 GB of RAM and which OS it is running on. These decisions are based on what is called a “server class machine”. A “server class machine” as of JDK 6u18 is defined as a system that has 2 GB or more of RAM and two or more hardware threads. There are other cases for a given hardware platform, and if it is a 32-bit JVM, the collector (and JIT compiler) ergonomically selected may also differ from other configurations. AFAIK, the JEP is proposing to change the default GC to G1 in configurations where the default GC is currently Parallel GC.

I think the fact that these ergonomics tricks are already around only motivates the approach further, as it is in line with the current philosophy that if the user is not explicit about things, then the runtime can and will guess a bit and try to aim for some kind of middle-ground solution that is pretty good but not necessarily the best at everything (like G1 was designed to be). If the guess doesn’t cut it because it turns out that only a single QoS was important, like for instance performance over everything else, then maybe the user should have said so. ;)

> The challenge with what you are describing is that the best GC cannot always be ergonomically selected by the JVM without some input from the user, i.e. GC doesn’t know if any GC pauses greater than 200 ms are acceptable regardless of Java heap size, number of hardware threads, etc.

I do not see why this (latency requirement uncertainty) specifically would be a problem for this particular transition into using G1 more instead of ParallelGC. Let’s focus only on the narrow scope of transitioning application contexts from ParallelGC to G1 only for “larger” heaps. Is there any application context then where G1 has worse latency than ParallelGC? I assume not. So the only visible effect such a change would bring is improved latencies, if anything. And the whole mega-low-latency discussion where G1 doesn’t cut it is quite irrelevant for this change as well, as the people affected are already not satisfied with ParallelGC, which wouldn’t cut it either, and hence specify something explicitly.

Another concern Jenny mentioned where G1 could perform worse was JVM start up time. Again, I have a hard time imagining a /server application/ with an explicitly specified “large” heap where anyone would care too much about this. Am I wrong?

What is left to annoy people with such a change then (apart from bugs), with latency not being one of them, is resource trade-offs in terms of memory footprint and performance. And here G1 was designed, correct me if I’m wrong, to be not necessarily the best at anything, but pretty good at everything (latencies, performance and memory footprints). This sounds to me like a reasonable choice for default application contexts where it’s not known if the user cares about this or that QoS. And with the observation from Jenny that even performance seems to actually be better than ParallelGC for application contexts with large heaps, and the knowledge that latency is in general more important then, does it not make sense to choose G1 at least for those application contexts?

Of course this is just a suggestion based on generalizations. I just thought it’s an interesting middle ground worth considering: instead of only considering changing either all or none of the default server application contexts, change only the subset where we think it is least likely to annoy people, and then, as G1 continues to improve and one size starts fitting all, expand that subset in a smoother transition.

Thanks,
/Erik
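For illustration, the “server class machine” rule as Charlie states it (2 GB or more of RAM and two or more hardware threads) can be written as a small predicate. This is a minimal sketch of the rule as worded in the message, not HotSpot's actual detection code; the class name, method name and the 4 GB figure used in `main` are assumptions made for the example.

```java
public final class ServerClassCheck {
    // Thresholds taken from the wording of the message, not from HotSpot source.
    static final long MIN_RAM_BYTES = 2L * 1024 * 1024 * 1024; // 2 GB
    static final int MIN_HW_THREADS = 2;

    // Returns true when both conditions of the stated rule hold.
    static boolean isServerClass(long physicalRamBytes, int hardwareThreads) {
        return physicalRamBytes >= MIN_RAM_BYTES && hardwareThreads >= MIN_HW_THREADS;
    }

    public static void main(String[] args) {
        // availableProcessors() reports the hardware threads visible to this JVM.
        int threads = Runtime.getRuntime().availableProcessors();
        // Hypothetical RAM size; reading real physical memory needs a platform-specific API.
        long assumedRamBytes = 4L * 1024 * 1024 * 1024;
        System.out.println("server class: " + isServerClass(assumedRamBytes, threads));
    }
}
```

The RAM size is passed in as a parameter so that the sketch stays within the standard `java.lang` API.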
> thanks,
> charlie

On Jun 1, 2015, at 2:53 PM, Erik Österlund <erik.osterlund at lnu.se> wrote:

Hi all,

Does there have to be a single default one-size-fits-all GC algorithm for users to rely on? Or could we allow multiple algorithms and explicitly document that unless a GC is picked, the runtime is free to pick whatever it believes is better? This could have multiple benefits.

1. This could make a similar change easier in the future, as everyone will already be aware that if they really rely on the properties of a specific GC algorithm, then they should choose that GC explicitly and not rely on defaults not changing; there are no guarantees that defaults will not change.

2. Obviously there has been a long discussion in this thread about which GC is better in which context, and it seems like right now one size does not fit all. The user that relied on the defaults might not be so aware of these specifics. Therefore we might do them a big favour by attempting to make a guess for them that works out of the box, which is pretty neat.

3. This approach allows deploying G1 not everywhere, but where we guess it performs pretty well. This means it will run in fewer JVM contexts and hence pose less risk than deploying it to be used for all contexts, making the transition smoother.

One idea could be to first determine the valid GC variants given the supplied flags (GC-specific flags imply use of that GC), and then, among the valid GCs left, “guess” which algorithm is better based on the other general parameters, such as heap size (and maybe target latency). It could for instance pick ParallelGC for small heaps, G1 for larger heaps, and CMS for ridiculously large heaps or cases when extremely low latency is wanted.

My reasoning is based on two assumptions: 1) changing the defaults would target the users that don’t know what’s best for them, 2) one size does not fit all. If these assumptions are wrong, then this is a bad idea.

Thanks,
/Erik
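The two-step idea in Erik's message (first narrow down to the collectors that the supplied flags allow, then guess among those from heap size and latency needs) could look roughly like the sketch below. It is only an illustration of the proposal, not anything HotSpot implements: the message gives no numbers for "small" or "ridiculously large", so both limits are hypothetical parameters, and all names are invented for the example.

```java
import java.util.EnumSet;
import java.util.Set;

public final class GcGuess {
    enum Gc { PARALLEL, G1, CMS }

    // valid: collectors still allowed after looking at GC-specific flags.
    // smallHeapLimit / hugeHeapLimit: hypothetical thresholds, not from the thread.
    static Gc guess(Set<Gc> valid, long heapBytes, long smallHeapLimit,
                    long hugeHeapLimit, boolean wantsVeryLowLatency) {
        if (valid.isEmpty()) {
            throw new IllegalArgumentException("no valid collector left");
        }
        Gc preferred;
        if (wantsVeryLowLatency || heapBytes >= hugeHeapLimit) {
            preferred = Gc.CMS;        // very large heaps or extremely low latency
        } else if (heapBytes < smallHeapLimit) {
            preferred = Gc.PARALLEL;   // small heaps
        } else {
            preferred = Gc.G1;         // larger heaps
        }
        // If the flags ruled out the preferred collector, fall back to any valid one.
        return valid.contains(preferred) ? preferred : valid.iterator().next();
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        Set<Gc> all = EnumSet.allOf(Gc.class);
        System.out.println(guess(all, 16 * gb, 4 * gb, 64 * gb, false));
    }
}
```

Keeping the limits as parameters reflects that the thread itself does not agree on where the boundaries lie.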
On 01/06/15 20:53, charlie hunt <charlie.hunt at oracle.com> wrote:

Hi Jenny,

A couple questions and comments below.

thanks,
charlie

On Jun 1, 2015, at 1:28 PM, Yu Zhang <yu.zhang at oracle.com> wrote:

> Hi,
>
> I have done some performance comparison of g1/cms/parallelgc internally at Oracle. I would like to post my observations here to get some feedback, as I have limited benchmarks and hardware. These are out-of-the-box performance numbers.
>
> Memory footprint/startup: g1 has a bigger memory footprint and longer start up time. The overhead comes from more gc threads, and internal data structures to keep track of the remembered set.

This is the memory footprint of the JVM itself when using the same size Java heap, right?

I don’t recall if it has been your observation? One observation I have had with G1 is that it tends to be able to operate within tolerable throughput and latency with a smaller Java heap than Parallel GC. I have seen cases where G1 may not use the entire Java heap because it was able to keep enough free regions available yet still meet pause time goals. But Parallel GC always uses the entire Java heap, and once its occupancy reaches capacity, it would GC. So there are cases where, between the JVM’s footprint overhead and the amount of Java heap required, G1 may actually require less memory.
> g1 vs parallelgc: If the workload involves young gc only, g1 could be slightly slower. Also g1 can consume more cpu, which might slow down the benchmark if the SUT is cpu saturated.
>
> If there are promotions from young to old gen that lead to full gc with parallelgc, then for smaller heaps parallel full gc can finish within some range of pause time and still outperforms g1. But for bigger heaps, g1 mixed gc can clean the heap with pause times a fraction of parallel full gc time, improving both throughput and response time. Extreme cases are big data workloads (for example ycsb) with a 100g heap.

I think what you are saying here is that if one can tune Parallel GC such that a lengthy collection of old generation is avoided, or the live occupancy of old gen is small enough that the time to collect it can be tolerated, then Parallel GC will offer a better experience. However, if the live data in old generation at the time of its collection is large enough that the time it takes to collect it exceeds a tolerable pause time, then G1 will offer a better experience.

Would you also say that G1 offers a better experience in the presence of (wide) swings in object allocation rates, since there would likely be a larger number of promotions during the allocation spikes? In other words, G1 may offer more predictable pauses.

> g1 vs cms: I will focus on response time type of workloads. Ben mentioned
>
> "Having said that, there is definitely a decent-sized class of systems (not just in finance) that cannot really tolerate any more than about 10-15ms of STW. So, what usually happens is that they live with the young collections, use CMS and tune out the CMFs as best they can (by clustering, rolling restart, etc, etc). I don't see any possibility of G1 becoming a viable solution for those systems any time soon."
>
> Can you give more details, like what is the live data set size, how big is the heap, etc? I did some cache tests (Oracle Coherence) to compare cms vs g1. g1 is better than cms when there is fragmentation. If you tune cms well to have little fragmentation, then g1 is behind cms. But for those cases, they have to tune CMS very well, so changing the default to g1 won't impact them.
>
> For big data kind of workloads (ycsb, spark in-memory computing), g1 is much better than cms.
>
> Thanks,
> Jenny
>
> On 6/1/2015 10:06 AM, Ben Evans wrote:
>
>> Hi Vitaly,
>>
>>>> Instead, G1 is now being talked of as a replacement for the default collector. If that's the case, then I think we need to acknowledge it, and have a conversation about where G1 is actually supposed to be used. Are we saying we want a "reasonably high throughput with reduced STW, but not low pause time" collector? If we are, that's fine, but that's not where we started.
>>>
>>> That's a fair point, and one I'd be interested in hearing an answer to as well. FWIW, the only GC I know of that's actually used in low latency systems is Azul's C4, so I'm not even sure Oracle is trying to target the same use cases. So when we talk about "low latency" GCs, we should probably also be clear on what "low" actually means.
>>
>> Well, when I started playing with them, "low latency" meant a sub-10-ms transaction time with 100ms STW as acceptable, if not ideal. These days, the same sort of system needs a sub 500us transaction time, and ideally no GC pause at all. But that leads to Zing, or non-JVM solutions, and I think takes us too far into a specialised use case.
>>
>> Having said that, there is definitely a decent-sized class of systems (not just in finance) that cannot really tolerate any more than about 10-15ms of STW. So, what usually happens is that they live with the young collections, use CMS and tune out the CMFs as best they can (by clustering, rolling restart, etc, etc). I don't see any possibility of G1 becoming a viable solution for those systems any time soon.
>>
>> Thanks,
>> Ben
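Since much of the thread turns on which collector a JVM ends up running when none is requested explicitly, one way to check is the standard `java.lang.management` API. The sketch below lists the collectors the running JVM reports, together with their accumulated collection counts and times; the class and method names are invented for the example, and the collector names printed are implementation-specific.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public final class ActiveCollectors {
    // Names of the collectors the running JVM exposes through its MXBeans.
    static List<String> names() {
        List<String> result = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            result.add(bean.getName());
        }
        return result;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time are cumulative since JVM start; either may be -1 if undefined.
            System.out.println(bean.getName() + ": "
                    + bean.getCollectionCount() + " collections, "
                    + bean.getCollectionTime() + " ms");
        }
    }
}
```

Running it once with no GC flag and once with an explicit choice such as `-XX:+UseG1GC` or `-XX:+UseParallelGC` shows whether the default matches the collector an application was relying on.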