EpsilonGC and throughput.

Sergey Kuksenko sergey.kuksenko at oracle.com
Tue Dec 19 18:52:48 UTC 2017


On 12/19/2017 12:14 AM, Aleksey Shipilev wrote:

> I assume you have run SPECjvm2008.

Bingo. All of the benchmarks that were able to run with EpsilonGC for at least 30 seconds.

> Beware of what I call the Catch-22 of (GC) Performance Evaluation: "standard benchmarks" tend to be developed/tuned with existing GCs in mind.

You are partially right. Looking into some of the sources, I concluded that they were written with general Java style in mind, not tuned to particular GCs.

> For example, it would be hard to find the "standard benchmark" that exhibits large LDS, or otherwise experiences large GC pauses, or experiences GC problems in its steady state (ignoring transient hiccups in the warmups).

>> - EpsilonGC vs ParallelOld:
>>   -- on only 3 benchmarks was overall throughput with EpsilonGC higher than with ParallelOld, and the speedup was 0.2%-0.6%
>>   -- on 6 benchmarks, ParallelOld (with barriers and pauses) was faster (faster means throughput!), within 1%-10%.
>> - EpsilonGC vs G1:
>>   -- EpsilonGC has shown higher throughput on 4 benchmarks, within 2%-3%
>>   -- G1 was faster on 5 benchmarks, within 2%-10%.
>
> Oh! The throughput figures are actually pretty good for a non-compacting collector, and the performance improvements are in line with what is called out in the JEP as "Last-drop performance improvements" on special workloads.

For special cases, yes. I wrote about typical cases. And my message was: don't expect that EpsilonGC will show you "ideal throughput" without GC overheads; sometimes GC overhead is important for higher performance.

> As noted above, it makes little sense to run Epsilon for throughput on "standard benchmarks" that do not suffer from GC issues. It is instructive, however, to run workloads that do suffer from them.

I have concerns here. I am afraid that if an application does suffer from GC issues, it will continue suffering from EpsilonGC issues (OutOfMemory).

> For example, try this for a quick turn-around CLI workload that is supposed to do one thing very quickly:
>
>   import java.util.ArrayList;
>   import java.util.List;
>
>   public class AL {
>     // The static field keeps every allocated object reachable.
>     static List<Object> l;
>     public static void main(String... args) throws Throwable {
>       l = new ArrayList<>();
>       for (int c = 0; c < 100000000; c++) {
>         l.add(new Object());
>       }
>       System.out.println(l.hashCode());
>     }
>   }
>
>   $ time java -XX:+UseParallelGC AL
>   -1907572722
>   real 0m25.063s
>   user 1m5.700s
>   sys 0m1.084s
>
>   $ time java -XX:+UseG1GC AL
>   -1907572722
>   real 0m14.908s
>   user 0m33.264s
>   sys 0m0.788s
>
>   $ time java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC AL
>   -1907572722
>   real 0m8.995s
>   user 0m8.784s
>   sys 0m0.260s

It doesn't look like a throughput benchmark; it's startup. I am sorry, I should have been clearer in my previous email: I was writing about steady-state throughput.
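A steady-state variant of the quoted AL program might look like the following minimal sketch. The class name ALThroughput, the fill and measure helpers, and the default iteration count are assumptions for illustration only; the harness actually used for the measurements in this thread is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical steady-state variant of the quoted AL program (illustration only).
public class ALThroughput {
    // Kept in a static field, as in the original, so the latest list stays reachable.
    static List<Object> l;

    // One unit of work: the same allocation loop as in the original AL.main.
    static List<Object> fill(int count) {
        List<Object> list = new ArrayList<>();
        for (int c = 0; c < count; c++) {
            list.add(new Object());
        }
        return list;
    }

    // Repeats the workload and returns the wall-clock time of each iteration in ms.
    static long[] measure(int iterations, int count) {
        long[] millis = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            l = fill(count); // the previous list becomes unreachable once l is reassigned
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return millis;
    }

    public static void main(String... args) {
        // Assumed defaults: 5 iterations of the original 100,000,000 allocations.
        int iterations = args.length > 0 ? Integer.parseInt(args[0]) : 5;
        int count = args.length > 1 ? Integer.parseInt(args[1]) : 100_000_000;
        long[] millis = measure(iterations, count);
        for (int i = 0; i < millis.length; i++) {
            System.out.println("iteration " + i + ": " + millis[i] + " ms");
        }
        System.out.println(l.size());
    }
}
```

Each iteration discards the previous list, so a collecting GC can reclaim it, while a collector that never reclaims memory keeps consuming fresh heap on every pass and will eventually run out.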
Converting this into a throughput benchmark, I got:

  G1:          12 seconds
  ParallelOld: 24 seconds
  EpsilonGC:   9.5 seconds

Not such a huge difference, and EpsilonGC can't do more than a couple of iterations.

> In workloads like these, having GC pauses does impact application throughput.

Nobody argued with this. I have just shown examples where GC pauses (with compaction) sometimes provide better overall throughput.

> When out-of-the-box GC performance is concerned, the difference is not even in single-digit percents. Of course, you can configure GC to avoid pauses in the timespan that is critical for you (e.g. setting -Xms8g -Xmx8g -Xmn7g for the workload above), and hope you got it right, but one of the points of Epsilon is not to guess about this, but to actually have the guarantee that GC never happens.
>
>> Compacting GCs have a significant advantage over non-GC in terms of throughput (e.g. https://shipilev.net/jvm-anatomy-park/11-moving-gc-locality/)
>
> True, and it is called out in the JEP: "Locality considerations. Non-compacting GC implicitly means it maintains the object graph in its allocation order. This has impact on spatial locality, and regular applications may experience the throughput hit if allocations are random or generate lots of sparse garbage. While this may entail some throughput overhead, this is outside of GC control, and would affect most non-moving GCs. Locality-aware application coding would be required to mitigate this drawback, if locality proves to be a problem."
>
> Locality is something that users can control, especially when small contained applications are concerned, and/or (hopefully) Valhalla and other language features that help to flatten the memory.

Sure. I just have to note that such a specially tuned, locality-aware application could barely use the standard Java API, because that is out of the user's control. Epsilon GC is not a silver bullet, and for practical usage it will require more effort than existing GCs to achieve benefits. I don't dispute that such benefits exist.
> Thanks,
> -Aleksey

--
Best regards,
Sergey Kuksenko


More information about the hotspot-gc-dev mailing list