EpsilonGC and throughput.

Aleksey Shipilev shade at redhat.com
Tue Dec 19 08:14:43 UTC 2017


On 12/18/2017 08:01 PM, Sergey Kuksenko wrote:
> I agree that it makes sense to talk about latency, but, please, don't expect that you will be able
> to achieve high throughput with Epsilon GC. Having zero barriers is not enough for this.
> Just a simple example, I randomly took 9 standard throughput measuring benchmarks and compared
> Epsilon GC vs G1 and ParallelOld.

I assume you have run SPECjvm2008.

Beware of what I call the Catch-22 of (GC) Performance Evaluation: "standard benchmarks" tend to be developed/tuned with existing GCs in mind. For example, it would be hard to find a "standard benchmark" that exhibits a large live data set (LDS), or otherwise experiences large GC pauses, or runs into GC problems in its steady state (ignoring transient hiccups during warmup).

> - EpsilonGC vs ParallelOld:
>   -- only on 3 benchmarks overall throughput with Epsilon GC was higher than ParallelOld, and the speedup was 0.2%-0.6%
>   -- on 6 benchmarks, ParallelOld (with barriers and pauses) was faster (faster means throughput!), within 1%-10%.

> - EpsilonGC vs G1
>   -- EpsilonGC has shown higher throughput on 4 benchmarks, within 2%-3%
>   -- G1 was faster on 5 benchmarks, within 2%-10%.

Oh! The throughput figures are actually pretty good for a non-compacting collector, and the performance improvements are in line with what is called out in the JEP as "Last-drop performance improvements" on special workloads.

As noted above, it makes little sense to run Epsilon for throughput on "standard benchmarks" that do not suffer from GC issues. It is instructive, however, to run workloads that do suffer from them. For example, try this for a quick turn-around CLI workload that is supposed to do one thing very quickly:

import java.util.ArrayList;
import java.util.List;

public class AL {
    static List<Object> l;

    public static void main(String... args) throws Throwable {
        // Keep everything reachable until the end: the live data set only grows,
        // so any GC cycle spends time without reclaiming much.
        l = new ArrayList<>();
        for (int c = 0; c < 100_000_000; c++) {
            l.add(new Object());
        }
        System.out.println(l.hashCode());
    }
}

$ time java -XX:+UseParallelGC AL
-1907572722

real    0m25.063s
user    1m5.700s
sys     0m1.084s

$ time java -XX:+UseG1GC AL
-1907572722

real    0m14.908s
user    0m33.264s
sys     0m0.788s

$ time java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC AL
-1907572722

real    0m8.995s
user    0m8.784s
sys     0m0.260s

In workloads like these, having GC pauses does impact application throughput. Where out-of-the-box GC performance is concerned, the difference is not even in single-digit percents (see the timings above). Of course, you can configure the GC to avoid pauses in the timespan that is critical for you (e.g. setting -Xms8g -Xmx8g -Xmn7g for the workload above) and hope you got it right, but one of the points of Epsilon is not to guess about this, but to actually have the guarantee that GC never happens.
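To make the "guess vs. guarantee" contrast concrete, here is a sketch of both invocations for the AL workload above. The Parallel flags are the ones mentioned in the previous paragraph; the explicit Epsilon heap size is my own back-of-the-envelope estimate (100M plain Objects take roughly 1.6 GB at 16 bytes each, plus the ArrayList backing array and its growth copies), so treat it as illustrative, not as tuning advice:

# Guess: pre-size the heap and young gen so Parallel (hopefully) never
# collects during the critical window.
$ time java -XX:+UseParallelGC -Xms8g -Xmx8g -Xmn7g AL

# Guarantee: Epsilon never collects by construction; if the heap estimate
# turns out to be too small, the JVM terminates with OutOfMemoryError
# instead of pausing.
$ time java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -Xms8g -Xmx8g AL

No timings shown on purpose: the point is the difference in failure modes, not another measurement.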

> Compacting GCs have significant advantage over non-GC in terms of throughput
> (e.g. https://shipilev.net/jvm-anatomy-park/11-moving-gc-locality/)

True, and it is called out in the JEP:

"Locality considerations. Non-compacting GC implicitly means it maintains the object graph in its allocation order. This has impact on spatial locality, and regular applications may experience the throughput hit if allocations are random or generate lots of sparse garbage. While this may entail some throughput overhead, this is outside of GC control, and would affect most non-moving GCs. Locality-aware application coding would be required to mitigate this drawback, if locality proves to be a problem."

Locality is something that users can control, especially when small contained applications are concerned; and (hopefully) Valhalla and other language features that help to flatten memory will make that easier.
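As a hedged illustration of what "locality-aware application coding" can look like (my own sketch, not from the JEP or this thread; the Locality/Point names are made up): instead of chasing pointers through a graph of small objects whose layout is frozen in allocation order, keep the hot data in flat primitive arrays, which is also roughly the layout that Valhalla value types aim to provide without the manual work:

// Sketch only: contrasts a pointer-chasing layout with a flattened one.
// Under a non-compacting GC, the boxed version keeps whatever locality the
// allocation order happened to produce; the flat version sidesteps that.
public class Locality {
    // Pointer-heavy layout: every Point is a separate heap object.
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    static double sumBoxed(Point[] pts) {
        double s = 0;
        for (Point p : pts) s += p.x + p.y;   // extra indirection through each Point object
        return s;
    }

    // Flattened layout: coordinates live contiguously in primitive arrays.
    static double sumFlat(double[] xs, double[] ys) {
        double s = 0;
        for (int i = 0; i < xs.length; i++) s += xs[i] + ys[i]; // sequential access
        return s;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        Point[] pts = new Point[n];
        double[] xs = new double[n], ys = new double[n];
        for (int i = 0; i < n; i++) {
            pts[i] = new Point(i, i);
            xs[i] = i;
            ys[i] = i;
        }
        System.out.println(sumBoxed(pts) == sumFlat(xs, ys));
    }
}

Same data, two layouts; the second one does not depend on allocation order for its spatial locality.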

Thanks,
-Aleksey
