[Exp] First prototype of new acmp bytecode
Tobias Hartmann tobias.hartmann at oracle.com
Thu Mar 8 16:39:07 UTC 2018
- Previous message (by thread): Java Valhalla and Maths support, Floating point
- Next message (by thread): [Exp] First prototype of new acmp bytecode
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi,
I've added type speculation on non-nullness to avoid the null check when emitting the new acmp (for non-jsr292 methods, it only helps if -XX:TypeProfileLevel is set to > 111). We might want to add more speculation in the future, but I think that should be enough for now. I've also fixed the code in macro.cpp and converted the test to the jtreg format: http://cr.openjdk.java.net/~thartmann/valhalla/exp/acmp.04/
Here is a simple JMH benchmark that tests some common cases: http://cr.openjdk.java.net/~thartmann/valhalla/exp/acmp.04/NewAcmpBenchmark.java
-XX:-TieredCompilation -XX:-UseNewAcmp
Benchmark                                Mode  Cnt    Score   Error  Units
NewAcmpBenchmark.newCmp                 thrpt  200  108.911 ± 0.086  ops/us
NewAcmpBenchmark.newCmpDoubleNull       thrpt  200   88.206 ± 4.792  ops/us
NewAcmpBenchmark.newCmpDoubleNullFalse  thrpt  200   72.742 ± 7.563  ops/us
NewAcmpBenchmark.newCmpField            thrpt  200  107.090 ± 0.083  ops/us
NewAcmpBenchmark.oldCmp                 thrpt  200  114.466 ± 0.077  ops/us

-XX:-TieredCompilation -XX:+UseNewAcmp -XX:ValueBasedClasses=compiler/valhalla/valuetypes/MyValue
Benchmark                                Mode  Cnt    Score   Error  Units
NewAcmpBenchmark.newCmp                 thrpt  200  101.480 ± 0.260  ops/us
NewAcmpBenchmark.newCmpDoubleNull       thrpt  200   90.429 ± 4.741  ops/us
NewAcmpBenchmark.newCmpDoubleNullFalse  thrpt  200   81.230 ± 4.115  ops/us
NewAcmpBenchmark.newCmpField            thrpt  200  102.224 ± 0.019  ops/us
NewAcmpBenchmark.oldCmp                 thrpt  200  114.336 ± 0.239  ops/us
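As a quick cross-check of the percentages quoted in this message, the relative throughput change can be recomputed from the Score column of the two runs. The class name AcmpDeltas is only an illustrative choice; the scores are copied from the two runs listed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Recomputes the relative throughput change (new acmp vs. old acmp) from the JMH scores. */
public class AcmpDeltas {

    /** Relative change in percent; a negative value means the new acmp is slower. */
    static double percentChange(double oldScore, double newScore) {
        return (newScore - oldScore) / oldScore * 100.0;
    }

    public static void main(String[] args) {
        // Scores in ops/us: {-XX:-UseNewAcmp, -XX:+UseNewAcmp}.
        Map<String, double[]> scores = new LinkedHashMap<>();
        scores.put("newCmp", new double[] {108.911, 101.480});
        scores.put("newCmpDoubleNull", new double[] {88.206, 90.429});
        scores.put("newCmpDoubleNullFalse", new double[] {72.742, 81.230});
        scores.put("newCmpField", new double[] {107.090, 102.224});
        scores.put("oldCmp", new double[] {114.466, 114.336});
        for (Map.Entry<String, double[]> e : scores.entrySet()) {
            double[] s = e.getValue();
            System.out.printf("%-22s %+6.2f%%%n", e.getKey(), percentChange(s[0], s[1]));
        }
    }
}
```

Keep the error column in mind when reading the output: the newCmpDoubleNull and newCmpDoubleNullFalse deltas are of the same order as their reported error.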
In the worst case, where we need to emit the new acmp and the first operand is not null, there is a performance impact of about 6.8% (see newCmp).
However, in many cases we can use static type information to optimize. For example, if we know that one operand is a value type, we can emit a "double null check" (the result can only be true if both operands are null). This causes the performance impact to disappear into the noise (see newCmpDoubleNull). If we additionally know that one operand is always non-null, we can emit a static false. This improves performance by ~11.7% (high error) compared to old acmp (see newCmpDoubleNullFalse).
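The following is a minimal Java model of what each of these shapes computes; it is not the C2 implementation. It assumes that, in this prototype, acmp on a value type yields false, so only null == null or identical non-value references compare equal. The ValueType marker interface and MyValue class are hypothetical stand-ins for a class listed in -XX:ValueBasedClasses.

```java
/**
 * Semantic model of the acmp shapes discussed above. This is not C2 code; it only
 * captures what each emitted shape computes.
 */
public class NewAcmpModel {

    /** Hypothetical marker standing in for a class listed in -XX:ValueBasedClasses. */
    interface ValueType {}

    /** Hypothetical value class used for illustration. */
    static final class MyValue implements ValueType {}

    static boolean isValue(Object o) {
        return o instanceof ValueType; // false for null
    }

    /** General new acmp: same reference, and that reference is null or not a value type. */
    static boolean newAcmp(Object a, Object b) {
        // When a == b, inspecting a is enough: b is the same object (or both are null).
        return a == b && (a == null || !isValue(a));
    }

    /** Shape for when a is statically a value type (possibly null): only null == null is true. */
    static boolean doubleNullCheck(Object a, Object b) {
        return a == null && b == null;
    }

    /** Shape for when a is additionally known to be non-null: the result is a constant. */
    static boolean staticFalse(Object a, Object b) {
        return false;
    }

    public static void main(String[] args) {
        Object v = new MyValue();
        Object o = new Object();
        System.out.println(newAcmp(v, v));       // false: a value type is never acmp-equal
        System.out.println(newAcmp(o, o));       // true: plain reference equality
        System.out.println(newAcmp(null, null)); // true
        // Where a is statically a value type (or null), the cheaper shape must agree.
        for (Object a : new Object[] {null, v}) {
            for (Object b : new Object[] {null, v, o}) {
                if (doubleNullCheck(a, b) != newAcmp(a, b)) {
                    throw new AssertionError("shapes disagree");
                }
            }
        }
    }
}
```

The loop in main checks that the double null check agrees with the general form for every combination where the first operand is statically a value type (or null).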
There is one pitfall. If we compare two object fields, C2 optimizes old acmp to compare the narrow oops directly (no need to decode). With the new acmp, we need to decode the oop because we use derived oops for perturbation. Surprisingly, the newCmpField benchmark shows that the regression is even smaller than in the newCmp case (4.5%). That's probably because the comparison is always false and therefore the CPU's branch prediction works better, mitigating the cost of the additional instructions.
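To illustrate why perturbation needs a decoded address, here is a hypothetical address-level model. It assumes a perturbation offset of 1 and 8-byte-aligned oops, so a perturbed (odd) address can never match a real oop or null; the exact offset and encoding used in the webrev may differ. The Oop record and the decode helper are illustrative names, not HotSpot code.

```java
/**
 * Address-level model of acmp with perturbation via a derived pointer.
 * Illustrative only: addresses, the offset and the encoding are assumptions.
 */
public class PerturbedAcmp {

    static final long NULL = 0L;

    /** Hypothetical heap object: an 8-byte-aligned address plus a "class is a value type" bit. */
    record Oop(long address, boolean isValue) {
        Oop {
            if (address == NULL || address % 8 != 0) {
                throw new IllegalArgumentException("oops are non-null and 8-byte aligned");
            }
        }
    }

    static long addressOf(Oop o) {
        return o == null ? NULL : o.address();
    }

    /** Old acmp: plain address comparison. */
    static boolean oldAcmp(Oop a, Oop b) {
        return addressOf(a) == addressOf(b);
    }

    /**
     * New acmp: if a is a non-null value type, compare the derived address a + 1 instead.
     * An odd address never equals an aligned oop or null, so the result is false.
     */
    static boolean newAcmp(Oop a, Oop b) {
        long perturbation = (a != null && a.isValue()) ? 1L : 0L; // assumed offset of 1
        return addressOf(a) + perturbation == addressOf(b);
    }

    /**
     * Compressed-oop decoding (narrow 0 stays null). Old acmp can compare two narrow
     * values directly because decoding is injective, but a + 1 only exists as a decoded
     * address, so the new acmp has to decode first.
     */
    static long decode(int narrow, long base, int shift) {
        return narrow == 0 ? NULL : base + (Integer.toUnsignedLong(narrow) << shift);
    }

    public static void main(String[] args) {
        Oop value = new Oop(0x1000L, true);
        Oop plain = new Oop(0x2000L, false);
        System.out.println(oldAcmp(value, value)); // true
        System.out.println(newAcmp(value, value)); // false: 0x1001 != 0x1000
        System.out.println(newAcmp(plain, plain)); // true
        System.out.println(newAcmp(null, null));   // true
    }
}
```

The model computes the perturbation from a flag for brevity; in compiled code it would have to be derived from the object's class, which is part of why the non-null and static-type shortcuts pay off.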
The last benchmark (oldCmp) verifies that if C2 is able to determine that one operand is not a value type, we can use the old acmp and performance is equal to the baseline.
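The decision process described above can be summarized as a small selection function. This is a hypothetical sketch: Kind, Shape and Operand are illustrative stand-ins rather than C2's type system, and the shapes only roughly correspond to the newCmp, newCmpDoubleNull, newCmpDoubleNullFalse and oldCmp cases.

```java
/** Hypothetical sketch of choosing an acmp shape from static type information. */
public class AcmpShapeSelector {

    enum Kind { NOT_VALUE, VALUE, UNKNOWN }

    enum Shape { OLD_ACMP, DOUBLE_NULL_CHECK, STATIC_FALSE, NEW_ACMP }

    /** What the compiler statically knows about one operand (illustrative only). */
    record Operand(Kind kind, boolean nonNull) {}

    static Shape select(Operand a, Operand b) {
        // If either side can never be a value type, a == b implies the shared referent
        // is not one either, so plain reference comparison is already correct.
        if (a.kind() == Kind.NOT_VALUE || b.kind() == Kind.NOT_VALUE) {
            return Shape.OLD_ACMP;
        }
        // One side is a value type or null: only null == null can yield true,
        // and if either side is known non-null even that is impossible.
        if (a.kind() == Kind.VALUE || b.kind() == Kind.VALUE) {
            return (a.nonNull() || b.nonNull()) ? Shape.STATIC_FALSE : Shape.DOUBLE_NULL_CHECK;
        }
        return Shape.NEW_ACMP;
    }

    public static void main(String[] args) {
        Operand unknown = new Operand(Kind.UNKNOWN, false);
        Operand value = new Operand(Kind.VALUE, false);
        Operand nonNullValue = new Operand(Kind.VALUE, true);
        Operand string = new Operand(Kind.NOT_VALUE, true);
        System.out.println(select(unknown, unknown));      // NEW_ACMP
        System.out.println(select(value, unknown));        // DOUBLE_NULL_CHECK
        System.out.println(select(nonNullValue, unknown)); // STATIC_FALSE
        System.out.println(select(string, unknown));       // OLD_ACMP
    }
}
```

Checking NOT_VALUE first is safe: if one operand is known not to be a value type, equal references cannot refer to a value type, so old acmp gives the same answer as the new one.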
I will re-run the tests with type speculation enabled to see how much of a difference that makes.
I think this is stable enough to be pushed to the Exp branch. Any objections?
Thanks, Tobias
On 23.02.2018 14:22, Tobias Hartmann wrote:
> Hi John,
>
> On 21.02.2018 22:04, John Rose wrote:
>> You might even be able to get rid of the special node type (arity=3), if the cases where CmpP sees a derived oop can be recognized as perturbations. I don't think we do CmpP on derived oops in any other circumstance (no C-style pointer/limit loops).
>
> Yes, we only use CmpP with AddP inputs for raw pointer comparisons (for example, in PhaseMacroExpand::expand_allocate_common) and we can easily filter these out.
>
> Here's the new webrev:
> http://cr.openjdk.java.net/~thartmann/valhalla/exp/acmp.03/
>
> Changes include:
> - Using derived oops for perturbation
> - Got rid of all CastX2P usages
> - Removed additional input edge from CmpP
> - Factored common code into separate methods
> - Swap operand optimization to avoid null checks in new acmp
> - Added code to ensure OrX is folded to null check or constant false if possible
> - Interface supertype support
>
> I've executed performance runs with -XX:-TieredCompilation -XX:ValueBasedClasses= -XX:+UseNewAcmp and there is no significant performance difference with SPECjvm2008 and SPECjbb2015 (and some of our internal benchmarks).
>
> TODOs:
> - JMH benchmarks
> - Type speculation on (non-)nullness
> - Fix changes in macro.cpp
> - Convert test to jtreg format
>
> Best regards,
> Tobias
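The "Swap operand optimization" item in the quoted change list can be illustrated with a hypothetical sketch. It assumes that the new acmp inspects only one operand (null check, then class lookup for the perturbation) and relies on the result being symmetric in its operands, so a known non-null operand can be moved into the inspected position. Operand and plan are made-up names, and the step strings are descriptive only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the operand-swap idea: inspect the operand known to be non-null. */
public class AcmpOperandSwap {

    /** Illustrative operand: a name plus whether it is statically known to be non-null. */
    record Operand(String name, boolean knownNonNull) {}

    /** Returns an illustrative list of the checks a new acmp would emit for a == b. */
    static List<String> plan(Operand a, Operand b) {
        // new acmp(a, b) == new acmp(b, a), so swapping the operands is always legal.
        if (!a.knownNonNull() && b.knownNonNull()) {
            Operand tmp = a;
            a = b;
            b = tmp;
        }
        List<String> steps = new ArrayList<>();
        if (!a.knownNonNull()) {
            steps.add("null check " + a.name());
        }
        steps.add("load klass of " + a.name() + ", perturb if value type");
        steps.add("compare with " + b.name());
        return steps;
    }

    public static void main(String[] args) {
        Operand x = new Operand("x", false);
        Operand y = new Operand("y", true);
        System.out.println(plan(x, y)); // y is inspected, so no null check is needed
        System.out.println(plan(x, x)); // neither is known non-null: null check stays
    }
}
```

Swapping only when the second operand is known non-null and the first is not keeps the transformation a no-op in every other case.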