A loop optimization sometimes results in pessimization (original) (raw)
Vladimir Kozlov vladimir.kozlov at oracle.com
Tue Feb 18 12:05:28 PST 2014
- Previous message: A loop optimization sometimes results in pessimization
- Next message: RFR(XS): 8025562: [parfait] warnings from b13 for hotspot.src.share.vm.opto: null pointer dereference
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Yes, it is know problem:
https://bugs.openjdk.java.net/browse/JDK-7101232
"LCM place all loads at the beginning of the loop and then use loaded values. And loaded values are spilled on stack since there are no registers enough"
I hope we will have time to look this soon.
Thanks, Vladimir
On 2/18/14 3:29 AM, Marko Topolnik wrote:
I have found that the following code:
for (int i = 0; i < array.length; i++) {_ _final int entry = array[i];_ _result ^= i + array[i];_ _if (entry == 0) break;_ _}_ _executes slower if I comment out the last line_ _if (entry == 0) break;_ _I have ensured that the array has no zero values._ _After some analysis, I have isolated the cause to be an additional transformation which is performed in the latter case, when the loop has no premature exit points (subject to the elimination of array bounds check, which does happen in my case). The transformation collates the operations in the unrolled loop, so that array access for all unrolled steps is performed first, followed by all the XORing into the accumulator._ _The slowdown occurs because there is a lot of temporary storage needed to remember all the results of memory reads until they are used for XOR-ing. Some of the reads overflow onto the stack._ _In detail, I have gathered the following table which correlates execution times with the -XX:LoopUnrollLimit command-line argument:_ _LoopUnrollLimit: 14 15 18 19 22 23 60_ _withExitPoint: 96 95 95 79 80 80 69 1/100 ns_ _withoutExitPoint: 94 64 64 63 64 77 75 1/100 ns_ _The following sudden changes can be observed:_ _- on the transition from 14 to 15, the withoutExitPoint variant receives a beneficial transformation and is significantly sped up;_ _- on 18->19, the withExitPoint variant receives a speedup, lesser than the above; - on 22->23, the withoutExitPoint variant is slowed down. At this point I see the above described collation starting to happen. The default loopUnrollLimit for my setup is 60, so I present its results in the last column. The withExitPoint variant is faster, even though it has one more line of code.
Is this a known effect? Would it be worth fixing? -Marko Topolnik P.S. This is the JMH code I have used for measurements: @OutputTimeUnit(TimeUnit.NANOSECONDS) @BenchmarkMode(Mode.AverageTime) @OperationsPerInvocation(Measure.ARRAYSIZE) @Warmup(iterations = 20, time = 200, timeUnit=MILLISECONDS) @Measurement(iterations = 20, time = 200, timeUnit=MILLISECONDS) @State(Scope.Thread) @Threads(1) @Fork(2) public class Measure { public static final int ARRAYSIZE = 1024; private final int[] array = new int[ARRAYSIZE]; @Setup public void setup() { final Random random = new Random(); for (int i = 0; i < ARRAYSIZE; ++i) { final int x = random.nextInt(); array[i] = x == 0? 1 : x; } } @GenerateMicroBenchmark public int withoutExitPoint() { final int[] array = this.array; int result = 0; for (int i = 0; i < array.length; i++) { final int entry = array[i]; result ^= i + entry; } return result; } @GenerateMicroBenchmark public int withExitPoint() { final int[] array = this.array; int result = 0; for (int i = 0; i < array.length; i++) { final int entry = array[i]; result ^= i + array[i]; if (entry == 0) break; } return result; } }
- Previous message: A loop optimization sometimes results in pessimization
- Next message: RFR(XS): 8025562: [parfait] warnings from b13 for hotspot.src.share.vm.opto: null pointer dereference
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the hotspot-compiler-dev mailing list