Strange branching performance

Vladimir Kozlov vladimir.kozlov at oracle.com
Wed Feb 12 17:09:50 PST 2014


I filed RFE to track this issue:

https://bugs.openjdk.java.net/browse/JDK-8034833

Regards, Vladimir

On 2/12/14 4:18 PM, Vladimir Kozlov wrote:

Hi Martin,

The issue is more complicated than I thought. The code I pointed to before was added by me about 3 years ago for:

7097546: Optimize use of CMOVE instructions
https://bugs.openjdk.java.net/browse/JDK-7097546

The changes were done to avoid a 2x performance hit with cmov for code like the following:

    public static int test(int result, int limit, int mask) { // mask = 15
        for (int i = 0; i < limit; i++) {
            if ((i&mask) == 0) result++; // Non frequent
        }
        return result;
    }

The cmov instruction has a big flaw - it requires an additional register. If the loop's body is complex, using cmov will result in register spilling, that is, additional instructions. The performance hit could be higher than that of a branch misprediction.

I am not sure how to proceed from here. I may do some benchmark testing to see the effects if cmov is used in more cases.

Regards, Vladimir

On 2/8/14 1:11 PM, Martin Grajcar wrote:

Hi Vladimir!

On Sat, Feb 8, 2014 at 4:36 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:

Hi Martin,

Your observation is correct. The corresponding code is the following:

    float infrequent_prob = PROB_UNLIKELY_MAG(3); // 0.001
    // BlockLayoutByFrequency optimization moves infrequent branch
    // from hot path. No point in CMOV'ing in such case (110 is used
    // instead of 100 to take into account not exactness of float value).
    if (BlockLayoutByFrequency) {
      infrequent_prob = MAX2(infrequent_prob, (float)BlockLayoutMinDiamondPercentage/110.0f);
    }
    // Check for highly predictable branch. No point in CMOV'ing if
    // we are going to predict accurately all the time.
    if (iff->_prob < infrequent_prob ||
        iff->_prob > (1.0f - infrequent_prob))
      return NULL;

Note that BlockLayoutMinDiamondPercentage is 20 by default, so infrequent_prob becomes 0.2, as you observed.
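The arithmetic behind this threshold can be worked through in a few lines of Java. This is only a sketch of the check quoted above, not HotSpot code: the class and method names are hypothetical, and the constant 1e-3 is taken from the "0.001" comment in the quoted snippet. With the default of 20, the threshold comes out at 20/110, roughly 0.18, so a branch taken once every 16 iterations (probability 1/16, as with mask = 15) falls below it and is not converted to a CMOV.

```java
// Sketch only: mirrors the threshold arithmetic of the quoted C2 check.
// All names here are hypothetical and not part of HotSpot.
final class CmovThresholdSketch {
    // Assumed value of PROB_UNLIKELY_MAG(3), per the "0.001" comment in the quoted code.
    static final float PROB_UNLIKELY_MAG_3 = 1e-3f;

    // Computes the "infrequent" probability threshold.
    static float infrequentProb(boolean blockLayoutByFrequency, int minDiamondPercentage) {
        float p = PROB_UNLIKELY_MAG_3;
        if (blockLayoutByFrequency) {
            // 110 instead of 100, as in the quoted code, to allow for float inexactness.
            p = Math.max(p, (float) minDiamondPercentage / 110.0f);
        }
        return p;
    }

    // True when the branch counts as highly predictable, so no CMOV is generated.
    static boolean cmovRejected(float branchProb, float infrequentProb) {
        return branchProb < infrequentProb || branchProb > (1.0f - infrequentProb);
    }

    public static void main(String[] args) {
        float threshold = infrequentProb(true, 20);
        System.out.println("threshold = " + threshold);
        System.out.println("p = 1/16 rejected: " + cmovRejected(1.0f / 16, threshold));
        System.out.println("p = 1/2  rejected: " + cmovRejected(0.5f, threshold));
    }
}
```

Under these assumptions, only branches with a probability between roughly 0.18 and 0.82 remain candidates for CMOV when BlockLayoutByFrequency is enabled.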

Yes, there's a sharp edge somewhere below 0.2.

C2 moves infrequent code outside the loop (with branches out and back) to keep only hot code inside.

To me it looks like there's nothing to be moved outside of the loop, mainly because you'd hardly save anything, as you'd replace the two instructions

    LEA (%resultreg, 1), %tmpreg
    CMOVEQ %tmpreg, %resultreg

by a conditional jump. Saving a single instruction on the hot path and risking a branch misprediction penalty might make sense for very low probabilities like PROB_UNLIKELY_MAG(3), not 20%.

It looks like it does not happen in your case and I need to look into why. There are several conditions besides BlockLayoutByFrequency, and the above code could be incorrect and need to be fixed (or removed).

Nice that you can look into it. There are a lot of attempts to eliminate branching manually, like in

http://grepcode.com/file/repo1.maven.org/maven2/com.google.guava/guava/15.0/com/google/common/math/IntMath.java#IntMath.gcd%28int%2Cint%29

but this is nearly always less efficient than using CMOVcc.

Regards, Martin.
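For readers unfamiliar with manual branch elimination, the usual pattern is sign-bit arithmetic in place of a comparison. The following is a minimal sketch of that pattern, assuming a "min(delta, 0)" computation; it is not the code behind the link, and the class and method names are hypothetical. The plain form leaves it to the JIT to pick either a branch or a conditional move, which is the choice this thread is about.

```java
// Sketch only: a hand-written branch-free idiom next to its plain equivalent.
// Names are hypothetical; this is not taken from any library.
final class BranchFreeMinSketch {
    // Branch-free: an arithmetic shift by 31 yields all ones for negative
    // values and zero otherwise, so the AND keeps delta only when delta < 0.
    static int minWithZeroBitTrick(int delta) {
        return delta & (delta >> (Integer.SIZE - 1));
    }

    // Plain form: the compiler is free to emit a branch or a conditional move.
    static int minWithZeroPlain(int delta) {
        return delta < 0 ? delta : 0;
    }

    public static void main(String[] args) {
        int[] samples = {Integer.MIN_VALUE, -17, -1, 0, 1, 42, Integer.MAX_VALUE};
        for (int d : samples) {
            if (minWithZeroBitTrick(d) != minWithZeroPlain(d)) {
                throw new AssertionError("mismatch for " + d);
            }
        }
        System.out.println("both forms agree on all samples");
    }
}
```

Both methods compute the same value for every int, so any difference between them is purely one of generated code, which would need to be measured rather than assumed.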



More information about the hotspot-compiler-dev mailing list