Strange branching performance
Vitaly Davidovich vitalyd at gmail.com
Wed Feb 12 18:14:58 PST 2014
- Previous message: Strange branching performance
- Next message: Strange branching performance
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
FWIW, I recall reading on the gcc forums that someone compared cmov vs jmp on i7 and Bulldozer, and concluded that cmov becomes the better choice if the branch is predicted correctly less than 92% of the time or so. In addition, Bulldozer suffered a bigger penalty for cmov than Intel did. I can try to dig up that thread if there's interest.
In addition to register pressure, cmov also adds a dependency chain and the instruction size is bigger.
I guess trying to write code with more predictable branching is the answer :).
Sent from my phone
On Feb 12, 2014 7:20 PM, "Vladimir Kozlov" <vladimir.kozlov at oracle.com> wrote:
Hi Martin,
The issue is more complicated than I thought. The code I pointed to before was added by me about 3 years ago for:

7097546: Optimize use of CMOVE instructions
https://bugs.openjdk.java.net/browse/JDK-7097546

The changes were made to avoid a 2x performance hit with cmov for code like the following:

    public static int test(int result, int limit, int mask) { // mask = 15
      for (int i = 0; i < limit; i++) {
        if ((i&mask) == 0) result++; // Non frequent
      }
      return result;
    }

The cmov instruction has a big flaw: it requires an additional register. If the loop's body is complex, using cmov will result in register spilling, which means additional instructions. The performance hit could be higher than that of a branch misprediction.

I am not sure how to proceed from here. I may do some benchmark testing to see the effects if cmov is used in more cases.

Regards,
Vladimir

On 2/8/14 1:11 PM, Martin Grajcar wrote:
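For anyone who wants to try the loop quoted above, here is a minimal Java sketch. The class name, the list of masks and the timing harness are assumptions made for illustration and are not part of the original test. With a mask of the form 2^k - 1 the increment runs once every mask + 1 iterations, so varying the mask varies how frequent the branch is. The pattern is perfectly regular, so a hardware predictor may learn it, and a wall-clock loop like this is only a rough indication; a harness such as JMH would be needed for trustworthy numbers.

```java
// Hypothetical harness around the test() loop from the message above.
final class CmovLoopSketch {

    // Same loop as in the message: the increment is taken once
    // every (mask + 1) iterations when mask is 2^k - 1.
    static int test(int result, int limit, int mask) {
        for (int i = 0; i < limit; i++) {
            if ((i & mask) == 0) result++;
        }
        return result;
    }

    public static void main(String[] args) {
        int limit = 100_000_000; // arbitrary iteration count (assumption)
        for (int mask : new int[] {1, 3, 7, 15}) {
            test(0, limit, mask); // warm-up call so the JIT can compile the loop
            long start = System.nanoTime();
            int r = test(0, limit, mask);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println("mask=" + mask + " result=" + r + " time=" + ms + " ms");
        }
    }
}
```

The generated code can be inspected with a disassembler-enabled JVM to see whether a branch or a cmov was emitted for each case; the timings alone do not tell you which one was chosen.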
Hi Vladimir!
On Sat, Feb 8, 2014 at 4:36 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:

Hi Martin,

Your observation is correct. The corresponding code is as follows:

    float infrequent_prob = PROB_UNLIKELY_MAG(3); // 0.001
    // BlockLayoutByFrequency optimization moves infrequent branch
    // from hot path. No point in CMOV'ing in such case (110 is used
    // instead of 100 to take into account not exactness of float value).
    if (BlockLayoutByFrequency) {
      infrequent_prob = MAX2(infrequent_prob, (float)BlockLayoutMinDiamondPercentage/110.0f);
    }
    // Check for highly predictable branch. No point in CMOV'ing if
    // we are going to predict accurately all the time.
    if (iff->_prob < infrequent_prob ||
        iff->_prob > (1.0f - infrequent_prob))
      return NULL;

Note that BlockLayoutMinDiamondPercentage is 20 by default, so infrequent_prob becomes 0.2 as you observed.
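To make the arithmetic of that threshold concrete, here is a small Java transliteration of the check quoted above. It is only a sketch: the class and method names are hypothetical, the constant 0.001 is taken from the comment in the quoted code, and this is not the HotSpot source. With the default of 20, the cut-off works out to 20/110, roughly 0.18, which the message rounds to 0.2.

```java
// Hypothetical Java rendering of the probability check quoted above.
final class CmovThresholdSketch {

    // Value taken from the "// 0.001" comment in the quoted code.
    static final float PROB_UNLIKELY_MAG_3 = 1e-3f;

    // Computes the cut-off below which a branch counts as infrequent.
    static float infrequentProb(boolean blockLayoutByFrequency, int minDiamondPercentage) {
        float p = PROB_UNLIKELY_MAG_3;
        if (blockLayoutByFrequency) {
            p = Math.max(p, (float) minDiamondPercentage / 110.0f);
        }
        return p;
    }

    // True when the quoted check would bail out, i.e. no cmov is generated.
    static boolean skipsCmov(float branchProb, float infrequentProb) {
        return branchProb < infrequentProb || branchProb > (1.0f - infrequentProb);
    }

    public static void main(String[] args) {
        float threshold = infrequentProb(true, 20);
        System.out.println("threshold = " + threshold);
        // A branch taken 1 time in 16 (mask = 15 in the test loop) falls below the cut-off.
        System.out.println("p = 1/16 skips cmov: " + skipsCmov(1.0f / 16, threshold));
        // A branch taken 1 time in 4 does not.
        System.out.println("p = 1/4  skips cmov: " + skipsCmov(0.25f, threshold));
    }
}
```

Under these assumptions a branch taken 1 time in 16 is treated as highly predictable and keeps its jump, while the same branch would be a cmov candidate if only the 0.001 cut-off applied.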
Yes, there's a sharp edge somewhere below 0.2.

C2 moves infrequent code outside the loop (with branches out and back) to keep only hot code inside.

To me it looks like there's nothing to be moved outside of the loop, mainly because you'd hardly save anything, as you'd replace the two instructions

    LEA (%resultreg, 1), %tmpreg
    CMOVEQ %tmpreg, %resultreg

by a conditional jump. Saving a single instruction on the hot path and risking a branch misprediction penalty might make sense for very low probabilities like PROB_UNLIKELY_MAG(3), not 20%.

It looks like it does not happen in your case and I need to look into why. There are several conditions besides BlockLayoutByFrequency, and the above code could be incorrect and may need to be fixed (or removed).

Nice that you can look into it. There are a lot of attempts to eliminate branching manually, like in
http://grepcode.com/file/repo1.maven.org/maven2/com.google.guava/guava/15.0/com/google/common/math/IntMath.java#IntMath.gcd%28int%2Cint%29
but this is nearly always less efficient than using CMOVcc.

Regards,
Martin.
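As an illustration of what eliminating branching manually looks like, here is a minimal Java sketch of the sign-mask trick that such code typically relies on. The class and method names are hypothetical and this is not copied from the linked library; it also assumes non-negative inputs so that a - b cannot overflow. Whether the plain conditional version ends up as a jump or as a cmov is up to the JIT.

```java
// Hypothetical comparison of a conditional and a hand-written branch-free form.
final class BranchFreeSketch {

    // Straightforward version; the JIT decides between a jump and a cmov.
    static int minPlain(int a, int b) {
        return a < b ? a : b;
    }

    // Branch-free version. Assumes a >= 0 and b >= 0 so a - b cannot overflow.
    static int minBranchFree(int a, int b) {
        int delta = a - b;
        int mask = delta >> (Integer.SIZE - 1); // -1 if a < b, otherwise 0
        return b + (delta & mask);              // b + (a - b) if a < b, otherwise b
    }

    // Branch-free |a - b| under the same non-negative assumption.
    static int absDiffBranchFree(int a, int b) {
        int delta = a - b;
        int mask = delta >> (Integer.SIZE - 1);
        return (delta ^ mask) - mask;           // negates delta when it is negative
    }

    public static void main(String[] args) {
        System.out.println(minPlain(3, 7) + " " + minBranchFree(3, 7));
        System.out.println(absDiffBranchFree(3, 7) + " " + absDiffBranchFree(7, 3));
    }
}
```

The branch-free forms cost a subtraction, a shift and one or two further ALU operations on every call, which is consistent with the point above that a single conditional move is usually the cheaper way to get the same result.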
More information about the hotspot-compiler-dev mailing list