FWIW, I recall reading on the gcc forums that someone compared cmov vs jmp on i7 and Bulldozer and concluded that cmov becomes better if the branch is predicted correctly less than about 92% of the time. In addition, Bulldozer suffered a bigger penalty for cmov than Intel. I can try to dig up that thread if there's interest.
In addition to register pressure, cmov also adds a dependency chain and the instruction size is bigger.
I guess trying to write code with more predictable branching is the answer :).
Sent from my phone
Hi Martin,
The issue is more complicated than I thought. The code I pointed to before was added by me about 3 years ago for:
7097546: Optimize use of CMOVE instructions
https://bugs.openjdk.java.net/browse/JDK-7097546
The changes were made to avoid a 2x performance hit with cmov for code like the following:
    public static int test(int result, int limit, int mask) { // mask = 15
        for (int i = 0; i < limit; i++) {
            if ((i & mask) == 0) result++; // Non frequent
        }
        return result;
    }
The cmov instruction has a big flaw: it requires an additional register. If the loop body is complex, using cmov will cause register spilling and thus additional instructions. The performance hit could be higher than that of a branch misprediction.
I am not sure how to proceed from here. I may do some benchmark testing to see the effects if cmov is used in more cases.
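For anyone who wants to try this at home, a minimal sketch of such a test could look like the following. The class name, the set of masks and the iteration counts are assumptions made for illustration, not taken from the bug report. With a mask of the form 2^k - 1 the increment is taken once every mask + 1 iterations, so varying the mask varies the branch probability.

```java
// Hypothetical harness: class name, masks and iteration counts are assumptions.
class CmovLoopBench {
    // Same loop as in the message above.
    static int test(int result, int limit, int mask) {
        for (int i = 0; i < limit; i++) {
            if ((i & mask) == 0) result++; // taken once every (mask + 1) iterations
        }
        return result;
    }

    public static void main(String[] args) {
        int limit = 100_000_000;
        int[] masks = {1, 3, 7, 15, 1023};
        // Warm up so the JIT has a chance to compile test() before timing.
        for (int w = 0; w < 20; w++) {
            test(0, 1_000_000, 15);
        }
        for (int mask : masks) {
            long start = System.nanoTime();
            int r = test(0, limit, mask);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println("mask=" + mask + " result=" + r + " time=" + ms + " ms");
        }
    }
}
```

This only gives rough wall-clock numbers; a proper benchmark harness would be needed before drawing conclusions from them.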
Regards,
Vladimir
On 2/8/14 1:11 PM, Martin Grajcar wrote:
Hi Vladimir!
On Sat, Feb 8, 2014 at 4:36 AM, Vladimir Kozlov
<vladimir.kozlov@oracle.com> wrote:
    Hi Martin,
    Your observation is correct. The corresponding code is as follows:
      float infrequent_prob = PROB_UNLIKELY_MAG(3); // 0.001
      // BlockLayoutByFrequency optimization moves infrequent branch
      // from hot path. No point in CMOV'ing in such case (110 is used
      // instead of 100 to take into account not exactness of float value).
      if (BlockLayoutByFrequency) {
        infrequent_prob = MAX2(infrequent_prob,
                               (float)BlockLayoutMinDiamondPercentage/110.0f);
      }
      // Check for highly predictable branch.  No point in CMOV'ing if
      // we are going to predict accurately all the time.
      if (iff->_prob < infrequent_prob ||
          iff->_prob > (1.0f - infrequent_prob))
        return NULL;
    Note, BlockLayoutMinDiamondPercentage is 20 by default, so
    infrequent_prob becomes about 0.18 (20/110), i.e. roughly the 0.2 you observed.
Yes, there's a sharp edge somewhere below 0.2.
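To make the edge concrete, here is a small Java sketch that mirrors the quoted check. It is only a transcription of the snippet above for experimenting with numbers; the class and method names are hypothetical, and the real logic lives in C2's C++ code. With the default of 20, the cutoff is 20/110, which is about 0.18.

```java
// Sketch only: mirrors the quoted C2 check in plain Java. All names are hypothetical.
class CmovThreshold {
    // PROB_UNLIKELY_MAG(3), i.e. 0.001 as stated in the quoted comment.
    static final float PROB_UNLIKELY_MAG_3 = 1e-3f;

    // Probability cutoff below which (or above 1 - cutoff) no CMOV is generated.
    static float infrequentProb(boolean blockLayoutByFrequency, int minDiamondPercentage) {
        float p = PROB_UNLIKELY_MAG_3;
        if (blockLayoutByFrequency) {
            p = Math.max(p, minDiamondPercentage / 110.0f);
        }
        return p;
    }

    // True when the quoted check would return NULL, i.e. keep the branch.
    static boolean skipsCmov(float branchProb, boolean blockLayoutByFrequency,
                             int minDiamondPercentage) {
        float p = infrequentProb(blockLayoutByFrequency, minDiamondPercentage);
        return branchProb < p || branchProb > (1.0f - p);
    }

    public static void main(String[] args) {
        float[] probs = {0.5f, 0.25f, 0.125f, 1.0f / 16, 0.001f};
        for (float prob : probs) {
            System.out.println("prob=" + prob + " skipsCmov=" + skipsCmov(prob, true, 20));
        }
    }
}
```

For the loop with mask = 15 the branch probability is 1/16, which falls below the cutoff, so the branch is kept.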
    C2 moves infrequent code outside the loop (with branches out and
    back) to keep only hot code inside.
To me it looks like there's nothing to be moved outside of the loop,
mainly because you'd hardly save anything: you'd replace the two
instructions
LEA (%result_reg, 1), %tmp_reg
CMOVEQ %tmp_reg, %result_reg
by a conditional jump. Saving a single instruction on the hot path and
risking a branch misprediction penalty might make sense for very low
probabilities like PROB_UNLIKELY_MAG(3), not 20%.
    It looks like that does not happen in your case and I need to look
    into why. There are several conditions besides BlockLayoutByFrequency,
    and the above code could be incorrect and need to be fixed (or removed).
Nice that you can look into it. There are a lot of attempts to eliminate
branching manually like in
http://grepcode.com/file/repo1.maven.org/maven2/com.google.guava/guava/15.0/com/google/common/math/IntMath.java#IntMath.gcd%28int%2Cint%29
but this is nearly always less efficient than using CMOVcc.
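As an illustration of what such manual branch elimination looks like, here is a sketch of the shift-and-mask trick for min(delta, 0), next to the plain conditional form. If I remember the linked Guava code correctly it uses this kind of expression, but treat that as an assumption; the class and method names below are made up for the example.

```java
// Sketch: manual branch elimination vs. a plain conditional. Names are hypothetical.
class BranchFreeMin {
    // delta >> 31 is all ones when delta is negative and zero otherwise,
    // so the AND yields delta for negative delta and 0 otherwise: min(delta, 0).
    static int minDeltaOrZeroManual(int delta) {
        return delta & (delta >> (Integer.SIZE - 1));
    }

    // Plain conditional; the JIT is free to compile this as a branch or a CMOVcc.
    static int minDeltaOrZeroPlain(int delta) {
        return delta < 0 ? delta : 0;
    }

    public static void main(String[] args) {
        int[] samples = {-5, 0, 7, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int d : samples) {
            System.out.println(d + ": manual=" + minDeltaOrZeroManual(d)
                    + " plain=" + minDeltaOrZeroPlain(d));
        }
    }
}
```

Both methods return the same value for every int; the difference is only in what the compiler gets to work with.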
Regards,
Martin.