(original) (raw)
Hey all - me again,
So I'm looking at llvm.expect specifically for branch hints. In the following example LLVM will hoist the pow/cos calls into the entry block even though I've used the llvm.expect intrinsic to make it clear that one of the calls is unlikely to occur.
target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86\_64-pc-windows-msvc-coff"
define dllexport double @foo(i32 %val) {
entry:
%0 = icmp slt i32 %val, 42
%1 = call i1 @llvm.expect.i1(i1 %0, i1 false)
%2 = sitofp i32 %val to double
br i1 %1, label %true, label %false
true:
%3 = call double @llvm.cos.f64(double %2)
br label %merge
false:
%4 = call double @llvm.pow.f64(double %2, double 4.242000e+01)
br label %merge
merge:
%var.1.0 = phi double \[ %4, %false \], \[ %3, %true \]
ret double %var.1.0
}
declare i1 @llvm.expect.i1(i1, i1)
declare double @llvm.cos.f64(double)
declare double @llvm.pow.f64(double, double)
target triple = "x86\_64-pc-windows-msvc-coff"
define dllexport double @foo(i32 %val) {
entry:
%0 = icmp slt i32 %val, 42
%1 = call i1 @llvm.expect.i1(i1 %0, i1 false)
%2 = sitofp i32 %val to double
br i1 %1, label %true, label %false
true:
%3 = call double @llvm.cos.f64(double %2)
br label %merge
false:
%4 = call double @llvm.pow.f64(double %2, double 4.242000e+01)
br label %merge
merge:
%var.1.0 = phi double \[ %4, %false \], \[ %3, %true \]
ret double %var.1.0
}
declare i1 @llvm.expect.i1(i1, i1)
declare double @llvm.cos.f64(double)
declare double @llvm.pow.f64(double, double)
This seems counter intuitive to me - I've told LLVM that one of the calls will probably not happen, and I expected it to preserve the call in the unlikely branch so we don't pay the cost for something unlikely to be used.
I also injected a pass locally that adds, for branches whose condition is llvm.expect, the branch weight metadata - but LLVM will still always fold the branch away ensuring that the expensive call is always called.
The part of SimplifyCFG that does this is FoldTwoEntryPHINode from what I can tell.
So is there anything I can do here without modifying LLVM? Have I missed something?
Otherwise I guess I'd have to change
FoldTwoEntryPHINode to not do this in the presence of branch weights / expect?
Thanks for any help,
Cheers,
-Neil.
