[llvm-dev] how experimental are the llvm.experimental.vector.reduce.* functions?
Andrew Kelley via llvm-dev llvm-dev at lists.llvm.org
Sat Feb 9 12:56:25 PST 2019
On 2/9/19 2:05 PM, Craig Topper wrote:
Something like this should work I think.
; ModuleID = 'test.ll'
source_filename = "test.ll"

define void @entry(<4 x i32>* %a, <4 x i32>* %b, <4 x i32>* %x) {
Entry:
  %tmp = load <4 x i32>, <4 x i32>* %a, align 16
  %tmp1 = load <4 x i32>, <4 x i32>* %b, align 16
  %tmp2 = add <4 x i32> %tmp, %tmp1
  %tmpsign = icmp slt <4 x i32> %tmp, zeroinitializer
  %tmp1sign = icmp slt <4 x i32> %tmp1, zeroinitializer
  %sumsign = icmp slt <4 x i32> %tmp2, zeroinitializer
  %signsequal = icmp eq <4 x i1> %tmpsign, %tmp1sign
  %summismatch = icmp ne <4 x i1> %sumsign, %tmpsign
  %overflow = and <4 x i1> %signsequal, %summismatch
  %tmp5 = bitcast <4 x i1> %overflow to i4
  %tmp6 = icmp ne i4 %tmp5, 0
  br i1 %tmp6, label %OverflowFail, label %OverflowOk

OverflowFail:                                     ; preds = %Entry
  tail call fastcc void @panic()
  unreachable

OverflowOk:                                       ; preds = %Entry
  store <4 x i32> %tmp2, <4 x i32>* %x, align 16
  ret void
}

declare fastcc void @panic()
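For readers less used to IR, the per-lane test in the quoted code can be restated in scalar C. This is only a sketch of the same sign-comparison idea, not code from the thread: the function name `add_i32_sign_check` is made up for illustration, and it handles one lane at a time rather than a whole vector.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not from the thread): adds two i32 values with
 * wraparound and reports signed overflow using the same sign test as the
 * quoted IR. Returns true on overflow; *sum receives the wrapped result. */
bool add_i32_sign_check(int32_t a, int32_t b, int32_t *sum)
{
    /* Wrapping add done in unsigned arithmetic to avoid signed-overflow UB.
     * The conversion back to int32_t is implementation-defined in C; it is
     * assumed here to wrap modulo 2^32, as mainstream compilers do. */
    *sum = (int32_t)((uint32_t)a + (uint32_t)b);

    bool a_neg = a < 0;       /* %tmpsign  */
    bool b_neg = b < 0;       /* %tmp1sign */
    bool sum_neg = *sum < 0;  /* %sumsign  */

    bool signs_equal = (a_neg == b_neg);     /* %signsequal  */
    bool sum_mismatch = (sum_neg != a_neg);  /* %summismatch */

    /* Overflow only when both operands share a sign and the sum does not. */
    return signs_equal && sum_mismatch;
}
```

Operands with different signs can never overflow, which is why the check requires the two operand signs to match before comparing against the sign of the sum.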
Thanks! I was able to get it working with your hint:
%tmp5 = bitcast <4 x i1> %overflow to i4
(Thanks also to LebedevRI who pointed this out on IRC)
Until LLVM 9, when the llvm.*.with.overflow.* intrinsics gain vector support, here's what I ended up with:
%a = alloca <4 x i32>, align 16
%b = alloca <4 x i32>, align 16
%x = alloca <4 x i32>, align 16
store <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32>* %a, align 16, !dbg !55
store <4 x i32> <i32 5, i32 6, i32 7, i32 8>, <4 x i32>* %b, align 16, !dbg !56
%0 = load <4 x i32>, <4 x i32>* %a, align 16, !dbg !57
%1 = load <4 x i32>, <4 x i32>* %b, align 16, !dbg !58
%2 = sext <4 x i32> %0 to <4 x i33>, !dbg !59
%3 = sext <4 x i32> %1 to <4 x i33>, !dbg !59
%4 = add <4 x i33> %2, %3, !dbg !59
%5 = trunc <4 x i33> %4 to <4 x i32>, !dbg !59
%6 = sext <4 x i32> %5 to <4 x i33>, !dbg !59
%7 = icmp ne <4 x i33> %4, %6, !dbg !59
%8 = bitcast <4 x i1> %7 to i4, !dbg !59
%9 = icmp ne i4 %8, 0, !dbg !59
br i1 %9, label %OverflowFail, label %OverflowOk, !dbg !59
Idea being: extend and do the operation with more bits. Truncate to get the result. Re-extend the result and check if it is the same as the pre-truncated result.
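The same extend, add, truncate, re-extend, compare sequence can be sketched in plain C for anyone who wants to experiment outside of IR. This is an illustration under stated assumptions, not the code from the thread: the name `add_v4i32_widening_check` is hypothetical, the lanes are processed in a scalar loop, and the wider type is `int64_t` instead of the `i33` used in the IR, since C has no 33-bit integer. The per-lane results are packed into a small bitmask and tested against zero, mirroring the `bitcast <4 x i1> ... to i4` followed by `icmp ne i4 ..., 0`.

```c
#include <stdbool.h>
#include <stdint.h>

#define LANES 4

/* Hypothetical helper (not from the thread): adds two 4-lane i32 vectors
 * and returns true if any lane overflowed. out receives the truncated
 * (wrapped) sums. */
bool add_v4i32_widening_check(const int32_t a[LANES],
                              const int32_t b[LANES],
                              int32_t out[LANES])
{
    unsigned mask = 0; /* one bit per lane, like the i4 in the IR */

    for (int i = 0; i < LANES; i++) {
        /* sext + add: do the operation with more bits. */
        int64_t wide = (int64_t)a[i] + (int64_t)b[i];

        /* trunc: narrow back to the result type. This conversion is
         * implementation-defined in C when the value does not fit; it is
         * assumed here to wrap modulo 2^32, as mainstream compilers do. */
        int32_t narrow = (int32_t)wide;

        /* sext again and compare with the pre-truncation value. */
        int64_t reextended = (int64_t)narrow;
        mask |= (unsigned)(wide != reextended) << i;

        out[i] = narrow;
    }

    /* Any set bit means at least one lane overflowed. */
    return mask != 0;
}
```

With the inputs from the IR above, `{1, 2, 3, 4}` and `{5, 6, 7, 8}`, no lane overflows and the function returns false; replacing one lane with `INT32_MAX` plus `1` makes it return true.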
This works pretty well unless the vector integer size is as big as or larger than the native vector register. Here's a quick performance test:
https://gist.github.com/andrewrk/b9734f9c310d8b79ec7271e7c0df4023
Summary: safety-checked integer addition with no optimizations
<4 x i32>: scalar = 893 MiB/s vector = 3.58 GiB/s
<16 x i128>: scalar = 3.6 GiB/s vector = 2.5 GiB/s