[llvm-dev] how experimental are the llvm.experimental.vector.reduce.* functions?
Andrew Kelley via llvm-dev llvm-dev at lists.llvm.org
Sat Feb 9 10:05:31 PST 2019
On Sat, Feb 9, 2019 at 1:42 AM Craig Topper via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> I don't think I understand your pseudocode using llvm.experimental.vector.reduce.umax. All of the types you showed are scalar, but that intrinsic doesn't work on scalars so I'm having a hard time understanding what you're trying to do with it. llvm.experimental.vector.reduce.umax takes a vector input and returns a scalar result. Are you wanting to find if any of the additions overflowed or a mask of which addition overflowed?
Apologies for the confusion - let me try to clarify. Here is frontend code that works now:
export fn entry() void {
    var a: @Vector(4, i32) = []i32{ 1, 2, 3, 4 };
    var b: @Vector(4, i32) = []i32{ 5, 6, 7, 8 };
    var x = a +% b;
}
This generates the following LLVM IR code:
define void @entry() #2 !dbg !41 {
Entry:
  %a = alloca <4 x i32>, align 16
  %b = alloca <4 x i32>, align 16
  %x = alloca <4 x i32>, align 16
  store <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32>* %a, align 16, !dbg !55
  call void @llvm.dbg.declare(metadata <4 x i32>* %a, metadata !45, metadata !DIExpression()), !dbg !55
  store <4 x i32> <i32 5, i32 6, i32 7, i32 8>, <4 x i32>* %b, align 16, !dbg !56
  call void @llvm.dbg.declare(metadata <4 x i32>* %b, metadata !51, metadata !DIExpression()), !dbg !56
  %0 = load <4 x i32>, <4 x i32>* %a, align 16, !dbg !57
  %1 = load <4 x i32>, <4 x i32>* %b, align 16, !dbg !58
  %2 = add <4 x i32> %0, %1, !dbg !59
  store <4 x i32> %2, <4 x i32>* %x, align 16, !dbg !60
  call void @llvm.dbg.declare(metadata <4 x i32>* %x, metadata !53, metadata !DIExpression()), !dbg !60
  ret void, !dbg !61
}
However, I used the +% operator, which in Zig is wrapping addition. Now I want to implement the + operator for vectors, which Zig defines to panic if any element overflows. Here is how the IR could look for this:
define void @entry() #2 !dbg !41 {
Entry:
  %a = alloca <4 x i32>, align 16
  %b = alloca <4 x i32>, align 16
  %x = alloca <4 x i32>, align 16
  store <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32>* %a, align 16, !dbg !55
  store <4 x i32> <i32 5, i32 6, i32 7, i32 8>, <4 x i32>* %b, align 16, !dbg !56
  %0 = load <4 x i32>, <4 x i32>* %a, align 16, !dbg !57
  %1 = load <4 x i32>, <4 x i32>* %b, align 16, !dbg !58
  ; per-lane signed add; element 1 of the result is the per-lane overflow mask
  %2 = call { <4 x i32>, <4 x i1> } @llvm.sadd.with.overflow.v4i32(<4 x i32> %0, <4 x i32> %1)
  %3 = extractvalue { <4 x i32>, <4 x i1> } %2, 0, !dbg !56
  %4 = extractvalue { <4 x i32>, <4 x i1> } %2, 1, !dbg !56
  ; true if any lane overflowed
  %5 = call i1 @llvm.experimental.vector.reduce.umax.i1.v4i1(<4 x i1> %4)
  br i1 %5, label %OverflowFail, label %OverflowOk, !dbg !56

OverflowFail:                                     ; preds = %Entry
  tail call fastcc void @panic(%"[]u8"* @2, %StackTrace* null), !dbg !56
  unreachable, !dbg !56

OverflowOk:                                       ; preds = %Entry
  store <4 x i32> %3, <4 x i32>* %x, align 16, !dbg !60
  ret void, !dbg !61
}
You can see that it depends on @llvm.sadd.with.overflow working on vector types, and it relies on @llvm.experimental.vector.reduce.umax. I will note that my strategy with sign extension and icmp would be a semantically equivalent alternative to @llvm.sadd.with.overflow.
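For reference, the sign extension and icmp strategy could be sketched roughly as follows. This is only an illustration of the general shape, not code taken from this thread: the function name @sadd_overflow_mask and the choice of i64 as the widened element type are assumptions. It produces the same kind of <4 x i1> per-lane mask as %4 above.

```llvm
; Hypothetical helper (name and widened type are assumptions):
; returns a mask with a 1 bit in every lane where the signed
; addition %a + %b does not fit in i32.
define <4 x i1> @sadd_overflow_mask(<4 x i32> %a, <4 x i32> %b) {
Entry:
  ; Widen each lane so the addition itself cannot overflow.
  %a.wide = sext <4 x i32> %a to <4 x i64>
  %b.wide = sext <4 x i32> %b to <4 x i64>
  %sum.wide = add <4 x i64> %a.wide, %b.wide
  ; Narrow to the result type, then widen again for comparison.
  %sum = trunc <4 x i64> %sum.wide to <4 x i32>
  %sum.back = sext <4 x i32> %sum to <4 x i64>
  ; A lane overflowed iff the round trip changed its value.
  %ov = icmp ne <4 x i64> %sum.wide, %sum.back
  ret <4 x i1> %ov
}
```

The wrapped sum in %sum matches what a plain add <4 x i32> would give, so it can be stored to %x on the non-overflow path.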
On 2/9/19 12:37 PM, Nikita Popov wrote:
> On Sat, Feb 9, 2019 at 6:25 PM Simon Pilgrim <llvm-dev at redking.me.uk> wrote:
>> Regarding the reduction functions - I think the integer intrinsics at least are relatively stable and we can probably investigate dropping the experimental tag before the next release (assuming someone has the time to take on the work) - it'd be nice to have the SLP vectorizer emit reduction intrinsics directly for these.
>
> The vector reduction intrinsics still need quite a lot of work. Apart from SplitVecOp, all legalizations are currently missing. This is only noticeable on AArch64 right now, because all other targets expand vector reductions prior to codegen.
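To make "expand vector reductions prior to codegen" concrete, here is a hand-written sketch of what a shuffle-based expansion of a <4 x i1> reduction might look like. It is not output copied from LLVM, and the function name @reduce_or_v4i1 is hypothetical. It uses `or` because an unsigned max over i1 lanes is the same as a logical or.

```llvm
; Hypothetical expansion of an "any lane set" reduction over <4 x i1>.
define i1 @reduce_or_v4i1(<4 x i1> %v) {
Entry:
  ; Fold the upper half onto the lower half: lanes 0,1 |= lanes 2,3.
  %hi = shufflevector <4 x i1> %v, <4 x i1> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
  %or1 = or <4 x i1> %v, %hi
  ; Fold lane 1 onto lane 0.
  %sh = shufflevector <4 x i1> %or1, <4 x i1> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
  %or2 = or <4 x i1> %or1, %sh
  ; Lane 0 now holds v0 | v1 | v2 | v3.
  %r = extractelement <4 x i1> %or2, i32 0
  ret i1 %r
}
```

Only lane 0 of the final vector is meaningful; the other lanes are left undefined by the shuffles.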
My follow-up question, then, is this:
What do you recommend, in terms of LLVM IR, in order to obtain the %5 value above?
Thanks for the help,
Andrew