[LLVMdev] Weird problems with cos (was Re: [PATCH v3 2/3] R600: Add carry and borrow instructions. Use them to implement UADDO/USUBO)
Jan Vesely jan.vesely at rutgers.edu
Fri Oct 3 09:32:03 PDT 2014
Hi Tom, Matt,
I'm running into strange issues with the cos test (piglit generated_tests/cl/builtin/math/builtin-float-cos-1.0.generated.c)
I have been seeing random failures (incorrect results) for some time and tried to investigate. The weird part is that the failures are not 100% reproducible: sometimes the tests pass, or partly pass (it's usually the float8 and float16 subtests that fail). The failure is always the same, "Expecting -0.925879 (0xbf6d0668) with tolerance 0.000000 (2 ulps), but got nan (0x7fc00000)", although the position may vary, and it happens even if the same value was computed earlier in the results array.
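As a sanity check on the two bit patterns in that message, here is a minimal C++ sketch that decodes them. The helper names are hypothetical and not taken from piglit; the ULP distance shown is one common way to compare floats by bit pattern and may differ from how piglit computes its tolerance.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as an IEEE-754 single-precision float.
float bits_to_float(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// A float is NaN when its exponent is all ones and its mantissa is non-zero.
bool is_nan_bits(std::uint32_t bits) {
  return (bits & 0x7f800000u) == 0x7f800000u && (bits & 0x007fffffu) != 0u;
}

// Distance in ULPs between two finite floats of the same sign,
// taken directly from their bit patterns.
std::uint32_t ulp_distance(std::uint32_t a, std::uint32_t b) {
  assert(((a ^ b) & 0x80000000u) == 0u);  // same sign only
  return a > b ? a - b : b - a;
}
```

With these, `bits_to_float(0xbf6d0668u)` is the expected value of roughly -0.925879, and `is_nan_bits(0x7fc00000u)` is true: the result that came back is a quiet NaN with an otherwise empty mantissa, not a value that is merely a few ULPs off.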
The first patch of this series does not change the behavior (or the instruction dump). However, using the ADDC instruction results in a hang on every cos test: "ring 0 stalled for more than 10000msec" "GPU lockup (waiting for 0x00000000001023cf last fence id 0x00000000001023ce on ring 0)",
although the actual test results follow the same pattern as before (random failures, mostly in the float8/16 tests). I can even get a test to pass with a hang on every subtest.
Using SIGN_EXTEND_INREG instead of "SUB 0" in this patch gets rid of the hangs, and makes the failures fully reproducible in every subtest, triggered on the first occurrence of what should have been -0.925879.
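For reference, a minimal C++ model of the two new nodes as the comments in the quoted patch describe them, together with the two ways of turning the 0/1 flag into a 0/-1 value. The function names are hypothetical, and treating SIGN_EXTEND_INREG as a sign extension from a single bit is an assumption about how it would be used here.

```cpp
#include <cassert>
#include <cstdint>

// out = (src0 + src1 > 0xFFFFFFFF) ? 1 : 0, as in the CARRY comment.
std::uint32_t carry(std::uint32_t src0, std::uint32_t src1) {
  return static_cast<std::uint64_t>(src0) + src1 > 0xFFFFFFFFull ? 1u : 0u;
}

// out = (src1 > src0) ? 1 : 0, as in the BORROW comment.
std::uint32_t borrow(std::uint32_t src0, std::uint32_t src1) {
  return src1 > src0 ? 1u : 0u;
}

// "SUB 0": 0 - flag maps 0 -> 0 and 1 -> 0xFFFFFFFF.
std::uint32_t negate_flag(std::uint32_t flag) {
  assert(flag <= 1u);  // only meaningful for a 0/1 flag
  return 0u - flag;
}

// Sign extension from bit 0: replicate the low bit into all 32 bits.
std::uint32_t sext_from_i1(std::uint32_t flag) {
  return (flag & 1u) ? 0xFFFFFFFFu : 0u;
}
```

If this model is right, `negate_flag` and `sext_from_i1` agree on both possible flag values, so the two lowerings should be arithmetically equivalent and any difference in behavior would have to come from the instructions that end up being selected.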
The GPU is an AMD TURKS (HD 7570, 1002:675d).
I tried digging through the manual, but it only mentions that ADDC is a vec and trans inst. Is there any errata document that might give a hint?
thanks, jan
PS: There are no problems with sin, so I might be able to triage at least the code that hangs with this patch
On Wed, 2014-09-24 at 20:27 -0400, Jan Vesely wrote:
v2: tighten the sub64 tests
v3: rename to CARRY/BORROW

Signed-off-by: Jan Vesely <jan.vesely at rutgers.edu>
---
 lib/Target/R600/AMDGPUISelLowering.h     |   2 +
 lib/Target/R600/AMDGPUInstrInfo.td       |   6 ++
 lib/Target/R600/AMDGPUSubtarget.h        |   8 ++
 lib/Target/R600/EvergreenInstructions.td |   3 +
 lib/Target/R600/R600ISelLowering.cpp     |  39 +++++++-
 test/CodeGen/R600/add.ll                 | 154 +++++++++++++++++--------------
 test/CodeGen/R600/sub.ll                 |  18 ++--
 test/CodeGen/R600/uaddo.ll               |  17 +++-
 test/CodeGen/R600/usubo.ll               |  23 ++++-
 9 files changed, 189 insertions(+), 81 deletions(-)

diff --git a/lib/Target/R600/AMDGPUISelLowering.h b/lib/Target/R600/AMDGPUISelLowering.h
index 911576b..6eaf001 100644
--- a/lib/Target/R600/AMDGPUISelLowering.h
+++ b/lib/Target/R600/AMDGPUISelLowering.h
@@ -205,6 +205,8 @@ enum {
   RSQ_CLAMPED,
   LDEXP,
   DOT4,
+  CARRY,
+  BORROW,
   BFE_U32, // Extract range of bits with zero extension to 32-bits.
   BFE_I32, // Extract range of bits with sign extension to 32-bits.
   BFI, // (src0 & src1) | (~src0 & src2)
diff --git a/lib/Target/R600/AMDGPUInstrInfo.td b/lib/Target/R600/AMDGPUInstrInfo.td
index 3d70791..1600c4a 100644
--- a/lib/Target/R600/AMDGPUInstrInfo.td
+++ b/lib/Target/R600/AMDGPUInstrInfo.td
@@ -91,6 +91,12 @@ def AMDGPUumin : SDNode<"AMDGPUISD::UMIN", SDTIntBinOp,
   [SDNPCommutative, SDNPAssociative]
 >;

+// out = (src0 + src1 > 0xFFFFFFFF) ? 1 : 0
+def AMDGPUcarry : SDNode<"AMDGPUISD::CARRY", SDTIntBinOp, []>;
+
+// out = (src1 > src0) ? 1 : 0
+def AMDGPUborrow : SDNode<"AMDGPUISD::BORROW", SDTIntBinOp, []>;
+
 def AMDGPUcvt_f32_ubyte0 : SDNode<"AMDGPUISD::CVT_F32_UBYTE0",
   SDTIntToFPOp, []>;

diff --git a/lib/Target/R600/AMDGPUSubtarget.h b/lib/Target/R600/AMDGPUSubtarget.h
index 6797972..9f2ba61 100644
--- a/lib/Target/R600/AMDGPUSubtarget.h
+++ b/lib/Target/R600/AMDGPUSubtarget.h
@@ -168,6 +168,14 @@ public:
     return (getGeneration() >= EVERGREEN);
   }

+  bool hasCARRY() const {
+    return (getGeneration() >= EVERGREEN);
+  }
+
+  bool hasBORROW() const {
+    return (getGeneration() >= EVERGREEN);
+  }
+
   bool IsIRStructurizerEnabled() const {
     return EnableIRStructurizer;
   }
diff --git a/lib/Target/R600/EvergreenInstructions.td b/lib/Target/R600/EvergreenInstructions.td
index 8117b60..d3822ef 100644
--- a/lib/Target/R600/EvergreenInstructions.td
+++ b/lib/Target/R600/EvergreenInstructions.td
@@ -336,6 +336,9 @@ defm CUBE_eg : CUBE_Common<0xC0>;

 def BCNT_INT : R600_1OP_Helper <0xAA, "BCNT_INT", ctpop, VecALU>;

+def ADDC_UINT : R600_2OP_Helper <0x52, "ADDC_UINT", AMDGPUcarry>;
+def SUBB_UINT : R600_2OP_Helper <0x53, "SUBB_UINT", AMDGPUborrow>;
+
 def FFBH_UINT : R600_1OP_Helper <0xAB, "FFBH_UINT", ctlz_zero_undef, VecALU>;
 def FFBL_INT : R600_1OP_Helper <0xAC, "FFBL_INT", cttz_zero_undef, VecALU>;

diff --git a/lib/Target/R600/R600ISelLowering.cpp b/lib/Target/R600/R600ISelLowering.cpp
index 9b2b689..a28b76a 100644
--- a/lib/Target/R600/R600ISelLowering.cpp
+++ b/lib/Target/R600/R600ISelLowering.cpp
@@ -89,6 +89,15 @@ R600TargetLowering::R600TargetLowering(TargetMachine &TM) :
   setOperationAction(ISD::SELECT, MVT::v2i32, Expand);
   setOperationAction(ISD::SELECT, MVT::v4i32, Expand);

+  // ADD, SUB overflow. These need to be Custom because
+  // SelectionDAGLegalize::LegalizeOp (LegalizeDAG.cpp)
+  // turns Legal into expand
+  if (Subtarget->hasCARRY())
+    setOperationAction(ISD::UADDO, MVT::i32, Custom);
+
+  if (Subtarget->hasBORROW())
+    setOperationAction(ISD::USUBO, MVT::i32, Custom);
+
   // Expand sign extension of vectors
   if (!Subtarget->hasBFE())
     setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i1, Expand);
@@ -154,8 +163,6 @@ R600TargetLowering::R600TargetLowering(TargetMachine &TM) :
   setTargetDAGCombine(ISD::SELECT_CC);
   setTargetDAGCombine(ISD::INSERT_VECTOR_ELT);

-  setOperationAction(ISD::SUB, MVT::i64, Expand);
-
   // These should be replaced by UDVIREM, but it does not happen automatically
   // during Type Legalization
   setOperationAction(ISD::UDIV, MVT::i64, Custom);
@@ -578,6 +585,34 @@ SDValue R600TargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const
   case ISD::SHL_PARTS: return LowerSHLParts(Op, DAG);
   case ISD::SRA_PARTS:
   case ISD::SRL_PARTS: return LowerSRXParts(Op, DAG);
+  case ISD::UADDO: {
+    SDLoc DL(Op);
+    EVT VT = Op.getValueType();
+
+    SDValue Lo = Op.getOperand(0);
+    SDValue Hi = Op.getOperand(1);
+
+    SDValue OVF = DAG.getNode(AMDGPUISD::CARRY, DL, VT, Lo, Hi);
+    //negate sign
+    OVF = DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, VT), OVF);
+    SDValue Res = DAG.getNode(ISD::ADD, DL, VT, Lo, Hi);
+
+    return DAG.getNode(ISD::MERGE_VALUES, DL, DAG.getVTList(VT, VT), Res, OVF);
+  }
+  case ISD::USUBO: {
+    SDLoc DL(Op);
+    EVT VT = Op.getValueType();
+
+    SDValue Arg0 = Op.getOperand(0);
+    SDValue Arg1 = Op.getOperand(1);
+
+    SDValue OVF = DAG.getNode(AMDGPUISD::BORROW, DL, VT, Arg0, Arg1);
+    //negate sign
+    OVF = DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, VT), OVF);
+    SDValue Res = DAG.getNode(ISD::SUB, DL, VT, Arg0, Arg1);
+
+    return DAG.getNode(ISD::MERGE_VALUES, DL, DAG.getVTList(VT, VT), Res, OVF);
+  }
   case ISD::FCOS:
   case ISD::FSIN: return LowerTrig(Op, DAG);
   case ISD::SELECT_CC: return LowerSELECT_CC(Op, DAG);
diff --git a/test/CodeGen/R600/add.ll
b/test/CodeGen/R600/add.ll index 8cf43d1..fddb951 100644 --- a/test/CodeGen/R600/add.ll +++ b/test/CodeGen/R600/add.ll @@ -1,12 +1,12 @@ -; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=EG-CHECK --check-prefix=FUNC %s_ _-; RUN: llc < %s -march=r600 -mcpu=verde -verify-machineinstrs | FileCheck --check-prefix=SI-CHECK --check-prefix=FUNC %s_ _+; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=EG --check-prefix=FUNC %s_ _+; RUN: llc < %s -march=r600 -mcpu=verde -verify-machineinstrs | FileCheck --check-prefix=SI --check-prefix=FUNC %s_ _;FUNC-LABEL: @test1:_ _-;EG-CHECK: ADDINT {{[* ]}}T{{[0-9]+.[XYZW], T[0-9]+.[XYZW], T[0-9]+.[XYZW]}}_ +;EG: ADDINT {{[* ]*}}T{{[0-9]+.[XYZW], T[0-9]+.[XYZW], T[0-9]+.[XYZW]}} -;SI-CHECK: VADDI32e32 [[REG:v[0-9]+]], {{v[0-9]+, v[0-9]+}} -;SI-CHECK-NOT: [[REG]] -;SI-CHECK: BUFFERSTOREDWORD [[REG]], +;SI: VADDI32e32 [[REG:v[0-9]+]], {{v[0-9]+, v[0-9]+}} +;SI-NOT: [[REG]] +;SI: BUFFERSTOREDWORD [[REG]], define void @test1(i32 addrspace(1)* %out, i32 addrspace(1)* %in) { %bptr = getelementptr i32 addrspace(1)* %in, i32 1 %a = load i32 addrspace(1)* %in @@ -17,11 +17,11 @@ define void @test1(i32 addrspace(1)* %out, i32 addrspace(1)* %in) { } ;FUNC-LABEL: @test2: -;EG-CHECK: ADDINT {{[* ]*}}T{{[0-9]+.[XYZW], T[0-9]+.[XYZW], T[0-9]+.[XYZW]}} -;EG-CHECK: ADDINT {{[* ]*}}T{{[0-9]+.[XYZW], T[0-9]+.[XYZW], T[0-9]+.[XYZW]}} +;EG: ADDINT {{[* ]*}}T{{[0-9]+.[XYZW], T[0-9]+.[XYZW], T[0-9]+.[XYZW]}} +;EG: ADDINT {{[* ]*}}T{{[0-9]+.[XYZW], T[0-9]+.[XYZW], T[0-9]+.[XYZW]}} -;SI-CHECK: VADDI32e32 v{{[0-9]+, v[0-9]+, v[0-9]+}} -;SI-CHECK: VADDI32e32 v{{[0-9]+, v[0-9]+, v[0-9]+}} +;SI: VADDI32e32 v{{[0-9]+, v[0-9]+, v[0-9]+}} +;SI: VADDI32e32 v{{[0-9]+, v[0-9]+, v[0-9]+}} _define void @test2(<2 x i32> addrspace(1) %out, <2 x i32> addrspace(1)* %in) { %bptr = getelementptr <2 x i32> addrspace(1)* %in, i32 1 @@ -33,15 +33,15 @@ define void @test2(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in) { } 
;FUNC-LABEL: @test4: -;EG-CHECK: ADDINT {{[* ]*}}T{{[0-9]+.[XYZW], T[0-9]+.[XYZW], T[0-9]+.[XYZW]}} -;EG-CHECK: ADDINT {{[* ]*}}T{{[0-9]+.[XYZW], T[0-9]+.[XYZW], T[0-9]+.[XYZW]}} -;EG-CHECK: ADDINT {{[* ]*}}T{{[0-9]+.[XYZW], T[0-9]+.[XYZW], T[0-9]+.[XYZW]}} -;EG-CHECK: ADDINT {{[* ]*}}T{{[0-9]+.[XYZW], T[0-9]+.[XYZW], T[0-9]+.[XYZW]}} +;EG: ADDINT {{[* ]*}}T{{[0-9]+.[XYZW], T[0-9]+.[XYZW], T[0-9]+.[XYZW]}} +;EG: ADDINT {{[* ]*}}T{{[0-9]+.[XYZW], T[0-9]+.[XYZW], T[0-9]+.[XYZW]}} +;EG: ADDINT {{[* ]*}}T{{[0-9]+.[XYZW], T[0-9]+.[XYZW], T[0-9]+.[XYZW]}} +;EG: ADDINT {{[* ]*}}T{{[0-9]+.[XYZW], T[0-9]+.[XYZW], T[0-9]+.[XYZW]}} -;SI-CHECK: VADDI32e32 v{{[0-9]+, v[0-9]+, v[0-9]+}} -;SI-CHECK: VADDI32e32 v{{[0-9]+, v[0-9]+, v[0-9]+}} -;SI-CHECK: VADDI32e32 v{{[0-9]+, v[0-9]+, v[0-9]+}} -;SI-CHECK: VADDI32e32 v{{[0-9]+, v[0-9]+, v[0-9]+}} +;SI: VADDI32e32 v{{[0-9]+, v[0-9]+, v[0-9]+}} +;SI: VADDI32e32 v{{[0-9]+, v[0-9]+, v[0-9]+}} +;SI: VADDI32e32 v{{[0-9]+, v[0-9]+, v[0-9]+}} +;SI: VADDI32e32 v{{[0-9]+, v[0-9]+, v[0-9]+}} define void @test4(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) { %bptr = getelementptr <4 x i32> addrspace(1)* %in, i32 1 @@ -53,22 +53,22 @@ define void @test4(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) { } ; FUNC-LABEL: @test8 -; EG-CHECK: ADDINT -; EG-CHECK: ADDINT -; EG-CHECK: ADDINT -; EG-CHECK: ADDINT -; EG-CHECK: ADDINT -; EG-CHECK: ADDINT -; EG-CHECK: ADDINT -; EG-CHECK: ADDINT -; SI-CHECK: SADDI32 -; SI-CHECK: SADDI32 -; SI-CHECK: SADDI32 -; SI-CHECK: SADDI32 -; SI-CHECK: SADDI32 -; SI-CHECK: SADDI32 -; SI-CHECK: SADDI32 -; SI-CHECK: SADDI32 +; EG: ADDINT +; EG: ADDINT +; EG: ADDINT +; EG: ADDINT +; EG: ADDINT +; EG: ADDINT +; EG: ADDINT +; EG: ADDINT +; SI: SADDI32 +; SI: SADDI32 +; SI: SADDI32 +; SI: SADDI32 +; SI: SADDI32 +; SI: SADDI32 +; SI: SADDI32 +; SI: SADDI32 define void @test8(<8 x i32> addrspace(1)* %out, <8 x i32> %a, <8 x i32> %b) { entry: %0 = add <8 x i32> %a, %b @@ -77,38 +77,38 @@ entry: } ; 
FUNC-LABEL: @test16 -; EG-CHECK: ADDINT -; EG-CHECK: ADDINT -; EG-CHECK: ADDINT -; EG-CHECK: ADDINT -; EG-CHECK: ADDINT -; EG-CHECK: ADDINT -; EG-CHECK: ADDINT -; EG-CHECK: ADDINT -; EG-CHECK: ADDINT -; EG-CHECK: ADDINT -; EG-CHECK: ADDINT -; EG-CHECK: ADDINT -; EG-CHECK: ADDINT -; EG-CHECK: ADDINT -; EG-CHECK: ADDINT -; EG-CHECK: ADDINT -; SI-CHECK: SADDI32 -; SI-CHECK: SADDI32 -; SI-CHECK: SADDI32 -; SI-CHECK: SADDI32 -; SI-CHECK: SADDI32 -; SI-CHECK: SADDI32 -; SI-CHECK: SADDI32 -; SI-CHECK: SADDI32 -; SI-CHECK: SADDI32 -; SI-CHECK: SADDI32 -; SI-CHECK: SADDI32 -; SI-CHECK: SADDI32 -; SI-CHECK: SADDI32 -; SI-CHECK: SADDI32 -; SI-CHECK: SADDI32 -; SI-CHECK: SADDI32 +; EG: ADDINT +; EG: ADDINT +; EG: ADDINT +; EG: ADDINT +; EG: ADDINT +; EG: ADDINT +; EG: ADDINT +; EG: ADDINT +; EG: ADDINT +; EG: ADDINT +; EG: ADDINT +; EG: ADDINT +; EG: ADDINT +; EG: ADDINT +; EG: ADDINT +; EG: ADDINT +; SI: SADDI32 +; SI: SADDI32 +; SI: SADDI32 +; SI: SADDI32 +; SI: SADDI32 +; SI: SADDI32 +; SI: SADDI32 +; SI: SADDI32 +; SI: SADDI32 +; SI: SADDI32 +; SI: SADDI32 +; SI: SADDI32 +; SI: SADDI32 +; SI: SADDI32 +; SI: SADDI32 +; SI: SADDI32 define void @test16(<16 x i32> addrspace(1)* %out, <16 x i32> %a, <16 x i32> %b) { entry: %0 = add <16 x i32> %a, %b @@ -117,8 +117,12 @@ entry: } ; FUNC-LABEL: @add64 -; SI-CHECK: SADDU32 -; SI-CHECK: SADDCU32 +; SI: SADDU32 +; SI: SADDCU32 + +; EG-DAG: ADDINT +; EG-DAG: ADDCUINT +; EG-DAG: ADDINT define void @add64(i64 addrspace(1)* %out, i64 %a, i64 %b) { entry: %0 = add i64 %a, %b @@ -132,7 +136,11 @@ entry: ; to a VGPR before doing the add. ; FUNC-LABEL: @add64sgprvgpr -; SI-CHECK-NOT: VADDCU32e32 s +; SI-NOT: VADDCU32e32 s + +; EG-DAG: ADDINT +; EG-DAG: ADDCUINT +; EG-DAG: ADDINT define void @add64sgprvgpr(i64 addrspace(1)* %out, i64 %a, i64 addrspace(1)* %in) { entry: %0 = load i64 addrspace(1)* %in @@ -143,8 +151,12 @@ entry: ; Test i64 add inside a branch. 
; FUNC-LABEL: @add64inbranch -; SI-CHECK: SADDU32 -; SI-CHECK: SADDCU32 +; SI: SADDU32 +; SI: SADDCU32 + +; EG-DAG: ADDINT +; EG-DAG: ADDCUINT +; EG-DAG: ADDINT define void @add64inbranch(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i64 %a, i64 %b, i64 %c) { entry: %0 = icmp eq i64 %a, 0 diff --git a/test/CodeGen/R600/sub.ll b/test/CodeGen/R600/sub.ll index 8678e2b..1225ebd 100644 --- a/test/CodeGen/R600/sub.ll +++ b/test/CodeGen/R600/sub.ll @@ -43,10 +43,13 @@ define void @test4(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) { ; SI: SSUBU32 ; SI: SSUBBU32 +; EG: MEMRATCACHELESS STORERAW [[LO:T[0-9]+.[XYZW]]] +; EG: MEMRATCACHELESS STORERAW [[HI:T[0-9]+.[XYZW]]] +; EG-DAG: SUBINT {{[* ]*}}[[LO]] +; EG-DAG: SUBBUINT ; EG-DAG: SUBINT -; EG-DAG: SETGTUINT -; EG-DAG: SUBINT -; EG-DAG: ADDINT +; EG-DAG: SUBINT {{[* ]*}}[[HI]] +; EG-NOT: SUB define void @ssubi64(i64 addrspace(1)* noalias %out, i64 %a, i64 %b) nounwind { %result = sub i64 %a, %b store i64 %result, i64 addrspace(1)* %out, align 8 @@ -57,10 +60,13 @@ define void @ssubi64(i64 addrspace(1)* noalias %out, i64 %a, i64 %b) nounwind ; SI: VSUBI32e32 ; SI: VSUBBU32e32 +; EG: MEMRATCACHELESS STORERAW [[LO:T[0-9]+.[XYZW]]] +; EG: MEMRATCACHELESS STORERAW [[HI:T[0-9]+.[XYZW]]] +; EG-DAG: SUBINT {{[* ]*}}[[LO]] +; EG-DAG: SUBBUINT ; EG-DAG: SUBINT -; EG-DAG: SETGTUINT -; EG-DAG: SUBINT -; EG-DAG: ADDINT +; EG-DAG: SUBINT {{[* ]*}}[[HI]] +; EG-NOT: SUB define void @vsubi64(i64 addrspace(1)* noalias %out, i64 addrspace(1)* noalias %inA, i64 addrspace(1)* noalias %inB) nounwind { %tid = call i32 @llvm.r600.read.tidig.x() readnone %aptr = getelementptr i64 addrspace(1)* %inA, i32 %tid diff --git a/test/CodeGen/R600/uaddo.ll b/test/CodeGen/R600/uaddo.ll index 0b854b5..ce30bbc 100644 --- a/test/CodeGen/R600/uaddo.ll +++ b/test/CodeGen/R600/uaddo.ll @@ -1,5 +1,5 @@ ; RUN: llc -march=r600 -mcpu=SI -verify-machineinstrs< %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s -; RUN: llc -march=r600 -mcpu=cypress 
-verify-machineinstrs< %s +; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs< %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s declare { i32, i1 } @llvm.uadd.with.overflow.i32(i32, i32) nounwind readnone declare { i64, i1 } @llvm.uadd.with.overflow.i64(i64, i64) nounwind readnone @@ -8,6 +8,9 @@ declare { i64, i1 } @llvm.uadd.with.overflow.i64(i64, i64) nounwind readnone ; SI: ADD ; SI: ADDC ; SI: ADDC + +; EG: ADDCUINT +; EG: ADDCUINT define void @uaddoi64zext(i64 addrspace(1)* %out, i64 %a, i64 %b) nounwind { %uadd = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %a, i64 %b) nounwind %val = extractvalue { i64, i1 } %uadd, 0 @@ -20,6 +23,9 @@ define void @uaddoi64zext(i64 addrspace(1)* %out, i64 %a, i64 %b) nounwind { ; FUNC-LABEL: @suaddoi32 ; SI: SADDI32 + +; EG: ADDCUINT +; EG: ADDINT define void @suaddoi32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 %a, i32 %b) nounwind { %uadd = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %a, i32 %b) nounwind %val = extractvalue { i32, i1 } %uadd, 0 @@ -31,6 +37,9 @@ define void @suaddoi32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 ; FUNC-LABEL: @vuaddoi32 ; SI: VADDI32 + +; EG: ADDCUINT +; EG: ADDINT define void @vuaddoi32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 addrspace(1)* %aptr, i32 addrspace(1)* %bptr) nounwind { %a = load i32 addrspace(1)* %aptr, align 4 %b = load i32 addrspace(1)* %bptr, align 4 @@ -45,6 +54,9 @@ define void @vuaddoi32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 ; FUNC-LABEL: @suaddoi64 ; SI: SADDU32 ; SI: SADDCU32 + +; EG: ADDCUINT +; EG: ADDINT define void @suaddoi64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64 %a, i64 %b) nounwind { %uadd = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %a, i64 %b) nounwind %val = extractvalue { i64, i1 } %uadd, 0 @@ -57,6 +69,9 @@ define void @suaddoi64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64 ; FUNC-LABEL: @vuaddoi64 ; SI: VADDI32 ; SI: VADDCU32 + +; EG: 
ADDCUINT +; EG: ADDINT define void @vuaddoi64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64 addrspace(1)* %aptr, i64 addrspace(1)* %bptr) nounwind { %a = load i64 addrspace(1)* %aptr, align 4 %b = load i64 addrspace(1)* %bptr, align 4 diff --git a/test/CodeGen/R600/usubo.ll b/test/CodeGen/R600/usubo.ll index c293ad7..d7718e2 100644 --- a/test/CodeGen/R600/usubo.ll +++ b/test/CodeGen/R600/usubo.ll @@ -1,10 +1,13 @@ ; RUN: llc -march=r600 -mcpu=SI -verify-machineinstrs< %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s -; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs< %s +; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs< %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s declare { i32, i1 } @llvm.usub.with.overflow.i32(i32, i32) nounwind readnone declare { i64, i1 } @llvm.usub.with.overflow.i64(i64, i64) nounwind readnone ; FUNC-LABEL: @usuboi64zext + +; EG: SUBBUINT +; EG: ADDCUINT define void @usuboi64zext(i64 addrspace(1)* %out, i64 %a, i64 %b) nounwind { %usub = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 %a, i64 %b) nounwind %val = extractvalue { i64, i1 } %usub, 0 @@ -17,6 +20,10 @@ define void @usuboi64zext(i64 addrspace(1)* %out, i64 %a, i64 %b) nounwind { ; FUNC-LABEL: @susuboi32 ; SI: SSUBI32 + +; EG-DAG: SUBBUINT +; EG-DAG: SUBINT +; EG: SUBINT define void @susuboi32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 %a, i32 %b) nounwind { %usub = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 %a, i32 %b) nounwind %val = extractvalue { i32, i1 } %usub, 0 @@ -28,6 +35,10 @@ define void @susuboi32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 ; FUNC-LABEL: @vusuboi32 ; SI: VSUBREVI32e32 + +; EG-DAG: SUBBUINT +; EG-DAG: SUBINT +; EG: SUBINT define void @vusuboi32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 addrspace(1)* %aptr, i32 addrspace(1)* %bptr) nounwind { %a = load i32 addrspace(1)* %aptr, align 4 %b = load i32 addrspace(1)* %bptr, align 4 @@ -42,6 +53,11 @@ define void 
@vusuboi32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 ; FUNC-LABEL: @susuboi64 ; SI: SSUBU32 ; SI: SSUBBU32 + +; EG-DAG: SUBBUINT +; EG-DAG: SUBINT +; EG-DAG: SUBINT +; EG: SUBINT define void @susuboi64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64 %a, i64 %b) nounwind { %usub = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 %a, i64 %b) nounwind %val = extractvalue { i64, i1 } %usub, 0 @@ -54,6 +70,11 @@ define void @susuboi64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64 ; FUNC-LABEL: @vusuboi64 ; SI: VSUBI32 ; SI: VSUBBU32 + +; EG-DAG: SUBBUINT +; EG-DAG: SUBINT +; EG-DAG: SUBINT +; EG: SUBINT define void @vusuboi64(i64 addrspace(1)* %out, i1 addrspace(1)* %carryout, i64 addrspace(1)* %aptr, i64 addrspace(1)* %bptr) nounwind { %a = load i64 addrspace(1)* %aptr, align 4 %b = load i64 addrspace(1)* %bptr, align 4
-- Jan Vesely <jan.vesely at rutgers.edu>
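The EG checks added for the i64 tests in the quoted patch expect 32-bit adds or subtracts combined with one carry or borrow instruction. A minimal C++ sketch of that decomposition follows; the type and function names are hypothetical, and this models only the arithmetic, not the exact instruction sequence the backend emits.

```cpp
#include <cstdint>

// A 64-bit value held as two 32-bit halves.
struct U64Parts {
  std::uint32_t lo;
  std::uint32_t hi;
};

U64Parts split(std::uint64_t v) {
  return {static_cast<std::uint32_t>(v), static_cast<std::uint32_t>(v >> 32)};
}

std::uint64_t join(U64Parts p) {
  return (static_cast<std::uint64_t>(p.hi) << 32) | p.lo;
}

// 64-bit add: add the low halves, compute the carry out of the low add,
// then add the high halves plus that carry.
U64Parts add64(U64Parts a, U64Parts b) {
  std::uint32_t lo = a.lo + b.lo;
  std::uint32_t c =
      static_cast<std::uint64_t>(a.lo) + b.lo > 0xFFFFFFFFull ? 1u : 0u;
  return {lo, a.hi + b.hi + c};
}

// 64-bit subtract: subtract the low halves, compute the borrow
// (b.lo > a.lo), then subtract the high halves and the borrow.
U64Parts sub64(U64Parts a, U64Parts b) {
  std::uint32_t lo = a.lo - b.lo;
  std::uint32_t br = b.lo > a.lo ? 1u : 0u;
  return {lo, a.hi - b.hi - br};
}

// Compare the split arithmetic against native 64-bit arithmetic.
bool matches_native(std::uint64_t a, std::uint64_t b) {
  return join(add64(split(a), split(b))) == a + b &&
         join(sub64(split(a), split(b))) == a - b;
}
```

`matches_native` is a quick way to check the model on boundary inputs, for example a low half of 0xFFFFFFFF plus 1, where the carry has to propagate into the high half.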
-- Jan Vesely <jan.vesely at rutgers.edu>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dumps.tgz
Type: application/x-compressed-tar
Size: 1166570 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141003/d4c2ded0/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141003/d4c2ded0/attachment.sig>