[LLVMdev] RFC: Proposal for Poison Semantics (original) (raw)
David Majnemer david.majnemer at gmail.com
Tue Jan 27 19:44:24 PST 2015
- Previous message: [LLVMdev] RFC: Proposal for Poison Semantics
- Next message: [LLVMdev] RFC: Proposal for Poison Semantics
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Tue, Jan 27, 2015 at 7:23 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
Hi David,
I spent some time thinking about poison semantics this way, but here is where I always get stuck: Consider the IR fragment %x = zext i32 %maybepoison to i64 %y = lshr i64 %x 32 %ptr = gep %global, %y store 42 to %ptr If %maybepoison is poison, then is %y poison? For all i32 values of %maybepoison, %y is i64 0, so in some sense you can determine the value %y without looking at %x.
I agree, this makes sense.
Given that, the store in the above fragment is not undefined behavior even if %maybepoison is poison. However, this means if "%maybepoison" is "add nuw %m, %n" it cannot be optimized to "add nuw (zext %m) (zext %n)" since that will change program behavior in an otherwise well-defined program.
Hmm, I'm not so sure this is right.
Are we talking about transforming: %add = add nuw i32 %x, %y %res = zext i32 %add to i64
to: %z1 = zext i32 %x to i64 %z2 = zext i32 %y to i64 %res = add nuw i64 %z1, %z2
?
This is OK because performing a zext by that many bits cannot result in a NUW violation.
One way out of this is to say that zext and sext of a value that is dependent on poison is poison. We'll have to do something similar for load shearing.
sext must be dependent on the underlying value because it splats the sign bit.
The above "problem" can also be seen in > ### Does comparing a poison value result in poison? > > It depends. If the comparison couldn't solely be determined by > looking at the other operand, the result is poison. This means we cannot do the C-style optimization "int a = ...; a <_ _(a + 1)" ==> "true". In the case "a + 1" overflows, a is INTSIGNEDMAX. But if a is INTSIGNEDMAX, "a < X" is always false, for all values of X. So "a < a + 1" is defined; and it is true for "a != INTSIGNEDMAX" but false for "a == INTSIGNEDMAX". Hence the expression cannot be folded to true.
This part sounds tricky, I'm not immediately sure what to do.
I think the reason why poison is hard to formalize is that it fundamentally tries to give an N bit value behavior that cannot be justified by any N bit value. It "breaks" llvm's type system. -- Sanjoy On Tue, Jan 27, 2015 at 6:50 PM, David Majnemer <david.majnemer at gmail.com> wrote: > Hello, > > What follows is my attempt to describe how poison works. Let me know what > you think. > > -- > David > > > # LLVM Poison Semantics > > Poison is an LLVM concept which exists solely to enable further optimization > of LLVM IR. The exact behavior of poison has been, to say the least, > confusing for users, researchers and engineers working with LLVM. > > This document hopes to clear up some of the confusion of poison and > hopefully explain why it has its semantics. > > ## A Quick Introduction to Poison > > Let's start with a concrete motivating example in C: >
_ _> int isSumGreater(int a, int b) {_ _> return a + b > a;_ _> }_ _>> > The C specification permits us to optimize the comparison inisSumGreater> tob > 0because signed overflow results in undefined behavior. A > reasonable translation ofisSumGreaterto LLVM IR could be: > >_ _> define i32 @isSumGreater(i32 %a, i32 %b) {_ _> entry:_ _> %add = add i32 %a, %b_ _> %cmp = icmp sgt i32 %add, %a_ _> %conv = zext i1 %cmp to i32_ _> ret i32 %conv_ _> }_ _>> > However, LLVM cannot determine that%cmpshould not consider cases where >%addresulted in signed overflow. We need a way to communicate this > information to LLVM. > > This is where thenswandnuwflags come into play.nswis short for > "no signed wrap",nuwis short for "no unsigned wrap". > > With these, we can come up with a new formulation of%add:add i32 nsw_ _> %a, %b. > LLVM can take this into account when it is optimizing the%cmpand replace > it with:icmp sgt i32 %b, 0. > > ## Differences Between LLVM and C/C++ > > There are some interesting differences between what C++ and C specify and > how LLVM behaves with respect to performing an operationg which is not > permitted to overflow. > > Perhaps chief among them is that evaluating an expression in C++ or C which > results performs an overflow is undefined behavior. In LLVM, executing an > instruction which is markednswbut which violates signed overflow results > in poison. Values which have no relationship with poisoned values are not > effected by them. > > Let us take the following C program into consideration: >_ _> int calculateImportantResult(int a, int b) {_ _> int result = 0;_ _> if (a) {_ _> result = a + b;_ _> }_ _> return result;_ _> }_ _>> > A straightforward lowering to LLVM IR could be: >_ _> define i32 @calculateImportantResult(i32 %a, i32 %b) {_ _> entry:_ _> %tobool = icmp ne i32 %a, 0_ _> br i1 %tobool, label %if.then, label %if.end_ _>_ _> if.then:_ _> %add = add nsw i32 %a, %b_ _> br label %if.end_ _>_ _> if.end:_ _> %result = phi i32 [ %add, %if.then ], [ 0, %entry ]_ _> ret i32 %result_ _> }_ _>> > Moving%addto the%entryblock would be preferable and would allow > further optimizations: >_ _> define i32 @calculateImportantResult(i32 %a, i32 %b) {_ _> entry:_ _> %tobool = icmp ne i32 %a, 0_ _> %add = add nsw i32 %a, %b_ _> %result = select i1 %tobool, i32 0, i32 %add_ _> ret i32 %result_ _> }_ _>> > In the original code, the calculation of%addwas control dependent. > Now,%addmight result in signed overflow in violation of thenswflag. > Despite this, the program should behave as it did before because the > poisoned value is masked-out by the select. The next section will dive into > this in greater detail. > > # Computation Involving Poison Values > Poison in a computation results in poison if the result cannot be > constrained by its non-poison operands. > > Examples of this rule which will result in poison: >_ _> %add = add i32 %x, %alwayspoison_ _> %sub = sub i32 %x, %alwayspoison_ _> %xor = xor i32 %x, %alwayspoison_ _> %mul = mul i32 %alwayspoison, 1_ _>> > Examples of this rule which do not result in poison: >_ _> %or = or i32 %alwayspoison, 2_ _> %and = and i32 %alwayspoison, 2_ _> %mul = mul i32 %alwayspoison, 0_ _>> > In fact, it would be reasonable to optimize%orto2and%andto0. > In this respect, poison is not different fromundef. > > The following example is only poison if%condis false: >_ _> %sel = select i1 %cond, i32 2, %alwayspoison_ _>> > ### Is it safe to have poison as acallargument? > > Acallinstruction may or may not result in poison depending on exactly > how the callee uses the supplied arguments, it is not necessarily the case > thatcall i32 @someFunction(i32 %alwayspoison)results in poison. > > LLVM cannot forbid poison from enteringcallarguments without prohibiting > an optimization pass from outlining code. > > ### Is it safe to store poison to memory? > >store i32 %alwayspoison, i32* %memdoes not result in undefined behavior. > A subsequent load instruction like%load = load i32* %memwill result in >%loadbeing a poison value. > > ### Is it safe to load or store a poison memory location? > > No. Poison works just likeundefin this respect. > > ### Does comparing a poison value result in poison? > > It depends. If the comparison couldn't solely be determined by looking at > the other operand, the result is poison. > > For example,icmp i32 ule %alwayspoison, 4294967295istrue, not > poison. > However,icmp i32 ne %alwayspoison, 7is poison. > > ### What if the condition operand in aselectis poison? > > In the example%sel = select i1 %alwayspoison, i1 true, false,%selis > eithertrueorfalse. Because,%seldepends on%alwayspoisonit > too is poison. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150127/106edad0/attachment.html>
- Previous message: [LLVMdev] RFC: Proposal for Poison Semantics
- Next message: [LLVMdev] RFC: Proposal for Poison Semantics
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]