[llvm-dev] Is this undefined behavior optimization legal?

Mehdi Amini via llvm-dev llvm-dev at lists.llvm.org
Mon Oct 3 14:27:46 PDT 2016


On Oct 3, 2016, at 1:51 PM, Tom Stellard via llvm-dev <llvm-dev at lists.llvm.org> wrote:

Hi,

I've found a test case where SelectionDAG is doing an undefined behavior optimization, and I need help determining whether or not this is legal. Here is the example IR:

  define void @test(<4 x i8> addrspace(1)* %out, float %a) {
    %uint8 = fptoui float %a to i8
    %vec = insertelement <4 x i8> <i8 0, i8 0, i8 0, i8 0>, i8 %uint8, i32 0
    store <4 x i8> %vec, <4 x i8> addrspace(1)* %out
    ret void
  }

Since %vec is a 32-bit vector, a common way to implement this function on a target with 32-bit registers would be to zero initialize a 32-bit register to hold the initial vector and then 'mask' and 'or' the inserted value with the initial vector. In AMDGPU assembly it would look something like:

  v_mov_b32 v0, 0
  v_cvt_u32_f32_e32 v1, s0
  v_and_b32 v1, v1, 0x000000ff
  v_or_b32 v0, v0, v1

The optimization the SelectionDAG does for us in this function, though, ends up removing the mask operation. Which gives us:

  v_mov_b32 v0, 0
  v_cvt_u32_f32_e32 v1, s0
  v_or_b32 v0, v0, v1

The reason the SelectionDAG is doing this is because it knows that the result of %uint8 = fptoui float %a to i8 is undefined when the result uses more than 8-bits. So, it assumes that the result will only set the low 8-bits, because anything else would be undefined behavior and the program would be broken. This assumption is what causes it to remove the 'and' operation.

So effectively, what has happened here, is that by inserting the result of an operation with undefined behavior into one lane of a vector, we have overwritten all the other lanes of the vector.

Is this optimization legal? To me it seems wrong that undefined behavior in one lane of a vector could affect another lane.
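To make the effect of the two lowerings concrete, here is a minimal Python sketch that models the 32-bit register arithmetic. It is an illustration only, not what SelectionDAG or the hardware actually executes: the function names are hypothetical, lane 0 is assumed to occupy the low byte of the register, and the value 300 stands in for a 32-bit convert result that does not fit in 8 bits.

```python
MASK8 = 0xFF
MASK32 = 0xFFFFFFFF

def insert_lane0_masked(vec32, cvt32):
    """Model of 'and' followed by 'or': only the low 8 bits of cvt32 are inserted."""
    return (vec32 | (cvt32 & MASK8)) & MASK32

def insert_lane0_unmasked(vec32, cvt32):
    """Model of the lowering with the mask removed: all 32 bits are OR'd in."""
    return (vec32 | cvt32) & MASK32

def lanes(vec32):
    """Split a 32-bit register into four 8-bit lanes, lane 0 in the low byte (assumed layout)."""
    return [(vec32 >> (8 * i)) & MASK8 for i in range(4)]

# In-range value: both lowerings agree.
print(lanes(insert_lane0_masked(0, 200)))    # [200, 0, 0, 0]
print(lanes(insert_lane0_unmasked(0, 200)))  # [200, 0, 0, 0]

# Out-of-range value (300 = 0x12C): the unmasked version spills into lane 1.
print(lanes(insert_lane0_masked(0, 300)))    # [44, 0, 0, 0]
print(lanes(insert_lane0_unmasked(0, 300)))  # [44, 1, 0, 0]
```

The two versions differ only for inputs where the converted value needs more than 8 bits, which is exactly the case the IR leaves undefined.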

Doesn’t undefined behavior in a program mean that the whole program is undefined? I’m not sure why you think there should be a limit on what the optimizer can do specifically to vector lanes when we usually don’t impose any such limit.

There might be a question about your fptoui conversion here, though: is it guaranteed to write zeros to the upper bits of the 32-bit register? In the IR it produces an i8 value and inserts it into a vector. It isn’t clear to me which combine / transformation knows that the fptoui will zero the upper part of the register.
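One way to read the question above: the mask is redundant only on the set of inputs for which the i8 conversion is defined. The Python sketch below spells that out. It assumes fptoui rounds toward zero and is defined only when the truncated value fits in the destination type, as the LangRef describes; the helper names are hypothetical and this is not the actual DAG combine.

```python
import math

def fptoui_i8_is_defined(a: float) -> bool:
    """True if truncating a toward zero yields a value representable as an unsigned i8."""
    if not math.isfinite(a):
        return False  # NaN and infinities cannot fit in any integer type
    return 0 <= math.trunc(a) <= 255

def mask_is_redundant(a: float) -> bool:
    """True if masking the truncated value to 8 bits leaves it unchanged."""
    v = math.trunc(a)
    return (v & 0xFF) == v

# For every input in the defined range, the 'and' with 0xff is a no-op.
assert all(mask_is_redundant(float(x)) for x in range(256))

# Outside that range the mask does change the value, so dropping it
# is only justified by treating those inputs as undefined.
assert not fptoui_i8_is_defined(300.0)
assert not mask_is_redundant(300.0)
```

Whether the upper bits of the real register are zero for out-of-range inputs depends on the hardware convert instruction, which this sketch does not model.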

— Mehdi

However, given that LLVM IR is SSA and we are technically creating a new vector and not modifying the old one, then maybe it's OK. I'm just not sure.

Appreciate any insight people may have. Thanks, Tom




