
[llvm-dev] Is it ok to allocate > half of address space?

Nuno Lopes via llvm-dev llvm-dev at lists.llvm.org
Wed Nov 8 15:18:12 PST 2017


On 11/8/2017 9:24 AM, Nuno Lopes via llvm-dev wrote:

Hi,

I was looking into the semantics of GEP inbounds and some BasicAA rules and I'm wondering if it's valid in LLVM IR to allocate more than half of the address space with a global variable or an alloca. If that's a scenario we want to consider, then we have problems :)

Consider this C code (32 bits):

    #include <string.h>

    char obj[0x80000008];

    char f() {
      char *p = obj + 0x79999999;
      char *q = obj + 0x80000000;
      *q = 1;
      memcpy(p, "abcd", 4);
      return *q;
    }

Clearly the stores alias, and the memcpy should overwrite the value written by "*q = 1". I dunno if this is legal in C or not, but the IR produced by clang looks like (32 bits):

    @obj = common global [2147483656 x i8] zeroinitializer, align 1

    define signext i8 @f() {
      store i8 1, i8* getelementptr inbounds (i8, i8* getelementptr inbounds ([2147483656 x i8], [2147483656 x i8]* @obj, i32 0, i32 0), i32 -2147483648), align 1
      call void @llvm.memcpy.p0i8.p0i8.i32(i8* getelementptr inbounds ([2147483656 x i8], [2147483656 x i8]* @obj, i32 0, i32 2040109465), i8* getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i32 0, i32 0), i32 4, i32 1, i1 false)
      %1 = load i8, i8* getelementptr inbounds (i8, i8* getelementptr inbounds ([2147483656 x i8], [2147483656 x i8]* @obj, i32 0, i32 0), i32 -2147483648), align 1
      ret i8 %1
    }

With -O2, the store to q gets forwarded, and so we get "ret i8 1". So, BasicAA concluded that p and q don't alias. The culprit is an overflow in BasicAAResult::isGEPBaseAtNegativeOffset().

So my question is: do we care about this use case where a single allocation can take more than half of the address space?

According to LangRef, your IR currently has undefined behavior: the rules for "inbounds" GEPs say that indexes are treated as signed values. And solving that would involve changing the way we represent GEPs in IR, so I think you can consider that out of scope.
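To make the signed-index point concrete, here is a minimal C sketch of how the two byte offsets from the example look once they are read as signed 32-bit indexes, which is how they appear in the IR above (i32 2040109465 and i32 -2147483648). The helper names as_signed_index, looks_before_base and check_example_offsets are hypothetical and only illustrate the arithmetic; this is not the code in BasicAAResult::isGEPBaseAtNegativeOffset().

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a 32-bit unsigned byte offset as the signed i32 index a GEP
   carries (two's complement), without relying on implementation-defined
   integer conversions. Hypothetical helper, for illustration only. */
static int32_t as_signed_index(uint32_t offset)
{
    if (offset <= (uint32_t)INT32_MAX)
        return (int32_t)offset;
    /* Offsets with the top bit set map to offset - 2^32. */
    return (int32_t)((int64_t)offset - 4294967296LL);
}

/* Hypothetical check, not LLVM's logic: does the signed view of the offset
   suggest the access lies before the start of the object? */
static int looks_before_base(uint32_t offset)
{
    return as_signed_index(offset) < 0;
}

/* Self-check using the two offsets from the C example above. */
static void check_example_offsets(void)
{
    assert(as_signed_index(0x79999999u) == 2040109465);
    assert(as_signed_index(0x80000000u) == INT32_MIN);
    assert(!looks_before_base(0x79999999u));
    assert(looks_before_base(0x80000000u));
}
```

Under these assumptions, an offset of 0x80000000 into the object is indistinguishable from an offset of -2147483648, so any reasoning that relies on the sign of the index can go wrong once an object exceeds half of a 32-bit address space.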

Sorry, that was a typo. The test case was supposed to not have inbounds (it should work without as well). The current definition of GEP inbounds is complicated, though. It disallows the following:

    %a = gep %p, 0x88888888
    %b = gep inbounds %a, 1

If %a is within bounds, the "gep inbounds" gives a signed overflow even though it's just a +1 (since 0x88888888 + 1 overflows). So GEP inbounds disables large objects outright.

BTW I've always wondered why EmitGEPOffset (http://llvm.org/doxygen/Local_8h_source.html#l00247) doesn't use 'add nsw' if the semantics of GEP inbounds allows that (if my reading of LangRef is correct).

Assuming we're not dealing with inbounds GEPs (e.g. you pass -fwrapv to clang), I don't see any particular reason to disallow allocations more than half the address-space.

Ok, I can file bug reports for the cases I'm seeing. I can verify correctness of fixes as well, but only starting a week from now; I'm quite busy at the moment.

Nuno


