On Mon, 25 Feb 2019 at 13:11, Bruce Hoult via llvm-dev <llvm-dev@lists.llvm.org> wrote:
LLVM has no idea whether the address computed by GEP is actually
within a legal object. The "inbounds" keyword is just you, the
programmer, promising LLVM that you know it's ok and that you don't
care what happens if it is actually out of bounds.
https://llvm.org/docs/GetElementPtr.html#what-happens-if-an-array-index-is-out-of-bounds
Hi Bruce,
It's not true in general that LLVM has no idea about (or doesn't care about) object sizes. It can infer object sizes and other properties from allocas, global variables, and calls to built-in functions such as malloc(). In the case of Rust we even have an out-of-tree patch to teach LLVM the same for Rust's (global) heap allocation functions. You can see this information being computed in lib/Analysis/MemoryBuiltins.cpp.
More importantly, the question is \*what\* exactly is being promised to LLVM, and more specifically, how the terms "out of bounds" and "object" are defined in this context. In many specific cases it is easy enough to answer intuitively whether a GEP should be considered "out of bounds", but in the cases Ralf described, where offsets and "object sizes" are both equal to 0, it is not so clear-cut and depends on tricky matters such as whether zero-sized allocations exist. We (Rust developers) very much care what happens in those cases (it should be a NOP), so it's important to check whether that is compatible with the Rust compiler emitting inbounds GEPs.
It is true that in practice in many cases LLVM won't be able to determine conclusively whether an object exists or not and what its bounds are, but that doesn't answer the question.
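To make the zero-offset case concrete, here is a minimal Rust sketch of the kind of code in question. The function names are hypothetical and chosen only for illustration. It uses `wrapping_add`, which as far as I know lowers to a plain `getelementptr` without `inbounds` and is therefore defined for any pointer; the open question in this thread is whether the same zero offset stays a NOP when it is emitted as `getelementptr inbounds`, which is what I would expect to happen when slicing an empty slice.

```rust
use std::ptr::NonNull;

/// Applies a zero offset to a dangling (but non-null, well-aligned) pointer
/// and returns the address before and after the offset.
/// Hypothetical helper, for illustration only.
fn zero_offset_on_dangling() -> (usize, usize) {
    // No allocation backs this pointer; it is never dereferenced.
    let p: *const u32 = NonNull::<u32>::dangling().as_ptr();
    // `wrapping_add` makes no in-bounds promise, so this is defined
    // regardless of how the inbounds question is resolved.
    let q = p.wrapping_add(0);
    (p as usize, q as usize)
}

/// Takes an empty sub-slice of an empty slice. The data pointer of an empty
/// slice does not have to point into a real allocation, and computing the
/// sub-slice offsets that pointer by zero elements.
/// Hypothetical helper, for illustration only.
fn empty_subslice_len() -> usize {
    let empty: &[u32] = &[];
    let sub = &empty[0..0];
    sub.len()
}
```

If the zero offset really is a NOP, the two addresses returned by `zero_offset_on_dangling` are equal and `empty_subslice_len` returns 0, with LLVM concluding nothing about dereferenceable memory at that address.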
Cheers,
Robin
On Sun, Feb 24, 2019 at 9:05 AM Ralf Jung via llvm-dev
<llvm-dev@lists.llvm.org> wrote:
\>
\> Hi all,
\>
\> What exactly are the rules for \`getelementptr inbounds\` with offset 0?
\>
\> In Rust, we are relying on the fact that if we use, for example, \`inttoptr\` to
\> turn \`4\` into a pointer, we can then do \`getelementptr inbounds\` with offset 0
\> on that without LLVM deducing that there actually is any dereferenceable memory
\> at location 4\. The argument is that we can think of there being a zero-sized
\> allocation. Is that a reasonable assumption? Can something like this be
\> documented in the LangRef?
\>
\> Relatedly, how does the situation change if the pointer is not created "out of
\> thin air" from a fixed integer, but is actually a dangling pointer obtained
\> previously from \`malloc\` (or \`alloca\` or whatever)? Is \`getelementptr inbounds\`
\> with offset 0 on such a pointer a NOP, or does it result in \`poison\`? And if
\> that makes a difference, how does that square with the fact that, e.g., the
\> integer \`0x4000\` could well be inside such an allocation, but doing
\> \`getelementptr inbounds\` with offset 0 on that would fall under the first
\> question above?
\>
\> Kind regards,
\> Ralf
\> \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
\> LLVM Developers mailing list
\> llvm-dev@lists.llvm.org
\> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev