[llvm-dev] [RFC] Adding range metadata to array subscripts. (original) (raw)

Clement Courbet via llvm-dev llvm-dev at lists.llvm.org
Wed Mar 24 07:06:40 PDT 2021


On Wed, Mar 24, 2021 at 2:20 PM Johannes Doerfert < johannesdoerfert at gmail.com> wrote:

I really like encoding more (range) information in the IR, more thoughts inlined.

On 3/24/21 4:14 AM, Clement Courbet via llvm-dev wrote: > Hi everyone, > > tl;dr: I would like to teach clang to output range metadata so that LLVM > can do better alias analysis. I have a proposal as D99248 > <https://reviews.llvm.org/D99248> (clang part) and D99247 > <https://reviews.llvm.org/D99247> (llvm part). But there are other possible > options that I'm detailing below. > > Consider the following code, adapted from brotli > <https://en.wikipedia.org/wiki/Brotli>: > > _ _>_ _> struct Histogram {_ _>_ _> int values[256];_ _>_ _> int total;_ _>_ _> };_ _>_ _> Histogram DoIt(const int* image, int size) {_ _>_ _> Histogram histogram;_ _>_ _> for (int i = 0; i < size; ++i) {_ _>_ _> ++histogram.values[image[i]]; // (A)_ _>_ _> ++histogram.total; // (B)_ _>_ _> }_ _>_ _> return histogram;_ _>_ _> }_ _> > > In this code, the compiler does not know anything about the values of > images[i], so it assumes that 256 is a possible value for it. In that case, > (A) would change the value of histogram.total, so (B) has to load, add one > and store [godbolt <https://godbolt.org/z/KxE343>]. > > Fortunately, C/C++ has a rule that it is invalid (actually, UB) to use > values to form a pointer to total and dereference it. What valid C/C++ code > is allowed to do with values is: > - Form any pointer in [values, values + 256]. > - Form and dereference any pointer in [values, values + 256) > > Note that the LLVM memory model is much laxer than that of C/C++. It has no > notion of types. In particular, given an LLVM aggregate definition: > > _ _> %struct.S = type { [42 x i32], i32, i32 }_ _> > > It is perfectly valid to use an address derived from a GEP(0,0,%i) [gep > reference] representing indexing into the [42 x i32] array to load the i32 > member at index 2. It is also valid for %i to be 43 (though not 44 if an > inbound GEP is used). > So clang has to give LLVM more information about the C/C++ rules. > > IR representation: > LLVM has several ways of representing ranges of values: > - !range metadata can be attached to integer call and load instructions > to indicate the allowed range of values of the result. LLVM's ValueTracking > provides a function for querying the range for any llvm::Variable. > - The llvm.assume intrinsic takes a boolean condition that can also be > used by ValueTracking to infer range of values. > - The inrange attribute of GEP can be used to indicate C-like semantics > for the structure field marked with the inrange attribute. It can only be > used for GEP constantexprs (ie.e. GEPs defined inline), but not for > standalone GEPs defining instructions. relevant discussion > <https://reviews.llvm.org/D22793?id=65626#inline-194653>. > > Alternatives: > *(1) *Annotate each array subscript index value with a range, e.g.: > _ _> %i = i64 …_ _> %ri = call i64 @llvm.annotation.i64(%index), !range !0_ _> %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0,_ _i32_ _> %ri_ _> ..._ _> !0 = !{i64 0, i64 42}_ _> > *(2) *(variant of 1) relax the constraint that !range metadata can only be > set on call and load instructions, and set the !range metadata on the index > expression. We still need annotations for function parameters though: > _ _> %i = i64 … , !range !0_ _> %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0,_ _i32_ _> %i_ _> ..._ _> !0 = !{i64 0, i64 42}_ _> > This is slightly more compact. > > (3) Same as (1), with llvm.assume. This feels inferior to annotations. > (4) Extend inrange to non-constantexprs GEPs. It is unclear how this will > interfere with optimizations. I would very much like not to introduce another way to encode assumptions other than llvm.assume. If you want to avoid the extra instructions, use llvm.assume(i1 true) ["range"(%val, %lb, %ub)], which is in line with our move towards operand bundle use.

Thanks, I did not know about that. I've just tried it but it appears that tags have to be attribute names, and !range is not a valid attribute, it's a metadata node. Is there a way to encode this ?

SCEV should be thought about this (as well), unsure what the problem is you describe below. If BasicAA needs to know, sure.

scev-aa already knows how to use range information. If we add a !range metadata in clang right now and use SCEV, there is nothing to do on the LLVM side. My point was that scev-aa is not enabled in the default pipeline, so we might as well teach BasicAA about this cheap case.

Actually I think it makes sense to teach BasicAA about range information anyway (D99247) given that it could already be useful in cases like:

define dso_local void @DoIt(%struct.Histogram* noalias nocapture sret(
%struct.Histogram) align 4 %0, i32 %1, *i8 zeroext %2*) local_unnamed_addr
#0 {
...
*%6 = zext i8 %2 to i64*
%7 = getelementptr inbounds %struct.Histogram, %struct.Histogram* %0, i64 0
, i32 0, *i64 %6*

Where ValueTracking could easily deduce that %6 is in [0,255].

~ Johannes

> > On the clang side: > The clang part is quite trivial as the infrastructure is already in place > to emit dynamic ubsan guards: D99248 <https://reviews.llvm.org/D99248> > > On the LLVM Side: > Alternatives: > (A) - (annotation or assume options) Simply enable scev-aa which knows > how to handle value ranges in general. IIUC it's not enabled in clang > because it has issues with invalidation when code changes, and is therefore > not cacheable. This makes it too slow to be practical. > *(B) *- (annotation or assume options) Teach BasicAA to honor !range > metadata (D99247 <https://reviews.llvm.org/D99247>) > (C) - (inrange option) Teach BasicAA to honor inrange attributes of GEP. > > I was leaning towards (1) and (B) because: > - BasicAA already has basic support for value range analysis > (zero/nonzero), this is a small and natural extension. > - The BasicAA improvement already benefits some existing code (as > evidenced by the test changes in D99247 <https://reviews.llvm.org/D99247_ _>) > - Using range metadata rather than the inrange attribute means that > BasicAA will automatically benefit from improvements in value tracking in > the future. > > Opinions ? > > _> ________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210324/024fedfe/attachment.html>



More information about the llvm-dev mailing list