[llvm-dev] [RFC] How to manifest information in LLVM-IR, or, revisiting llvm.assume (original) (raw)
Doerfert, Johannes via llvm-dev llvm-dev at lists.llvm.org
Mon Dec 16 15:16:44 PST 2019
- Previous message: [llvm-dev] LLVM Weekly - #311, December 16th 2019
- Next message: [llvm-dev] [RFC] How to manifest information in LLVM-IR, or, revisiting llvm.assume
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Abstract:
It is often hard or impossible to encode complex, e.g., non-boolean,
information in an llvm.assume(i1)
. This RFC describes various problems
we have right now and provides alternative design ideas.
Some Existing Problems:
A) The boolean requirement.
The current llvm.assume(i1)
expects a boolean that is known to hold
true at runtime (once the llvm.assume
call is reached). However,
forming this boolean for "arbitrary" information is hard or even
impossible. Alignment, which is an easy case, already requires 3 extra
instructions, one of which is a ptrtoint
and therefore not really
optimizer friendly. Dereferenceability, is even scarier. Thus, we are
currently limited to (almost) boolean information when we want to
manifest information in the IR (which happens due to inlining or code
motion, see https://reviews.llvm.org/D69477 for an example).
B) The one-use checks.
Various pattern matching passes check the number of uses of a value.
However, while llvm.assume
is eventually eliminated by the backend
it will still increase the use count of the operand. I doubt we are
able to not increase the use count at all (and keep everything else
working), but what we can do is make sure the uses in "assumptions"
are easy to spot, thus filter. This is not the case right now because
of the additional instructions we need to make the values boolean.
Even if you have __builtin_assume(ptr);
the ptr
use will not be
the llvm.assume
call but a icmp
.
C) The side-effect avoidance.
__assume
, __builtin_assume
, __builtin_assume_aligned
, and OpenMP
omp assume
are all defined to not evaluate their argument, thus to
not cause the side effects that the evaluation of the argument would
otherwise imply. The way we implement this restriction is by not
emitting the argument expression in IR if it might cause a side
effect. We warn the user if that happens. While this is generally
speaking fine, it would be interesting to lift the implementation
restriction. One benefit would be that we could implement assert
through __builtin_assume
properly.
D) The singleton ranges.
An llvm.assume
will only provide information for a single program
point not a range. Even if the beginning and the end of a range have
an llvm.assume
, there are cases where the information will not be
as good as a proper range assumption. OpenMP 5.1 introduces such
range assumptions but other situations would benefit from them as
well. Take for example function attributes and inlining. Since we know
they hold for the entire function and not only when it is entered we
could encode the information over the entire range of the inlined
code.
Some Site Notes:
It seems of little use that we interleave the code for the assumed expression with the user code. Having the
llvm.assume
allows us to tie information to a program point, beyond that we just clutter the function with instructions that we later remove anyway.Reconstructing information from the pattern of instructions that feed into the
llvm.assume
is also not optimal, especially since we do not need to "optimize" these instructions anyway.The current (=arbitrary) side-effects of
llvm.assume
are too strong. We haveinaccessiblemem_or_argmemonly
and we might be able to come up with something even more specialized for this, e.g.,control_dependences_only
to indicate that there are only control dependences.
Some Design Ideas:
Use named operand bundles to encode information. If we want to encode property XYZ for a value %V holds at a certain program point and the property is dependent on %N we could encode that as:
llvm.assume() ["XYZ"(%V, %N)]
There are various benefits, including:- It is freely extensible.
- The property is directly tied to the value. Thus, no need for encoding patterns that introduce extra instructions and uses and which we need to preserve and decode later.
- Allows dynamic properties even when corresponding attributes do
not, e.g.,
llvm.assume() ["align"(%arg_ptr, %N)]
is fine and once%N
becomes a constant, or we determine a lower bound, we can introduce thealign(C)
attribute for%arg_ptr
.
Outline assumption expression code (with side-effects). If we potentially have side-effects, or we simply have a non-trivial expression that requires to be lowered into instructions, we can outline the assumption expression code and tie it to the
llvm.assume
via another operand bundle property. It could look something like this:__builtin_assume(foo(i) == bar(j));
will cause us to generate/// Must return true! static bool llvm.assume.expression_#27(int i, int j) { return foo(i) == bar(j); }
and a llvm.assume
call like this:
llvm.assume() ["assume_fn"(@llvm.assume.expression_#27, %i, %j))] So we generate the expression in a new function which we (only) tie to the
llvm.assume()through the "assume_fn" operand bundle. This will make sure we do not accidentally evaluate the code, or assume it is evaluated and produced side-effects. We can still optimize the code and use the information that we learn from it at the
llvm.assume`
site though.
- Use tokens to mark ranges.
We have tokens which can be used to tie two instructions together,
basically forming a range (with some conditions on the initial CFG).
If we tie two
llvm.assume
calls together we can say that the information provided by the first holds for any point dominated by it and post-dominated by the second.
I tried to be brief so I hope I could still get some ideas across without too much confusion. In any case, please let me know what you think!
Thanks, Johannes -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 228 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20191216/1e43f565/attachment.sig>
- Previous message: [llvm-dev] LLVM Weekly - #311, December 16th 2019
- Next message: [llvm-dev] [RFC] How to manifest information in LLVM-IR, or, revisiting llvm.assume
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]