[llvm-dev] [RFC] The `implements` attribute, or how to swap functions statically but late
Johannes Doerfert via llvm-dev llvm-dev at lists.llvm.org
Fri Mar 19 17:10:36 PDT 2021
The basic idea is to provide a way to replace a callee statically inside the LLVM-core pipeline [0]. The initial use case was the NVPTX backend's missing translation of certain (math) intrinsics. However, we believe other use cases would benefit from a generic model; one is described below, along with the benefits over alternatives.
Use cases
1a) NVPTX + fast-math + math functions
The original use case was discussed here [1] and the problem is shown here [2]. In a nutshell, `llvm.sin` (and friends) are not supported by the backend and it will yell at you. This is unfortunate but could arguably be resolved in a lot of different ways (below). Note that there is not necessarily a 1:1 mapping for all intrinsics, so we might need to write a wrapper, which would probably live in clang/lib/Headers with the other CUDA wrapper code.
Considered solutions we did not pick:
a) Teach the backend explicitly about the mapping `llvm.sin` -> `__nv_sin`.
b) Teach clang not to emit the intrinsics in the first place.
c) Use an existing overload "hack", e.g., asm.
Let's discuss those first. a) Works, but it is limited to this one use case, and the mapping does not live with the declarations and definitions, which makes maintenance less appealing. b) Seems easy enough, but we lose all the benefits of intrinsics; more on that later. c) Right now, IR isn't happy with defining intrinsics, so I think there is little we could do better than b) here, short of allowing intrinsic definitions (which is an option).
1b) Users can override the implementation of any intrinsic, regardless of whether the backend supports it. They still get the intrinsic benefits during compilation. This is not limited to NVPTX and math intrinsics; it also opens the door for cross-platform tooling, e.g., Tool XYZ emits intrinsics our backend doesn't support, but we can map them to implementations for our architecture.
1c) You can easily prototype/test different lowerings for functions and intrinsics by linking in a bitcode file (which can be compiled from C*).
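To make use cases 1a and 1b a bit more concrete, here is a rough sketch of what such a wrapper might look like. It is not taken from the patch: the `IMPLEMENTS` macro, `device_sin_stub`, and `my_sin_impl` are hypothetical names, and passing the string "llvm.sin.f64" to the attribute is an assumption about how an intrinsic would be named. On compilers without the proposed attribute the macro expands to nothing, so the file still builds as plain C.

```c
/* Hypothetical spelling: the attribute is only proposed in [0], so hide it
   behind a macro that expands to nothing when the compiler lacks it. */
#if defined(__has_attribute)
#  if __has_attribute(implements)
#    define IMPLEMENTS(name) __attribute__((implements(name)))
#  endif
#endif
#ifndef IMPLEMENTS
#  define IMPLEMENTS(name)
#endif

/* Stand-in for a device math routine such as __nv_sin (assumption: a real
   wrapper would forward to the vendor library instead). This is the Taylor
   polynomial x - x^3/6 + x^5/120, which is only accurate near zero. */
double device_sin_stub(double x) {
    double x2 = x * x;
    return x * (1.0 - x2 / 6.0 * (1.0 - x2 / 20.0));
}

/* The wrapper announces that it implements the double-precision sine
   intrinsic. The argument string is an assumed naming scheme. */
IMPLEMENTS("llvm.sin.f64")
double my_sin_impl(double x) {
    return device_sin_stub(x);
}
```

The point of the shape is that the mapping sits next to the definition, which is the maintenance argument made against alternative a).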
2) Function specifications and implementations
I would like to embed a function specification with the implementation, e.g.
void vec_add(double *A, double *B, int N) {
  for (int i = 0; i < N; ++i)
    A[i] += B[i];
}

__attribute__((implements("vec_add")))
void vec_add_impl(double *A, double *B, int N) {
  #pragma omp parallel for
  for (int i = 0; i < N; ++i)
    A[i] += B[i];
}
I think that would open up cool possibilities:
A) Replace only the first N specifications with implementations, which makes debugging easier.
B) Use the specification for the first part of the optimization pipeline so you can derive facts the implementation "hides". E.g., `llvm.sin` calls could be annotated as `readnone` while the `__nv_sin` call contains inline assembly and acts as an optimization barrier. Or, in the above example, it is obvious from the "specification" that `B` is readonly, but the implementation could arbitrarily hide this, e.g., the implementation might be a runtime call.
C) Program verification, test synthesis, etc. would be possible on a new level in a defined way. By keeping implementation and specification close together and written in the same language we might get somewhere in this area.
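Points A and C can be mimicked in plain C today, without the attribute, which may help to see what they buy us. The sketch below is only an illustration under assumptions: `IMPLEMENTS` is a no-op placeholder for the proposed attribute, `select_vec_add` and `spec_matches_impl` are hypothetical helpers, the unrolled loop stands in for the OpenMP version, and `B` is marked `const` to reflect the readonly observation from B).

```c
/* Placeholder for the proposed __attribute__((implements(name))). */
#define IMPLEMENTS(name)

enum { MAX_N = 64 };

/* Specification: the obvious sequential loop. */
void vec_add(double *A, const double *B, int N) {
    for (int i = 0; i < N; ++i)
        A[i] += B[i];
}

/* Implementation: unrolled by two, standing in for a tuned version. */
IMPLEMENTS("vec_add")
void vec_add_impl(double *A, const double *B, int N) {
    int i = 0;
    for (; i + 1 < N; i += 2) {
        A[i] += B[i];
        A[i + 1] += B[i + 1];
    }
    if (i < N)
        A[i] += B[i];
}

typedef void (*vec_add_fn)(double *, const double *, int);

/* Point A: only the first `limit` specifications are replaced. The pair
   with position `index` uses the implementation iff index < limit. */
vec_add_fn select_vec_add(int index, int limit) {
    return index < limit ? vec_add_impl : vec_add;
}

/* Point C: differential test. Runs both versions on the same input and
   returns 1 if every element matches exactly, 0 otherwise. */
int spec_matches_impl(int N) {
    double a_spec[MAX_N], a_impl[MAX_N], b[MAX_N];
    if (N < 0 || N > MAX_N)
        return 0;
    for (int i = 0; i < N; ++i) {
        a_spec[i] = a_impl[i] = (double)i;
        b[i] = 0.5 * i;
    }
    vec_add(a_spec, b, N);
    vec_add_impl(a_impl, b, N);
    for (int i = 0; i < N; ++i)
        if (a_spec[i] != a_impl[i])
            return 0;
    return 1;
}
```

Raising `limit` one step at a time and rerunning the check is a simple way to bisect which replacement introduces a mismatch.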
Thoughts
I initially thought about the reverse attribute, so `vec_add` in the above example would have `__attribute__((specifies(vec_add_impl)))`, for the same reasons (A-C). The problem is that I'm not sure how to attach anything to an intrinsic. Some `__asm__` trickery might help, but it's unclear that this is better than the `implements` version. Maybe we want both for convenience, with the `implements` version being able to handle intrinsics.
~ Johannes
[0] https://reviews.llvm.org/D98516
[1] https://lists.llvm.org/pipermail/llvm-dev/2021-March/149117.html
[2] https://godbolt.org/z/PxsEWs
-- ─────────────────── ∽ Johannes (he/his)