[RFC] LLVM policy for top level directories and language runtimes
Hello,
This is a short RFC looking to get some wider community guidance on how to proceed with some restructuring for GPU / offloading. For a bit of background, OpenMP offloading has a GPU runtime library that is currently built as part of the offload/ project alongside the CPU portion. I have a patch ([OpenMP] Change build of OpenMP device runtime to be a separate runtime by jhuber6 · Pull Request #136729 · llvm/llvm-project · GitHub) that splits the GPU / CPU builds into separate compilation jobs. This currently moves it back into openmp/, where it was before the offload/ split. Build scripts will then enable the openmp project for the GPU to get the runtime, like so:
-DLLVM_RUNTIME_TARGETS=amdgcn-amd-amdhsa \
-DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES='openmp'
The concern raised by @jdoerfert is that in the future we may want to add native LLVM runtimes for other languages, such as sycl, cuda, or openacc. In my current scheme, this would require that all of these projects have a top-level directory in the LLVM tree, similar to openmp/. All of these may also potentially depend on some common utilities present in offload/. Enabling all of those languages’ runtimes would then look like this:
-DLLVM_RUNTIME_TARGETS=amdgcn-amd-amdhsa \
-DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES='openacc;openmp;sycl;cuda'
I believe this is relatively straightforward and matches what we do with other language runtimes like flang-rt, libc, or libclc. The alternative approach proposed by @jdoerfert is to contain all of these in the offload/ directory itself to avoid too many new LLVM directories; each would instead be placed under offload, like offload/device/openmp. That approach would look like this under the proposed scheme:
-DLLVM_RUNTIME_TARGETS=amdgcn-amd-amdhsa \
-DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES='offload'
-DOFFLOAD_TARGETS_TO_BUILD='openacc;openmp;sycl;cuda'
TL;DR: for the future where offload/ begins to contain other languages’ runtimes, should we prefer a separate top-level directory for each one, or should they all be put under offload/? I am personally in favor of separate TLDs, matching what we currently do with openmp/.
Since switching from per-project Subversion repositories to a Git monorepository, I don’t think there is a need to be economical on top-level directories anymore.
On the contrary, we have been quite liberal with them: .ci, .github, cmake, utils, third-party, runtimes, cross-project-tests, clang-tools-extra, etc. I myself added one: flang-rt.
The LLVM_ENABLE_PROJECTS and LLVM_ENABLE_RUNTIMES system requires top-level directories. While LLVM_ENABLE_PROJECTS can be replaced by just an add_subdirectory somewhere in another project, LLVM_ENABLE_RUNTIMES comes with a system for cross-compilation.
So far we have a pretty fine-grained separation of runtimes:
- Multiple C runtimes: libc, compiler-rt, llvm-libgcc
- Multiple C++ runtimes: libunwind, libcxxabi, libc++, pstl
- Fortran: flang-rt
- OpenMP (CPU): openmp
- OpenCL: libclc
I don’t see why, for runtimes of GPU-supporting languages (OpenMP, OpenACC, OpenCL, SYCL, CUDA, HIP, …), in contrast to CPU languages, we should need to be economical with top-level directories and collect them under a single uber-project/runtime.
For users I think the typical use case is to only need support for a subset of those languages (irrespective of whether it is a GPU or CPU-side language), and only compile the runtimes they need using the existing LLVM_ENABLE_RUNTIMES system.
Git also allows sparse checkouts to download only the directories that are actually needed, which gets more complicated if those are subdirectories. Not everybody needs/wants the entire gigabyte-sized repository.
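As a sketch of that sparse-checkout workflow (the directory list here is illustrative, not a tested minimal set for any particular build configuration):

```shell
# Illustrative sparse checkout: download only selected top-level
# directories of llvm-project instead of the full tree.
git clone --filter=blob:none --no-checkout \
    https://github.com/llvm/llvm-project.git
cd llvm-project
git sparse-checkout set cmake llvm runtimes openmp offload
git checkout main
```

With cone-mode sparse checkout (the default for `git sparse-checkout set`), top-level files are still materialized, but unlisted directories like clang/ or mlir/ are not downloaded into the working tree.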
Moreover, @jhuber6 is working on making the host/CPU and offload/GPU sides more similar, also using the LLVM_ENABLE_RUNTIMES mechanism for NVPTX/AMDGPU targets as he has already done for libc and libcxx.
IIUC, the end goal is that the only difference between host-side and offload-side libraries is that the host side contains main(), and the offload side some code that prepares for kernel execution, such as receiving function arguments from the driver. Making host- and offload-side runtime libraries build differently would conflict with this goal.
In principle, the offload-side can also be a CPU target (e.g. host offloading (execution in the same process), or execution at another MPI rank), or the GPU executing main() with kernels being offloaded to the GPU (reverse offloading).
Long-term, I see the offload top-level directory as the location for common code shared by language-specific GPU runtimes, such as warp/wavefront-intrinsics, tensor operations, or the aforementioned kernel launching code (the offload plugins).
Kind of like what LLVM is for compilers, or libc/shared is for host runtimes.
Counterpoints:
- Clang does not have options to dis-/enable C, C++, Objective-C, CUDA, HIP, or OpenMP support separately
- LLVM backends live in llvm/lib/Target, not in top-level directories, and use their own mechanism to be dis-/enabled
- LLVM_ENABLE_RUNTIMES does not compile any runtime by default, but most LLVM backends are enabled by default
jhuber6 May 13, 2025, 4:07pm 3
Resurrecting this because [OpenMP] Change build of OpenMP device runtime to be a separate runtime by jhuber6 · Pull Request #136729 · llvm/llvm-project · GitHub is blocked until I get some wider community input. Mostly wondering if people like @nikic @AaronBallman @petrhosek @MaskRay @Artem-B or other people familiar with maintaining LLVM infrastructure have an opinion. If no one has any objections then I’d be fine landing what I have, which would mean that future language support for things like sycl would get a TLD in llvm-project.
I’ve objected plenty in the PR and in the 3+ discussions we had. I wanted to avoid tainting this thread early on, but you know there are objections against landing what you have.
The PR has two parts, one is fine, one is not.
The latter is not necessary and will, IMHO, simply make our lives harder down the road. We do not gain anything by reorganizing the code, except that “it fits nicer”, for some notion of “fit”. Keeping all GPU runtimes together provides more synergy than splitting them apart, and we can still build them separately and selectively just fine. To go back to the PR at hand and the “fit” argument: the OpenMP GPU device runtime has no connection to the OpenMP host runtime at this point as far as I can tell. However, it has connections to the Offload infrastructure. As I mentioned in the PR, if anything, we should first split the “generic” parts from the OpenMP parts before a move is considered. And even then, the fact that it is unrelated to anything in the OpenMP folder and only usable with the offload folder still tells me where it should live.
Are there technical considerations beyond organizational ones?
If it’s just a matter of organization, I can see it going either way and I don’t have a strong opinion. On the one hand, separate TLDs makes it easier for folks who aren’t already familiar with the directory structures to find those runtime libraries. On the other hand, grouping all of the offloading runtime library support together makes sense too. I think I weakly lean towards grouped in a single directory because I expect we’re going to want to share as much code between the offloading runtimes as we can, and grouping them together allows for a more natural hierarchy.
jhuber6 May 13, 2025, 5:19pm 6
I agree that libomptarget should also live in openmp/. Right now the device runtime depends on the core offload/ interface, mostly for handling of globals. Ideally this would be pulled into libomptarget and moved back. That being said, I don’t think that’s a strict requirement for this to be done first, as both set up the relationship where openmp/ defines the language and depends on offload/ for generic parts.
Mostly my thoughts are in line with @Meinersbur’s. I think it’s clearer to use the existing LLVM_ENABLE_RUNTIMES interface rather than making a new one. Having separate top-level directories doesn’t stop us from sharing code; we already do similar cross-project includes from LLVM and libc, and in this patch itself. My ideal situation is that offload/ provides a core library for both host and device which can be included as needed. For example, libc++ depends on libc, so for the C++ language we link both.
Thank you for the explanation! I no longer lean in any particular direction.
So far: One existing cmake option vs a similar new one.
Generally, we can do everything either way and the real downside won’t be for users but developers.
Wrt. cmake, I concur with @jhuber6: most offload users should use the distributed cmake recipe for offload, at least as a starting point. If that cmake uses a secondary variable to control the language APIs and device runtimes for offload, akin to the secondary cmake for libc++ backend selection, that is as good as top-level folders.
Then let’s work toward that. Let’s split out common parts from the GPU runtime and the host stuff, put the language specific things on top, see how that actually looks and works before we move everything, including common parts, around.
To give a concrete example: I believe moving the current DeviceRTL out of offload will more likely couple CUDA offload via LLVM to OpenMP than solve any dependence issues in the meantime. Arguably, introducing such a dependence artificially is even more nonsensical than anything we have right now.
Similar to the device RTL argument, there is no/little connection. libomptarget does, and will, heavily depend on /offload, but barely, if at all, on /openmp. So will /cuda and so on. The device runtimes, and the host runtimes, should all be part of /offload, as that is (1) a strict requirement for them, (2) by far their strongest connection point, and (3) where synergies between them can be found. If code is in /cuda and /openmp and /sycl, the likelihood that we see commonalities, e.g., during review, is much lower than if they all reside in /offload/device-rtl or /offload/language-api. More people will monitor the latter, and they will see how we should generalize design decisions rather than going back to building N versions of the same thing.
preames May 14, 2025, 2:29pm 9
Skimming the discussion here (and only here, I haven’t read the discussion on the PR), I personally find the arguments for independent top-level directories slightly more appealing, but don’t really see an overwhelming reason to prefer one or the other.
I think this falls into the category of decisions where the best answer is just to pick something and run with it. We can reverse this decision at a later point without issue. The importance of being “right” upfront is much lower than the importance of having in-tree development move forward.
jhuber6 May 14, 2025, 4:55pm 10
It’s logically OpenMP because it implements the OpenMP runtime calls, which binds it pretty closely to openmp/ to me. Just because something is on the dependency path doesn’t mean it needs to be co-located; libc++ builds off of libc, for example. I can see the argument that forcing them into a single place would encourage people to make things common.
My concern is that we want to minimize changes to the CMake required to build this stuff. It’s somewhat detrimental to tell users to do it one way and then change it. However that’s somewhat moot considering that this patch does just that, but I’d still like to minimize it if possible.
rnk May 15, 2025, 11:43pm 11
I’m very far from an offload technology expert, but we touched on these technologies for like 5min in the clang area team meeting today, and I made the point that, from an OSS project PoV, we own the results of any offloading ecosystem fragmentation, whether it’s at the runtime level or the language level (CUDA/HIP convergence rules). With a maintainer hat on, it’s naturally in our interest to converge as much as possible, wherever possible, between these various technologies. Of course, each of these technologies wants to have a free hand to evolve, and every company has a different stake in the growth of these tools, so there is natural tension between the goals of maintainers and the goals of developers adding functionality.
So, reducing duplication between these runtimes where possible without creating unnecessary entanglement seems like a good guiding principle, but how do we apply that in practice? I definitely do not personally have the answer to whether each of these languages deserves a top-level subproject directory, but our decision should be grounded in why we’re making a good tradeoff between these goals.
So, do we think keeping them under the offload/ TLD enables more reuse, or does it create unnecessary entanglement?
jhuber6 May 16, 2025, 5:51pm 12
Realistically this is just a difference between offload/cuda and cuda/. Ideally the ‘core’ offload/ project contains all the generic bits. We then just put the language-specific parts in the respective TLD. We already have some projects that do this to avoid code duplication (right now the libc++ from_chars method, and possibly clang for constexpr math). I don’t think the location of the directory changes too much beyond forcing us to export the offload/ interface more intentionally.
I’d like to cautiously suggest that minimising cmake witchcraft is a reward in itself.
If the ENABLE_RUNTIMES logic does a sane thing with top level directories and not nested directories, and nesting projects requires rebuilding that style of infra one level deep, let’s not do that. The build is quite complicated enough already.
“For dir in many_dirs, do the cross compile” seems OK to me.
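What that loop amounts to can be pictured roughly as follows. This is an illustrative sketch of one sub-configure per target, not the actual machinery behind LLVM_RUNTIME_TARGETS; the triples and flags are examples only:

```shell
# Sketch only: one runtimes cross-configure + build per target directory.
for target in amdgcn-amd-amdhsa nvptx64-nvidia-cuda; do
  cmake -S llvm-project/runtimes -B "build-$target" \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_ENABLE_RUNTIMES=openmp \
    -DLLVM_DEFAULT_TARGET_TRIPLE="$target"
  cmake --build "build-$target"
done
```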
To make it specific: I’m finishing up a PR with initial CUDA and HIP host APIs. They are 99% the same, and I use a common folder for almost all the code. Everything is set up on the new liboffload API, so not libomptarget. The current setup has ~4 common source files and two common “.inc” files. The CUDA/HIP header and runtime implementation specializes a common “.inc” file with the prefix they choose, “cuda”/“hip”, respectively. All of it is located in offload/languages/{common,cuda,hip}. Sharing becomes easy, and cross-directory includes/dependencies are local enough not to scare me.
I believe there are only downsides to splitting this into:
offload/liboffload/
offload/[languages/]common/ // depends on offload/liboffload
cuda/ // includes headers and source/object files from offload/common and depends on offload/liboffload
hip/ // includes headers and source/object files from offload/common and depends on offload/liboffload
The “upside” is that all the cuda now lives in cuda/; the issue is that there is not much “cuda” but way more “shared code”, at least that’s the goal. Most “cuda” code right now is in common and included with a “cuda” prefix. The cuda/ folder is implicitly dependent on the hip/ folder, as they should be in sync, and explicitly on offload/common. This is much clearer in my current setup.
A side note: during the discussion with @jhuber6 I asked where to install the cuda/ host headers that we provide, i.a., cuda_runtime.h. We settled on llvm_install/include/offload/cuda/cuda_runtime.h since liboffload installs its headers into llvm_install/include/offload/. For me this shows how cuda/ is “part” of offload and not a standalone thing. It likely will never work standalone, or carry enough code to make it a “thing”.
This differentiates it from other top-level folders cited here. They exist because they are “things” by themselves. They might have cross-folder dependencies, but those are proper “llvm subprojects” with communities and use cases. What we are talking about here is making top-level folders for a couple of files that are useless by themselves. The arguments, as far as I can tell, are: 1) we want to reuse ENABLE_RUNTIMES and not introduce a second-level version of it within offload; 2) “language” code belongs in a “language” subproject, following openmp; 3) offload/ is only for generic code; no language folder shall be in there.
I personally see some appeal in 1) and 2), but not enough to justify this, as we lose connections, locality, and, I fear, reuse over time.
I always agree with this; cmake witchcraft is certainly too complicated for a mere mortal like me.
That said, we have precedent for a “second-level selector” (libc++ backends), so I am not overly afraid of ENABLE_RUNTIMES=offload OFFLOAD_LANG_RUNTIMES="cuda;openmp" (see first post). This also shows the connection between the language runtimes and offload/.
jhuber6 May 29, 2025, 8:13pm 15
Honestly, this is the one case where CUDA and HIP are so similar it’s not really worth separating them. Instead I’d just suggest using a single name. Common utilities can all go in offload/, but the actual language runtime lives in cuda/ or kernel/ or whatever you want to call it. The installation directory is a bit of a question. CUDA seems to put theirs in targets/x86_64-linux/include/cuda_runtime.h, which would probably be close to include/x86_64-unknown-linux-gnu/cuda_runtime.h for us if we wanted to maintain compatibility. I just think it’s much clearer and more in line with the current way of handling projects if we just kept the languages separate.
Common utilities can all go in offload/ but the actual language runtime lives in cuda/ or kernel/ or whatever you want to call it.
So, to confirm, do I read your suggestion correctly as:
Most code will reside in offload/languages/kernel, with a new top-level folder kernel (or cuda_hip) that contains the small cuda- and hip-specific parts (e.g., the stuff where they actually do not match perfectly). And that top-level folder will then include files from, and depend on, offload/languages/kernel.
jhuber6 May 30, 2025, 9:55pm 17
That’s what I would prefer, yes. Though I think the common utilities would probably be generic helpers in offload/, not something tied to a language. Anything really language-specific would just go in the TLD.
Anything really language specific would just go in the TLD.
So, you want: "kernel"/{common,hip,cuda} depending on offload/liboffload, with the hip/cuda folders depending on / including the common stuff.
What would the TLD be named? “kernel” seems too generic. cuda_hip? gpu_kernel_runtimes? I liked that I could put them in a “languages” folder under offload before.
I’m still not convinced that this is better, but I’ll commit to the consensus. (Not to say that I really see a consensus here.)