[RFC] LLVM policy for top level directories and language runtimes
Hello,
This is a short RFC looking to get some wider community guidance on how to proceed with some restructuring for GPU / offloading. For a bit of background, OpenMP offloading has a GPU runtime library that is currently built as part of the offload/ project alongside the CPU portion. I have a patch ([OpenMP] Change build of OpenMP device runtime to be a separate runtime by jhuber6 · Pull Request #136729 · llvm/llvm-project · GitHub) that splits the GPU / CPU builds into separate compilation jobs. This currently moves it back into openmp/, where it was before the offload/ split. Build scripts will then enable the openmp project for the GPU to get the runtime, like so:
-DLLVM_RUNTIME_TARGETS=amdgcn-amd-amdhsa \
-DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES='openmp'
The concern raised by @jdoerfert is that in the future we may want to add native LLVM runtimes for other languages, such as sycl, cuda, or openacc. In my current scheme, this would require that all of these projects have a top-level directory in the LLVM tree, similar to openmp/. All of these may also potentially depend on some common utilities present in offload/. Enabling all of those languages’ runtimes would then look like this:
-DLLVM_RUNTIME_TARGETS=amdgcn-amd-amdhsa \
-DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES='openacc;openmp;sycl;cuda'
I believe this is relatively straightforward and matches what we do with other language runtimes like flang-rt, libc, or libclc. The alternative approach proposed by @jdoerfert is to contain all of these in the offload/ directory itself, to avoid adding too many new top-level LLVM directories. Each language’s runtime would instead live under offload/, e.g. offload/device/openmp. That approach would look like this under the proposed scheme:
-DLLVM_RUNTIME_TARGETS=amdgcn-amd-amdhsa \
-DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES='offload' \
-DOFFLOAD_TARGETS_TO_BUILD='openacc;openmp;sycl;cuda'
TL;DR: for a future where offload/ begins to contain other languages’ runtimes, should we prefer a separate top-level directory for each one, or should they all be put under offload/? I am personally in favor of separate TLDs, matching what we currently do with openmp/.
---

Since switching from per-project Subversion repositories to a Git monorepo, I don’t think there is a need to be economical with top-level directories anymore.
On the contrary, we have been quite liberal with them: .ci, .github, cmake, utils, third-party, runtimes, cross-project-tests, clang-tools-extra, etc. I myself added one: flang-rt.
The LLVM_ENABLE_PROJECTS and LLVM_ENABLE_RUNTIMES system requires top-level directories. While LLVM_ENABLE_PROJECTS can be replaced by just an add_subdirectory somewhere in another project, LLVM_ENABLE_RUNTIMES comes with a whole system for cross-compilation.
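As a rough sketch (simplified; this is not the actual logic in LLVM’s CMake files), the difference looks like this:

```cmake
# Sketch only: LLVM_ENABLE_PROJECTS roughly amounts to pulling sibling
# top-level directories into the same CMake build.
foreach(proj IN LISTS LLVM_ENABLE_PROJECTS)
  add_subdirectory("${CMAKE_SOURCE_DIR}/../${proj}" "${proj}")
endforeach()

# LLVM_ENABLE_RUNTIMES, by contrast, configures a separate sub-build per
# entry in LLVM_RUNTIME_TARGETS (e.g. amdgcn-amd-amdhsa), so each runtime
# is cross-compiled with the just-built compiler.
```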
So far we have a pretty fine-grained separation of runtimes:
- Multiple C runtimes: libc, compiler-rt, llvm-libgcc
- Multiple C++ runtimes: libunwind, libcxxabi, libc++, pstl
- Fortran: flang-rt
- OpenMP (CPU): openmp
- OpenCL: libclc
I don’t see why, for runtimes of GPU-supporting languages (OpenMP, OpenACC, OpenCL, SYCL, CUDA, HIP, …), in contrast to CPU languages, we should need to be economical with top-level directories and collect them under a single uber-project/runtime.
For users, I think the typical use case is to need support for only a subset of those languages (irrespective of whether it is a GPU- or CPU-side language), and to compile only the runtimes they need using the existing LLVM_ENABLE_RUNTIMES system.
Git also allows sparse checkouts that download only the directories that are actually needed; these get more complicated if the relevant code lives in subdirectories. Not everybody needs/wants the entire gigabyte-sized repository.
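For instance, with per-language top-level directories, a sparse checkout could look like the following (the exact set of directories is just an example):

```sh
# Blobless clone plus sparse checkout: only the listed top-level
# directories are materialized in the working tree.
git clone --filter=blob:none --sparse https://github.com/llvm/llvm-project.git
cd llvm-project
git sparse-checkout set cmake third-party runtimes offload openmp
```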
Moreover, @jhuber6 is working on making the host/CPU and offload/GPU sides more similar, so that NVPTX/AMDGPU targets also use the LLVM_ENABLE_RUNTIMES mechanism, as he has already done for libc and libcxx. IIUC, the end goal is that the only difference between host-side and offload-side libraries is that the host side contains main(), and the offload side some code that prepares for kernel execution, such as receiving function arguments from the driver. Making host- and offload-side runtime libraries build differently would conflict with this goal.
In principle, the offload side can also be a CPU target (e.g. host offloading, i.e. execution in the same process, or execution on another MPI rank), or the GPU can execute main() with kernels being offloaded to the CPU (reverse offloading).
Long-term, I see the offload top-level directory as the location for common code shared by the language-specific GPU runtimes, such as warp/wavefront intrinsics, tensor operations, or the aforementioned kernel-launching code (the offload plugins). Kind of like what llvm/ is for the compilers, or libc/shared is for the host runtimes.
Counterpoints:
- Clang does not have options to dis-/enable C, C++, Objective-C, CUDA, HIP, or OpenMP support separately
- LLVM backends live in llvm/lib/Target, not in top-level directories, and use their own mechanism to be dis-/enabled (see the sketch below)
- LLVM_ENABLE_RUNTIMES does not compile any runtime by default, but most LLVM backends are enabled by default
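For comparison, that backend mechanism is a configure-time list rather than a per-backend directory switch (the flag values here are just an example):

```sh
# Backends are selected via LLVM_TARGETS_TO_BUILD at configure time,
# not by enabling separate top-level directories.
cmake -S llvm -B build -G Ninja \
  -DLLVM_TARGETS_TO_BUILD='X86;AMDGPU;NVPTX'
```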