[RFC] A reference pass-plugin in LLVM

Pass-plugins like Polly, Enzyme, Clad and omvll have a long history in LLVM. They allow third parties to develop domain- and platform-specific extensions for their toolchains, and they mitigate the trend to upstream everything into the monorepo. The New Pass Manager has a well-defined plugin interface and supports IR-level plugins in the optimization pipeline today. LLVM’s codegen pipeline doesn’t use the New Pass Manager (yet), so it’s out of scope for now.

Over the last few years I have worked with various companies that develop pass-plugins for both products and internal tools. We have been facing a number of challenges that ultimately indicate a lack of infrastructure for pass-plugins in upstream LLVM.

This RFC aims to improve our collective experience when building and distributing pass-plugins for LLVM-based tools. The idea is to introduce a reference pass-plugin that gets built and tested on a broad range of platforms and configurations. Ideally, it encourages toolchain vendors to include it in their distributions. I prepared the following patch as a baseline for a conversation:

Assuming we find agreement on something along these lines, we could proceed as follows:

- Land the reference plugin and cross-project tests that reflect the status quo
- Roll out tests to cover a broad range of platforms and configurations
- Fix bugs and track new ones that we uncover on the way
- Extend the plugin interface for commonly requested features

Naturally, the earlier steps are already quite well defined and the later ones are still blurry. Input from the community is very welcome and will hopefully guide us through the process. Earlier this year, I gave a presentation and ran two round-tables at EuroLLVM. If you are looking for some more context, please find the recording here: https://www.youtube.com/watch?v=pHfYFGVFczs

Why a reference pass-plugin?

The infrastructure for pass-plugins has issues that have no visibility right now. This is due to missing test coverage and missing test execution:

- We do have Bye and ExampleIRTransforms, but these are examples and intentionally use a small subset of functionality. E.g. LLD has a regression test that uses Bye, but it doesn’t cover thread-safety, because the plugin is just too simple.
- In general, regression tests for examples run rarely, because most bots don’t build examples. For pass-plugin examples, this shows in missing maintenance and clumsy integration in the test suite.
- We do have unit tests PluginsTest.cpp and DoublerPlugin.cpp that run regularly, but they check specific details in the C++ API.

What we are missing are end-to-end tests in LLVM, Clang and LLD that run regularly on many bots. Adding a reference plugin under tools helps here, because it would be available by default and visible to subprojects. The tests in the initial patch cover a wide range of features across subprojects and they are easy to extend.

Another aspect affects distributions: Pass-plugins use the LLVM C++ ABI, which comes with a lot of pitfalls that are difficult to debug. If toolchain distributions shipped a working reference plugin, then plugin vendors could compare their own implementations against it. Having an interesting reference plugin in LLVM might incentivize toolchain vendors to ship it. This would help plugin vendors to build and maintain their products.

Why Python?

The initial patch proposes a generic pass-plugin that runs a Python script for the actual IR transformations, because:

- Python is popular and there are various bindings to LLVM
- Python is required to build and test LLVM, so it adds no additional dependencies
- Python scripting helps us test a wide range of plugin features
- Python’s Limited C-API is pretty stable, so we expect low maintenance and no additional compatibility requirements
- Users can experiment with IR transformations without building LLVM, which may guide them on the plugin path rather than the upstreaming path (see the Realtime Sanitizer example in my EuroLLVM presentation)
- A plugin that is useful might actually get shipped by toolchain vendors
- There are (at least) two independent experimental implementations in the wild:
  - github.com/aneeshdurg/pyllvmpass which uses llvmcpy bindings
  - github.com/weliveindetail/llvm-py-pass which uses llvmlite bindings (my preliminary PoC)
- MLIR recently introduced a way to write passes in Python (when driving MLIR from Python): https://github.com/llvm/llvm-project/pull/156000

Why not roll out tests immediately?

Adding coverage for existing untested functionality isn’t trivial. It often uncovers platform-specific issues that need time to investigate. In order to keep the revert/reland churn low, it seems reasonable to build baseline tests for a restricted set of platforms. Once they are stable, we can lift restrictions step by step to extend coverage. This keeps diffs in reverts small and the number of affected bots and developers low.

The initial patch proposes the requires clause: native, system-linux, llvm-dylib, plugins, pypass-plugin. Tests should run on at least two build bots: llvm-nvptx-nvidia-ubuntu, sanitizer-x86_64-linux-android
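To illustrate how such a requires clause restricts where tests run, here is a minimal Python sketch. It is a simplification of lit's behavior (real lit also accepts boolean expressions in REQUIRES lines), and the feature sets of the two bots are made-up examples, not actual bot configurations.

```python
def parse_requires(line):
    """Extract the feature names from a 'REQUIRES: a, b, c' test line."""
    prefix = "REQUIRES:"
    idx = line.find(prefix)
    if idx < 0:
        return set()
    features = line[idx + len(prefix):].split(",")
    return {f.strip() for f in features if f.strip()}


def is_supported(required, available):
    """A test runs only if every required feature is available."""
    return required <= available


# The clause proposed in the initial patch, written as a lit-style comment.
REQUIRES_LINE = "; REQUIRES: native, system-linux, llvm-dylib, plugins, pypass-plugin"
required = parse_requires(REQUIRES_LINE)

# Hypothetical feature sets, for illustration only.
linux_bot = {"native", "system-linux", "llvm-dylib", "plugins", "pypass-plugin"}
windows_bot = {"native", "system-windows", "plugins"}

for name, features in [("linux_bot", linux_bot), ("windows_bot", windows_bot)]:
    missing = sorted(required - features)
    status = "runs" if is_supported(required, features) else "unsupported"
    print(name, status, "missing:", missing)
```

Lifting a restriction later means dropping one feature from the clause, which only adds bots whose feature set was missing exactly that feature.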

What bugfixing is expected?

The initial patch outlines a few known bugs that should be fixed. The most famous one might be plugin-parameter parsing in Clang, which still requires an extra -Xclang -load -Xclang <path> today. This won’t be easy, so it’s best to nail down the status quo and get the big picture first.
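For readers who haven't hit this yet, the workaround looks roughly like the following when a build script assembles the Clang command line. This is a sketch: the helper name and the plugin path are placeholders, and it assumes the extra -Xclang -load -Xclang <path> is passed alongside -fpass-plugin=<path>.

```python
import shlex


def clang_plugin_args(plugin_path):
    """Return the Clang arguments for a pass-plugin (hypothetical helper).

    -fpass-plugin= registers the plugin's passes with the optimization
    pipeline. The -Xclang -load -Xclang <path> triple is the extra step
    that plugin-parameter parsing still requires today.
    """
    return [
        f"-fpass-plugin={plugin_path}",
        "-Xclang", "-load",
        "-Xclang", plugin_path,
    ]


# Placeholder path and input file, for illustration only.
cmd = ["clang", "-O2", "-c", "input.c"] + clang_plugin_args("/path/to/plugin.so")
print(shlex.join(cmd))
```

The plugin path appears twice on the command line, which is exactly the kind of redundancy a fix would remove.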

We will likely find more issues on the way. We can track them with XFAILing tests and fix them where stakeholders see priority. Anything is better than the current situation, where bugs stay in the dark.

What are commonly requested features?

Real-world pass-plugins rarely run context-free static transformations. On the contrary, they often need additional context at build-time and support libraries at runtime. They also have their own release cycles and need to match customer toolchain versions. This entails requirements that we don’t currently meet, e.g.:

- Can we enable pass-plugins to inject extra cc1-options like -record-command-line? Command-line options provide insights like opt-level, hardening-mode or exception-model. These are essential for domain-specific transformations and caching of intermediate results (just like ccache for objects).
- Can we establish a feature for pass-plugins to add options/inputs on the link-line? This is currently possible with platform-specific workarounds. A unified solution with tests would be very much appreciated.
- Can we make it easier to locate pass-plugins, so that -fpass-plugin=omvll loads /usr/lib/llvm-22/plugins/omvll-plugin.so? Tools with plugin support typically offer third parties a location to install their plugins. It makes loading easier and allows for an implicit version match. It should be easy to come up with a sound traversal strategy.
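As a starting point for discussing the traversal strategy, here is one possible shape in Python. Everything in it is an assumption for illustration: the function names, the order of search directories and the `<name>-plugin.so` naming convention (taken from the omvll example above) are not part of any existing LLVM interface.

```python
import os
import tempfile


def candidate_paths(name, search_dirs, suffixes=("-plugin.so", ".so")):
    """Yield candidate plugin files in priority order (hypothetical scheme)."""
    for directory in search_dirs:
        for suffix in suffixes:
            yield os.path.join(directory, name + suffix)


def resolve_plugin(arg, search_dirs):
    """Map a -fpass-plugin= argument to a file path, or None if not found."""
    # Anything that looks like a path keeps today's behavior: use it as-is.
    if os.sep in arg:
        return arg
    # A bare name is looked up in the search directories, first match wins.
    for path in candidate_paths(arg, search_dirs):
        if os.path.isfile(path):
            return path
    return None


# Demo with temporary directories standing in for a user directory and a
# versioned toolchain directory such as /usr/lib/llvm-22/plugins.
with tempfile.TemporaryDirectory() as root:
    user_dir = os.path.join(root, "user-plugins")
    toolchain_dir = os.path.join(root, "llvm-22", "plugins")
    os.makedirs(user_dir)
    os.makedirs(toolchain_dir)
    open(os.path.join(toolchain_dir, "omvll-plugin.so"), "w").close()

    found = resolve_plugin("omvll", [user_dir, toolchain_dir])
    print(os.path.relpath(found, root))
```

Putting the versioned toolchain directory on the search path is what gives the implicit version match: a plugin installed for one LLVM version is never picked up by another.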

I am sure this isn’t the end of the line, but for now, I think it’s enough planning to decide on the general direction. Looking forward to your feedback!