[RFC] MLIR Project Lighthouse
Proponents
@rengolin @rolfmorel @banach-space @javedabsar @matthias-springer @nicolasvasilache @asiemien @dcaballe @ftynse
References
TCDG Meeting about Lighthouse #1
TCDG Meeting about Lighthouse #2
Rationale
LLVM's pipeline stability is driven by having an upstream end-to-end pipeline with Clang and Flang, as well as several downstream implementations. Unlike LLVM, MLIR only has downstream implementations, with many additions to dialects, transforms, and passes that cannot be shared upstream. This fragments the ecosystem: upstream dialects become hollow, with weaker semantics and no canonical form; there is no upstream pipeline, so transforms and passes are merely examples of what to do, with no connection to each other; and conflicting requirements push the design in divergent directions, creating schisms.
With the recently created Tensor Compiler Design Group, we aim to reduce contention and speed up design discussions, but without a common story upstream there is only so much we can achieve. So we propose creating a project that demonstrates how transforms and dialects can fit together to achieve different pipeline goals, serving as a "guiding light" for upstream design as well as downstream integration.
However, MLIR's flexibility to adapt to the various domains would be diminished if we created a single canonical pipeline like LLVM's. Any such pipeline (for example, one modelled on IREE or CIRCT) would force MLIR into a design corner. So the proposal is instead to create a separate project, similar to LLVM's test-suite, with various "bubbles" of smaller pipelines (for example, ingress-to-tensor, tensor-to-vector, or different hardware egress) and provide end-to-end flows that combine those bubbles.
This allows us to avoid carrying ingress/egress dependencies into the LLVM monorepo, to expand into multiple domains, and to create multi-dimensional testing and benchmarking. It also allows CI builders to pick and choose what their platform can and should test, so we get testing coverage by need rather than by combinatorial explosion. More importantly, it would let the needs of the community influence the design of dialects and transforms with clear evidence: concrete tests, well-defined pipelines, and the communities they serve.
In essence, this project should guide you through using MLIR for your own projects, showing the way without forcing you down a particular path. That is the role of a lighthouse.
Scope
The long-term goal of this project is to serve as a guiding light for the whole of MLIR, allowing various user patterns to share testing, recipes, and design decisions, and to have upstream validation for their assumptions. New projects using MLIR could use this as a starting point to create their own pipelines closer to upstream, and in turn foster upstreaming of their own common parts.
However, in the short term, our main goal is to serve as a repository that demonstrates the multiple paths tensor compilers can take through MLIR dialects and transforms. This will allow us to discuss those patterns upstream, even if the code that uses them is mostly downstream (e.g., IREE, TPP).
The initial focus on the Tensor Compiler stack reflects our current priorities and capacity. That said, if this effort would be helpful for your MLIR-based project, we’d love to hear from you - let’s explore how we might collaborate. Our goal is to make this infrastructure broadly useful, but we also recognise that supporting additional use cases will take deliberate effort.
Core Proposal
The proposal is to create an incubator project in LLVM's GitHub organization (github.com/llvm), similar to the existing llvm-test-suite, that concentrates the following components:
- A build infrastructure to collect various dependencies (e.g., PyTorch, XLA, ONNXRT, torch-mlir) at "known good commits" and build them in the "right way" against a particular build of LLVM (with Clang, MLIR, OpenMP, etc.); a minimal sketch follows this list.
- A collection of tools to create and operate pipelines, either via transform schedules or pass managers, so that we can run tests that check IR output (FileCheck) and/or execution output (fpcmp, norms).
- A structured directory of recipes for the pipelines/schedules, so they can be reused to form end-to-end pipelines and be cross-tested over different targets and environments.
- A collection of golden files to compare runs against, which can be shared across all targets or cover specific conditions (e.g., different targets, types, quantization).
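To make the "known good commits" idea concrete, here is a minimal sketch of the first component; the manifest name (deps.json), its schema, and the directory layout are assumptions for illustration, not a proposal:

```python
#!/usr/bin/env python3
"""Sketch: fetch dependencies at pinned "known good" commits. The manifest
name (deps.json) and its schema are assumptions for illustration."""
import json
import subprocess
from pathlib import Path

def fetch_pinned(manifest: Path, work_dir: Path) -> None:
    """Clone each dependency (if needed) and check out its pinned commit."""
    deps = json.loads(manifest.read_text())
    for name, spec in deps.items():
        dest = work_dir / name
        if not dest.exists():
            subprocess.run(["git", "clone", spec["url"], str(dest)], check=True)
        subprocess.run(["git", "-C", str(dest), "checkout", spec["commit"]],
                       check=True)

if __name__ == "__main__":
    # deps.json might look like:
    # {"torch-mlir": {"url": "https://github.com/llvm/torch-mlir",
    #                 "commit": "<known-good-sha>"}}
    fetch_pinned(Path("deps.json"), Path("third_party"))
```

A real implementation would also drive the "right way" builds against a given LLVM checkout, but the pinning mechanism itself can stay this simple.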
What this project is
The key here is to serve as an upstream demonstrator of existing expectations. Downstream projects (such as IREE, TPP, Tile IR, CIRCT) would work together with their communities to add common recipes, as standalone scripts/binaries or -opt pass pipelines, plus tests, to make sure end-to-end integration testing is done in upstream LLVM/MLIR.
These should not replicate their downstream tests or copy existing MLIR integration tests. Rather, such recipes document and validate common patterns for all upstream and downstream MLIR users. With a set of common tests, we can more easily reason about invariants, canonical forms, and design decisions, even if there are multiple (concurrent) paths at play.
Repeated or opposing tests (where fixing one breaks the other) would expose the need for a better design, and lead to RFCs in MLIR, implementation discussions, code/design changes, and back to stable testing. With time, the design will become stable enough that we can collectively decide to bring the project to the core tier of support (though not into the monorepo).
What this project is not
This is not about creating an official "tensor compiler" upstream, or about moving dialects and transforms out of MLIR core. Practically speaking, this project should not define new operations, types, or passes, with the rare exception of trivial ones needed for testing. Neither is it about competing with existing downstream projects, or making it harder for those projects to co-exist.
We want to foster moving common code into MLIR core and blur the separation between what's upstream and what's downstream, not create another layer of segregation between MLIR and its users.
End-to-end stability
One key issue we need to solve is how to obtain the ingress MLIR from the frameworks we plan to build end-to-end pipelines with. We'll need to import "real world" models/code from PyTorch, OpenXLA, ONNX, Triton, and others through some MLIR generator (such as torch-mlir or Inductor), and then pipe those through the various schedules to reach target execution.
We can solve this by having CMake/Python scripts that fetch, build, and run particular versions of those tools and generate MLIR files automatically, plus further scripts that pick the appropriate schedules and know how to execute on the target hardware. However, managing the versions of all those tools, in addition to LLVM's own version, would be hard to sustain.
There are two main ways we can solve this:
- Keep those scripts in the lighthouse project, but make them all optional, and allow a configurable build system to tailor them to particular needs on different hardware.
- Move those to a separate project and keep only the MLIR files in the lighthouse project. We would still need to solve the same problems, but decoupling the complexity may help long term.
The first approach keeps everything in one place (CI YAML files are much simpler that way), while the second separates the complexity. Having a submodule in the lighthouse pointing to this separate project (as well as to torch-mlir) could help with the build-system difficulties.
We cannot, however, have that complexity in the monorepo. Not only would complex CMake hacks make building MLIR much harder for those who don't need it, but we would not be able to use submodules or take hard dependencies on separate projects in the monorepo, especially ones outside the LLVM umbrella.
All in all, these issues are easier to solve through iterative implementation. It will be easier to find problems during the initial implementation and let the solution converge on what works best, rather than committing to a particular approach before we have all the information we need to make that decision.
Incubator project
As per the incubator policy, we need to demonstrate a few requirements to be considered for addition.
These are:
- Mission alignment: Foster MLIR growth, improving upstream design and downstream interoperability, while increasing the quality of testing and expressing usage and semantics.
- License / Patent / CoC / Coding standards: These apply from the project's creation.
- Charter & development plan: Initially designed and developed by the Tensor Compiler Design Group, but we hope it will soon be adopted by the wider community so we can work on the design together. See implementation details below.
- Community: All MLIR users, downstream and upstream, designers and implementers, industry and academia/enthusiasts.
- Path to the monorepo: Not applicable. This is meant to remain a separate test-suite to avoid introducing hard dependencies into the monorepo. We hope "first-class" support will look like the llvm-test-suite's, without needing to move into the monorepo.
- README: Will be added at the root of the project once created.
- RFC: This document.
Implementation Details
Build System
A CMake-based build system, similar to the llvm-test-suite's, that can:
- Fetch and/or build various dependencies (e.g., torch-mlir) at a specific commit/branch/tag without requiring them to be submodules.
- Fetch and build LLVM/MLIR with the appropriate sub-projects and targets, which will be different depending on the end-to-end pipeline being tested.
- Run the appropriate tests via CMake and/or a Python harness that knows which tests to pick, which recipes to combine, and where the golden files are to compare IR/output against; a sketch of such a harness follows this list.
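As a rough illustration of what such a Python harness could look like (the directory layout and recipe format here are assumptions, not a proposal):

```python
"""Sketch: discover recipe directories, run each pipeline through mlir-opt,
and compare against a stored output. The layout is hypothetical."""
import subprocess
from pathlib import Path

def run_recipe(recipe_dir: Path) -> bool:
    """Assume a recipe directory holds input.mlir, pipeline.txt, expected.out."""
    pipeline = (recipe_dir / "pipeline.txt").read_text().strip()
    result = subprocess.run(
        ["mlir-opt", str(recipe_dir / "input.mlir"),
         f"--pass-pipeline={pipeline}"],
        capture_output=True, text=True, check=True)
    # A real harness would delegate to FileCheck for IR and fpcmp for numerics.
    return result.stdout == (recipe_dir / "expected.out").read_text()

if __name__ == "__main__":
    for recipe in sorted(p for p in Path("recipes").iterdir() if p.is_dir()):
        print(("PASS" if run_recipe(recipe) else "FAIL"), recipe.name)
```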
The standalone example in the MLIR repository has enough boilerplate to help us create a reasonable starting point. The existing downstream projects have enough variations to help cover the rest.
Additional build systems (Bazel, etc.) can co-exist, but will be treated, like the rest of the project, as being in the peripheral tier of support. Interoperability between the build systems is encouraged, but the official upstream buildbot should run on CMake, and that must work at all times.
Basic Tools
Python scripts or C++ files that help combine and run the pipelines. We should make use of existing LLVM tools as much as possible (e.g., mlir-opt, mlir-runner).
Depending on complexity, we can leave execution to CMake tricks, combine it with simpler Python tools, reuse existing harnesses (e.g., gtest, benchmarks), or create new harnesses from scratch. It is recommended to start simple and only create new tools when nothing better is available. The systems (Linux/Mac/Windows), frameworks (PyTorch/JAX/ONNX), and targets (CPU/GPU/XPU) supported will depend on individual efforts by the interested parties. In line with the rest of LLVM's support policy, the amount of effort and code quality will determine what gets included and, potentially, removed.
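To show how little new tooling the simple cases might need, here is a hedged sketch chaining the existing tools; the pipeline string, entry point, and flags are illustrative (mlir-runner's flags mirror the historical mlir-cpu-runner ones), not a prescribed interface:

```python
"""Sketch: lower a module with mlir-opt, then JIT-execute it with
mlir-runner. File names, the pipeline, and the entry point are illustrative."""
import subprocess

def lower_and_run(input_mlir: str, pipeline: str, entry: str = "main") -> str:
    # Step 1: run the recipe's pass pipeline through mlir-opt.
    lowered = subprocess.run(
        ["mlir-opt", input_mlir, f"--pass-pipeline={pipeline}"],
        capture_output=True, text=True, check=True).stdout
    # Step 2: execute the lowered module; real uses typically also need
    # -shared-libs pointing at the MLIR runner utility libraries.
    run = subprocess.run(
        ["mlir-runner", "-e", entry, "-entry-point-result=void"],
        input=lowered, capture_output=True, text=True, check=True)
    return run.stdout
```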
We'll also need a set of tools to extract MLIR from the various frameworks. PyTorch, JAX, and ONNX all need to be consumed and converted to MLIR before we can start running pipelines on them, and that will require various scripts to download, install/build, pull models or programs, run them through some pre-compiler (such as torch-mlir), and output a reasonable MLIR file that can be consumed by mlir-opt and mlir-runner. In the future, we could also add C++, Fortran, Verilog, and other MLIR users to the project.
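As a rough sketch of what a PyTorch extraction script could look like, under the assumption that torch-mlir's FX-based importer is available (the exact entry point varies between torch-mlir versions):

```python
"""Sketch: extract ingress MLIR from a tiny PyTorch model via torch-mlir.
The importer API is an assumption based on recent torch-mlir releases."""
import torch
from torch_mlir import fx

class TinyModel(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x @ x)

# Trace the model and import it, lowering to linalg-on-tensors ingress IR.
module = fx.export_and_import(
    TinyModel(), torch.randn(4, 4), output_type="linalg-on-tensors")

# Persist the IR so mlir-opt/mlir-runner recipes can consume it later.
with open("tiny_model.mlir", "w") as f:
    f.write(str(module))
```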
The key here is to not force dependencies. Tensor compilers should not need to download and build EDA tools, while CIRCT-like projects should not need to pull the entirety of PyTorch, XLA, and ONNXRT. We should also be fine-grained with dependencies, to help projects focus on their core business when testing (e.g., only PyTorch) and to promote adoption across all ranges of maturity.
Finally, there will be a discussion on how to compare floating-point output. The llvm-test-suite has a tool called fpcmp, which compares outputs and prints absolute/relative deltas. We should move it to a more common location (perhaps LLVM) to enable reuse, and improve its ability to detect tensor-related issues.
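As a sketch of the kind of check an improved fpcmp-like tool could perform on tensor outputs (the thresholds are placeholder examples, not a recommendation):

```python
"""Sketch of an fpcmp-like check for tensors: an element passes if either
its absolute or its relative error is within tolerance. Values are examples."""
import numpy as np

def fp_close(actual: np.ndarray, golden: np.ndarray,
             abs_tol: float = 1e-6, rel_tol: float = 1e-4) -> bool:
    abs_err = np.abs(actual - golden)
    rel_err = abs_err / np.maximum(np.abs(golden), np.finfo(golden.dtype).tiny)
    # Each element may satisfy either criterion; report the worst offender.
    worst = np.minimum(abs_err / abs_tol, rel_err / rel_tol).max()
    return bool(worst <= 1.0)
```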
Recipes
We’ll need two layers of recipes:
- Focused schedules: small transform schedules or pass bundles that make small steps in the pipeline graph, for example a sequence of canonicalization, two or more transforms, and some cleanup. These should work independently. These recipes document the expectations and requirements of the transforms and passes involved and the dialects they operate on.
- High-level pipelines: combinations of the focused schedules above, possibly with additional passes and transforms in between, that achieve an end-to-end pipeline and either output hardware-specific object code or execute on hardware and emit a reasonable output. These recipes document the expectations of MLIR users and how the transforms and dialects combine.
These recipes will combine to form test units (as in the llvm-test-suite). A test can be a focused-recipe (local) test, a longer pipeline test, or an end-to-end test that goes from an imported framework to execution on a particular target.
A PyTorch to GPU pipeline would start after the Torch MLIR is created, go through ingress lowering to TOSA and/or Linalg, then execute various transformations and emit GPU code for LLVM or some other toolchain to consume and execute on the GPU.
An ONNX to CPU pipeline would run a similar ingress lowering, but run a different set of bubbles, and could be executed directly via mlir-runner, with potentially more information about the target, so that LLVM can create the right back-end options.
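To illustrate how such bubbles could compose into the two pipelines above, here is a minimal sketch; the bubble names and pass lists are invented for illustration and are not a proposed pipeline:

```python
"""Sketch: focused schedules ("bubbles") as named pass bundles, composed into
end-to-end mlir-opt pipelines. Names and pass lists are illustrative only."""
BUBBLES = {
    "ingress-to-linalg": "func.func(tosa-to-linalg)",
    "tile-and-fuse": "func.func(linalg-fuse-elementwise-ops,canonicalize,cse)",
    # Egress bundles would name real GPU/CPU lowering passes; elided here.
    "egress-gpu": "gpu-lowering-passes-go-here",
    "egress-cpu": "cpu-lowering-passes-go-here",
}

def compose(*names: str) -> str:
    """Join bubbles into one pass-pipeline string for mlir-opt."""
    return "builtin.module(" + ",".join(BUBBLES[n] for n in names) + ")"

# Two pipelines (cf. the examples above) reusing the same middle bubbles:
torch_to_gpu = compose("ingress-to-linalg", "tile-and-fuse", "egress-gpu")
onnx_to_cpu = compose("ingress-to-linalg", "tile-and-fuse", "egress-cpu")
```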
Alternative Paths & Optionality
As described above, different high-level pipelines may make unique choices on how to go from the same dialects to the same targets. For example, a PyTorch model can go through Inductor to low-level IR (vector) and then to target, or through Dynamo to high-level IR (Linalg), tiling/fusing, then to low-level IR, and so on.
These pipelines may be composable (sharing the same low-level path) or not; it will depend on the separate usages and their needs. The figure below tries to illustrate a potential multiplicity of pipelines (different colours) using the same schedules (purple nodes) in different orders or skipping steps.
This optionality is inherent in catering to different users, and it's a key feature of this proposal. We want to support different usages on common infrastructure by allowing sub-schedules to be reused across multiple pipelines.
We'll be able to design these focused schedules to cater for different cases without duplicating implementation details (via parametrization, canonical forms, invariants, etc.), reducing the cost to downstream implementations and increasing the power of MLIR's dialects and transforms so they can serve even more cases than those they were explicitly designed for.
Golden Files
Like regular LLVM lit testing, we can use FileCheck to verify IR outputs, but that's not suitable for test-suite-style checks.
Like the llvm-test-suite, we need to compare the output with a golden file: a known-good output that is either identical or numerically close enough to allow for floating-point differences on different hardware.
However, the differences in precision that are acceptable on one platform are sometimes too large on another. So we need a way to encode the expectations of the target we're working on: not just the hardware support and the toolchain's fast-math aggressiveness, but also quantization issues (like accumulator types) and runtime library issues.
We can encode such expectations in a simple directory hierarchy and automate verification with an fpcmp-like tool.
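A minimal sketch of such a hierarchy lookup, assuming an invented golden/<target>/<test>.out layout and example tolerance values:

```python
"""Sketch: pick the most specific golden file and tolerances for a target,
falling back to a common default. Layout and values are illustrative."""
from pathlib import Path

# Looser tolerances for targets with, e.g., low-precision accumulators.
TOLERANCES = {"common": (1e-6, 1e-4), "gpu-bf16": (1e-2, 1e-2)}

def resolve_golden(test: str, target: str, root: Path = Path("golden")) -> Path:
    """Prefer golden/<target>/<test>.out, else golden/common/<test>.out."""
    for candidate in (root / target, root / "common"):
        path = candidate / f"{test}.out"
        if path.exists():
            return path
    raise FileNotFoundError(f"no golden file for {test!r}")

def resolve_tolerance(target: str) -> tuple[float, float]:
    return TOLERANCES.get(target, TOLERANCES["common"])
```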
Next Steps
If the idea is accepted by the community and the incubator project is created, we shall start working on it to define ingress, recipes, build systems, and expectations. This will work the same way we work on LLVM today, via pull requests, sharing the same permissions we have on the monorepo.
The main interested parties shall provide infrastructure for CI, benchmarking and releases (when appropriate). We’ll begin with Intel and AMD CPUs and GPUs, and can extend to others once more hardware is available.
To be clear, this will not be a code dump or a massive initial effort. It will be an incremental deployment of work that is currently spread across multiple projects, until we have something we're happy with. It will also be a volunteer effort, so expect slow progress and some odd intermediate states; another reason not to be in the monorepo.