MLIR Organization & Charter

Proponents

@rengolin @stellaraccident @ftynse @jpienaar @banach-space @nicolasvasilache @clattner @MaheshRavishankar

Introduction

In the past few years, MLIR has grown immensely and become widely adopted across the industry. All major hardware vendors use it in production and it’s a popular choice amongst AI accelerators and software start-ups. It has a lot of tools with which to build tensor compilers for ML and HPC (like XLA and IREE), hardware design (like CIRCT), language front-ends (like Flang, Clang), and much more. There are also many downstream uses in various organizations in private or less advertised settings.

However, MLIR has become a victim of its own success. Its flexibility in creating new dialects, together with the speed and the directions in which multiple projects needed to move, made it practically impossible to focus on a core pipeline and upstream compiler, the kind of focus that has been the main reason for LLVM’s success. As a result, for the last few years, the MLIR ecosystem has suffered from a lack of clarity and direction.

In this document, and its associated RFC and Survey, we propose a new governance model for MLIR that focuses on the direction of a few core parts of MLIR, while remaining flexible to unrelated projects and experimentations. The core idea here is not to stifle innovation, but to bring the industry (corporate and academia) behind the core parts together under the same charter, to encourage better collaboration and co-evolution.

LLVM found solutions to many of these problems that ended up working well for the type of community and manner of evolution that it sought out, and that has resulted in most of the industry relying on it for compiler infrastructure. We are working to put MLIR into a similar evolutionary groove.

Background

Before we get into details, we want to state a few assumptions that are driving this proposal. These are the guiding principles we use to design the governance model, code ownership, charter definition and infrastructure decisions. These assumptions were reinforced by the results of the survey and recent upstream discussions. They build upon the existing governance model, creating a structure that should complement it rather than replace it, similar to the clang ownership proposal.

The Need for a Technical Charter

Currently, too many discussions in the forum rely on personal views and interpretations of compiler design. This not only fails to help reach consensus, but often becomes borderline ad hominem. If we focus the design consensus into a technical charter, designed and decided by community maintainers and driven by actual implementations and real world usage, we can discuss each point against the charter and not people’s interpretations. By encouraging charters that provide a technical guiding light and also set out the expectations for how to achieve it, we believe that the technical result will be better than it is now, and we will address some of the recurring feedback asking for more visibility into the current state and next steps for key parts of the infrastructure.

Multi Governance

Open source projects and people with strong opinions always find each other. This is a good thing when there’s strong direction and progress, but it’s a really bad thing when there’s a clash of equally valid technical arguments. While the technical charter solves most of the personal view problems, if we end up with a single maintainer, it would not be easy to make sure the charter is a reflection of the main contributors and stakeholders.

MLIR Core and Areas

MLIR has a core builtin dialect and infrastructure, which is used by all dialects, transforms, passes, etc. This core infrastructure needs to cater to the rest of the code, including downstream users, and it’s very important for it to be stable and reliable. We should not have some corner of the design space changing core infrastructure without the knowledge and agreement of the other areas that rely on the same code. The core is in service of and needs to support multiple usage and user journeys. It is our belief that this part of the project is mature and it would benefit from explicitly being treated as such when considering its evolution.

The other areas identified are users of MLIR core, helping raise needs to core, as well as being in service of downstream users. Cross-talk can create an unproductive coupling by forcing agreement between non-overlapping parts of the code. This slows overall progress by preventing faster advancement in areas that are interdependent. The goal is to identify areas of collaboration with concerted groups of interested folks willing to drive and maintain the work in the service of the area’s users. It is our observation, both direct and informed by feedback, that by and large, these other areas are much less mature than the core, and that they need an evolution and project management approach that sets them up for reaching a more mature state.

Main Goals

Starting from increased understanding as part of the Survey, we wish to separate the ownership model into larger areas, in addition to existing code/dialect ownership, in order to design and drive the technical charter that the overall code will follow for each individual area and MLIR as a whole.

Our main goals are:

Establish a technical governance model that represents the actual stakeholders of each part of the code, avoiding cross-talk (see above). Like the current trend in the rest of LLVM, this must be a multi-area (overlapping) multi-governance model, with a clear leadership model geared towards impact (those who work on and are affected the most) in each area. This RFC focuses on that goal.
Reassess, encode and possibly redesign the technical charter of MLIR. After the governance model is in place, we can start updating the design/rationale documents to make sure we cover all the most used dialects in the various areas and make sure the objectives are the same. This will happen after this RFC is agreed upon and acted on.
Set MLIR up for its next five years of growth. Most projects are not successful, and as a community, we are lucky to have had the opportunity to grow MLIR from a few “wouldn’t it be nice” statements to a toolset that has become synonymous with general purpose compiler infrastructure. As with all human endeavors, though, what worked for a small set of aligned parties sitting next to each other rarely scales on its own to decade-level longevity with a much larger and more fragmented set of stakeholders. We believe it is time to plan for broader stewardship of the success we have been granted.

Definitions and Governance Proposal

Defining “Usage Areas”

As exposed by the recent survey, the key areas in MLIR today are:

Core: Builtin dialect and core APIs.
Tensor Compiler and Kernel generation: with paths from ML and HPC frameworks, through tensor level and below (e.g., Triton reaches further down), exiting as LLVM dialect + intrinsics, SPIR-V or EmitC.
Language Front-end and Design: Clang, Flang, Julia, and other front-ends lower to their own dialects (outside of the MLIR tree) and then use the low-level dialects in upstream MLIR to lower to LLVM. DSLs can lower straight to MLIR and then to LLVM.
Hardware Design: CIRCT and similar projects, using MLIR for hardware design and simulation, with most of the work done outside of the monorepo.

MLIR hosts many niche components, and we propose bundling those with core governance and design decisions until they prove large enough to have their own area upstream (or are re-organized into an existing one). Work in progress dialects and code that live upstream may have a lower cost barrier for change (in-tree evolution), but still need to make sure they do not inadvertently change core code that affects other areas.

Note this is not a complete list of all MLIR usage domains. But in the survey, other areas were all selected together with those above. Of the three answers (out of 88) that were not, two named areas in the HPC/tensor domain, and one did not involve anything close.

Defining “Dialect Groups”

Despite being very different topics, the areas above reuse most of the dialects upstream. As exposed by the survey, the most common dialects across the three areas above are: cf, scf, func, llvm, arith, math, memref, affine, index.

In addition to those, the most common dialects reused in tensor compilers and kernels are: linalg, tensor, vector. Hardware design, front-ends and language design do not have any substantial usage beyond the core ones above.

Note that tensor/memref/vector here define operations, not the types, which continue to be in the built-in dialect. These very old dialects predated the ability to define types and attributes that were not built-in. Different design decisions likely would have been made if the core infra on which they ride had been more mature at the point of inception.

Some of those operations are low-level (and used by all) while others are specific to linear algebra workloads, and used mainly by tensor compiler / kernel generator projects. A near future key goal of that sub-group will be to separate the concerns and clean up the dependencies.

A reasonable separation of dialect groups that would allow the areas to work together would be:

Low-level: the common dialects above, with maintainers from the three areas.
Tensor: linalg, tensor, TOSA, bufferization, which are directly related to such workloads.
Core: Builtin dialect, shared infrastructure, and all other dialects that do not belong in the areas above.

Language and hardware design groups already have their own groups outside of MLIR and do not need to have their groups here for now. A discussion around bringing them in the monorepo or extracting tensor into a separate repository is not part of this proposal but it is a natural follow up step.

Dialects should be allowed to move areas, be split or joined, brought in or extracted out, as long as the technical charter allows (i.e., not a fundamental piece of the MLIR ecosystem), the proponents have all alternatives covered, and the maintainers of the affected areas agree.

Note that most dialects implement various interfaces and many transforms operate on those interfaces. While they’re currently bundled together, their design should be guided by the dialects that use them and their group maintainers, which may be different, even in the same header. Future NFC code movement can be performed to make those distinctions clearer.

Changing Technical Governance

Having defined the project areas and dialects grouping above, we need to agree on a technical governance model and put in place the necessary tools to be able to create a technical charter for those groups. The technical charters should be written by dialect grouping in view of the usage areas. Therefore, we need multiple maintainers for each dialect group that belong to the different areas that use them.

This governance is for the technical charter of each group, which will guide the development of individual dialects. Specific dialect ownership isn’t part of this proposal and will continue as is. Once we agree on governance, it’ll be up to each group’s maintainers to decide on dialect ownership, direction and future.

The governance model does not need to be created from scratch, and should follow the clang model proposed and accepted last year.

The key points that we propose for MLIR are the following:

Multiple maintainers: We do not want to have a single maintainer for any high-level part of the compiler, for reasons of availability, reduction of bias and inclusion, and to make sure we actually design for all. We also don’t want a lot of maintainers, or it would not be different than what it is today.
Technical charter: High-level (technical charter) maintainers and dialect maintainers need not overlap. The former should focus on the key drivers behind the largest impact, while the latter focus on day-to-day implementation details, following the technical charter.
Active maintainers: We want these maintainers to rotate with involvement. We should not have inactive maintainers and we should include those who participate actively and have a vested interest to become maintainers themselves.
Overlapping ownership: We want to cross the boundaries of ownership, especially between the core and non-core groups.
No veto power: We want maintainers to be responsible for discussing, agreeing and enforcing the technical charter according to the community’s use of their area, not their personal views. We want the community to challenge such a charter and have the chance of changing it, when demonstrating strong enough arguments to the pool of maintainers, not individual ones. Maintainers should exercise humility, especially when it comes to leveraging the wisdom and perspectives of those who came before or who have superior knowledge and insights about a topic.

We don’t believe there’s a lot of contention on the points above, at least not on their core principles. Implementation may vary, but the idea is to evolve faster than what we have been doing lately by avoiding battles of personal opinions and moving to technical discussions against an agreed charter.

It is an art to define how many people we have and how we rotate ownership. A reasonable initial proposal is to have 3-5 owners per group and to have some overlap between the shared groups (core, low-level). We would eventually integrate this with the overall LLVM ownership model globally.

The key takeaway here is that having active maintainers driving scope-limited areas to a charter will address several points of recurring feedback:

Inability for production users to determine whether/when to invest: Maintainers provide proactive visibility into where an area can be expected to be over some period of time, and they are expected to be honest about the level of maturity and churn to expect over time.
Spending too much time debating simple problems: Maintainers are go-to people for determining solutions to questions of execution and sequencing, taking pressure off of the RFC process and open ended discussions for operational matters that are expected to resolve without debate.
Difficulty for newcomers to navigate how to make contributions to an area: Make it clear who the go-to people are to ask for advice and feedback about how to scope and make contributions.

Follow Up

Consolidating a Technical Charter

Once we agree on the governance model and select the initial maintainers, these groups can start writing the charter of their areas. The proposal is to reuse most of MLIR’s existing charter and evolve from there, with the key difference that the groups will be able to be more specific in their areas, and perhaps even take different design decisions on the non-overlapping areas of the code, as long as that does not require incompatible changes to core or other areas. For areas which need significant investment to achieve maturity, we expect that the charter must include a roadmap component describing how evolution is expected to proceed over some achievable timeframe.

Updating Infrastructure to Match

There was enough contention on the RFC thread on actually splitting the code that we will not propose this as a solution in this first iteration. But there was also enough discussion on how dependent the parts of MLIR are when building, that we still need to make sure the code is independent and areas can be built without each other.

This would mean we need to create separate libraries for each group and make sure they can be built as a bundle and linked independently, but also together as a big library, without symbol clashes. This is mostly build system maintenance, but it may require some header movement and will need new integration tests to check on every build.
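The kind of integration check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the library names, the `LIB_GROUP` and `DEPS` tables and the `groups_required` helper are all hypothetical and do not correspond to real MLIR build targets. The sketch reports, for each group, which other groups its libraries transitively link against, without imposing any policy on which dependencies are acceptable.

```python
def groups_required(lib_group, deps):
    """Map each group to the other groups its libraries transitively link against."""
    required = {group: set() for group in lib_group.values()}
    for lib, group in lib_group.items():
        seen = set()
        stack = list(deps.get(lib, ()))
        while stack:
            dep = stack.pop()
            if dep in seen:
                continue
            seen.add(dep)
            stack.extend(deps.get(dep, ()))
            dep_group = lib_group.get(dep)
            # Libraries with no known group (external ones) are ignored.
            if dep_group is not None and dep_group != group:
                required[group].add(dep_group)
    return required


# Hypothetical libraries and link edges, for illustration only.
LIB_GROUP = {
    "ExampleCoreIR": "core",
    "ExampleLowLevelOps": "low-level",
    "ExampleTensorOps": "tensor",
}
DEPS = {
    "ExampleLowLevelOps": ["ExampleCoreIR"],
    "ExampleTensorOps": ["ExampleLowLevelOps"],
}

if __name__ == "__main__":
    for group, others in sorted(groups_required(LIB_GROUP, DEPS).items()):
        print(group, "requires", sorted(others))
```

A check like this could run on every build and flag any newly introduced cross-group dependency for the affected maintainers to review.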

Next Steps

Technical Governance

Dialect Groups

Action: Define which dialects will be part of the group’s charter.

Ultimately, these groups were defined by the breakdown of dialects exposed in the survey with regard to their usage in related projects (tensor, languages, hardware) but with the constraints of how they’re used today.

A draft proposal:

Core
- llvm, complex, dlti, ub, acc, emitc
- transform, pdl, shape
- polynomial, async, mesh, mpi
- sparse_tensor, quant, vcix
- gpu, nv, rocdl, spirv
- arm_, x86, amx
Tensor
- linalg, tensor, TOSA
- bufferization, ml_program
Low-Level
- arith, math, index, ptr
- cf, scf, func, affine, omp
- memref, vector
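One way to make the draft grouping above easier to check is to write it down as data. The following is a minimal Python sketch, not part of the proposal: the names `DRAFT_GROUPS`, `group_of` and `dialects_in_multiple_groups` are hypothetical, and the `nv` and `arm_` entries are copied as written in the draft rather than expanded into concrete dialect names.

```python
# Draft grouping copied from the list above. "nv" and "arm_" are kept exactly
# as written in the draft; they are not expanded into concrete dialect names.
DRAFT_GROUPS = {
    "core": {
        "llvm", "complex", "dlti", "ub", "acc", "emitc",
        "transform", "pdl", "shape",
        "polynomial", "async", "mesh", "mpi",
        "sparse_tensor", "quant", "vcix",
        "gpu", "nv", "rocdl", "spirv",
        "arm_", "x86", "amx",
    },
    "tensor": {"linalg", "tensor", "TOSA", "bufferization", "ml_program"},
    "low-level": {
        "arith", "math", "index", "ptr",
        "cf", "scf", "func", "affine", "omp",
        "memref", "vector",
    },
}


def group_of(dialect, groups=DRAFT_GROUPS):
    """Return the name of the group that lists the dialect, or None."""
    for group, dialects in groups.items():
        if dialect in dialects:
            return group
    return None


def dialects_in_multiple_groups(groups=DRAFT_GROUPS):
    """Return {dialect: [groups]} for every dialect listed in more than one group."""
    owners = {}
    for group, dialects in groups.items():
        for dialect in dialects:
            owners.setdefault(dialect, []).append(group)
    return {d: g for d, g in owners.items() if len(g) > 1}
```

For example, `group_of("linalg")` returns `"tensor"`, and an empty result from `dialects_in_multiple_groups()` means every dialect in the draft belongs to exactly one group.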

Note that this isn’t necessarily the best grouping for the dialects, but it’s a start. Refining it is a discussion beyond the scope of this proposal, which is to set the starting point, not a final goal.

Near future changes will involve handling memref/vector linear algebra portions, creating a sub-charter for the target dialects (CPU, GPU, C, SPIRV), and handling unused dialects.

High-Level Maintainers

Action: Gather stakeholders with a long history and commitment to the MLIR project who have a vested interest in MLIR being successful beyond prototypes and private projects. Select maintainers for the three areas (core, tensor, low-level).

The main criterion here is to represent a group that has concrete roadmaps for implementing upstream technologies and can define, articulate and defend MLIR’s core principles on design decisions and when resolving contentious issues in a way that is acceptable to the community and its values.

The main responsibilities of the high-level maintainers are:

Set the direction for their groups and agree on a high-level roadmap to follow that direction.
Discuss, form consensus and (re)write the technical charter in line with that direction, outlining the technical challenges to overcome from the current state and previous direction.
In technical discussions, defend the charter, not their personal opinions.
Challenge and expect to be challenged on changing the charter, but accept when maintainer consensus is against them.

These people will be responsible for guiding the rewriting of the technical charter for their groups, designing interfaces with other groups and deciding on the future of the project around their areas. This is not about code style or which attributes to add, but about how dialects fit together, what common infrastructure is necessary and how other projects (especially LLVM hosted ones) tie into the MLIR story.

They will also not be writing it alone, but guiding the discussions and reviewing the PRs that will change the documents, submitted by the whole community. They will set the vision and charter of the whole project (and its parts), in unison with the dialect directions and the projects that use them.

Dialect/Code Maintainers

Action: Validate and persist the existing dialect and code maintainers into the new ownership model.

These are the people currently working on the dialects and parts of the core code, and should be making decisions based on the general charter. If a dialect cannot work with a high-level charter defined above, then changing the dialect or the high-level charter are equally possible outcomes.

After we agree on the governance model, we need to go through the list of current dialect owners and make sure they’re still active and each dialect is being used by a sizable portion of the community, and avoid incomplete dialects upstream without a clear roadmap.

Escalation Procedure

In other areas of LLVM, the escalation procedure is to involve top-level maintainers. In the same way, dialect / code maintainers can escalate concerns that did not reach local consensus to the high-level group maintainers where their dialects reside.

However, due to the non-hierarchical nature of the MLIR groups defined above, a lack of consensus in one group should not require an appeal to a single top-level maintainer for the whole project. This would violate the basic principle of multi-governance stated above.

We propose to involve all other high-level maintainers from the other groups, who can choose to participate or not. This still limits the number of people that need to be involved to just those who have already committed to maintainership, while allowing any group (including core) to ask for help beyond their own peers.

As a last resort, when we still can’t make decisions after involving all MLIR maintainers, we can rely on the area teams and the governance model for conflict resolution.
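The escalation order described in this section can be summarized as a small Python sketch. It is only an illustration of the sequence, assuming the three groups named in this proposal; the `GROUPS` tuple and the `escalation_path` function are hypothetical names and not part of any existing tooling.

```python
# The three dialect groups named in this proposal.
GROUPS = ("core", "tensor", "low-level")


def escalation_path(group, groups=GROUPS):
    """Return the ordered list of parties to involve when consensus is not reached."""
    if group not in groups:
        raise ValueError(f"unknown group: {group!r}")
    others = [g for g in groups if g != group]
    return [
        # 1. Local discussion among the people working on the dialect or code.
        f"dialect/code maintainers ({group})",
        # 2. High-level maintainers of the group where the dialect resides.
        f"high-level maintainers ({group})",
        # 3. High-level maintainers of the other groups, who may opt in or out.
        "high-level maintainers (" + ", ".join(others) + "), optional participation",
        # 4. Last resort, as stated in the text above.
        "area teams and the governance model",
    ]


if __name__ == "__main__":
    for step, party in enumerate(escalation_path("tensor"), start=1):
        print(step, party)
```

Note that no step names a single top-level maintainer, which matches the multi-governance principle stated earlier.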

@mehdi_amini @River707 @Mogball @Groverkss @matthias-springer @qed @dcaballe @kuhar @bcardosolopes @jeanPerier @javedabsar