MLIR Tensor Compiler Design Group

Proposal

Following the initial proposal, the survey and its results, and the final proposal, this is the first step towards implementing design groups in MLIR, which will produce the technical charter that guides roadmaps and implementation details.

This particular proposal is for the Tensor Compiler design group, as referenced by the final proposal above.

Role

The role of this group is to consolidate a technical charter for the tensor area dialects, interfaces, transforms and general surrounding infrastructure, compatible with the rest of MLIR.

First, it needs to define the scope by agreeing on a short list of major directions we’re heading in, for example:

Upstream shared values (“canonical” pipelines and forms, dialect semantics)
Downstream usage + dialect extension (MLIR based tensor compilers)
CPU/GPU/device code-gen vs. micro-kernels, etc. (tensor/memref/vector transforms and semantics, type system)
Common infrastructure (interfaces, matchers, rewrites, attributes)
Research directions (dynamic schedules, composable transforms, stable public APIs)

Agree on and document a representative roadmap for each direction above, to understand how they overlap with, build upon or get in the way of each other’s contributions. This is not the charter. This is to build a common understanding of what people use MLIR for and how to build a common upstream infrastructure to support them.

Second, we take stock of what we have and reevaluate the charter documents to make sure we’re still in line with the roadmaps above. We then work out how to resolve the technical disagreements between them and reach a unified direction, with a strong upstream model that makes clear what the downstream cost is for each party.

This is the “where we have been” part of the charter. It consolidates state but also clarifies the rationale behind the dissonant arguments that we’ve been having recently. Hopefully by then we’ll start having much more fruitful discussions and effective changes.

Here we work upstream and downstream to implement the vision and continue writing the charter. In time, we should have enough direction to write the “where are we going” part. I don’t think we should do that before we have agreed on where we are.

Third, we identify the critical pieces of infrastructure missing to make MLIR more malleable to distributed usage (not just downstream, but also other upstream projects). For example:

A way to compose and extend off-tree dialects without requiring a particular hash of LLVM. This creates an ecosystem outside of the monorepo, helps build momentum before going upstream and reduces the need to go upstream at all for most dialects.
Missing coverage in tests, documentation, semantics definitions, type system requirements, etc. Filling these gaps would make it easier for dialect designers to know the bounds that they need to adhere to for minimum functionality expectations.
More rigorous definition of canonicalization and transformation requirements, in view of the expected shapes and transformations, in a way that does not force everyone to use a particular form, or at least makes that form generic and powerful enough to be widely used.

That’s to help reduce the cost of needing a charter, which I’m expecting to be large, complex and still not completely unified. It’s a meta discussion to refine the charter, but one that can only happen after we know where we are, where we’re going and we all generally agree on the tasks needed to get there, upstream and downstream.

That’s when we start converging on the actual technical charter that we can use for making principled choices.

People

Looking at the recent merges into linalg, tensor and vector that were not NFC, revert, typo fix or “one off”, here are the recurring contributors in the “Tensor Compiler” side of the equation:

@banach-space @MaheshRavishankar @javedabsar @rolfmorel @mshahid @Groverkss @matthias-springer @jpienaar @ftynse @kuhar @asiemien @hanchung @qed @krzysz00 @dcaballe @kurapov-peter @Hardcode84

(Note: even though the vector dialect sits somewhere between the tensor and low-level groups, most of the contributions to it come from the tensor side, so I’m treating it as at least an indicator of tensor compiler contributions.)

If my count is correct, we have 1 Arm, 7 AMD, 1 Qualcomm, 4 Intel, 2 Nvidia, 1 Google, 1 Independent. Not a bad distribution.

Also, we want people that have been involved in design, not just implementation. Looking at the forum posts, @banach-space @MaheshRavishankar @javedabsar @rolfmorel @matthias-springer @jpienaar @Groverkss @kuhar @dcaballe @ftynse @qed and myself are recurring users.

I don’t want to limit or volunteer people, I’m just listing based on upstream involvement that I see (which is biased). Some folks above may not want (or be able) to participate, others may be more suitable for this role.

Somehow, we need to find a good initial balance and start the process. Doesn’t have to be perfect or static, people can come and go, but we need critical mass, or this won’t work. I would try to keep at least 5 people with the intention to get through the year and consolidate a reasonable draft of the charter.

Happy to take proposals on how to select the team.

Next Steps

Step 1 is selecting how many people and who will be part of the design group. I don’t want to set limits here, and I think we should all agree on something and move on. The only constraint I’d put is to try to balance as much as possible on company / group representation.

Step 2 is the creation of sub-channels in Discourse and Discord, to minimize disruption to the rest of MLIR, and agreement on a recurring design meeting.

Step 3 is to discuss needs and tasks and collect volunteers for those. The output should be RFCs on the forum, and PRs for documents and code that will make our lives easier when working towards the roadmaps and charters. These should be documented in a new section of the MLIR docs, with old documentation potentially moved or deprecated and pointing to the new pages.

Step 4 is to perform the roles listed above and start working on the common infrastructure upstream.

This is 100% public work, and the main difference in selecting a few people is that they’ll be responsible for making it happen. Once agreed, the charter becomes the driving force behind the changes, not the people that are driving it.

Thank You!

Finally, thank you everyone who participated. This was not easy but it was necessary. More importantly, thank you in advance, because the work has just begun.