[ClangIR][GSoC2025] Validate existing Clang CodeGen test coverage with ClangIR

February 7, 2025, 2:07am

Description

The ClangIR (CIR) project aims to establish a new intermediate representation (IR) for Clang. Built on top of MLIR, it provides a dialect for C/C++ based languages in Clang, and the necessary infrastructure to emit it from the Clang AST, as well as a lowering path to the LLVM-IR dialect. ClangIR upstreaming is currently in progress.

To give the community more frequent updates, it would be great to report ClangIR progress by measuring the coverage of Clang’s existing CodeGen tests under a ClangIR-enabled pipeline. By collecting information on crashing, passing, and failing tests, we can come up with a metric that is easier to report and understand, provide entry points for newcomers looking for tasks, and help the project by classifying existing issues. Existing Clang CodeGen tests live in clang/test/CodeGen* and can be found in different states of ClangIR support:

- FileCheck fails. LLVM IR builds but FileCheck fails to match the output.
  - LLVM IR differs because the ClangIR pipeline is emitting different IR (e.g. different instructions are used, attributes are missing). Issues need to be created and ClangIR needs to be fixed.
  - LLVM IR differs because the CHECK lines need to be made more flexible (LLVM-IR dialect output is different, SSA value names, order of attributes, etc). A tool like llvm-canon might be of good use here.
- Test crash / error. ClangIR doesn’t support some C/C++ construct, or LLVM lowering hasn’t been implemented.
- Test pass. Yay!
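As a minimal sketch of the three states above, assuming a test boils down to one compiler invocation piped into FileCheck, the classification could look roughly like this. The `Outcome` enum, the `classify` function, and its two exit-code parameters are hypothetical names for illustration; they are not part of lit or ClangIR.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    PASS = "pass"
    FILECHECK_FAIL = "filecheck-fail"
    CRASH_OR_ERROR = "crash-or-error"


def classify(compiler_rc: int, filecheck_rc: Optional[int]) -> Outcome:
    """Map the exit codes of the two steps to one of the three states.

    compiler_rc:  exit code of the ClangIR-enabled compiler step (assumed input).
    filecheck_rc: exit code of FileCheck, or None if FileCheck never ran.
    """
    if compiler_rc != 0 or filecheck_rc is None:
        # No LLVM IR was produced: unsupported construct, missing lowering, or crash.
        return Outcome.CRASH_OR_ERROR
    if filecheck_rc != 0:
        # LLVM IR was built, but the CHECK lines did not match it.
        return Outcome.FILECHECK_FAIL
    return Outcome.PASS
```

Telling the two "FileCheck fails" sub-cases apart (real codegen difference vs. overly strict CHECK lines) would still need a human or a smarter diff; this sketch only separates the top-level states.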

To retrieve the information above, the student needs to change Clang’s testing infrastructure (LIT configs, scripts, tests, or whatever else turns out to be needed) so that it is easier to replay the same invocations with ClangIR enabled, compare against the traditional pipeline’s result, or retrieve special directives from tests. It’s not yet clear what the best methodology is, but proposals that want to be taken seriously should present a few possible ideas on how to achieve this; prior discussion with other members of the community is encouraged. The student is also expected to interact with the ClangIR community, file GitHub issues, and investigate and/or make changes to failing CodeGen tests.
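One possible idea for replaying invocations, sketched under stated assumptions: pull the `RUN:` commands out of a test file and splice a ClangIR flag into each `%clang_cc1` invocation. The helper names `extract_run_lines` and `enable_cir` are made up for this example, and `-fclangir` is assumed to be the flag that switches on the ClangIR pipeline; a real solution would more likely hook into lit substitutions than rewrite text.

```python
import re
from typing import List

CIR_FLAG = "-fclangir"  # assumption: the flag that enables the ClangIR pipeline


def extract_run_lines(test_source: str) -> List[str]:
    """Collect lit RUN commands, joining lines continued with a trailing backslash."""
    commands: List[str] = []
    pending = ""
    for line in test_source.splitlines():
        match = re.search(r"RUN:\s*(.*)$", line)
        if not match:
            continue  # not a RUN directive
        text = match.group(1).rstrip()
        if text.endswith("\\"):
            # Continued on the next RUN line: drop the backslash and keep accumulating.
            pending += text[:-1].rstrip() + " "
        else:
            commands.append(pending + text)
            pending = ""
    if pending:
        # Dangling continuation at end of file: keep what was collected.
        commands.append(pending.rstrip())
    return commands


def enable_cir(command: str) -> str:
    """Insert the ClangIR flag right after each %clang_cc1 substitution."""
    return re.sub(r"%clang_cc1\b", f"%clang_cc1 {CIR_FLAG}", command)
```

For a test whose RUN line is `%clang_cc1 -emit-llvm %s -o - | FileCheck %s`, `enable_cir` would yield the same command with `-fclangir` placed right after `%clang_cc1`, so the original CHECK lines are reused unchanged.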

Expected result

Build the infrastructure to run tests and collect results.
Present the results in a way that is easy to display on the web.
File issues or change CHECK lines for 50% of the “FileCheck fails” category above. The only CodeGen test directories that need consideration at this time are:

clang/test/CodeGen
clang/test/CodeGenCXX
clang/test/CodeGenOpenCL
clang/test/CodeGenCUDA
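To make "collect results" and "easy to display on the web" a bit more concrete, here is a rough sketch that groups per-test outcomes by directory and produces a JSON-friendly summary. The `summarize` function, the outcome strings (`"pass"`, `"filecheck-fail"`, `"crash-or-error"`), and the shape of the output are all assumptions made for illustration, not an agreed format.

```python
import json
from collections import Counter, defaultdict
from pathlib import PurePosixPath
from typing import Dict


def summarize(results: Dict[str, str]) -> Dict[str, dict]:
    """Group outcomes by top-level test directory.

    results maps a test path (e.g. "clang/test/CodeGen/foo.c") to one of the
    hypothetical outcome strings "pass", "filecheck-fail" or "crash-or-error".
    """
    per_dir = defaultdict(Counter)
    for path, outcome in results.items():
        # Keep the first three components, e.g. clang/test/CodeGen.
        directory = "/".join(PurePosixPath(path).parts[:3])
        per_dir[directory][outcome] += 1

    summary = {}
    for directory, counts in sorted(per_dir.items()):
        total = sum(counts.values())
        summary[directory] = {
            "total": total,
            "counts": dict(counts),
            "pass_rate": round(counts["pass"] / total, 3),
        }
    return summary


if __name__ == "__main__":
    demo = {
        "clang/test/CodeGen/a.c": "pass",
        "clang/test/CodeGen/b.c": "filecheck-fail",
        "clang/test/CodeGenCXX/c.cpp": "crash-or-error",
    }
    # The JSON text could be served as-is or fed to a small dashboard page.
    print(json.dumps(summarize(demo), indent=2))
```

The `counts["filecheck-fail"]` entry per directory is also the denominator for the 50% target above, so progress on that goal can be tracked from the same data.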

Bonus points: find ways to automate or facilitate changes to tests, and put up PRs to fix problems in ClangIR.

Requirements

Skills: Python, intermediate C++ programming skills and familiarity with how to use a compiler as a power user are required. Prior experience with LLVM IR, MLIR, Clang or ClangIR programming is a big plus, but not required.
Project size: Large
Difficulty: Medium
Mentors: Bruno Cardoso Lopes, Andy Kaylor

Hello,
I’m really interested and would love to contribute, especially now that I have spent some time learning MLIR. I have started looking into the codebase to get familiar with it.

Hey @bcardosolopes, my name is Ayokunle Amodu and I was fortunate to get a compiler optimization research gig with a professor last summer. I also took a compiler course last fall where I worked on an MLIR backend (codegen) for a vector/matrix-oriented DSL. It was tough but I enjoyed every bit of it. Lately, I’ve been focusing on fixing issues related to crashes within the linalg dialect and have made some contributions toward that, but I’m ready to branch out. I would really love the opportunity to get my hands dirty with this project and work on any preliminary tasks you can provide.

Cool, thanks for the reply (you too, shrikardongre). We have a list of good first issues on GitHub, and we’re also happy to help in the clangir channel on Discord.

Hello @bcardosolopes, I am a compilers TA and C++ developer and I’m really interested in this project. I have been studying LLVM and I’m now taking a closer look at ClangIR!

Hi @bcardosolopes!

As a quick self-intro, I am a senior CS major currently taking a compilers class, where three other students and I are building a Decaf compiler (Decaf being a subset of C). Aside from that, I have also taken a software performance engineering class, where I was introduced to LLVM IR.

I have not yet made any open source contributions, but I’m hoping that changes this summer.

Additionally, I have plenty of experience with Python, C/C++, Java and Go.

I know the deadline for GSoC is fairly close, but I wanted to ask if it is still possible to submit a feasible proposal and discuss it.

Please let me know whenever you get a chance!

Thank you!