[LLVM] Automatically generate TableGen file for SPIR-V instruction set

January 18, 2024, 4:21pm 1

Description: The existing file that describes the SPIR-V instruction set in LLVM was manually created and is not always complete or up to date. Whenever new instructions need to be added to the SPIR-V backend, the file must be amended. In addition, since it is not created in a systematic way, there are often slight discrepancies between how an instruction is described in the SPIR-V spec and how it is declared in the TableGen file. Since SPIR-V backend developers often use the spec as a reference when developing new features, having a consistent mapping between the specification and TableGen records will ease development. This project proposes creating a script capable of generating a complete TableGen file that describes the SPIR-V instruction set given the JSON grammar available in the KhronosGroup/SPIRV-Headers repository, and updating SPIR-V backend code to use the new definitions. The specific method used for translating the JSON grammar to TableGen is left up to the discretion of the applicant; however, the script should be checked into the LLVM repository with well-documented instructions for replicating the translation process, so that future maintainers will be able to regenerate the file when the grammar changes. Note that the grammar itself should remain out-of-tree in its existing separate repository.

Expected outcomes:

The SPIR-V instruction set’s definition in TableGen is replaced with one that is autogenerated.
A script and documentation are written that support regenerating the definitions as needed given the JSON grammar of the SPIR-V instruction set.

Mentors: Natalie Chouinard, Nathan Gauër

Skills: Previous experience with TableGen is a bonus but not required.

Size: Medium (175 hours)

kuhar January 18, 2024, 4:50pm 2

Note that we have something like this for the SPIR-V dialect op definitions in MLIR: llvm-project/mlir/utils/spirv at main · llvm/llvm-project · GitHub

moste00 January 19, 2024, 2:04pm 3

Hello!

I’m interested in this project, but I don’t have any prior experience with either TableGen or LLVM. Is there a ready list of prerequisite resources or action items I should start checking out?

A little bit about me:

I’m a recent graduate of Cairo University, where I graduated with excellence and honors.
I studied Compilers and Formal Languages in my senior years, but I have been extremely interested in compilers and virtual machines since long before that, from my first year of university.
I read Robert Nystrom’s Crafting Interpreters in my second year of university, and I participated in GSoC 2023 with TruffleRuby. My project was about improving the built-in hash table at the Ruby VM level: [GSoC] Use a compact hash table for RubyHash instead of the buckets strategy by moste00 · Pull Request #3172 · oracle/truffleruby (github.com)
As for programming languages, I’m familiar with C, C++, Java, Python, Ruby, Tcl, Forth, and plenty of others. I’m a passionate programming language enthusiast, after all.

Cheers

Hello!

I am a junior-year undergraduate in Computer Science from India, with a good background in compilers and automata and a good understanding of the theoretical foundations of both subjects.

This project is very interesting, and I have some experience working with JSON data and scripting in a number of languages including shell and Python.

I have worked with LLVM before and am familiar with the backend and its source code. I also contributed bug fixes to FileCheck.

I have read thoroughly about TableGen files and have gone through the corresponding source in the master branch.

Is there any other resource I should check out to become more familiar with TableGen and SPIR-V? It would be great to learn more!

Thanks a lot!

Thank you for your interest in the project! Anyone looking to do some additional reading may find the following resources helpful:

TableGen Overview
TableGen Programmer’s Reference
SPIR-V Specification: Not intended to be read end-to-end, but at least sections 1 (Introduction) and 2.5 (Instructions) would provide helpful context.

And @kuhar’s link to the MLIR implementation is a good reference example.

Hello, @chouinard, I went through the resources. I now have a fairly good understanding of what needs to be done. Can you please point me toward some open issues whose resolution would help in implementing this feature?

@Sh0g0-1758 We have a few open bugs related to the SPIR-V backend tracked under the backend:SPIR-V label. However, I wouldn’t necessarily expect those to be the best starting place for a new contributor. There are some project-wide good first issues that would be a better entry point to contributing to LLVM in general, even if they’re not specific to this project.

@chouinard, if I am not wrong, the task would require a hand-rolled parser for the JSON grammar?

I have been studying this topic thoroughly, and I think a hand-rolled index-overlay parser would fare best for this task. Once the parser has been implemented, we can simply write a glue shell script that runs the parser over the grammar, yields the required results, and then passes them on to a Python script that will generate the TableGen file.

Index-overlay parsing would, in my opinion, be faster than the other kinds of parsers I can think of, including other kinds of random-access parsers (the main overhead of the whole process is the parsing step).

This would also allow incremental parsing when grammar files change, without the need to re-parse the whole grammar file.

Your thoughts on this?

Thanks!

chouinard February 15, 2024, 5:55pm 9

A hand-rolled index-overlay parser is probably overkill for this project, since the dataset should fit in memory and it’s perfectly acceptable to re-parse the whole grammar file each time it needs to be regenerated. If a standard library in your scripting language of choice makes this easy to implement though, then it could make sense here. Since this script is only going to be manually and infrequently run, we want to prioritize simplicity and maintainability over maximizing performance.
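To make the simpler route concrete, here is a minimal sketch, not a proposed implementation. It parses a small inline excerpt shaped like the grammar with Python’s standard json module and prints one TableGen-style line per instruction. The SPIRVInst class name and the record layout are made up for illustration, and the excerpt only assumes the instructions, opname, opcode, operands, and kind keys; the real grammar carries more fields than this.

```python
import json

# Tiny inline excerpt shaped like the SPIR-V JSON grammar (assumption: only
# the "instructions", "opname", "opcode", "operands" and "kind" keys are used).
GRAMMAR_EXCERPT = """
{
  "instructions": [
    {"opname": "OpNop", "opcode": 0},
    {"opname": "OpUndef", "opcode": 1,
     "operands": [{"kind": "IdResultType"}, {"kind": "IdResult"}]}
  ]
}
"""


def emit_record(inst):
    """Render one instruction as a TableGen-style record.

    The SPIRVInst class and its argument order are hypothetical.
    """
    operands = ", ".join(op["kind"] for op in inst.get("operands", []))
    return f'def {inst["opname"]} : SPIRVInst<{inst["opcode"]}, [{operands}]>;'


def generate(grammar_text):
    """Parse the whole grammar with the json module and emit every record."""
    grammar = json.loads(grammar_text)
    return "\n".join(emit_record(inst) for inst in grammar["instructions"])


print(generate(GRAMMAR_EXCERPT))
```

Re-parsing everything on each run keeps the script to a load step and a formatting step, which is the kind of simplicity we would favor over performance here.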

@chouinard Thanks for replying!

I am sure that I could fit a working index-overlay parser for JSON in under roughly 3k lines of Python, including the tokenizer and most helper functions (I have been trying to implement a bare-bones one all day today, with a fair amount of success!).

Parsing a well-structured JSON file like the one that the headers provide is neither very tedious nor time-consuming, but wouldn’t a well-documented hand-written parser be easier to maintain and extend than using regular expressions or libraries that could change?

I think the glue shell script can handle much of the generation once the parser is complete, so there is no unneeded complexity within the parser script itself.

It would be great to hear your opinion on this, and whether I should continue reading about it or the idea is totally bonkers and I should think of something else!

Thanks!

If index-overlay does turn out to be overkill, would recursive descent fulfil the purpose? JSON has a nice grammar suited to top-down parsing. (I am biased towards a hand-rolled parser, but your opinion does have the veto here!)

The only task would then be to validate the AST produced, but I am sure that this can be done through proper testing beforehand.
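Whichever parser ends up producing the data, the validation step could look something like the rough sketch below. It assumes the parsed result is a plain dict with an instructions list whose entries carry opname, opcode, and optional operands with a kind; the check_grammar helper and the exact checks are only illustrative. The sample data is produced with json.loads purely to keep the example short.

```python
import json

# Keys every instruction entry is assumed to carry, with their expected types.
REQUIRED_KEYS = {"opname": str, "opcode": int}


def check_grammar(grammar):
    """Return a list of human-readable problems found in a parsed grammar."""
    problems = []
    for index, inst in enumerate(grammar.get("instructions", [])):
        # Fall back to the position when the instruction has no name.
        label = inst.get("opname", f"instruction #{index}")
        for key, expected_type in REQUIRED_KEYS.items():
            if not isinstance(inst.get(key), expected_type):
                problems.append(f"{label}: bad or missing '{key}'")
        for position, operand in enumerate(inst.get("operands", [])):
            if not isinstance(operand.get("kind"), str):
                problems.append(f"{label}: operand {position} has no 'kind'")
    return problems


good = json.loads('{"instructions": [{"opname": "OpNop", "opcode": 0}]}')
bad = json.loads('{"instructions": [{"opname": "OpUndef", "operands": [{}]}]}')

print(check_grammar(good))
print(check_grammar(bad))
```

An empty list means nothing suspicious was found; anything else names the instruction and the field that looks wrong.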

kuhar February 16, 2024, 10:32am 12

That does seem right!
I read through the whole implementation that you provided, and parsing there seems to be a trivial precursor to the actual automation process.

I have also looked through the json module in Python, and it does appear to be functional enough (its biggest flex being that it is already in use!).

Reading through the links also has given me great insight into how such an automation framework should be implemented, so in the meantime, I think it would be more prudent to direct my energy towards improvising and coming up with ideas rather than getting stuck on parsing!

Thanks, @kuhar!

@kuhar, I went through the implementation you gave here:

It seems that the project wants essentially this to be implemented, except that we have to automate generating TableGen files. Looking at this function:

This gets the grammar from the KhronosGroup SPIRV-Headers files, which is the first part. For the second part, a good understanding of TableGen is a must, for which I think the docs are more than enough.

@chouinard While I am working on a rough implementation, I wanted to confirm whether you expect a PR before the proposal, or whether a detailed proposal with a workflow would suffice.

Thanks

We do not expect a PR with any of the functionality already implemented, but the proposal should include a detailed design with a timeline and milestones. There’s more information about how to write a proposal at the following links:

@chouinard, I have DM’d you my draft proposal for this project. Two things:

I can’t seem to find the other mentor’s Discourse username, so I was unable to message him my draft proposal.
Since it’s better to get constructive feedback from both mentors, should I post the draft proposal here and move the official discussion here?

[EDIT]: I have created a thread that includes both the mentors for proposal discussion.

kuhar March 7, 2024, 9:46pm 17

Nathan’s handle is @Keenuts

Right, thanks a lot. I will drop him the draft.

Keenuts March 13, 2024, 10:57am 19

Hi all!

If you have questions about the project, or want to share a proposal upfront, don’t hesitate. We won’t be super fast on reviews, but we can give some feedback, or help course-correct if something is misunderstood.

In case it helps, here is a proposal I wrote when I was a student for another GSoC project: https://studiopixl.com/assets/GSOC-2018-proposal-sample.pdf
It was a different project, but maybe it can shed some light on what a proposal can look like.
Note: this is shared as it was in 2018 (minus redacted PII), hence it is not a perfect example, and is probably full of mistakes and things to improve, but it allowed me to participate, so I suppose it was good enough.

Da-Viper May 12, 2024, 10:51am 20

The SPIR-V instruction set’s definition in TableGen is replaced with one that is autogenerated.

Hello, could you provide a link to the current manually created TableGen file in the repository?