[GSoC 2024] Discussion - Automatically Generate TableGen File for SPIR-V Instruction Set
February 3, 2024, 7:57pm 1
Hey LLVM Community,
[Mentors]
@chouinard @Keenuts
[Project]
“Automatically Generate TableGen File for SPIR-V Instruction Set”
I am interested in working on this project and would like to start a discussion by sharing my understanding and thought process, considering the given timeline. The following posts will be incremental updates building up to the final proposal.
I have had a look at the initial learning steps described in the project post, which includes useful information about the resources and the specification for this project.
Clarifying TableGen And Its Working
From what I understand, we are aiming to build an automated script for the SPIR-V instruction set that generates a TableGen file; TableGen, being a DSL, makes it possible to maintain records for complex structures in a sophisticated manner.
Example (Arm’s AArch64):
$ cat register.td
class Register<int _size, string _alias=""> {
  int size = _size;
  string alias = _alias;
}
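Continuing that example, concrete records could then be defined from the class (the register names and alias below are hypothetical, just to illustrate the record syntax):

```tablegen
// Hypothetical records instantiating the Register class above.
def W0 : Register<32>;
def X0 : Register<64, "x0">;
```

llvm-tblgen would then expand such defs into records that a backend (or the JSON dump) can consume.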
I believe the primary compiler that generates the IR is llvm-tblgen and the instructions themselves aren't generated; rather, the output files in JSON format are produced, which would then be implemented in the backend using a language (like C++). In terms of use cases, LLVM's MLIR also embraces TableGen for its ongoing purpose of optimising compiler infrastructure.
TableGen handles all sorts of machine instruction sets and their various processes, including instruction selection, assembling and disassembling, where Clang acts as the frontend to parse and evaluate various expressions and feed them to the backend, where the actual record definitions exist.
SPIR-V Instruction Set
I believe this set itself is more of a specific part/target of the project, and the approach would apply to any other instruction set as well, with only the syntax and a few other generalisations differing for its components and design principles.
If I read correctly, SPIR-V is a binary intermediate representation for graphical-shader stages and compute kernels, used with APIs such as Vulkan, with practical applications in graphics-related shaders for games.
JSON Grammar to TableGen
The SPIRV-Headers repository, which contains headers for the instruction set, provides the JSON grammar file; this is where the script would come into play, letting us use the new definitions appropriately in the backend.
Regarding the translation, if I understood it right, just as we would break down parts of code when building a simple interpreter, we could use either a tree structure or a list-like structure; I feel the latter would be a little faster to implement and efficient in a dynamic language. We can have an eval method with an environment that stores the records and classifies each type of expression: simple variables, conditional statements and flow-based instructions as well.
This is where the SPIR-V documentation would come into play: it gives us the set of rules and definitions according to which we would build the script to translate the grammar into the desired format of the TableGen file.
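The list-based walk with an environment could be sketched roughly as follows (the entry shape, the `eval_entries` name and the `'record'` kind are all hypothetical, purely to illustrate the idea):

```python
# Hypothetical sketch: walk a flat list of grammar entries and
# accumulate records in an 'environment' dict keyed by name.
def eval_entries(entries):
    env = {}
    for entry in entries:
        kind = entry.get('kind', 'record')
        if kind == 'record':
            env[entry['name']] = entry.get('fields', {})
        # ... other entry kinds (enums, conditionals) would go here
    return env

env = eval_entries([{'name': 'OpNop', 'fields': {'opcode': 0}}])
print(env)  # {'OpNop': {'opcode': 0}}
```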
MLIR Utils For SPIRV Dialect
I had a look into the gen_dialect utilities for this instruction set specifically, where there's a script handling the op codes as mentioned in the previous post.
Here, I see that a list-like approach is being used, breaking the input down into sub-lists that are then processed in chunks, due to certain limitations I believe. After collecting the nodes, a standard topological sort over a directed acyclic graph (DAG) is used to get the right ordering for the transformation of basic blocks.
DAG Code:
def get_next_batch(dag):
    while True:
        no_prev_nodes = set(node for node, prev in dag.items() if not prev)
        if not no_prev_nodes:
            break
        yield sorted(no_prev_nodes, key=sort_fn)
        dag = {
            node: (prev - no_prev_nodes)
            for node, prev in dag.items()
            if node not in no_prev_nodes
        }
    assert not dag, "found cyclic dependency"

sorted_nodes = []
for batch in get_next_batch(dag):
    sorted_nodes.extend(batch)
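To make the batching behaviour concrete, here is a self-contained run of that topological sort on a tiny made-up dependency graph (the `sort_fn` default here is just alphabetical order, an assumption for the demo):

```python
# Self-contained sketch of the batched topological sort: each yielded
# batch contains the nodes whose dependencies are already emitted.
def get_next_batch(dag, sort_fn=str):
    while True:
        no_prev_nodes = set(node for node, prev in dag.items() if not prev)
        if not no_prev_nodes:
            break
        yield sorted(no_prev_nodes, key=sort_fn)
        dag = {
            node: (prev - no_prev_nodes)
            for node, prev in dag.items()
            if node not in no_prev_nodes
        }
    assert not dag, "found cyclic dependency"

# 'B' and 'C' depend on 'A'; 'D' depends on both 'B' and 'C'.
dag = {'A': set(), 'B': {'A'}, 'C': {'A'}, 'D': {'B', 'C'}}
batches = list(get_next_batch(dag))
print(batches)  # [['A'], ['B', 'C'], ['D']]
```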
Conclusion
This is what I comprehend at this stage, and I would love to hear from the mentors so I can move on to the next steps. I want to make sure I am on the right track with respect to the goals and understanding of this project before moving further and digging deep.
Thanks
– Rajveer
[GitHub - Rajveer100]
Thanks for your interest in the project @Rajveer100! I think there are a few points above that need clarifying.
I believe the primary compiler that generates the IR is llvm-tblgen and the instructions themselves aren't generated; rather, the output files in JSON format are produced, which would then be implemented in the backend using a language (like C++).
This project would be using a JSON input file from SPIRV-Headers, not generating any JSON output. The output of the desired script would be a TableGen file, which would then be processed by llvm-tblgen to produce C++.
SPIR-V Instruction Set
I believe this set itself is more of a specific part/target of the project, and the approach would apply to any other instruction set as well, with only the syntax and a few other generalisations differing for its components and design principles.
This project is quite focused on the SPIR-V backend alone, and the script would need to be specifically written to parse the SPIR-V grammar, so it would not be generalizable to other instruction sets.
Hope that helps!
That was indeed helpful. Do you recommend anything else from your end for making progress on this while I research it further?
If you have any specific questions about the project feel free to post them here, but it’s still very early in the process, so I would just suggest keeping an eye on the Google Summer of Code site for next steps once contributor applications open.
Regarding implementing the backend, C++ and Python seem to be the available options; since there's already a .py script for spirv_dialect, what do you suggest for this?
If I understood it right, this would be the flow for both cases:
C++ -
TableGen Source → llvm-tblgen → Backend (within llvm-tblgen) → Results
Python -
TableGen Source → llvm-tblgen → JSON → Python → Results
Is this the type of source we are looking to consider for translation, given that the attributes match the ones in the SPIR-V specification docs? (I see many different grammar files here.)
And what do the other kinds of files in include/spirv mean here (e.g. .cs, .py, .h, etc.)?
I think there’s still some confusion here. The input file will be a SPIR-V grammar file in JSON, which should be processed by a script (could be written in python, for example) to produce a TableGen (.td) file. This would be a manually executed script, and the resulting TableGen file would be committed into the source tree. Then during the LLVM build, that TableGen file will be run through llvm-tblgen to generate C++ source.
We could have a script along these lines (the SPIRV_Inst class name and the record shape are just placeholders):
import json

def generate_tablegen_instruction(instruction):
    opname = instruction['opname']    # e.g. "OpNop"
    opcode = instruction['opcode']    # integer opcode value
    # Extract operand kinds (operands in the grammar carry a 'kind'
    # and an optional 'name'; 'operands' may be absent entirely)
    operands = instruction.get('operands', [])
    operand_kinds = [operand['kind'] for operand in operands]
    # Generate TableGen code
    tablegen_code = f"""
def SPIRV_{opname} : SPIRV_Inst<{opcode}, "{opname}"> {{
  ...
}}
"""
    return tablegen_code

def generate_tablegen_from_json(json_data):
    tablegen_code = ""
    for instruction in json_data['instructions']:
        tablegen_code += generate_tablegen_instruction(instruction)
    return tablegen_code

# Read JSON data from file
with open('spirv_grammar.json', 'r') as file:
    json_data = json.load(file)

# Generate TableGen code
tablegen_code = generate_tablegen_from_json(json_data)

# Write the generated code to a TableGen file
with open('spirv_instructions.td', 'w') as file:
    file.write(tablegen_code)
We can utilise an already available JSON-parsing library to decode the SPIR-V grammar, extract the information and handle the relevant op types.
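As a quick sanity check, the idea can be exercised on a tiny hand-written fragment shaped like the grammar's "instructions" array (OpNop really is opcode 0 in SPIR-V; the emitted record format is again a placeholder):

```python
import json

# A tiny, hand-written fragment in the shape of a SPIR-V grammar file,
# used only to exercise the translation idea (real files are far larger).
grammar_fragment = json.loads("""
{
  "instructions": [
    { "opname": "OpNop", "opcode": 0, "operands": [] }
  ]
}
""")

def instruction_to_tablegen(instruction):
    # Placeholder record shape; the real SPIRV_Inst class would differ.
    return 'def SPIRV_{0} : SPIRV_Inst<{1}, "{0}">;\n'.format(
        instruction['opname'], instruction['opcode'])

td = "".join(instruction_to_tablegen(i)
             for i in grammar_fragment['instructions'])
print(td)  # def SPIRV_OpNop : SPIRV_Inst<0, "OpNop">;
```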