Are there components in MLIR for analyzing GPU kernel dependencies and scheduling?
I’m looking for information about whether MLIR or any public downstream projects have implemented components to help analyze GPU kernels (specifically gpu.launch_func) for dependency management and scheduling.
I’ve noticed that MLIR supports the asynchronous execution model in the GPU dialect: async tokens are lowered to low-level calls such as cuStreamCreate that create separate streams. However, there doesn’t appear to be an automatic way to schedule these asynchronous tokens so that independent kernels actually overlap for more efficient execution.
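For concreteness, the async form I’m referring to looks roughly like this (module, kernel names, and launch shapes are invented for illustration; syntax per the GPU dialect docs):

```mlir
// Two kernels chained off the same wait token are independent of each
// other; after lowering, each async token roughly corresponds to a stream.
%t0 = gpu.wait async
%t1 = gpu.launch_func async [%t0] @kernels::@kernel_a
        blocks in (%c1, %c1, %c1) threads in (%c128, %c1, %c1)
%t2 = gpu.launch_func async [%t0] @kernels::@kernel_b
        blocks in (%c1, %c1, %c1) threads in (%c128, %c1, %c1)
gpu.wait [%t1, %t2]
```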
If such functionality doesn’t exist, my initial idea is to abstract the relevant operations in the IR as nodes, maintain a directed graph of their dependency relationships, and use topological sorting to assign a level number to each node. Nodes with the same level have no dependency path between them and could run in parallel, so each could be given a distinct asynchronous token and launched on its own stream. However, since not every loop body ends up converted to a gpu.launch_func operation, the final token assignment would also need to handle non-GPU operations interleaved with the kernels, which seems potentially challenging.
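The leveling step I have in mind could be sketched in plain Python, independent of MLIR (the function name and the graph encoding are mine, purely for illustration):

```python
from collections import defaultdict, deque

def assign_levels(deps):
    """Kahn-style topological leveling.

    deps maps each node to the set of nodes it depends on.
    Returns {node: level}, where level is the longest dependency
    chain leading to the node. Nodes sharing a level have no
    dependency between them and could, in principle, be given
    distinct async tokens and run on separate streams.
    """
    indegree = {n: len(d) for n, d in deps.items()}
    users = defaultdict(list)  # reverse edges: producer -> consumers
    for n, ds in deps.items():
        for d in ds:
            users[d].append(n)
    frontier = deque(n for n, k in indegree.items() if k == 0)
    level = {n: 0 for n in frontier}
    while frontier:
        n = frontier.popleft()
        for u in users[n]:
            # A node's level is 1 + the max level among its producers.
            level[u] = max(level.get(u, 0), level[n] + 1)
            indegree[u] -= 1
            if indegree[u] == 0:
                frontier.append(u)
    return level

# Hypothetical example: k2 depends on k0 and k1, k3 depends on k0 only.
# k0 and k1 share level 0; k2 and k3 share level 1.
print(assign_levels({"k0": set(), "k1": set(),
                     "k2": {"k0", "k1"}, "k3": {"k0"}}))
```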
I’d appreciate any references to existing dependency analysis implementations that could be helpful for this approach.