Seeking Guidance on Executing MLIR Code with GPU Dialect on GPU
Hi all,
I am currently working with MLIR and using the GPU dialect in my project.
I have written MLIR code that uses the GPU dialect, but I am unsure how to execute it on a GPU. The code squares each element of a vector; I wrote it just to learn the GPU dialect by playing with blocks and threads.
Could someone kindly guide me on the following:
- How do I execute my MLIR code on a GPU?
- Are there any specific tools or configurations I need to set up to enable this?
- The MLIR code is provided below. Can anyone tell me whether it uses the GPU dialect correctly?
The MLIR code I have is:
module {
  func.func private @dslran_print_i64(i64) attributes {llvm.emit_c_interface}
  func.func @main() {
    %alloc = memref.alloc() : memref<24xi64>
    affine.for %i = 0 to 24 {
      %c4 = arith.constant 4 : i64
      affine.store %c4, %alloc[%i] : memref<24xi64>
    }
    %gpu_alloc = gpu.alloc () : memref<24xi64>
    // Copy host -> device: gpu.memcpy takes the destination first.
    gpu.memcpy %gpu_alloc, %alloc : memref<24xi64>, memref<24xi64>
    %blcks = arith.constant 1 : index
    %thrds = arith.constant 24 : index
    gpu.launch blocks(%arg1, %arg2, %arg3) in (%sz_x = %blcks, %sz_y = %blcks, %sz_z = %blcks)
               threads(%arg4, %arg5, %arg6) in (%tx = %thrds, %ty = %blcks, %tz = %blcks) {
      // With one block of 24 threads, the element index is the thread id, not the block id.
      %i = gpu.thread_id x
      %elem = memref.load %gpu_alloc[%i] : memref<24xi64>
      %result = arith.muli %elem, %elem : i64
      memref.store %result, %gpu_alloc[%i] : memref<24xi64>
      gpu.terminator
    }
    // Copy device -> host.
    gpu.memcpy %alloc, %gpu_alloc : memref<24xi64>, memref<24xi64>
    affine.for %i = 0 to 24 {
      %val = affine.load %alloc[%i] : memref<24xi64>
      func.call @dslran_print_i64(%val) : (i64) -> ()
    }
    memref.dealloc %alloc : memref<24xi64>
    gpu.dealloc %gpu_alloc : memref<24xi64>
    return
  }
}
Any help or pointers would be greatly appreciated.
Thank you in advance for your time and assistance!
Best regards
Take a look at mlir/test/Integration/GPU for examples on how to run stuff; specifically these lines at the tops of the tests:
mlir-opt %s \
| mlir-opt -gpu-lower-to-nvvm-pipeline="cubin-format=fatbin" \
| mlir-runner \
--shared-libs=%mlir_cuda_runtime \
--shared-libs=%mlir_runner_utils \
  --entry-point-result=void
Note, you will need to have built the “runtime wrappers/utils” for whatever GPU you’re trying to execute on (see the various MLIR_ENABLE_*_RUNNER options in mlir/lib/ExecutionEngine/CMakeLists.txt).
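For an NVIDIA GPU, enabling the runtime wrappers might look roughly like the sketch below when configuring the LLVM build; the source path is a placeholder, and the exact set of flags can differ between LLVM versions:

```shell
# Sketch of a CMake configuration that builds the CUDA runner utilities.
# ../llvm is a placeholder for your llvm-project checkout.
cmake -G Ninja ../llvm \
  -DLLVM_ENABLE_PROJECTS=mlir \
  -DLLVM_TARGETS_TO_BUILD="host;NVPTX" \
  -DMLIR_ENABLE_CUDA_RUNNER=ON
```

There are analogous MLIR_ENABLE_*_RUNNER options for other GPU runtimes; check the CMakeLists.txt mentioned above for the one matching your hardware.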
Re your snippet: it looks about right, but I can’t be sure. The easiest way to find out is to try it.
lilil March 28, 2025, 7:52am 3
As mentioned in the previous reply, you can find many examples executed using mlir-runner in the test directory.
In addition to that, you can also generate an executable through static compilation. Specifically, you need to lower the MLIR representation to the LLVM dialect, then translate it to LLVM IR. After that, use llc to generate machine code, and finally link the runtime libraries with clang to produce an executable. You should be able to find detailed descriptions of this process by searching for relevant keywords in the community posts.
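The static-compilation steps above might be sketched as the following pipeline; this assumes an NVIDIA target, and the input/output file names and library directory are placeholders, not part of any fixed recipe:

```shell
# Hypothetical static-compilation pipeline for the snippet in this thread.
# Lower GPU-dialect MLIR all the way to the LLVM dialect (NVIDIA target assumed):
mlir-opt square.mlir \
  -gpu-lower-to-nvvm-pipeline="cubin-format=fatbin" \
  -o lowered.mlir

# Translate the LLVM-dialect module to LLVM IR:
mlir-translate lowered.mlir --mlir-to-llvmir -o lowered.ll

# Compile the IR to an object file:
llc lowered.ll -filetype=obj -relocation-model=pic -o lowered.o

# Link against the MLIR runtime wrapper libraries built earlier:
clang lowered.o \
  -L"$LLVM_BUILD_DIR/lib" \
  -lmlir_cuda_runtime -lmlir_runner_utils \
  -o square
```

Note that external functions such as @dslran_print_i64 must also be defined in something you link in, and you may need to set the runtime library path (e.g. rpath or LD_LIBRARY_PATH) when running the resulting binary.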