How do arguments from a dialect operation get passed to lowered operations? (MLIR Toy tutorial)
NOTE: I’ve just moved this from the Beginner group to the MLIR group to hopefully get more visibility. If I should have kept this in Beginner, please let me know!
For context, I’m working through Ch 5 of the Toy tutorial.
The tutorial gives an example of lowering the transpose operation. I’ll re-post the Toy example to be lowered below:
toy.func @main() {
%0 = toy.constant dense<[[1.000000e+00, 2.000000e+00, 3.000000e+00], [4.000000e+00, 5.000000e+00, 6.000000e+00]]> : tensor<2x3xf64>
%2 = toy.transpose(%0 : tensor<2x3xf64>) to tensor<3x2xf64>
%3 = toy.mul %2, %2 : tensor<3x2xf64>
toy.print %3 : tensor<3x2xf64>
toy.return
}
In this example, we see that %0 is the input to the transpose operation. When toy.constant gets lowered, the intermediate MLIR representation looks like this:
"func.func"() ({
%0 = "memref.alloc"() {operand_segment_sizes = array<i32: 0, 0>} : () -> memref<2x3xf64>
%1 = "arith.constant"() {value = 0 : index} : () -> index
%2 = "arith.constant"() {value = 1 : index} : () -> index
%3 = "arith.constant"() {value = 2 : index} : () -> index
%4 = "arith.constant"() {value = 1.000000e+00 : f64} : () -> f64
"affine.store"(%4, %0, %1, %1) {map = affine_map<(d0, d1) -> (d0, d1)>} : (f64, memref<2x3xf64>, index, index) -> ()
%5 = "arith.constant"() {value = 2.000000e+00 : f64} : () -> f64
"affine.store"(%5, %0, %1, %2) {map = affine_map<(d0, d1) -> (d0, d1)>} : (f64, memref<2x3xf64>, index, index) -> ()
%6 = "arith.constant"() {value = 3.000000e+00 : f64} : () -> f64
"affine.store"(%6, %0, %1, %3) {map = affine_map<(d0, d1) -> (d0, d1)>} : (f64, memref<2x3xf64>, index, index) -> ()
%7 = "arith.constant"() {value = 4.000000e+00 : f64} : () -> f64
"affine.store"(%7, %0, %2, %1) {map = affine_map<(d0, d1) -> (d0, d1)>} : (f64, memref<2x3xf64>, index, index) -> ()
%8 = "arith.constant"() {value = 5.000000e+00 : f64} : () -> f64
"affine.store"(%8, %0, %2, %2) {map = affine_map<(d0, d1) -> (d0, d1)>} : (f64, memref<2x3xf64>, index, index) -> ()
%9 = "arith.constant"() {value = 6.000000e+00 : f64} : () -> f64
"affine.store"(%9, %0, %2, %3) {map = affine_map<(d0, d1) -> (d0, d1)>} : (f64, memref<2x3xf64>, index, index) -> ()
%10 = "toy.constant"() {value = dense<[[1.000000e+00, 2.000000e+00, 3.000000e+00], [4.000000e+00, 5.000000e+00, 6.000000e+00]]> : tensor<2x3xf64>} : () -> tensor<2x3xf64>
%11 = "toy.transpose"(%10) : (tensor<2x3xf64>) -> tensor<3x2xf64>
%12 = "toy.mul"(%11, %11) : (tensor<3x2xf64>, tensor<3x2xf64>) -> tensor<3x2xf64>
"toy.print"(%12) : (tensor<3x2xf64>) -> ()
"memref.dealloc"(%0) : (memref<2x3xf64>) -> ()
"toy.return"() : () -> ()
}) {function_type = () -> (), sym_name = "main"} : () -> ()
The important bit here is that %0 now holds the memory location for the 2D array we want to transpose. When the transpose op gets lowered, we see this intermediate MLIR representation:
"func.func"() ({
%0 = "memref.alloc"() {operand_segment_sizes = array<i32: 0, 0>} : () -> memref<3x2xf64>
%1 = "memref.alloc"() {operand_segment_sizes = array<i32: 0, 0>} : () -> memref<2x3xf64>
%2 = "arith.constant"() {value = 0 : index} : () -> index
%3 = "arith.constant"() {value = 1 : index} : () -> index
%4 = "arith.constant"() {value = 2 : index} : () -> index
%5 = "arith.constant"() {value = 1.000000e+00 : f64} : () -> f64
"affine.store"(%5, %1, %2, %2) {map = affine_map<(d0, d1) -> (d0, d1)>} : (f64, memref<2x3xf64>, index, index) -> ()
%6 = "arith.constant"() {value = 2.000000e+00 : f64} : () -> f64
"affine.store"(%6, %1, %2, %3) {map = affine_map<(d0, d1) -> (d0, d1)>} : (f64, memref<2x3xf64>, index, index) -> ()
%7 = "arith.constant"() {value = 3.000000e+00 : f64} : () -> f64
"affine.store"(%7, %1, %2, %4) {map = affine_map<(d0, d1) -> (d0, d1)>} : (f64, memref<2x3xf64>, index, index) -> ()
%8 = "arith.constant"() {value = 4.000000e+00 : f64} : () -> f64
"affine.store"(%8, %1, %3, %2) {map = affine_map<(d0, d1) -> (d0, d1)>} : (f64, memref<2x3xf64>, index, index) -> ()
%9 = "arith.constant"() {value = 5.000000e+00 : f64} : () -> f64
"affine.store"(%9, %1, %3, %3) {map = affine_map<(d0, d1) -> (d0, d1)>} : (f64, memref<2x3xf64>, index, index) -> ()
%10 = "arith.constant"() {value = 6.000000e+00 : f64} : () -> f64
"affine.store"(%10, %1, %3, %4) {map = affine_map<(d0, d1) -> (d0, d1)>} : (f64, memref<2x3xf64>, index, index) -> ()
%11 = "toy.constant"() {value = dense<[[1.000000e+00, 2.000000e+00, 3.000000e+00], [4.000000e+00, 5.000000e+00, 6.000000e+00]]> : tensor<2x3xf64>} : () -> tensor<2x3xf64>
"affine.for"() ({
^bb0(%arg0: index):
"affine.for"() ({
^bb0(%arg1: index):
%14 = "affine.load"(%1, %arg1, %arg0) {map = affine_map<(d0, d1) -> (d0, d1)>} : (memref<2x3xf64>, index, index) -> f64
"affine.store"(%14, %0, %arg0, %arg1) {map = affine_map<(d0, d1) -> (d0, d1)>} : (f64, memref<3x2xf64>, index, index) -> ()
"affine.yield"() : () -> ()
}) {lower_bound = affine_map<() -> (0)>, step = 1 : index, upper_bound = affine_map<() -> (2)>} : () -> ()
"affine.yield"() : () -> ()
}) {lower_bound = affine_map<() -> (0)>, step = 1 : index, upper_bound = affine_map<() -> (3)>} : () -> ()
%12 = "toy.transpose"(%11) : (tensor<2x3xf64>) -> tensor<3x2xf64>
%13 = "toy.mul"(%12, %12) : (tensor<3x2xf64>, tensor<3x2xf64>) -> tensor<3x2xf64>
"toy.print"(%13) : (tensor<3x2xf64>) -> ()
"memref.dealloc"(%1) : (memref<2x3xf64>) -> ()
"memref.dealloc"(%0) : (memref<3x2xf64>) -> ()
"toy.return"() : () -> ()
}) {function_type = () -> (), sym_name = "main"} : () -> ()
Here, I note that %0 now holds the memory location for the result of the transpose operation, and %1 holds the memory location of the original matrix. That’s fine, because I know that the memory allocation for the result of the transpose op gets inserted at the head of the basic block.
But I also note that in the lowered transpose operation, the loop nest knows to perform the affine.load operation on the memory location in %1 and store the result into %0. My main question is: how does the MLIR pass know that the input to the transpose operation is %1? Before the transpose op was lowered, the printed MLIR assembly said that the argument to the transpose op was %10. So how did the pass know to look for the input in %1 instead?
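For reference, here is (roughly, paraphrased from memory) the TransposeOpLowering pattern from the Ch 5 tutorial code that I’ve been staring at. My guess is that the operands argument to matchAndRewrite is where this remapping happens, but I don’t understand the mechanism behind it:

```cpp
/// Sketch of the tutorial's lowering pattern for toy.transpose.
/// Note the `operands` parameter: it is supplied by the conversion
/// framework, separately from op->getOperands().
struct TransposeOpLowering : public mlir::ConversionPattern {
  TransposeOpLowering(mlir::MLIRContext *ctx)
      : ConversionPattern(toy::TransposeOp::getOperationName(), 1, ctx) {}

  mlir::LogicalResult
  matchAndRewrite(mlir::Operation *op, llvm::ArrayRef<mlir::Value> operands,
                  mlir::ConversionPatternRewriter &rewriter) const final {
    auto loc = op->getLoc();
    lowerOpToLoops(op, operands, rewriter,
                   [loc](mlir::OpBuilder &builder,
                         mlir::ValueRange memRefOperands,
                         mlir::ValueRange loopIvs) {
                     // `memRefOperands` are the (seemingly remapped)
                     // operands — the memref from the lowered
                     // toy.constant, not the original tensor value.
                     toy::TransposeOpAdaptor adaptor(memRefOperands);
                     mlir::Value input = adaptor.getInput();

                     // Transpose by reversing the loop induction variables.
                     llvm::SmallVector<mlir::Value, 2> reverseIvs(
                         llvm::reverse(loopIvs));
                     return builder.create<mlir::affine::AffineLoadOp>(
                         loc, input, reverseIvs);
                   });
    return mlir::success();
  }
};
```

So it looks like the pattern never touches %10 directly; it only ever sees whatever the framework passes in through operands. Is that where the %10-to-%1 substitution happens?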
I am still very new to compiler theory and the LLVM/MLIR framework, so a theory-heavy explanation may go over my head. (I’m also not sure one is needed here, but I felt the need to put this disclaimer out there.)
Thanks for your help!