How do arguments from a dialect operation get passed to lowered operations? (MLIR Toy tutorial)

NOTE: I’ve just moved this from the Beginner group to the MLIR group to hopefully try to get more visibility. If I should have kept this in Beginner please let me know!

For context, I’m working in Ch 5 of the Toy tutorial.

The tutorial gives an example of lowering the transpose operation. Here is the Toy input being lowered:

toy.func @main() {
  %0 = toy.constant dense<[[1.000000e+00, 2.000000e+00, 3.000000e+00], [4.000000e+00, 5.000000e+00, 6.000000e+00]]> : tensor<2x3xf64>
  %2 = toy.transpose(%0 : tensor<2x3xf64>) to tensor<3x2xf64>
  %3 = toy.mul %2, %2 : tensor<3x2xf64>
  toy.print %3 : tensor<3x2xf64>
  toy.return
}

In this example, we see that %0 is the input to the transpose operation.

When toy.constant gets lowered, I see that the intermediate MLIR representation now looks like this:

"func.func"() ({
  %0 = "memref.alloc"() {operand_segment_sizes = array<i32: 0, 0>} : () -> memref<2x3xf64>
  %1 = "arith.constant"() {value = 0 : index} : () -> index
  %2 = "arith.constant"() {value = 1 : index} : () -> index
  %3 = "arith.constant"() {value = 2 : index} : () -> index
  %4 = "arith.constant"() {value = 1.000000e+00 : f64} : () -> f64
  "affine.store"(%4, %0, %1, %1) {map = affine_map<(d0, d1) -> (d0, d1)>} : (f64, memref<2x3xf64>, index, index) -> ()
  %5 = "arith.constant"() {value = 2.000000e+00 : f64} : () -> f64
  "affine.store"(%5, %0, %1, %2) {map = affine_map<(d0, d1) -> (d0, d1)>} : (f64, memref<2x3xf64>, index, index) -> ()
  %6 = "arith.constant"() {value = 3.000000e+00 : f64} : () -> f64
  "affine.store"(%6, %0, %1, %3) {map = affine_map<(d0, d1) -> (d0, d1)>} : (f64, memref<2x3xf64>, index, index) -> ()
  %7 = "arith.constant"() {value = 4.000000e+00 : f64} : () -> f64
  "affine.store"(%7, %0, %2, %1) {map = affine_map<(d0, d1) -> (d0, d1)>} : (f64, memref<2x3xf64>, index, index) -> ()
  %8 = "arith.constant"() {value = 5.000000e+00 : f64} : () -> f64
  "affine.store"(%8, %0, %2, %2) {map = affine_map<(d0, d1) -> (d0, d1)>} : (f64, memref<2x3xf64>, index, index) -> ()
  %9 = "arith.constant"() {value = 6.000000e+00 : f64} : () -> f64
  "affine.store"(%9, %0, %2, %3) {map = affine_map<(d0, d1) -> (d0, d1)>} : (f64, memref<2x3xf64>, index, index) -> ()
  %10 = "toy.constant"() {value = dense<[[1.000000e+00, 2.000000e+00, 3.000000e+00], [4.000000e+00, 5.000000e+00, 6.000000e+00]]> : tensor<2x3xf64>} : () -> tensor<2x3xf64>
  %11 = "toy.transpose"(%10) : (tensor<2x3xf64>) -> tensor<3x2xf64>
  %12 = "toy.mul"(%11, %11) : (tensor<3x2xf64>, tensor<3x2xf64>) -> tensor<3x2xf64>
  "toy.print"(%12) : (tensor<3x2xf64>) -> ()
  "memref.dealloc"(%0) : (memref<2x3xf64>) -> ()
  "toy.return"() : () -> ()
}) {function_type = () -> (), sym_name = "main"} : () -> ()

I note that the important bit here is that %0 now holds the memory location for the 2D array we want to transpose.

When the transpose op gets lowered, we see this intermediate MLIR representation:

"func.func"() ({
  %0 = "memref.alloc"() {operand_segment_sizes = array<i32: 0, 0>} : () -> memref<3x2xf64>
  %1 = "memref.alloc"() {operand_segment_sizes = array<i32: 0, 0>} : () -> memref<2x3xf64>
  %2 = "arith.constant"() {value = 0 : index} : () -> index
  %3 = "arith.constant"() {value = 1 : index} : () -> index
  %4 = "arith.constant"() {value = 2 : index} : () -> index
  %5 = "arith.constant"() {value = 1.000000e+00 : f64} : () -> f64
  "affine.store"(%5, %1, %2, %2) {map = affine_map<(d0, d1) -> (d0, d1)>} : (f64, memref<2x3xf64>, index, index) -> ()
  %6 = "arith.constant"() {value = 2.000000e+00 : f64} : () -> f64
  "affine.store"(%6, %1, %2, %3) {map = affine_map<(d0, d1) -> (d0, d1)>} : (f64, memref<2x3xf64>, index, index) -> ()
  %7 = "arith.constant"() {value = 3.000000e+00 : f64} : () -> f64
  "affine.store"(%7, %1, %2, %4) {map = affine_map<(d0, d1) -> (d0, d1)>} : (f64, memref<2x3xf64>, index, index) -> ()
  %8 = "arith.constant"() {value = 4.000000e+00 : f64} : () -> f64
  "affine.store"(%8, %1, %3, %2) {map = affine_map<(d0, d1) -> (d0, d1)>} : (f64, memref<2x3xf64>, index, index) -> ()
  %9 = "arith.constant"() {value = 5.000000e+00 : f64} : () -> f64
  "affine.store"(%9, %1, %3, %3) {map = affine_map<(d0, d1) -> (d0, d1)>} : (f64, memref<2x3xf64>, index, index) -> ()
  %10 = "arith.constant"() {value = 6.000000e+00 : f64} : () -> f64
  "affine.store"(%10, %1, %3, %4) {map = affine_map<(d0, d1) -> (d0, d1)>} : (f64, memref<2x3xf64>, index, index) -> ()
  %11 = "toy.constant"() {value = dense<[[1.000000e+00, 2.000000e+00, 3.000000e+00], [4.000000e+00, 5.000000e+00, 6.000000e+00]]> : tensor<2x3xf64>} : () -> tensor<2x3xf64>
  "affine.for"() ({
  ^bb0(%arg0: index):
    "affine.for"() ({
    ^bb0(%arg1: index):
      %14 = "affine.load"(%1, %arg1, %arg0) {map = affine_map<(d0, d1) -> (d0, d1)>} : (memref<2x3xf64>, index, index) -> f64
      "affine.store"(%14, %0, %arg0, %arg1) {map = affine_map<(d0, d1) -> (d0, d1)>} : (f64, memref<3x2xf64>, index, index) -> ()
      "affine.yield"() : () -> ()
    }) {lower_bound = affine_map<() -> (0)>, step = 1 : index, upper_bound = affine_map<() -> (2)>} : () -> ()
    "affine.yield"() : () -> ()
  }) {lower_bound = affine_map<() -> (0)>, step = 1 : index, upper_bound = affine_map<() -> (3)>} : () -> ()
  %12 = "toy.transpose"(%11) : (tensor<2x3xf64>) -> tensor<3x2xf64>
  %13 = "toy.mul"(%12, %12) : (tensor<3x2xf64>, tensor<3x2xf64>) -> tensor<3x2xf64>
  "toy.print"(%13) : (tensor<3x2xf64>) -> ()
  "memref.dealloc"(%1) : (memref<2x3xf64>) -> ()
  "memref.dealloc"(%0) : (memref<3x2xf64>) -> ()
  "toy.return"() : () -> ()
}) {function_type = () -> (), sym_name = "main"} : () -> ()

Here, I note that %0 now holds the memory location for the result of the transpose operation, and %1 holds the memory location of the original matrix. That’s fine, because I know that the memory allocation for the result of the transpose op gets inserted at the head of the basic block.
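To make the "inserted at the head of the basic block" point concrete, here is a minimal sketch of that placement rule. It is only a toy model: the block is a plain Python list of strings, and `insert_alloc_and_dealloc` and the `%bufN` names are hypothetical, not anything from MLIR or the Toy tutorial sources.

```python
# Toy model only: a "block" is a list of op strings, not an MLIR Block.
# The function name and the %bufN naming scheme are made up for illustration.

def insert_alloc_and_dealloc(block, memref_type):
    """Put an alloc at the head of the block and a dealloc before the terminator.

    `block` is a list of strings whose last element is the terminator.
    Returns the hypothetical name given to the new buffer.
    """
    # Name the buffer after how many allocs the block already contains.
    name = f"%buf{sum(1 for op in block if 'memref.alloc' in op)}"
    # New allocations always go to the very front of the block.
    block.insert(0, f"{name} = memref.alloc() : {memref_type}")
    # The terminator is the last op, so the dealloc goes just before it.
    block.insert(len(block) - 1, f"memref.dealloc {name} : {memref_type}")
    return name


block = ["%c = toy.constant", "%t = toy.transpose %c", "toy.return"]
insert_alloc_and_dealloc(block, "memref<2x3xf64>")  # buffer for the constant
insert_alloc_and_dealloc(block, "memref<3x2xf64>")  # buffer for the transpose result
print("\n".join(block))
```

Running this places the 3x2 allocation ahead of the 2x3 one, which matches the order in the dump above. Since the printed `%N` names are assigned in order of appearance when the IR is printed, an op inserted at the head shifts every later number by one, so the constant's buffer goes from %0 to %1.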

But in the lowered transpose operation, the loop nest performs the affine.load on the memory location in %1 and stores the loaded value into %0. My main question is: how does the MLIR pass know that the input to the transpose operation is %1? In the IR printed before the transpose op was lowered, the operand of toy.transpose is %10 (%11 in the second dump). So how did it know to look for the input in %1?
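One way to picture the kind of bookkeeping that could connect the two values is a table from each original value to whatever replaced it, consulted whenever a later op is lowered. The sketch below is a guess at the general shape, not the actual mechanism: `ValueMapping`, `lower_transpose`, and the value names are all hypothetical, and nothing here is an MLIR API.

```python
# Toy model only: values are plain strings and the mapping is a dict.
# ValueMapping and lower_transpose are hypothetical names, not MLIR APIs.

class ValueMapping:
    """Remembers which new value stands in for each replaced value."""

    def __init__(self):
        self._map = {}

    def replace(self, old, new):
        # Record that `old` has been lowered and `new` now holds its data.
        self._map[old] = new

    def lookup(self, value):
        # Follow replacements until reaching a value that was never replaced.
        while value in self._map:
            value = self._map[value]
        return value


def lower_transpose(operand, result_buffer, mapping):
    # The lowering never uses `operand` directly; it asks the mapping first.
    src = mapping.lookup(operand)
    return [
        f"affine.load {src}[j, i]",
        f"affine.store -> {result_buffer}[i, j]",
    ]


mapping = ValueMapping()
# Lowering toy.constant records that its tensor result now lives in a buffer.
mapping.replace("%toy_const", "%buf0")
print(lower_transpose("%toy_const", "%buf1", mapping))
```

In this model the old `toy.constant` result can still appear in the printed IR while the loop nest already reads from the buffer, which is the situation in the second dump. Whether the real pass works this way is exactly what the question is asking.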

I am still very new to compiler theory and the LLVM/MLIR framework, so I may not understand a compiler theory-heavy explanation. (I’m also not sure that one is needed, but I feel the need to put this disclaimer out there.)

Thanks for your help!