A question about affine memory access analysis in mlir(affine or polygeist) (original) (raw)

October 3, 2025, 7:59am 1

I want use the below command to transform a c-style kernel_durbin function to mlir style, which located in the durbin.c (it is a case in polybench).
cgeist -o durbin.mlir durbin.c -O0 -g -S -memref-fullrank --raise-scf-to-affine --polyhedral-opt -DPOLYBENCH_TIME -DPOLYBENCH_USE_SCALAR_LB -I /usr/lib/gcc/x86_64-linux-gnu/12/include/ -I. -I../../../utilities ../../../utilities/polybench.c -include durbin.h --function="kernel_durbin"

void kernel_durbin(int n,
           DATA_TYPE POLYBENCH_1D(r,N,n),
           DATA_TYPE POLYBENCH_1D(y,N,n))
{
 DATA_TYPE z[N];
 DATA_TYPE alpha;
 DATA_TYPE beta;
 DATA_TYPE sum;

 int i,k;

#pragma scop
 y[0] = -r[0];
 beta = SCALAR_VAL(1.0);
 alpha = -r[0];

 for (k = 1; k < _PB_N; k++) {
   beta = (1-alpha*alpha)*beta;
   sum = SCALAR_VAL(0.0);
   for (i=0; i<k; i++) {
      sum += r[k-i-1]*y[i];
   }
   alpha = - (r[k] + sum)/beta;

   for (i=0; i<k; i++) {
      z[i] = y[i] + alpha*y[k-i-1];
   }
   for (i=0; i<k; i++) {
     y[i] = z[i];
   }
   y[k] = alpha;
 }
#pragma endscop

}

The result is :

#map = affine_map<(d0) -> (d0)>
func.func @kernel_durbin(%arg0: i32, %arg1: memref<2000xf64>, %arg2: memref<2000xf64>) attributes {llvm.linkage = #llvm.linkage<external>} {
  %cst = arith.constant 0.000000e+00 : f64
  %cst_0 = arith.constant 1.000000e+00 : f64
  %alloca = memref.alloca() : memref<f64>
  %0 = llvm.mlir.undef : f64
  affine.store %0, %alloca[] : memref<f64>
  %alloca_1 = memref.alloca() : memref<f64>
  affine.store %0, %alloca_1[] : memref<f64>
  %alloca_2 = memref.alloca() : memref<f64>
  affine.store %0, %alloca_2[] : memref<f64>
  %alloca_3 = memref.alloca() : memref<2000xf64>
  %1 = affine.load %arg1[0] : memref<2000xf64>
  %2 = arith.negf %1 : f64
  affine.store %2, %arg2[0] : memref<2000xf64>
  affine.store %cst_0, %alloca_1[] : memref<f64>
  %3 = affine.load %arg1[0] : memref<2000xf64>
  %4 = arith.negf %3 : f64
  affine.store %4, %alloca_2[] : memref<f64>
  affine.for %arg3 = 1 to 2000 {
    %5 = affine.load %alloca_2[] : memref<f64>
    %6 = arith.mulf %5, %5 : f64
    %7 = arith.subf %cst_0, %6 : f64
    %8 = affine.load %alloca_1[] : memref<f64>
    %9 = arith.mulf %7, %8 : f64
    affine.store %9, %alloca_1[] : memref<f64>
    affine.store %cst, %alloca[] : memref<f64>
    affine.for %arg4 = 0 to #map(%arg3) {
      %15 = affine.load %arg1[%arg3 - %arg4 - 1] : memref<2000xf64>
      %16 = affine.load %arg2[%arg4] : memref<2000xf64>
      %17 = arith.mulf %15, %16 : f64
      %18 = affine.load %alloca[] : memref<f64>
      %19 = arith.addf %18, %17 : f64
      affine.store %19, %alloca[] : memref<f64>
    }
    %10 = affine.load %arg1[%arg3] : memref<2000xf64>
    %11 = affine.load %alloca[] : memref<f64>
    %12 = arith.addf %10, %11 : f64
    %13 = arith.negf %12 : f64
    %14 = arith.divf %13, %9 : f64
    affine.store %14, %alloca_2[] : memref<f64>
    affine.for %arg4 = 0 to #map(%arg3) {
      %15 = affine.load %arg2[%arg4] : memref<2000xf64>
      %16 = affine.load %arg2[%arg3 - %arg4 - 1] : memref<2000xf64>
      # This line uses %14 SSA
      %17 = arith.mulf %14, %16 : f64
      %18 = arith.addf %15, %17 : f64
      affine.store %18, %alloca_3[%arg4] : memref<2000xf64>
    }
    affine.for %arg4 = 0 to #map(%arg3) {
      %15 = affine.load %alloca_3[%arg4] : memref<2000xf64>
      affine.store %15, %arg2[%arg4] : memref<2000xf64>
    }
    # This line uses %14 SSA(same to alloca_2)
    affine.store %14, %arg2[%arg3] : memref<2000xf64>
  }
  return
}

My question is: I want to use the polyhedron model to analyze the entire affine region from durbin.mlir. My default assumption is that all reads and writes within a loop must be performed via affine.load or affine.store, but I’ve discovered that this assumption doesn’t always hold true(for example in the mlir code, %17 = arith.mulf %14, %16 : f64 and affine.store %14, %arg2[%arg3] : memref<2000xf64> use SSA %14 directly (which is in another statement<because this have a loop body between def and use>), instead of by affine.load). I’d like to ask if there’s a simpler way (API or explicit rules) in cgeist, polygeist, or the affine dialect to directly obtain the affine relationships for all reads and writes within the entire loop (of course, this would also involve statement partitioning rules)?

ftynse October 3, 2025, 8:33am 2

My default assumption is that all reads and writes within a loop must be performed via affine.load or affine.store, but I’ve discovered that this assumption doesn’t always hold true

In the IR you posted, all loads and stores are performed by affine.load/store, as opposed to memref.load/store. So this assumption does hold.

I suspect what you actually mean is that you expected intermediate values to be stored into memory and then read back. In particular, that %14is stored and then re-loaded. While this is routinely at lowering to respect the C semantics of (scalar) variables being located on stack, and thus in memory, doing so is detrimental to performance and hinders some analyses. Compilers, including polygeist and clang/llvm, also routinely optimize away such stores and reloads in a pass called mem2reg. There’s one in polygeist you can invoke to remove allocas. If some of them remain, please let us know so we can fix that.

It is unclear what kind of analysis you want to do after the fact, but there is no notion of “statement” at the affine dialect level. We had to re-create equivalents of statements for the original polygeist paper, you can check section C.a of the paper for details. I’d also recommend reading up on the SSA form and what kind of optimization it allows for, e.g. in Rastello’s book - SSA-based Compiler Design | SpringerLink.

I’m sorry, my description wasn’t very clear. I’m attempting to analyze and optimize the affine region in func at the statement level (that is, I’m attempting to separate the statements from the affine region and analyze them similarly to ISL). Currently, this assumption holds true for the entire affine region, but when I try to separate the region into statements, it no longer holds for the statements, as I demonstrated in the example I provided. Perhaps it’s only false for the cgeist results, but I believe cgeist’s results are reasonable because there’s no notion of “statement” at the affine dialect level, as you mentioned.

I will try to fix it by mem2reg pass, thank you!

ftynse October 3, 2025, 3:42pm 5