Aggressive bufferization?

November 20, 2025, 6:00am 1

My question is about bufferizing tiled IR represented by scf.for:

scf.for(...) {
 %0 = extract_slice(input)
 %1 = xx.op
 insert_slice(%1, output)
}

After bufferization, a copy is produced:

scf.for(...) {
 subview
 xx.op
 copy
}

I think this result is conservative.

When there is no data race between input and output, the copy is NOT needed:

scf.for(...) {
 %0 = subview(input)
 %1 = subview(output)
 xx.op(%0, %1)

}

However, insert_slice is always bufferized to memref.copy.

Could anyone with experience of this share some insight into how to improve bufferization?

tensor.insert_slice always bufferizes to a memref.copy, but it may be a no-op. Try running CSE and the canonicalizer to see if the copy disappears.

If not: the offsets, sizes, and strides of the extract_slice/insert_slice pair must match. If they don't, you will most likely see a copy.

FullZing November 21, 2025, 7:02am 3

Thank you for your answer.

Could you please point me to the code that removes the memref.copy?

I'd like to check under what conditions the copy becomes a no-op.

I found the FoldCopyOfCast, FoldSelfCopy and FoldEmptyCopy patterns.

It seems that none of them covers the case in the sample above.

So I would appreciate it if you could tell me where the code is (copy → no-op).

Many thanks.

Example: memref.copy %a, %a

If your IR does not look like that after CSE, the copy won’t fold away. The pattern is probably FoldSelfCopy.

FullZing November 26, 2025, 6:22am 5

%0 = memref.alloc
%1 = memref.subview(%0)
%2 = memref.subview(%0)
memref.copy(%1, %2)

I can confirm that the projections of %1 and %2 don’t overlap.

Such a self-copy is not removed.

Is it reasonable to patch this case here?

The above IR should fold away when running -cse -canonicalize, assuming that the subviews have the same offsets, sizes and strides.
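
To make the ordering concrete, here is a toy Python model of the two steps. It is only a sketch and not MLIR's implementation: the `Alloc`, `Subview` and `Copy` classes and the `cse`, `fold_self_copy`, `has_copy` and `build` helpers are hypothetical names invented for illustration. The assumption it encodes is that the self-copy fold only fires when source and target are the same SSA value, so two structurally identical subviews first have to be merged by CSE.

```python
from typing import NamedTuple


class Alloc(NamedTuple):
    result: str


class Subview(NamedTuple):
    result: str
    source: str
    offsets: tuple
    sizes: tuple
    strides: tuple


class Copy(NamedTuple):
    source: str
    target: str


def cse(ops):
    """Toy CSE: merge subviews with identical source, offsets, sizes, strides."""
    seen = {}     # (source, offsets, sizes, strides) -> surviving result name
    renamed = {}  # eliminated result name -> surviving result name
    out = []
    for op in ops:
        if isinstance(op, Subview):
            src = renamed.get(op.source, op.source)
            key = (src, op.offsets, op.sizes, op.strides)
            if key in seen:
                # Duplicate subview: drop it and redirect its uses.
                renamed[op.result] = seen[key]
                continue
            seen[key] = op.result
            out.append(op._replace(source=src))
        elif isinstance(op, Copy):
            out.append(Copy(renamed.get(op.source, op.source),
                            renamed.get(op.target, op.target)))
        else:
            # Allocations are never merged in this model.
            out.append(op)
    return out


def fold_self_copy(ops):
    """Toy self-copy fold: drop copies whose source and target are the same value."""
    return [op for op in ops
            if not (isinstance(op, Copy) and op.source == op.target)]


def has_copy(ops):
    return any(isinstance(op, Copy) for op in ops)


def build(offsets_a, offsets_b):
    """Mirror the snippet above: one alloc, two subviews of it, one copy."""
    return [
        Alloc("%0"),
        Subview("%1", "%0", offsets_a, (4,), (1,)),
        Subview("%2", "%0", offsets_b, (4,), (1,)),
        Copy("%1", "%2"),
    ]


same = build((0,), (0,))       # identical offsets, sizes, strides
different = build((0,), (4,))  # offsets differ

print(has_copy(fold_self_copy(same)))            # True: fold alone keeps the copy
print(has_copy(fold_self_copy(cse(same))))       # False: CSE first, then the copy folds
print(has_copy(fold_self_copy(cse(different))))  # True: mismatched subviews keep the copy
```

In this model the fold alone never removes `Copy("%1", "%2")`, because `%1` and `%2` are distinct names even when they describe the same slice. Only after the duplicate subview is merged does the copy become a copy of a value onto itself.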

FullZing November 27, 2025, 9:39am 7

Thanks, I missed CSE.