Non hyper-rectangular loop tiling in the Affine dialect (original) (raw)
Hi Uday,
Thank you very much for your very detailed reply! I’m indeed grateful for your insight on this matter, regarding polyhedral analysis and automatic loop tiling.
I’ve been thinking through your comments once I got it, and here are my intuitive thoughts, hope you won’t mind:
First of all, about my project -
It is partially true. I’m currently working on transforming polyhedral models to hardware descriptions described by synchronous data-flow (part of my PhD thesis), which should take into account all the tiling opportunities that can be represented by mainstream polyhedral works. Non hyper-rectangular successive loop nests is one case that I’m thinking of at the moment.
I could definitely establish my work on well-developed polyhedral projects, e.g., PolyMage, but since MLIR is a much bigger infrastructure that covers more applications, and it is an objective to rebuild existing polyhedral methods in MLIR (based on your IMPACT '20 keynote), I feel that it is a greater opportunity to contirbute to the polyhedral community and get wider potential audience group of my work.
Now back to this specific question on non hyper-rectangular tiling -
This is a great suggestion. I just fininished reading that paper (by Goumas et al.) and it is very nicely written. One of its contributions that uses non-unimodular transformation to transform the iteration space to hyper-rectangular can be useful to address this problem.
But it might not be very straightforward to put this into MLIR, as far as what I’ve discovered. The current tiling implementation in affine, please correct me if I’m wrong, is ad-hoc and doesn’t explicitly formulates the affine transformation (using matrices) for tiling. For instance, in LoopTiling.cpp, we basically insert loops for tiles and update existing loop nests’ bounds, by ad-hoc rules, and then derive a concise set of constraints by Fourier-Motzkin elimination. To support non hyper-rectangular tiling, we might need to add an additional step to encode the loop transformation in a generic form (e.g., matrix) before altering the AST (please do let me know if there is one already!), and in this encoding step we can support various kinds of tiling cases, including non hyper-rectangular. This design might benefit future polyhedral method implementation as well.
I definitely agree with this. I’ve only tried out to reproduce some examples in your PLUTO paper by CLooG though, but I assume using ISL and other established polyhedral codegen tools will be very easy for this task 
Actually, regarding the polyhedral approach, I do have an additional question on the current affine implementation of tiling. As far as I know, it doesn’t take care of dependence and it may produce illegal code.
For example, I’ve created a short testcast based on the example in the Supernode Partitioning paper:
func @diagonal(%A: memref<?x?xf32>) -> () {
%c0 = constant 0 : index
%c1 = constant 1 : index
%M = dim %A, %c0: memref<?x?xf32>
%N = dim %A, %c1: memref<?x?xf32>
affine.for %i = 0 to %M {
affine.for %j = 0 to %N {
%0 = affine.load %A[%i, %j]: memref<?x?xf32>
%1 = affine.load %A[%i + %c1, %j]: memref<?x?xf32>
%2 = affine.load %A[%i, %j + %c1]: memref<?x?xf32>
%3 = affine.load %A[%i - %c1, %j]: memref<?x?xf32>
%4 = affine.load %A[%i, %j - %c1]: memref<?x?xf32>
%5 = addf %0, %1 : f32
%6 = addf %5, %2 : f32
%7 = addf %6, %3 : f32
%8 = addf %7, %4 : f32
affine.store %8, %A[%i, %j] : memref<?x?xf32>
affine.yield
}
}
return
}
It’s ideal tiling scheme would be partitioning the itertaion space into diamond shape, but if I run mlir-opt with -affine-loop-tile="tile-size=32" I simply get a two additional outer loops over tiles, which obviously means the tiles are rectangular and the dependences are violated.
One possible solution is to use the PLUTO algorithm, which ensures the tiling legality is satisfied. But I’m not sure whether this is somewhere in MLIR, or it will be implemented somehow in the future.
I will be happy to spend some time to extend the current affine tiling method to support non hyper-rectangular iteration space and try to take into account dependence when tiling. Also, I’m interested to know more about the exploratory project you’ve mentioned about getting rid of redundancy 
Please let me know if you have any further suggestions or comments!
Many thanks
Ruizhe