[llvm-dev] [arm, aarch64] Alignment checking in interleaved access pass

Renato Golin via llvm-dev llvm-dev at lists.llvm.org
Fri Oct 14 07:38:31 PDT 2016


On 10 October 2016 at 22:16, Alina Sbirlea <alina.sbirlea at gmail.com> wrote:

IMO, it makes sense to have Halide generate this instead:

    %114 = shufflevector <16 x i32> %112, <16 x i32> %113, <16 x i32> <i32 0, i32 8, i32 16, i32 24, i32 1, i32 9, i32 17, i32 25, i32 2, i32 10, i32 18, i32 26, i32 3, i32 11, i32 19, i32 27>
    store <16 x i32> %114, <16 x i32>* %sunkaddr262
    %115 = shufflevector <16 x i32> %112, <16 x i32> %113, <16 x i32> <i32 4, i32 12, i32 20, i32 28, i32 5, i32 13, i32 21, i32 29, i32 6, i32 14, i32 22, i32 30, i32 7, i32 15, i32 23, i32 31>
    store <16 x i32> %115, <16 x i32>* %scevgep241

With the changes from the patch, this translates to the code above, and it is arch independent.
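For readers following the masks: a minimal Python sketch of how the two shuffle masks can be derived, assuming (this is not stated in the thread) that %112 and %113 each hold two 8-element fields back to back, so the 32-element concatenation is four fields of 8 lanes. The helper names interleave_mask and shufflevector are hypothetical and only model the IR semantics; they are not LLVM APIs.

```python
def interleave_mask(factor, lanes, start, count):
    """Mask interleaving `factor` fields of `lanes` elements each,
    restricted to lanes [start, start + count)."""
    return [field * lanes + lane
            for lane in range(start, start + count)
            for field in range(factor)]


def shufflevector(a, b, mask):
    """Model of the IR instruction: index into the concatenation of a and b."""
    concat = a + b
    return [concat[i] for i in mask]


FACTOR, LANES = 4, 8  # assumed: 4-way interleave of 8-lane fields

low_mask = interleave_mask(FACTOR, LANES, start=0, count=4)   # mask for %114
high_mask = interleave_mask(FACTOR, LANES, start=4, count=4)  # mask for %115

# Stand-ins for %112 and %113 (assumed layout: fields a,b and c,d).
v112 = [f"a{i}" for i in range(LANES)] + [f"b{i}" for i in range(LANES)]
v113 = [f"c{i}" for i in range(LANES)] + [f"d{i}" for i in range(LANES)]

low = shufflevector(v112, v113, low_mask)    # lanes 0..3 of all four fields
high = shufflevector(v112, v113, high_mask)  # lanes 4..7 of all four fields
```

Under that assumption, each 16-element result carries four lanes of all four fields, which is the shape a single factor-4 interleaved store consumes.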

Right, this makes sense.

This should generate 2 VST4/ST4, which together will be contiguous, but not individually.

Yes, I did that with some of the code generated by Halide; it's what led to patch D23646 to extend the patterns. The new code being generated is the "expected" one.

I have added some comments on the review, but I think overall, it makes sense and it's a much simpler patch than I was expecting to find working all the way to the end. :)

Also, benchmarking some of their apps showed that LLVM's pass (after the patch) does the job as well as the custom code generation they were using before. (Note that Halide's code generation was written before the interleaved access pass was added, so it made sense at the time.)

Nice!

cheers, --renato


