[llvm-dev] LoopVectorizer: shufflevectors

Jonas Paulsson via llvm-dev llvm-dev at lists.llvm.org
Tue Sep 4 09:34:54 PDT 2018


Hi Renato,

> It's probably a lot simpler to improve the SystemZ model to "know" have the same arch flags / cost model completeness as the other targets.

I thought they were - anything particular in mind?

> The transformations done by the vectoriser are target-agnostic, but they still ask the targets if certain patterns are possible and profitable before doing so.

I have a patch for the LoopVectorizer that makes it recognize this particular case, where a load interleave group is used only by a store interleave group. The patch passes this as a flag to TTI.getInterleavedMemoryOpCost(), so that the target can return an accurate cost. In my experiments on SystemZ, I added the cost of shuffling the vector(s) only to the load group, so the store group did not get that cost at all.
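The accounting described above can be sketched as a toy cost function. This is only an illustration under stated assumptions: the struct, its fields, and the function name are hypothetical, they are not the actual patch and not the real signature of TTI.getInterleavedMemoryOpCost(), and the unit costs are made up.

```cpp
#include <cassert>

// Hypothetical description of one interleave group. None of these names
// exist in LLVM; they only model the idea discussed in this mail.
struct InterleaveGroupInfo {
  bool IsLoad;                  // true for a load group, false for a store group
  unsigned NumVectorMemOps;     // wide loads/stores needed for the whole group
  unsigned NumShuffles;         // shuffles needed to (de)interleave the members
  bool LoadOnlyFeedsStoreGroup; // the flag: load group is used only by a store group
};

// Toy cost: one unit per wide memory operation plus one unit per shuffle.
// When a load group feeds only a store group, the shuffles are charged once,
// on the load side, and the store group pays for its memory operations only.
unsigned interleavedMemoryOpCost(const InterleaveGroupInfo &G) {
  unsigned Cost = G.NumVectorMemOps;
  if (G.LoadOnlyFeedsStoreGroup && !G.IsLoad)
    return Cost; // store side of a load -> store pair: no shuffle cost
  return Cost + G.NumShuffles;
}
```

With this model, a load group with 2 wide loads and 4 shuffles costs 6, the store group paired with it costs only its 2 wide stores, and an unpaired store group with the same shape costs 6 again.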

This then made many more cases of interleaving happen (~450 cases on SPEC, IIRC). The only problem was that the SystemZ backend could not handle those shuffles well in all cases. To me that looked like something to be fixed at the IR level, and after discussions with Sanjay I got the impression that this was the case...

To me, this looks like something the LoopVectorizer is neglecting and should be combining. I suppose that with my patch for the Load -> Store groups, I could also add handling of recomputed indices, so that the load group produces a vector that fits the store group directly. But if I understand you correctly, even this is not so wise? And if so, then improving the SystemZ DAGCombiner is indeed the only alternative left, I guess...
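To make the "recomputed indices" idea concrete, here is a small standalone sketch of composing the de-interleave mask of a load group with the re-interleave mask of a store group. It assumes single-input shuffles over the concatenated members and uses plain index vectors; the helper names are hypothetical and this is not LoopVectorizer code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mask = std::vector<int>;

// De-interleave a wide vector of Factor * VF elements into the concatenation
// of its members: member 0 first, then member 1, and so on.
Mask deinterleaveMask(int Factor, int VF) {
  Mask M;
  for (int Member = 0; Member < Factor; ++Member)
    for (int K = 0; K < VF; ++K)
      M.push_back(K * Factor + Member);
  return M;
}

// Re-interleave the concatenation of Factor members of VF elements each.
Mask interleaveMask(int Factor, int VF) {
  Mask M;
  for (int K = 0; K < VF; ++K)
    for (int Member = 0; Member < Factor; ++Member)
      M.push_back(Member * VF + K);
  return M;
}

// Applying First and then Second selects In[First[Second[i]]] for lane i,
// so the two shuffles collapse into the single mask built here.
Mask composeMasks(const Mask &First, const Mask &Second) {
  Mask Out;
  for (int Idx : Second)
    Out.push_back(First[static_cast<std::size_t>(Idx)]);
  return Out;
}
```

For an interleave factor of 2 and VF 4, composing deinterleaveMask(2, 4) with interleaveMask(2, 4) gives the identity mask <0, 1, 2, 3, 4, 5, 6, 7>, so no shuffle is needed when the members are stored in the same order. If the store group swaps the two members, the store-side mask is <4, 0, 5, 1, 6, 2, 7, 3> and the composition is the single shuffle <1, 0, 3, 2, 5, 4, 7, 6>.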

> But in a target-agnostic compiler you need to "emulate" that using the three-step above: target info, cost model, ISel patterns.

But is having the cost functions available not enough to drive a later IR pass that optimizes the generated vector code? I mean, if the target indicated which shuffles were expensive, those could then easily be avoided.

Thanks,

Jonas
