[llvm-dev] MatchLoadCombine(): handling for vectorized loop.

Friedman, Eli via llvm-dev llvm-dev at lists.llvm.org
Mon Dec 3 15:37:29 PST 2018


On 12/3/2018 8:20 AM, Jonas Paulsson wrote:

> Hi,
>
> I have noticed some loops that build a wider element by loading small elements, zero-extending them, shifting them (with different amounts), and then 'or'-ing them all together. They are equivalent either to a wider load or to a byte-swapped one. DAGCombiner::MatchLoadCombine() will combine this into a single wide load, but only in the scalar cases of i16, i32 and i64. The result is that these loops (I have seen a dozen or so on SPEC) get vectorized with a lot of ugly code. I have begun to experiment with handling the vectorized loop as well, and would like to know whether people think this would be a good idea. Also, am I right to assume that it should probably be run before type legalization?

You mean, trying to merge some combination of vector loads and shuffles into a single vector load in DAGCombine? That seems sort of late, given the cost modeling involved in vectorization.
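For readers unfamiliar with the pattern under discussion, here is a minimal C++ sketch of the scalar idiom described in the quoted message: narrow loads that are zero-extended, shifted by different amounts, and or-ed together. All function names below are hypothetical and chosen for illustration; none of them come from LLVM. The sketch only demonstrates that the byte-wise forms compute the same value as a single wide load (or a byte-swapped one, depending on host endianness). It makes no claim about what code a particular compiler or target will emit.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: assemble a 32-bit value from four bytes,
// least significant byte first (little-endian order).
uint32_t load_le32_bytewise(const uint8_t *p) {
    return  static_cast<uint32_t>(p[0])          // zero-extend, shift by 0
         | (static_cast<uint32_t>(p[1]) << 8)    // zero-extend, shift by 8
         | (static_cast<uint32_t>(p[2]) << 16)
         | (static_cast<uint32_t>(p[3]) << 24);
}

// Hypothetical helper: the same idea with the shift amounts reversed
// (big-endian order).
uint32_t load_be32_bytewise(const uint8_t *p) {
    return (static_cast<uint32_t>(p[0]) << 24)
         | (static_cast<uint32_t>(p[1]) << 16)
         | (static_cast<uint32_t>(p[2]) << 8)
         |  static_cast<uint32_t>(p[3]);
}

// A single wide load in host byte order (memcpy avoids alignment and
// aliasing problems).
uint32_t load_native32(const uint8_t *p) {
    uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Portable byte swap of a 32-bit value.
uint32_t bswap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Runtime check of host endianness.
bool host_is_little_endian() {
    const uint16_t probe = 1;
    uint8_t first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// The loop form: each iteration builds one wide element from four bytes.
// This is the kind of loop that may get vectorized before the scalar
// pattern is recognized.
void assemble_le32_loop(const uint8_t *src, uint32_t *dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = load_le32_bytewise(src + 4 * i);
}
```

On a little-endian host, `load_le32_bytewise(p)` equals `load_native32(p)` and `load_be32_bytewise(p)` equals `bswap32(load_native32(p))`; on a big-endian host the roles are swapped. That equivalence is what makes the scalar pattern foldable into one load, and the loop variant is the case the thread asks about.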

See also http://lists.llvm.org/pipermail/llvm-dev/2018-February/121000.html ?

-Eli

-- Employee of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


