[llvm-dev] MatchLoadCombine(): handling for vectorized loop.
Friedman, Eli via llvm-dev llvm-dev at lists.llvm.org
Mon Dec 3 15:37:29 PST 2018
On 12/3/2018 8:20 AM, Jonas Paulsson wrote:
> Hi,
>
> I have noticed some loops that build a wider element by loading small elements, zero-extending them, shifting them (by different amounts), and then 'or'-ing them all together. These are equivalent either to a wider load or to a byte-swapped one. DAGCombiner::MatchLoadCombine() will combine this into a single wide load, but only in the scalar cases of i16, i32 and i64. The result is that these loops (I have seen a dozen or so in SPEC) get vectorized with a lot of ugly code.
>
> I have begun to experiment with handling the vectorized loop as well, and would like to know whether people think this would be a good idea. Also, am I right to assume that it probably should be run before type legalization?

You mean, trying to merge some combination of vector loads and shuffles into a single vector load in DAGCombine? That seems sort of late, given the cost modeling involved in vectorization.
See also http://lists.llvm.org/pipermail/llvm-dev/2018-February/121000.html ?
-Eli
-- Employee of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
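
For readers unfamiliar with the pattern under discussion, here is a minimal sketch of the kind of source loop the quoted message describes. It is not code from either message; the function names (load_le32, load_be32, assemble_le32) are hypothetical and chosen only for illustration. Each helper loads four bytes, zero-extends them, shifts each by a different amount, and ORs the results. On a little-endian target the first form is equivalent to a plain 32-bit load and the second to a byte-swapped load; on a big-endian target the roles are reversed.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: assemble a 32-bit value from four bytes,
// least significant byte first (little-endian order).
std::uint32_t load_le32(const std::uint8_t *p) {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical helper: same bytes, most significant byte first
// (big-endian order).
std::uint32_t load_be32(const std::uint8_t *p) {
    return (static_cast<std::uint32_t>(p[0]) << 24)
         | (static_cast<std::uint32_t>(p[1]) << 16)
         | (static_cast<std::uint32_t>(p[2]) << 8)
         |  static_cast<std::uint32_t>(p[3]);
}

// The loop form: each iteration repeats the load/zext/shift/or pattern,
// which makes the loop a candidate for vectorization.
void assemble_le32(const std::uint8_t *src, std::uint32_t *dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = load_le32(src + 4 * i);
}
```

The scalar helpers are the shape the quoted message says is already combined for i16, i32 and i64; the question in the thread concerns what happens once a loop like assemble_le32 has been vectorized.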