Dorit Naishlos - Re: [rfc] new tree-codes/optabs for vectorization of non-unit-stride accesses
This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
- From: Dorit Naishlos
- To: Paul Brook
- Cc: gcc at gcc dot gnu dot org, Ira Rosen, Richard Henderson
- Date: Thu, 17 Nov 2005 17:56:29 +0200
- Subject: Re: [rfc] new tree-codes/optabs for vectorization of non-unit-stride accesses
Paul Brook paul@codesourcery.com wrote on 11/16/2005 05:03:47 PM:
On Wednesday 16 November 2005 14:35, Dorit Naishlos wrote:
We're going to commit to autovect-branch vectorization support for non-unit-stride accesses. We'd like to suggest a few new tree-codes/optabs in order to express the extraction and merging of elements from/to vectors.
Background: The new functionality is going to allow us to vectorize computations with strides that are a power-of-2, like in the example below, in which the real and imaginary parts are interleaved, and therefore each of the data-refs accesses data with stride 2:
  for (i = 0; i < n; i++) {
    tmp_re = in[2*i] * coefs[2*i] - in[2*i+1] * coefs[2*i+1];
    tmp_im = in[2*i] * coefs[2*i+1] + in[2*i+1] * coefs[2*i];
    out[2*i] = tmp_re;
    out[2*i+1] = tmp_im;
  }
What is generally going to happen is that, for a VF=4, we're going to:
(1) load this data from memory:
      vec_in1 = [re0,im0,re1,im1] = vload &in
      vec_in2 = [re2,im2,re3,im3] = vload &in[VF]
      (and similarly for the coefs array)
and then, because we're doing different operations on the odd and even elements, we need to
(2) arrange them into separate vectors:
      vec_in_re = [re0,re1,re2,re3] = extract_even (vec_in1, vec_in2)
      vec_in_im = [im0,im1,im2,im3] = extract_odd (vec_in1, vec_in2)
      (and similarly for the coefs array)
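To make step (2) concrete, here is a minimal scalar sketch in C of what extract_even and extract_odd are described as doing above. The function signatures, the float element type, and the use of plain arrays in place of vector registers are assumptions made for illustration; this is not the GCC implementation of the proposed tree-codes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar models of the two operations. v1 and v2 each
   hold vf elements, and out receives vf elements. Plain arrays stand
   in for vector registers. */
static void extract_even(const float *v1, const float *v2,
                         float *out, size_t vf)
{
    assert(vf % 2 == 0);
    size_t half = vf / 2;
    for (size_t k = 0; k < half; k++) {
        out[k]        = v1[2 * k];  /* even slots of the first vector  */
        out[half + k] = v2[2 * k];  /* even slots of the second vector */
    }
}

static void extract_odd(const float *v1, const float *v2,
                        float *out, size_t vf)
{
    assert(vf % 2 == 0);
    size_t half = vf / 2;
    for (size_t k = 0; k < half; k++) {
        out[k]        = v1[2 * k + 1];  /* odd slots of the first vector  */
        out[half + k] = v2[2 * k + 1];  /* odd slots of the second vector */
    }
}
```

With vf = 4, v1 = [re0,im0,re1,im1] and v2 = [re2,im2,re3,im3], extract_even fills out with [re0,re1,re2,re3] and extract_odd fills it with [im0,im1,im2,im3], matching the vectors named in step (2).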
Have you considered targets that support interleaved load/store instructions? I'm not sure if this is supported by existing targets, but in the next year there will be targets that can perform steps 1+2 in a single load-interleaved instruction.
I don't know of existing targets that have this capability - it usually requires explicit reordering. Anyhow, when that time comes, we can consider either adding a new tree-code for it (though it sounds like we're running short of tree-codes...) or detecting later on (combine?) that a {load,load,extract_even,extract_odd} sequence can be replaced by an "interleaved_load". (I assume this specialized load exists only for stride 2?)
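As a rough illustration of why the {load,load,extract_even,extract_odd} sequence could be replaced, the C sketch below models both forms on plain arrays for stride 2 and VF=4. The function names load_then_extract and interleaved_load are hypothetical; they do not correspond to any particular target instruction or to GCC internals.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { VF = 4 };  /* vectorization factor used in the example above */

/* Steps (1)+(2) as separate operations: two vector loads followed by
   extract_even/extract_odd. Arrays stand in for vector registers. */
static void load_then_extract(const float *mem, float *even, float *odd)
{
    float v1[VF], v2[VF];
    assert(mem && even && odd);
    memcpy(v1, mem, sizeof v1);       /* vload &mem[0]  */
    memcpy(v2, mem + VF, sizeof v2);  /* vload &mem[VF] */
    for (size_t k = 0; k < VF / 2; k++) {
        even[k]          = v1[2 * k];
        even[VF / 2 + k] = v2[2 * k];
        odd[k]           = v1[2 * k + 1];
        odd[VF / 2 + k]  = v2[2 * k + 1];
    }
}

/* Hypothetical fused form: one stride-2 de-interleaving load that
   reads 2*VF consecutive elements and splits them directly. */
static void interleaved_load(const float *mem, float *even, float *odd)
{
    assert(mem && even && odd);
    for (size_t k = 0; k < VF; k++) {
        even[k] = mem[2 * k];
        odd[k]  = mem[2 * k + 1];
    }
}
```

Both functions produce the same even and odd vectors for the same 2*VF input elements, which is the property a later pass would rely on to substitute one for the other.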
thanks, dorit
Paul