[LLVMdev] SLP vectorizer on AVX feature

cbergstrom at pathscale.com
Wed Jul 1 13:30:46 PDT 2015


Is there a patch that will get upstreamed?

Original Message
From: Frank Winter
Sent: Thursday, July 2, 2015 03:29
To: Renato Golin
Cc: LLVM Dev
Subject: Re: [LLVMdev] SLP vectorizer on AVX feature

Hi Renato,

There were two follow-up emails; the issue is solved. The SLP vectorizer has a magic number built into the code that determines the maximum vector width to search for. It was set to 128 bits. Increasing it to 256 bits solved the issue.
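The effect of such a width cap on the element count can be sketched as follows. This is only an illustration, not code from LLVM: `lanesFor` is a hypothetical helper, and the only assumption is that the lane count is the capped register width divided by the element width.

```cpp
#include <cassert>

// Hypothetical helper (not part of LLVM): how many elements of
// `elemBits` bits fit into a vector whose width is capped at `maxVecBits`.
constexpr unsigned lanesFor(unsigned maxVecBits, unsigned elemBits) {
    return elemBits == 0 ? 0 : maxVecBits / elemBits;
}

// A 32-bit float under a 128-bit cap gives <4 x float>.
static_assert(lanesFor(128, 32) == 4, "128-bit cap holds 4 floats");
// Raising the cap to 256 bits allows <8 x float>, the YMM register width.
static_assert(lanesFor(256, 32) == 8, "256-bit cap holds 8 floats");
```

With a 128-bit cap the vectorizer would never look for groups wider than four floats, which matches the <4 x float> output reported further down in the thread.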

Because of an inconsistency in the naming, the flag has to be '--debug-only=SLP'. The output can be found in one of the follow-up emails.

Thanks, Frank

On 07/01/2015 04:18 PM, Renato Golin wrote:

Hi Frank,

What does --debug-only=vectorize say? You could try putting the datalayout and the triple in the IR header, just to make sure you got everything right. LLVM will honour those, and front-ends should create them correctly.

--renato

On 1 July 2015 at 19:06, Frank Winter <fwinter at jlab.org> wrote:

I realized that the function parameters had no alignment attributes on them. However, even adding an alignment suitable for aligned loads on YMM, i.e. 32 bytes, didn't convince the vectorizer to use [8 x float].

define void @main(i64 %lo, i64 %hi, float* noalias align 32 %arg0, float* noalias align 32 %arg1, float* noalias align 32 %arg2) {
...

still results in code using only [4 x float].

Thanks, Frank

On 07/01/2015 10:51 AM, Frank Winter wrote:

I seem to have a problem getting the SLP vectorizer to make use of the full 8 floats available in a SIMD vector on a Sandy Bridge CPU with AVX. The function is attached. The CPU flags are:

flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid

I use LLVM 3.6, checked out yesterday:

~/toolchain/install/llvm-3.6/bin/opt -datalayout -basicaa -slp-vectorizer -instcombine < func4x4x4scalarpscalar.ll -S

The output goes like:

; ModuleID = '<stdin>'
define void @main(i64 %lo, i64 %hi, float* noalias %arg0, float* noalias %arg1, float* noalias %arg2) {
entrypoint:
  %0 = bitcast float* %arg1 to <4 x float>*
  %1 = load <4 x float>* %0, align 4
  %2 = bitcast float* %arg2 to <4 x float>*
  %3 = load <4 x float>* %2, align 4
  %4 = fadd <4 x float> %3, %1
  %5 = bitcast float* %arg0 to <4 x float>*
  store <4 x float> %4, <4 x float>* %5, align 4
....

So, it could make use of the <8 x float> available on that machine, but it doesn't. Then I thought that maybe the YMM registers get used when lowering the IR to machine code. However, the generated assembly doesn't seem to support this assumption :-(

main:
        .cfi_startproc
        xorl    %eax, %eax
        xorl    %esi, %esi
        .align  16, 0x90
.LBB0_1:
        vmovups (%r8,%rax), %xmm0
        vaddps  (%rcx,%rax), %xmm0, %xmm0
        vmovups %xmm0, (%rdx,%rax)
        addq    $4, %rsi
        addq    $16, %rax
        cmpq    $61, %rsi
        jb      .LBB0_1
        retq

I played with the -mcpu and -march switches without success. In any case, the target architecture should be detected with the -datalayout pass, right?

Any idea what I am missing?

Frank
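For readers without the attachment, the loop in the assembly can be read roughly as the scalar kernel below. This is a sketch under assumptions, not the attached function: the element count of 64 is inferred from the loop bounds (the counter advances by 4 and the loop exits once it reaches 64), and the names `kCount`, `addScalar` and `tripCount` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumed element count, inferred from the loop bounds in the assembly.
constexpr std::size_t kCount = 64;

// Scalar reference for what the loop appears to compute:
// out[i] = a[i] + b[i], with out/a/b standing in for %arg0/%arg1/%arg2.
void addScalar(float* out, const float* a, const float* b) {
    for (std::size_t i = 0; i < kCount; ++i) {
        out[i] = a[i] + b[i];
    }
}

// Number of loop iterations when each iteration handles `lanes` floats.
constexpr std::size_t tripCount(std::size_t lanes) {
    return (kCount + lanes - 1) / lanes;
}

// XMM (4 floats per iteration) needs 16 trips, as in the listing above.
static_assert(tripCount(4) == 16, "4-wide loop runs 16 times");
// YMM (8 floats per iteration) would need only 8 trips.
static_assert(tripCount(8) == 8, "8-wide loop would run 8 times");
```

Under these assumptions, using YMM registers would halve the number of loop iterations relative to the XMM code shown.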


LLVM Developers mailing list LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev




