[LLVMdev] Alignment of pointee
Frank Winter fwinter at jlab.org
Tue Mar 25 06:53:35 PDT 2014
- Previous message: [LLVMdev] loop vectorizer
- Next message: [LLVMdev] Alignment of pointee
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi all,
Is there a way to express in the IR that a pointer's value is a multiple of, say, 32 bytes, i.e. that the data the pointer points to is 32-byte aligned? I do not mean the natural alignment determined by the object's size. I am talking about a double* pointer for which I would like to explicitly state a larger alignment than the natural 8 bytes.
I am trying to pass this pointer as a function argument, so that aligned (vector) loads get generated later.
See the pseudo code of what I try to accomplish:
define void @foo( double* noalias %arg0 ) {
  // switching to C style
  for( int outer=0 ; outer < end ; ++outer ) {
    for( int inner=0 ; inner < 4 ; ++inner ) {
      arg0[ outer*4 + inner ] += arg0[ outer*4 + inner ];
    }
  }
}
The loop vectorizer does its job on the 'inner' loop and generates vector loads/adds/stores for this code. However, the vector loads/stores are not as well aligned as they could be, which results in a lot of extra code being produced in codegen (lots of permutations).
After vectorization the code looks similar to
define void @foo( double* noalias %arg0 ) {
  // switching to C style
  for( int outer=0 ; outer < end ; ++outer ) {

vector.body:                                      ; preds = %vector.body, %L5
  %index = phi i64 [ 0, %L5 ], [ %index.next, %vector.body ]
  %42 = add i64 %7, %index
  %43 = getelementptr double* %arg1, i64 %42
  %44 = bitcast double* %43 to <4 x double>*
  %wide.load = load <4 x double>* %44, align 8

  %132 = fadd <4 x double> %wide.load, %wide.load54

  %364 = getelementptr double* %arg0, i64 %93
  %365 = bitcast double* %364 to <4 x double>*
  store <4 x double> %329, <4 x double>* %365, align 8
  }
}
One can see that if the pointee of %arg0 were initially aligned to 32 bytes, then, since the vectorizer operates on a loop with a fixed trip count of 4 and the size of a double is 8 bytes, the vector loads and stores could ideally be aligned to 32 bytes (which on my target architecture would result in vector loads without additional permutations).
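The arithmetic behind this claim can be checked with a small C sketch. It assumes 8-byte doubles and uses a hypothetical helper, count_misaligned_chunks, that is not part of the original code: each pass of the outer loop starts at base + outer * 4 * 8 = base + 32 * outer bytes, so every chunk is 32-byte aligned exactly when the base pointer is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (illustration only): counts how many 4-double chunks
   starting at base[outer * 4] do NOT begin on an `align`-byte boundary. */
size_t count_misaligned_chunks(const double *base, size_t outer_end, size_t align)
{
    size_t misaligned = 0;
    for (size_t outer = 0; outer < outer_end; ++outer) {
        /* Address at which the vector load/store for this chunk would start. */
        uintptr_t addr = (uintptr_t)&base[outer * 4];
        if (addr % align != 0)
            ++misaligned;
    }
    return misaligned;
}

/* Hypothetical demo: returns the number of misaligned chunks found when the
   base pointer itself is 32-byte aligned (C11 aligned_alloc). */
size_t demo_aligned_base(void)
{
    double *buf = aligned_alloc(32, 4 * 4 * sizeof(double));
    assert(buf != NULL);
    size_t n = count_misaligned_chunks(buf, 4, 32);
    free(buf);
    return n;
}
```

With a 32-byte aligned base no chunk is misaligned. Shifting the base by a single double (8 bytes) makes every chunk misaligned, which is why the vectorizer has to fall back to "align 8" when it knows nothing more about the pointer.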
Is it somehow possible to achieve this? I am generating the IR with the builder, i.e. I am not coming from C or clang.
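For comparison only, since the IR here is not produced from C: at the C level, GCC and Clang let the programmer make this kind of promise with the __builtin_assume_aligned extension. The sketch below assumes such a compiler; the names foo_aligned and outer_end are hypothetical. How to attach the same promise when building the IR directly is the open question.

```c
#include <stddef.h>

/* Sketch only: foo_aligned and outer_end are hypothetical names.
   __builtin_assume_aligned is a GCC/Clang extension, not standard C. */
void foo_aligned(double *restrict arg0, size_t outer_end)
{
    /* Promise the compiler that arg0 is a multiple of 32 bytes.
       Behaviour is undefined if the caller breaks this promise. */
    double *a = __builtin_assume_aligned(arg0, 32);

    for (size_t outer = 0; outer < outer_end; ++outer) {
        /* Fixed trip count of 4: one 32-byte chunk of doubles per pass. */
        for (size_t inner = 0; inner < 4; ++inner) {
            a[outer * 4 + inner] += a[outer * 4 + inner];
        }
    }
}
```

The caller is responsible for actually providing such a buffer, for example one obtained from C11 aligned_alloc(32, size).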
Thank you, Frank