[LLVMdev] [x86] Prefetch intrinsics and prefetchw
Joshua Magee joshua_magee at playstation.sony.com
Thu Jul 30 12:46:05 PDT 2015
Hi,
I am looking at how the PREFETCHW instruction is matched to the IR prefetch intrinsic (and __builtin_prefetch).
Consider this C program:

    char foo[100];

    int bar(void) {
        __builtin_prefetch(foo, 0, 0);
        __builtin_prefetch(foo, 0, 1);
        __builtin_prefetch(foo, 0, 2);
        __builtin_prefetch(foo, 0, 3);
        __builtin_prefetch(foo, 1, 0);
        __builtin_prefetch(foo, 1, 1);
        __builtin_prefetch(foo, 1, 2);
        __builtin_prefetch(foo, 1, 3);
        *foo = 1;
        return foo[0];
    }

The generated IR for the prefetches follows this pattern:

    tail call void @llvm.prefetch(i8* %0, i32 0, i32 0, i32 1)
    tail call void @llvm.prefetch(i8* %1, i32 0, i32 1, i32 1)
    tail call void @llvm.prefetch(i8* %2, i32 0, i32 2, i32 1)
    tail call void @llvm.prefetch(i8* %3, i32 0, i32 3, i32 1)
    tail call void @llvm.prefetch(i8* %4, i32 1, i32 0, i32 1)
    tail call void @llvm.prefetch(i8* %5, i32 1, i32 1, i32 1)
    tail call void @llvm.prefetch(i8* %6, i32 1, i32 2, i32 1)
    tail call void @llvm.prefetch(i8* %7, i32 1, i32 3, i32 1)

The generated x86_64 code for the first 4 calls, where the read/write parameter is 0 (read), is exactly as expected (generated with clang -O2 -S -march=btver2 test.c):

    prefetchnta foo(%rip)
    prefetcht2  foo(%rip)
    prefetcht1  foo(%rip)
    prefetcht0  foo(%rip)

The question is what should be expected when the r/w parameter is 1 (write). Currently the backend generates:

    prefetchnta foo(%rip)
    prefetcht2  foo(%rip)
    prefetcht1  foo(%rip)
    prefetchw   foo(%rip)

However, a different possibility would be for the r/w parameter to take precedence over the locality parameter, generating:

    prefetchw foo(%rip)
    prefetchw foo(%rip)
    prefetchw foo(%rip)
    prefetchw foo(%rip)

The PREFETCHW instruction prefetches the cache line into the L1 cache and sets the cache-line state to modified. Since there is no PREFETCHW variant that targets the higher cache levels, it is debatable which prefetch instruction should be generated when a write prefetch is requested with a locality < 3. One opinion is that the rw parameter takes precedence over locality, so prefetch(a, 1, 1, 1) should generate prefetchw and not prefetcht2. FWIW, this is what GCC appears to do (write trumps locality).
Not sure if there is a right/wrong here; what is the preferred behavior?
Thanks,
- Josh