[LLVMdev] Adding masked vector load and store intrinsics
dag at cray.com
Fri Oct 24 12:50:39 PDT 2014
"Das, Dibyendu" <Dibyendu.Das at amd.com> writes:
> Is there an example of such a workload (let's say from the SPEC CPU 2006 harness or similar) that you have in mind, and the amount of gain expected?
Nearly every code that has significant vector work in it. Even if there is no control flow in the loop, masking allows the compiler to vectorize more aggressively and rely on the masks to prevent unsafe execution.
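A minimal sketch of the idea in C, with the masked load and store modeled lane by lane. The vector width VF and the helper names masked_load and masked_store are hypothetical, chosen for illustration; they are not the proposed intrinsics, and a real target would do each helper in one instruction. The point is only that inactive lanes never touch memory, so the guarded loop and the loop tail can both be processed in full-width chunks.

```c
#include <stddef.h>

#define VF 4  /* hypothetical vector width, in lanes */

/* Scalar original: a[i] is read and b[i] is written only when c[i] != 0. */
void scalar_loop(const int *c, const int *a, int *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (c[i])
            b[i] = a[i] + 1;
}

/* Model of a masked load: inactive lanes do not access memory and
 * receive the pass-through value instead. */
static void masked_load(int dst[VF], const int *src, const int mask[VF],
                        int passthru) {
    for (size_t l = 0; l < VF; l++)
        dst[l] = mask[l] ? src[l] : passthru;
}

/* Model of a masked store: inactive lanes leave memory untouched. */
static void masked_store(int *dst, const int src[VF], const int mask[VF]) {
    for (size_t l = 0; l < VF; l++)
        if (mask[l])
            dst[l] = src[l];
}

/* "Vectorized" form: every chunk is VF lanes wide. The mask covers both
 * the condition c[i] and the loop tail, so no scalar remainder loop is
 * needed and no out-of-bounds element is ever read or written. */
void masked_loop(const int *c, const int *a, int *b, size_t n) {
    for (size_t i = 0; i < n; i += VF) {
        int mask[VF], va[VF];
        for (size_t l = 0; l < VF; l++)
            mask[l] = (i + l < n) && c[i + l];  /* tail lanes masked off */
        masked_load(va, a + i, mask, 0);
        for (size_t l = 0; l < VF; l++)
            va[l] += 1;                         /* computed on all lanes */
        masked_store(b + i, va, mask);
    }
}
```

Without the mask, the compiler would have to prove that reading a[i] and writing b[i] is safe for every lane, including those where c[i] is zero or where i runs past n, before it could emit the wide form.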
The amount of gain is highly code-dependent, but my guess is that Elena's example of a 2x speedup is typical, maybe even on the low end.
The capability of the vectorizer is the biggest factor. Without masks, the vectorizer cannot be as aggressive. With masks, the vectorizer still has to be written to be aggressive. Ph.D. dissertations have been written on the topic. It's non-trivial work.
Masking is an enabling technology, not an end goal.
-David