[llvm-dev] [RFC] Matrix support (take 2)
lkcl via llvm-dev llvm-dev at lists.llvm.org
Thu Feb 7 19:57:16 PST 2019
On Tue Dec 18 20:45:12 PST 2018, Chris wrote:
> Since layout and padding information is important, it seems most logical to put this into the type. Doing so would make it available in all these places.
> That said, I still don’t really understand why you need it.
for large vectors and matrices that simply will not fit into the register file, LD/ST and MV operations, in the form of gather/scatter or vectorised MVX [1], are the clear and obvious requirement.
however the penalty for use of LD/ST is the power consumption hit of going through the L1/L2 cache barrier.
for a low-power cost-competitive 3D GPU, for example, a 100% increase in power consumption due to the penalty of being forced to move data back and forth multiple times through the L1/L2 cache would be completely unacceptable.
hence the natural solution, for small vectors and matrices, is to be able to process them in-place, in the register file.
that in turn means having, at the architectural level, a way to re-order the sequence of an otherwise straight linear 1D array of elements. with the right re-ordering capability, it even becomes possible to do arbitrary in-place transposition of the order of elements, such that matrix multiply may be done in-place, without MV operations.
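as a rough illustration of the re-ordering idea only: the sketch below models the register file as a flat python list and applies an index remap at read time, so that a matrix stored transposed can be multiplied without first moving its elements. the names remap_index and matmul_remapped are hypothetical, and this is an assumption about how such a scheme could look, not a description of any actual hardware or of the proposed LLVM type.

```python
def remap_index(i, rows, cols, transpose=False):
    """Map logical element i to a physical offset inside a row-major
    rows x cols block. With transpose=True the block is viewed as
    cols x rows without moving any data. (hypothetical helper)"""
    if not transpose:
        return i
    # logical (r, c) in the cols x rows view reads physical (c, r)
    r, c = divmod(i, rows)
    return c * cols + r


def matmul_remapped(regfile, a_base, b_base, n, k, m, b_transposed=False):
    """Compute C = A (n x k) * B (k x m), reading both operands straight
    out of the flat regfile list. If b_transposed is True, B is assumed
    to be physically stored as its transpose (m x k, row-major)."""
    out = []
    for i in range(n):
        for j in range(m):
            acc = 0
            for t in range(k):
                a = regfile[a_base + i * k + t]
                b_off = remap_index(t * m + j, m, k, b_transposed)
                acc += a * regfile[b_base + b_off]
            out.append(acc)
    return out


# A is 2x3 at offset 0; B (3x2) is stored transposed (2x3) at offset 6
regs = [1, 2, 3, 4, 5, 6, 7, 9, 11, 8, 10, 12]
result = matmul_remapped(regs, 0, 6, 2, 3, 2, b_transposed=True)
```

the point is that the operand data never moves: only the order in which elements are visited changes.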
this practice is extremely common in 3D GPUs, as there tend to be a lot of 3x4 matrices. ARM Mali actually added a special hard-coded set of operations just to deal with 3x4 matrix data.
l.
[1] regfile[regfile[rs]] = regfile[rd]
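
a minimal sketch of footnote [1], again modelling the register file as a python list. mvx follows the footnote exactly as written; mvx_vectorised is an assumption about what a vectorised form (a scatter over vl consecutive registers) might look like, and both function names are hypothetical.

```python
def mvx(regfile, rd, rs):
    """Indirect move per footnote [1]: the value in register rs is used
    as the index of the register that receives the value of rd."""
    regfile[regfile[rs]] = regfile[rd]


def mvx_vectorised(regfile, rd, rs, vl):
    """Assumed vectorised form: repeat the indirect move for vl
    consecutive registers starting at rs and rd (a scatter)."""
    for i in range(vl):
        regfile[regfile[rs + i]] = regfile[rd + i]


# registers 0 and 1 hold target indices 6 and 7;
# registers 4 and 5 hold the data to scatter
regs = [6, 7, 0, 0, 11, 22, 0, 0]
mvx_vectorised(regs, rd=4, rs=0, vl=2)
```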