LLVM: lib/Target/AMDGPU/SIPreEmitPeephole.cpp File Reference
This pass performs the peephole optimizations before code emission.
This pass also unpacks packed instructions (V_PK_MUL_F32/F16, V_PK_ADD_F32/F16, V_PK_FMA_F32) adjacent to MFMAs so that they can be co-issued. Unpacking lets the MFMA overlap with certain vector instructions in the machine schedule and is expected to improve performance. Only packed instructions that fall within the MFMA latency are unpacked; the rest remain untouched. TODO: Add support for F16 packed instructions.
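The selection rule described above (unpack only the packed instructions covered by the MFMA latency) can be illustrated with a small standalone sketch. This is not the code in SIPreEmitPeephole.cpp: `ToyInst`, `selectForUnpacking`, and all cycle counts are hypothetical, and the sketch assumes that instructions following an MFMA are consumed in order against a fixed latency budget. The actual pass works on LLVM machine instructions, and its cost model is not described here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy instruction; this is not an LLVM MachineInstr.
struct ToyInst {
  std::string Name;
  bool IsPacked;   // true for a packed op such as V_PK_MUL_F32
  int IssueCycles; // assumed issue cost of the instruction as written
};

// Returns the indices of the packed instructions that would be unpacked.
// AfterMFMA holds the instructions that follow an MFMA, MFMALatency is the
// assumed cycle budget the MFMA can hide, and UnpackedCycles is the assumed
// total cost of the instructions that replace one packed instruction.
std::vector<std::size_t> selectForUnpacking(const std::vector<ToyInst> &AfterMFMA,
                                            int MFMALatency,
                                            int UnpackedCycles) {
  assert(MFMALatency >= 0 && UnpackedCycles >= 0);
  std::vector<std::size_t> Picked;
  int Used = 0; // cycles consumed so far in the shadow of the MFMA
  for (std::size_t I = 0; I < AfterMFMA.size(); ++I) {
    const ToyInst &Inst = AfterMFMA[I];
    // A packed instruction is costed as if it were already unpacked.
    const int Cost = Inst.IsPacked ? UnpackedCycles : Inst.IssueCycles;
    // Stop at the first instruction that no longer fits in the budget;
    // it and everything after it are left untouched.
    if (Used + Cost > MFMALatency)
      break;
    Used += Cost;
    if (Inst.IsPacked)
      Picked.push_back(I);
  }
  return Picked;
}
```

With an assumed budget of 5 cycles and an assumed unpacked cost of 2 cycles, a sequence of V_PK_MUL_F32, V_MOV_B32, V_PK_ADD_F32, V_PK_FMA_F32 (1 cycle each as written) selects the first two packed instructions and leaves V_PK_FMA_F32 packed, because unpacking it would exceed the budget.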
Definition in file SIPreEmitPeephole.cpp.