MLIR: lib/Dialect/NVGPU/TransformOps/NVGPUTransformOps.cpp File Reference (original) (raw)

Go to the source code of this file.

Classes
struct RowColIndexing
Helper struct to encode a pair of row/column indexings in the form of affine expressions. More...
struct MmaSyncBuilder
Helper struct to provide a simple mapping from matmul operations to the corresponding mma.sync operation. More...
struct HopperBuilder
Helper to create the base Hopper-specific operations that are reused in various other places. More...
struct CopyBuilder
Helper to create the tma operations corresponding to linalg::CopyOp. More...
Macros
#define DEBUG_TYPE "nvgpu-transforms"
#define DBGS() (llvm::dbgs() << "[" DEBUG_TYPE "]: ")
#define DBGSNL() (llvm::dbgs() << "\n")
#define LDBG(X) LLVM_DEBUG(DBGS() << (X) << "\n")
#define GET_OP_LIST
#define GET_OP_CLASSES
Functions
static bool hasDefaultMemorySpace (BaseMemRefType type)
Returns true if the given type has the default memory space. More...
static bool hasSharedMemorySpace (BaseMemRefType type)
Returns true if the given type has the shared (workgroup) memory space. More...
static Value getValueLoadedFromGlobal (Operation *op)
Returns the value produced by a load from the default memory space. More...
static bool isStoreToShared (Operation *op, Value v)
Returns true if the operation is storing the given value into shared memory. More...
static bool isLoadFromGlobalStoredToShared (Operation *op)
Returns true if the operation is a load from the default memory space the result of which is only stored into the shared memory space. More...
static LogicalResult collectStage0PipeliningOps (scf::ForOp forOp, llvm::SmallPtrSet< Operation *, 16 > &ops)
Populate ops with the set of operations that belong to the stage 0 of the pipelined version of the given loop when pipelining copies to shared memory. More...
static void setAsyncWaitGroupsInFlight (OpBuilder &builder, Operation *op, scf::PipeliningOption::PipelinerPart part, unsigned iteration, unsigned depth)
Hook for the loop pipeliner that sets the "num groups in flight" attribute of async wait operations corresponding to pipelined shared memory copies. More...
static void getPipelineStages (scf::ForOp forOp, std::vector< std::pair< Operation *, unsigned >> &opsWithPipelineStages, unsigned depth, llvm::SmallPtrSetImpl< Operation * > &stage0Ops)
Hook for the loop pipeliner that populates ops with the stage information as follows: More...
static Operation * replaceOpWithPredicatedOp (RewriterBase &rewriter, Operation *op, Value predicate)
Hook for the loop pipeliner. More...
static std::tuple< DiagnosedSilenceableFailure, scf::ForOp > pipelineForSharedCopies (RewriterBase &rewriter, scf::ForOp forOp, int64_t depth, bool epiloguePeeling)
Applies loop pipelining with the given depth to the given loop so that copies into the shared memory are pipelined. More...
template<typename ApplyFn , typename ReduceFn >
static void foreachIndividualVectorElement (Value vector, ApplyFn applyFn, ReduceFn reduceFn)
Helper functions to create customizable load and stores operations. More...
static std::tuple< SmallVector< int64_t >, SmallVector< int64_t >, SmallVector< int64_t > > makeVectorShapes (ArrayRef< int64_t > lhs, ArrayRef< int64_t > rhs, ArrayRef< int64_t > res)
static Attribute getSharedAddressSpaceAttribute (OpBuilder &b)

DBGS

#define DBGS ( ) (llvm::dbgs() << "[" DEBUG_TYPE "]: ")

DBGSNL

#define DBGSNL ( ) (llvm::dbgs() << "\n")

DEBUG_TYPE

#define DEBUG_TYPE "nvgpu-transforms"

GET_OP_CLASSES

GET_OP_LIST

LDBG

| #define LDBG | ( | | X | ) | LLVM_DEBUG(DBGS() << (X) << "\n") | | ------------ | - | | - | - | ------------------------------------------------------------------------------------------------ |

collectStage0PipeliningOps()

Populate ops with the set of operations that belong to the stage 0 of the pipelined version of the given loop when pipelining copies to shared memory.

Specifically, this collects:

  1. all loads from global memory, both sync and async;
  2. the barriers for async loads.

In particular, barriers are omitted if they do not dominate at least one async load for which there is not yet a barrier.

Definition at line 212 of file NVGPUTransformOps.cpp.

References isLoadFromGlobalStoredToShared().

Referenced by pipelineForSharedCopies().

foreachIndividualVectorElement()

template<typename ApplyFn , typename ReduceFn >

static void foreachIndividualVectorElement ( Value vector, ApplyFn applyFn, ReduceFn reduceFn ) static

getPipelineStages()

static void getPipelineStages ( scf::ForOp forOp, std::vector< std::pair< Operation *, unsigned >> & opsWithPipelineStages, unsigned depth, llvm::SmallPtrSetImpl< Operation * > & stage0Ops ) static

Hook for the loop pipeliner that populates ops with the stage information as follows:

Definition at line 283 of file NVGPUTransformOps.cpp.

References mlir::getBackwardSlice(), mlir::Operation::getBlock(), and options.

Referenced by pipelineForSharedCopies().

getSharedAddressSpaceAttribute()

getValueLoadedFromGlobal()

hasDefaultMemorySpace()

hasSharedMemorySpace()

isLoadFromGlobalStoredToShared()

static bool isLoadFromGlobalStoredToShared ( Operation * op) static

isStoreToShared()

makeVectorShapes()

pipelineForSharedCopies()

Applies loop pipelining with the given depth to the given loop so that copies into the shared memory are pipelined.

Doesn't affect other loops. Returns a pair containing the error state and the pipelined op, the latter being null in case of any failure. The error state contains a definite error if the IR has been modified and a silenceable error otherwise.

Definition at line 358 of file NVGPUTransformOps.cpp.

References collectStage0PipeliningOps(), mlir::DiagnosedSilenceableFailure::definiteFailure(), mlir::emitSilenceableFailure(), getPipelineStages(), options, mlir::scf::pipelineForLoop(), replaceOpWithPredicatedOp(), setAsyncWaitGroupsInFlight(), mlir::OpBuilder::setInsertionPoint(), and mlir::DiagnosedSilenceableFailure::success().

replaceOpWithPredicatedOp()

setAsyncWaitGroupsInFlight()