LLVM: llvm::BlockFrequencyInfoImpl< BT > Class Template Reference

template<class BT>
class llvm::BlockFrequencyInfoImpl< BT >

Shared implementation for block frequency analysis.

It is the implementation shared by BlockFrequencyInfo and MachineBlockFrequencyInfo, and it calculates the relative frequencies of blocks.

LoopInfo defines a loop as a "non-trivial" SCC dominated by a single block, which is called the header. A given loop, L, can have sub-loops, which are loops within the subgraph of L that exclude its header. (A "trivial" SCC consists of a single block that does not have a self-edge.)
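To make the trivial/non-trivial distinction concrete, here is a minimal sketch that finds SCCs with Tarjan's algorithm and classifies each one. It assumes a CFG stored as adjacency lists of integer block ids. The names `CFG`, `findSCCs`, and `isTrivialSCC` are hypothetical and not part of LLVM, which obtains its loops from LoopInfo; the code only illustrates the definition.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical adjacency-list CFG: G[B] lists the successors of block B.
using CFG = std::vector<std::vector<int>>;

// Textbook recursive Tarjan: returns the SCCs of G, each as a list of block ids.
std::vector<std::vector<int>> findSCCs(const CFG &G) {
  const int N = static_cast<int>(G.size());
  std::vector<int> Index(N, -1), Low(N, 0);
  std::vector<bool> OnStack(N, false);
  std::vector<int> Stack;
  std::vector<std::vector<int>> SCCs;
  int Next = 0;

  // Generic lambda that receives itself so it can recurse.
  auto Visit = [&](auto &&Self, int V) -> void {
    Index[V] = Low[V] = Next++;
    Stack.push_back(V);
    OnStack[V] = true;
    for (int W : G[V]) {
      if (Index[W] < 0) {
        Self(Self, W);
        Low[V] = std::min(Low[V], Low[W]);
      } else if (OnStack[W]) {
        Low[V] = std::min(Low[V], Index[W]);
      }
    }
    if (Low[V] == Index[V]) { // V is the root of an SCC: pop its members
      std::vector<int> SCC;
      int W;
      do {
        W = Stack.back();
        Stack.pop_back();
        OnStack[W] = false;
        SCC.push_back(W);
      } while (W != V);
      SCCs.push_back(SCC);
    }
  };
  for (int V = 0; V < N; ++V)
    if (Index[V] < 0)
      Visit(Visit, V);
  return SCCs;
}

// A "trivial" SCC is a single block that does not have a self-edge.
bool isTrivialSCC(const CFG &G, const std::vector<int> &SCC) {
  assert(!SCC.empty());
  if (SCC.size() > 1)
    return false;
  const int B = SCC.front();
  return std::find(G[B].begin(), G[B].end(), B) == G[B].end();
}
```

For the graph `{{1}, {2}, {1, 3}, {3}}`, the SCCs are {0}, {1, 2}, and {3}. Only {0} is trivial: {1, 2} has two blocks, and {3} has a self-edge.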

In addition to loops, this algorithm has limited support for irreducible SCCs, which are SCCs with multiple entry blocks. Irreducible SCCs are discovered on the fly, and modelled as loops with multiple headers.

The headers of an irreducible sub-SCC consist of its entry blocks and all nodes that are targets of a backedge within it (excluding backedges within true sub-loops). Block frequency calculations act as if a block were inserted to intercept all edges into the headers: all backedges and entries point to this block, and its successors are the headers, which split the frequency evenly.
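The even split at that virtual block can be sketched with integer masses. In this sketch, `splitAmongHeaders` is a hypothetical function, not an LLVM API: the masses arriving on all entry edges and backedges are pooled and divided evenly among the headers. Handing out the division remainder one unit at a time, so that no mass disappears, is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: every entry edge and backedge of an irreducible SCC is
// redirected to one virtual block whose successors are the headers. The pooled
// mass is split evenly; the remainder goes one unit at a time to the first
// headers so the total is conserved.
std::vector<uint64_t> splitAmongHeaders(const std::vector<uint64_t> &IncomingMasses,
                                        std::size_t NumHeaders) {
  assert(NumHeaders > 0);
  uint64_t Pooled = 0;
  for (uint64_t M : IncomingMasses)
    Pooled += M;
  std::vector<uint64_t> Shares(NumHeaders, Pooled / NumHeaders);
  const uint64_t Leftover = Pooled % NumHeaders;
  for (uint64_t I = 0; I < Leftover; ++I)
    ++Shares[I];
  return Shares;
}
```

With incoming masses {6, 4} and three headers, the pooled mass of 10 splits as 4, 3, 3.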

This algorithm leverages BlockMass and ScaledNumber to maintain precision, separates mass distribution from loop scaling, and dithers to eliminate probability mass loss.
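Dithering can be illustrated with integer masses. In this sketch, `distributeDithered` is a hypothetical name and the arithmetic is simplified relative to LLVM's BlockMass and BranchProbability types: each successor takes a share of the *remaining* mass in proportion to its weight relative to the *remaining* weight, so rounding error is carried forward and the last successor with a nonzero weight absorbs it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch of dithered mass distribution (not LLVM's code).
// Each successor takes RemMass * W / RemWeight, i.e. its weight's share of the
// mass still undistributed. Rounding error stays in the remainder, and the
// last successor with a nonzero weight takes all of it.
// Assumes Mass and every weight fit in 32 bits, so products fit in 64 bits.
std::vector<uint64_t> distributeDithered(uint64_t Mass,
                                         const std::vector<uint32_t> &Weights) {
  uint64_t RemWeight = 0;
  for (uint32_t W : Weights)
    RemWeight += W;
  assert(RemWeight > 0 && Mass <= UINT32_MAX);

  std::vector<uint64_t> Shares;
  Shares.reserve(Weights.size());
  uint64_t RemMass = Mass;
  for (uint32_t W : Weights) {
    // Once RemWeight reaches zero, all later weights are zero as well.
    const uint64_t Taken = RemWeight == 0 ? 0 : RemMass * W / RemWeight;
    Shares.push_back(Taken);
    RemMass -= Taken;
    RemWeight -= W;
  }
  assert(RemMass == 0); // nothing was lost to rounding
  return Shares;
}
```

Distributing a mass of 10 over three equal weights yields 3, 3, 4. Computing each share independently as `10 * 1 / 3` would yield 3, 3, 3 and silently lose one unit of mass.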

The implementation is split between BlockFrequencyInfoImpl, which knows the type of graph being modelled (BasicBlock vs. MachineBasicBlock), and BlockFrequencyInfoImplBase, which doesn't. The base class uses BlockNode, a wrapper around a uint32_t. BlockNodes are numbered from 0 in reverse post-order. This gives two advantages: it's easy to compare the relative ordering of two nodes, and maps keyed on BlockT can be represented by vectors.
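The numbering scheme can be sketched as follows: a post-order DFS from the entry block, reversed, gives every reachable block a dense index, and per-block data then lives in plain vectors indexed by that number. `BlockNode` here is a simplified stand-in for the real class, `computeRPOT` is hypothetical, and marking unreachable blocks with `UINT32_MAX` is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for BlockNode: a wrapper around a uint32_t holding the
// block's position in reverse post-order.
struct BlockNode {
  uint32_t Index;
  bool operator<(BlockNode Other) const { return Index < Other.Index; }
  bool operator==(BlockNode Other) const { return Index == Other.Index; }
};

// Hypothetical helper: computes the reverse post-order of the blocks reachable
// from block 0 (the entry). Returns RPOT (position -> block id) and fills
// Nodes (block id -> BlockNode). Unreachable blocks get Index == UINT32_MAX.
std::vector<int> computeRPOT(const std::vector<std::vector<int>> &Succs,
                             std::vector<BlockNode> &Nodes) {
  const std::size_t N = Succs.size();
  std::vector<bool> Visited(N, false);
  std::vector<int> PostOrder;
  auto DFS = [&](auto &&Self, int B) -> void {
    Visited[B] = true;
    for (int S : Succs[B])
      if (!Visited[S])
        Self(Self, S);
    PostOrder.push_back(B); // B finishes after all of its successors
  };
  if (N > 0)
    DFS(DFS, 0);

  std::vector<int> RPOT(PostOrder.rbegin(), PostOrder.rend());
  Nodes.assign(N, BlockNode{UINT32_MAX});
  for (std::size_t I = 0; I < RPOT.size(); ++I)
    Nodes[RPOT[I]] = BlockNode{static_cast<uint32_t>(I)};
  return RPOT;
}
```

For a diamond `0 -> {1, 2} -> 3`, the RPOT is 0, 2, 1, 3. Because the graph is acyclic, every edge goes from a smaller BlockNode to a larger one, so checking relative order is a single integer comparison, and a per-block table such as `std::vector<double> Freq(RPOT.size())` is indexed with `Nodes[B].Index`.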

This algorithm is O(V+E), unless there is irreducible control flow, in which case it's O(V*E) in the worst case.

These are the main stages:

  1. Reverse post-order traversal (initializeRPOT()).
    Run a single post-order traversal and save it (in reverse) in RPOT. All other stages make use of this ordering. Save a lookup from BlockT to BlockNode (the index into RPOT) in Nodes.
  2. Loop initialization (initializeLoops()).
    Translate LoopInfo/MachineLoopInfo into a form suitable for the rest of the algorithm. In particular, store the immediate members of each loop in reverse post-order.
  3. Calculate mass and scale in loops (computeMassInLoops()).
    For each loop (bottom-up), distribute mass through the DAG resulting from ignoring backedges and treating sub-loops as a single pseudo-node. Track the backedge mass distributed to the loop header, and use it to calculate the loop scale (number of loop iterations). Immediate members that represent sub-loops will already have been visited and packaged into a pseudo-node.
    Distributing mass in a loop is a reverse-post-order traversal through the loop. Start by assigning full mass to the loop header. For each node in the loop:
  4. Distribute mass in the function (computeMassInFunction()).
    Finally, distribute mass through the DAG resulting from packaging all loops in the function. This uses the same algorithm as distributing mass in a loop, except that there are no exit edges or backedges.
  5. Unpackage loops (unwrapLoops()).
    Initialize each block's frequency to a floating-point representation of its mass.
    Visit loops top-down, scaling the frequencies of each loop's immediate members by the frequency of that loop's pseudo-node.
  6. Convert frequencies to a 64-bit range (finalizeMetrics()).
    Using the min and max frequencies as a guide, translate floating-point frequencies to an appropriate range in uint64_t.
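The loop-scale, unwrapping, and integer-conversion stages can be traced on a tiny example using plain `double`s instead of BlockMass and ScaledNumber. The sketch assumes a single loop whose backedges return a fraction B of the header's mass, so the header runs 1 / (1 - B) times per entry (the sum of the geometric series 1 + B + B^2 + ...). The rescaling rule in the last step (coldest block maps to 8) is an assumption chosen for illustration, and all function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Loop scale (simplified): expected header visits per entry. If backedges
// return fraction B of the header's mass, that is 1 + B + B^2 + ... = 1/(1-B).
double loopScale(double BackedgeMass) {
  // B == 1 would mean the loop never exits; this sketch does not handle it.
  assert(BackedgeMass >= 0.0 && BackedgeMass < 1.0);
  return 1.0 / (1.0 - BackedgeMass);
}

// Unwrapping (simplified): a member's frequency is its loop-local mass, scaled
// by the loop scale and by the frequency of the loop's pseudo-node in the
// enclosing loop or function.
double unwrapFrequency(double MassInLoop, double Scale, double PseudoNodeFreq) {
  return MassInLoop * Scale * PseudoNodeFreq;
}

// Integer conversion (simplified, assumed rule): rescale so the coldest block
// maps to 8, round, and saturate values that do not fit in uint64_t.
std::vector<uint64_t> toIntegerFrequencies(const std::vector<double> &Freqs) {
  assert(!Freqs.empty());
  const double Min = *std::min_element(Freqs.begin(), Freqs.end());
  assert(Min > 0.0);
  const double Factor = 8.0 / Min;
  const double Limit = std::ldexp(1.0, 64); // 2^64: smallest value too big for uint64_t
  std::vector<uint64_t> Out;
  Out.reserve(Freqs.size());
  for (double F : Freqs) {
    const double V = std::round(F * Factor);
    Out.push_back(V >= Limit ? UINT64_MAX : static_cast<uint64_t>(V));
  }
  return Out;
}
```

Consider entry -> header -> body, where body -> header is taken with probability 3/4 and body -> exit with 1/4. Inside the loop, header and body each receive mass 1 and the backedge mass is 0.75, so the scale is 4. With the loop's pseudo-node at frequency 1, header and body unwrap to 4 while entry and exit stay at 1, and the integer conversion gives 8, 32, 32, 8.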

It has some known flaws.

Definition at line 842 of file BlockFrequencyInfoImpl.h.