RTL passes (GNU Compiler Collection (GCC) Internals) (original) (raw)

The following briefly describes the RTL generation and optimization passes that are run after the Tree optimization passes.

RTL generation
The source files for RTL generation includestmt.cc,calls.cc,expr.cc,explow.cc,expmed.cc,function.cc,optabs.ccand emit-rtl.cc. Also, the fileinsn-emit.cc, generated from the machine description by the program genemit, is used in this pass. The header fileexpr.h is used for communication within this pass.
The header files insn-flags.h and insn-codes.h, generated from the machine description by the programs genflagsand gencodes, tell this pass which standard names are available for use and which patterns correspond to them.
Generation of exception landing pads
This pass generates the glue that handles communication between the exception handling library routines and the exception handlers within the function. Entry points in the function that are invoked by the exception handling library are called landing pads. The code for this pass is located in except.cc.
Control flow graph cleanup
This pass removes unreachable code, simplifies jumps to next, jumps to jump, jumps across jumps, etc. The pass is run multiple times. For historical reasons, it is occasionally referred to as the “jump optimization pass”. The bulk of the code for this pass is incfgcleanup.cc, and there are support routines in cfgrtl.ccand jump.cc.
Forward propagation of single-def values
This pass attempts to remove redundant computation by substituting variables that come from a single definition, and seeing if the result can be simplified. It performs copy propagation and addressing mode selection. The pass is run twice, with values being propagated into loops only on the second run. The code is located in fwprop.cc.
Store forwarding avoidance
This pass attempts to reduce the overhead of store to load forwarding. It detects when a load reads from one or more previous smaller stores and then rearranges them so that the stores are done after the load. The loaded value is adjusted with a series of bit insert instructions so that it stays the same. The code is located in avoid-store-forwarding.cc.
Common subexpression elimination
This pass removes redundant computation within basic blocks, and optimizes addressing modes based on cost. The pass is run twice. The code for this pass is located in cse.cc.
Global common subexpression elimination
This pass performs two different types of GCSE depending on whether you are optimizing for size or not (LCM based GCSE tends to increase code size for a gain in speed, while Morel-Renvoise based GCSE does not). When optimizing for size, GCSE is done using Morel-Renvoise Partial Redundancy Elimination, with the exception that it does not try to move invariants out of loops—that is left to the loop optimization pass. If MR PRE GCSE is done, code hoisting (aka unification) is also done, as well as load motion. If you are optimizing for speed, LCM (lazy code motion) based GCSE is done. LCM is based on the work of Knoop, Ruthing, and Steffen. LCM based GCSE also does loop invariant code motion. We also perform load and store motion when optimizing for speed. Regardless of which type of GCSE is used, the GCSE pass also performs global constant and copy propagation. The source file for this pass is gcse.cc, and the LCM routines are in lcm.cc.
A third version of this pass is run on some targets to optimise assignments to specific hard registers. This can be used in cases where a register has a single purpose, such as specifying a mode as an extra input for specific instructions (see mode switching optimization for another way of handling instruction modes).
Loop optimization
This pass performs several loop related optimizations. The source files cfgloopanal.cc and cfgloopmanip.cc contain generic loop analysis and manipulation code. Initialization and finalization of loop structures is handled by loop-init.cc. A loop invariant motion pass is implemented in loop-invariant.cc. Basic block level optimizations—unrolling, and peeling loops— are implemented in loop-unroll.cc. Replacing of the exit condition of loops by special machine-dependent instructions is handled by loop-doloop.cc.
Jump bypassing
This pass is an aggressive form of GCSE that transforms the control flow graph of a function by propagating constants into conditional branch instructions. The source file for this pass is gcse.cc.
If conversion
This pass attempts to replace conditional branches and surrounding assignments with arithmetic, boolean value producing comparison instructions, and conditional move instructions. In the very last invocation after reload/LRA, it will generate predicated instructions when supported by the target. The code is located in ifcvt.cc.
Web construction
This pass splits independent uses of each pseudo-register. This can improve effect of the other transformation, such as CSE or register allocation. The code for this pass is located in web.cc.
Instruction combination
This pass attempts to combine groups of two or three instructions that are related by data flow into single instructions. It combines the RTL expressions for the instructions by substitution, simplifies the result using algebra, and then attempts to match the result against the machine description. The code is located in combine.cc.
Late instruction combination
This pass attempts to do further instruction combination, on top of that performed by combine.cc. Its current purpose is to substitute definitions into all uses simultaneously, so that the definition can be removed. This differs from the forward propagation pass, whose purpose is instead to simplify individual uses on the assumption that the definition will remain. It differs fromcombine.cc in that there is no hard-coded limit on the number of instructions that can be combined at once. It also differs fromcombine.cc in that it can move instructions, where necessary.
However, the pass is not in principle limited to this form of combination. It is intended to be a home for other, future combination approaches as well.
The pass runs twice, once before register allocation and once after register allocation. The code is located in late-combine.cc.
Mode switching optimization
This pass looks for instructions that require the processor to be in a specific “mode” and minimizes the number of mode changes required to satisfy all users. What these modes are, and what they apply to are completely target-specific. The code for this pass is located inmode-switching.cc.
Modulo scheduling
This pass looks at innermost loops and reorders their instructions by overlapping different iterations. Modulo scheduling is performed immediately before instruction scheduling. The code for this pass is located in modulo-sched.cc.
Instruction scheduling
This pass looks for instructions whose output will not be available by the time that it is used in subsequent instructions. Memory loads and floating point instructions often have this behavior on RISC machines. It re-orders instructions within a basic block to try to separate the definition and use of items that otherwise would cause pipeline stalls. This pass is performed twice, before and after register allocation. The code for this pass is located in haifa-sched.cc,sched-deps.cc, sched-ebb.cc, sched-rgn.cc andsched-vis.c.
Register allocation
These passes make sure that all occurrences of pseudo registers are eliminated, either by allocating them to a hard register, replacing them by an equivalent expression (e.g. a constant) or by placing them on the stack. This is done in several subpasses:
The integrated register allocator (IRA). It is called integrated because coalescing, register live range splitting, and hard register preferencing are done on-the-fly during coloring. It also has better integration with the reload/LRA pass. Pseudo-registers spilled by the allocator or the reload/LRA have still a chance to get hard-registers if the reload/LRA evicts some pseudo-registers from hard-registers. The allocator helps to choose better pseudos for spilling based on their live ranges and to coalesce stack slots allocated for the spilled pseudo-registers. IRA is a regional register allocator which is transformed into Chaitin-Briggs allocator if there is one region. By default, IRA chooses regions using register pressure but the user can force it to use one region or regions corresponding to all loops.
Source files of the allocator are ira.cc, ira-build.cc,ira-costs.cc, ira-conflicts.cc, ira-color.cc,ira-emit.cc, ira-lives, plus header files ira.hand ira-int.h used for the communication between the allocator and the rest of the compiler and between the IRA files.
Reloading. This pass renumbers pseudo registers with the hardware registers numbers they were allocated. Pseudo registers that did not get hard registers are replaced with stack slots. Then it finds instructions that are invalid because a value has failed to end up in a register, or has ended up in a register of the wrong kind. It fixes up these instructions by reloading the problematical values temporarily into registers. Additional instructions are generated to do the copying.
The reload pass also optionally eliminates the frame pointer and inserts instructions to save and restore call-clobbered registers around calls.
Source files are reload.cc and reload1.cc, plus the headerreload.h used for communication between them.
This pass is a modern replacement of the reload pass. Source files are lra.cc, lra-assign.c, lra-coalesce.cc,lra-constraints.cc, lra-eliminations.cc,lra-lives.cc, lra-remat.cc, lra-spills.cc, the header lra-int.h used for communication between them, and the header lra.h used for communication between LRA and the rest of compiler.
Unlike the reload pass, intermediate LRA decisions are reflected in RTL as much as possible. This reduces the number of target-dependent macros and hooks, leaving instruction constraints as the primary source of control.
LRA is run on targets for which TARGET_LRA_P returns true.
Basic block reordering
This pass implements profile guided code positioning. If profile information is not available, various types of static analysis are performed to make the predictions normally coming from the profile feedback (IE execution frequency, branch probability, etc). It is implemented in the file bb-reorder.cc, and the various prediction routines are in predict.cc.
Variable tracking
This pass computes where the variables are stored at each position in code and generates notes describing the variable locations to RTL code. The location lists are then generated according to these notes to debug information if the debugging information format supports location lists. The code is located in var-tracking.cc.
Delayed branch scheduling
This optional pass attempts to find instructions that can go into the delay slots of other instructions, usually jumps and calls. The code for this pass is located in reorg.cc.
Branch shortening
On many RISC machines, branch instructions have a limited range. Thus, longer sequences of instructions must be used for long branches. In this pass, the compiler figures out what how far each instruction will be from each other instruction, and therefore whether the usual instructions, or the longer sequences, must be used for each branch. The code for this pass is located in final.cc.
Register-to-stack conversion
Conversion from usage of some hard registers to usage of a register stack may be done at this point. Currently, this is supported only for the floating-point registers of the Intel 80387 coprocessor. The code for this pass is located in reg-stack.cc.
Final
This pass outputs the assembler code for the function. The source files are final.cc plus insn-output.cc; the latter is generated automatically from the machine description by the tool genoutput. The header file conditions.h is used for communication between these files.
Debugging information output
This is run after final because it must output the stack slot offsets for pseudo registers that did not get hard registers. Source files are dwarfout.c for DWARF symbol table format, files dwarf2out.cc and dwarf2asm.ccfor DWARF2 symbol table format, and vmsdbgout.cc for VMS debug symbol table format.