[llvm-dev] A Propeller link (similar to a Thin Link as used by ThinLTO)? (original) (raw)

Rui Ueyama via llvm-dev llvm-dev at lists.llvm.org
Thu Feb 27 21:00:25 PST 2020


On Fri, Feb 28, 2020 at 11:34 AM Fangrui Song <maskray at google.com> wrote:

I met with the Propeller team today (we work for the same company but it was my first time meeting two members on the team:) ). One thing I have been reassured:

* There is no general disassembly work. General disassembly work would assuredly frighten off developers. (Inherently unreliable, memory usage heavy and difficult to deal with CFI, debug information, etc) Minimal amount of plumbing work (https://reviews.llvm.org/D68065) is acceptable: locating the jump relocation, detecting the jump type, inverting the direction of a jump, and deleting trailing bytes of an input section. The existing linker relaxation schemes already do similar things. Deleting a trailing jump is similar to RISC-V where sections can shrink (not implemented in lld; RRISCVALIGN and RRISCVRELAX are in my mind)) (binutils supports deleting bytes for a few other architectures, e.g. msp430, sh, mips, ft32, rl78). With just minimal amount of disassembly work, conceptually the framework should not be too hard to be ported to another target. One thing I was not aware of (perhaps the description did not make it clear) is that Propeller intends to **reorder basic block sections across translation units**. This is something that full LTO can do while ThinLTO cannot. Our internal systems cannot afford doing a full LTO (**Can we fix the bottleneck of full LTO** [1]?) for large executables and I believe some other users are in the same camp. Now, with ThinLTO, the post link optimization scheme will inevitably require help from the linker/compiler. It seems we have two routes: ## Route 1: Current Propeller framework lld does whole-program reordering of basic block sections. We can extend it in the future to overalign some sections and pad gaps with NOPs. What else can we do? Source code/IR/MCInst is lost at this stage. Without general assembly work, it may be difficult to do more optimization. This makes me concerned of another thing: Intel's Jump Condition Code Erratum. https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf Put it in the simplest way, a Jcc instruction whose address ≡ 30 or 31 (mod 32) should be avoided. There are assembler level (MC) mitigations (function sections are overaligned to 32), but because we use basic block sections (shaddralign<32) and need reordering, we have to redo some work at the linking stage. After losing the representation of MCInst, it is not clear to me how we can insert NOPs/segment override prefixes without doing disassembly work in the linker.

I'm not sure how the basic-block sections feature makes it hard to implement a mitigation for that specific JCC erratum. I may be missing something, but doesn't the BB sections actually make it easier to implement, as the JCC occurs only at the ending of each basic block, and with the BB sections we know what the ending instruction is for each block? I mean, when we are reordering sections, and if we found some BB with JCC is not at a desired address, we can add a padding before that BB.

Route 2 does heavy lifting work in the compiler, which can naturally reuse

the assembler level mitigation, CFI and debug information generating, and probably other stuff. (How will debug information be bloated?)

## Route 2: Add another link stage, similar to a Thin Link as used by ThinLTO. Regular ThinLTO with minimized bitcode files: all: compile thinlink thinltobackend finallink compile a.o b.o a.indexing.o b.indexing.o: a.c b.c $(clang) -O2 -c -flto=thin -fthin-link-bitcode=a.indexing.o a.c $(clang) -O2 -c -flto=thin -fthin-link-bitcode=b.indexing.o b.c thinlink lto/a.o.thinlto.bc lto/b.o.thinlto.bc a.rsp: a.indexing.o b.indexing.o $(clang) -fuse-ld=lld -Wl,--thinlto-index-only=a.rsp -Wl,--thinlto-prefix-replace=';lto' -Wl,--thinlto-object-suffix-replace='.indexing.o;.o' a.indexing.o b.indexing.o thinltobackend lto/a.o lto/b.o: a.o b.o lto/a.o.thinlto.bc lto/b.o.thinlto.bc $(clang) -O2 -c -fthinlto-index=lto/a.o.thinlto.bc a.o -o lto/a.o $(clang) -O2 -c -fthinlto-index=lto/b.o.thinlto.bc b.o -o lto/b.o finallink exe: lto/a.o lto/b.o a.rsp # Propeller does basic block section reordering here. $(clang) -fuse-ld=lld @a.rsp -o exe We need to replace the two stages thinltobackend and finallink with three. Propelled ThinLTO with minimized bitcode files: propelledthinltobackend lto/a.mir lto/b.mir: a.o b.o lto/a.o.thinlto.bc lto/b.o.thinlto.bc # Propeller emits something similar to a Machine IR file. # a.o and b.o are all IR files. $(clang) -O2 -c -fthinlto-index=lto/a.o.thinlto.bc -fpropeller a.o -o lto/a.mir $(clang) -O2 -c -fthinlto-index=lto/b.o.thinlto.bc -fpropeller b.o -o lto/b.mir propellerlink propeller/a.o propeller/b.o: lto/a.mir lto/b.mir # Propeller collects input Machine IR files, # spawn threads to generate object files parallelly. $(clang) -fpropeller-backend -fpropeller-prefix-replace='lto;propeller' lto/a.mir lto/b.mir finallink exe: propeller/a.o propeller/b.o # GNU ld/gold/lld links object files. $(clang) $^ -o exe A .mir may be much large than an object file. So lto/a.mir may be actually an object file annotated with some information, or some lower level representation than a Machine IR (there should be a guarantee that the produced object file will keep the basic block structure unchanged => otherwise basic block profiling information will not be too useful).

[1]: Can we fix the bottleneck of full LTO [1]? I wonder whether we have reached a "local maximum" of ThinLTO. If full LTO were nearly as fast as ThinLTO, how would we design a post-link optimization framework? Apparently, if full LTO did not have the scalability problem, we would not do so much work in the linker? -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200228/ea70b547/attachment.html>



More information about the llvm-dev mailing list