[llvm-dev] [RFC] Propeller: A frame work for Post Link Optimizations (original) (raw)

Rahman Lavaee via llvm-dev llvm-dev at lists.llvm.org
Mon Feb 10 11:13:29 PST 2020


I respond with regards to "unwind tables" and "code layout".

WIth regards to "unwind tables", the idea of splitting call-sites tables has been implemented in D73739. I think "asynchronous unwind tables" are already supported by our patch (Meaning that the unwind tables are precise at instruction boundary). Please let us know if you believe otherwise.

Let me point out that although the formulation of ext-TSP is more involved than TSP, the heuristic algorithm that is used to solve it is essentially the same as the one for TSP. It's an incremental changing algorithm aimed at maximizing the gain in the score at each step.

On the other hand, most of the complication in the algorithm is caused by split-and-remerge (guided by -propeller-chain-split-threshold) which is highly beneficial for escaping local optima, resulting in 2% more performance improvement. The original ext-TSP work sets the split threshold at 128 bytes, while we can achieve better speedup by setting it to 1024bytes, without sacrificing the complication time (The whole reordering algorithm takes about 4 seconds on clang).

I am concerned that the raw ELF fields (like sh_size) are used for

heuristics. It seems to try reconstructing some information about Machine{Function,BasicBlock,Instr,...} about doing that in the Machine* stage may just be simpler and more accurate.

Yes. Basic block (binary) sizes are used to compute the binary distance for jumps in the final layout (as used by the ext-TSP algorithm). We can currently get this information both from the object code (sections), or the BB symbol sizes (propagated through the propeller profile). Getting this information in the machine stage is not as accurate and convenient. For instance, StackMapShadowTracker has been implemented in llvm to count the instruction bytes after the last StackMap and it shares some code between different classes (X86AsmPrinter and X86MCInstLower).

We feel that the available optimizations may be limited. The lld focused implementation may have some improvement in the near term but the benefits will dilute quickly when the Machine{Function,BasicBlock} interfaces get improved.

Also, doing heavy-lifting work on the linker side: Needs non-trivial work to port to another binary format. Another target may be a very different story. (Even for x86-64, I see the patches attend to address RX8664PC8/RX8664PC32/RX8664PLT32 calls, but the two relocation types are not handled.) The newly lld/ELF code overlaps with jobs of MachineBlockPlacement/BranchFolding/AsmPrinter/etc. If the current MachineFunction interface is not flexible (e.g. it does not allow sharable MachineBasicBlocks between MachineFunctions.) enough, let's fix it.


LLVM Developers mailing list llvm-dev at lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200210/32efe37d/attachment.html>



More information about the llvm-dev mailing list