Integrated Distributed ThinLTO (DTLTO): Design Overview by bd1976bris · Pull Request #126654 · llvm/llvm-project

Initial DTLTO Support

This PR introduces initial support for Integrated Distributed ThinLTO (DTLTO). DTLTO was previously discussed in an RFC and during the LTO roundtable discussion at the 2024 US LLVM Developers' Meeting in October. PlayStation has offered this feature as a proprietary technology for some time, and we would like to see support in LLVM.

Overview of DTLTO

DTLTO enables the distribution of backend ThinLTO compilations via external distribution systems, such as Incredibuild. Existing support for distributing ThinLTO compilations typically involves separate thin-link (--thinlto-index-only), backend compilation, and link steps coordinated by a modern build system, like Bazel. This "Bazel-style" distributed ThinLTO requires such a build system because the build system must handle the dynamic dependencies specified in the summary index file shards. However, adopting a modern build system can be prohibitive for users with established build infrastructure.
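The separate steps of the Bazel-style flow can be sketched as a function that only builds the command lines, without running them. This is a minimal illustration, not a build rule: the file naming (`.native.o`) and the helper name are hypothetical, and the flags are the commonly documented Clang/LLD ThinLTO options, which should be checked against the toolchain in use.

```python
def split_thinlto_commands(sources, output):
    """Build (but do not run) the command lines for a split ThinLTO build.

    Hypothetical helper for illustration; file naming is an assumption.
    """
    stems = [src.rsplit(".", 1)[0] for src in sources]
    objects = [f"{stem}.o" for stem in stems]
    natives = [f"{stem}.native.o" for stem in stems]

    return {
        # Compile each source to a bitcode object carrying a ThinLTO summary.
        "compile": [
            ["clang", "-O2", "-flto=thin", "-c", src, "-o", obj]
            for src, obj in zip(sources, objects)
        ],
        # Thin link: write one summary index shard per input, no codegen.
        "thin_link": [
            "clang", "-fuse-ld=lld", "-flto=thin",
            "-Wl,--thinlto-index-only", *objects,
        ],
        # Backend compiles: one per module, each guided by its index shard.
        # Which *other* modules a backend job reads is only known after the
        # thin link, which is the dynamic dependency the build system must
        # handle.
        "backend": [
            ["clang", "-O2", "-c", "-x", "ir", obj,
             f"-fthinlto-index={obj}.thinlto.bc", "-o", native]
            for obj, native in zip(objects, natives)
        ],
        # Final link of the native objects.
        "final_link": ["clang", "-fuse-ld=lld", *natives, "-o", output],
    }
```

Each backend command is independent of the others once the thin link has run, which is what makes that phase a natural unit of distribution.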

In contrast, DTLTO manages distribution within LLVM during the traditional link step. This approach means that DTLTO is usable with any build process that supports in-process ThinLTO.
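As a rough sketch of what this looks like from the build's point of view, a DTLTO link is a single ordinary link command with the distributor named on the command line. The option spellings below (`-fthinlto-distributor=` and `-Xthinlto-distributor=`) are assumptions based on the Clang driver options added by the child PRs; the CLI documentation in clang/docs/dtlto.rst is the authority. The helper name and paths are hypothetical.

```python
def dtlto_link_command(objects, output, distributor, distributor_args=()):
    """Build (but do not run) a single DTLTO link command line.

    Assumption: the Clang option names used here may differ between LLVM
    versions; verify them against the Clang DTLTO documentation.
    """
    cmd = ["clang", "-fuse-ld=lld", "-flto=thin",
           f"-fthinlto-distributor={distributor}"]
    # Each extra argument for the distributor is forwarded individually.
    cmd += [f"-Xthinlto-distributor={arg}" for arg in distributor_args]
    cmd += [*objects, "-o", output]
    return cmd


# Usage: the existing link step gains distributor options and nothing else.
print(" ".join(dtlto_link_command(["a.o", "b.o"], "app",
                                  "/path/to/distributor", ["--verbose"])))
```

Compared with a plain in-process ThinLTO link, the inputs, output, and position in the build graph are unchanged.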

Documentation and Resources

RFC: Integrated Distributed ThinLTO RFC
Documentation:
- Feature documentation added in this PR at llvm/docs/dtlto.rst.
- Clang and LLD CLI documentation added at clang/docs/dtlto.rst and lld/docs/dtlto.rst.
Example Distributors: Included distributors for LIT testing added under llvm/utils/dtlto.
Real Distribution System Support: Example scripts which work with real distribution systems are available for Windows and Linux.

Features of This Initial Commit

This commit provides a minimal but functional implementation of DTLTO, which will be expanded in subsequent commits. The current implementation includes:

COFF and ELF support.
Support for bitcode members in thin archives.
Basic support for distributing backend ThinLTO compilations.
A JSON interface that allows new distribution systems to be supported without modifying LLVM.
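To illustrate the idea behind the JSON interface, here is a minimal sketch of a distributor that reads a job description file and runs each backend compilation locally. The schema used here (a `common.args` list plus a per-job `args` list) is an assumption made for this example; the actual format is defined by the feature documentation in llvm/docs/dtlto.rst and may differ. A real distributor would hand each command to a distribution system rather than run it locally.

```python
import json
import subprocess


def load_job_commands(json_path):
    """Return one full command line per backend compilation job.

    Assumed (hypothetical) schema:
      {"common": {"args": [...]}, "jobs": [{"args": [...]}, ...]}
    """
    with open(json_path) as f:
        data = json.load(f)
    common_args = data["common"]["args"]
    return [common_args + job["args"] for job in data["jobs"]]


def distribute(json_path, run=subprocess.check_call):
    """Run every job in order; `run` is injectable so that a remote
    execution function (or a test double) can replace local execution."""
    commands = load_job_commands(json_path)
    for command in commands:
        run(command)
    return len(commands)
```

Keeping the contract to a JSON file and a process exit code is what lets a new distribution system be supported by writing a script like this, without touching LLVM itself.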

The goal of this initial commit is to demonstrate what will be required to support DTLTO while providing useful minimal functionality. Hopefully, having a concrete PR will facilitate discussion and review of the feature.

Performance

We have access to a large farm of Windows machines. For a link of clang.exe on a modest Windows development machine (AMD64, 16 cores, 64 GB RAM), DTLTO (via sn-dbs.py) was approximately 4 times as fast as multi-threaded in-process ThinLTO.

To estimate the overhead from DTLTO vs in-process ThinLTO, we measured the difference in the time taken to link Clang with in-process ThinLTO using one thread per core, and DTLTO using one local process per core. On both Windows and Linux the overhead was approximately 6%.
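For readers who want to reproduce these comparisons, the two figures correspond to the following arithmetic. The timings in the usage lines are made-up placeholders chosen to match the reported ratios, not measurements from this PR.

```python
def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


def relative_overhead(baseline_seconds, candidate_seconds):
    """Extra time taken by the candidate, as a fraction of the baseline."""
    return (candidate_seconds - baseline_seconds) / baseline_seconds


# Placeholder timings (hypothetical, for illustration only).
in_process = 100.0        # in-process ThinLTO, one thread per core
dtlto_local = 106.0       # DTLTO, one local process per core
dtlto_distributed = 25.0  # DTLTO across a build farm

print(f"overhead: {relative_overhead(in_process, dtlto_local):.0%}")
print(f"speedup:  {speedup(in_process, dtlto_distributed):.1f}x")
```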

Note that, to facilitate review, this PR elides performance optimizations where possible.

Planned Future Enhancements

The following features will be addressed in future commits:

Support for the ThinLTO cache.
Support for (non-thin) archives/libraries containing bitcode members.
Support for more LTO configuration states, e.g. basic block sections.
Performance improvements, for example faster temporary file removal.

Discussion Points

Feature Name: The DTLTO name could potentially cause confusion with the existing Bazel-style distributed ThinLTO. At the LLVM roundtable discussion no one objected to the name, but we remain open to suggestions.
Backend Compilation Configuration: Currently, Clang is invoked to do the backend compilations, and a minimal number of options are added to the Clang command line to ensure that the codegen is reasonable (for the testing we have done so far). However, it would be good to find a scalable solution for matching the code-generation state in the invoked external tool to the code-generation state that would apply if an in-process ThinLTO backend were in use.
Clang Error Handling: There is some precedent for compiler drivers handling options that only apply to specific linkers. Should Clang emit an error if DTLTO options are used and the linker isn't LLD?

Other approaches

We have experimented with other approaches for implementing DTLTO. In particular we have explored:

Not using a new ThinLTO backend.
Various ways to handle (non-thin) archives.
Use of dynamic library plugins instead of processes.

We have prepared another branch to demonstrate some of these ideas: integrated-DTLTO-no-backend

List of Child PRs:

(I intend to update this section as new PRs are filed.)

LLVM 21:

✅ Core LLVM functionality (merged)
- PR [DTLTO][LLVM] Integrated Distributed ThinLTO (DTLTO) #127749
✅ ELF LLD support: Note no thin archive support (merged)
- PR [DTLTO][LLD][ELF] Add support for Integrated Distributed ThinLTO #142757
✅ LLD ELF Support bitcode members of thin archives (merged)
- PR [DTLTO][LLD][ELF] Support bitcode members of thin archives #149425
✅ Clang UI (merged)
- PR [DTLTO][Clang] Add support for Integrated Distributed ThinLTO #147265
✅ COFF LLD support (merged)
- PR [DTLTO][Clang] Add support for Integrated Distributed ThinLTO #147265
✅ COFF LLD make /wholearchive work with thin archives (merged)
- PR [LLD][COFF] Make /wholearchive thin-archive member identifiers consistent #145487

LLVM 22:

✅ DTLTO in-process ThinLTO cache support (merged)
- PR [DTLTO] [LLVM] Initial DTLTO cache implementation #156433
🔄 non-thin archive support for DTLTO
- PR [DTLTO][ELF][COFF] Add archive support for DTLTO. #157043 (In review)
🔄 Add Time Trace Scopes for the DTLTO ThinLTO backend
- PR [DTLTO] Add DTLTO time traces (and llvm-lto2 time tracing to test) #171600 (In review)