[PSA] Annotating LLVM Public Interface
Hi everyone! I’m a new contributor to LLVM, and I have been looking into building LLVM as a DLL (shared library) on Windows. To support this option, we are adding annotations to LLVM’s public headers to explicitly describe the set of symbols that should be visible externally.
Details
Code changes will primarily consist of annotating LLVM's public symbols with the LLVM_ABI macro, which is already defined in llvm/Support/Compiler.h. Similar macros exist for annotating C++ template instantiations; these are needed only in less common situations. A portion of the codebase is already annotated.
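For the less common template case, the usual pattern is to annotate an explicit instantiation declaration in the header and the matching explicit instantiation definition in a library source file. The sketch below is hypothetical: DEMO_TEMPLATE_ABI and DEMO_EXPORT_TEMPLATE stand in for LLVM's template macros, whose exact names and expansions should be checked in llvm/Support/Compiler.h. Both expand to nothing here, as annotations do in a default static build, so the file compiles on any platform.

```cpp
// Hypothetical stand-ins for LLVM's template-instantiation macros. In a
// Windows DLL build they would expand to the appropriate __declspec; here
// they are empty, as in a default static-library build.
#define DEMO_TEMPLATE_ABI
#define DEMO_EXPORT_TEMPLATE

// --- Header (e.g. Box.h) ---------------------------------------------------
template <typename T> class Box {
public:
  explicit Box(T V) : Value(V) {}
  T get() const; // defined out of line below

private:
  T Value;
};

template <typename T> T Box<T>::get() const { return Value; }

// Explicit instantiation declaration: tells clients that Box<int> is
// instantiated inside the library, so they use the library's copy instead
// of instantiating it themselves.
extern template class DEMO_TEMPLATE_ABI Box<int>;

// --- Library source (e.g. Box.cpp) -----------------------------------------
// Explicit instantiation definition: this is where Box<int>'s members are
// actually emitted, and therefore where they would be exported.
template class DEMO_EXPORT_TEMPLATE Box<int>;
```

Other specializations such as Box<long> are still instantiated implicitly in each client, exactly as they would be without the annotations.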
Because the macros are inactive by default, adding them throughout the codebase is low-risk and can be done incrementally. Annotations will not become mandatory until the entire codebase has been annotated and there are CI jobs, documentation, and tools in place to catch regressions.
Generally, annotations will be added to individual symbols rather than to entire classes. This approach is preferred for two reasons:
- It exports only the symbols that are truly needed. Limiting the exported surface area enables better LTO and helps mitigate the risk of hitting Windows' 64K DLL export limit.
- It avoids issues that arise from exporting compiler-generated copy constructors and assignment operators. While these issues can be solved by explicitly deleting the compiler-generated methods, they are difficult for engineers unfamiliar with them to diagnose and fix.
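Per-symbol annotation can be sketched with a hypothetical class. DEMO_ABI is an assumed stand-in for LLVM_ABI and expands to nothing here, as LLVM_ABI does in a default static build. Only out-of-line members that clients call are annotated; inline members are compiled into each client and need no export, and the class itself carries no annotation, so its implicit copy/move operations are never exported.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for LLVM_ABI; empty, as in a static-library build.
#define DEMO_ABI

class Counter {
public:
  explicit Counter(std::string Name) : Name(std::move(Name)) {}

  DEMO_ABI void increment();  // defined out of line: needs export
  DEMO_ABI int value() const; // defined out of line: needs export
  const std::string &name() const { return Name; } // inline: no annotation

private:
  void bump(int By); // private helper: not part of the public interface
  std::string Name;
  int Count = 0;
};

// Out-of-line definitions (these would normally live in Counter.cpp).
void Counter::increment() { bump(1); }
int Counter::value() const { return Count; }
void Counter::bump(int By) { Count += By; }
```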
The bulk of the annotations will be added mechanically using the Interface Definition Scanner (IDS) tool, which leverages Clang's AST and rewriter libraries.
Previous Efforts
This LLVM Discourse thread from 2021 covers the original proposal in detail and is still mostly relevant. Following that discussion, there was some initial work in 2023 that identified issues and proved out the approach's viability. This work resulted in this Discord discussion.
In 2024, the effort was resumed as part of a GSoC project to support clang plugins on Windows. This work is primarily tracked in this issue on GitHub. The project added build options to build LLVM as a DLL, introduced the macros to annotate LLVM’s public surface area, and annotated a portion of the codebase. The work to get LLVM fully building as a DLL is incomplete.
Maintainability
Most LLVM developers do not build on Windows locally, so they may not immediately catch breaks caused by missing symbol annotations. There are a number of things we will do to help identify issues earlier in the development cycle. Annotations will not be mandatory until these pieces are in place.
1. Documentation and Examples
The use cases for LLVM_ABI and related macros will be documented and easy to discover. We will document common patterns and situations, with examples, so that developers can easily address related issues that arise during development.
2. Windows LLVM DLL CI build job
Adding a Windows LLVM DLL build job to CI will catch unannotated symbols at link time. This job can run either pre- or post-merge. It will not catch unannotated LLVM symbols that are referenced only by projects the job does not build.
We may also consider changing the default Windows build to an LLVM DLL. This change would let all existing Windows build jobs catch missing export issues.
3. Approximate DLL export behavior on other shared-library builds
We can approximate Windows DLL export behavior in other environments by building ELF and Mach-O shared libraries with hidden default symbol visibility. This is done by compiling with -fvisibility=hidden and defining the LLVM_ABI annotation as __attribute__((__visibility__("default"))). The existing annotations in llvm/Support/Compiler.h already behave this way when configured for a non-Windows shared library build.
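A minimal sketch of the mechanism, assuming GCC or Clang: DEMO_ABI is a hypothetical stand-in for LLVM_ABI in a non-Windows shared build, and the function names are illustrative. Building this file with something like `clang++ -shared -fPIC -fvisibility=hidden demo.cpp -o libdemo.so` should leave only publicEntryPoint in the dynamic symbol table (inspectable with `nm -D libdemo.so`).

```cpp
// Hypothetical stand-in for LLVM_ABI in a non-Windows shared-library build.
#if defined(__GNUC__) || defined(__clang__)
#  define DEMO_ABI __attribute__((__visibility__("default")))
#else
#  define DEMO_ABI
#endif

// Annotated: stays visible in the .so even under -fvisibility=hidden.
DEMO_ABI int publicEntryPoint(int X);

// Not annotated: hidden under -fvisibility=hidden, so a client of the .so
// that calls it fails to link, much like an unexported DLL symbol on Windows.
int internalHelper(int X) { return X * 2; }

int publicEntryPoint(int X) { return internalHelper(X) + 1; }
```

Within a single program or static build the visibility attribute has no effect, so both functions remain callable there.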
This mechanism will produce similar behavior to the Windows DLL build and could catch most issues without building for Windows. However, since most developers are using static library builds locally, this change won’t necessarily result in catching missing annotations earlier.
4. Static analysis with the Interface Definition Scanner tool
The Interface Definition Scanner tool will be run on PRs to flag newly introduced symbols that are not properly annotated for export. It can run much faster than a full Windows build of all projects, and can suggest exact fixes to address missing exports.
Once the bulk of the symbol annotations have been merged, we can enable IDS to run on all LLVM PRs; there is no need to wait until the Windows DLL build is complete or fully supported.
Additional Background
LLVM can already be built as a shared library on ELF- and Mach-O-based systems; however, building it as a Windows DLL is more involved for several reasons:
- Symbols are not exported from a DLL by default, similar to building ELF shared libraries with -fvisibility=hidden. To make a symbol externally visible, it must be explicitly exported when building the DLL, either by annotating it with __declspec(dllexport) or by adding its name to a module-definition (.def) file.
- Symbols imported from a Windows DLL may be annotated with __declspec(dllimport) when compiling clients to remove a level of runtime indirection. For functions this annotation is not strictly required; however, if a data symbol is not annotated with __declspec(dllimport), the developer is responsible for dereferencing the import pointer to use it.
- CMake v3.4 introduced support for automatically exporting all symbols from a DLL with the WINDOWS_EXPORT_ALL_SYMBOLS target property. LLVM currently requires CMake 3.20 or newer, so this property is available.
- A single Windows DLL can export at most 65,535 symbols. This limit most likely prevents us from brute-force exporting everything using CMake's WINDOWS_EXPORT_ALL_SYMBOLS.
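The export/import split described above is commonly expressed with a single macro whose expansion depends on what is being compiled. This is a sketch only: DEMO_ABI, DEMO_BUILDING_DLL, and DEMO_USING_DLL are hypothetical names, and LLVM's actual definitions in llvm/Support/Compiler.h differ in detail. The .def-file alternative is not shown.

```cpp
// Hypothetical macro and configuration names; see llvm/Support/Compiler.h
// for LLVM's real definitions.
#if defined(_WIN32) && defined(DEMO_BUILDING_DLL)
// Compiling the DLL itself: record annotated symbols in the export table.
#  define DEMO_ABI __declspec(dllexport)
#elif defined(_WIN32) && defined(DEMO_USING_DLL)
// Compiling a client: reference the symbol through the import address
// table directly instead of through a linker-generated stub.
#  define DEMO_ABI __declspec(dllimport)
#else
// Static library or non-Windows build: the annotation is inactive.
#  define DEMO_ABI
#endif

// Public header declarations.
DEMO_ABI extern int CallCount; // data: clients need dllimport to use it
DEMO_ABI int addOne(int X);    // function: dllimport is optional

#if !defined(DEMO_USING_DLL)
// Definitions belong in the DLL's own sources; they are guarded here only so
// that this single-file sketch also compiles standalone.
int CallCount = 0;
int addOne(int X) {
  ++CallCount;
  return X + 1;
}
#endif
```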
Exporting C++ Classes
When defining DLL exports, it is possible to annotate entire C++ classes and structs with __declspec(dllexport), rather than their individual members. Annotating a class exports every method and static data member in the class, including:
- Compiler generated methods, such as copy/move constructors and assignment operators
- Methods defined entirely in the header
- Private methods
- RTTI/vtable as appropriate
Annotating a class does not implicitly export nested classes/structs or any friend class or function declarations. A class with a class-level annotation cannot also have annotated members; doing so fails to compile.
The advantage of annotating at the class level is that new members will be automatically exported. However, exporting entire classes can cause significantly more methods to be exported than necessary, and it can lead to tricky-to-debug problems with compiler-generated methods.
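The compiler-generated-method pitfall can be illustrated with a hypothetical class. MSVC is known to reject a dllexport-ed class that holds a std::vector<std::unique_ptr<...>> and leaves its copy operations implicit: std::vector's copy constructor is declared whether or not its element type is copyable, so the class's implicit copy constructor is not deleted, exporting it forces its instantiation, and that instantiation tries to copy a std::unique_ptr. Explicitly deleting the copy operations avoids the error. DEMO_CLASS_ABI is an assumed stand-in that expands to nothing here so the sketch compiles on any platform.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical class-level export annotation. On a Windows DLL build it
// would expand to __declspec(dllexport); here it is empty.
#define DEMO_CLASS_ABI

class DEMO_CLASS_ABI Registry {
public:
  Registry() = default;
  // Without these deletions, a class-level dllexport would force the
  // implicit copy operations to be generated, which copies unique_ptrs.
  Registry(const Registry &) = delete;
  Registry &operator=(const Registry &) = delete;
  // Declaring the copy operations suppresses the implicit moves, so
  // default them explicitly.
  Registry(Registry &&) = default;
  Registry &operator=(Registry &&) = default;

  void add(int Value) { Items.push_back(std::make_unique<int>(Value)); }
  std::size_t size() const { return Items.size(); }

private:
  std::vector<std::unique_ptr<int>> Items;
};

static_assert(!std::is_copy_constructible<Registry>::value,
              "Registry must not be copyable");
static_assert(std::is_move_constructible<Registry>::value,
              "Registry should remain movable");
```

With per-member annotation instead, only add and size (if defined out of line) would be exported, and the question of generating copy operations would not arise.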