[llvm-dev] [RFC] Proposal: llvm-tapi, adding YAML/stub generation for ELF linking support

Armando Montanez via llvm-dev llvm-dev at lists.llvm.org
Wed Sep 26 08:29:56 PDT 2018


Hello all,

llvm-tapi seeks to decouple the link-time information needed for a dynamic shared object from the implementation of the runtime object. This process will be referred to as dynamic shared object (DSO) stubbing throughout this proposal. A number of projects have implemented their own versions of shared object stubbing for a variety of reasons related to improving the overall linking experience. This functionality is absent from LLVM despite how close the practice is to LLVM’s domain. The goal of this project would be to produce a library for LLVM that not only provides a means for DSO stubbing, but also gives meaningful insight into the contents of these stubs and how they change. I’ve collected a few examples of object stubbing as part of larger tools, along with the key benefits that resulted from them:

Apple’s TAPI [1]: Stubbing used to reduce SDK size and improve build times.
Oracle’s Solaris OS linker [2]: Stubbing used to improve build times, and improve robustness of build system (against dependency cycles and race conditions).
Google’s Bazel [3]: Stubbing used to improve build times.
Google’s Fuchsia [4] [5]: Stubbing used to improve build times.
Android NDK: Stubbing used to reduce size of native SDK, control exported symbols, and improve build times.

Somewhat tangentially, a tool called libabigail [6] provides utilities for tracking changes relevant to ELF files in a meaningful way. One of libabigail’s tools provides very detailed textual XML representations of objects, which is especially useful in the absence of a preexisting textual representation of shared objects’ exposed interfaces. Glibc [7] and libc++ [8] have made an effort to address this in their own ways by using scripts to produce textual representations of object interfaces. This functionality makes it significantly easier to analyze and control symbol visibility, though the existing solutions are quite bespoke. Controlling these symbols can have an implicit benefit of reducing binary size by pruning visible symbols, but the more critical feature is being able to easily view and edit the exposed symbols in the first place. Using human-readable stubs addresses the issues of DSO analysis and control without requiring highly specialized tools. This does not strive to replace such tools altogether; it just makes small tasks significantly more approachable.

llvm-tapi would strive to sit at the intersection of two things: a means to produce and link against stubs, and tools that offer more control over and insight into the public interfaces of DSOs. More fundamentally, llvm-tapi would introduce a library to generate and ingest human-readable stubs from DSOs to address these issues directly in LLVM. Overall, this idea is closest in spirit to Apple’s TAPI, as the original TAPI also uses human-readable stubs.

In general, llvm-tapi should:

Produce human-readable text files from dynamic shared objects that are concise, readable, and contain everything required for linking that can’t be implicitly derived.
Produce linkable files from said human readable text files.
Provide tools to track and control the exposed interfaces of object files.
Integrate well with LLVM’s existing tools.
Strive to enable integration of the original TAPI code for Mach-O support.

There are a number of key benefits to using stubs and text-based application binary interfaces such as:

Reducing the size of dynamic shared objects used exclusively for linking.
The ability to avoid re-linking an object when its dependencies’ exposed interfaces do not change but their implementation does (which happens frequently).
Simplicity of viewing a diff for a changed DSO interface.

A large number of other use cases exist; this would open up the floor for a variety of other tools and future work as the concept is rather generic.
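To make the re-linking benefit concrete, here is a minimal sketch of how a build system might decide whether dependents need to be re-linked. It is only an illustration, not part of the proposal: the function names `interface_digest` and `needs_relink`, and the `(name, type, size)` tuple layout, are assumptions made up for this example.

```python
import hashlib


def interface_digest(symbols):
    """Hash only the link-relevant facts of a DSO.

    `symbols` is a list of (name, type, size) tuples. This layout is a
    hypothetical stand-in for whatever a stub file would record.
    """
    h = hashlib.sha256()
    # Sort by name so the digest does not depend on symbol table order.
    for name, kind, size in sorted(symbols, key=lambda s: s[0]):
        h.update(f"{name}\0{kind}\0{size}\n".encode())
    return h.hexdigest()


def needs_relink(old_symbols, new_symbols):
    """Dependents must be re-linked only if the exposed interface changed."""
    return interface_digest(old_symbols) != interface_digest(new_symbols)
```

Because the digest covers only the exposed interface, a rebuild that changes function bodies but not the symbol list would leave it unchanged, and dependents could skip the link step.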

The proposed YAML format would be analogous to Apple’s .tbd format but differ in a few ways to support ELF object types. An example would be as follows:

--- !tapi-tbe-v1
soname: someobj.so
architecture: aarch64
symbols:
  - { name: fish, type: object, size: 48 }
  - { name: foobar, type: function, warning-text: "deprecated in SOMEOBJ_1.3" }
  - { name: printf, type: function }
  - { name: rndfunc, type: function, undefined: true }
...

(Note that this doesn’t account for version sets, but such functionality can be included in a later version.)
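As a rough illustration of how such a file might be produced, the sketch below renders a symbol list into text shaped like the example. It is not the proposed implementation: `render_tbe` and `format_value` are hypothetical names, and the one-symbol-per-line flow-mapping layout is an assumption about how the format would be written out.

```python
def format_value(value):
    """Render one field value as a YAML scalar (simplified, not a full emitter)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str) and (" " in value or ":" in value):
        # Quote strings that would otherwise be ambiguous in flow style.
        return '"' + value.replace('"', '\\"') + '"'
    return str(value)


def render_tbe(soname, architecture, symbols):
    """Render a stub description as text, one symbol per line, sorted by name.

    `symbols` is a list of dicts with at least 'name' and 'type' keys;
    other keys (size, warning-text, undefined) are emitted as given.
    """
    lines = [
        "--- !tapi-tbe-v1",
        f"soname: {soname}",
        f"architecture: {architecture}",
        "symbols:",
    ]
    for sym in sorted(symbols, key=lambda s: s["name"]):
        fields = ", ".join(f"{k}: {format_value(v)}" for k, v in sym.items())
        lines.append(f"  - {{ {fields} }}")
    lines.append("...")
    return "\n".join(lines) + "\n"
```

Calling `render_tbe("someobj.so", "aarch64", [...])` with the four symbols from the example yields the same header, with the symbol lines in alphabetical order regardless of input order.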

Most of the fields are self-explanatory, with size not being relevant to function symbols, and warning text being purely optional. One reason this departs from .tbd format is to make diffs much easier: sorting symbols alphabetically on individual lines makes it much more obvious which symbols are added, removed, or modified. Despite the differences, the desire is for llvm-tapi to be structured such that integrating Apple’s Mach-O TAPI will be plausible and welcomed. Prior discussion [9] indicated interest in integrating Apple TAPI into LLVM, so I’d definitely like to leave that door open and encourage that in the future.
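The diff-friendliness argument can be shown with a short sketch using Python's standard `difflib`. The two symbol lists are invented for illustration (they are not real library interfaces), `interface_changes` is a hypothetical helper name, and the one-symbol-per-line layout is assumed.

```python
import difflib

# Two hypothetical versions of a stub's symbol list, one symbol per line.
OLD = """\
symbols:
  - { name: fish, type: object, size: 48 }
  - { name: printf, type: function }
"""

NEW = """\
symbols:
  - { name: fish, type: object, size: 64 }
  - { name: foobar, type: function }
  - { name: printf, type: function }
"""


def interface_changes(old_text, new_text):
    """Return only the added ('+ ') and removed ('- ') lines between two stubs."""
    delta = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    return [line for line in delta if line.startswith(("+ ", "- "))]


if __name__ == "__main__":
    for line in interface_changes(OLD, NEW):
        print(line)
```

With each symbol on its own line and the list sorted, the changed size of `fish` and the new `foobar` symbol each show up as isolated lines, while the untouched `printf` entry does not appear at all.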

I feel the best place to start is with a library, which best facilitates integration into other areas of LLVM, later wrapping it in a standalone tool and eventually considering direct integration into LLD. The tool will initially support basic generation of .tbe and stub files from .tbe or ELF. This should give enough functionality for manually checking shared object interface diffs, as well as having access to linkable stubs. The goal is for the tool to eventually provide additional functionality such as compatibility checking, but that’s a ways into the future.

There are multiple options for integrating llvm-tapi with LLD: LLD could use llvm-tapi to produce and ingest .tbe files directly, or llvm-tapi could be used to produce stubs that LLD can be taught to use. From a technical standpoint, these are not mutually exclusive. This step is a ways down the road, but is definitely a high-priority goal.

I’m interested to hear your thoughts and feedback on this.

Best,
Armando

[1] https://github.com/ributzka/tapi
[2] https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter2-22.html
[3] https://docs.bazel.build/versions/master/user-manual.html#flag--interface_shared_objects
[4] https://fuchsia.googlesource.com/zircon/+/master/scripts/shlib-symbols
[5] https://fuchsia.googlesource.com/zircon/+/master/scripts/dso-abi.h
[6] https://sourceware.org/libabigail/
[7] https://sourceware.org/git/?p=glibc.git;a=blob;f=scripts/abilist.awk;h=bad7c3807e478e50e63c3834aa8969214bdd6f63;hb=HEAD
[8] https://github.com/llvm-mirror/libcxx/blob/master/utils/sym_extract.py
[9] http://lists.llvm.org/pipermail/cfe-dev/2018-April/thread.html#57576
