incr.comp.: Use ICH-based DepNodes in order to make them PODs · Issue #42294 · rust-lang/rust
The current implementation of `DepNode` has a `DefId` or something more complex as its identifier. This has a few downsides:
- `DefId`s are bound to a specific compilation session. When the dependency graph of a previous session is loaded, all nodes in it have to be "re-traced", which means that their `DefPath` has to be mapped to a `DefId` in the current compilation session. This has some cost at runtime and complicates the implementation (because we have to generate and store the `DefPathDirectory`).
- Re-tracing a `DepNode` can also fail because the item it refers to does not exist anymore in the current compilation session. This is not a conceptual problem, but it is an annoyance that `DepNode`s from a previous dep-graph cannot easily be represented. At the moment this is solved by making `DepNode` generic over the type of identifier it uses (either `DefId` or `DefPath`), but that is a burden.
- Independently of `DefId`, some `DepNode` variants are expensive to copy because they contain vectors.
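To make the re-tracing problem concrete, here is a rough sketch of what it amounts to. The types are heavily simplified stand-ins, not the compiler's actual definitions: `DefPath` is modeled as a plain string, `DefId` as a bare index, and `retrace` is a hypothetical helper name.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins, far simpler than the real compiler types.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct DefPath(String); // session-independent path, e.g. "krate::foo"

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct DefId(u32); // only meaningful within one compilation session

// A node that is generic over its identifier type, as described above.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum DepNode<D> {
    Krate,
    Hir(D),
    HirBody(D),
}

// Map a node loaded from the previous session into the current session.
// Returns `None` when the item the node refers to no longer exists.
fn retrace(
    node: &DepNode<DefPath>,
    current: &HashMap<DefPath, DefId>,
) -> Option<DepNode<DefId>> {
    match node {
        DepNode::Krate => Some(DepNode::Krate),
        DepNode::Hir(path) => current.get(path).copied().map(DepNode::Hir),
        DepNode::HirBody(path) => current.get(path).copied().map(DepNode::HirBody),
    }
}
```

Every loaded node has to go through a lookup like this, and every caller has to deal with the `None` case. Both costs go away if the identifier itself is stable across sessions.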
This is a proposal to use a simplified design that uses a generic, globally unique fingerprint as the disambiguator for `DepNode`:
    enum DepNodeKind {
        Krate,
        Hir,
        HirBody,
        MetaData,
        ...
    }

    struct DepNode {
        kind: DepNodeKind,
        // 128+ bit hash value identifying the DepNode
        fingerprint: Fingerprint,
    }
Since this is using a stable hash (like the ICH), a `DepNode` like this is valid across compilation sessions and does not need any re-tracing. It's also "plain old data" in the sense that it contains no pointers or anything else that is context dependent. As a consequence, it can be easily copied, shared between threads, and memory mapped, something that is not possible with the current design.
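The "plain old data" claim can be checked by the compiler. The following is a minimal sketch under some assumptions: `Fingerprint` is modeled as two `u64` halves (the proposal only says "128+ bit"), only the four variants named above are listed, and `assert_pod` and `dep_node_size` are hypothetical helper names.

```rust
use std::mem::size_of;

// Assumption: a 128-bit fingerprint stored as two 64-bit halves.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Fingerprint(u64, u64);

// Only the variants named in the proposal; the real list would be longer.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum DepNodeKind {
    Krate,
    Hir,
    HirBody,
    MetaData,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct DepNode {
    kind: DepNodeKind,
    fingerprint: Fingerprint,
}

// Compiles only for types that are bitwise copyable, safe to send and
// share between threads, and free of borrowed data.
fn assert_pod<T: Copy + Send + Sync + 'static>() {}

// Returns the in-memory size of a node; the bound check on the first
// line is enforced at compile time.
fn dep_node_size() -> usize {
    assert_pod::<DepNode>();
    size_of::<DepNode>()
}
```

A node that contained a vector or a session-bound reference would fail the `Copy` or `'static` bound here, which is roughly the property the current design lacks.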
The fingerprint-based approach has a few potential downsides but all of them can be addressed adequately, I think:
- A truly stable fingerprint/hash value is not trivial to compute.
  Solution: We already have the whole ICH infrastructure available, which can handle anything we throw at it.
- There's a runtime cost to turning a `DefId` or other identifiers into a fingerprint.
  Solution: In the vast majority of cases, it is really just one `DefId` that needs to be hashed. In this case we already have the hash available (as the `DefPathHash`) and can access it via a simple array lookup. Additionally, all `DepNode`s are cached in the dependency graph once they are created and can be looked up via a 32-bit `DepNodeIndex` if the need arises.
- Using a hash value introduces the risk of hash collisions.
  Solution: This is already a risk for incremental compilation and is mitigated by using a high-quality hash function with a low enough collision probability. The risk can be adjusted by using fingerprints with more bits.
- A fingerprint is opaque, while a `DefId` allows reconstructing which item it points to. This is bad for debugging output.
  Solution: This can be mitigated by constructing lookup tables for mapping a `Fingerprint` back to its ingredients if `-Zquery-dep-graph` is specified.
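As a rough illustration of the second and fourth points, the sketch below combines the array lookup, the `DepNodeIndex` cache, and an optional debug table. It is not the compiler's implementation: `DefIndex`, the `DepGraph` fields, `node_for`, and `describe` are all hypothetical names, and the `debug` flag merely stands in for `-Zquery-dep-graph`.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Fingerprint(u64, u64);

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum DepNodeKind {
    Hir,
    HirBody,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct DepNode {
    kind: DepNodeKind,
    fingerprint: Fingerprint,
}

// Hypothetical: position of a definition in the current session.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct DefIndex(u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct DepNodeIndex(u32);

struct DepGraph {
    // Precomputed stable hash per definition, indexed by DefIndex.
    def_path_hashes: Vec<Fingerprint>,
    nodes: Vec<DepNode>,
    indices: HashMap<DepNode, DepNodeIndex>,
    // Reverse table, only allocated when debugging output is requested.
    debug_names: Option<HashMap<Fingerprint, String>>,
}

impl DepGraph {
    fn new(def_path_hashes: Vec<Fingerprint>, debug: bool) -> Self {
        DepGraph {
            def_path_hashes,
            nodes: Vec::new(),
            indices: HashMap::new(),
            debug_names: if debug { Some(HashMap::new()) } else { None },
        }
    }

    // Turn a definition into a node: one array lookup, no hashing of the path.
    fn node_for(&mut self, kind: DepNodeKind, def: DefIndex, name: &str) -> DepNodeIndex {
        let fingerprint = self.def_path_hashes[def.0 as usize];
        let node = DepNode { kind, fingerprint };
        if let Some(names) = self.debug_names.as_mut() {
            names.entry(fingerprint).or_insert_with(|| name.to_string());
        }
        // Reuse the existing index if this node was created before.
        if let Some(&index) = self.indices.get(&node) {
            return index;
        }
        let index = DepNodeIndex(self.nodes.len() as u32);
        self.nodes.push(node);
        self.indices.insert(node, index);
        index
    }

    // Readable output when the debug table exists, raw hex otherwise.
    fn describe(&self, index: DepNodeIndex) -> String {
        let node = self.nodes[index.0 as usize];
        let name = self
            .debug_names
            .as_ref()
            .and_then(|names| names.get(&node.fingerprint));
        match name {
            Some(name) => format!("{:?}({})", node.kind, name),
            None => format!(
                "{:?}({:016x}{:016x})",
                node.kind, node.fingerprint.0, node.fingerprint.1
            ),
        }
    }
}
```

The reverse table costs memory only when it is asked for, so regular builds would keep nodes as small, opaque values.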
cc @nikomatsakis and @eddyb