incr.comp.: Explore delayed read-edge deduplication or getting rid of it entirely

At the moment, the compiler always eagerly deduplicates read-edges in CurrentDepGraph::read_index. To do this, it has to allocate a HashSet for each task. My suspicion is that there is not much duplication to begin with, and that there is potential for optimization here.
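For reference, here is a minimal sketch of the eager scheme described above. `DepNodeIndex` and `EagerTaskReads` are hypothetical stand-ins, not the compiler's actual definitions. The point is only that every task carries its own `HashSet` next to its list of reads, and each read costs a hash lookup.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the compiler's dep-node index type.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct DepNodeIndex(pub u32);

/// Hypothetical per-task read recorder that deduplicates eagerly:
/// every task owns its own HashSet in addition to the read list.
#[derive(Default)]
pub struct EagerTaskReads {
    reads: Vec<DepNodeIndex>,
    read_set: HashSet<DepNodeIndex>,
}

impl EagerTaskReads {
    /// Records a read edge unless this task has already read `source`.
    pub fn read_index(&mut self, source: DepNodeIndex) {
        // `HashSet::insert` returns false if the value was already present.
        if self.read_set.insert(source) {
            self.reads.push(source);
        }
    }

    /// The deduplicated reads, in order of first occurrence.
    pub fn reads(&self) -> &[DepNodeIndex] {
        &self.reads
    }
}
```

Reading nodes 1, 2, 1 leaves only `[1, 2]` in the task's read list, at the price of one set allocation per task.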

So the first step would be to collect some data on how much read-edge duplication there actually is. This is most easily done by modifying the compiler to count duplicate reads (in librustc/dep_graph/graph.rs) and print the count and percentage in the -Zincremental-info output. The modified compiler can then be used to compile a number of crates to get an idea of what's going on.
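A rough sketch of that instrumentation, assuming hypothetical `ReadStats` counters are threaded through `read_index`. The actual change in graph.rs would look different, and the summary string is illustrative only, not the real -Zincremental-info format.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the compiler's dep-node index type.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct DepNodeIndex(pub u32);

/// Hypothetical counters; in the compiler they would live alongside the dep-graph.
#[derive(Default, Debug)]
pub struct ReadStats {
    pub total_reads: u64,
    pub duplicate_reads: u64,
}

impl ReadStats {
    /// Percentage of all reads that deduplication filtered out.
    pub fn duplicate_percentage(&self) -> f64 {
        if self.total_reads == 0 {
            0.0
        } else {
            100.0 * self.duplicate_reads as f64 / self.total_reads as f64
        }
    }

    /// Illustrative summary line (hypothetical format).
    pub fn report(&self) -> String {
        format!(
            "read edges: {} total, {} duplicates ({:.2}%)",
            self.total_reads,
            self.duplicate_reads,
            self.duplicate_percentage()
        )
    }
}

/// Hypothetical per-task recorder that still deduplicates eagerly,
/// but counts every read it drops.
#[derive(Default)]
pub struct TaskReads {
    reads: Vec<DepNodeIndex>,
    read_set: HashSet<DepNodeIndex>,
}

impl TaskReads {
    pub fn read_index(&mut self, source: DepNodeIndex, stats: &mut ReadStats) {
        stats.total_reads += 1;
        if self.read_set.insert(source) {
            self.reads.push(source);
        } else {
            // This read is a duplicate and would be filtered out.
            stats.duplicate_reads += 1;
        }
    }
}
```

Because the counters are shared across all tasks, the final percentage covers the whole crate, which is the number needed for the decision in the next paragraph.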

If duplication is low (e.g. less than 2% of reads are filtered out), we could simply remove deduplication and test the effect with a try-build. Otherwise, we can move deduplication to DepGraph::serialize() and measure the performance impact of that.
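One way the delayed variant might look, sketched under assumptions: tasks push reads without any filtering, and a hypothetical `serialize_edges` function deduplicates each task's reads while flattening them into an offsets array plus an edge array. That layout is chosen for illustration and is not necessarily the compiler's on-disk format. A single scratch `HashSet` is reused for all tasks, so there is one allocation instead of one per task. First-occurrence order is preserved in case consumers of the serialized graph depend on edge order, which is also an assumption here.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the compiler's dep-node index type.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct DepNodeIndex(pub u32);

/// Hypothetical lazy recorder: reads are appended with no per-task HashSet.
#[derive(Default)]
pub struct LazyTaskReads {
    pub reads: Vec<DepNodeIndex>,
}

impl LazyTaskReads {
    pub fn read_index(&mut self, source: DepNodeIndex) {
        self.reads.push(source);
    }
}

/// Hypothetical serialization step. Returns (offsets, edges), where the
/// deduplicated edges of task `i` are `edges[offsets[i]..offsets[i + 1]]`.
pub fn serialize_edges(tasks: &[LazyTaskReads]) -> (Vec<usize>, Vec<DepNodeIndex>) {
    let mut offsets = Vec::with_capacity(tasks.len() + 1);
    let mut edges = Vec::new();
    // One scratch set, cleared and reused for every task.
    let mut seen = HashSet::new();
    for task in tasks {
        offsets.push(edges.len());
        seen.clear();
        for &edge in &task.reads {
            // Keep only the first occurrence of each source within this task.
            if seen.insert(edge) {
                edges.push(edge);
            }
        }
    }
    offsets.push(edges.len());
    (offsets, edges)
}
```

The trade-off is that duplicate reads now occupy memory until serialization; the data from the previous step would show whether that is cheaper than allocating a set per task.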