Cache declarative macro expansion on disk (for incremental comp.). Based on #128605 by futile · Pull Request #128747 · rust-lang/rust
These are my takeaways from the perf run:
- Lots of mid-to-big regressions on many benchmarks, and small-to-mid regressions on many others.
- Only html5ever sees improvements, but there it actually gets a noticeable -10% instructions in most configurations.
- hyper and libc both fail to build due to an `AttrId` in the argument `TokenStream`, which could either not be flattened or not be cached; but `TokenStream::flattened()` definitely seems to not be enough in practice.
Unconditional Decl Macro Caching Seems to Not Be Worth It
I think the big takeaway is that unconditionally caching all declarative macro expansions for incremental compilation is not worth it. I'd assume the overhead of caching and retrieving is simply much bigger than re-evaluation for many/most of the invocations. However, the big -10% gain for html5ever shows that some crates can benefit from incremental caching. As far as I remember, html5ever is very heavy on declarative macro usage (I think it predates proc macros by quite a bit; it was around at/before Rust 1.0, iirc?), and thus might have some macro expansions that take enough time for disk-caching to be worth it.
Needs Cost-Heuristic to Be Useful, but Potential Gains Uncertain
At this point, a next step could be to try to figure out some cut-off/condition for when to cache decl macro expansions, i.e., some kind of "complex enough that disk-caching is (probably) useful" heuristic. However, as we saw from the perf run in #128605, integration with the query system has a non-zero cost, which we would thus need to overcome as well (for almost all crates that invoke decl macros); this would need to be experimented on. How big the possible gains are is also hard to say: for many crates, all decl macro invocations might just be cheap enough that caching is never worth it, leaving only the query system integration overhead. That overhead might also be improvable, but I don't really have any idea about that.
One possibility that could still make this useful would be if integration into the query system enabled more parts of the dependency graph to stay "green" (I think?), where those parts currently always need to be recomputed. But given the performance regressions, that at least doesn't seem to be the case at the moment (and the same probably applies to many other operations as well).
Implementation Probably Needs to Wait For #124141 Anyway
The build failures of hyper and libc seem to be due to `TokenStream::flattened()`/trying to stable-hash the `TokenStream`, which seems to not work yet. So it probably doesn't make sense to try this before #124141 anyway.
Ok, that said: since my initial motivation was to cache proc macro expansions (see #99515 (comment)), I tackled this mostly because it seemed to be a good first step implementation-wise (and because I wanted to). However, it seems that performance-wise it isn't a good first step 😅 But it has given me a basis for understanding how incremental caching and macro expansion work in the compiler, which should still help when tackling proc macros.
Not sure if I want to start that immediately; I also have some other stuff coming up, and I'm a bit bummed out by this work not being useful in the end. I didn't know that this might not be a certain good first step (due to potential perf losses), but assumed it was because the other PR had been commented on and had ongoing discussion, so I thought it was the right direction. Turns out it wasn't; I guess that can only be known afterwards. Oh well, good experience I guess, and reviewers already have enough to do anyway :)
So thanks a lot for engaging and taking the time to review, @petrochenkov! :) It would be okay for me to close this PR, since I don't plan to work on it again soon (I'd rather tackle proc macro caching); I'm not sure what the standard procedure here is. I hope that's okay. I guess it would be nice to have the perf results "findable" somehow, but I'll leave that up to you.