Explore adding a reproducibility test to rust test infrastructure. by biabbas · Pull Request #139793 · rust-lang/rust

@Kobzol

If we do any tests post-merge, it will be quite difficult to act upon them. We should IMO either do this on a best-effort basis out-of-tree (or only in PR CI), or make it actually blocking for merging PRs and run it on auto builds.

I'm not fully sure about this. The test seems too slow to run on every PR. If we run it post-merge, any failure would still take extra effort to fix afterwards. There are also existing reproducibility issues with rustdoc, cargo-clippy, etc. I suggest running this as a post-merge test and fixing reproducibility issues before releases.

When a reproducibility test fails, it can be quite difficult to understand what went wrong. We should first invest in at least some basic tooling that will display some useful form of a diff between the binary artifacts so that we can actually figure out what went wrong.

I did try adding diffoscope for this, but diffoscope takes too much time. When I added it to this test, the diffoscope step always got cancelled, I think because the test had already passed the 5-hour mark.
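
A cheaper first pass than a full diffoscope run might be to hash both artifact trees and only report which files differ and where the first differing byte is. The sketch below is only an illustration of that idea, not something from this PR: the function names (`file_digest`, `first_difference`, `compare_trees`) are hypothetical, and it assumes the two builds have already been written to two separate directories.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def first_difference(a: Path, b: Path) -> int:
    """Return the offset of the first differing byte, or -1 if identical."""
    data_a, data_b = a.read_bytes(), b.read_bytes()
    for offset, (x, y) in enumerate(zip(data_a, data_b)):
        if x != y:
            return offset
    if len(data_a) != len(data_b):
        # One file is a strict prefix of the other.
        return min(len(data_a), len(data_b))
    return -1


def compare_trees(left: Path, right: Path) -> dict[str, str]:
    """Map every relative path that differs to a short description."""
    left_files = {p.relative_to(left).as_posix() for p in left.rglob("*") if p.is_file()}
    right_files = {p.relative_to(right).as_posix() for p in right.rglob("*") if p.is_file()}
    report = {}
    for rel in sorted(left_files | right_files):
        if rel not in right_files:
            report[rel] = "only in left"
        elif rel not in left_files:
            report[rel] = "only in right"
        elif file_digest(left / rel) != file_digest(right / rel):
            offset = first_difference(left / rel, right / rel)
            report[rel] = f"differs at byte {offset}"
    return report
```

Only the files listed in the report would then need to be passed to diffoscope (or a similar tool), which should keep the expensive step much shorter than diffing the whole tree.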

A related aspect to that is reproducibility of the test itself. If the test will be implemented in a separate CI workflow, it will be impossible to run it locally. It would be great to have the option to run the test locally, likely either as a bootstrap test or a run-make test. Both would require some code changes and test infra improvements to make this possible, probably.

This is also why I'd like to do this in our normal CI workflow, not in a separate workflow that doesn't have any of our supporting infrastructure.

This sounds good. We should provide infrastructure to run this test locally.
Integrating this into the test suite would also avoid repeating steps that don't change the build, for example downloading components.

Or, as an alternative, maybe we could do a stage 2 build, then move the sources to a different location, do a stage3 build, and just compare stage 2 and stage 3?

This could work.
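
To make the "move the sources to a different location" idea concrete, here is a simplified sketch. It is not the stage 2 / stage 3 flow itself: it just runs one caller-supplied build command in the original tree and again in a relocated copy, then lists the output files that differ. The names `check_path_independence` and `differing_files`, the `build_cmd` parameter and the `out_dir` layout are all assumptions made for illustration.

```python
import filecmp
import shutil
import subprocess
from pathlib import Path


def differing_files(a: Path, b: Path) -> list[str]:
    """List relative paths that are missing on one side or differ in content."""
    rels_a = {p.relative_to(a).as_posix() for p in a.rglob("*") if p.is_file()}
    rels_b = {p.relative_to(b).as_posix() for p in b.rglob("*") if p.is_file()}
    diffs = list(rels_a ^ rels_b)
    for rel in rels_a & rels_b:
        # shallow=False forces a byte-by-byte comparison.
        if not filecmp.cmp(a / rel, b / rel, shallow=False):
            diffs.append(rel)
    return sorted(diffs)


def check_path_independence(
    src: Path, relocated: Path, build_cmd: list[str], out_dir: str
) -> list[str]:
    """Build in `src`, copy the sources to `relocated`, build again, compare.

    `build_cmd` is whatever command produces artifacts under `out_dir`
    (hypothetical here); `relocated` must not exist yet.
    """
    subprocess.run(build_cmd, cwd=src, check=True)
    # Copy only the sources, leaving the first build's output behind.
    shutil.copytree(src, relocated, ignore=shutil.ignore_patterns(out_dir))
    subprocess.run(build_cmd, cwd=relocated, check=True)
    return differing_files(src / out_dir, relocated / out_dir)
```

An empty result would mean the output did not depend on the source path; any listed file is a candidate for leaked absolute paths or other non-determinism.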