Add a deterministic constructor for RandomState
· Issue #523 · rust-lang/libs-team (original) (raw)
Proposal
Problem statement
The default hasher for std::collections::{HashMap, HashSet}
(via the S
type parameter) is RandomState
, whose constructor generates fresh random keys for every instance. That's a good default for security, but problematic when hash collections must behave deterministically. Although in principle it is possible to use other deterministic hashers, there are use cases that effectively require RandomState
. The problem is that today there is no way to deterministically construct RandomState
values.
Motivating examples or use cases
My main interest is testing complex concurrent systems using tools like Loom and Shuttle. In these use cases it is crucial for the programs under test to be deterministic, or, rather, that all nondeterminism (e.g., scheduling choices, rand
calls, etc.) is controlled by the tool. Otherwise test failures are not reproducible, which is frustrating and significantly diminishes the utility these testing tools can offer.
One unexpected source of nondeterminism are the standard library hash collections with their default hasher RandomState
. This manifests in different iteration orders in every execution. In 1P code it is possible to replace all hash collections with a deterministic hasher (potentially using a feature flag for testing, to keep the default behavior in production). However, since this results in a different type, we run into a type incompatibility with 3P libraries that have hash collections with RandomState
hardcoded in their public interface. Sadly that's a reality, see for example aws_sdk_dynamodb (and this related discussion in the smithy-rs code generator).
Solution sketch
The proposal is to introduce a new deterministic constructor for RandomState
. My concrete proposal in rust-lang/rust#135578 is to introduce a parameter-less constructor
pub fn deterministic() -> RandomState { RandomState { k0: 0, k1: 0 } }
and keep the internal representation (currently a pair of u64
s) private. However, if experts think that the internal representation can be exposed, that's fine too.
Alternatives
Today there is no safe way to construct deterministic RandomState
values. Folks might take the shameful path of transmuting (u64, u64)
into RandomState
.
It would be even nicer if hash collections (and possibly other parts of the standard library) could just be made deterministic via a compiler option, so that one can just write HashMap::new()
instead of having to write HashMap::with_hasher(RandomState::deterministic())
with the solution described above. I'm happy to entertain a discussion in that direction, but I imagine that this will be a much larger effort.