Add a deterministic constructor for RandomState · Issue #523 · rust-lang/libs-team (original) (raw)

Proposal

Problem statement

The default hasher for std::collections::{HashMap, HashSet} (via the S type parameter) is RandomState, whose constructor generates fresh random keys for every instance. That's a good default for security, but problematic when hash collections must behave deterministically. Although in principle it is possible to use other deterministic hashers, there are use cases that effectively require RandomState. The problem is that today there is no way to deterministically construct RandomState values.

Motivating examples or use cases

My main interest is testing complex concurrent systems using tools like Loom and Shuttle. In these use cases it is crucial for the programs under test to be deterministic, or, rather, that all nondeterminism (e.g., scheduling choices, rand calls, etc.) is controlled by the tool. Otherwise test failures are not reproducible, which is frustrating and significantly diminishes the utility these testing tools can offer.

One unexpected source of nondeterminism are the standard library hash collections with their default hasher RandomState. This manifests in different iteration orders in every execution. In 1P code it is possible to replace all hash collections with a deterministic hasher (potentially using a feature flag for testing, to keep the default behavior in production). However, since this results in a different type, we run into a type incompatibility with 3P libraries that have hash collections with RandomState hardcoded in their public interface. Sadly that's a reality, see for example aws_sdk_dynamodb (and this related discussion in the smithy-rs code generator).

Solution sketch

The proposal is to introduce a new deterministic constructor for RandomState. My concrete proposal in rust-lang/rust#135578 is to introduce a parameter-less constructor

pub fn deterministic() -> RandomState { RandomState { k0: 0, k1: 0 } }

and keep the internal representation (currently a pair of u64s) private. However, if experts think that the internal representation can be exposed, that's fine too.

Alternatives

Today there is no safe way to construct deterministic RandomState values. Folks might take the shameful path of transmuting (u64, u64) into RandomState.

It would be even nicer if hash collections (and possibly other parts of the standard library) could just be made deterministic via a compiler option, so that one can just write HashMap::new() instead of having to write HashMap::with_hasher(RandomState::deterministic()) with the solution described above. I'm happy to entertain a discussion in that direction, but I imagine that this will be a much larger effort.