[Python-Dev] a note in random.shuffle.doc ... (original) (raw)

Tim Peters tim.peters at gmail.com
Sat Jun 10 23:31:39 CEST 2006


[Alex Martelli]

...claims:

Note that for even rather small len(x), the total number of permutations of x is larger than the period of most random number generators; this implies that "most" permutations of a long sequence can never be generated. Now -- why would the behavior of "most" random number generators be relevant here? The module's docs claim, for its specific Mersenne Twister generator, a period of 2**19997-1, which is (e.g.)

Oops! That's wrong. The period is 2**19937-1.

a comfortable 130128673800676351960752618754658780303412233749552410245124492452914636 028095467780746435724876612802011164778042889281426609505759158196749438 742986040468247017174321241233929215223326801091468184945617565998894057 859403269022650639413550466514556014961826309062543 times larger than the number of permutations of 2000 items, which doesn't really feel to me like a "rather small len(x)" in this context (what I'm most often shuffling is just a pack of cards -- len(x)==52 -- for example).

I suspect that the note is just a fossil from a time when the default random number generator was Whichman-Hill, with a much shorter period. Should this note just be removed, or instead somehow reworded to point out that this is not in fact a problem for the module's current default random number generator? Opinions welcome!

It should be removed now. I'll do that :-)

WH's period was indeed so short that it couldn't generate even a tiny fraction of the permutations of a deck of cards, and that's why the note was added.

While a long period is necessary to get a shot at all permutations, it's not sufficient, and I don't know what the true story is wrt the Twister. For example, a miserable PRNG that returns

0.1, 0.1, 0.2, 0.1, 0.2, 0.2, 0.1, 0.2, 0.2, 0.2, 0.1, 0.2, 0.2, 0.2, 0.2, ...

has infinite period, but has few (O(N)) distinct subsequences of length N. That's a failure of so-called equidistribution in N dimensions (for sufficiently large N, some N-vectors appear more often than others across the whole period). "A long" period is necessary but not sufficient for high-dimensional equidistribution.

Off the top of my head, then, since the Twister is provably equidistributed in 623 dimensions to 32-bit accuracy, I expect it should be able to "fairly" generate all permutations of a sequence of <= 623 elements (equidistribution in N dimensions implies equidistribution in all dimensions <= N). So I'm happy to leave a warning out until the casinos switch to 12-deck blackjack ;-)



More information about the Python-Dev mailing list