Since Python 2.1 [1], when random.shuffle was added, the documentation has said: """Note that for even rather small len(x), the total number of permutations of x is larger than the period of most random number generators; this implies that most permutations of a long sequence can never be generated.""" This comment is incorrect and misleading. In fact, I claim that shuffle can produce "all" permutations for any representable sequence. To shuffle a sequence of length N requires log(N!) ~ N * log(N/e) bits of randomness [2]. The random module provides a generator with "a period of 2**19937-1", meaning you can get 2**19937 bits of randomness out of it before it starts repeating. All of which is to say that any representable sequence, say N < 2**50, will need no more than 2**60 bits of randomness to shuffle. That is well within the period of the random number generator. Attached is a patch that deletes the comment. An illustration of this misconception is at [3]. 1: http://docs.python.org/release/2.1/lib/module-random.html 2: https://en.wikipedia.org/wiki/Factorial#Rate_of_growth_and_approximations_for_large_n 3: http://stackoverflow.com/questions/3062741/maximal-length-of-list-to-shuffle-with-python-random-shuffle
It seems to me that 2080 (per the accepted answer to your [3]) is indeed "a rather small len(x)", and that the docs are correct as written. I wonder if it would be worth adding a footnote that explains how to calculate that example 2080 number from the documented period of the random number generator, to make the concepts involved a little bit clearer?
When the comment was introduced, Python's Wichmann-Hill generator had a much shorter period, and we couldn't even generate all the permutations of a deck of cards. The period is astronomically larger now, but the stackoverflow answer (2080) is correct for the current upper bound. The docs could be beefed up to say more about that - but they'd get awfully tedious ;-) To the OP, this is your error: it takes lg(N!) "bits of randomness" (lg is the logarithm to base 2) to generate _one_ random permutation. That says nothing about what's needed to generate _all_ permutations. With an RNG of period P, there are P possible starting states. The starting state wholly determines the permutation produced. Therefore an RNG of period P cannot generate more than P distinct permutations (and that's an absolute upper bound - it may not be achieved). That's why comparing N! to P is the correct computation here, and indeed: >>> math.factorial(2080) > 2**19937 - 1 False >>> math.factorial(2081) > 2**19937 - 1 True If you still don't believe it, here's a challenge: take a toy RNG with a small period, and use it to generate permutations (via any method you like). You'll find that you never get more distinct permutations than the period of your RNG. Then reread the above ;-)
Okay, I see what the comment is saying now. I was mistaken. It might make the statement clearer if it is made more precise. "rather small" is vague. That vagueness is intentional, but it makes it more confusing.