msg147557 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *  |
Date: 2011-11-13 16:43 |
I noticed that several usages of random.getrandbits() actually need bytes. A few examples: - Lib/test/test_zlib.py calls "random.getrandbits(8 * _1M).to_bytes()" - Twisted uses the %x format and then call .decode('hex') Another examples found with Code Search: - struct.pack("Q", random.getrandbits(64)) - for i in range(8): ClientChallenge+= chr(random.getrandbits(8)) - key = sha1(str(random.getrandbits(8*20))).digest() random.getrandbits() itself is not a cheap call: it ends with a call to _PyLong_FromByteArray, and transformation to bytes will involve more copy, conversions from 30bit digits (or 15bit digits) to bytes, or worse. This patch adds random.getrandbytes(), which creates a bytes string directly from calls to genrand_int32(). |
|
|
msg147562 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2011-11-13 17:02 |
Good idea, IMO. + cum = set() + for i in range(100): + val = getbytes(span) + cum |= set(i for i in range(span) if val[i]) + self.assertEqual(len(cum), span) I find this test a bit strange. Also, perhaps it should be bitwise rather than bytewise. |
|
|
msg147567 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2011-11-13 19:01 |
How would this work for other random number generators that don't supply genrand_int32()? The API for random is supposed to be easily subclassable and reusable for other random number generators. The requirements for those generators is intentionally kept minimal (i.e. they only need to supply random(), seed(), getstate() and setstate() and optionally getrandbits()). I'm -0 on making this API fatter. |
|
|
msg147569 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *  |
Date: 2011-11-13 19:15 |
> How would this work for other random number generators that don't > supply genrand_int32()? genrand_int32 is an internal function, only available in C for the Mersenne Twister generator. random.SystemRandom() should provide getrandbytes as well, it would just call os.urandom(); I don't know other generators. |
|
|
msg147570 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2011-11-13 19:50 |
> genrand_int32 is an internal function, > only available in C for the Mersenne Twister generator. Yes, I know. I'm the one added that code ;-) > I don't know other generators. The problem is that we need an API that will accommodate other random number generators and not be specific to the MersenneTwister. Right now, the starting point for everything in the random module is an underlying generator supplying a random() method returning a float and an optional getrandombits() method which returns an int (possibly a long int). The latter is easily convertible to bytes with to_bytes() which uses a fast O(n) algorithm. ISTM, the getrandbytes() proposal amounts to a micro-optimization of existing capabilities, and it comes at the expense of fattening the API. |
|
|
msg147571 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2011-11-13 20:41 |
> The problem is that we need an API that will accommodate other random > number generators and not be specific to the MersenneTwister. Right > now, the starting point for everything in the random module is an > underlying generator supplying a random() method returning a float and > an optional getrandombits() method which returns an int (possibly a > long int). Well, you can provide a default getrandombytes() which calls into getrandombits(), and specialize it in the case genrand_int32() exists. > The latter is easily convertible to bytes with to_bytes() which uses > a fast O(n) algorithm. Well, O(n) doesn't necessarily equate fast. Can Amaury post benchmark numbers of getrandbits().to_bytes() vs getrandbytes()? |
|
|
msg147573 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *  |
Date: 2011-11-13 21:14 |
./python -m timeit -s "from random import getrandbits" "getrandbits(8000000).to_bytes(1000000, 'little')" 10 loops, best of 3: 25 msec per loop ./python -m timeit -s "from random import getrandbytes" "getrandbytes(1000000)" 100 loops, best of 3: 9.66 msec per loop For the Mersenne Twister, getrandbytes() is ~2.5 times faster (A length of 1000 gives exactly the same ratio) When applied to the SytemRandom object, the difference is less impressive (probably because os.urandom is slow) but getrandbytes is still 20ms faster. Updated patch to add getrandbytes at the module level, and to SystemRandom. |
|
|
msg147583 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2011-11-14 02:57 |
The differential cost of generating n random bytes is negligible compared to actually doing anything with the bytes once their generated. This optimization is close to being a total waste (saving 15 milliseconds for the abnormal case of generating 1 million random bytes). |
|
|
msg192924 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2013-07-12 06:33 |
Closing for the reasons listed above. |
|
|