msg309956 - (view) |
Author: Johnny Dude (johnnyd) |
Date: 2018-01-15 09:08 |
When using a tuple that includes a string, the results are not consistent across new interpreter invocations or processes. For example, executing the following on a Linux machine will yield a different result on each run:

    python3.6 -c 'import random; random.seed(("a", 1)); print(random.random())'

Please note that the docstring of random.seed states: "Initialize internal state from hashable object." The Python documentation does not. (https://docs.python.org/3.6/library/random.html#random.seed) This is very confusing; I hope you can fix the behavior, not the docstring. |
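The effect described in this report can be reproduced with a minimal sketch like the one below. It is an illustration, not part of the original report: because later Python versions deprecated seeding from arbitrary hashable objects, it prints hash(("a", 1)) directly instead of calling random.seed(), and it pins PYTHONHASHSEED per child process so the comparison is repeatable. The helper name tuple_hash_in_fresh_process is hypothetical.

```python
import os
import subprocess
import sys

# Code run in each child interpreter: print the hash of the tuple from the report.
SNIPPET = 'print(hash(("a", 1)))'


def tuple_hash_in_fresh_process(hash_seed):
    """Start a new interpreter with the given PYTHONHASHSEED and return the tuple hash."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


same_a = tuple_hash_in_fresh_process(1)
same_b = tuple_hash_in_fresh_process(1)
other = tuple_hash_in_fresh_process(2)

print("same hash seed, same hash:", same_a == same_b)
print("different hash seed, same hash:", same_a == other)
```

When PYTHONHASHSEED is left unset, each new process picks its own random hash seed, which is why the one-liner in the report gives a different number on each run.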
|
|
msg309957 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2018-01-15 09:13 |
random.seed(str) uses:

    if version == 2 and isinstance(a, (str, bytes, bytearray)):
        if isinstance(a, str):
            a = a.encode()
        a += _sha512(a).digest()
        a = int.from_bytes(a, 'big')

Whereas for other types, random.seed(obj) uses hash(obj), and hash is randomized by default in Python 3.

Yeah, the random.seed() documentation should describe the implementation and explain that hash(obj) is used and that the hash function is randomized by default: https://docs.python.org/dev/library/random.html#random.seed |
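As a rough standalone sketch of the string path quoted above (not the library code itself), the conversion can be written as a plain function. The name seed_int_from_text is hypothetical, and hashlib.sha512 stands in for the module-private _sha512 alias.

```python
from hashlib import sha512


def seed_int_from_text(a):
    """Turn str, bytes, or bytearray into a large integer, independent of hash()."""
    if isinstance(a, str):
        a = a.encode()
    # Append the SHA-512 digest of the data, then read everything as one big integer.
    data = bytes(a) + sha512(a).digest()
    return int.from_bytes(data, "big")


first = seed_int_from_text("a")
second = seed_int_from_text(b"a")

print(first == second)
print(first.bit_length() > 512)
```

Nothing in this path depends on hash(), so the resulting integer does not change with PYTHONHASHSEED.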
|
|
msg310009 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2018-01-15 18:46 |
> This is very confusing, I hope you can fix the behavior, not the doc string.

I'll fix the docstring to make it more specific. We really don't want to use hash(obj) because it produces too few bits of entropy. |
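To put a number on the entropy remark, here is a small sketch (an illustration, not from the thread) that compares the width of hash() values, as reported by sys.hash_info.width, with the size of the integer built by the SHA-512 string path quoted earlier in the thread.

```python
import sys
from hashlib import sha512

# Width in bits of the values hash() can return on this build.
hash_bits = sys.hash_info.width

# A hash-based seed can never carry more than hash_bits bits.
tuple_hash = hash(("a", 1))

# The string path appends a 64-byte SHA-512 digest before converting to int.
text = "a".encode()
sha_seed = int.from_bytes(text + sha512(text).digest(), "big")

print("hash width in bits:", hash_bits)
print("tuple hash fits in that width:", tuple_hash.bit_length() <= hash_bits)
print("sha512-based seed exceeds 512 bits:", sha_seed.bit_length() > 512)
```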
|
|
msg310019 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2018-01-15 21:49 |
Maybe deprecate using a hash? |
|
|
msg320360 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2018-06-24 06:59 |
> Maybe deprecate using a hash?

Any deprecation will likely break some existing code, but it would be nice to restrict input types to int, float, bytes, bytearray, or str. Then we could remove all references to hashing. |
|
|
msg320361 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2018-06-24 07:08 |
This is what I meant. Emit a deprecation warning for input types other than explicitly supported types (but I didn't think about float), and raise an error in future. |
|
|
msg320383 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2018-06-24 19:44 |
I'm thinking of something like this:

$ git diff
diff --git a/Lib/random.py b/Lib/random.py
index 1e0dcc87ed..f479e66ada 100644
--- a/Lib/random.py
+++ b/Lib/random.py
@@ -136,12 +136,17 @@ class Random(_random.Random):
             x ^= len(a)
             a = -2 if x == -1 else x

-        if version == 2 and isinstance(a, (str, bytes, bytearray)):
+        elif version == 2 and isinstance(a, (str, bytes, bytearray)):
             if isinstance(a, str):
                 a = a.encode()
             a += _sha512(a).digest()
             a = int.from_bytes(a, 'big')

+        elif not isinstance(a, (type(None), int, float, str, bytes, bytearray)):
+            _warn('Seeding based on hashing is deprecated.\n'
+                  'The only supported seed types are None, int, float, '
+                  'str, bytes, and bytearray.', DeprecationWarning, 2)
+
         super().seed(a)
         self.gauss_next = None |
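The type check in the proposed diff can be tried outside of random.py with a sketch along these lines. The function name check_seed_type and the constant SUPPORTED_SEED_TYPES are hypothetical; only the tuple of accepted types and the warning text come from the diff.

```python
import warnings

# Seed types the proposed patch treats as supported.
SUPPORTED_SEED_TYPES = (type(None), int, float, str, bytes, bytearray)


def check_seed_type(a):
    """Return True for a supported seed type; otherwise warn and return False."""
    if not isinstance(a, SUPPORTED_SEED_TYPES):
        warnings.warn(
            "Seeding based on hashing is deprecated.\n"
            "The only supported seed types are None, int, float, "
            "str, bytes, and bytearray.",
            DeprecationWarning,
            stacklevel=2,
        )
        return False
    return True


# Record warnings so the demonstration does not depend on the active filters.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    tuple_ok = check_seed_type(("a", 1))
    str_ok = check_seed_type("a")

print(tuple_ok, str_ok, len(caught))
```

DeprecationWarning is hidden by default in many contexts, so code that wants to catch such seeds early may need to run with warnings enabled (for example with the -W error::DeprecationWarning interpreter option).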
|
|
msg321759 - (view) |
Author: Lee Griffiths (poddster) |
Date: 2018-07-16 19:25 |
a) The issue below added documentation to py2.7 that calls out PYTHONHASHSEED, but the py3 documentation doesn't currently contain those words: https://bugs.python.org/issue27706 It'd be useful to have something similar whether the "behaviour" is fixed or not, as providing other objects (like a tuple) will still be non-deterministic.

b) I don't know if this is the correct issue to heap this on, but I think it might be, as you're looking at changing the seed function. The documentation for `object.__hash__` calls out `str`, `bytes` and `datetime` as being affected by `PYTHONHASHSEED`. Doesn't it seem odd that there's a workaround in the seed function for str and bytes, but not for datetime? https://docs.python.org/3/reference/datamodel.html#object.__hash__

I mainly point this out because seeding random with the current date/time is idiomatic in many languages and environments (usually used when you log the seed to be able to recreate things later, or just blindly copying the historical use of `srand(time(NULL))` from C programs!). Anyone shoving a datetime straight into seed() is going to find it non-deterministic and might not understand why, or even notice, especially as the documentation for seed() doesn't call this out. Those "in the know" will get a unix timestamp out of the datetime and put that in seed instead, but I feel that falls under the same argument as users-in-the-know SHA512ing a string, mentioned above, which is undesirable and apparently something the function should implement, not users.

Would it be wise for datetime to have a specific implementation as well? |
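The "in the know" workaround mentioned above, taking a unix timestamp out of the datetime and seeding with that, might look like the following sketch. The helper name rng_from_datetime is hypothetical, and truncating to whole seconds is an assumption made here for simplicity.

```python
import random
from datetime import datetime, timezone


def rng_from_datetime(moment):
    """Seed a generator from a datetime via its unix timestamp (whole seconds)."""
    seed_value = int(moment.timestamp())
    return random.Random(seed_value), seed_value


# A timezone-aware datetime avoids depending on the machine's local time zone.
moment = datetime(2018, 7, 16, 19, 25, tzinfo=timezone.utc)

rng_a, seed_a = rng_from_datetime(moment)
rng_b, seed_b = rng_from_datetime(moment)

print(seed_a == seed_b)
print(rng_a.random() == rng_b.random())
```

Logging seed_value alongside the results is what makes a run reproducible later, since an int seed does not depend on PYTHONHASHSEED.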
|
|
msg350209 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2019-08-22 16:19 |
New changeset d0cdeaab76fef8a6e5a04665df226b6659111e4e by Raymond Hettinger in branch 'master': bpo-32554: Deprecate hashing arbitrary types in random.seed() (GH-15382) https://github.com/python/cpython/commit/d0cdeaab76fef8a6e5a04665df226b6659111e4e |
|
|