msg339637 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2019-04-08 13:11 |
Currently, when the random module is imported, the hashlib module is always imported which loads the OpenSSL library, whereas hashlib is only needed when a Random() instance is created with a string seed. For example, "rnd = random.Random()" and "rnd = random.Random(12345)" don't need hashlib. Example on Linux: $ python3 Python 3.7.2 (default, Mar 21 2019, 10:09:12) >>> import os, sys >>> 'hashlib' in sys.modules False >>> res=os.system(f"grep ssl /proc/{os.getpid()}/maps") >>> import random >>> 'hashlib' in sys.modules True >>> res=os.system(f"grep ssl /proc/{os.getpid()}/maps") 7f463ec38000-7f463ec55000 r--p 00000000 00:2a 5791335 /usr/lib64/libssl.so.1.1.1b 7f463ec55000-7f463eca5000 r-xp 0001d000 00:2a 5791335 /usr/lib64/libssl.so.1.1.1b ... Attached PR only imports hashlib on demand. Note: I noticed this issue while working on adding OpenSSL 1.1.1 support to Python 3.4 :-) |
|
|
msg339638 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2019-04-08 13:16 |
In the past, some developers complained when an import has been removed in a stdlib module. I vaguely recall code using "import os" to get the "errno" module from "os.errno". That's "What's New in Python 3.7" contains: "Several undocumented internal imports were removed. One example is that os.errno is no longer available; use import errno directly instead. Note that such undocumented internal imports may be removed any time without notice, even in micro version releases." For this reason, I don't think that stable versions (2.7 and 3.7) should be modified. |
|
|
msg339652 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2019-04-08 15:35 |
Why do we care about this particular import? It doesn't see slow in any way. In general, we don't do deferred imports unless there is a compelling reason (i.e. it is very slow or it is sometimes unavailable). Otherwise, it is a false optimization. |
|
|
msg339666 - (view) |
Author: Brett Cannon (brett.cannon) *  |
Date: 2019-04-08 19:36 |
Could you explain a bit more, Victor, about why you want to avoid importing hashlib and OpenSSL so much? |
|
|
msg339673 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2019-04-08 20:32 |
Raymond: > In general, we don't do deferred imports unless there is a compelling reason (i.e. it is very slow or it is sometimes unavailable). While I was working on adding OpenSSL 1.1.1 to Python 3.4, my _hashopenssl module was broken. In that case, "import random" fails with ImportError because of hashlib failures. I was surprised, since random doesn't need hashlib at startup. I would like to be able to use "import random" even if hashlib is broken. For me, random is a key component, whereas I see hashlib more as optional. (Even if in practice, it should always be available). Raymond: > Otherwise, it is a false optimization. Brett: > Could you explain a bit more, Victor, about why you want to avoid importing hashlib and OpenSSL so much? Well, OpenSSL is not a random tiny library. For example, loading it increases Python RSS of around 2.7 MiB. Example with script x.py: --- import os os.system(f"grep ^VmRSS /proc/{os.getpid()}/status") import random os.system(f"grep ^VmRSS /proc/{os.getpid()}/status") --- Output without the change on Fedora 29: VmRSS: 7396 kB VmRSS: 11796 kB # +4.4 MiB With the change: VmRSS: 7272 kB VmRSS: 8988 kB # +1.7 MiB Another example is that OpenSSL loads the libz.so dynamic library by dependency. I would prefer to minimize Python footprint if possible. |
|
|
msg339674 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2019-04-08 20:48 |
Raymond: > Why do we care about this particular import? It doesn't see slow in any way. I ran a benchmark: $ ./python -m perf command -o ref.json -v -- ./python -c 'import random' $ # apply patch $ ./python -m perf command -o patch.json -v -- ./python -c 'import random' $ ./python -m perf compare_to ref.json patch.json Mean +- std dev: [ref] 13.8 ms +- 0.2 ms -> [patch] 11.8 ms +- 0.2 ms: 1.18x faster (-15%) My PR 12728 makes Python startup 2 ms faster which I consider as quite significant on 13.8 ms. I know that some users are fighting to get a faster Python startup. Mercurial is just one example ;-) |
|
|
msg339689 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2019-04-09 01:19 |
In general, deferred imports are code smell that should avoided unless really necessary. They create an on-going maintenance burden (there's a reason most modules don't do this and put their imports at the top). FWIW, a broken hashlib is a localized bug, not an optimization problem. It doesn't affect any user with a build that passes the test suite. Running "python -v" shows that "random" is not part of the normal startup, so deferring the import saves zero for normal startup. It only affects modules that specifically import random. IIRC, Mercurial uses hashing extensively, so deferring the import doesn't help them at all. This is minor change, so I suppose we could let it go through; however, it seems somewhat arbitrary and the reasons offered seem dubious. For the most part, it isn't a good practice. |
|
|
msg339690 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2019-04-09 01:20 |
One other thought. This code has been present for over a decade. There is no evidence that anyone has ever wanted random to defer one of its imports. This seems like an invented problem. |
|
|
msg339720 - (view) |
Author: Christian Heimes (christian.heimes) *  |
Date: 2019-04-09 09:45 |
You could also use the internal _sha512 module. It's always present, small, lean and provides a SHA512 implementation with sufficient performance. |
|
|
msg339759 - (view) |
Author: Xavier de Gaye (xdegaye) *  |
Date: 2019-04-09 15:05 |
> You could also use the internal _sha512 module. It's always present This is not true at the moment, the _sha512 module is not present when openssl is missing. This is a bug in setup.py that prevents building the _sha512 module when openssl is missing. See issue 36544. It is possible that this issue (since it started because of hashlib failures) and also issue 36414 are caused by this problem. |
|
|
msg339763 - (view) |
Author: Christian Heimes (christian.heimes) *  |
Date: 2019-04-09 15:21 |
Thanks for pointing to the other issue. This is clearly a regression and should be fixed ASAP. I have ACKed your PR and pushed it. |
|
|
msg339791 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2019-04-09 17:28 |
Note: *Technically*, you can disable the compilation of the _sha512 module using "*disabled*" in Modules/Setup, but I'm not sure if it's a common use case. At least, it makes sense to me when we are sure that OpenSSL and _hashlib are available ;-) I didn't want to mention that since I'm not sure that it's really relevant in this discussion. (I was aware of the regression, and hopefully it's now fixed!) |
|
|
msg339822 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2019-04-10 01:24 |
> You could also use the internal _sha512 module. > It's always present, small, lean and provides a SHA512 > implementation with sufficient performance. I suppose we could do this but it borders on telling folks that we're worried about using our own public APIs, that importing hashlib is bad for them. It shouldn't be that way, hashlib is a collection of hash functions -- it is clear why this import isn't small and fast. It suggests that something is wrong with the implementation. The focus on client code in random seems like the wrong focus. That is just typical of what other clients would do. |
|
|
msg339834 - (view) |
Author: Inada Naoki (methane) *  |
Date: 2019-04-10 07:57 |
* Is hashlib large, slow to import library? -- Yes. * random module use hashlib only for specific (rare) use case? -- Yes. * Does user of random module needs hashlib in most case? -- No. For example, tmpfile user may not need hashlib. I'm +1 on PR 12728. |
|
|
msg339887 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2019-04-10 20:18 |
New changeset d914596a671c4b0f13641359cf43aa0d6fc05070 by Raymond Hettinger (Christian Heimes) in branch 'master': bpo-36559: random module: optimize sha512 import (GH-12742) https://github.com/python/cpython/commit/d914596a671c4b0f13641359cf43aa0d6fc05070 |
|
|