[Python-Dev] Investigating time for import requests
INADA Naoki songofacandy at gmail.com
Sun Oct 1 22:04:51 EDT 2017
See also https://github.com/requests/requests/issues/4315
I tried the new `-X importtime` option on `import requests`.
Full output is here:
https://gist.github.com/methane/96d58a29e57e5be97769897462ee1c7e
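A measurement like this can be reproduced from a script. The following is only a minimal sketch, assuming Python 3.7 or later (where `-X importtime` exists); the helper name `import_times` is made up for illustration. It starts a fresh interpreter, so modules already loaded in the current process don't hide any cost, and parses the report that is written to stderr.

```python
import subprocess
import sys


def import_times(module):
    """Import `module` in a fresh interpreter under -X importtime.

    Returns a list of (self_us, cumulative_us, name) tuples parsed from
    the report that the interpreter writes to stderr.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
        check=True,
    )
    rows = []
    prefix = "import time:"
    for line in proc.stderr.splitlines():
        if not line.startswith(prefix):
            continue
        parts = line[len(prefix):].split("|")
        if len(parts) != 3:
            continue
        self_us, cumulative_us, name = (p.strip() for p in parts)
        if not self_us.isdigit():
            continue  # skip the "self [us] | cumulative | ..." header line
        rows.append((int(self_us), int(cumulative_us), name))
    return rows


if __name__ == "__main__":
    # Show the five modules with the largest self time for `import json`.
    for self_us, cumulative_us, name in sorted(import_times("json"), reverse=True)[:5]:
        print(f"{self_us:>9} | {cumulative_us:>10} | {name}")
```

Sorting by the first column ranks modules by their own cost; sorting by the second column instead shows the roots of slow subtrees.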
Currently, it takes about 110 ms, and most of that comes from the Python stdlib. The following are the roots of the slow stdlib subtrees.
import time: self [us] | cumulative | imported package
import time:      1374 |      14038 | logging
import time:      2636 |       4255 | socket
import time:      2902 |      11004 | ssl
import time:      1162 |      16694 | http.client
import time:       656 |       5331 | cgi
import time:      7338 |       7867 | http.cookiejar
import time:      2930 |       2930 | http.cookies
1. logging
logging is slow because it is imported at an early stage, so it is charged for importing many common, relatively slow packages (collections, functools, enum, re).
The traceback module is especially slow because of linecache.
import time:      1419 |       5016 | tokenize
import time:       200 |       5910 | linecache
import time:       347 |       8869 | traceback
I think it's worth importing linecache lazily.
2. socket
import time:       807 |       1221 | selectors
import time:      2636 |       4255 | socket
socket imports selectors for socket.sendfile(), and the selectors module uses ABC. That's why selectors is a bit slow.
The socket module also creates four enums. That's why import socket took more than 2.5 ms excluding subimports.
3. ssl
import time:      2007 |       2007 | ipaddress
import time:      2386 |       2386 | textwrap
import time:      2723 |       2723 | _ssl
...
import time:       306 |        988 | base64
import time:      2902 |      11004 | ssl
I have already created a pull request that removes the textwrap dependency from ssl: https://github.com/python/cpython/pull/3849
The ipaddress and _ssl modules are a bit slow too, but I don't know whether we can improve them.
ssl itself took 2.9 ms, because ssl defines six enums.
4. http.client
import time:      1376 |       2448 | email.header
...
import time:      1469 |       7791 | email.utils
import time:       408 |      10646 | email._policybase
import time:       939 |      12210 | email.feedparser
import time:       322 |      12720 | email.parser
...
import time:       599 |       1361 | email.message
import time:      1162 |      16694 | http.client
email.parser has a very large import tree, but I don't know how to break it up.
5. cgi
import time:      1083 |       1083 | html.entities
import time:       560 |       1643 | html
...
import time:       656 |       2609 | shutil
import time:       424 |       3033 | tempfile
import time:       656 |       5331 | cgi
The cgi module uses tempfile to save uploaded files.
But requests imports cgi just for `cgi.parse_header()`, so tempfile is never used. It may be worth importing it lazily.
FYI, cgi depends on the very slow email.parser too. It doesn't show up in this subtree because http.client is imported before cgi. Even though that isn't a problem for requests, it may affect real CGI applications. Of course, startup time is very important for CGI applications too.
6. http.cookiejar and http.cookies
They are slow because they contain many re.compile() calls.
Ideas
There are some places where a large import tree can be broken up with the "import in function" hack.
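As a rough illustration of the "import in function" hack (the function names below are hypothetical and are not taken from cgi or any other stdlib module): the dependency is imported inside the only function that needs it, so importing the enclosing module no longer pays for it.

```python
def parse_kind(header):
    """Return the lowercased media type of a Content-Type style header.

    This helper needs nothing from tempfile, so callers that only use
    it never trigger that import.
    """
    return header.split(";", 1)[0].strip().lower()


def save_upload(data):
    """Write `data` to a temporary file and return the file's path."""
    # Deferred import: tempfile (and shutil, which it pulls in) is loaded
    # on the first call rather than when this module is imported.
    import tempfile

    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(data)
        return f.name
```

After the first call the module is in `sys.modules`, so later calls only pay for a dictionary lookup and a local name binding.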
ABC is slow, and it's used widely with almost no real need. (Who needs selectors to be an ABC?) We can't remove the ABC dependency because of backward compatibility, but I hope ABC is implemented in C by Python 3.7.
Enum is slow, maybe slower than most people think. I don't know exactly why, but I suspect it's because the namespace dict is implemented in Python.
Anyway, I think we can have a C implementation of IntEnum and IntFlag, like namedtuple vs PyStructSequence. It doesn't need to be 100% compatible with the current enum. In particular, there is no need to use a metaclass.
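The class-creation cost of an enum can be checked with a quick micro-benchmark. This is just a sketch with made-up names (`make_enum`, `make_plain`, `per_call_us`); the absolute numbers depend on the Python version and machine, so none are quoted here.

```python
import enum
import timeit


def make_enum():
    """Create a small IntEnum class, as a module defining an enum does at import."""
    class Color(enum.IntEnum):
        RED = 1
        GREEN = 2
        BLUE = 3
    return Color


def make_plain():
    """Create a plain class holding the same integer constants, for comparison."""
    class Color:
        RED = 1
        GREEN = 2
        BLUE = 3
    return Color


def per_call_us(func, number=1000):
    """Average wall-clock time of one call to `func`, in microseconds."""
    return timeit.timeit(func, number=number) / number * 1e6


if __name__ == "__main__":
    print(f"IntEnum class creation: {per_call_us(make_enum):8.1f} us")
    print(f"plain class creation:   {per_call_us(make_plain):8.1f} us")
```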
Another major source of slowness is compiling regular expressions.
I think we can increase the cache size of `re.compile` and use on-demand cached compiling (e.g. `re.match()`) instead of "compile at import time" in many modules.
PEP 562 -- Module `__getattr__` helps a lot too. It makes it possible to split the collections module and the string module. (The string module is often used just for constants like string.ascii_letters, but string.Template causes an import-time re.compile().)
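To show how a module-level `__getattr__` defers an expensive attribute, here is a small sketch assuming Python 3.7 or later. It builds a throwaway module object so the example is self-contained; `make_lazy_module` and the `factories` mapping are hypothetical names. In a real module one would simply write `def __getattr__(name): ...` at the top level of the file.

```python
import types


def make_lazy_module(name, factories):
    """Return a module whose listed attributes are built on first access.

    `factories` maps an attribute name to a zero-argument callable that
    produces the value.
    """
    mod = types.ModuleType(name)

    def __getattr__(attr):
        # Called only when normal attribute lookup on the module fails.
        try:
            factory = factories[attr]
        except KeyError:
            raise AttributeError(
                f"module {name!r} has no attribute {attr!r}"
            ) from None
        value = factory()
        setattr(mod, attr, value)  # cache it so the hook is not hit again
        return value

    # Storing the function in the module namespace is what PEP 562 looks for.
    mod.__getattr__ = __getattr__
    return mod
```

Cheap constants would stay as ordinary module attributes, while anything that needs an import-time `re.compile()` moves behind a factory.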
Regards,
Inada Naoki <songofacandy at gmail.com>