
See also https://github.com/requests/requests/issues/4315

I tried the new `-X importtime` option with `import requests`.
Full output is here: https://gist.github.com/methane/96d58a29e57e5be97769897462ee1c7e

Currently, it takes about 110 ms, and the major part of that comes from the Python stdlib.
The following are the roots of the slow stdlib subtrees.

import time: self [us] | cumulative | imported package
import time: 1374 | 14038 | logging
import time: 2636 | 4255 | socket
import time: 2902 | 11004 | ssl
import time: 1162 | 16694 | http.client
import time: 656 | 5331 | cgi
import time: 7338 | 7867 | http.cookiejar
import time: 2930 | 2930 | http.cookies
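For reference, the table above comes from parsing the importtime output on stderr. A minimal reproduction (using the stdlib ssl module here, so it runs without requests installed; `python3` assumed to be 3.7+) looks like:

```shell
# -X importtime writes one "import time:" line per imported module to stderr.
python3 -X importtime -c "import ssl" 2> importtime.log

# Sort by the cumulative column (second |-separated field) to find the
# slowest subtrees.
sort -t '|' -k 2 -rn importtime.log | head -10
```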


1. logging

logging is slow because it is imported at an early stage.
It imports many common, relatively slow packages (collections, functools, enum, re).

The traceback module is especially slow because of linecache.

import time: 1419 | 5016 | tokenize
import time: 200 | 5910 | linecache
import time: 347 | 8869 | traceback

I think it is worthwhile to import linecache lazily.
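The idea can be sketched like this (an illustrative pattern, not the actual CPython patch):

```python
# Lazy-import sketch: pay for "import linecache" only when a source line
# is actually requested, not when the enclosing module is imported.

def format_line(filename, lineno):
    import linecache  # first call imports it; later calls hit sys.modules
    return linecache.getline(filename, lineno).strip()
```

After the first call, `import linecache` is just a dict lookup in `sys.modules`, so the per-call overhead is negligible.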

2. socket

import time: 807 | 1221 | selectors
import time: 2636 | 4255 | socket

socket imports selectors for socket.sendfile(), and the selectors module uses ABCs.
That's why selectors is a bit slow.

The socket module also creates four enums. That's why importing socket takes more than 2.5 ms
excluding subimports.

3. ssl

import time: 2007 | 2007 | ipaddress
import time: 2386 | 2386 | textwrap
import time: 2723 | 2723 | _ssl
...
import time: 306 | 988 | base64
import time: 2902 | 11004 | ssl

I have already created a pull request that removes the textwrap dependency from ssl:
https://github.com/python/cpython/pull/3849

The ipaddress and _ssl modules are a bit slow too, but I don't know whether we can improve them.

ssl itself took 2.9 ms; that's because ssl defines six enums.


4. http.client

import time: 1376 | 2448 | email.header
...
import time: 1469 | 7791 | email.utils
import time: 408 | 10646 | email._policybase
import time: 939 | 12210 | email.feedparser
import time: 322 | 12720 | email.parser
...
import time: 599 | 1361 | email.message
import time: 1162 | 16694 | http.client

email.parser has a very large import tree,
but I don't know how to break it up.

5. cgi

import time: 1083 | 1083 | html.entities
import time: 560 | 1643 | html
...
import time: 656 | 2609 | shutil
import time: 424 | 3033 | tempfile
import time: 656 | 5331 | cgi

The cgi module uses tempfile to save uploaded files.
But requests imports cgi just for `cgi.parse_header()`,
so tempfile is never used. Maybe it is worthwhile to import it lazily.

FYI, cgi depends on the very slow email.parser too,
but this tree doesn't show it because http.client is imported before cgi.
Even though that's not a problem for requests, it may affect real CGI applications.
Of course, startup time is very important for CGI applications too.


6. http.cookiejar and http.cookies

They are slow because they run many `re.compile()` calls at import time.


Ideas

There are some places where we can break up the large import trees with the "import in function" hack.

ABCs are slow, and they are used widely with almost no real need. (Who needs selectors to be ABCs?)
We can't remove the ABC dependency because of backward compatibility.
But I hope ABC will be implemented in C by Python 3.7.
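As a rough illustration of the ABC cost (numbers are machine-dependent, and this measures class creation only, not the whole selectors import):

```python
import timeit

# Defining a class under ABCMeta does extra bookkeeping (collecting
# abstract methods, registering the class) compared with a plain class,
# so class creation itself is measurably slower.
t_abc = timeit.timeit("class S(abc.ABC):\n    pass", setup="import abc", number=10000)
t_plain = timeit.timeit("class S:\n    pass", number=10000)

print(f"10k ABC subclasses: {t_abc:.3f}s, 10k plain classes: {t_plain:.3f}s")
```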

Enum is slow, maybe slower than most people think.
I don't know exactly why, but I suspect it's because the namespace dict is implemented in Python.

Anyway, I think we can have a C implementation of IntEnum and IntFlag, like namedtuple vs PyStructSequence.
It doesn't need to be 100% compatible with the current enum. In particular, there is no need for a metaclass.
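A quick way to see the Enum cost (a rough micro-benchmark; absolute numbers vary by machine):

```python
import timeit

# Creating an IntEnum class runs the pure-Python enum metaclass machinery,
# so it is far more expensive than binding three plain int constants.
t_enum = timeit.timeit(
    "class Color(enum.IntEnum):\n    RED = 1\n    GREEN = 2\n    BLUE = 3",
    setup="import enum",
    number=1000,
)
t_ints = timeit.timeit("RED, GREEN, BLUE = 1, 2, 3", number=1000)

# t_* hold total seconds for 1000 runs; *1000 converts to us per definition.
print(f"per definition: IntEnum {t_enum*1000:.1f}us, plain ints {t_ints*1000:.1f}us")
```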

Another major source of slowness is compiling regular expressions.
I think we can increase the cache size of `re.compile` and use on-demand cached compiling (e.g. `re.match()`)
instead of "compile at import time" in many modules.
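The on-demand pattern mentioned above relies on the cache inside re's module-level functions (its exact size is an implementation detail), e.g.:

```python
import re

def is_http_header(line):
    # No module-level re.compile(): the pattern is compiled on the first
    # call and then served from re's internal cache, so the import time
    # of the enclosing module is unaffected.
    return re.match(r"[A-Za-z-]+:\s", line) is not None
```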

PEP 562 -- Module __getattr__ would help a lot too.
It would make it possible to split up the collections module and the string module.
(The string module is often used just for constants like string.ascii_letters, but string.Template
causes an import-time re.compile().)


Regards,
--
Inada Naoki <songofacandy@gmail.com>