
See also https://github.com/requests/requests/issues/4315

I tried the new `-X importtime` option with `import requests`.
Full output is here: https://gist.github.com/methane/96d58a29e57e5be97769897462ee1c7e

Currently, it takes about 110 ms, and the major part of that comes from the Python stdlib.
The following are the roots of the slow stdlib subtrees.

import time: self [us] | cumulative | imported package
import time: 1374 | 14038 | logging
import time: 2636 | 4255 | socket
import time: 2902 | 11004 | ssl
import time: 1162 | 16694 | http.client
import time: 656 | 5331 | cgi
import time: 7338 | 7867 | http.cookiejar
import time: 2930 | 2930 | http.cookies
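For reference, the table above comes from parsing the importtime output on stderr. A minimal reproduction (using the stdlib ssl module here, so it runs without requests installed; `python3` assumed to be 3.7+) looks like:

```shell
# -X importtime writes one "import time:" line per imported module to stderr.
python3 -X importtime -c "import ssl" 2> importtime.log

# Sort by the cumulative column (second |-separated field) to find the
# slowest subtrees.
sort -t '|' -k 2 -rn importtime.log | head -10
```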


1. logging

logging is slow because it is imported at an early stage.
It imports many common, relatively slow packages (collections, functools, enum, re).

The traceback module is especially slow because of linecache.

import time: 1419 | 5016 | tokenize
import time: 200 | 5910 | linecache
import time: 347 | 8869 | traceback

I think it is worthwhile to import linecache lazily.
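The idea can be sketched like this (an illustrative pattern, not the actual CPython patch):

```python
# Lazy-import sketch: pay for "import linecache" only when a source line
# is actually requested, not when the enclosing module is imported.

def format_line(filename, lineno):
    import linecache  # first call imports it; later calls hit sys.modules
    return linecache.getline(filename, lineno).strip()
```

After the first call, `import linecache` is just a dict lookup in `sys.modules`, so the per-call overhead is negligible.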

2. socket

import time: 807 | 1221 | selectors
import time: 2636 | 4255 | socket

socket imports selectors for socket.sendfile(), and the selectors module uses ABCs.
That's why selectors is a bit slow.

The socket module also creates four enums. That's why importing socket takes more than 2.5 ms
excluding subimports.

3. ssl

import time: 2007 | 2007 | ipaddress
import time: 2386 | 2386 | textwrap
import time: 2723 | 2723 | _ssl
...
import time: 306 | 988 | base64
import time: 2902 | 11004 | ssl

I have already created a pull request that removes the textwrap dependency from ssl:
https://github.com/python/cpython/pull/3849

The ipaddress and _ssl modules are a bit slow too, but I don't know whether we can improve them.

ssl itself took 2.9 ms; that's because ssl defines six enums.


4. http.client

import time: 1376 | 2448 | email.header
...
import time: 1469 | 7791 | email.utils
import time: 408 | 10646 | email._policybase
import time: 939 | 12210 | email.feedparser
import time: 322 | 12720 | email.parser
...
import time: 599 | 1361 | email.message
import time: 1162 | 16694 | http.client

email.parser has a very large import tree,
but I don't know how to break it up.

5. cgi

import time: 1083 | 1083 | html.entities
import time: 560 | 1643 | html
...
import time: 656 | 2609 | shutil
import time: 424 | 3033 | tempfile
import time: 656 | 5331 | cgi

The cgi module uses tempfile to save uploaded files.
But requests imports cgi just for `cgi.parse_header()`,
so tempfile is never used. Maybe it is worthwhile to import it lazily.

FYI, cgi depends on the very slow email.parser too,
but this tree doesn't show it because http.client is imported before cgi.
Even though that's not a problem for requests, it may affect real CGI applications.
Of course, startup time is very important for CGI applications too.


6. http.cookiejar and http.cookies

They are slow because they run many `re.compile()` calls at import time.


Ideas

There are some places where we can break up the large import trees with the "import in function" hack.

ABCs are slow, and they are used widely with almost no real need. (Who needs selectors to be ABCs?)
We can't remove the ABC dependency because of backward compatibility.
But I hope ABC will be implemented in C by Python 3.7.
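As a rough illustration of the ABC cost (numbers are machine-dependent, and this measures class creation only, not the whole selectors import):

```python
import timeit

# Defining a class under ABCMeta does extra bookkeeping (collecting
# abstract methods, registering the class) compared with a plain class,
# so class creation itself is measurably slower.
t_abc = timeit.timeit("class S(abc.ABC):\n    pass", setup="import abc", number=10000)
t_plain = timeit.timeit("class S:\n    pass", number=10000)

print(f"10k ABC subclasses: {t_abc:.3f}s, 10k plain classes: {t_plain:.3f}s")
```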

Enum is slow, maybe slower than most people think.
I don't know exactly why, but I suspect it's because the namespace dict is implemented in Python.

Anyway, I think we can have a C implementation of IntEnum and IntFlag, like namedtuple vs PyStructSequence.
It doesn't need to be 100% compatible with the current enum. In particular, there is no need for a metaclass.
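A quick way to see the Enum cost (a rough micro-benchmark; absolute numbers vary by machine):

```python
import timeit

# Creating an IntEnum class runs the pure-Python enum metaclass machinery,
# so it is far more expensive than binding three plain int constants.
t_enum = timeit.timeit(
    "class Color(enum.IntEnum):\n    RED = 1\n    GREEN = 2\n    BLUE = 3",
    setup="import enum",
    number=1000,
)
t_ints = timeit.timeit("RED, GREEN, BLUE = 1, 2, 3", number=1000)

# t_* hold total seconds for 1000 runs; *1000 converts to us per definition.
print(f"per definition: IntEnum {t_enum*1000:.1f}us, plain ints {t_ints*1000:.1f}us")
```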

Another major source of slowness is compiling regular expressions.
I think we can increase the cache size of `re.compile` and use on-demand cached compiling (e.g. `re.match()`)
instead of "compile at import time" in many modules.
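The on-demand pattern mentioned above relies on the cache inside re's module-level functions (its exact size is an implementation detail), e.g.:

```python
import re

def is_http_header(line):
    # No module-level re.compile(): the pattern is compiled on the first
    # call and then served from re's internal cache, so the import time
    # of the enclosing module is unaffected.
    return re.match(r"[A-Za-z-]+:\s", line) is not None
```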

PEP 562 -- Module __getattr__ would help a lot too.
It would make it possible to split up the collections module and the string module.
(The string module is often used just for constants like string.ascii_letters, but string.Template
causes an import-time re.compile().)


Regards,
--
Inada Naoki <songofacandy@gmail.com>