Avoid full path enumeration on import of setuptools or pkg_resources? · Issue #510 · pypa/setuptools (original) (raw)

Originally reported by: konstantint (Bitbucket: konstantint, GitHub: konstantint)


At the moment on my machine, it takes about 1.28 seconds to do a bare import pkg_resources, 1.47 seconds to do a bare import setuptools, 1.36 seconds to do a bare from pkg_resources import load_entry_point and 1.25 seconds to do a bare from pkg_resources import load_entry_point.

This obviously affects all of the python scripts that are installed as console entry points, because each and every one of them starts with a line like that. In code which does not rely on entry points this may be a problem whenever I want to use resource_filename to consistently access static data.

I believe this problem is decently common, yet I did not find any issue or discussion, hence I'm creating one, hoping I'm not repeating what has been said already elsewhere unnecessarily.

I am using Anaconda Python, which comes along with a fairly large package, alongside several of my own packages, which I commonly add my path via setup.py develop, however I do not believe this setup is anything out of the ordinary. There are 37 items on my sys.path at the moment. Profiling import pkg_resources shows that this leads to 76 calls to workingset.add_entry (timing at about a second), of which most of the time is spent in 466 calls to Distribution.from_location.

Obviously, the reason for the problem lies in the two _call_aside methods at the end of pkg_resources which lead to a full scan of the python path at the moment when the package is imported, and the only way to alleviate it would be to somehow avoid or delay the need for this scan as much as possible.

I see two straightforward remedies:
a) Make the scanning lazy. After all, if all one needs is to find a particular package, the scan could stop as soon as the corresponding package is located. At the very least this would allow me to "fix" my ipython loading problem by moving it up in the path. This might break some import rules which do not respect the precedence of the path, which I'm not aware.
b) Cache a precomputed index and update it lazily. Yes, this might requre some ad-hoc rules for resolving inconsistencies, and this may lead to ugly conflicts with external tools that attempt to install multiple versions of a package, but this will basically avoid the current startup delay in 99.99% of cases and solve so much of my problems, that I'd be willing to pay the price.

Although both options might seem somewhat controversial, the problem itself seems to be serious enough to deserve at least some fix eventually (for example, I've recently discovered I'm reluctant to start ipython for short calculations because of its startup delay which I've now tracked back to this same issue).

I'm contemplating making a separate utility, e.g. fast_pkg_resources, which would implement the strategy b) by simply caching calls to pkg_resources in an external file, yet I thought of raising the issue here to figure out whether someone has already addressed it, whether there are plans to do something about it in the setuptools core codebase, or perhaps I'm missing something obvious.