
On Thu, Feb 9, 2012 at 5:34 PM, Robert Kern <robert.kern@gmail.com> wrote:
On 2/9/12 10:15 PM, Antoine Pitrou wrote:
On Thu, 9 Feb 2012 17:00:04 -0500
PJ Eby <pje@telecommunity.com> wrote:
On Thu, Feb 9, 2012 at 2:53 PM, Mike Meyer <mwm@mired.org> wrote:

For those of you not watching -ideas, or ignoring the "Python TIOBE
\-3%" discussion, this would seem to be relevant to any discussion of
reworking the import mechanism:

http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html

Interesting. This gives me an idea for a way to cut stat calls per
sys.path entry per import by roughly 4x, at the cost of a one-time
directory read per sys.path entry.

Why do you even think this is a problem with "stat calls"?

All he said is that reading about that problem and its solution gave him an idea about dealing with stat call overhead. The cost of stat calls has proven to be a significant problem in other, more typical contexts.

Right. It was the part of the post that mentioned that all they sped up was knowing which directory the files were in, not the actual loading of bytecode. The thought then occurred to me that this could perhaps be applied to normal importing, as a zipimport-style speedup. (The zipimport module caches each zipfile directory it finds on sys.path, so failed import lookups are extremely fast.)
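
To make that concrete, here's roughly the sort of cache I have in mind. It's purely illustrative, not a patch: the helper names and the exact membership test are made up, and the real finder would still have to do the actual loading once a directory claims the module.

    import os

    # Cache each sys.path directory's listing once, so "could module
    # 'foo' possibly live here?" becomes a set lookup instead of several
    # stat() probes (foo/, foo.py, foo.pyc, foo.so, ...).

    _dir_cache = {}  # directory path -> set of names it contains

    def _listing(directory):
        """Return (and cache) the set of entries in *directory*."""
        try:
            return _dir_cache[directory]
        except KeyError:
            try:
                names = set(os.listdir(directory))
            except OSError:
                names = set()          # missing or unreadable path entry
            _dir_cache[directory] = names
            return names

    def might_contain(directory, modname):
        """Cheap negative check for *modname* against the cached listing."""
        names = _listing(directory)
        return (modname in names or                              # package dir
                any(n.startswith(modname + '.') for n in names)) # foo.py, foo.so, ...

The point is that a miss costs one dictionary/set lookup instead of the handful of stat() calls the path-based finder would otherwise make for each suffix it knows about, which is exactly where zipimport gets its speed on failed lookups.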

It occurs to me, too, that applying the caching trick to *only* the stdlib directories would still be a win as soon as you have between four and eight site-packages (or user-specific site-packages) imports in an application, so it might be worth applying unconditionally to system-defined stdlib (non-site) directories. A sketch of what that restriction could look like is below.
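
Again illustrative only; this assumes sysconfig's 'stdlib' and 'platstdlib' paths are the right notion of "system-defined stdlib directories", which would need checking:

    import sysconfig

    # Pre-read only the interpreter-defined stdlib directories: they don't
    # change while the interpreter is running, so the cached listings can't
    # go stale the way a site-packages directory might.
    _STDLIB_DIRS = {sysconfig.get_path('stdlib'),
                    sysconfig.get_path('platstdlib')}

    def cacheable(path_entry):
        """Decide whether a sys.path entry gets the cached-listing treatment."""
        return path_entry in _STDLIB_DIRS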