[Python-Dev] Fixing #7175: a standard location for Python config files (original) (raw)
Andrew Bennetts andrew at bemusement.org
Fri Aug 13 04:00:37 CEST 2010
- Previous message: [Python-Dev] Fixing #7175: a standard location for Python config files
- Next message: [Python-Dev] Fixing #7175: a standard location for Python config files
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Antoine Pitrou wrote:
> On Thu, 12 Aug 2010 18:14:44 -0400 Glyph Lefkowitz <glyph at twistedmatrix.com> wrote:
> > On Aug 12, 2010, at 6:30 AM, Tim Golden wrote:
> > > I don't care how many stats we're doing
> >
> > You might not, but I certainly do. And I can guarantee you that the
> > authors of command-line tools that have to start up in under ten
> > seconds, for example 'bzr', care too.
>
> The idea that import time is dominated by stat() calls sounds rather
> undemonstrated (and unlikely) to me.
In the case of bzr startup, the exact breakdown varies depending on a range of factors like OS and whether the relevant parts of the filesystem are in the OS cache or not (i.e. is this the first time the user has run bzr since booting?).
The short answer is that the number of stat calls isn't really the problem at the moment for bzr, at least not compared to the number of directories Python searches, and the amount of non-trivial work done at import time by many modules. But... Your Mileage May Vary.
Here's the longer answer:
I think some stats about this have been posted to this list before, but here are some points of interest off the top of my head:
- the cost of trying and failing to open foomodule.so + foo.so +
foo.pyc + foo.py in a directory isn't much greater than trying to
open just one of them. Once the OS has cached the directory entries
for that directory, subsequent lookups in that directory are fast.
The experiment is fairly easy:
- strace -e open,stat64 -o py.strace python -c "something..."
- by hand, create a .c file that repeats all the stat and open calls in py.strace (it's pretty easy to munge into valid C)
- and also create one with only the successful stat and open calls
- compare them (using /proc/sys/vm/drop_caches or whatever as appropriate)
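A rough pure-Python stand-in for that experiment is to time the failed probes an import of a missing module would make across sys.path. This is only a sketch: the suffix list below is illustrative, not CPython's exact finder logic, and os.stat from Python adds interpreter overhead that strace-driven C code would not.

```python
import os
import sys
import time

def time_probes(name, suffixes=(".so", "module.so", ".py", ".pyc")):
    """Stat every candidate filename a failed import of `name` would
    probe across sys.path, and time the whole sweep.

    Illustrative only: real import machinery uses platform-specific
    suffix lists and per-directory caches.
    """
    start = time.perf_counter()
    probes = 0
    for d in sys.path:
        for suf in suffixes:
            probes += 1
            try:
                os.stat(os.path.join(d or ".", name + suf))
            except OSError:
                pass  # a failed probe: exactly the cost we are measuring
    return probes, time.perf_counter() - start

probes, elapsed = time_probes("no_such_module")
print(probes, elapsed)
```

Running it twice back to back shows the warm-cache effect: the second sweep is usually much faster because the directory entries are already cached.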
- each directory probed is a significant cost, especially in the "cold boot" case. So every entry in sys.path counts, and every subdirectory of a package.
- that said, Windows seems much slower than Linux on equivalent hardware, perhaps attempting to open files is intrinsically more expensive there? Certainly it's not safe to assume conclusions drawn on Linux will apply equally well on Windows, or vice versa.
- modules with many class/function definitions are measurably slower than smaller modules.
- module-level re.compile calls and other non-trivial operations are to be avoided, but many modules you depend on will do that. This matters so much that bzr monkey-patches the re module to make re.compile lazy. Try grepping the stdlib to see how many modules do re.compile at import time (including as default values of keyword args)!
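The lazy-compilation idea can be sketched with a small wrapper object; this is a minimal illustration, not bzr's actual monkey-patch, and it only forwards the couple of methods shown.

```python
import re

class LazyPattern:
    """Defer re.compile until the pattern is first used, so a
    module-level "compile" costs only an object allocation."""

    def __init__(self, pattern, flags=0):
        self._pattern = pattern
        self._flags = flags
        self._compiled = None

    def _real(self):
        # Compile on first use, then reuse the compiled pattern.
        if self._compiled is None:
            self._compiled = re.compile(self._pattern, self._flags)
        return self._compiled

    def match(self, *args, **kwargs):
        return self._real().match(*args, **kwargs)

    def search(self, *args, **kwargs):
        return self._real().search(*args, **kwargs)

# Import time: no regex compilation happens here.
WORD = LazyPattern(r"\w+")
# First use: compilation happens now, not at import.
print(WORD.search("hello world").group())
```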
- it's death by a thousand cuts: each module import probably imports a dozen others... by far the simplest way to reduce startup time is simply to import fewer modules. Lazy module imports (bzrlib.lazy_import or hg's demandload or whatever) help a lot, and I wish they were a builtin feature of Python.
- I haven't even mentioned NFS or other network filesystems, but you can bet they change the picture significantly.
-Andrew.