Issue 14604: spurious stat() calls in importlib

It seems importlib does multiple stat() calls on py files:

stat("/home/antoine/cpython/opt/Lib", {st_mode=S_IFDIR|0775, st_size=12288, ...}) = 0
stat("/home/antoine/cpython/opt/Lib/_sysconfigdata.py", {st_mode=S_IFREG|0664, st_size=16032, ...}) = 0
stat("/home/antoine/cpython/opt/Lib/_sysconfigdata.py", {st_mode=S_IFREG|0664, st_size=16032, ...}) = 0
open("/home/antoine/cpython/opt/Lib/__pycache__/_sysconfigdata.cpython-33.pyc", O_RDONLY) = 3

It also does multiple stat() calls on some directories:

stat("/home/antoine/cpython/opt/build/lib.linux-x86_64-3.3", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
stat("/home/antoine/.local/lib/python3.3/site-packages", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
stat("/home/antoine/.local/lib/python3.3/site-packages", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
stat("/home/antoine/.local/lib/python3.3/site-packages", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
open("/home/antoine/.local/lib/python3.3/site-packages", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3

That said, the number of system calls issued by 3.3 at startup is now much lower than with 3.2:

$ strace ./python -Sc pass 2>&1 | wc -l
512
$ strace python3.2 -Sc pass 2>&1 | wc -l
1018

OK, so a cursory look at importlib suggests that the possible costs of those stat calls (by looking at what has to examine the filesystem) are:

So looking at that initial block of stat calls, I am willing to bet that the stat on Lib comes from the os.path.isdir() check in the finder, the two stats on Lib/_sysconfigdata.py come from the finder checking that the file exists and then the loader stat'ing it for bytecode verification, and finally the bytecode file is opened to read it and discover whether it's usable.
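To make that guess concrete, here is a rough sketch of the sequence I have in mind. It is not the actual importlib code: find_source, load_bytecode_if_fresh and counted_stat are made-up names for illustration, and the real loader also compares the source mtime against the bytecode header, which is left out here.

```python
import os
import importlib.util

stat_log = []  # paths passed to the counting wrapper (illustrative only)

def counted_stat(path):
    """Wrap os.stat so each call is recorded, roughly what strace would show."""
    stat_log.append(path)
    return os.stat(path)

def find_source(directory, name):
    """Hypothetical finder step: check the directory, then check the file exists."""
    try:
        counted_stat(directory)              # the isdir()-style check on the directory
    except OSError:
        return None
    candidate = os.path.join(directory, name + ".py")
    try:
        counted_stat(candidate)              # first stat: does the source file exist?
    except OSError:
        return None
    return candidate

def load_bytecode_if_fresh(source_path):
    """Hypothetical loader step: stat the source again, then open the bytecode."""
    source_mtime = int(counted_stat(source_path).st_mtime)  # second stat, same file
    cache_path = importlib.util.cache_from_source(source_path)
    try:
        with open(cache_path, "rb") as f:    # the open() at the end of the trace
            data = f.read()
    except OSError:
        return None                          # no cached bytecode available
    return source_mtime, data
```

Calling find_source() and then load_bytecode_if_fresh() on the same module leaves one directory entry and two entries for the source file in stat_log, which matches the pattern in the first trace.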

As for the multiple stat calls on directories, those are validating that the directory cache isn't out of date, and I don't see how that can be avoided short of hitting the system clock to see if some amount of time has passed.
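A minimal sketch of that kind of cache validation, assuming the cache is keyed on the directory's mtime. DirectoryCache and its attributes are hypothetical names, not what importlib uses:

```python
import os

class DirectoryCache:
    """Cache a directory listing and refresh it only when the directory mtime changes."""

    def __init__(self, path):
        self.path = path
        self._mtime = None      # mtime seen when the listing was last refreshed
        self._entries = set()
        self.refreshes = 0      # how many times listdir() actually ran

    def contains(self, filename):
        # One stat() per lookup is the price of knowing the cache is still valid.
        mtime = os.stat(self.path).st_mtime
        if mtime != self._mtime:
            self._entries = set(os.listdir(self.path))
            self._mtime = mtime
            self.refreshes += 1
        return filename in self._entries
```

Every lookup costs a stat() on the directory, but the more expensive listdir() only runs when the mtime has moved. That is why the same directory shows up several times in the trace while it is only opened once.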

As for the multiple stat calls between the finder and the loader, I don't see any way to cut those down without either a combined find + load API that makes the call immediately or some way to pass the stat details along; otherwise you have race conditions on the status of the file before you check whether the bytecode is stale. If the stat calls on the directories for cache validation are too frequent, then issue #14067 is probably your best bet.
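For illustration only, passing the stat details along could look something like the following. No such API exists in importlib as far as this thread goes; find_with_stat and bytecode_is_stale are hypothetical, and recorded_mtime and recorded_size stand in for whatever values the bytecode file recorded about its source.

```python
import os

def find_with_stat(directory, name):
    """Hypothetical finder: return (path, stat_result), or None if the file is missing."""
    candidate = os.path.join(directory, name + ".py")
    try:
        return candidate, os.stat(candidate)   # the only stat() on the source file
    except OSError:
        return None

def bytecode_is_stale(source_stat, recorded_mtime, recorded_size):
    """Hypothetical loader check that reuses the finder's stat result."""
    return (int(source_stat.st_mtime) != recorded_mtime
            or source_stat.st_size != recorded_size)
```

This saves one stat() per module, but the loader is then trusting a result that may be out of date by the time it runs, which is the race condition mentioned above.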