Add fast path to os.[l]stat() that returns incomplete information · Issue #99726 · python/cpython
A future update to Windows is bringing a new filesystem API for getting stat(-like) information more efficiently from a filename. Currently, we have to open the file, which is quite a slow operation. Being able to simply request metadata based on the path is a real improvement. My testing shows os.stat() and os.lstat() (in the case where no traversal is needed) taking less than 1/4 of their current time when using the new API. I'll link the change in a PR below.
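As a rough way to reproduce this kind of measurement on your own machine, a micro-benchmark along the following lines could be used. This is only a sketch: the file name, the iteration count and the `per_call_us` dictionary are illustrative choices, not the methodology used for the figures above, and absolute numbers will vary with the OS, filesystem and cache state.

```python
import os
import tempfile
import timeit

# Time repeated os.stat() / os.lstat() calls on one small temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "probe.txt")  # illustrative file name
    with open(path, "w") as f:
        f.write("probe")

    n = 10_000  # arbitrary iteration count
    stat_time = timeit.timeit(lambda: os.stat(path), number=n)
    lstat_time = timeit.timeit(lambda: os.lstat(path), number=n)

    # Convert total seconds into microseconds per call.
    per_call_us = {
        "os.stat": stat_time / n * 1e6,
        "os.lstat": lstat_time / n * 1e6,
    }
    for name, us in per_call_us.items():
        print(f"{name}: {us:.2f} us per call")
```

Running the same script on builds with and without the new API gives a before/after comparison for the no-traversal case.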
However, the new API does not include the volume serial number (VSN), which is how we fill in the st_dev field. Adding an extra call to get the VSN takes as much time as the whole operation did before, so there's no performance benefit.[1]
So I'd like to propose adding a fast=False argument to os.stat and os.lstat. When left as False, you get the current behaviour. If you pass True, we only guarantee a smaller set of data, and warn that other fields may be absent on some platforms.
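To make the intended contract concrete, here is a minimal sketch of what a caller could rely on under the reduced guarantee. `MinimalStat` and `stat_minimal` are hypothetical names invented for this illustration; they are not part of the os module, and the helper simply calls the ordinary os.stat and discards everything outside the guaranteed subset rather than using any faster API.

```python
import os
import stat
import tempfile
from typing import NamedTuple


class MinimalStat(NamedTuple):
    """Hypothetical reduced result holding only the guaranteed fields."""
    file_type: int    # stat.S_IFMT(st_mode): file type bits, no permissions
    st_size: int
    st_mtime_ns: int


def stat_minimal(path, *, follow_symlinks=True):
    # Hypothetical helper: emulates the reduced guarantee on top of the
    # regular os.stat call, so it shows the contract, not the speed-up.
    full = os.stat(path, follow_symlinks=follow_symlinks)
    return MinimalStat(stat.S_IFMT(full.st_mode), full.st_size, full.st_mtime_ns)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "example.txt")
    with open(target, "wb") as f:
        f.write(b"hello")

    info = stat_minimal(target)
    is_regular = stat.S_ISREG(info.file_type)  # type check works on type bits alone
    size = info.st_size
```

Code written against such a reduced result cannot accidentally depend on st_dev, st_ino or the permission bits, which is the property the proposed flag would need callers to accept.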
Looking through the fields, I have proposed that the file type bits of st_mode (not permissions), the st_size and st_mtime[_ns] fields are the only ones that are important to guarantee.[2] All the rest can stay as they are, but we then have the option to drop them from the fast path in the future.[3] It's no accident that these are the fields we already expose through other os.path functions (apart from samestat, which will have to stay on the slow path and probably needs an even slower check in order to be reliable cross-platform).
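The correspondence between the guaranteed fields and the existing os.path helpers can be checked with a short script like the one below. It uses only standard-library calls; the temporary file and the `checks` dictionary are illustrative.

```python
import os
import os.path
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    with open(path, "wb") as f:
        f.write(b"x" * 10)

    st = os.stat(path)
    checks = {
        # File type bits of st_mode back isfile() and isdir().
        "isfile": os.path.isfile(path) == stat.S_ISREG(st.st_mode),
        "isdir": os.path.isdir(path) == stat.S_ISDIR(st.st_mode),
        # st_size backs getsize(); st_mtime backs getmtime().
        "getsize": os.path.getsize(path) == st.st_size,
        "getmtime": os.path.getmtime(path) == st.st_mtime,
    }

    # samestat() compares st_ino and st_dev, which are outside the
    # guaranteed subset, so it needs a full stat result.
    same = os.path.samestat(st, os.stat(path))
```

Only the samestat comparison touches st_dev, which is why it is the one helper that cannot move to a path that omits the volume serial number.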
I'm not sure who cares most about this, so I'm going to leave this open for a while.
Linked PRs
- gh-99726: Add 'fast' argument to os.[l]stat for faster calculation #99727
- gh-99726: Adds os.statx function and associated constants #99755
- gh-99726: Improves correctness of stat results for Windows, and uses faster API when available #102149
- gh-99726: Fix order of recently added fields for FILE_STAT_BASIC_INFORMATION #102976
Footnotes
1. There is still discussion about changing this API before it releases. If that happens, the rest of this proposal is moot, unless we like the idea anyway.
2. On Windows, we can further guarantee st_file_attributes and st_reparse_tag, as these are the raw values used to calculate the file type bits of st_mode.
3. stat is already very fast on POSIX-ish filesystems, so it's unlikely to be an issue there, but if we wanted to specialise for network FS or similar then we'd be able to.