gh-99726: Improves correctness of stat results for Windows, and uses faster API when available by zooba · Pull Request #102149 · python/cpython

@zooba, I was waiting to review this PR until GetFileInformationByName() is pushed to the general availability channel. I'm surprised that you merged code that depends on a new Windows API function that isn't generally available and has no documentation -- not even a blog post. CPython has never been that aggressive in adopting new Windows features.

> They shouldn't really be able to be any faster. GetFileAttributesW was already using the fast path that we now have for more info with the new API, so in theory they should be slightly faster due to a little less memcpy.

The new ntpath.is*() functions (e.g. os__path_isdir_impl) use CreateFileW() and GetFileInformationByHandleEx(). Using GetFileAttributesW() would still require a fallback implementation that calls CreateFileW() for reparse points. It was simpler to implement just the CreateFileW() path. Also, testing showed that using CreateFileW() on balance was about as fast or faster than GetFileAttributesW(). For reparse points, using just CreateFileW() was obviously faster, since we have to call CreateFileW() anyway. Here's what I said at the time:
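To make the call-count argument concrete, here is a minimal Python model of the two strategies. It is only a sketch, not the actual C implementation: `FAKE_VOLUME`, `get_attributes()` and `open_and_query()` are hypothetical stand-ins for the filesystem, GetFileAttributesW(), and CreateFileW() plus GetFileInformationByHandleEx(). Only the `stat.FILE_ATTRIBUTE_*` constants are real.

```python
import stat

# Hypothetical in-memory "volume": path -> (own attributes, link target or None).
FAKE_VOLUME = {
    "C:\\data": (stat.FILE_ATTRIBUTE_DIRECTORY, None),
    "C:\\data\\a.txt": (stat.FILE_ATTRIBUTE_ARCHIVE, None),
    "C:\\link": (
        stat.FILE_ATTRIBUTE_DIRECTORY | stat.FILE_ATTRIBUTE_REPARSE_POINT,
        "C:\\data",
    ),
}

# Counts how often each simulated system call is made.
calls = {"attributes": 0, "open": 0}


def get_attributes(path):
    """Stand-in for GetFileAttributesW(): reports the link itself, not its target."""
    calls["attributes"] += 1
    entry = FAKE_VOLUME.get(path)
    return None if entry is None else entry[0]


def open_and_query(path):
    """Stand-in for CreateFileW() + GetFileInformationByHandleEx(): follows links."""
    calls["open"] += 1
    entry = FAKE_VOLUME.get(path)
    while entry is not None and entry[1] is not None:
        entry = FAKE_VOLUME.get(entry[1])
    return None if entry is None else entry[0]


def isdir_attributes_first(path):
    """Cheap attribute query first; open the file only for reparse points."""
    attrs = get_attributes(path)
    if attrs is None:
        return False
    if attrs & stat.FILE_ATTRIBUTE_REPARSE_POINT:
        # The target has to be resolved, so the path is queried a second time.
        target_attrs = open_and_query(path)
        return target_attrs is not None and bool(
            target_attrs & stat.FILE_ATTRIBUTE_DIRECTORY
        )
    return bool(attrs & stat.FILE_ATTRIBUTE_DIRECTORY)


def isdir_open_only(path):
    """Single code path: always open the file and query the handle."""
    attrs = open_and_query(path)
    return attrs is not None and bool(attrs & stat.FILE_ATTRIBUTE_DIRECTORY)


if __name__ == "__main__":
    for check in (isdir_attributes_first, isdir_open_only):
        calls.update(attributes=0, open=0)
        print(check.__name__, check("C:\\link"), dict(calls))
```

In this model the attribute-first strategy touches a reparse point twice, while the open-only strategy touches every path exactly once and needs no fallback branch.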

> For me, this implementation takes about a third less time than using os.stat(). It takes about 10% more time than using GetFileAttributesW(), but using GetFileAttributesW() is significantly more expensive for a reparse point (e.g. symlink, junction) because CreateFileW() has to be called to traverse it, which means the file is opened and queried twice.
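A comparison of this kind can be reproduced roughly from Python with `timeit`. The sketch below is an assumption about how one might measure it, not the harness used for the figures quoted above. `isdir_via_stat()` and `time_call()` are illustrative helpers, and the results depend on the Python version, Windows build, filesystem, and cache state.

```python
import os
import stat
import tempfile
import timeit


def isdir_via_stat(path):
    """Directory check built on a full os.stat() call."""
    try:
        return stat.S_ISDIR(os.stat(path).st_mode)
    except (OSError, ValueError):
        return False


def time_call(func, path, number=20_000, repeat=3):
    """Return the best observed time per call, in seconds."""
    timings = timeit.repeat(lambda: func(path), number=number, repeat=repeat)
    return min(timings) / number


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        for func in (os.path.isdir, isdir_via_stat):
            per_call = time_call(func, directory)
            print(f"{func.__name__}: {per_call * 1e6:.2f} us per call")
```

Timing a symlink or junction in the same way shows how each implementation behaves on reparse points.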

On the other hand, at the time, my tests showed that NtQueryInformationByName() was significantly faster than NtQueryAttributesFile(). NtQueryInformationByName() uses a completely different implementation in the kernel. In principle, NtQueryAttributesFile() could take the same path, so maybe they've updated its implementation in the development channel. Either way, the ntpath.is*() functions would benefit from using this faster path to the filesystem information.