[Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

Ben Hoyt benhoyt at gmail.com
Tue May 14 00:41:01 CEST 2013


> I'd like to see the numbers for NFS or CIFS - stat() can be brutally slow over a network connection (that's why we added a caching mechanism to importlib).

How do I know what file system Windows networking is using? In any case, here's some numbers on Windows -- it's looking pretty good! This is with default DEPTH/NUM_DIRS/NUM_FILES on a LAN:

Benchmarking walks on \\anothermachine\docs\Ben\bigtree, repeat 3/3...
os.walk took 11.345s, scandir.walk took 0.340s -- 33.3x as fast

And this is on a VPN on a remote network with the benchmark.py values cranked down to DEPTH = 3, NUM_DIRS = 3, NUM_FILES = 20 (because otherwise it was taking far too long):

Benchmarking walks on \\ben1.titanmt.local\c$\dev\scandir\benchtree, repeat 3/3...
os.walk took 122.310s, scandir.walk took 5.452s -- 22.4x as fast

If anyone can run benchmark.py on Linux / NFS or similar, that'd be great. You'll probably have to lower DEPTH/NUM_DIRS/NUM_FILES first and then move the "benchtree" to the network file system to run it against that.
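
For anyone who just wants a quick sanity check, here is a rough sketch of the kind of comparison involved. It is not the actual benchmark.py: the helper names (make_tree, walk_listdir, walk_scandir, time_walk) are made up for illustration, and it assumes os.scandir() as it later shipped in the standard library (Python 3.5, with "with" support from 3.6) rather than the scandir module discussed here. The listdir version is hand-written because a modern os.walk already uses scandir internally.

```python
import os
import tempfile
import time


def make_tree(top, depth, num_dirs, num_files):
    """Create a small test tree (hypothetical stand-in for benchmark.py's tree)."""
    for i in range(num_files):
        with open(os.path.join(top, "file%d.txt" % i), "w") as f:
            f.write("x")
    if depth > 0:
        for i in range(num_dirs):
            sub = os.path.join(top, "dir%d" % i)
            os.mkdir(sub)
            make_tree(sub, depth - 1, num_dirs, num_files)


def walk_listdir(top):
    """Count non-directory entries using listdir plus stat-based checks per entry."""
    count = 0
    for name in os.listdir(top):
        path = os.path.join(top, name)
        # isdir() follows symlinks, so treat links as leaves to avoid loops
        if os.path.isdir(path) and not os.path.islink(path):
            count += walk_listdir(path)
        else:
            count += 1
    return count


def walk_scandir(top):
    """Count non-directory entries using the type info scandir returns."""
    count = 0
    with os.scandir(top) as it:
        for entry in it:
            if entry.is_dir(follow_symlinks=False):
                count += walk_scandir(entry.path)
            else:
                count += 1
    return count


def time_walk(func, top, repeat=3):
    """Return the best wall-clock time of `repeat` runs of func(top)."""
    best = None
    for _ in range(repeat):
        start = time.perf_counter()
        func(top)
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best:
            best = elapsed
    return best


if __name__ == "__main__":
    # Small local tree; point `top` at a network path to test NFS/CIFS instead.
    with tempfile.TemporaryDirectory() as top:
        make_tree(top, depth=2, num_dirs=3, num_files=20)
        t_listdir = time_walk(walk_listdir, top)
        t_scandir = time_walk(walk_scandir, top)
        print("listdir walk: %.3fs, scandir walk: %.3fs" % (t_listdir, t_scandir))
```

On a local disk the gap will likely be much smaller than the network numbers above, since the cost being avoided is the per-file stat() round trip.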

> I initially quite liked the idea of not offering any methods on DirEntry, only properties, to make it obvious that they don't touch the file system, but just report info from the scandir call. However, I think that it ends up reading strangely, and would be confusing relative to the os.path APIs.
>
> What you have now seems like a good, simple alternative.

Thanks. Yeah, I kinda liked the "DirEntry doesn't make any OS calls" idea at first too, but then as I got into it I realized it made for a really nasty API for most use cases. I like how it's ended up.
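
To make the attribute-versus-method split concrete, here is a minimal sketch. It assumes the DirEntry API as it eventually landed in Python 3.5's os.scandir() (name, path, is_dir(), is_symlink(), stat()), which may not match the method names in the scandir module at the time of this message; summarize() is a hypothetical helper.

```python
import os


def summarize(top):
    """Return sorted (name, kind, size) tuples for the entries directly under top."""
    rows = []
    with os.scandir(top) as it:
        for entry in it:
            # entry.name and entry.path are plain attributes: no system call.
            # is_dir()/is_symlink() are methods: usually answered from what
            # the directory listing returned, but they may have to ask the OS.
            if entry.is_dir(follow_symlinks=False):
                kind = "dir"
            elif entry.is_symlink():
                kind = "link"
            else:
                kind = "file"
            # stat() is a method too: it may need a system call, and the
            # result is cached on the entry object afterwards.
            size = entry.stat(follow_symlinks=False).st_size
            rows.append((entry.name, kind, size))
    return sorted(rows)
```

The method call syntax is the hint to the reader that a system call might happen, which mirrors how os.path.isdir() and friends read.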

-Ben


