[Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator

Nick Coghlan ncoghlan at gmail.com
Sat Jun 28 11:17:12 CEST 2014


On 28 June 2014 16:17, Gregory P. Smith <greg at krypto.org> wrote:

On Fri, Jun 27, 2014 at 2:58 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

* it would be nice to see some relative performance numbers for NFS and CIFS network shares - the additional network round trips can make excessive stat calls absolutely brutal from a speed perspective when using a network drive (that's why the stat caching added to the import system in 3.3 dramatically sped up the case of having network drives on sys.path, and why I thought AJ had a point when he was complaining about the fact we didn't expose the dirent data from os.listdir)

fwiw, I wouldn't wait for benchmark numbers. A needless stat call when you've got the information from an earlier API call is already brutal. It is easy to compute from existing ballparks: remote file server / cloud access: ~100ms; local spinning disk seek+read: ~10ms; fetch of stat info cached in memory on a file server on the local network: ~500us. You can go down further to local system call overhead, which can vary wildly but should likely be assumed to be at least 10us. You don't need a benchmark to tell you that adding needless >= 500us-100ms blocking operations to your program is bad. :)
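The back-of-the-envelope computation described above can be sketched as follows. The latencies are the ballpark figures quoted in the message; the tree size `N_ENTRIES` and the helper name `extra_stat_cost` are hypothetical, chosen only for illustration, and the model assumes exactly one needless, blocking stat call per entry.

```python
# Ballpark per-call latencies from the message, in seconds.
LATENCY_S = {
    "remote file server / cloud access": 100e-3,
    "local spinning disk seek+read": 10e-3,
    "stat info cached on LAN file server": 500e-6,
    "local system call overhead": 10e-6,
}

# Hypothetical tree size, not a figure from the thread.
N_ENTRIES = 10_000


def extra_stat_cost(n_entries, latency_s):
    """Blocking time added by one needless stat call per entry, in seconds."""
    return n_entries * latency_s


for name, latency in LATENCY_S.items():
    cost = extra_stat_cost(N_ENTRIES, latency)
    print(f"{name}: {cost:.1f} s of added blocking time")
```

Under these assumptions, 10,000 entries cost roughly 5 seconds of extra waiting at ~500us per call and roughly 1000 seconds at ~100ms per call, which is the point being made without any benchmark.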

Agreed, but walking even a moderately large tree over the network can really hammer home the point that this offers a significant performance enhancement as the latency of access increases. I've found that kind of comparison can be eye-opening for folks that are used to only operating on local disks (even spinning disks, let alone SSDs) and/or relatively small trees (distro build trees aren't that big, but they're big enough for this kind of difference in access overhead to start getting annoying).
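For readers on a later Python, here is a minimal sketch of the kind of tree walk under discussion, written against `os.scandir()` as it eventually shipped in Python 3.5 (the `with` form needs 3.6 or later). The function name `count_entries` is hypothetical. On most platforms `DirEntry.is_dir()` can usually answer from the directory listing itself, so the walk normally avoids a separate stat call per entry; that is an expectation, not a guarantee, and it depends on the OS and filesystem.

```python
import os


def count_entries(root):
    """Count every file and directory below root using os.scandir().

    Symlinked directories are counted but not followed, so a link
    cycle cannot make the walk loop forever.
    """
    total = 0
    pending = [root]  # directories still to be listed
    while pending:
        with os.scandir(pending.pop()) as entries:
            for entry in entries:
                total += 1
                # Usually answered from the listing data, without
                # an extra stat call for this entry.
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
    return total
```

Timing this against an equivalent `os.listdir()` plus `os.path.isdir()` walk on a network share would give the comparison suggested above.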

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia


