[Python-Dev] os.walk() is going to be fast with scandir

Robert Collins robertc at robertcollins.net
Sun Aug 10 07:40:47 CEST 2014


A small tip from my bzr days: cd into the directory before scanning it, especially if you'll end up statting more than a fraction of the files, or are recursing. Otherwise the VFS does a full path traversal for each path you directly stat or recurse into. This can become a dominating factor in some workloads (I shaved several hundred milliseconds off of bzr stat on kernel trees doing this).

-Rob
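
The tip above could be sketched roughly as follows. This is a minimal illustration, not bzr's actual code: the helper name stat_entries is hypothetical, and it uses os.listdir() plus os.lstat() only to show the pattern of changing into the directory and then statting short relative names.

```python
import os


def stat_entries(directory):
    """Return {name: os.stat_result} for each entry in `directory`.

    Hypothetical helper: changes into the directory first so every
    lstat() call is given a bare relative name rather than a full path,
    then restores the previous working directory.
    """
    previous_cwd = os.getcwd()
    os.chdir(directory)
    try:
        # Names returned by listdir(".") are relative to the new cwd.
        return {name: os.lstat(name) for name in os.listdir(".")}
    finally:
        # Always restore the working directory, even if lstat() fails.
        os.chdir(previous_cwd)
```

Note that os.chdir() changes the working directory for the whole process, so a sketch like this is only reasonable in single-threaded code. Whether it saves measurable time will depend on the tree depth, the file system, and the OS's caching state.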

On 10 August 2014 15:57, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 10 August 2014 13:20, Antoine Pitrou <antoine at python.org> wrote:
>> On 09/08/2014 12:43, Ben Hoyt wrote:
>>> Just thought I'd share some of my excitement about how fast the
>>> all-C version [1] of os.scandir() is turning out to be.
>>>
>>> Below are the results of my scandir / walk benchmark run with three
>>> different versions. I'm using an SSD, which seems to make it
>>> especially faster than listdir / walk. Note that benchmark results
>>> can vary a lot, depending on operating system, file system, hard
>>> drive type, and the OS's caching state.
>>>
>>> Anyway, os.walk() can be FIFTY times as fast using os.scandir().
>>
>> Very nice results, thank you :-)
>
> Indeed!
>
> This may actually motivate me to start working on a redesign of
> walkdir at some point, with scandir and DirEntry objects as the
> basis. My original approach was just too slow to be useful in
> practice (at least when working with trees on the scale of a full
> Fedora or RHEL build hosted on an NFS share).
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia



--
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud


