[Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

Matthieu Brucher matthieu.brucher at gmail.com
Tue May 14 12:53:42 CEST 2013


Very interesting. Although os.walk may not be widely used in cluster applications, anything that lowers the number of stat() calls in an application is worthwhile on parallel filesystems, as stat() is handled by the only non-parallel node, the metadata server (MDS).
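To make the stat() point concrete, here is a minimal sketch of the two approaches. It assumes the os.scandir() API as it later shipped in the standard library (Python 3.6+ for the context-manager form), not the benchmark code from this thread; the helper names build_tree, count_with_listdir and count_with_scandir are made up for illustration. The listdir version asks the filesystem about every entry, while the scandir version can usually answer is_dir()/is_file() from the type information returned with the directory listing.

```python
import os
import tempfile


def build_tree(root, depth=2, num_dirs=2, num_files=3):
    """Create a small test tree (hypothetical helper, loosely mirroring the benchmark's parameters)."""
    for i in range(num_files):
        with open(os.path.join(root, f"file{i}.txt"), "w") as f:
            f.write("x")
    if depth > 0:
        for d in range(num_dirs):
            sub = os.path.join(root, f"dir{d}")
            os.mkdir(sub)
            build_tree(sub, depth - 1, num_dirs, num_files)


def count_with_listdir(path):
    """Count regular files; every entry costs extra stat()-style calls."""
    total = 0
    for name in os.listdir(path):
        full = os.path.join(path, name)
        if os.path.islink(full):
            continue  # skip symlinks so both versions count the same things
        if os.path.isdir(full):
            total += count_with_listdir(full)
        elif os.path.isfile(full):
            total += 1
    return total


def count_with_scandir(path):
    """Count regular files; the entry type usually comes from the directory listing itself."""
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                total += count_with_scandir(entry.path)
            elif entry.is_file(follow_symlinks=False):
                total += 1
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_tree(root)
        print(count_with_listdir(root), count_with_scandir(root))
```

Whether the scandir version really avoids the extra calls depends on the filesystem reporting entry types in the directory listing; when it does not, is_dir() and is_file() fall back to a stat() per entry.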

Small test on another NFS drive:

Creating tree at benchtree: depth=4, num_dirs=5, num_files=50
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk took 0.117s, scandir.walk took 0.041s -- 2.8x as fast

I may try it on a Lustre FS if I have some time and if I don't forget about this.

Cheers,

Matthieu

2013/5/14 Charles-François Natali <cf.natali at gmail.com>

> I wonder how sshfs compared to nfs.

(I've modified your benchmark to also test the case where data isn't in the page cache.)

Local ext3:
cached: os.walk took 0.096s, scandir.walk took 0.030s -- 3.2x as fast
uncached: os.walk took 0.320s, scandir.walk took 0.130s -- 2.5x as fast

NFSv3, 1Gb/s network:
cached: os.walk took 0.220s, scandir.walk took 0.078s -- 2.8x as fast
uncached: os.walk took 0.269s, scandir.walk took 0.139s -- 1.9x as fast



--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Music band: http://liliejay.com/


