Very interesting. Although os.walk may not be widely used in cluster applications, anything that lowers the number of stat() calls in an application is worthwhile on parallel filesystems, since stat() is handled by the only non-parallel node, the MDS (metadata server).
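
To make the stat() point concrete, here is a minimal sketch, assuming Python 3.6+, where os.scandir() (the standard-library descendant of the scandir module) is available. The two function names are hypothetical and only serve the illustration. The first walker issues one lstat() per directory entry; the second relies on the entry type returned with the directory listing, which on many platforms avoids the extra call. On filesystems that do not report the entry type, is_dir() and is_file() fall back to a stat call, so the saving is not guaranteed everywhere.

```python
import os
import stat


def count_files_listdir(top):
    """Count regular files using os.listdir() plus one lstat() per entry."""
    total = 0
    stack = [top]
    while stack:
        path = stack.pop()
        for name in os.listdir(path):
            full = os.path.join(path, name)
            # One metadata request per entry: this is the cost that
            # lands on the metadata server of a parallel filesystem.
            mode = os.lstat(full).st_mode
            if stat.S_ISDIR(mode):
                stack.append(full)
            elif stat.S_ISREG(mode):
                total += 1
    return total


def count_files_scandir(top):
    """Count regular files using os.scandir().

    The entry type usually comes from the directory listing itself,
    so no separate stat call is needed when the OS provides it.
    """
    total = 0
    stack = [top]
    while stack:
        path = stack.pop()
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    total += 1
    return total
```

Both functions should return the same count for a given tree; only the number of metadata requests differs.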

Small test on another NFS drive:
Creating tree at benchtree: depth=4, num_dirs=5, num_files=50
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk took 0.117s, scandir.walk took 0.041s -- 2.8x as fast

I may try it on a Lustre FS if I have some time and if I don't forget about this.

Cheers,

Matthieu


2013/5/14 Charles-François Natali <cf.natali@gmail.com>
> I wonder how sshfs compared to nfs.

(I've modified your benchmark to also test the case where data isn't
in the page cache).

Local ext3:
cached:
os.walk took 0.096s, scandir.walk took 0.030s -- 3.2x as fast
uncached:
os.walk took 0.320s, scandir.walk took 0.130s -- 2.5x as fast

NFSv3, 1Gb/s network:
cached:
os.walk took 0.220s, scandir.walk took 0.078s -- 2.8x as fast
uncached:
os.walk took 0.269s, scandir.walk took 0.139s -- 1.9x as fast
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev



--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Music band: http://liliejay.com/