[Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info

Matthieu Brucher matthieu.brucher at gmail.com
Tue May 14 12:53:42 CEST 2013


Very interesting. Although os.walk may not be widely used in cluster
applications, anything that lowers the number of calls to stat() in an
application is worthwhile for parallel filesystems, as stat() is handled by
the only non-parallel node, the metadata server (MDS).
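
To illustrate where the saving comes from, here is a minimal sketch of a tree
walk that relies on the file type carried by each directory entry instead of
calling stat() on every name. It assumes the os.scandir() interface as it later
shipped in the standard library (Python 3.6+ for the with-statement form);
count_entries is a hypothetical helper, not part of the proposal.

```python
import os

def count_entries(top):
    """Return (num_dirs, num_files) found below top (hypothetical helper)."""
    num_dirs = num_files = 0
    stack = [top]
    while stack:
        path = stack.pop()
        with os.scandir(path) as entries:
            for entry in entries:
                # is_dir() can usually be answered from the directory
                # listing itself, so no separate stat() call is needed.
                if entry.is_dir(follow_symlinks=False):
                    num_dirs += 1
                    stack.append(entry.path)
                else:
                    num_files += 1
    return num_dirs, num_files
```

On a filesystem where the listing does not report the entry type, is_dir()
falls back to a stat-like call, so the gain depends on the filesystem.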

Small test on another NFS drive:
Creating tree at benchtree: depth=4, num_dirs=5, num_files=50
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk took 0.117s, scandir.walk took 0.041s -- 2.8x as fast
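
For anyone who wants to repeat this kind of measurement, a rough sketch of a
timing harness follows. It is not the benchmark script used above: time_walk
and speedup are hypothetical names, and keeping the best of the repeats is my
assumption about how to summarise them.

```python
import os
import time

def time_walk(walk_func, top, repeat=3):
    """Return the best wall-clock time, in seconds, of a full walk of top."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        # Exhaust the generator so every directory is actually visited.
        for _root, _dirs, _files in walk_func(top):
            pass
        best = min(best, time.perf_counter() - start)
    return best

def speedup(baseline_seconds, candidate_seconds):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds
```

Calling time_walk(os.walk, "benchtree") and the same for the scandir-based
walk, then passing both results to speedup(), gives a ratio like the one
printed above.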

I may try it on a Lustre FS if I have some time and if I don't forget about
this.

Cheers,

Matthieu


2013/5/14 Charles-François Natali <cf.natali at gmail.com>

> > I wonder how sshfs compared to nfs.
>
> (I've modified your benchmark to also test the case where data isn't
> in the page cache).
>
> Local ext3:
> cached:
> os.walk took 0.096s, scandir.walk took 0.030s -- 3.2x as fast
> uncached:
> os.walk took 0.320s, scandir.walk took 0.130s -- 2.5x as fast
>
> NFSv3, 1Gb/s network:
> cached:
> os.walk took 0.220s, scandir.walk took 0.078s -- 2.8x as fast
> uncached:
> os.walk took 0.269s, scandir.walk took 0.139s -- 1.9x as fast



-- 
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Music band: http://liliejay.com/