[Python-Dev] os.walk() is going to be *fast* with scandir

Robert Collins robertc at robertcollins.net
Sun Aug 10 07:40:47 CEST 2014


A small tip from my bzr days: cd into the directory before scanning
it, especially if you'll end up statting more than a fraction of the
files or are recursing. Otherwise the VFS does a traversal for each
path you directly stat or recurse into. This can become a dominating
factor in some workloads (I shaved several hundred milliseconds off of
bzr stat on kernel trees doing this).
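
A rough sketch of the idea in Python, not the actual bzr code: the
helper names working_directory and total_size are made up for
illustration. It chdirs into each directory and then stats bare
entry names, so each lstat call passes the kernel a single path
component rather than a long path it has to walk from the top.

```python
import os
import stat
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily chdir into `path`, restoring the previous cwd on exit."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


def total_size(root):
    """Sum the sizes of regular files under `root`, statting bare names only."""
    total = 0
    with working_directory(root):
        # We are now inside the directory, so every name below is a
        # single path component relative to the cwd.
        for name in os.listdir("."):
            st = os.lstat(name)  # lstat: do not follow symlinks
            if stat.S_ISDIR(st.st_mode):
                # Recurse using the relative name; the nested chdir is
                # also a one-component lookup.
                total += total_size(name)
            elif stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total
```

One caveat: the current directory is process-wide, so this is not
safe if other threads rely on relative paths at the same time. On
platforms that support it, passing dir_fd to os.stat (or using
os.fwalk) gets a similar effect without changing the cwd.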

-Rob

On 10 August 2014 15:57, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 10 August 2014 13:20, Antoine Pitrou <antoine at python.org> wrote:
>> On 09/08/2014 12:43, Ben Hoyt wrote:
>>
>>> Just thought I'd share some of my excitement about how fast the all-C
>>> version [1] of os.scandir() is turning out to be.
>>>
>>> Below are the results of my scandir / walk benchmark run with three
>>> different versions. I'm using an SSD, which seems to make it
>>> especially fast compared with listdir / walk. Note that benchmark results can
>>> vary a lot, depending on operating system, file system, hard drive
>>> type, and the OS's caching state.
>>>
>>> Anyway, os.walk() can be FIFTY times as fast using os.scandir().
>>
>>
>> Very nice results, thank you :-)
>
> Indeed!
>
> This may actually motivate me to start working on a redesign of
> walkdir at some point, with scandir and DirEntry objects as the basis.
> My original approach was just too slow to be useful in practice (at
> least when working with trees on the scale of a full Fedora or RHEL
> build hosted on an NFS share).
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia



-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud

