[Python-ideas] Speed up os.walk() 5x to 9x by using file attributes from FindFirst/NextFile() and readdir()

Robert Collins robertc at robertcollins.net
Wed Nov 14 19:55:32 CET 2012


On Wed, Nov 14, 2012 at 11:52 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Wed, 14 Nov 2012 22:53:44 +1300,
> Robert Collins
> <robertc at robertcollins.net> wrote:
>>
>> Data from bzr:
>>  you can get a very significant speed up by doing two things:
>>  - use readdir to get the inode numbers of the files in the directory
>> and stat the files in-increasing-number-order. (this gives you
>> monotonically increasing IO).
>
> This assumes directory entries are sorted by inode number (in a btree,
> I imagine). Is this assumption specific to some Linux / Ubuntu
> filesystem?

It's definitely not applicable globally (but it's no worse in general
than any arbitrary sort, so it is safe to do everywhere). On the ext*
family of file systems, inode A < inode B implies inode A is located on
a lower sector than B.
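
For illustration, here is a minimal Python sketch of the inode-ordered
stat described above. It is not bzr's actual code: the function name
stat_in_inode_order is hypothetical, and it assumes a Python with
os.scandir(), where entry.inode() is taken from the directory listing
on POSIX systems. Whether the ordering helps depends on the file system.

```python
import os


def stat_in_inode_order(path):
    """Return [(name, stat_result)] for the entries of `path`.

    Hypothetical helper: entries are stat'ed in increasing inode-number
    order, which on ext*-style file systems tends to give more
    sequential disk access.
    """
    with os.scandir(path) as it:
        # On POSIX, entry.inode() uses the inode number reported by the
        # directory listing, so no extra stat call is made here.
        entries = sorted((entry.inode(), entry.name) for entry in it)

    results = []
    for _inode, name in entries:
        # lstat so that symlinks are reported rather than followed.
        results.append((name, os.lstat(os.path.join(path, name))))
    return results
```

On file systems where inode numbers say nothing about disk layout this
is just another arbitrary order, so it should cost no more than the
sort itself.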

>>  - chdir to the directory before you stat and use a relative path: it
>> turns out when working with many files that the overhead of absolute
>> paths is substantial.
>
> How about using fstatat() instead? chdir() is a no-no since it's
> a process-wide setting.

fstatat looks *perfect*. Thanks for the pointer.
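
As a rough sketch of what that could look like from Python: os.stat()
accepts a dir_fd argument (Python 3.3 and later) on platforms listed in
os.supports_dir_fd, which corresponds to fstatat(). The helper name
stat_relative is hypothetical, and this assumes a POSIX platform.

```python
import os


def stat_relative(dirpath, names):
    """lstat each name relative to an open directory descriptor.

    Hypothetical helper: avoids both chdir() and repeated resolution of
    the absolute path prefix. Requires os.stat in os.supports_dir_fd.
    """
    dir_fd = os.open(dirpath, os.O_RDONLY)
    try:
        return {
            # follow_symlinks=False gives lstat-like behaviour.
            name: os.stat(name, dir_fd=dir_fd, follow_symlinks=False)
            for name in names
        }
    finally:
        os.close(dir_fd)
```

Because the directory is referenced by descriptor, the current working
directory of the process is never changed.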

I forget the win32 behaviour, but on Linux a thread is a process, so
chdir is localised to the process and not altered across threads.

-Rob

-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Cloud Services


