[Python-ideas] Speed up os.walk() 5x to 9x by using file attributes from FindFirst/NextFile() and readdir()

Ben Hoyt benhoyt at gmail.com
Wed Nov 14 11:15:07 CET 2012


> Data from bzr:
>  you can get a very significant speed up by doing two things:
>  - use readdir to get the inode numbers of the files in the directory
> and stat the files in-increasing-number-order. (this gives you
> monotonically increasing IO).
>  - chdir to the directory before you stat and use a relative path: it
> turns out when working with many files that the overhead of absolute
> paths is substantial.
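
For illustration only, the two tricks quoted above might look roughly like this in Python. This is a minimal sketch, not the bzr code: it assumes a modern Python (3.6+) where os.scandir() exposes the inode number via DirEntry.inode(), which was not available when this thread was written, and the function name stat_in_inode_order is hypothetical.

```python
import os


def stat_in_inode_order(dirpath):
    """Return [(name, stat_result), ...] for dirpath, statted in inode order.

    Hypothetical helper: a sketch of the approach described above,
    not the actual bzr implementation.
    """
    # Step 1: read the directory and collect (inode, name) pairs.
    with os.scandir(dirpath) as it:
        entries = sorted((entry.inode(), entry.name) for entry in it)

    # Step 2: chdir into the directory so each stat uses a short
    # relative path, then stat in increasing inode order.
    old_cwd = os.getcwd()
    os.chdir(dirpath)
    try:
        return [(name, os.lstat(name)) for _inode, name in entries]
    finally:
        # Always restore the caller's working directory.
        os.chdir(old_cwd)
```

Note that os.chdir() changes process-wide state, so a sketch like this is not safe to call from several threads at once.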

Huh, very interesting, thanks. On the first point, did you need to
separately stat() the files after the readdir()? Presumably you needed
information other than the info in the d_type field from readdir's
dirent struct.
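
To make the d_type distinction concrete, here is a small sketch of classifying entries without a separate stat(). It assumes Python 3.6+ and os.scandir(), which did not exist at the time of this message; the function name split_dirs_and_files is hypothetical. On POSIX systems, DirEntry.is_dir() can usually answer from the d_type field alone, and falls back to a stat call only when the filesystem does not report a type. Anything beyond the file type (size, mtime, and so on) still needs entry.stat().

```python
import os


def split_dirs_and_files(dirpath):
    """Return (dirs, files) name lists for dirpath, each sorted.

    Hypothetical helper: relies on the file type reported by the
    directory listing where the platform provides it.
    """
    dirs, files = [], []
    with os.scandir(dirpath) as it:
        for entry in it:
            # follow_symlinks=False: a symlink to a directory counts
            # as a non-directory here, so no extra lookup is needed.
            if entry.is_dir(follow_symlinks=False):
                dirs.append(entry.name)
            else:
                files.append(entry.name)
    return sorted(dirs), sorted(files)
```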

> We got (IIRC) a 90% reduction in 'bzr status' time by applying both of
> these things, and you can grab the pyrex module needed to do readdir
> from bzr - though we tuned what we had to match the needs of a VCS, so
> it's likely too convoluted for general-purpose use.

Do you have a web link to said source code? I'm having trouble (read:
being lazy) figuring out the bzr source repo.

-Ben


