[Python-ideas] Speed up os.walk() 5x to 9x by using file attributes from FindFirst/NextFile() and readdir()

Antoine Pitrou solipsis at pitrou.net
Wed Nov 14 19:43:31 CET 2012


On Wed, 14 Nov 2012 13:20:31 -0500
random832 at fastmail.us wrote:
> On Wed, Nov 14, 2012, at 5:52, Antoine Pitrou wrote:
> > This assumes directory entries are sorted by inode number (in a btree,
> > I imagine). Is this assumption specific to some Linux / Ubuntu
> > filesystem?
> 
> I think he was proposing listing the whole directory in advance (which
> os.walk already does), sorting it, and then looping over it calling
> stat.

But I don't understand why sorting (by inode? by name?) would make
stat() calls faster. That's what I'm trying to understand.
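
For concreteness, here is a minimal sketch of the approach being described
(list the whole directory, sort it, then stat each entry), taking "sorted"
to mean "sorted by inode number". It assumes Python 3.5 or later, where
os.scandir() exposes the inode number reported by the directory listing;
that API is not what os.walk() used at the time of this thread. The helper
name stat_in_inode_order is hypothetical. The sketch only shows the access
pattern; whether it is actually faster depends on the filesystem.

```python
import os

def stat_in_inode_order(path):
    """List `path`, sort the entries by inode number, then lstat each one.

    Hypothetical helper for illustration only.
    """
    # Read the whole directory first; DirEntry.inode() returns the inode
    # number from the directory listing on POSIX systems.
    entries = sorted(os.scandir(path), key=lambda entry: entry.inode())
    # Issue the stat calls in inode order rather than in listing order.
    return [(entry.name, os.lstat(entry.path)) for entry in entries]
```

To try sorting by name instead, replace the key with
`lambda entry: entry.name` and time both variants on the filesystem in
question.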

> If the idea is for an API that exposes more information returned
> by readdir, though, why not get d_type too when it's available?
> 
> > >  - chdir to the directory before you stat and use a relative path: it
> > > turns out when working with many files that the overhead of absolute
> > > paths is substantial.
> > 
> > How about using fstatat() instead? chdir() is a no-no since it's
> > a process-wide setting.
> 
> A) is fstatat even implemented in python?

Yup, it's available through the dir_fd parameter to os.stat():
http://docs.python.org/dev/library/os.html#os.stat
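
A minimal sketch of an fstatat()-style call from Python, assuming
Python 3.3 or later on a platform where `os.stat in os.supports_dir_fd`
is true (elsewhere passing dir_fd raises NotImplementedError). The helper
name stat_relative is hypothetical.

```python
import os

def stat_relative(dir_path, name):
    """stat `name` relative to an open directory descriptor.

    Hypothetical helper; avoids both chdir() and building an absolute path.
    """
    # Open the directory itself to obtain a descriptor to resolve against.
    dir_fd = os.open(dir_path, os.O_RDONLY)
    try:
        # `name` is looked up relative to dir_fd, not the current directory.
        return os.stat(name, dir_fd=dir_fd, follow_symlinks=False)
    finally:
        os.close(dir_fd)
```

In a real directory walk the descriptor would be opened once per directory
and reused for every entry in it, rather than reopened for each call as in
this small helper.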

> B) is fstatat even possible under windows?

No, but Windows has its own functions to solve the issue: FindFirstFile
and FindNextFile already return file attributes (as explained elsewhere
in this thread).

> C) using *at functions for this the usual way incurs overhead in the
> form of having to maintain a number of open file handles equal to the
> depth of your directory.

Indeed. But directory trees are usually much wider than they are deep.
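
To illustrate the point about open handles, here is a rough sketch of a
recursive walk built on dir_fd-relative calls. It assumes Python 3.3 or
later on a POSIX platform where os.listdir() accepts a file descriptor and
os.stat() and os.open() accept dir_fd. The function name count_files is
hypothetical. It keeps one descriptor open per level of the current path,
so the peak number of open descriptors tracks the depth of the tree, not
its width.

```python
import os
import stat

def count_files(top):
    """Count non-directory entries under `top` using dir_fd-relative calls.

    Hypothetical helper. Returns (file_count, peak_open_fds).
    """
    peak = 0

    def visit(dir_fd, depth):
        nonlocal peak
        # `depth` descriptors are open at this point, one per level.
        peak = max(peak, depth)
        count = 0
        for name in os.listdir(dir_fd):
            st = os.stat(name, dir_fd=dir_fd, follow_symlinks=False)
            if stat.S_ISDIR(st.st_mode):
                # Descend: open the child relative to the parent descriptor.
                child_fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
                try:
                    count += visit(child_fd, depth + 1)
                finally:
                    os.close(child_fd)
            else:
                count += 1
        return count

    top_fd = os.open(top, os.O_RDONLY)
    try:
        total = visit(top_fd, 1)
    finally:
        os.close(top_fd)
    return total, peak
```

The standard library's os.fwalk() (also Python 3.3 or later) offers a
ready-made walk that yields a directory descriptor alongside each
directory's contents.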

Regards

Antoine.




