[Python-ideas] Speed up os.walk() 5x to 9x by using file attributes from FindFirst/NextFile() and readdir()

random832 at fastmail.us
Wed Nov 14 19:20:31 CET 2012


On Wed, Nov 14, 2012, at 5:52, Antoine Pitrou wrote:
> This assumes directory entries are sorted by inode number (in a btree,
> I imagine). Is this assumption specific to some Linux / Ubuntu
> filesystem?

I think he was proposing listing the whole directory in advance (which
os.walk already does), sorting it, and then looping over it calling
stat. If the idea is for an API that exposes more information returned
by readdir, though, why not get d_type too when it's available?
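
A rough sketch of that "list, sort, then stat" idea, for illustration only. It assumes os.scandir, which was added to Python later (3.5) and is not the API under discussion here; the function name stat_in_inode_order is made up:

```python
import os

def stat_in_inode_order(path):
    """List a directory once, sort the entries by inode, then lstat each one."""
    with os.scandir(path) as it:
        # entry.inode() is taken from the directory listing on POSIX,
        # so sorting does not need a stat call per entry there.
        entries = sorted(it, key=lambda e: e.inode())
    results = []
    for entry in entries:
        # is_dir() can be answered from d_type when the OS supplies it.
        is_dir = entry.is_dir(follow_symlinks=False)
        st = os.lstat(entry.path)
        results.append((entry.name, is_dir, entry.inode(), st))
    return results
```

Whether the inode ordering actually helps depends on the filesystem, which is the question raised above.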

> >  - chdir to the directory before you stat and use a relative path: it
> > turns out when working with many files that the overhead of absolute
> > paths is substantial.
> 
> How about using fstatat() instead? chdir() is a no-no since it's
> a process-wide setting.

A) Is fstatat even implemented in Python?
B) Is fstatat even possible under Windows?
C) Using the *at functions for this the usual way incurs overhead in the
form of having to maintain a number of open file handles equal to the
depth of your directory tree. IIRC, some GNU tools will fork the process
to avoid this limit. Though, since we're not doing this for security
reasons, we could instead fall back on absolute [or deep relative] paths
or reopen '..' to ascend back up.
D) You have to close those handles eventually. What if the caller
doesn't finish the generator?
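
To make points C and D concrete, here is a minimal sketch, assuming a POSIX platform and Python 3.3 or later, where os.stat and os.open accept dir_fd and os.listdir accepts a file descriptor. The names walk and walk_fd are hypothetical, and this is not how os.walk is implemented:

```python
import os
import stat

def walk_fd(dir_fd, rel=""):
    """Yield (relative_path, stat_result) below an already-open directory fd."""
    for name in os.listdir(dir_fd):
        # fstatat-style call: name is resolved relative to dir_fd, no chdir needed.
        st = os.stat(name, dir_fd=dir_fd, follow_symlinks=False)
        path = os.path.join(rel, name)
        yield path, st
        if stat.S_ISDIR(st.st_mode):
            # One extra descriptor stays open per level of depth (point C).
            child = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
            try:
                yield from walk_fd(child, path)
            finally:
                os.close(child)

def walk(top):
    fd = os.open(top, os.O_RDONLY)
    try:
        yield from walk_fd(fd)
    finally:
        # Runs when the generator is exhausted, closed, or collected (point D).
        os.close(fd)
```

If the caller abandons the generator, the descriptors are only released once close() is called on it or it is garbage collected, so prompt cleanup is not guaranteed on every implementation.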


