[Python-ideas] Speed up os.walk() 5x to 9x by using file attributes from FindFirst/NextFile() and readdir()

Ronald Oussoren ronaldoussoren at mac.com
Wed Nov 14 08:14:52 CET 2012


On 13 Nov, 2012, at 21:00, Ben Hoyt <benhoyt at gmail.com> wrote:

>> It would be very odd to have an st_mode that contains a subset of
>> the information the platform can provide. In particular having st_mode
>> would give the impression that it is the full mode.
> 
> Yes, it's slightly odd, but not as odd as you'd think. This is
> especially true for Windows users, because we're used to st_mode only
> being a subset of the information -- the permission bits are basically
> meaningless on Windows.

That's one more reason for returning a new tuple/struct with a type field:
the full st_mode is not useful on Windows, and on Unix readdir doesn't
return a full st_mode in the first place.
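A minimal sketch of such a struct, assuming a hypothetical `DirEntryInfo` named tuple whose `type` field is a plain string ("dir", "file", "symlink", "other"). The names `DirEntryInfo`, `info_type_only` and `info_full` are made up for illustration and are not part of the os module. Fields the platform can't provide cheaply are left as None instead of being faked, and os.lstat() stands in for the data FindFirstFile/FindNextFile would report on Windows.

```python
import os
import stat
from collections import namedtuple

# Hypothetical result type (not an os module API): 'type' is always set,
# the remaining fields are None when the listing call doesn't supply them.
DirEntryInfo = namedtuple("DirEntryInfo", "name type size atime ctime mtime")


def _type_from_mode(mode):
    """Map st_mode to one of the illustrative type strings."""
    if stat.S_ISDIR(mode):
        return "dir"
    if stat.S_ISREG(mode):
        return "file"
    if stat.S_ISLNK(mode):
        return "symlink"
    return "other"


def info_type_only(name, type_):
    """What a Unix readdir()-based listing could fill in: name and type only."""
    return DirEntryInfo(name, type_, None, None, None, None)


def info_full(path):
    """What a Windows-style listing could fill in. os.lstat() stands in here
    for the size and timestamps that FindFirstFile/FindNextFile report."""
    st = os.lstat(path)
    return DirEntryInfo(os.path.basename(path), _type_from_mode(st.st_mode),
                        st.st_size, st.st_atime, st.st_ctime, st.st_mtime)
```

With this shape a caller checks `info.type == "dir"` instead of decoding st_mode, and a size the platform didn't provide shows up as None rather than as a plausible-looking number.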

> 
> The alternative is to introduce yet another new tuple/struct with
> "type size atime ctime mtime" fields. But you still have to specify
> that it's implementation dependent (Linux/BSD only provides type,
> Windows provides all those fields), and then you have to have ways of
> testing what type the type is. stat_result and the stat module already
> give you those things, which is why I think it's best to stick with
> the stat_result structure.

The interface of the stat module for determining the file type is not very
pretty.
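For comparison, a sketch of what turning a stat_result into a file-type name takes with the stat module: mask st_mode with stat.S_IFMT() (or call one of the S_IS*() predicates) and map the result yourself. The `describe` helper and its name table are illustrative only, not part of the standard library.

```python
import os
import stat

# stat.S_IFMT() extracts the file-type bits from st_mode; turning those bits
# into something readable is left to the caller (this table is illustrative).
_TYPE_NAMES = {
    stat.S_IFDIR: "directory",
    stat.S_IFREG: "regular file",
    stat.S_IFLNK: "symlink",
    stat.S_IFIFO: "fifo",
    stat.S_IFSOCK: "socket",
    stat.S_IFCHR: "character device",
    stat.S_IFBLK: "block device",
}


def describe(path):
    """Return a type name for path; lstat() so symlinks are not followed."""
    mode = os.lstat(path).st_mode
    return _TYPE_NAMES.get(stat.S_IFMT(mode), "unknown")
```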

> 
> In terms of what's useful, certainly "type" and "size" are, so you may
> as well throw in atime/ctime/mtime, which Windows also gives us for
> free.

How did you measure the 5x speedup you saw with your modified os.walk?

It would be interesting to see whether Unix platforms get a similar speedup, because
if they don't, the new API could just return the results of stat (or lstat ...).
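One rough way to get a Unix number is to time the per-entry stat() that the proposal would make unnecessary: os.walk() at this point calls os.path.isdir() on every name, which costs a stat() call on Unix. The sketch below builds a throwaway tree and compares a bare os.listdir() pass with listdir plus one os.lstat() per entry (swap in os.stat to mirror isdir exactly). The tree size and helper names are arbitrary assumptions, it measures only the Unix-side overhead rather than Ben's patch, and results depend heavily on the filesystem and on the inode cache, which is warm right after the files are created.

```python
import os
import tempfile
import timeit


def make_tree(root, n_dirs=20, n_files=50):
    """Create n_dirs subdirectories of root, each holding n_files empty files.
    Returns the list of subdirectory paths."""
    dirs = []
    for d in range(n_dirs):
        sub = os.path.join(root, "d%03d" % d)
        os.mkdir(sub)
        for f in range(n_files):
            open(os.path.join(sub, "f%03d" % f), "w").close()
        dirs.append(sub)
    return dirs


def list_names(dirs):
    """Directory listings only: roughly the cost of readdir()."""
    return sum(len(os.listdir(d)) for d in dirs)


def list_and_stat(dirs):
    """Listings plus one lstat() per entry, i.e. the extra system call a
    listing that already carries the file type would avoid."""
    total = 0
    for d in dirs:
        for name in os.listdir(d):
            os.lstat(os.path.join(d, name))
            total += 1
    return total


def best_of(func, dirs, repeat=5):
    """Fastest of `repeat` single runs, to reduce noise from other activity."""
    return min(timeit.repeat(lambda: func(dirs), number=1, repeat=repeat))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        dirs = make_tree(root)
        t_list = best_of(list_names, dirs)
        t_stat = best_of(list_and_stat, dirs)
        print("listdir only:    %.4f s" % t_list)
        print("listdir + lstat: %.4f s" % t_stat)
        print("ratio:           %.1fx" % (t_stat / t_list))
```

A large ratio would suggest Unix stands to gain as much as Windows; a ratio close to 1 would support simply returning stat (or lstat) results from the new API.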

Ronald
