[Python-ideas] BetterWalk, a better and faster os.walk() for Python

Ben Hoyt benhoyt at gmail.com
Thu Nov 22 22:34:42 CET 2012


> There are two that worry me, though:
>
> python3.2:         0.8x as fast
> python 2.7:        0.6x as fast
>
> I get the same results on an nfs mount of a zfs file system (the
> remote fs should not matter) and a memory-backed file system
> (typically used for /tmp). I had to hunt for a disk-based fs to get the
> first set of results :-(.
>
> I suspect that neither of these have d_type on the fs, so we're seeing
> a serious performance hit for systems that don't have d_type. That
> certainly bears further investigation. Could it just be the
> Python/ctypes implementation vs. native code?

The fallback when d_type isn't present (or returns DT_UNKNOWN) is to
call os.stat() anyway, which is almost exactly what the standard
os.walk() does. So yes, the slowdown here is almost certainly due to
my pure Python ctypes implementation vs os.listdir()'s C version.
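
To make the fallback concrete, here is a rough sketch of the idea, not
the actual BetterWalk code. The helper name is_dir() is hypothetical,
and the DT_* constants are the values from Linux's <dirent.h>, assumed
here for illustration. It trusts d_type when it is conclusive and only
calls os.stat() when it is not:

```python
import os
import stat

# d_type values as defined in Linux's <dirent.h> (assumed for illustration).
DT_UNKNOWN = 0
DT_DIR = 4
DT_REG = 8
DT_LNK = 10


def is_dir(dirpath, name, d_type):
    """Hypothetical helper: report whether a directory entry is a directory.

    Uses d_type when it is conclusive; otherwise falls back to os.stat(),
    which is roughly the per-entry work the standard os.walk() does.
    """
    if d_type in (DT_UNKNOWN, DT_LNK):
        # The file system gave no usable type (or the entry is a symlink
        # whose target type is unknown), so pay for a stat() call.
        try:
            st = os.stat(os.path.join(dirpath, name))
        except OSError:
            return False
        return stat.S_ISDIR(st.st_mode)
    # d_type is conclusive: no extra system call needed.
    return d_type == DT_DIR
```

On a file system that always reports DT_UNKNOWN, every entry takes the
os.stat() branch, so nothing is saved over os.walk() and the overhead
of the surrounding Python code is all that remains.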

Antoine's suggestion is a good one: rewriting iterdir_stat() in C or
using a ctypes emulation of listdir. Thanks! I'll see what I get time
for.

-Ben


