Faster os.walk()

fuzzylollipop jarrod.roberson at gmail.com
Wed Apr 20 12:54:54 EDT 2005


du is faster than my python code that does the same thing; it is
highly optimized at the os level.

that said, I profiled spawning an external process to call du, and over
the large number of times I need to do this it is actually slower to
execute du externally than to run my os.walk() implementation.

du does not return the value I need anyway: I need the sizes of the
files only, not the raw blocks consumed, which is what du returns. I
also need to filter out some files and dirs.

after extensive profiling I found that, the way os.walk() is
implemented, os.stat() gets called on the dirs and files multiple times,
and that is where all the time is going.
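
to make the problem concrete, here is a rough sketch of a walk that
stats each entry exactly once and reuses that one result both for the
"is it a dir?" check and for the size. it is not my real code; the names
tree_size and skip are made up for illustration, and it assumes symlinks
should not be followed and unreadable entries can simply be skipped.

```python
import os
import stat


def tree_size(root, skip=lambda path: False):
    """Sum the sizes of regular files under root with one os.lstat() per entry.

    `skip` is a hypothetical filter: it receives the full path of each
    entry and returns True for files or dirs that should be ignored.
    """
    total = 0
    pending = [root]  # directories still to be listed
    while pending:
        current = pending.pop()
        try:
            names = os.listdir(current)
        except OSError:
            continue  # unreadable directory, skip it
        for name in names:
            path = os.path.join(current, name)
            if skip(path):
                continue
            try:
                st = os.lstat(path)  # the only stat call for this entry
            except OSError:
                continue  # entry vanished or is unreadable
            if stat.S_ISDIR(st.st_mode):
                pending.append(path)  # descend later, no second stat needed
            elif stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total
```

something like tree_size("/some/dir", skip=lambda p: p.endswith(".log"))
would then total everything except the .log files. whether this actually
beats os.walk() would still have to be profiled on the real tree.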

I guess I need something like os.statcache(), but that is deprecated
and probably wouldn't fix my problem anyway. I only walk the dir once
and then cache all the byte counts; it is the multiple calls to
os.stat(), kicked off internally by the os.walk() command through all
the isdir() and getsize() calls and whatnot, that hurt.
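
for anyone reading this on a much newer python: os.scandir() (added in
3.5, usable as a context manager since 3.6) gives back entries that
remember their file type and cache their stat result, which is close to
the per-entry stat cache described above. a minimal sketch, assuming
such a version is available; tree_size_scandir and skip are hypothetical
names, not an existing API.

```python
import os


def tree_size_scandir(root, skip=lambda entry: False):
    """Sum regular-file sizes under root using os.scandir() entries.

    `skip` is a hypothetical filter taking an os.DirEntry and returning
    True for entries that should be ignored.
    """
    total = 0
    pending = [root]  # directories still to be scanned
    while pending:
        current = pending.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if skip(entry):
                        continue
                    try:
                        if entry.is_dir(follow_symlinks=False):
                            pending.append(entry.path)
                        elif entry.is_file(follow_symlinks=False):
                            # stat result is cached on the entry object
                            total += entry.stat(follow_symlinks=False).st_size
                    except OSError:
                        continue  # entry vanished or is unreadable
        except OSError:
            continue  # unreadable directory, skip it
    return total
```

the filter here sees the DirEntry itself, so it can test entry.name or
entry.path without touching the filesystem again.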

just wanted to check and see if anyone had already solved this problem.



