Faster os.walk()

Nick Craig-Wood nick at craig-wood.com
Wed Apr 20 13:30:02 EDT 2005


fuzzylollipop <jarrod.roberson at gmail.com> wrote:
>  I am trying to get the number of bytes used by files in a directory.
>  I am using a large directory ( lots of stuff checked out of multiple
>  large cvs repositories ) and there is lots of wasted time doing
>  multiple os.stat() on dirs and files from different methods.

I presume you mean that os.walk() has to stat() each entry to see
whether it is a directory or not, and that you then stat() each file
a second time to count its bytes?

If you want to just get away with the one stat() you'll have to
re-implement os.walk yourself.
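
A minimal sketch of such a re-implementation, assuming all you want is
the total size of the regular files under a directory. The name
tree_size is made up for illustration, and the choice of lstat() (so
that symlinks are not followed) is my assumption, not something taken
from os.walk itself. Each entry is stat-ed exactly once, and that one
result is used both to decide whether to descend and to count bytes.

```python
import os
import stat


def tree_size(top):
    """Return the total bytes of regular files under top.

    Hypothetical helper: each directory entry gets exactly one lstat()
    call, whose result is reused for the directory test and the size.
    """
    total = 0
    pending = [top]  # directories still to be listed
    while pending:
        directory = pending.pop()
        try:
            names = os.listdir(directory)
        except OSError:
            continue  # unreadable directory: skip it
        for name in names:
            path = os.path.join(directory, name)
            try:
                st = os.lstat(path)  # the single stat for this entry
            except OSError:
                continue  # entry vanished or is inaccessible
            if stat.S_ISDIR(st.st_mode):
                pending.append(path)
            elif stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total
```

Calling tree_size("some/checkout") would then give the byte count for
that tree; symlinks and special files are ignored in this sketch.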

Another trick for speeding up lots of stats is to chdir() to the
directory you are processing, and then just use the leafnames in
stat().  The OS then doesn't have to spend ages parsing lots of paths.
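
The chdir() trick might look roughly like this for a single directory.
This is only a sketch: dir_size_by_leafname is a made-up name, and it
only counts the regular files directly inside the directory. Note that
chdir() changes the working directory for the whole process, so it is
not safe if other threads rely on relative paths.

```python
import os
import stat


def dir_size_by_leafname(directory):
    """Sum the sizes of regular files directly inside directory.

    Hypothetical helper: changes into the directory first so that
    lstat() is given bare leaf names rather than full paths.
    """
    saved = os.getcwd()
    os.chdir(directory)
    try:
        total = 0
        for name in os.listdir("."):
            st = os.lstat(name)  # leaf name only, no path to parse
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
        return total
    finally:
        os.chdir(saved)  # always restore the previous directory
```

The try/finally is there so the caller's working directory is restored
even if one of the stat calls fails.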

However, even if you implement both of the above, I don't reckon
you'll see a lot of improvement: decent OSes have a very good cache
for stat results, and parsing file names is very quick too, compared
with the time spent in the Python code itself.

-- 
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick


