Optimizing code

Steve Holden sholden at bellatlantic.net
Thu Feb 24 13:32:46 EST 2000


Gerrit Holl wrote:
> 
> Hello,
> 
> I have the following script:
> 
> #! /usr/bin/env python
> 
> import sys
> import os
> 
> class DiskUsage:
>     __size = 0
>     def add(self, filename):
>         self.__size = self.__size + os.path.getsize(filename)
>     def __call__(self, arg, d, files):
>         for file in files:
>             filename = os.path.join(d, file)
>             if os.path.isfile(filename): self.add(filename)
> 
>     def __len__(self):
>         return self.__size
> 
> def du(dir):
>     disk = DiskUsage()
>     os.path.walk(dir, disk, ())
>     return len(disk)
> 
> def main():
>     if len(sys.argv) != 2:
>         sys.stderr.write("usage: %s <filename>" % sys.argv[0])
>         sys.exit(1)
>     print du(sys.argv[1])
> 
> if __name__ == '__main__':
>     main()
> 
> Timing turns out that the 'os.path.walk' part takes about 2.7 seconds, for
> a 400 MB dir with 1096 dirs and 9082 files. 'du -s ~' takes 0.2 seconds.
> What makes this slow? The special methods? The redefinition of an integer?
> os.path.walk? With longs, it even takes 12 seconds...
> 
> Can I optimize it? If so how?
> 
> regards,
> Gerrit.
> 
> P.S.
> I know it's _easier_ to do os.popen('du') but 1) it's not crossplatform and
> 2) optimizing is instructive.
> 

Offhand I would suggest that du uses a lot of specific knowledge about UNIX
filesystem structures which probably isn't available on the user side of the
Python APIs.  As you know from your work with GUIs (sorry :-), sometimes
a convenience layer will impose huge penalties in execution time.
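
One place where the layering shows up in the script above: os.path.isfile()
and os.path.getsize() each make their own stat call, so every file is
stat'ed twice.  A minimal sketch that stats each entry once, assuming a
current Python 3 with os.walk (the name tree_size is just for illustration,
and I haven't timed it against the original):

```python
import os
import stat

def tree_size(root):
    """Sum the content sizes of regular files under root.

    Each entry is stat'ed once instead of once in isfile() and again
    in getsize().  os.stat() follows symlinks, as the original script did.
    """
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # vanished, unreadable, or a broken symlink: skip
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total
```

This still won't get near du's 0.2 seconds, but it removes one avoidable
system call per file.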

Oddly enough I recently tried reworking dutree.py to run on Windoze,
on the basis that a cross-platform "du" would be a mighty fine thing.  I
got as far as computing usage for the sum of all files in each
subdirectory, then realised two things:

   1.  I would have to know the filesystem's cluster size to go from the
       file *contents* size as reported by os.stat() to the disk usage
       as required by a "decent" du;

   2.  I couldn't find any way to account for disk space used by the
       directories themselves, since it doesn't seem to be reported in
       os.stat()'s output; and

   3.	Nobody expects the Spanish Inquisition.
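
To make point 1 concrete, here is a rough sketch of the rounding involved,
assuming a current Python 3.  The 4096-byte default is only a placeholder
(the real cluster size has to come from somewhere else), the function names
are made up for illustration, and the space used by the directories
themselves (point 2) is still ignored:

```python
import os

def round_up_to_cluster(nbytes, cluster_size=4096):
    """Round a content size up to a whole number of clusters.

    cluster_size is an assumed value, not something read from the
    filesystem.
    """
    clusters = (nbytes + cluster_size - 1) // cluster_size
    return clusters * cluster_size

def estimated_usage(root, cluster_size=4096):
    """Estimate disk usage of the regular files under root."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                size = os.path.getsize(path)
                total += round_up_to_cluster(size, cluster_size)
    return total
```

With these defaults a 1-byte file counts as 4096 bytes and a 4097-byte file
as 8192, which is why the sum of content sizes can be well short of what a
"decent" du would report.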

At this point, since I was only trying to extend my Pythonicity, I gave up
on the exercise.  So thanks for pointing out os.path.getsize: maybe I'll
get back to it!

regards
 Steve


--
"If computing ever stops being fun, I'll stop doing it"


