Optimizing code
Steve Holden
sholden at bellatlantic.net
Thu Feb 24 13:32:46 EST 2000
Gerrit Holl wrote:
>
> Hello,
>
> I have the following script:
>
> #! /usr/bin/env python
>
> import sys
> import os
>
> class DiskUsage:
>     __size = 0
>     def add(self, filename):
>         self.__size = self.__size + os.path.getsize(filename)
>     def __call__(self, arg, d, files):
>         for file in files:
>             filename = os.path.join(d, file)
>             if os.path.isfile(filename): self.add(filename)
>
>     def __len__(self):
>         return self.__size
>
> def du(dir):
>     disk = DiskUsage()
>     os.path.walk(dir, disk, ())
>     return len(disk)
>
> def main():
>     if len(sys.argv) != 2:
>         sys.stderr.write("usage: %s <filename>" % sys.argv[0])
>         sys.exit(1)
>     print du(sys.argv[1])
>
> if __name__ == '__main__':
>     main()
>
> Timing shows that the 'os.path.walk' part takes about 2.7 seconds, for
> a 400 MB dir with 1096 dirs and 9082 files. 'du -s ~' takes 0.2 seconds.
> What makes this slow? The special methods? The redefinition of an integer?
> os.path.walk? With longs, it even takes 12 seconds...
>
> Can I optimize it? If so how?
>
> regards,
> Gerrit.
>
> P.S.
> I know it's _easier_ to do os.popen('du') but 1) it's not crossplatform and
> 2) optimizing is instructive.
>
Offhand I would suggest that du uses a lot of specific knowledge about UNIX
filesystem structures which probably isn't available on the user side of the
Python APIs. As you know from your work with GUIs (sorry :-), sometimes
a convenience layer will impose huge penalties in execution time.
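One concrete cost of the convenience layer in the quoted script is that every file is stat'ed twice: once by os.path.isfile() and again by os.path.getsize(), on top of the directory listing itself. What follows is only a rough sketch of how that overhead might be trimmed, assuming a present-day Python 3 where os.scandir() is available (it is not in the Python the quoted script targets). The name tree_size is made up for illustration, and whether this gets anywhere near du's speed is untested here.

```python
import os

def tree_size(top):
    """Sum the content sizes of regular files under top.

    Hypothetical helper: walks the tree with os.scandir() so that each
    entry needs at most one stat call instead of two.
    """
    total = 0
    stack = [top]                      # directories still to visit
    while stack:
        current = stack.pop()
        try:
            entries = os.scandir(current)
        except OSError:
            continue                   # missing or unreadable directory: skip
        with entries:
            for entry in entries:
                try:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        total += entry.stat(follow_symlinks=False).st_size
                except OSError:
                    pass               # entry vanished or cannot be stat'ed
    return total
```

Unlike the quoted script, this sketch does not follow symbolic links, so a symlink to a file contributes nothing to the total. It still reports file contents only, not disk usage.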
Oddly enough, I recently tried adapting dutree.py to run on Windoze,
on the basis that a cross-platform "du" would be a mighty fine thing. I
got as far as computing usage for the sum of all files in each
subdirectory, then realised two things:
1. I would have to know the filesystem's cluster size to go from the
file *contents* size as reported by os.stat() to the disk usage
as required by a "decent" du;
2. I couldn't find any way to account for disk space used by the
directories themselves, since it doesn't seem to be reported in
os.stat()'s output; and
3. Nobody expects the Spanish Inquisition.
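The arithmetic behind point 1 can be sketched as follows: a file occupies whole clusters, so its footprint on disk is roughly its content size rounded up to the next multiple of the cluster size. The function name allocated_size and the 4096-byte default are assumptions for illustration only; the real cluster size depends on how the volume was formatted and would have to be found by some other means.

```python
def allocated_size(content_size, cluster_size=4096):
    """Round content_size up to a whole number of clusters.

    cluster_size=4096 is only an illustrative default (an assumption);
    the real value depends on the filesystem in question.
    """
    if content_size < 0 or cluster_size <= 0:
        raise ValueError("content_size must be >= 0 and cluster_size > 0")
    clusters = -(-content_size // cluster_size)   # ceiling division
    return clusters * cluster_size
```

For example, with 4096-byte clusters a 1-byte file and a 4096-byte file would both be charged 4096 bytes, while a 4097-byte file would be charged 8192. This is an approximation: it ignores sparse files, compression, and filesystems that keep very small files inside their own metadata, and it does nothing about point 2.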
At this point, since I was only trying to extend my Pythonicity, I gave up
on the exercise. So thanks for pointing out os.path.getsize: maybe I'll
get back to it!
regards
Steve
--
"If computing ever stops being fun, I'll stop doing it"