Optimizing code

Harald Hanche-Olsen hanche at math.ntnu.no
Thu Feb 24 17:09:53 EST 2000


+ Gerrit Holl <gerrit.holl at pobox.com>:

| class DiskUsage:
|     __size = 0
|     def add(self, filename):
|         self.__size = self.__size + os.path.getsize(filename)
|     def __call__(self, arg, d, files):
|         for file in files:
|             filename = os.path.join(d, file)
|             if os.path.isfile(filename): self.add(filename)
| 
|     def __len__(self):
|         return self.__size
| 
| def du(dir):
|     disk = DiskUsage()
|     os.path.walk(dir, disk, ())
|     return len(disk)

[...]

| Timing turns out that the 'os.path.walk' part takes about 2.7
| seconds, for a 400 MB dir with 1096 dirs and 9082 files. 'du -s ~'
| takes 0.2 seconds.  What makes this slow? The special methods? The
| redefinition of an integer?  os.path.walk? With longs, it even takes
| 12 seconds...

One thing that slows your code down is that it calls stat() three
times on every regular file in the tree: first in os.path.isfile,
second in os.path.getsize, and third in os.path.walk, which needs to
find out whether a filename corresponds to a directory or not.
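
To make the point concrete, here is a minimal sketch of answering all
three questions from a single call. The helper name classify is made up
for illustration. Note that it uses lstat(), so unlike os.path.isfile it
does not follow symbolic links:

```python
import os
import stat

def classify(path):
    """Return (kind, size) for path using exactly one lstat() call.

    classify is a hypothetical helper, not part of any library.
    kind is "dir", "file" or "other"; size is nonzero only for files.
    """
    st = os.lstat(path)          # the only stat-family call made here
    mode = st.st_mode
    if stat.S_ISDIR(mode):
        return ("dir", 0)
    if stat.S_ISREG(mode):
        return ("file", st.st_size)
    return ("other", 0)          # symlinks, devices, sockets, ...
```

The same stat result says whether to descend into the entry and, if it
is a regular file, how big it is.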

| Can I optimize it? If so how?

Here is my best effort so far.  It is nearly three times as fast as
yours (but perhaps less portable).  Actually, yours didn't work at all
on my system, because the size of a file is a long integer there and
__len__ must return a plain int:

  File "du.py", line 21, in du
    return len(disk)
TypeError: __len__() should return an int

#! /usr/bin/env python

import sys
import os
import stat

class DiskUsage:
    def __init__(self):
        self.__size = 0
    def __call__(self, dir):
        # Importing these names is possibly a useless optimization:
        from stat import S_ISDIR, S_ISREG, ST_MODE, ST_SIZE
        files = os.listdir(dir)
        dirs = []
        for file in files:
            filename = os.path.join(dir, file)
            s = os.lstat(filename)
            mode = s[ST_MODE]
            if S_ISDIR(mode):
                dirs.append(filename)
            elif S_ISREG(mode):
                self.__size = self.__size + s[ST_SIZE]
        for dir in dirs:
            self(dir)
    def len(self):
        return self.__size

def du(dir):
    disk = DiskUsage()
    disk(dir)
    return disk.len()

def main():
    if len(sys.argv) != 2:
        sys.stderr.write("usage: %s <directory>\n" % sys.argv[0])
        sys.exit(1)
    print(du(sys.argv[1]))

if __name__ == '__main__':
    main()
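
For readers on a current Python, the same one-stat-per-entry idea can be
sketched with os.scandir. This assumes Python 3.6 or later and is not
the script above; the name du_scandir is made up for illustration. It
uses an explicit list of pending directories instead of recursion, and
like the script it does not follow symbolic links:

```python
import os

def du_scandir(top):
    """Sum the sizes of regular files under top, not following symlinks.

    du_scandir is a hypothetical name; assumes Python 3.6+ for the
    context-manager form of os.scandir.
    """
    total = 0
    pending = [top]                      # directories still to be read
    while pending:
        current = pending.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    total += entry.stat(follow_symlinks=False).st_size
    return total
```

Calling du_scandir(".") returns the total as an ordinary integer, so the
int-versus-long problem with __len__ does not arise.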

-- 
* Harald Hanche-Olsen     <URL:http://www.math.ntnu.no/~hanche/>
- "There arises from a bad and unapt formation of words
   a wonderful obstruction to the mind."  - Francis Bacon


