Simple py script to calc folder sizes

John Zenger john_zenger at yahoo.com
Tue Mar 21 22:52:45 EST 2006


Caleb Hattingh wrote:
> Hi everyone
> 
> [Short version: I put some code below: what changes can make it run
> faster?]

On my slow notebook, your code takes about 1.5 seconds to do my 
C:\Python24 dir.  With a few changes my code does it in about 1 second.

Here is my code:

import os, os.path, math

def foldersize(fdir):
     """Returns the size of all data in folder fdir in bytes"""
     root, dirs, files = os.walk(fdir).next()
     files = [os.path.join(root, x) for x in files]
     dirs = [os.path.join(root, x) for x in dirs]
     return sum(map(os.path.getsize, files)) + sum(map(foldersize, dirs))

suffixes = ['bytes','kb','mb','gb','tb']
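def prettier(bytesize):
     """Convert a number in bytes to a string in MB, GB, etc"""
     # math.log raises an error for 0, so handle empty folders first
     if bytesize < 1:
         return "%d bytes" % bytesize
     # What power of 1024 is less than or equal to bytesize?
     exponent = int(math.log(bytesize, 1024))
     # Beyond the last suffix (tb), fall back to a plain byte count
     if exponent > 4:
         return "%d bytes" % bytesize
     return "%8.2f %s" % (bytesize / 1024.0 ** exponent, suffixes[exponent])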

rootfolders = [i for i in os.listdir('.') if os.path.isdir(i)]
results = [ (foldersize(folder), folder) for folder in rootfolders ]

for size, folder in sorted(results):
     print "%s\t%s" % (folder, prettier(size))

print
print "Total:\t%s" % prettier(sum ( size for size, folder in results ))

# End

The biggest change I made was to use os.walk rather than os.path.walk. 
os.walk is newer, and a bit easier to understand; it takes just a single 
directory path as an argument, and returns a nice generator object that 
you can use in a for loop to walk the entire tree.  I use it in a 
somewhat unconventional way here.  Look at the docs for a more 
conventional application.
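
For comparison, a more conventional use of os.walk might look like the 
sketch below: one for loop over the generator, with no recursion.  It is 
written for Python 3, unlike the Python 2 code in this post.  The name 
tree_size and the choice to skip unreadable files are assumptions made 
for illustration.

```python
import os

def tree_size(top):
    """Return the total size in bytes of all files under top."""
    total = 0
    # os.walk yields one (root, dirs, files) tuple per directory in the tree
    for root, dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            try:
                total += os.path.getsize(path)
            except OSError:
                # Assumption: silently skip files that vanish or cannot be read
                pass
    return total
```

Calling tree_size('.') would give the total for the current directory 
and everything below it.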

The "map(os.path.getsize, files)" code should run a bit faster than a 
for loop, because map only has to look up the getsize function once.
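
To make the lookup point concrete, here is a rough sketch (Python 3) of 
three equivalent ways to total file sizes.  The function names are 
hypothetical.  Whether the difference is measurable will depend on the 
machine; for a job like this, disk access probably dominates.

```python
import os.path

def total_with_loop(paths):
    total = 0
    for p in paths:
        # os.path.getsize is resolved again on every iteration
        total += os.path.getsize(p)
    return total

def total_with_local(paths):
    # Resolve the function once and bind it to a local name
    getsize = os.path.getsize
    total = 0
    for p in paths:
        total += getsize(p)
    return total

def total_with_map(paths):
    # map receives the function object once and calls it for each path
    return sum(map(os.path.getsize, paths))
```

All three return the same number; only the amount of name lookup per 
file differs.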

I use log in the "prettier" function rather than your chain of ifs.  The 
chain of ifs might actually be faster.  But I spent so long studying 
math in school that I like to use it whenever I get a chance.
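
For anyone who would rather avoid the logarithm, one alternative is 
repeated division, sketched below for Python 3.  This is not the chain 
of ifs from the original post, just a hypothetical variant (the name 
prettier_loop is made up).  It also behaves differently at the extremes: 
it handles 0 without a special case, and it reports anything beyond the 
last suffix in tb rather than falling back to bytes.

```python
SUFFIXES = ['bytes', 'kb', 'mb', 'gb', 'tb']

def prettier_loop(bytesize):
    """Format a byte count by dividing by 1024 until it fits a suffix."""
    value = float(bytesize)
    index = 0
    # Stop at the last suffix so the index never runs off the list
    while value >= 1024.0 and index < len(SUFFIXES) - 1:
        value /= 1024.0
        index += 1
    return "%8.2f %s" % (value, SUFFIXES[index])
```

For example, prettier_loop(1536) gives 1.50 kb, padded on the left to 
eight characters.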

Some other comments on your code:

> def cmpfunc(a,b):
>     if a.count > b.count:
>         return 1
>     elif a.count == b.count:
>         return 0
>     else:
>         return -1

This could be just "return a.count - b.count", or "return cmp(a.count, 
b.count)".  A cmp function does not have to return exactly -1 or +1, 
just a negative number, zero, or a positive number.
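
A small sketch of the subtraction form follows.  The FolderSize class is 
a hypothetical stand-in for the objects in the quoted code.  The sketch 
targets Python 3, where sort no longer accepts a cmp function directly, 
so functools.cmp_to_key wraps it; in Python 2.4 the original 
foldersizeobjects.sort(cmpfunc) call works as is.

```python
from functools import cmp_to_key

class FolderSize(object):
    """Hypothetical stand-in: a folder name and its size in bytes."""
    def __init__(self, name, count):
        self.name = name
        self.count = count

def cmpfunc(a, b):
    # Only the sign of the result matters to the sort, not its magnitude
    return a.count - b.count

items = [FolderSize("c", 300), FolderSize("a", 5), FolderSize("b", 40)]
items.sort(key=cmp_to_key(cmpfunc))
names = [item.name for item in items]
```

After the sort, names lists the folders from smallest to largest.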

> foldersizeobjects.sort(cmpfunc)

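You could also use the key parameter; it is usually faster than a cmp 
function because the key is computed once per item rather than on every 
comparison.  As you can see, I used (size, folder) tuples instead; 
tuples compare element by element, so the sort orders by size first and 
falls back to the folder name only on ties.  Of course, sorting is not a 
serious bottleneck in either program.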
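
Both approaches can be sketched side by side (Python 3).  FolderSize 
here is a hypothetical namedtuple standing in for the objects in the 
quoted code, and the sample sizes are made up.

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical record type: a folder name and its size in bytes
FolderSize = namedtuple("FolderSize", ["name", "count"])

items = [FolderSize("c", 300), FolderSize("a", 5), FolderSize("b", 40)]

# key= extracts the sort value once per item
by_key = sorted(items, key=attrgetter("count"))

# Tuples compare element by element, so putting the size first sorts by size
pairs = sorted((item.count, item.name) for item in items)
```

Either way the folders come out in ascending order of size.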

> tot=0
> for foldersize in foldersizeobjects:
>     tot=tot+foldersize.count
>     print foldersize

"tot +=" is cooler than tot = tot + .  And perhaps a bit faster.


