Simple py script to calc folder sizes
John Zenger
john_zenger at yahoo.com
Tue Mar 21 22:52:45 EST 2006
Caleb Hattingh wrote:
> Hi everyone
>
> [Short version: I put some code below: what changes can make it run
> faster?]
On my slow notebook, your code takes about 1.5 seconds to do my
C:\Python24 dir. With a few changes my code does it in about 1 second.
Here is my code:
import os, os.path, math

def foldersize(fdir):
    """Returns the size of all data in folder fdir in bytes"""
    root, dirs, files = os.walk(fdir).next()
    files = [os.path.join(root, x) for x in files]
    dirs = [os.path.join(root, x) for x in dirs]
    return sum(map(os.path.getsize, files)) + sum(map(foldersize, dirs))

suffixes = ['bytes','kb','mb','gb','tb']

def prettier(bytesize):
    """Convert a number in bytes to a string in MB, GB, etc"""
    # math.log(0) raises ValueError, so handle empty folders first
    if bytesize < 1:
        return "%d bytes" % bytesize
    # What power of 1024 is less than or equal to bytesize?
    exponent = int(math.log(bytesize, 1024))
    if exponent > 4:
        return "%d bytes" % bytesize
    return "%8.2f %s" % (bytesize / 1024.0 ** exponent, suffixes[exponent])

rootfolders = [i for i in os.listdir('.') if os.path.isdir(i)]
results = [ (foldersize(folder), folder) for folder in rootfolders ]
for size, folder in sorted(results):
    print "%s\t%s" % (folder, prettier(size))
print
print "Total:\t%s" % prettier(sum ( size for size, folder in results ))
# End
The biggest change I made was to use os.walk rather than os.path.walk.
os.walk is newer, and a bit easier to understand; it takes just a single
directory path as an argument, and returns a nice generator object that
you can use in a for loop to walk the entire tree. I use it in a
somewhat unconventional way here. Look at the docs for a more
conventional application.
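For comparison, here is a minimal sketch of the more conventional use: a single for loop over os.walk, with no recursion of your own. It is written in current Python 3 syntax rather than the 2.4-era syntax above, and the name tree_size and the decision to skip symlinks are my own assumptions, not part of either program in this thread.

```python
import os

def tree_size(top):
    """Return the total size in bytes of all regular files under top."""
    total = 0
    # os.walk yields one (dirpath, dirnames, filenames) tuple per directory
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Assumption: skip symlinks so a broken link does not raise an error
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

Calling tree_size('.') would give the total for the current directory and everything below it.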
The "map(os.path.getsize, files)" code should run a bit faster than a
for loop, because map only has to look up the getsize function once.
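To make the comparison concrete, here is a rough sketch of the two spellings side by side, in Python 3 syntax. The function names are made up for illustration, and I have not timed them; any speed difference is likely small next to the cost of the file system calls themselves.

```python
import os

def total_with_map(paths):
    # os.path.getsize is resolved once, then applied to every path
    return sum(map(os.path.getsize, paths))

def total_with_loop(paths):
    total = 0
    for p in paths:
        # os.path.getsize is looked up again on every iteration
        total += os.path.getsize(p)
    return total
```

Both return the same total for the same list of paths.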
I use log in the "prettier" function rather than your chain of ifs. The
chain of ifs might actually be faster. But I spent so long studying
math in school that I like to use it whenever I get a chance.
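If you would rather avoid the logarithm, one possible middle ground between log and a chain of ifs is repeated division. This is only a sketch in Python 3 syntax; prettier_loop and SUFFIXES are names I made up, and the output format differs slightly from the function above (no padding, and sizes beyond the last suffix stay in tb).

```python
SUFFIXES = ['bytes', 'kb', 'mb', 'gb', 'tb']

def prettier_loop(bytesize):
    """Format a byte count by repeated division instead of math.log."""
    size = float(bytesize)
    exponent = 0
    # Step up one suffix at a time until the value drops below 1024
    # or the largest suffix has been reached
    while size >= 1024.0 and exponent < len(SUFFIXES) - 1:
        size /= 1024.0
        exponent += 1
    return "%.2f %s" % (size, SUFFIXES[exponent])
```

A side effect of this version is that a size of zero needs no special case, whereas math.log(0, 1024) raises ValueError.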
Some other comments on your code:
> def cmpfunc(a,b):
>     if a.count > b.count:
>         return 1
>     elif a.count == b.count:
>         return 0
>     else:
>         return -1
This could be just "return a.count - b.count". A cmp function does not
have to return exactly -1 or +1; any negative number, zero, or any
positive number will do.
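Here is a small sketch of the subtraction form in action. It assumes a current Python 3, where sort no longer accepts a cmp function directly, so the comparison is wrapped with functools.cmp_to_key. The FolderSize class is a hypothetical stand-in for your objects; I am only assuming they carry a numeric count attribute.

```python
from functools import cmp_to_key

class FolderSize(object):
    """Hypothetical stand-in for the objects being sorted."""
    def __init__(self, name, count):
        self.name = name
        self.count = count

def cmpfunc(a, b):
    # A negative, zero, or positive result is all a comparison needs
    return a.count - b.count

folders = [FolderSize('lib', 300), FolderSize('doc', 20), FolderSize('src', 150)]
# cmp_to_key adapts an old-style comparison function to the key parameter
ordered = sorted(folders, key=cmp_to_key(cmpfunc))
names = [f.name for f in ordered]
```

Here names ends up ordered from the smallest count to the largest.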
> foldersizeobjects.sort(cmpfunc)
You could also use the key parameter; it is usually faster than a cmp
function. As you can see, I used tuples instead; tuples compare element
by element, so sorting a list of (size, folder) tuples orders it by size
first and by folder name only when sizes are equal. Of course, sorting
is not a serious bottleneck in either program.
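Both approaches can be sketched as follows, in Python 3 syntax. Again, the FolderSize class is a hypothetical stand-in with a count attribute, and the sample data is invented purely to show the ordering.

```python
from operator import attrgetter

class FolderSize(object):
    """Hypothetical stand-in for the objects being sorted."""
    def __init__(self, name, count):
        self.name = name
        self.count = count

folders = [FolderSize('lib', 300), FolderSize('doc', 20),
           FolderSize('src', 150), FolderSize('bin', 20)]

# key parameter: the key is computed once per item, not once per comparison
by_key = sorted(folders, key=attrgetter('count'))
key_names = [f.name for f in by_key]

# tuples: compared element by element, so count decides first, name breaks ties
pairs = sorted((f.count, f.name) for f in folders)
```

Note the difference on ties: the key version keeps 'doc' before 'bin' because the sort is stable, while the tuple version puts 'bin' first because the names are compared too.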
> tot=0
> for foldersize in foldersizeobjects:
>     tot=tot+foldersize.count
>     print foldersize
"tot +=" is cooler than tot = tot + . And perhaps a bit faster.