looking for speed-up ideas
Andrew Dalke
adalke at mindspring.com
Mon Feb 3 20:34:02 EST 2003
Ram Bhamidipaty wrote:
> I have some python code that processes a large file. I want to see how
> much faster this code can get. Mind you, I don't _need_ the code to go
> faster - but it sure would be nice if it were faster...
Don't create the FileSize object. Use a simple tuple instead. An
object has higher overhead than a tuple, both to create and to
compare.
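As a rough illustration of the tuple point, here is a sketch in modern Python 3 (not the Python 2 used in this post). The FileSize class below is a hypothetical stand-in, since the original class isn't shown here, and the sample records are made up for the example. Tuples compare element by element without calling any Python-level method, so negating the size is enough to sort the largest first:

```python
class FileSize:
    """Hypothetical stand-in for an object-based record (an assumption)."""
    def __init__(self, size, name):
        self.size = size
        self.name = name

    def __lt__(self, other):
        # Called from Python for every comparison the sort makes.
        return (self.size, self.name) < (other.size, other.name)

records = [(414, ".envrc"), (3150900, "big_file.tar.gz"), (276, "BUILD_FLAGS")]

# Object version: build one instance per record, sort via __lt__.
objs = sorted([FileSize(s, n) for s, n in records], reverse=True)
by_object = [(o.size, o.name) for o in objs]

# Tuple version: negate the size so the largest entry sorts first.
tuples = sorted((-s, n) for s, n in records)
by_tuple = [(-s, n) for s, n in tuples]
```

Both approaches give the same ordering for this data; the tuple version just skips the per-object construction and the method call on each comparison.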
Try this. I don't have a heap, so I sort the list and cut it down
every once in a while. It also doesn't do full error checking in case
the input isn't in the right format. And it uses a more recent version
of Python than the code you have (e.g., no need for xreadlines).
This should be quite fast.
def process(infile):
    dirid_info = {}
    line = infile.readline()
    assert line[:1] == "T"
    ignore, dirname, dirid = line.split()
    dirid_info[dirid] = (None, dirname)
    fileinfo = []
    for line in infile:
        if line[:1] == "F":
            ignore, size, name = line.split("/")
            # negate size so 'largest' is sorted first
            fileinfo.append( (-long(size), dirid, name) )
            if len(fileinfo) > 10000:
                # Could use a heapq....
                fileinfo.sort()
                fileinfo = fileinfo[:200]
        else:
            ignore, dirname, parent_id, dirid = line[:-1].split("/")
            dirid_info[dirid] = (parent_id, dirname)
    fileinfo.sort()
    fileinfo = fileinfo[:200]
    for size, dirid, name in fileinfo:
        size = -size
        components = [name[:-1]]  # need to chop newline
        while dirid != None:
            dirid, dirname = dirid_info[dirid]
            components.append(dirname)
        components.reverse()
        print size, "/".join(components)
def test():
    import cStringIO
    s = """\
T /remote 0
S/name/0/1
S/joe/1/2
S/bob/1/3
F/3150900/big_file.tar.gz
S/testing/3/4
F/414/.envrc
F/276/BUILD_FLAGS
F/36505/make.incl
F/3861/build_envrc
D/spam/1/5
F/123456789012345678/really_quite_a_bit_of_spam
"""
    f = cStringIO.StringIO(s)
    process(f)

if __name__ == "__main__":
    test()
Here's the output from the test run:
123456789012345678 /remote/name/spam/really_quite_a_bit_of_spam
3150900 /remote/name/bob/big_file.tar.gz
36505 /remote/name/bob/testing/make.incl
3861 /remote/name/bob/testing/build_envrc
414 /remote/name/bob/testing/.envrc
276 /remote/name/bob/testing/BUILD_FLAGS
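The "Could use a heapq" comment in the code could look roughly like this. It's only a sketch under assumptions, written for Python 3 rather than the Python 2 above: top_files and keep are names made up for the example, and it takes already-parsed (size, dirid, name) tuples instead of reading the file. A bounded min-heap keeps just the largest entries, so there is no need for the periodic sort and cut:

```python
import heapq

def top_files(records, keep=200):
    """Return the `keep` largest (size, dirid, name) tuples, largest first."""
    heap = []  # min-heap holding at most `keep` entries
    for rec in records:
        if len(heap) < keep:
            heapq.heappush(heap, rec)
        elif rec > heap[0]:
            # rec beats the smallest entry kept so far, so swap it in.
            heapq.heapreplace(heap, rec)
    return sorted(heap, reverse=True)

# Records taken from the test input above, with dirids as parsed strings.
sample = [
    (3150900, "3", "big_file.tar.gz"),
    (414, "4", ".envrc"),
    (276, "4", "BUILD_FLAGS"),
    (36505, "4", "make.incl"),
    (3861, "4", "build_envrc"),
    (123456789012345678, "5", "really_quite_a_bit_of_spam"),
]
largest = top_files(sample, keep=3)
```

In current Pythons, heapq.nlargest(keep, records) gives the same result in a single call; the explicit loop is shown only to make the bounded-heap idea visible.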
Andrew
dalke at dalkescientific.com