Beginner's performance problem

Gerhard Häring gerhard at bigfoot.de
Tue Apr 9 13:20:18 EDT 2002


Mark Charsley wrote in comp.lang.python:
> For reasons best not gone into, I needed to correct the case of a whole 
> bunch of SQL code. As such I created the little script below...
> 
> It reads in a source file, then for each line it does a case-insensitive 
> search for each TableName and ColumnName (as contained in the "names" 
> collection), checks that it's not just matching a substring, and then 
> corrects the case of the word if necessary.
>
> [snip code]
>
> 1) why is it so slow? I wouldn't have thought that processing 13,000 lines 
> * 200 names shouldn't take more than a few seconds on a modern PC.

String manipulation is costly in Python because strings are immutable:
every operation on a string creates a new string object. A good
approach is to tokenize the string into a list (which is mutable),
operate only on the list, and then join the list back into a string
at the end.
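As a minimal sketch of that split/process/join pattern (the `names` list here is just a stand-in for your 200 table and column names):

```python
names = ["ALGORITHM_CLASS", "USER_ID"]
# map lowercased name -> canonical casing, for one cheap dict lookup per token
lookup = dict([(n.lower(), n) for n in names])

line = "select user_id from algorithm_class"
tokens = line.split()                            # list is mutable, unlike a string
tokens = [lookup.get(t.lower(), t) for t in tokens]
fixed = " ".join(tokens)                         # one final string allocation
```

This does one pass per line instead of one pass per name, though a plain split() will miss names glued to punctuation.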

I haven't tried it, but an approach using the re module should be
really fast, and shorter. Untested code follows:

import re
names = ["ALGORITHM_CLASS", ...]

text = open("ThirteenThousandLines.sql").read()
for name in names:
    # \b keeps it from matching inside a longer word, and re.escape
    # protects any regex metacharacters that might appear in a name
    regex = re.compile(r'\b' + re.escape(name) + r'\b', re.I)
    text = regex.sub(name, text)   # replace all occurrences, ignoring case
print text
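Since the loop above still makes one pass over the whole text per name, you can go further and do it in a single pass: combine all the names into one alternation and let a replacement function pick the right casing. Another untested sketch in the same spirit (the `names` list is again a stand-in):

```python
import re

names = ["ALGORITHM_CLASS", "USER_ID"]
lookup = dict([(n.lower(), n) for n in names])

# One alternation of all names; longest first so longer names win
# over any name that is a prefix of them.
ordered = sorted(names, key=len, reverse=True)
pattern = re.compile(
    r'\b(' + '|'.join([re.escape(n) for n in ordered]) + r')\b', re.I)

def fix_case(match):
    # look up the canonical casing for whatever casing was matched
    return lookup[match.group(0).lower()]

text = "select user_id from Algorithm_Class"
fixed = pattern.sub(fix_case, text)
```

With 200 names this trades 200 scans of the file for one scan plus a dict lookup per match.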

Gerhard
-- 
mail:   gerhard <at> bigfoot <dot> de       registered Linux user #64239
web:    http://www.cs.fhm.edu/~ifw00065/    OpenPGP public key id AD24C930
public key fingerprint: 3FCC 8700 3012 0A9E B0C9  3667 814B 9CAA AD24 C930
reduce(lambda x,y:x+y,map(lambda x:chr(ord(x)^42),tuple('zS^BED\nX_FOY\x0b')))


