Beginner's performance problem
Gerhard Häring
gerhard at bigfoot.de
Tue Apr 9 13:20:18 EDT 2002
Mark Charsley wrote in comp.lang.python:
> For reasons best not gone into, I needed to correct the case of a whole
> bunch of SQL code. As such I created the little script below...
>
> It reads in a source file, then for each line it does a case-insensitive
> search for each TableName and ColumnName (as contained in the "names"
> collection), checks that it's not just matching a substring, and then
> corrects the case of the word if necessary.
>
> [snip code]
>
> 1) why is it so slow? I wouldn't have thought that processing 13,000 lines
> * 200 names should take more than a few seconds on a modern PC.
String manipulation is costly in Python because strings are immutable:
every manipulation creates a new string object. A good approach is to
tokenize your string into a list (which is mutable), operate on the
list only, then join the list back into a string.
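A minimal sketch of that tokenize-and-join approach (the canonical-name
mapping and the sample line here are made up for illustration; the real
script would build the mapping from the 200 names):

import re

# Map lowercased identifier -> canonical spelling (example data only).
canonical = {"tablename": "TableName", "algorithm_class": "ALGORITHM_CLASS"}

line = "select * from tablename where algorithm_class = 1"
# Split on runs of non-word characters, keeping the delimiters (the capture
# group makes re.split return them) so the line survives the round trip.
tokens = re.split(r"(\W+)", line)
fixed = [canonical.get(tok.lower(), tok) for tok in tokens]
line = "".join(fixed)
# line is now "select * from TableName where ALGORITHM_CLASS = 1"

Because the lookup only fires on whole tokens, it never corrupts a longer
identifier that merely contains one of the names as a substring.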
I haven't tried it, but an approach using the re module should be really
fast, and shorter. Untested code follows:
import re
names = ["ALGORITHM_CLASS", ...]
text = open("ThirteenThousandLines.sql").read()
for name in names:
    # \b anchors keep us from matching name as a substring of a longer word
    regex = re.compile(r"\b%s\b" % name, re.I)  # case-insensitive search for name
    text = regex.sub(name, text)                # replace all occurrences
print text
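A further speedup (my suggestion, not something I've benchmarked against
the loop above): compile one alternation over all the names and do a
single pass over the text, picking the canonical spelling in a callable
replacement. The names and sample text here are illustrative:

import re

names = ["ALGORITHM_CLASS", "TableName"]           # stand-in for the real 200 names
lookup = {n.lower(): n for n in names}             # lowercased -> canonical spelling
# One regex matching any of the names as whole words, case-insensitively.
pattern = re.compile(r"\b(%s)\b" % "|".join(map(re.escape, names)), re.I)

text = "select * from tablename where algorithm_class = 1"
# sub() accepts a callable: look up the canonical case for each match.
text = pattern.sub(lambda m: lookup[m.group(0).lower()], text)
# text is now "select * from TableName where ALGORITHM_CLASS = 1"

This scans the 13,000 lines once instead of once per name.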
Gerhard
--
mail: gerhard <at> bigfoot <dot> de registered Linux user #64239
web: http://www.cs.fhm.edu/~ifw00065/ OpenPGP public key id AD24C930
public key fingerprint: 3FCC 8700 3012 0A9E B0C9 3667 814B 9CAA AD24 C930
reduce(lambda x,y:x+y,map(lambda x:chr(ord(x)^42),tuple('zS^BED\nX_FOY\x0b')))