Beginner's performance problem
Mark Charsley
mark.charsley at REMOVE_THIS.radioscape.com
Tue Apr 9 12:43:00 EDT 2002
For reasons best not gone into, I needed to correct the case of a whole
bunch of SQL code, so I wrote the little script below...
It reads in a source file, then for each line it does a case-insensitive
search for each TableName and ColumnName (as contained in the "names"
collection), checks that it's not just matching a substring, and then
corrects the case of the word if necessary.
-------------------------------------------------------------------
import string

names = [
    "ALGORITHM_CLASS",
    "ALGORITHM_CONSTRAINT",
    # 200-odd other names snipped...
    "Width",
    "WorkUnitHandle"
]

def ProcessFile(filename):
    sourceFile = open(filename)
    sourceLines = sourceFile.readlines()
    sourceFile.close()

    numChanges = 0
    for lineNo in range(len(sourceLines)):
        changed = 0
        upperSource = string.upper(sourceLines[lineNo])
        for name in names:
            upperName = string.upper(name)
            startPos = -1  # -1 so the first find() searches from position 0
            while 1:
                startPos = string.find(upperSource, upperName, startPos + 1)
                if startPos == -1:
                    break
                # Slices past either end of the line just come back empty
                previousChar = upperSource[startPos - 1 : startPos]
                nextChar = upperSource[startPos + len(upperName) :
                                       startPos + len(upperName) + 1]
                if not previousChar.isalnum() and \
                   not previousChar == "_" and \
                   not nextChar.isalnum() and \
                   not nextChar == "_":
                    # OK, upperName isn't a substring of a longer word
                    if sourceLines[lineNo][startPos : startPos + len(upperName)] != name:
                        sourceLines[lineNo] = sourceLines[lineNo][:startPos] + \
                            name + \
                            sourceLines[lineNo][startPos + len(upperName):]
                        changed = 1
                        print "#",
        if changed:
            numChanges += 1
    print
    print filename + " : " + ("%d" % numChanges) + " changed lines"

    sourceFile = open(filename, "w+")
    sourceFile.writelines(sourceLines)
    sourceFile.close()
ProcessFile("ThirteenThousandLines.sql")
-------------------------------------------------------------------
It's not a complicated (or particularly elegant) script, but it runs rather
slowly - taking about a minute on a development-spec box.
I (an experienced C++ developer, beginning python programmer) have two
main questions:
1) Why is it so slow? I wouldn't have thought that processing 13,000 lines
* 200 names would take more than a few seconds on a modern PC. Playing
around with simplified versions seems to indicate that calling
string.find() repeatedly on a list of 13,000 strings is a lot slower than
calling string.find() the same number of times on a single large string
containing the whole source file. I avoided that approach, however, because
I'd expect that repeatedly rebuilding the string with "bigString =
bigString[:pos] + replacement + bigString[pos + len(replacement):]" _would_
slow things down.
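(For what it's worth, one way to get both the single-big-string speed and
cheap replacement is to let the re module do the whole job in one pass:
compile one case-insensitive alternation of all the names, with lookarounds
standing in for the previousChar/nextChar checks, and have sub() pick the
canonical spelling via a dictionary. A sketch, using a toy three-name list
in place of the real 200-odd - not tested against the real SQL:)

```python
import re

# Toy stand-in for the 200-odd real names.
names = ["ALGORITHM_CLASS", "Width", "WorkUnitHandle"]

# Map each upper-cased name to its canonical spelling for O(1) lookup.
canonical = dict((n.upper(), n) for n in names)

# One alternation of all the names; the lookarounds refuse matches that
# are part of a longer identifier (alphanumeric or "_" on either side),
# which is the same test the previousChar/nextChar code performs.
pattern = re.compile(
    "(?<![A-Za-z0-9_])(%s)(?![A-Za-z0-9_])" % "|".join(map(re.escape, names)),
    re.IGNORECASE)

def fix_case(text):
    # One pass over the whole file contents; sub() builds the result
    # string once instead of repeatedly slicing and concatenating.
    return pattern.sub(lambda m: canonical[m.group(1).upper()], text)

print(fix_case("select width, workunithandle from algorithm_class"))
```

(Because re.escape is applied to each name, names containing regex
metacharacters are safe, and the lookarounds mean "my_width" is left alone
just as the script leaves it alone.)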
2) What's the best way of finding inefficiencies in Python code (other
than asking here)? Are there any profilers out there for Python?
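(On the profiler question: the standard library does ship one - the
"profile" module, with "pstats" for sorting the results; later Pythons add
a faster C implementation, cProfile, with the same interface. A minimal
sketch, using a made-up workload in place of ProcessFile:)

```python
import cProfile  # or "import profile" on older Pythons; same interface
import pstats

def work():
    # Made-up stand-in workload; profile your real entry point instead,
    # e.g. profiler.runcall(ProcessFile, "ThirteenThousandLines.sql")
    return sum(i * i for i in range(100000))

profiler = cProfile.Profile()
result = profiler.runcall(work)  # runcall returns work()'s return value

# Show the ten functions with the largest cumulative time.
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(10)
```

(The per-function call counts alone are often enough to spot a hot inner
loop like the string.find() one above.)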
Many TIA
--
Mark - personal opinion only, could well be wrong, not representing
company, don't sue us etc. etc.