Beginner's performance problem

Mark Charsley mark.charsley at REMOVE_THIS.radioscape.com
Tue Apr 9 12:43:00 EDT 2002


For reasons best not gone into, I needed to correct the case of a whole 
bunch of SQL code. As such I created the little script below...

It reads in a source file, then for each line it does a case-insensitive 
search for each TableName and ColumnName (as contained in the "names" 
collection), checks that it's not just matching a substring, and then 
corrects the case of the word if necessary.

-------------------------------------------------------------------

import string

names = [
    "ALGORITHM_CLASS",
    "ALGORITHM_CONSTRAINT",
# 200-odd other names snipped...
    "Width",
    "WorkUnitHandle"
    ]

def ProcessFile(filename) :
    sourceFile = open(filename)
    sourceLines = sourceFile.readlines()
    sourceFile.close()


    numChanges = 0

    for lineNo in range(len(sourceLines)) :

        changed = 0
        upperSource = string.upper(sourceLines[lineNo])

        for name in names :

            upperName = string.upper(name)
            # Start at -1 so the first find() searches from position 0 -
            # otherwise a name at the very start of the line is never matched.
            startPos = -1
            while 1 :
                startPos = string.find(upperSource, upperName, startPos + 1)
                if startPos == -1 :
                    break
                previousChar = upperSource[startPos - 1 : startPos]
                nextChar = upperSource[startPos + len(upperName) :
                                       startPos + len(upperName) + 1]

                if  not previousChar.isalnum() and \
                    not previousChar == "_" and \
                    not nextChar.isalnum() and \
                    not nextChar == "_" :
                    # OK, upperName isn't part of a longer identifier
                    if sourceLines[lineNo][startPos : startPos + len(upperName)] != name :
                        sourceLines[lineNo] = sourceLines[lineNo][:startPos] + \
                            name + \
                            sourceLines[lineNo][startPos + len(upperName):]
                        changed = 1
                        print "#",

        if changed : numChanges += 1

    print
    print "%s : %d changed lines" % (filename, numChanges)

    sourceFile = open(filename,"w+")
    sourceFile.writelines(sourceLines)
    sourceFile.close()


ProcessFile("ThirteenThousandLines.sql")

-------------------------------------------------------------------

It's not a complicated (or particularly elegant) script, but it runs 
rather slowly - about a minute on a development-spec box.

I (an experienced C++ developer, beginning python programmer) have two 
main questions:

1) Why is it so slow? I wouldn't have thought that processing 13,000 lines 
* 200 names should take more than a few seconds on a modern PC. Playing 
around with simplified versions seems to indicate that calling 
string.find() repeatedly on a list of 13,000 strings is a lot slower than 
calling string.find() the same number of times on a single large string 
containing the whole source file. I avoided that approach, however, 
because I'd expect that repeatedly rebuilding the string with "bigString = 
bigString[:pos] + replacement + bigString[pos + len(replacement):]" 
_would_ slow things down.
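For what it's worth, the single-big-string version I was toying with looks 
roughly like this - just a sketch (fix_case and its arguments are made-up 
names, not from the script above), using the re module so the whole file 
is scanned in one pass and the string is rebuilt only once:

```python
import re

def fix_case(source, names):
    # Map each name's upper-case form back to its canonical spelling.
    canonical = {}
    for name in names:
        canonical[name.upper()] = name
    # Longest names first, so overlapping names resolve to the longer one.
    ordered = sorted(names, key=len, reverse=True)
    # \b replaces the manual previousChar/nextChar check; note that \b
    # treats "_" as a word character, so FOO won't match inside FOO_BAR,
    # just as in the script above.
    pattern = re.compile(
        r"\b(" + "|".join([re.escape(n) for n in ordered]) + r")\b",
        re.IGNORECASE)
    # sub() calls the lambda once per match and splices the result in,
    # so the big string is rebuilt only once rather than per replacement.
    return pattern.sub(lambda m: canonical[m.group(0).upper()], source)

names = ["Width", "WorkUnitHandle"]
src = "select WIDTH, workunithandle, widths from t"
out = fix_case(src, names)
```

No idea yet whether that's actually faster in practice, but it at least 
sidesteps the 13,000 * 200 find() calls.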

2) What's the best way of finding inefficiencies in Python code (other 
than asking here)? Are there any profilers out there for Python?
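From the docs, the standard library's profile module (read with pstats) 
looks like the place to start. A minimal sketch - hot_loop is just a 
stand-in workload, not my real script:

```python
import profile
import pstats

def hot_loop():
    # Stand-in workload; you'd profile ProcessFile(...) instead.
    total = 0
    for i in range(100000):
        total = total + i * i
    return total

p = profile.Profile()
result = p.runcall(hot_loop)      # run the function under the profiler
stats = pstats.Stats(p)
stats.sort_stats('cumulative')    # slowest call trees first
stats.print_stats(10)             # show the top ten entries
```

If that confirms most of the time goes into string.find() rather than the 
slicing, I'll know where to look.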


Many TIA

-- 

Mark - personal opinion only, could well be wrong, not representing 
company, don't sue us etc. etc.


