Program inefficiency?

thebjorn BjornSteinarFjeldPettersen at gmail.com
Sat Sep 29 13:33:54 EDT 2007


On Sep 29, 5:22 pm, hall.j... at gmail.com wrote:
> I wrote the following simple program to loop through our help files
> and fix some errors (in case you can't see the subtle RE search that's
> happening, we're replacing spaces in bookmarks with _'s)
>
> the program works great except for one thing. It's significantly
> slower through the later files in the search than through the early
> ones... Before anyone criticizes, I recognize that that middle section
> could be simplified with a for loop... I just haven't cleaned it
> up...
>
> The problem is that the first 300 files take about 10-15 seconds and
> the last 300 take about 2 minutes... If we do more than about 1500
> files in one run, it just hangs up and never finishes...
>
> Is there a solution here that I'm missing? What am I doing that is so
> inefficient?

Ugh, that was entirely too many regexps for my taste :-)

How about something like:

def attr_ndx_iter(txt, attribute):
    "Return all the start and end indices for the values of attribute."
    txt = txt.lower()
    attribute = attribute.lower() + '='
    alen = len(attribute)
    chunks = txt.split(attribute)
    if len(chunks) == 1:
        return

    start = len(chunks[0]) + alen
    end = -1

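    for chunk in chunks[1:]:
        qchar = chunk[:1]
        # only quoted values with a closing quote are reported; anything
        # else is skipped, but start must still advance to stay in sync
        if qchar in ('"', "'") and chunk.find(qchar, 1) != -1:
            end = start + chunk.index(qchar, 1)
            yield start + 1, end
        start += len(chunk) + alen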

def substr_map(txt, indices, fn):
    "Apply fn to text within indices."
    res = []
    cur = 0

    for i,j in indices:
        res.append(txt[cur:i])
        res.append(fn(txt[i:j]))
        cur = j

    res.append(txt[cur:])
    return ''.join(res)

def transform(s):
    "The transformation to do on the attribute values."
    return s.replace(' ', '_')

def zap_spaces(txt, *attributes):
    for attr in attributes:
        txt = substr_map(txt, attr_ndx_iter(txt, attr), transform)
    return txt

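def mass_replace():
    import sys
    w = sys.stdout.write

    for line in open(r'pathname\editfile.txt'):
        f = line.strip() # each line read from the list ends in a newline
        if not f:
            continue
        try:
            # read first: opening the file with 'w' truncates it
            txt = open(f).read()
            open(f, 'w').write(zap_spaces(txt, 'href', 'name'))
            w('.') # progress-meter :-)
        except Exception:
            print('Error processing file: %s' % f)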
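
If one regexp is tolerable after all, a single compiled pattern with a callback is another way to get the same effect in one pass per file. This is only a sketch for comparison, not the approach above: the names ATTR_RE and zap_spaces_re are made up for illustration, it assumes every attribute value is quoted with ' or ", and unlike the split-based version it only matches href= and name= at a word boundary.

```python
import re

# Assumption: values are always quoted with ' or ".
# group 1: attribute name plus '=', group 2: the quote char, group 3: the value
ATTR_RE = re.compile(r'''\b((?:href|name)=)(["'])(.*?)\2''',
                     re.IGNORECASE | re.DOTALL)

def zap_spaces_re(txt):
    "Hypothetical regexp variant: replace spaces in href/name values with _."
    def fix(m):
        # rebuild the attribute, touching only the text between the quotes
        return m.group(1) + m.group(2) + m.group(3).replace(' ', '_') + m.group(2)
    return ATTR_RE.sub(fix, txt)

sample = '<a HREF="my page.htm#top of page">x y</a> <a name=\'top of page\'>'
print(zap_spaces_re(sample))
```

Text outside the attribute values (the "x y" link text in the sample) is left alone, and the pattern is compiled once rather than once per file.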

minimally-tested'ly y'rs
-- bjorn



