Program inefficiency?
thebjorn
BjornSteinarFjeldPettersen at gmail.com
Sat Sep 29 13:33:54 EDT 2007
On Sep 29, 5:22 pm, hall.j... at gmail.com wrote:
> I wrote the following simple program to loop through our help files
> and fix some errors (in case you can't see the subtle RE search that's
> happening, we're replacing spaces in bookmarks with _'s)
>
> the program works great except for one thing. It's significantly
> slower through the later files in the search than through the early
> ones... Before anyone criticizes, I recognize that that middle section
> could be simplified with a for loop... I just haven't cleaned it
> up...
>
> The problem is that the first 300 files take about 10-15 seconds and
> the last 300 take about 2 minutes... If we do more than about 1500
> files in one run, it just hangs up and never finishes...
>
> Is there a solution here that I'm missing? What am I doing that is so
> inefficient?
Ugh, that was entirely too many regexps for my taste :-)
How about something like:
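
def attr_ndx_iter(txt, attribute):
    "Return all the start and end indices for the values of attribute."
    # Lower-case copies are used only to locate the attribute;
    # the indices are applied to the original text by the caller.
    txt = txt.lower()
    attribute = attribute.lower() + '='
    alen = len(attribute)
    chunks = txt.split(attribute)
    if len(chunks) == 1:
        return

    start = len(chunks[0]) + alen
    for chunk in chunks[1:]:
        qchar = chunk[:1]
        # Only quoted values are handled; skip empty or unquoted ones.
        if qchar in ('"', "'"):
            close = chunk.find(qchar, 1)
            if close != -1:
                yield start + 1, start + close
        start += len(chunk) + alen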

def substr_map(txt, indices, fn):
    "Apply fn to text within indices."
    res = []
    cur = 0
    for i, j in indices:
        res.append(txt[cur:i])
        res.append(fn(txt[i:j]))
        cur = j
    res.append(txt[cur:])
    return ''.join(res)

def transform(s):
    "The transformation to do on the attribute values."
    return s.replace(' ', '_')

def zap_spaces(txt, *attributes):
    for attr in attributes:
        txt = substr_map(txt, attr_ndx_iter(txt, attr), transform)
    return txt
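
def mass_replace():
    import sys
    w = sys.stdout.write
    for f in open(r'pathname\editfile.txt'):
        f = f.strip()  # drop the trailing newline from each listed filename
        if not f:
            continue
        try:
            # Read first: open(f, 'w') truncates the file as soon as it is called.
            txt = open(f).read()
            open(f, 'w').write(zap_spaces(txt, 'href', 'name'))
            w('.')  # progress-meter :-)
        except Exception:
            print('Error processing file: %s' % f)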
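
For comparison, if a single regexp is tolerable, the same substitution can be sketched as one compiled pattern with a replacement callback. This is not taken from the original program: the name zap_spaces_re, the pattern, and the sample string are assumptions for illustration. It only touches quoted href/name values that sit on one line.

```python
import re

# Hypothetical pattern: group 1 is the attribute name, group 2 the quote
# character, group 3 the attribute value up to the matching closing quote.
ATTR_RE = re.compile(r'''\b(href|name)=(["'])(.*?)\2''', re.IGNORECASE)

def zap_spaces_re(txt):
    "Replace spaces with underscores inside quoted href/name values."
    def fix(match):
        attr, quote, value = match.groups()
        return '%s=%s%s%s' % (attr, quote, value.replace(' ', '_'), quote)
    # One pass over the text; everything outside the matches is left untouched.
    return ATTR_RE.sub(fix, txt)

# Made-up sample input, not from the help files in question.
sample = '<a href="my page.htm#some bookmark" NAME=\'top of page\'>link text</a>'
print(zap_spaces_re(sample))
```

The pattern is compiled once and each file is scanned a single time, so the cost per file should depend only on that file's size.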
minimally-tested'ly y'rs
-- bjorn