Python idiom: Multiple search-and-replace
Randall Hopper
aa8vb at yahoo.com
Fri Apr 14 08:40:23 EDT 2000
Fredrik Lundh:
|Randall Hopper <aa8vb at yahoo.com> wrote:
|> Thanks! It's much more efficient. The 140 seconds original running time
|> was reduced to 11.6 seconds. I can certainly live with that.
|
|thought so ;-)
|
|while you're at it, try replacing the original readline loop with:
|
|    while 1:
|        lines = fp.readlines(BUFFERSIZE)
|        if not lines:
|            break
|        lines = string.join(lines, "")
|        lines = re.sub(...)
|        out_fp.write(lines)
|
|where BUFFERSIZE is 1000000 or so...
Elapsed: 2.7118 sec. 77% speedup. 98% speedup overall.
Thanks again Fredrik!
It's hard to believe we're burning 8.9 sec out of 11.6 seconds (77%) just
in the simple loop overhead of iterating over lines. Where's that JIT
Python interpreter when you need it ;-)
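For anyone wanting to try the comparison, here is a rough, self-contained sketch of the two loops in current Python. The pattern and replacement are hypothetical stand-ins, since the real re.sub(...) arguments are not shown in this thread, and the function names are made up for illustration. The chunked version only gives the same output as the per-line version when the pattern cannot match across a newline.

```python
import io
import re

BUFFERSIZE = 1000000  # size hint passed to readlines(), as suggested above

# Hypothetical pattern and replacement; the actual re.sub(...) arguments
# are elided in the thread.
PATTERN = re.compile(r"foo|bar")
REPLACEMENT = "baz"

def replace_per_line(fp, out_fp):
    """Per-line loop: one substitution and one write for every line."""
    while True:
        line = fp.readline()
        if not line:
            break
        out_fp.write(PATTERN.sub(REPLACEMENT, line))

def replace_chunked(fp, out_fp, hint=BUFFERSIZE):
    """Chunked loop: read roughly `hint` characters of whole lines,
    join them, then do one substitution and one write per chunk."""
    while True:
        lines = fp.readlines(hint)
        if not lines:
            break
        text = "".join(lines)
        out_fp.write(PATTERN.sub(REPLACEMENT, text))

# Quick check on in-memory data: both loops should produce the same output.
sample = "foo one\nbar two\nplain\n" * 1000
out_a, out_b = io.StringIO(), io.StringIO()
replace_per_line(io.StringIO(sample), out_a)
replace_chunked(io.StringIO(sample), out_b)
assert out_a.getvalue() == out_b.getvalue()
```

Because readlines() with a size hint always returns whole lines, no line is ever split between two chunks, so the only change is how many times the loop body runs.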
Seriously, it would be pretty spiffy if Python could compile the inner loop
to machine code on the Nth iteration (N is some bite-the-bullet threshold),
and reuse it for all subsequent iterations. I know the dynamic nature of
the language complicates the analysis, but if it detects no strange
rebinding (string.join being redefined, etc.), seems like it'd be a real
win.
--
Randall Hopper
aa8vb at yahoo.com