Python idiom: Multiple search-and-replace

Randall Hopper aa8vb at yahoo.com
Fri Apr 14 08:40:23 EDT 2000


Fredrik Lundh:
 |Randall Hopper <aa8vb at yahoo.com> wrote:
 |> Thanks!  It's much more efficient.  The 140 seconds original running time
 |> was reduced to 11.6 seconds.  I can certainly live with that.
 |
 |thought so ;-)
 |
 |while you're at it, try replacing the original readline loop with:
 |
 |    while 1:
 |        lines = fp.readlines(BUFFERSIZE)
 |        if not lines:
 |            break
 |        lines = string.join(lines, "")
 |        lines = re.sub(...)
 |        out_fp.write(lines)
 |
 |where BUFFERSIZE is 1000000 or so...

Elapsed: 2.7118 sec.  76% speedup.  98% speedup overall.

Thanks again Fredrik!
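
For anyone reading this in the archive, here is a minimal sketch of the
batched loop in current Python 3. The thread elides the real re.sub
arguments, so the REPLACEMENTS table, the combined PATTERN, and the
replace_in_chunks name are all hypothetical stand-ins, and in-memory streams
stand in for the real files.

```python
import io
import re

BUFFERSIZE = 1_000_000  # size hint for readlines(), "1000000 or so"

# Hypothetical replacement table; the real patterns are not shown in the thread.
REPLACEMENTS = {"foo": "bar", "spam": "eggs"}

# One alternation covering every key (longest first, so that a key which is a
# prefix of another cannot shadow it). Each chunk is then scanned only once.
PATTERN = re.compile(
    "|".join(re.escape(key) for key in sorted(REPLACEMENTS, key=len, reverse=True))
)


def replace_in_chunks(fp, out_fp, buffersize=BUFFERSIZE):
    """Copy fp to out_fp, substituting in batches of whole lines."""
    while True:
        # readlines(hint) returns complete lines totalling roughly `hint`.
        lines = fp.readlines(buffersize)
        if not lines:
            break
        text = "".join(lines)  # Python 3 spelling of string.join(lines, "")
        out_fp.write(PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], text))


# Tiny in-memory demonstration; a small hint forces more than one batch.
src = io.StringIO("foo and spam\nno match here\nspam foo\n")
dst = io.StringIO()
replace_in_chunks(src, dst, buffersize=16)
result = dst.getvalue()
print(result)
```

Because readlines() only ever returns whole lines, a match cannot be split
across two batches unless the pattern itself spans a newline.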

It's hard to believe we were burning 8.9 of those 11.6 seconds (77%) on
nothing but the overhead of looping over the lines one at a time.  Where's
that JIT Python interpreter when you need it ;-)
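
A rough way to see the per-line overhead in isolation is to time many small
re.sub calls against one call on the joined text. This is only a sketch: the
pattern and the synthetic DATA are made up for illustration, and the timings
will vary with the interpreter version, the pattern, and the input.

```python
import re
import timeit

PATTERN = re.compile(r"foo")          # hypothetical pattern, illustration only
DATA = ["foo bar baz\n"] * 20_000     # synthetic input lines


def per_line(lines):
    """One re.sub call per line, as in the original readline loop."""
    return "".join([PATTERN.sub("qux", line) for line in lines])


def batched(lines):
    """Join first, then a single re.sub call over the whole batch."""
    return PATTERN.sub("qux", "".join(lines))


# Both strategies must produce identical text before timing means anything.
same_output = per_line(DATA) == batched(DATA)

t_line = timeit.timeit(lambda: per_line(DATA), number=5)
t_batch = timeit.timeit(lambda: batched(DATA), number=5)
print(f"same output: {same_output}")
print(f"per line: {t_line:.4f}s   batched: {t_batch:.4f}s")
```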

Seriously, it would be pretty spiffy if Python could compile the inner loop
to machine code on the Nth iteration (N is some bite-the-bullet threshold),
and reuse it for all subsequent iterations.  I know the dynamic nature of
the language complicates the analysis, but if it detects no strange
rebinding (string.join being redefined, etc.), seems like it'd be a real
win.

-- 
Randall Hopper
aa8vb at yahoo.com



