Python idiom: Multiple search-and-replace

Randall Hopper aa8vb at yahoo.com
Wed Apr 12 10:08:16 EDT 2000


There's got to be a better way.  Is there a Python idiom I'm missing?

I want to do search-and-replace of multiple symbols on each line of a file.
But the simple-minded code below takes a while.  It simply calls
string.replace N times per line (N is 240 in this case).  There are 33994
lines (1.1 MB), so that is roughly 8.2 million replace calls.

Total time: 140.7 seconds.

I stopped to investigate.  What was slowing it down so much?

- Comment out the inner loop, and Python completes in 0.9 sec.  No problem
  there.  It can read and write the data very quickly.
- It's not the dictionary lookups.  Converting the map to a list of tuples
  took 3 sec longer.
- I tried a few other things, but they only made it take longer than
  140 sec.
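
The comparison in the list above (loop with and without the replacement
step) could be reproduced with a small timing harness along these lines.
This is only a sketch in modern Python 3 syntax; the function names, the
synthetic symbol names, and the sample line are all made up for
illustration and are not the original data.

```python
import time

def passthrough(lines, symbol_map):
    """Baseline: copy the lines without doing any replacement."""
    return list(lines)

def naive_replace(lines, symbol_map):
    """One str.replace call per symbol per line, as in the slow loop."""
    out = []
    for line in lines:
        for old_sym, new_sym in symbol_map.items():
            line = line.replace(old_sym, new_sym)
        out.append(line)
    return out

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Synthetic stand-in data (assumption): 240 symbols, 1000 identical lines.
symbol_map = {"oldsym%d_" % i: "newsym%d_" % i for i in range(240)}
lines = ["x oldsym7_ y oldsym42_ z\n"] * 1000

_, t_base = timed(passthrough, lines, symbol_map)
replaced, t_full = timed(naive_replace, lines, symbol_map)
print("baseline %.4f s, with replacements %.4f s" % (t_base, t_full))
```

The absolute numbers will depend on the machine and the Python version;
only the ratio between the two timings is of interest.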

Is there a Python feature or standard library API that will get me less
Python code spinning inside this loop?   re.multisub or equivalent? :-)

Thanks,

Randall


------------------------------------------------------------------------------

  symbol_map = { 'oldsym1' : 'newsym1', 'oldsym2' : 'newsym2', ... }
  fp = open( net_path, "r" )

  while 1:
    line = fp.readline()
    if not line: break

    for old_sym in symbol_map.keys():
      line = string.replace( line, old_sym, symbol_map[ old_sym ] )

    out_fp.write( line )

------------------------------------------------------------------------------
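
For comparison, here is a minimal sketch of the kind of single-pass
substitution asked about above, written for modern Python 3 with the
standard re module.  It assumes the symbols are literal strings; the
names make_multi_replacer and replace_all are hypothetical, not an
existing library API.  Note that it is not strictly equivalent to the
loop above: repeated string.replace calls can re-replace text produced
by an earlier replacement, while a single regex pass never rescans its
own output.

```python
import re

def make_multi_replacer(symbol_map):
    """Build a function that applies all replacements in one pass per line."""
    if not symbol_map:
        # Nothing to replace; an empty pattern would match everywhere.
        return lambda line: line
    # Try longer symbols first so 'ab' is preferred over 'a' at the same spot.
    symbols = sorted(symbol_map, key=len, reverse=True)
    # re.escape keeps characters such as '.' or '$' literal.
    pattern = re.compile("|".join(re.escape(s) for s in symbols))

    def replace_all(line):
        # The callback looks up the replacement for whichever symbol matched.
        return pattern.sub(lambda m: symbol_map[m.group(0)], line)

    return replace_all

# Placeholder map for illustration only.
symbol_map = {"oldsym1": "newsym1", "oldsym2": "newsym2"}
replace_all = make_multi_replacer(symbol_map)
print(replace_all("call oldsym1 then oldsym2"))
```

The pattern is compiled once, outside the per-line loop, so each line
costs one regex scan instead of one scan per symbol.  Whether that is
actually faster for a given symbol set would need to be measured.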

-- 
Randall Hopper
aa8vb at yahoo.com



