Optimization help needed: Search and Replace using dictionary of parameters

Jason Orendorff jason at jorendorff.com
Mon Dec 31 13:37:32 EST 2001


>     Before I start writing code, any ideas what is the fastest way of
> doing it?
>         regex or string functions? Map or readlines()?

sed could be much faster (albeit less flexible) than Python
for this task.

Your data structure could be slow.
Use nested dictionaries instead:

lookup = {
  'filename1':    { 'parameter': 'value',
                    'parameter2': 'value',
                    'parameter3': 'value' },
  'filename2':    { 'parameter2': 'value',
                    'parameter4': 'value',
                    'parameter6': 'value' },
  'filename3':    { 'parameter5': 'value',
                    'parameter7': 'value',
                    'parameter8': 'value' },
  'filename4':    { 'parameter': 'value',
                    'parameter3': 'value' }
}
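
For what it's worth, here is a minimal sketch of how a nested dictionary
like that gets read back, written for a current Python 3. The names
params_by_file and lookup_value are made up for illustration, and the
values are placeholders; returning a default for missing keys is my
assumption, not something the original problem requires.

```python
# Hypothetical data in the same nested shape: filename -> parameter -> value.
params_by_file = {
    'filename1': {'parameter': 'a', 'parameter2': 'b'},
    'filename2': {'parameter2': 'c'},
}

def lookup_value(filename, parameter, default=None):
    """Return the value for (filename, parameter), or default if either key is missing."""
    # Two hash lookups: first the per-file dictionary, then the parameter in it.
    return params_by_file.get(filename, {}).get(parameter, default)

print(lookup_value('filename1', 'parameter2'))
print(lookup_value('filename2', 'parameter'))
```

Each lookup is two dictionary accesses, so the cost does not grow with the
number of files or parameters.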


As for string vs re, it depends.  Just use whichever one is easier
for your particular situation.  But take a special look at re.sub().

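  import re

  # Assumes 'lookup' is the nested dictionary of parameters, 'filename' is
  # the key of the file being processed, and 'file' / 'out' are open files.
  def get_replacement(match):
      # group(1) is the tag name captured between << and >>
      param = match.group(1)
      return lookup[filename][param]

  for line in file.xreadlines():
      # Find and replace all tags that are set off in a certain way...
      line = re.sub(r'<<([A-Z0-9_]+)>>', get_replacement, line)
      out.write(line)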
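
A self-contained version of the same idea, as a rough sketch for a current
Python 3 (where xreadlines() no longer exists and you iterate over the file
directly). It puts the re.sub() callback next to a plain str.replace()
variant so the two can be compared. The file name, tag names, and values
are invented for the example, and leaving unknown tags untouched is my
assumption about what you would want.

```python
import io
import re

# Hypothetical lookup table: filename -> {tag name -> replacement text}.
lookup = {
    'report.txt': {'USER': 'alice', 'HOST_1': 'example.org'},
}

# Compile once; the group captures the tag name between << and >>.
TAG = re.compile(r'<<([A-Z0-9_]+)>>')

def substitute_re(line, params):
    """Replace every <<TAG>> in line; unknown tags are left untouched."""
    def get_replacement(match):
        # match.group(0) is the whole tag, used as the fallback.
        return params.get(match.group(1), match.group(0))
    return TAG.sub(get_replacement, line)

def substitute_str(line, params):
    """Plain-string variant: one str.replace() pass per parameter."""
    for name, value in params.items():
        line = line.replace('<<' + name + '>>', value)
    return line

def process(infile, outfile, params):
    """Copy infile to outfile, substituting tags line by line."""
    for line in infile:
        outfile.write(substitute_re(line, params))

# In-memory files stand in for real ones so the sketch runs as-is.
src = io.StringIO('Hello <<USER>>,\nwelcome to <<HOST_1>> and <<UNKNOWN>>.\n')
dst = io.StringIO()
process(src, dst, lookup['report.txt'])
print(dst.getvalue(), end='')
```

The regex version scans each line once no matter how many parameters there
are; the string version scans it once per parameter. The two can also
disagree if a replacement value itself contains something that looks like
a tag, since the string version may substitute it again on a later pass.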



To read lines from a file, the fastest thing is probably:

  # Convoluted, but speedy
  x = file.readlines(16000)
  while x:
      for line in x:
          blah_blah_blah(line)
      x = file.readlines(16000)

But it is usually plenty fast enough to do:

  # Quick and obvious
  for line in file.xreadlines():
      blah_blah_blah(line)

Or:

  # Quick and even more obvious, but new in Python 2.2
  for line in file:
      blah_blah_blah(line)
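
If you like the batched readlines() loop but not the way it clutters the
calling code, it can be hidden behind a generator. This is only a sketch
for a current Python 3; iter_lines_chunked is a name I made up, the 16000
hint is carried over from above, and the code checks only that the lines
come out the same, not which approach is faster.

```python
import io

def iter_lines_chunked(f, sizehint=16000):
    """Yield lines from f, fetching them in batches via readlines(sizehint)."""
    chunk = f.readlines(sizehint)
    while chunk:
        for line in chunk:
            yield line
        chunk = f.readlines(sizehint)

# An in-memory file stands in for a real one so the sketch runs as-is.
sample = io.StringIO(''.join('line %d\n' % i for i in range(5000)))
chunked = list(iter_lines_chunked(sample))

# Compare against plain iteration over the same data.
sample.seek(0)
direct = list(sample)
print(len(chunked), chunked == direct)
```

The caller then writes "for line in iter_lines_chunked(f):" and the
batching stays out of sight.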

## Jason Orendorff    http://www.jorendorff.com/



