Complicated string substitution

Tim Chase python.list at tim.thechases.com
Wed Feb 13 20:16:31 EST 2008


> I have a file with a lot of the following occurrences:
> 
> denmark.handa.1-10
> denmark.handa.1-12344
> denmark.handa.1-4
> denmark.handa.1-56

Each on its own line?  Scattered throughout the text?  With other
content that needs to be un-changed?  With other stuff on the
same line?

> denmark.handa.1-10_1
> denmark.handa.1-12344_1
> denmark.handa.1-4_1
> denmark.handa.1-56_1
> 
> so basically I add "_1" at the end of each occurrence.
> 
> I thought about using sed, but as each "root" is different I have no
> clue how to go through this.

How are the roots different?  Do they all begin with
"denmark.handa."?  Or can they be found by a pattern of "stuff
period stuff period number dash number"?

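A couple of sed solutions, since you considered them first (-E
turns on extended regexps so "+" works, and sed has no "\d", so
digits are spelled [0-9]):

  sed '/denmark\.handa/s/$/_1/'
  sed -E 's/denmark\.handa\.[0-9]+-[0-9]+/&_1/g'
  sed -E 's/[a-z]+\.[a-z]+\.[0-9]+-[0-9]+/&_1/g'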

Or are you just looking for "number dash number" and want to
suffix the "_1"?

  sed -E 's/[0-9]+-[0-9]+/&_1/g'

Most of the sed versions translate pretty readily into Python
regexps in the .sub() call.

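  import re
  # "word.word.number-number", e.g. denmark.handa.1-10
  r = re.compile(r'[a-z]+\.[a-z]+\.\d+-\d+')
  # open() replaces the old file() built-in; "with" closes both files
  with open('in.txt') as src, open('out.txt', 'w') as out:
    for line in src:
      # \g<0> is the whole match; write it back with "_1" appended
      out.write(r.sub(r'\g<0>_1', line))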
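
For a quick check before touching any files, the same substitution
can be tried on a few in-memory lines.  This is only a sketch: the
sample lines and the add_suffix helper are made up for illustration,
not taken from your data.

```python
import re

# Same pattern as above: "word.word.number-number"
ROOT = re.compile(r'[a-z]+\.[a-z]+\.\d+-\d+')

def add_suffix(line, suffix='_1'):
    # Hypothetical helper: append the suffix to every match in the line.
    # A function replacement sidesteps backslash escaping in the suffix.
    return ROOT.sub(lambda m: m.group(0) + suffix, line)

samples = [
    'denmark.handa.1-10',
    'see denmark.handa.1-12344 and sweden.volvo.2-7 here',
    'nothing to change on this line',
]
for s in samples:
    print(add_suffix(s))
```

Lines without a match pass through untouched, and a line holding
several names gets each one suffixed.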

Tweak the regexps accordingly.
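
One possible tweak, which assumes something you didn't state: if
the script might get run twice over the same file, a negative
lookahead keeps an already-suffixed name from picking up a second
"_1".  The add_suffix_once name below is hypothetical.

```python
import re

# Assumption: a name already followed by "_1" should be left alone.
# (?!\d|_1) rejects a match that is followed by "_1"; the \d alternative
# stops the engine from backtracking to a shorter number to dodge the check.
ONCE = re.compile(r'[a-z]+\.[a-z]+\.\d+-\d+(?!\d|_1)')

def add_suffix_once(line):
    # Hypothetical helper name, for illustration only.
    return ONCE.sub(r'\g<0>_1', line)

first = add_suffix_once('denmark.handa.1-12344')
second = add_suffix_once(first)   # second pass should change nothing
print(first)
print(second)
```

Without the \d alternative, the regexp would happily match
"denmark.handa.1-1234" inside an already-suffixed name and mangle it.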

-tkc