python/regex question... hope someone can help

John Machin sjmachin at lexicon.net
Sun Dec 9 04:57:23 EST 2007


On Dec 9, 6:13 pm, charonzen <your.mas... at gmail.com> wrote:
> I have a list of strings.  These strings are previously selected
> bigrams with underscores between them ('and_the', 'nothing_given', and
> so on).  I need to write a regex that will read another text string
> that this list was derived from and replace selections in this text
> string with those from my list.  So in my text string, '... and the...
> ' becomes ' ... and_the...'.   I can't figure out how to manipulate
>
> re.sub(r'([a-z]*) ([a-z]*)', r'(????)', textstring)
>
> Any suggestions?

The usual suggestion is: Don't bother with regexes when simple string
methods will do the job.

>>> def ch_replace(alist, text):
...     for bigram in alist:
...         original = bigram.replace('_', ' ')
...         text = text.replace(original, bigram)
...     return text
...
>>> print(ch_replace(
...     ['quick_brown', 'lazy_dogs', 'brown_fox'],
...     'The quick brown fox jumped over the lazy dogs.'
...     ))
The quick_brown_fox jumped over the lazy_dogs.
>>> # Note the false match: 'red herring' is found inside 'prepared herring'.
>>> print(ch_replace(['red_herring'], 'He prepared herring fillets.'))
He prepared_herring fillets.
>>>
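
The second call above tags text that is not a bigram at all, because
str.replace matches substrings, not words. If a regex is wanted after
all, one possible sketch is to anchor each bigram on word boundaries.
The name ch_replace_words is made up for this illustration, and the
sketch assumes that the regex notion of a word (\w, which includes the
underscore) is close enough to whatever produced the bigrams.

```python
import re

def ch_replace_words(alist, text):
    """Variant of ch_replace that only matches whole words (hypothetical helper)."""
    for bigram in alist:
        original = bigram.replace('_', ' ')
        # \b requires a word boundary on each side, so 'red herring'
        # no longer matches inside 'prepared herring'.
        pattern = r'\b' + re.escape(original) + r'\b'
        # A function replacement avoids any backslash processing of the bigram.
        text = re.sub(pattern, lambda m, b=bigram: b, text)
    return text

print(ch_replace_words(['red_herring'], 'He prepared herring fillets.'))
print(ch_replace_words(['red_herring'], 'That clue was a red herring.'))
```

Because the underscore counts as a word character, a word that has
already been joined to its neighbour no longer starts on a boundary.
With the three-bigram list from the first example this version yields
'The quick_brown fox jumped over the lazy_dogs.', so overlapping
bigrams are not chained. Whether that is wanted depends on the job
specification.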

Another suggestion is to ensure that the job specification is not
overly simplified. How did you parse the text into "words" in the
prior exercise that produced the list of bigrams? Won't you need to
use the same parsing method in the current exercise of tagging the
bigrams with an underscore?
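
To make that point concrete, here is a rough sketch that tags bigrams
by walking the text with the same tokenizer that built the list. The
pattern WORD_RE and the function tag_bigrams are assumptions made up
for this example; the real tokenizer from the earlier exercise should
be substituted. Words are lowercased before lookup, which is also an
assumption about how the bigrams were stored.

```python
import re

# Hypothetical tokenizer: replace with the pattern actually used
# when the bigram list was produced.
WORD_RE = re.compile(r'[A-Za-z]+')

def tag_bigrams(bigrams, text, word_re=WORD_RE):
    """Join adjacent tokens with '_' when the pair is in the bigram list."""
    wanted = set(bigrams)
    tokens = list(word_re.finditer(text))
    pieces = []
    pos = 0  # start of the text not yet copied to the output
    for left, right in zip(tokens, tokens[1:]):
        gap = text[left.end():right.start()]
        key = left.group().lower() + '_' + right.group().lower()
        # Only join words separated by whitespace alone, not by punctuation.
        if key in wanted and gap.isspace():
            pieces.append(text[pos:left.end()])
            pieces.append('_')
            pos = right.start()
    pieces.append(text[pos:])
    return ''.join(pieces)

print(tag_bigrams(
    ['quick_brown', 'lazy_dogs', 'brown_fox'],
    'The quick brown fox jumped over the lazy dogs.'
    ))
print(tag_bigrams(['red_herring'], 'He prepared herring fillets.'))
```

Since candidate pairs come from the tokenizer rather than from
substring search, 'prepared herring' is left alone, while the
overlapping pair in the first sentence is still chained into
'quick_brown_fox'.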

Cheers,
John


