text processing problem

Matt matthew_shomphe at countrywide.com
Thu Apr 7 19:14:01 EDT 2005


Maurice LING wrote:
> Hi,
>
> I'm looking for a way to do this: I need to scan a text (paragraph or
> so) and look for occurrences of "<text-x> (<text-x>)". That is, if the
> text just before the open bracket is the same as the text in the
> brackets, then I have to delete the brackets, with the text in it.
>
> Does anyone know of a way to achieve this?
>
> The closest I've seen is
> (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/305306) by
> Raymond Hettinger
>
>  >>> s = 'People of [planet], take us to your leader.'
>  >>> d = dict(planet='Earth')
>  >>> print convert_template(s) % d
> People of Earth, take us to your leader.
>
>  >>> s = 'People of <planet>, take us to your leader.'
>  >>> print convert_template(s, '<', '>') % d
> People of Earth, take us to your leader.
>
> """
>
> import re
>
> def convert_template(template, opener='[', closer=']'):
>      opener = re.escape(opener)
>      closer = re.escape(closer)
>      pattern = re.compile(opener + '([_A-Za-z][_A-Za-z0-9]*)' + closer)
>      return re.sub(pattern, r'%(\1)s', template.replace('%','%%'))
>
> Cheers
> Maurice


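Try this:

import re

# \b anchors the match at a word start, so "this (is)" is left alone
# instead of being cut down to "this".
my_expr = re.compile(r'\b(\w+) \(\1\)')
s = "this is (is) a test"
print(my_expr.sub(r'\1', s))
# prints: this is a test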
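
If the repeated text can be more than one word, the same backreference idea
can be stretched to cover a short phrase. This is only a rough sketch: the
helper name drop_repeated_parenthetical is made up for illustration, and it
assumes the words are separated by single spaces and that exactly one space
sits before the opening bracket.

```python
import re

# Hypothetical helper, not part of the original thread.
# Group 1 is a phrase of one or more words separated by single spaces;
# \1 then requires the very same phrase inside the parentheses.
# \b keeps the match from starting in the middle of a word.
_REPEATED = re.compile(r'\b(\w+(?: \w+)*) \(\1\)')

def drop_repeated_parenthetical(text):
    """Remove ' (X)' wherever it directly follows the same text X."""
    return _REPEATED.sub(r'\1', text)

print(drop_repeated_parenthetical("this is (is) a test"))
# prints: this is a test
print(drop_repeated_parenthetical(
    "the World Health Organization (World Health Organization) said"))
# prints: the World Health Organization said
```

Brackets whose contents differ from the preceding text, such as
"this (is) fine", are left untouched.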

M@



