text processing problem

Maurice LING mauriceling at acm.org
Thu Apr 7 17:54:49 EDT 2005


Hi,

I'm looking for a way to do this: I need to scan a text (a paragraph or 
so) and look for occurrences of "<text-x> (<text-x>)". That is, if the 
text just before the opening bracket is the same as the text inside the 
brackets, then I have to delete the brackets along with the text in them.

Does anyone know of a way to achieve this?

The closest I've seen is this recipe by Raymond Hettinger 
(http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/305306):

"""
 >>> s = 'People of [planet], take us to your leader.'
 >>> d = dict(planet='Earth')
 >>> print convert_template(s) % d
People of Earth, take us to your leader.

 >>> s = 'People of <planet>, take us to your leader.'
 >>> print convert_template(s, '<', '>') % d
People of Earth, take us to your leader.

"""

import re

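def convert_template(template, opener='[', closer=']'):
    # Escape the delimiters so they are matched literally in the regex.
    opener = re.escape(opener)
    closer = re.escape(closer)
    # Match an identifier enclosed in the delimiters, e.g. [planet].
    pattern = re.compile(opener + '([_A-Za-z][_A-Za-z0-9]*)' + closer)
    # Double any literal '%' first, then turn [name] into %(name)s.
    return re.sub(pattern, r'%(\1)s', template.replace('%', '%%'))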
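
For the "<text-x> (<text-x>)" case described above, the same re.sub 
idea might be adapted with a regular-expression backreference. What 
follows is only a rough sketch and not part of the recipe: the name 
drop_repeated_brackets and the pattern are hypothetical, and it assumes 
that <text-x> is made up of word characters and spaces only and that the 
bracketed copy repeats it exactly (same case, same spacing).

```python
import re

# Hypothetical pattern (an assumption, not from the recipe):
#   \b(\w[\w ]*?)  group 1: a run of words starting at a word boundary
#   \s*            optional whitespace before the opening bracket
#   \(\1\)         a bracketed, exact repeat of group 1
_REPEATED = re.compile(r'\b(\w[\w ]*?)\s*\(\1\)')

def drop_repeated_brackets(text):
    """Delete '(X)' wherever it directly follows the same text X."""
    # Keep group 1 and drop the whitespace, brackets and repeated text.
    return _REPEATED.sub(r'\1', text)
```

With this sketch, drop_repeated_brackets('gene (gene) expression') gives 
'gene expression', while 'protein (enzyme)' is returned unchanged. Text 
containing other characters, such as 'IL-2 (IL-2)', is not handled and 
would need a wider character class in group 1.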

Cheers
Maurice


