Python Regular Expressions: re.sub(regex, replacement, subject)

George Sakkis gsakkis at rutgers.edu
Tue Jul 5 15:43:26 EDT 2005


"Vibha Tripathi" <vibtrip at yahoo.com> wrote:

> Hi Folks,
>
> I put a Regular Expression question on this list a
> couple days ago. I would like to rephrase my question
> as below:
>
> In the Python re.sub(regex, replacement, subject)
> method/function, I need the second argument
> 'replacement' to be another regular expression ( not a
> string) . So when I find a 'certain kind of string' in
> the subject, I can replace it with 'another kind of
> string' ( not a predefined string ). Note that the
> 'replacement' may depend on what exact string is found
> as a result of match with the first argument 'regex'.

In re.sub, 'replacement' can be either a string or a callable that
takes a single match object and returns the replacement string.
So although replacement cannot be a regular expression, it can be
something even more powerful, a function. Here's a toy example of what
you can do that wouldn't be possible with regular expressions alone:

>>> import re
>>> from datetime import datetime
>>> this_year = datetime.now().year  # 2005 when this was written
>>> rx = re.compile(r'(born|graduated|hired) in (\d{4})')
>>> def replace_year(match):
...     # group(1) is the verb, group(2) the four-digit year
...     return "%s %d years ago" % (match.group(1), this_year - int(match.group(2)))
...
>>> rx.sub(replace_year, 'I was born in 1979 and graduated in 1996.')
'I was born 26 years ago and graduated 9 years ago.'
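
Since the result above depends on the year in which you run it, here is
a rough sketch of the same idea with the reference year pinned. The
helper name make_replacer and the year 2005 are my own choices for
illustration; nothing in re requires this structure, any callable that
accepts a match object will do.

```python
import re

rx = re.compile(r"(born|graduated|hired) in (\d{4})")

def make_replacer(reference_year):
    # Hypothetical helper: returns a replacement callable bound to a
    # fixed year, so the output does not change with the current date.
    def replace_year(match):
        verb, year = match.group(1), int(match.group(2))
        return "%s %d years ago" % (verb, reference_year - year)
    return replace_year

text = "I was born in 1979 and graduated in 1996."
result = rx.sub(make_replacer(2005), text)
print(result)  # I was born 26 years ago and graduated 9 years ago.
```

re.sub calls the returned function once per match and splices each
return value into the subject in place of the matched text.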

In cases where you don't have to transform the matched string (such as
calling int() and evaluating an expression as in the example) but only
append or prepend another string, there is a simpler solution that
doesn't require writing a replacement function: backreferences.
The replacement can be a string in which \1 denotes the first group of
the match, \2 the second and so on. Continuing the example, you could
hide the dates by:

>>> rx.sub(r'\1 in ****', 'I was hired in 2001 in a company of 2001 employees.')
'I was hired in **** in a company of 2001 employees.'

By the way, run the last example without the 'r' in front of the
replacement string and you'll see what it is there for.
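
If you'd rather not try it interactively, the comparison can be
sketched as a small script (the variable names are just for
illustration). Without the raw prefix, Python itself turns "\1" into
the single character chr(1) before re.sub ever sees the string, so
there is no backreference left to expand:

```python
import re

rx = re.compile(r"(born|graduated|hired) in (\d{4})")
text = "I was hired in 2001 in a company of 2001 employees."

# Raw string: the backslash and the '1' reach re.sub intact, so \1 is
# treated as a backreference to the first group.
raw = rx.sub(r"\1 in ****", text)

# Ordinary string: Python reads "\1" as an octal escape, i.e. chr(1).
not_raw = rx.sub("\1 in ****", text)

print(repr(raw))      # 'I was hired in **** in a company of 2001 employees.'
print(repr(not_raw))  # 'I was \x01 in **** in a company of 2001 employees.'
```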

HTH,

George



