How to insert string in each match using RegEx iterator

John S jstrickler at gmail.com
Sat Jun 13 17:08:27 EDT 2009


On Jun 10, 12:13 am, "504cr... at gmail.com" <504cr... at gmail.com> wrote:
> By what method would a string be inserted at each instance of a RegEx
> match?
>
> For example:
>
> string = '123 abc 456 def 789 ghi'
> newstring = ' INSERT 123 abc INSERT 456 def INSERT 789 ghi'
>
> Here's the code I started with:
>
> >>> rePatt = re.compile('\d+\s')
> >>> iterator = rePatt.finditer(string)
> >>> count = 0
> >>> for match in iterator:
>
>         if count < 1:
>                 print string[0:match.start()] + ' INSERT ' + string[match.start():match.end()]
>         elif count >= 1:
>                 print ' INSERT ' + string[match.start():match.end()]
>         count = count + 1
>
> My code returns an empty string.
>
> I'm new to Python, but I'm finding it really enjoyable (with the
> exception of this challenging puzzle).
>
> Thanks in advance.

I like using a *callback* function instead of *plain text* with the
re.sub() method. To do this, call sub() in the normal way, but pass a
function instead of a string as the replacement. The function is
called once for each match and receives the same kind of match object
that re.search() or re.match() returns. The text matched by your RE is
replaced by the function's return value. This gives you a lot of
flexibility: you can use the matched text to look up values in files
or databases, or online, for instance, and you can do any sort of text
manipulation you need.

----8<-----------------------------------------------------------------------
import re

# original string
oldstring = '123 abc 456 def 789 ghi'

# RE to match a sequence of 1 or more digits
rx_digits = re.compile(r"\d+")

# callback function -- expects a Match object, returns the replacement
# string
def repl_func(m):
    return 'INSERT ' + m.group(0)

# do the substitution
newstring = rx_digits.sub(repl_func, oldstring)

print("OLD: " + oldstring)
print("NEW: " + newstring)
---------------------------------------------------------------------------------
Output:
OLD: 123 abc 456 def 789 ghi
NEW: INSERT 123 abc INSERT 456 def INSERT 789 ghi
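
As a rough sketch of the lookup idea mentioned earlier, the callback can
consult a dictionary keyed on the matched text. The names labels and
label_func, and the contents of the table, are made up for illustration;
they are not part of the original problem.

```python
import re

# hypothetical lookup table: matched digits -> label to insert
labels = {'123': 'FIRST', '456': 'SECOND'}

# RE to match a sequence of 1 or more digits
rx_digits = re.compile(r"\d+")

def label_func(m):
    # m.group(0) is the whole matched text
    digits = m.group(0)
    # fall back to a generic label when the digits are not in the table
    label = labels.get(digits, 'INSERT')
    return label + ' ' + digits

newstring = rx_digits.sub(label_func, '123 abc 456 def 789 ghi')
```

With this table, newstring comes out as
'FIRST 123 abc SECOND 456 def INSERT 789 ghi'. The same pattern should
work with a file or database lookup in place of the dictionary.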


You could also do it with a lambda function if you didn't want to
write a separate function:
newstring = rx_digits.sub(lambda m: 'INSERT ' + m.group(0), oldstring)

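I understand that for this simple case, a plain replacement string
such as
    r'INSERT \g<0>'
is sufficient (or r'INSERT \1' if the pattern is wrapped in a group,
as in r"(\d+)"), and a callback is overkill; I wanted to show the OP a
more generic approach to complex substitutions.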
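
For comparison, here is a minimal sketch of the plain replacement-string
form for this simple case. It relies on the re module's \g<0>
backreference, which stands for the whole match; the \1 form only works
when the pattern contains a capturing group. The variable names
new_whole and new_group are just for illustration.

```python
import re

oldstring = '123 abc 456 def 789 ghi'

# \g<0> refers to the whole match, so no capturing group is needed
new_whole = re.sub(r"\d+", r"INSERT \g<0>", oldstring)

# equivalent form with an explicit group, referenced as \1
new_group = re.sub(r"(\d+)", r"INSERT \1", oldstring)
```

Both calls should produce the same string as the callback version.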



