[Tutor] Substring substitution

Kent Johnson kent37 at tds.net
Thu Sep 8 21:36:30 CEST 2005


Bernard Lebel wrote:
> Hello,
> 
> I have a string, and I use a regular expression to search a match in
> it. When I find one, I would like to break down the string, using the
> matched part of it, to be able to perform some formatting and to later
> build a brand new string with the separate parts.
> 
> The regular expression part works ok, but my problem is to extract the
> matched pattern from the string. I'm not sure how to do that...
> 
> 
> sString = 'mt_03_04_04_anim'
> 
> # Create regular expression object
> oRe = re.compile( "\d\d_\d\d\_\d\d" )
> 
> # Break-up path
> aString = sString.split( os.sep )
> 
> # Iterate individual components
> for i in range( 0, len( aString ) ):
> 	
> 	sSubString = aString[i]
> 	
> 	# Search with shot number of 2 digits
> 	oMatch = oRe.search( sSubString )
> 
> 	if oMatch != None:
> 		# Replace last sequence of two digits by 3 digits!!

Hi Bernard,

It sounds like you need to put some groups into your regex and use re.sub().

By putting groups in the regex you can refer to pieces of the match. For example:

 >>> import re
 >>> s = 'mt_03_04_04_anim'
 >>> oRe = re.compile( r"(\d\d_\d\d_)(\d\d)" )
 >>> m = oRe.search(s)
 >>> m.group(1)
'03_04_'
 >>> m.group(2)
'04'
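
If you also want the text before and after the match, so that you can rebuild the string by hand, the match object can give you the positions as well. A small sketch (the names prefix, matched and suffix are just mine, for illustration):

```python
import re

s = 'mt_03_04_04_anim'
m = re.search(r'(\d\d_\d\d_)(\d\d)', s)

# m.start() and m.end() are the indices of the whole match within s
prefix = s[:m.start()]   # text before the match
matched = m.group(0)     # the whole matched text
suffix = s[m.end():]     # text after the match

print(prefix, matched, suffix)   # mt_ 03_04_04 _anim
print(m.groups())                # ('03_04_', '04')
```

Note that m would be None if nothing matched, so real code should check for that first, as your loop already does.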

With re.sub(), you provide a replacement pattern that can refer to the groups from the match pattern. That makes it easy to insert new characters between the groups:

 >>> oRe.sub(r'\1XX\2', s)
'mt_03_04_XX04_anim'
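
One thing to watch for: if the text you insert starts with a digit, a replacement like r'\10\2' would be read as a reference to group 10, not group 1 followed by a zero. The \g<1> form avoids that. Assuming the goal is to pad the last field with a leading zero (that is my guess from the comment in your code), something like this should work:

```python
import re

s = 'mt_03_04_04_anim'
oRe = re.compile(r'(\d\d_\d\d_)(\d\d)')

# \g<1> means the same as \1 but can safely be followed by a digit
result = oRe.sub(r'\g<1>0\2', s)
print(result)   # mt_03_04_004_anim
```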

This may be enough power to do what you want, I'm not sure from your description. But re.sub() has another trick up its sleeve - the replacement 'expression' can be a callable which is passed the match object and returns the string to replace it with. For example, if you wanted to find all the two-digit numbers in a string and add one to each of them, you could do it like this:

 >>> def incMatch(m):
 ...   s = m.group(0) # use the whole match
 ...   return str(int(s)+1).zfill(2)
 ...
 >>> re.sub(r'\d\d', incMatch, '01_09_23')
'02_10_24'

This capability can be used to do complicated replacements.
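
For instance, a callable can rebuild the match from its groups and reformat just one of them. This is only a sketch: padLast is a name I made up, and I am assuming that "replace two digits by three" means zero-padding the last field:

```python
import re

def padLast(m):
    # keep the first group as it is, zero-pad the second group to 3 digits
    return m.group(1) + m.group(2).zfill(3)

result = re.sub(r'(\d\d_\d\d_)(\d\d)', padLast, 'mt_03_04_04_anim')
print(result)   # mt_03_04_004_anim
```

Because the function gets the whole match object, it could just as well convert the groups to integers or look them up in a dictionary before building the replacement.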

Kent


