reusing parts of a string in RE matches?

John Salerno johnjsal at NOSPAMgmail.com
Thu May 11 10:16:06 EDT 2006


Mirco Wahab wrote:

> Py:
>   import re
>   tx = 'a1a2a3A4a35a6b7b8c9c'
>   rg = r'(\w)(?=(.\1))'
>   print re.findall(rg, tx)

The only problem seems to be (and I ran into this with my original 
example too) that what this code returns isn't exactly what you are 
looking for, i.e. the numbers '1', '2', etc. You get a list of tuples, 
and the second item in each tuple contains the number, but also the \w 
character that follows it.
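
For reference, here is a small sketch of what that call returns. It 
assumes Python 3 (so print is a function, unlike in the quoted snippet), 
and the name pairs is only illustrative:

```python
import re

tx = 'a1a2a3A4a35a6b7b8c9c'
rg = r'(\w)(?=(.\1))'

# findall returns one tuple per match: (group 1, group 2).
# Group 2 sits inside the look-ahead and spans both the middle
# character and the repeated \w character.
pairs = re.findall(rg, tx)
print(pairs)
# [('a', '1a'), ('a', '2a'), ('b', '7b'), ('c', '9c')]
```

Because the look-ahead consumes nothing, the 'a' in 'a1a2a' can both end 
one match and start the next, which is why '1a' and '2a' both show up.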

So there still seems to be some work that must be done when dealing with 
overlapping patterns/look-ahead/behind.

Oh wait, a thought just hit me. Instead of doing it as you did:

rg = r'(\w)(?=(.\1))'

Could you do:

rg = r'(\w)(?=(.)\1)'

That would at least isolate the number, although you'd still have to get 
it out of the list/tuple.
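
A minimal sketch of that idea, assuming Python 3; the names pairs and 
numbers are made up for illustration:

```python
import re

tx = 'a1a2a3A4a35a6b7b8c9c'
rg = r'(\w)(?=(.)\1)'

# Group 2 now captures only the middle character; the backreference
# \1 stays inside the look-ahead but outside the group.
pairs = re.findall(rg, tx)
print(pairs)
# [('a', '1'), ('a', '2'), ('b', '7'), ('c', '9')]

# Pull the middle characters out of the tuples.
numbers = [middle for _, middle in pairs]
print(numbers)
# ['1', '2', '7', '9']
```

Group 1 has to stay a capturing group because \1 refers to it, so 
findall will always return tuples here; the list comprehension (or 
re.finditer with m.group(2)) is one way to get just the numbers. Note 
that '.' matches any character, so a pattern like r'(\w)(?=(\d)\1)' 
would be needed if only digits should count.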


