reusing parts of a string in RE matches?

Thu May 11 10:16:06 EDT 2006

Mirco Wahab wrote:

> Py:
>   import re
>   tx = 'a1a2a3A4a35a6b7b8c9c'
>   rg = r'(\w)(?=(.\1))'
>   print(re.findall(rg, tx))

The only problem seems to be (and I ran into this with my original
example too) that what this code returns isn't exactly what you are
looking for, i.e. the numbers '1', '2', etc. You get a list of
two-item tuples, and the second item in each tuple contains the number
but also the \w character that follows it, e.g. ('a', '1a').
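
For reference, here is a small sketch of what the quoted pattern returns on 
the sample string. It uses print() so it runs on current Python, and the 
variable name `pairs` is mine, not from the original post:

```python
import re

tx = 'a1a2a3A4a35a6b7b8c9c'
rg = r'(\w)(?=(.\1))'

# With two groups in the pattern, findall returns one tuple per match:
# group 1 is the \w character, group 2 is everything captured inside
# the look-ahead (the in-between character plus the repeated one).
pairs = re.findall(rg, tx)
print(pairs)
# [('a', '1a'), ('a', '2a'), ('b', '7b'), ('c', '9c')]
```

Note that 'A4a' is skipped because the match is case-sensitive, and 'a35a' 
is skipped because the look-ahead allows only one character in between.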

So some post-processing still seems to be needed when dealing with
overlapping patterns and look-ahead/look-behind.

Oh wait, a thought just hit me. Instead of doing it as you did:

rg = r'(\w)(?=(.\1))'

Could you do:

rg = r'(\w)(?=(.)\1)'

That would at least isolate the number in its own group, although you'd
still have to pull it out of each tuple in the list.
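
A minimal sketch of that idea, together with two ways of getting the 
numbers out of the tuples. The names `numbers` and `numbers_iter` are only 
illustrative, and these are just two of several ways to do the unpacking:

```python
import re

tx = 'a1a2a3A4a35a6b7b8c9c'
rg = r'(\w)(?=(.)\1)'

# Group 2 now captures only the in-between character; the back-reference
# \1 sits outside it, so the repeated character is checked but not captured.
pairs = re.findall(rg, tx)
print(pairs)    # [('a', '1'), ('a', '2'), ('b', '7'), ('c', '9')]

# Option 1: unpack the tuples and keep only the second item.
numbers = [middle for _, middle in pairs]
print(numbers)  # ['1', '2', '7', '9']

# Option 2: iterate over match objects and ask for group 2 directly.
numbers_iter = [m.group(2) for m in re.finditer(rg, tx)]
print(numbers_iter == numbers)  # True
```

Because the look-ahead consumes nothing, overlapping cases such as 'a1a2a' 
still produce a match for both '1' and '2'.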