reusing parts of a string in RE matches?

Wed May 10 22:47:45 EDT 2006

Murali wrote:
> > Yes, and no extra for loops are needed!  You can define groups inside
> > the lookahead assertion:
> >
> >   >>> import re
> >   >>> re.findall(r'(?=(aba))', 'abababababababab')
> >   ['aba', 'aba', 'aba', 'aba', 'aba', 'aba', 'aba']
>
> Wonderful and this works with any regexp, so
>
> import re
>
> def all_occurences(pat,str):
>   return re.findall(r'(?=(%s))'%pat,str)
>
> all_occurences("a.a","abacadabcda") returns ["aba","aca","ada"] as
> required.

Careful.  That won't work as expected for *all* regexps.  Example:

  >>> import re
  >>> re.findall(r'(?=(a.*a))', 'abaca')
  ['abaca', 'aca']

Note that this does *not* find 'aba'.  You might think that making it
non-greedy would help, but:

  >>> re.findall(r'(?=(a.*?a))', 'abaca')
  ['aba', 'aca']

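Nope, now it's not finding 'abaca'.  Either way, findall reports at
most one match per starting position, so two matches that begin at the
same place can never both show up.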

This is by design, though.   From
http://www.regular-expressions.info/lookaround.html (a good read, by
the way):

"""As soon as the lookaround condition is satisfied, the regex engine
forgets about everything inside the lookaround. It will not backtrack
inside the lookaround to try different permutations."""
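
Here is a small check of that behaviour in Python's re module.  It is
only a sketch; the pattern and the test strings are chosen for
illustration and are not from the posts above.  The lookahead captures
the longest run of digits it can, and the engine never goes back into
it to try a shorter run:

```python
import re

# Capture a run of digits in a lookahead, then require that same run
# to appear again later in the same word.
pattern = re.compile(r'(?=(\d+))\w+\1')

# At position 0 the lookahead captures '123'.  A shorter capture, '12',
# would let the whole pattern match '123x12', but the engine does not
# retry the lookahead with a shorter capture, so there is no match.
no_match = pattern.search('123x12')
print(no_match)

# Here the attempt at position 0 (capture '456') fails, and the attempt
# at position 1 captures '56', which does occur again later.
match = pattern.search('456x56')
print(match.group(), match.group(1))
```

The first search prints None and the second prints 56x56 56.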

Moral of the story:  keep lookahead assertions simple whenever
possible.  :-)
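
If you really do need every substring that matches, including several
that start at the same position, one option is to skip the lookahead
trick and test each candidate span directly.  The following is a
minimal sketch, not a tuned solution: the helper name
all_matching_substrings is made up for this example, it needs
Python 3.4 or later for fullmatch, and it makes a quadratic number of
match attempts, so it only suits short strings.

```python
import re

def all_matching_substrings(pattern, text):
    """Return every non-empty substring of text that pattern matches in full.

    Hypothetical helper for illustration; tries every (start, end) span.
    """
    compiled = re.compile(pattern)
    found = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            # fullmatch with pos/endpos requires the pattern to cover
            # exactly text[start:end].
            if compiled.fullmatch(text, start, end):
                found.append(text[start:end])
    return found

print(all_matching_substrings(r'a.*a', 'abaca'))
```

For 'abaca' this prints ['aba', 'abaca', 'aca'].  Because endpos
truncates the string as far as the engine is concerned, anchors and
lookarounds in the pattern see only the span being tested.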

--Ben