Regular expression intricacies: why do REs skip some matches?

Tue Apr 11 18:41:51 EDT 2006

Hey guys and gals,
This is a followup to my "Counting all permutations of a substring"
thread (see
http://groups.google.com/group/comp.lang.python/browse_thread/thread/60ebeb7ae381b0a9/7657235b3fd3966f#7657235b3fd3966f
in Google Groups). I'm still having a difficult time figuring out the
intricacies of regular expressions and consecutive matches. Here's a
brief example:

In [1]: import re

In [2]: aba_re = re.compile('aba')

In [3]: aba_re.findall('abababa')
Out[3]: ['aba', 'aba']

findall returns two matches, whereas I expected three. Why does this
regular expression work this way?

Using redemo.py, one can see that the matches are occurring at the
following spots:
abababa
^   ^   (where ^ indicates the start of a match)
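
The same positions can be checked without redemo.py. A minimal sketch,
assuming the pattern object's standard finditer method and the same
'aba' pattern as above (the name starts is only for illustration):

```python
import re

aba_re = re.compile('aba')

# Start offset of each match the scanner reports, left to right.
starts = [m.start() for m in aba_re.finditer('abababa')]
print(starts)  # [0, 4]
```

The offsets 0 and 4 line up with the carets in the diagram above.
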
Ideally, there'd be a way to create the regular expression to get at
this match, too:
abababa
  ^
So that the total matches are:
abababa
^ ^ ^

Is this simply not the way REs work? Does this sort of matching really
have to be home-coded?
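
For reference, a rough sketch of what a home-coded version might look
like. It assumes the thing being searched for is a literal substring
rather than a general pattern, and the function name overlapping_starts
is made up for this example:

```python
def overlapping_starts(needle, haystack):
    """Return the start offset of every occurrence of needle, overlaps included."""
    starts = []
    pos = haystack.find(needle)
    while pos != -1:
        starts.append(pos)
        # Resume one character past the last hit rather than past its end,
        # so occurrences that share characters are still found.
        pos = haystack.find(needle, pos + 1)
    return starts

print(overlapping_starts('aba', 'abababa'))  # [0, 2, 4]
```

This only handles plain strings via str.find, so it does not cover
patterns that need real regular expression syntax.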

Confusedly yours,
Chris