Regular expression intricacies: why do REs skip some matches?

Ben Cartwright bencvt at gmail.com
Tue Apr 11 19:38:52 EDT 2006


Tim Chase wrote:
> > In [1]: import re
> >
> > In [2]: aba_re = re.compile('aba')
> >
> > In [3]: aba_re.findall('abababa')
> > Out[3]: ['aba', 'aba']
> >
> > The result is two matches, whereas I expected three. Why does this
> > regular expression work this way?

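It's just the way regexes work: findall() returns non-overlapping
matches, scanning left to right, and each new search resumes where the
previous match ended.  You may disagree, but it's more intuitive for
iterated pattern searching to be non-overlapping by default.  See also: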

  >>> 'abababa'.count('aba')
  2
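To see where the scanner resumes, here is a small sketch (Python 3
syntax, unlike the Python 2 session above) that asks re.finditer for the
span of each match.  The second match starts at index 4, not 2, because
the search picks up at the end of the first match:

```python
import re

text = 'abababa'

# finditer yields match objects left to right; after each match the
# search resumes at that match's end, so the matches never overlap.
spans = [m.span() for m in re.finditer('aba', text)]
print(spans)  # [(0, 3), (4, 7)]
```

The occurrence at index 2 is skipped because its first character was
already consumed by the match at (0, 3).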

> Well, if you don't need the actual results, just their
> count, you can use
>
> how_many = len(re.findall('(?=aba)', 'abababa'))
>
> which will return 3.  However, each result is empty:
>
> 	>>> print re.findall('(?=aba)', 'abababa')
> 	['', '', '']
>
> You'd have to do some chicanery to get the actual pieces:
(snip)

Actually, you can just define a group inside the lookahead assertion:

  >>> re.findall('(?=(aba))', 'abababa')
  ['aba', 'aba', 'aba']

--Ben



