Regular expression intricacies: why do REs skip some matches?
Ben Cartwright
bencvt at gmail.com
Tue Apr 11 19:38:52 EDT 2006
Tim Chase wrote:
> > In [1]: import re
> >
> > In [2]: aba_re = re.compile('aba')
> >
> > In [3]: aba_re.findall('abababa')
> > Out[3]: ['aba', 'aba']
> >
> > It returns two matches, whereas I expected three. Why does this
> > regular expression work this way?
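That's how regex searching works: findall() returns non-overlapping
matches, so after each match the scan resumes where that match ended.
You may disagree, but non-overlapping is arguably the more intuitive
default for iterated pattern searching. str.count() behaves the same
way: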
>>> 'abababa'.count('aba')
2
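To see which occurrence gets skipped, it can help to look at the match
positions rather than the matched text. A minimal sketch (Python 3
syntax; the names text and spans are just for illustration):

```python
import re

text = 'abababa'

# finditer() yields match objects, so we can see where each match
# starts and ends instead of only the matched text.
spans = [m.span() for m in re.finditer('aba', text)]
print(spans)  # [(0, 3), (4, 7)]
```

The first match covers indexes 0 to 2, so the scan resumes at index 3.
The occurrence starting at index 2 overlaps the first match and is
never tried.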
> Well, if you don't need the actual results, just their
> count, you can use
>
> how_many = len(re.findall('(?=aba)', 'abababa'))
>
> which will return 3. However, each result is empty:
>
> >>> print re.findall('(?=aba)', 'abababa')
> ['', '', '']
>
> You'd have to do some chicanery to get the actual pieces:
(snip)
Actually, you can just define a group inside the lookahead assertion:
>>> re.findall('(?=(aba))', 'abababa')
['aba', 'aba', 'aba']
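This works because the lookahead itself matches the empty string, so
nothing is consumed and the scan moves forward one character at a
time, while findall() reports the contents of the group instead of the
empty overall match. If you also want to know where each overlapping
match starts, something along these lines should do it (a sketch in
Python 3 syntax; text and matches are illustrative names):

```python
import re

text = 'abababa'

# The lookahead is zero-width, so every position is tried in turn;
# group 1 captures the text that the lookahead saw at that position.
matches = [(m.start(), m.group(1)) for m in re.finditer('(?=(aba))', text)]
print(matches)  # [(0, 'aba'), (2, 'aba'), (4, 'aba')]
```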
--Ben