Regular expression intricacies: why do REs skip some matches?
Tim Chase
python.list at tim.thechases.com
Tue Apr 11 19:29:59 EDT 2006
> In [1]: import re
>
> In [2]: aba_re = re.compile('aba')
>
> In [3]: aba_re.findall('abababa')
> Out[3]: ['aba', 'aba']
>
> The return is two matches, whereas, I expected three. Why does this
> regular expression work this way?
Well, if you don't need the actual results, just their
count, you can use
    how_many = len(re.findall('(?=aba)', 'abababa'))
which will return 3. However, each result is an empty string,
because a lookahead matches a position without consuming any text:
    >>> print(re.findall('(?=aba)', 'abababa'))
    ['', '', '']
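As for the "why": findall() returns non-overlapping matches. After it matches 'aba' at offset 0, the scan resumes at offset 3, so the 'aba' that starts at offset 2 is never tried. A lookahead consumes nothing, so every position gets a chance. Here is a small sketch comparing the start offsets of the two patterns (the variable names are just for illustration):

```python
import re

s = 'abababa'

# Consuming pattern: each match eats three characters, so the next
# search begins at the end of the previous match.
plain_starts = [m.start() for m in re.finditer('aba', s)]

# Zero-width lookahead: nothing is consumed, so the scan moves forward
# one position at a time and overlapping hits are found.
lookahead_starts = [m.start() for m in re.finditer('(?=aba)', s)]

print(plain_starts)      # [0, 4]
print(lookahead_starts)  # [0, 2, 4]
```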
You'd have to do some chicanery to get the actual pieces:
    s = 'abababa'
    for f in re.finditer('(?=aba)', s):
        print("Found %s at %i" % (
            s[f.start():f.start()+3],
            f.start()))
or
    [s[f.start():f.start()+3] for f in
     re.finditer('(?=aba)', s)]
Note that both of these rely on knowing the length of the desired
piece (3, here). If you don't know it in advance, you may have to
do additional processing to get them to work the way you want. Yippie.
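One possible way around the known-length restriction, offered as a sketch and not something tested against every engine: put a capturing group inside the lookahead. The lookahead still consumes nothing, but the group records the text it saw, and findall() returns the group's contents when the pattern has exactly one group. The second pattern and string below are made up purely to show a variable-length case.

```python
import re

# Fixed-length case from the thread: the group captures each 'aba',
# including the overlapping one.
pieces = re.findall('(?=(aba))', 'abababa')
print(pieces)  # ['aba', 'aba', 'aba']

# Hypothetical variable-length case: 'a', one or more 'b', then 'a'.
# No slice length is needed because the group holds the matched text.
s2 = 'ababbabbba'
hits = [(m.start(1), m.group(1)) for m in re.finditer('(?=(ab+a))', s2)]
print(hits)  # [(0, 'aba'), (2, 'abba'), (5, 'abbba')]
```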
All lovely hacks, but they each return all three hits.
-tim
PS: These likely only work in Python...to use them in grep
or another regexp engine, you'd have to tweak them :*)