Regular expression intricacies: why do REs skip some matches?

Tim Chase python.list at tim.thechases.com
Tue Apr 11 19:29:59 EDT 2006


> In [1]: import re
> 
> In [2]: aba_re = re.compile('aba')
> 
> In [3]: aba_re.findall('abababa')
> Out[3]: ['aba', 'aba']
> 
> The return is two matches, whereas, I expected three. Why does this
> regular expression work this way?

Well, if you don't need the actual results, just their 
count, you can use

how_many = len(re.findall('(?=aba)', 'abababa'))

which will return 3.  However, each result is empty:

	>>> print re.findall('(?=aba)', 'abababa')
	['', '', '']
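As for the "why": findall() and finditer() only report non-overlapping matches. Once 'aba' has matched at index 0, those three characters are consumed and the scan picks up again at index 3, so the hit starting at index 2 is never tried. A lookahead is zero-width, so it consumes nothing and every position gets its turn. A minimal sketch comparing the start offsets (the variable names are just mine, for illustration):

```python
import re

s = 'abababa'

# Plain pattern: each match consumes three characters, so the scan
# resumes after the match and the overlapping hit at index 2 is skipped.
plain_starts = [m.start() for m in re.finditer('aba', s)]

# Lookahead: zero-width, so nothing is consumed and the scan moves
# forward one character at a time, testing every position.
lookahead_starts = [m.start() for m in re.finditer('(?=aba)', s)]

print(plain_starts)      # [0, 4]
print(lookahead_starts)  # [0, 2, 4]
```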

You'd have to do some chicanery to get the actual pieces:

	s = 'abababa'
	for f in re.finditer('(?=aba)', s):
		print "Found %s at %i" % (
			s[f.start():f.start()+3],
			f.start())

or

	[s[f.start():f.start()+3] for f in
		re.finditer('(?=aba)', s)]

Note that both of these rely on knowing the length of the 
desired piece (3 here).  If you don't know it in advance, you 
may have to do additional processing to get them to work the 
way you want.  Yippie.
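One way that additional processing might look, when the length isn't known up front: put a capturing group inside the lookahead. The lookahead still consumes nothing, but the group records the text it saw, and findall() returns the group contents when the pattern has exactly one group. This is only a sketch; the pattern 'ab+a' and the sample string below are made up for illustration and are not from the original question.

```python
import re

# Hypothetical example: pieces of varying length that overlap at index 2.
s = 'ababbba'

# Plain pattern: only the first piece is found, because its final 'a'
# is consumed and cannot start the second piece.
plain = re.findall('ab+a', s)

# Capturing group inside a lookahead: zero-width match, but group 1
# holds the text, whatever its length turns out to be.
overlapping = re.findall('(?=(ab+a))', s)

# finditer gives the offsets as well; start(1) is where group 1 begins.
located = [(m.start(1), m.group(1)) for m in re.finditer('(?=(ab+a))', s)]

print(plain)        # ['aba']
print(overlapping)  # ['aba', 'abbba']
print(located)      # [(0, 'aba'), (2, 'abbba')]
```

Since the group is greedy, each position yields only the longest piece starting there, which may or may not be what you want.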

All lovely hacks, but they each return all three hits.

-tim
PS:  These likely only work in Python...to use them in grep 
or another regexp engine, you'd have to tweak them :*)








