the buggy regex in Python

MRAB python at mrabarnett.plus.com
Thu Nov 25 12:18:24 EST 2010


On 25/11/2010 16:44, Yingjie Lan wrote:
> --- On Thu, 11/25/10, MRAB <python at mrabarnett.plus.com> wrote:
>> re.findall performs multiple searches, each starting where the
>> previous one finished. The first match started at the start of the
>> string and finished at its end. The second match started at that
>> point (the end of the string) and found another match, ending at the
>> end of the string. It tried to match a third time, but that failed
>> because it would have matched an empty string again (it's not allowed
>> to return 2 contiguous empty strings).
>>
>>> Isn't this a bug?
>>>
>> No, but it can be confusing at times! :-)
>> --
>
> But the last empty string is matched twice -- so it is
> an overlapping. But findall is supposed not to return
> overlapping matches. So I think this does not live up
> to the documentation -- thus I still consider it a bug.
>
Look at the spans:

 >>> import re
 >>> for m in re.finditer('((.d.)*)*', 'adb'):
 ...     print(m.span())
 ...
(0, 3)
(3, 3)

There's a non-empty match followed by an empty match. The two spans
share no characters, so the matches don't overlap.
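
To make the point about spans concrete, here is a small sketch that
collects the spans and checks them pairwise. The helper name `overlaps`
is my own, not anything from the re module, and I'm assuming "overlap"
means sharing at least one character position.

```python
import re

pattern = re.compile(r'((.d.)*)*')
text = 'adb'

# Collect the (start, end) span of every match finditer reports.
spans = [m.span() for m in pattern.finditer(text)]
print(spans)  # [(0, 3), (3, 3)]


def overlaps(a, b):
    """Hypothetical helper: True if spans a and b share a character.

    Spans are half-open (start, end) pairs, so an empty span such as
    (3, 3) covers no characters and cannot overlap anything.
    """
    return max(a[0], b[0]) < min(a[1], b[1])


# Compare each match with the one that follows it.
print(any(overlaps(a, b) for a, b in zip(spans, spans[1:])))  # False
```

The empty match at (3, 3) sits right after the first match ends, which
is adjacent, not overlapping.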


