Must be a bug in the re module [was: Why this result with the re module]

MRAB python at mrabarnett.plus.com
Wed Nov 3 13:53:08 EDT 2010


On 03/11/2010 04:10, John Bond wrote:
> On 3/11/2010 4:02 AM, MRAB wrote:
>> On 03/11/2010 03:42, Yingjie Lan wrote:
>>>> Matches an empty string, returns ''
>>>>
>>>> The result is therefore ['Mar', '', '', 'lam', '', '']
>>>
>>> Thanks, now I see it through with clarity.
>>> Both you and JB are right about this case.
>>> However, what if the regex is ((.a.)*)* ?
>>>
>> Actually, in hindsight, my explanation is slightly wrong!
>>
>> re.search and the others return None for an unmatched group, but
>> re.findall returns '' for an unmatched group, so instead of saying:
>>
>> Matches an empty string, returns ''
>>
>> I should have said:
>>
>> The group doesn't match at all, so .findall returns ''
>>
>> As for "((.a.)*)*", the inner group and repeat match like before, but
>> then the outer repeat and group try again.
>>
>> The inner group can't match again, so it's unchanged (and it still
>> remembers the last successful capture), and the outer group therefore
>> matches an empty string.
>>
>> Therefore the outer (first) group is always an empty string and the
>> inner (second) group is the same as the previous example (the last
>> capture or '' if no capture).
>
> Now I'm confused - how can something with "zero or more occurrences" not
> match?
>
Perhaps I just phrased it badly.

Given a regex like "(.a.)*", the group might not take part in the match
at all (zero repetitions), but the regex as a whole will still match,
possibly as an empty string.
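To illustrate, here is a minimal Python 3 sketch. It is not taken from the original messages. The input strings "xyz" and "Mary has a lamb" are assumptions chosen for illustration. The second one reproduces the list ['Mar', '', '', 'lam', '', ''] quoted earlier in the thread.

```python
import re

# (.a.)* allows zero repetitions, so the regex as a whole always matches,
# even when the group (.a.) never captures anything.
m = re.match(r"(.a.)*", "xyz")
print(repr(m.group(0)))  # '' : the overall match is an empty string
print(m.group(1))        # None : the group did not participate

# findall reports a non-participating group as '' rather than None,
# and it includes the empty matches in its result.
print(re.findall(r"(.a.)*", "xyz"))  # ['', '', '', '']

# Input string assumed for illustration; it yields the list from the thread.
print(re.findall(r"(.a.)*", "Mary has a lamb"))  # ['Mar', '', '', 'lam', '', '']

# The "((.a.)*)*" case: with two groups, each item is a tuple
# (outer group 1, inner group 2).
print(re.findall(r"((.a.)*)*", "Mary has a lamb"))
```

The last call can be compared directly with the explanation of the nested repeat given earlier. Note that Python 3.7 changed how findall handles empty matches: non-empty matches may now start just after a previous empty match. The expected outputs shown in the comments are the same under both the old and the new rules.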
