Must be a bug in the re module [was: Why this result with the re module]

John Bond lists at asd-group.com
Wed Nov 3 00:10:57 EDT 2010


On 3/11/2010 4:02 AM, MRAB wrote:
> On 03/11/2010 03:42, Yingjie Lan wrote:
>>> Matches an empty string, returns ''
>>>
>>> The result is therefore ['Mar', '', '', 'lam', '', '']
>>
>> Thanks, now I see it through with clarity.
>> Both you and JB are right about this case.
>> However, what if the regex is ((.a.)*)* ?
>>
> Actually, in hindsight, my explanation is slightly wrong!
>
> re.search and the others return None for an unmatched group, but
> re.findall returns '' for an unmatched group, so instead of saying:
>
>     Matches an empty string, returns ''
>
> I should have said:
>
>     The group doesn't match at all, so .findall returns ''
>
> As for "((.a.)*)*", the inner group and repeat match like before, but
> then the outer repeat and group try again.
>
> The inner group can't match again, so it's unchanged (and it still
> remembers the last successful capture), and the outer group therefore
> matches an empty string.
>
> Therefore the outer (first) group is always an empty string and the
> inner (second) group is the same as the previous example (the last
> capture or '' if no capture).

Now I'm confused - how can something with "zero or more occurrences" not 
match?
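
To pin down what I think MRAB is distinguishing, here is a minimal sketch (Python 3 syntax; the test string 'xyz' is my own choice, not from the thread). As far as I can tell, the pattern as a whole does match, with zero repetitions and therefore an empty string, but the group inside the repeat never takes part, so it has nothing to report:

```python
import re

pattern = re.compile(r'(.a.)*')

# 'xyz' contains no 'a', so the group can never capture anything.
m = pattern.search('xyz')
whole = m.group(0)   # the overall match: an empty string
inner = m.group(1)   # the group itself: None, it did not participate

# findall reports a non-participating group as '' rather than None.
found = pattern.findall('xyz')

# The example from earlier in the thread.
mary = pattern.findall('Mary has a lamb')

print(repr(whole), repr(inner))
print(found)
print(mary)   # ['Mar', '', '', 'lam', '', '']
```

So "zero or more" always succeeds for the whole expression; it is only the group inside it that can be left unmatched. I have not tried to reproduce the "((.a.)*)*" case here.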

Cheers, JB



