Unexpected regex result

MRAB google at mrabarnett.plus.com
Thu Aug 21 21:29:44 EDT 2008


I'm working on the sources for the regex module (_sre.c) but I've come
across some behaviour that I wasn't aware of before:

>>> re.match('((a)|b)*', 'abc').groups()
('b', 'a')

The regex module was modified to return this instead of the previous
('b', '') in issue #725106 because both Perl and sed returned this.
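
To make the result concrete, here is a small sketch that inspects the span of each group using the standard-library re module. The names pattern and m are only for illustration; the point is that group 2 still reports the 'a' captured on the first pass through the loop, even though the final pass matched 'b' through the other alternative.

```python
import re

# Group 1 is the whole alternation, group 2 is the inner (a).
pattern = r'((a)|b)*'
m = re.match(pattern, 'abc')

# The repeated group runs twice: first on 'a', then on 'b'.
print(m.group(0))   # overall match
print(m.groups())   # last value seen by each group
print(m.span(1))    # where group 1 last matched
print(m.span(2))    # where group 2 last matched

# Group 2 took no part in the final iteration, yet it keeps its
# earlier capture instead of being reset to None.
assert m.span(2)[1] <= m.span(1)[0]
```

Under a "reset on each iteration" reading, group 2 would be None here, which is the ('b', None) answer discussed below.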

My version of the module returns ('b', None), which is what I
expected to be the correct answer. Could someone explain to me what the
rationale for returning ('b', 'a') is? Is it just because Perl and sed
do this? (And if so, why has it been decided that it shouldn't be
possible to split a string on a zero-width match even though Perl
does? :-()


