[issue1647489] zero-length match confuses re.finditer()

Jeffrey C. Jacobs report at bugs.python.org
Wed Sep 24 20:33:09 CEST 2008


Jeffrey C. Jacobs <timehorse at users.sourceforge.net> added the comment:

Ah, I see the problem: if ptr is not incremented, it will keep
matching the first expression, (^z*), so it would have to both 'skip'
the 'a' and NOT skip the 'a'.  Hmm.  You're right, Matthew, this is
pretty complicated.  Now, for your expression, Matthew,
r'(z*)|(^q*)|(\w+)', Perl gives:

"",undef,undef
undef,undef,"abc"
"",undef,undef

Meaning it doesn't even bother trying the ^q*, since the ^z* matches
first.  This seems the logical behaviour and fits with the idea that a
Zero-Width match would both match only once at a given position and NOT
consume any characters.  An internal flag would just have to be created
to tell the two find functions whether, at the current value of ptr, a
"No Zero-Width Match" option applies on the second go-around.
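
For comparison, here is a minimal sketch of the same experiment on the
Python side.  It assumes the subject string is 'abc' (as the Perl output
suggests); the helper name group_tuples is made up for illustration and
is not part of the re module.

```python
import re

# Pattern quoted in the message above; the subject string 'abc' is an
# assumption inferred from the Perl output.
PATTERN = re.compile(r'(z*)|(^q*)|(\w+)')
SUBJECT = 'abc'


def group_tuples(pattern, text):
    """Return the groups() tuple of every match that finditer() yields.

    Hypothetical helper for illustration only.
    """
    return [m.groups() for m in pattern.finditer(text)]


results = group_tuples(PATTERN, SUBJECT)
for groups in results:
    # None plays the role of Perl's undef for a group that did not take part.
    print(groups)
```

On Python 3.7 or later, where the re documentation says a non-empty
match can start just after a previous empty match, this should print
('', None, None), (None, None, 'abc') and ('', None, None), which lines
up with the Perl result.  Older interpreters, including the one this
report was filed against, may give a different list.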

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue1647489>
_______________________________________

