[issue16688] Backreferences make case-insensitive regex fail on non-ASCII strings.

Matthew Barnett report at bugs.python.org
Sun Dec 16 01:24:21 CET 2012


Matthew Barnett added the comment:

I found another bug while looking through the source.

On line 495 in function SRE_COUNT:

    if (maxcount < end - ptr && maxcount != 65535)
        end = ptr + maxcount*state->charsize;

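where 'end' and 'ptr' are of type 'char*'. That means that 'end - ptr' is the length in _bytes_, not characters, whereas 'maxcount' is a count of characters, so the comparison mixes units whenever state->charsize is greater than 1.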
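The arithmetic can be modelled in a few lines of Python. This is only a sketch: the function names are made up for illustration, pointers are represented as plain byte offsets, and the second function is a hypothetical correction (dividing the byte distance by the character size), not necessarily the patch that ends up in the tracker.

```python
def clamp_end_buggy(ptr, end, maxcount, charsize):
    """Mimic the quoted check: maxcount (characters) vs. end - ptr (bytes)."""
    if maxcount < end - ptr and maxcount != 65535:
        end = ptr + maxcount * charsize
    return end


def clamp_end_in_chars(ptr, end, maxcount, charsize):
    """Hypothetical variant: convert the byte distance to characters first."""
    if maxcount < (end - ptr) // charsize and maxcount != 65535:
        end = ptr + maxcount * charsize
    return end


# Model of "\u0100\x00\x00" stored with 2 bytes per character:
# 3 characters, scan starts at character index 1, pattern allows up to 3.
charsize = 2
ptr = 1 * charsize   # byte offset of the first "\x00"
end = 3 * charsize   # byte offset just past the last character
maxcount = 3

buggy_end = clamp_end_buggy(ptr, end, maxcount, charsize)
fixed_end = clamp_end_in_chars(ptr, end, maxcount, charsize)

# Convert the byte offsets back to character indices.
print(buggy_end // charsize, fixed_end // charsize)  # prints "4 3"
```

With 1 byte per character (ptr = 1, end = 3) both functions leave 'end' unchanged, which is why the ASCII-only case is not affected.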

If the byte after the end of the string is 0 then the scan runs past the end of the string and the reported span is one character too long:

>>> # Good:
>>> re.search(r"\x00{1,3}", "a\x00\x00").span()
(1, 3)
>>> # Bad:
>>> re.search(r"\x00{1,3}", "\u0100\x00\x00").span()
(1, 4)
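The two cases above can be turned into a small check. This is a sketch, assuming an interpreter in which the problem has been fixed; on an affected build the second span would come back as (1, 4), which is longer than the string itself. The names 'cases', 'spans' and 'overruns' are only illustrative.

```python
import re

# The same pattern against a 1-byte-per-character string and a wider one.
cases = ["a\x00\x00", "\u0100\x00\x00"]
spans = [re.search(r"\x00{1,3}", s).span() for s in cases]

# A match must never end beyond the end of the string it was found in.
overruns = [span[1] > len(s) for s, span in zip(cases, spans)]

print(spans)
print(overruns)
```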

I'll keep looking before submitting a patch.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue16688>
_______________________________________

