[issue16688] Backreferences make case-insensitive regex fail on non-ASCII strings.

pyos report at bugs.python.org
Fri Dec 14 23:19:34 CET 2012


New submission from pyos:

The title says it all: if a regular expression that uses backreferences is compiled with the `re.I` flag, it always fails to match a string that contains characters outside the U+0000-U+00FF range. I have been unable to narrow the bug down further.

A simple example:

    >>> import re
    >>> r = re.compile(r'(a)\1', re.I)  # should match "aa", "aA", "Aa", or "AA"
    >>> r.findall('aa')  # works as expected
    ['a']
    >>> r.findall('aa bcd')  # still works
    ['a']
    >>> r.findall('aa Ā')  # ord('Ā') == 0x0100
    []

The same code works as expected in Python 3.2:

    >>> r.findall('aa Ā')
    ['a']
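
For anyone who wants to check another interpreter, the session above can be wrapped in a small script. This is only a sketch: the helper names `check_samples` and `first_failing_codepoint` are made up for illustration and are not part of the `re` module or of this report. It assumes the failure depends only on which characters appear in the string, and it scans code points below 0x300 to see where matching stops. On an interpreter that behaves as Python 3.2 does above, every sample should give `['a']` and the scan should find nothing.

```python
import re
import sys

# Pattern from the report: a captured "a" followed by a backreference,
# compiled case-insensitively.
PATTERN = re.compile(r'(a)\1', re.I)


def check_samples(samples):
    """Return a dict mapping each sample string to PATTERN.findall(sample).

    Hypothetical helper, named here only for illustration.
    """
    return {s: PATTERN.findall(s) for s in samples}


def first_failing_codepoint(limit=0x300):
    """Return the lowest code point below `limit` that, when appended to
    "aa ", stops the pattern from matching; None if every string matches.

    Hypothetical helper; the default limit is an arbitrary choice that
    covers the U+00FF/U+0100 boundary mentioned in the report.
    """
    for cp in range(limit):
        if not PATTERN.findall('aa ' + chr(cp)):
            return cp
    return None


if __name__ == '__main__':
    print(sys.version)
    for text, found in check_samples(['aa', 'aA bcd', 'aa \u0100']).items():
        # ascii() keeps the output readable on terminals without Unicode.
        print(ascii(text), '->', found)
    print('first failing code point:', first_failing_codepoint())
```

If the scan reports a code point, comparing it with 0x100 would show whether the boundary is exactly where the report places it.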

----------
components: Regular Expressions
messages: 177518
nosy: ezio.melotti, mrabarnett, pitrou, pyos
priority: normal
severity: normal
status: open
title: Backreferences make case-insensitive regex fail on non-ASCII strings.
type: behavior
versions: Python 3.3

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue16688>
_______________________________________

