[issue2636] Regexp 2.7 (modifications to current re 2.2.2)

Matthew Barnett report at bugs.python.org
Tue Sep 21 13:41:35 CEST 2010


Matthew Barnett <python at mrabarnett.plus.com> added the comment:

I use Python 3, where len("\U00010337") == 2 on a narrow build.
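For reference, here is a minimal sketch of where that length of 2 comes from. A narrow build stores strings as UTF-16 code units, so a code point above U+FFFF takes up two of them (a surrogate pair). The helper names to_surrogate_pair and utf16_length are made up for this illustration and are not part of re or regex. The code counts UTF-16 code units explicitly, so it gives the same answer on any build:

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into its UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


def utf16_length(text):
    """Count UTF-16 code units, which is what len() reports on a narrow build."""
    return len(text.encode("utf-16-le")) // 2


high, low = to_surrogate_pair(0x10337)
print(hex(high), hex(low))             # 0xd800 0xdf37
print(utf16_length("\U00010337"))      # 2
print(utf16_length("a\U00010337bc"))   # 5
```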

Yes, wide Unicode (code points above U+FFFF) on a narrow build is a problem:

>>> regex.findall("\\U00010337", "a\U00010337bc")
[]
>>> regex.findall("(?i)\\U00010337", "a\U00010337bc")
[]

I'm not sure how (or whether!) to handle surrogate pairs. Handling them _would_ make things more complicated.

I suppose the moral is that if you want to use wide Unicode then you really should use a wide build.
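A rough way to tell which kind of build you are running is to look at sys.maxunicode, which is 0xFFFF on a narrow build and 0x10FFFF on a wide one. The helper is_narrow_build below is a hypothetical name used only for this sketch; the optional argument is there just so the check can be tried with either value:

```python
import sys


def is_narrow_build(max_code_point=None):
    """Return True if the interpreter stores strings as UTF-16 code units."""
    if max_code_point is None:
        # sys.maxunicode is the largest code point a single str element can hold.
        max_code_point = sys.maxunicode
    return max_code_point == 0xFFFF


if is_narrow_build():
    print("narrow build: code points above U+FFFF become surrogate pairs")
else:
    print("wide build: every code point is a single str element")
```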

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue2636>
_______________________________________

