[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Matthew Barnett
report at bugs.python.org
Tue Sep 21 13:41:35 CEST 2010
Matthew Barnett <python at mrabarnett.plus.com> added the comment:
I use Python 3, where len("\U00010337") == 2 on a narrow build.
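For reference, a narrow build stores strings as 16-bit UTF-16 code units, so a code point above U+FFFF takes two units (a surrogate pair), which is where the length of 2 comes from. Below is a minimal sketch of that arithmetic. The helper name `to_surrogate_pair` is made up for illustration, and the code runs on any Python 3 regardless of build.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000       # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x10337)
print(hex(high), hex(low))  # 0xd800 0xdf37

# Cross-check against the standard UTF-16 codec.
encoded = "\U00010337".encode("utf-16-be")
assert encoded == bytes([high >> 8, high & 0xFF, low >> 8, low & 0xFF])
```

So on a narrow build the string "\U00010337" is really the two units 0xD800 and 0xDF37.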
Yes, wide Unicode (code points above U+FFFF) on a narrow build is a problem:
>>> regex.findall("\\U00010337", "a\U00010337bc")
[]
>>> regex.findall("(?i)\\U00010337", "a\U00010337bc")
[]
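One possible explanation for the empty results, and this is an assumption rather than something confirmed here: if the pattern's \U00010337 escape is turned into the single value 0x10337 while the subject string is scanned one 16-bit unit at a time, then no single unit can ever equal it. The sketch below simulates that with plain lists. `utf16_units` and `find_unit` are hypothetical helpers, not part of regex or re.

```python
import struct


def utf16_units(text):
    """Return the UTF-16 code units a narrow build would store for text."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def find_unit(units, value):
    """Return the positions where a single code unit equals value."""
    return [i for i, unit in enumerate(units) if unit == value]


units = utf16_units("a\U00010337bc")
print([hex(u) for u in units])    # ['0x61', '0xd800', '0xdf37', '0x62', '0x63']
print(find_unit(units, 0x10337))  # [] because every unit is at most 0xFFFF
```

Matching would need the pattern side to be expanded into the same two surrogates, or the scanner to combine pairs before comparing.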
I'm not sure how (or whether!) to handle surrogate pairs. It _would_ make things more complicated.
I suppose the moral is that if you want to use wide Unicode then you really should use a wide build.
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue2636>
_______________________________________