[issue12892] UTF-16 and UTF-32 codecs should reject (lone) surrogates
STINNER Victor
report at bugs.python.org
Tue Nov 29 21:42:30 CET 2011
STINNER Victor <victor.stinner at haypocalc.com> added the comment:
Python 3.3 has a strange behaviour:
>>> '\uDBFF\uDFFF'.encode('utf-16-le').decode('utf-16-le')
'\U0010ffff'
>>> '\U0010ffff'.encode('utf-16-le').decode('utf-16-le')
'\U0010ffff'
I would expect text.encode(encoding).decode(encoding) == text, or else an encode or decode error.
So I agree that the encoder should reject lone surrogates.
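
The round-trip expectation above can be checked with a small helper. This is only a sketch: the name check_roundtrip and its three return labels are made up for illustration and are not part of any Python API. The result for the surrogate pair depends on the interpreter version: a build with the behaviour shown above would report a mismatch, while a build whose encoder rejects surrogates would report an error.

```python
def check_roundtrip(text, encoding):
    """Classify an encode/decode round trip of `text` (hypothetical helper).

    Returns 'ok' if the decoded result equals the input, 'mismatch' if the
    round trip silently changes the string, and 'error' if the codec raises.
    """
    try:
        result = text.encode(encoding).decode(encoding)
    except UnicodeError:
        # Covers both UnicodeEncodeError and UnicodeDecodeError.
        return 'error'
    return 'ok' if result == text else 'mismatch'


# A real non-BMP character should always survive the round trip.
print(check_roundtrip('\U0010ffff', 'utf-16-le'))

# Two surrogate code points: either outcome other than 'ok' satisfies
# the expectation stated above; which one you get depends on the version.
print(check_roundtrip('\uDBFF\uDFFF', 'utf-16-le'))
```

The invariant worth asserting is that the second call never returns 'ok': the string must either come back unchanged or the codec must refuse it.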
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12892>
_______________________________________