[issue12892] UTF-16 and UTF-32 codecs should reject (lone) surrogates

Ezio Melotti report at bugs.python.org
Mon Jan 30 09:51:06 CET 2012


Ezio Melotti <ezio.melotti at gmail.com> added the comment:

Thanks for the patch!

>  * fix an error in the error handler for utf-16-le. (In Python 3.2,
> b'\xdc\x80\x00\x41'.decode('utf-16-be', 'ignore') returns "\x00" 
> instead of "A" for some reason)

This should probably be done in a separate patch that can be applied to 3.2 and 3.3 (assuming it can go into 3.2). Rejecting surrogates will go into 3.3 only. (Note that a lot of Unicode-related code changed between 3.2 and 3.3.)
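For reference, here is a minimal sketch of the behaviour the error-handler fix is after. It assumes a recent CPython 3 release, where the 'ignore' handler skips only the two bytes of the lone low surrogate U+DC80 and then decodes the following U+0041 normally.

```python
# Sketch (assumes a recent CPython 3): decoding a lone low surrogate
# followed by "A" as UTF-16-BE.
data = b'\xdc\x80\x00\x41'  # U+DC80 (lone low surrogate), then U+0041 'A'

# With 'ignore', the invalid 2-byte code unit is dropped and decoding resumes.
result = data.decode('utf-16-be', 'ignore')
print(repr(result))

# With the default 'strict' handler the same input is rejected.
try:
    data.decode('utf-16-be')
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print('strict decode failed:', exc.reason)
```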

> Should we really reject lone surrogates for UTF-7?

No, I meant only UTF-8/16/32; UTF-7 is fine as is.
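A small sketch of the intended split between codecs, assuming a recent CPython 3 release in which this change has already landed: under the strict handler, UTF-8, UTF-16 and UTF-32 refuse to encode a lone surrogate, while UTF-7 still accepts it. The helper name `encodes_lone_surrogate` is hypothetical and exists only for this illustration; 'surrogatepass' is the standard error handler for opting back in.

```python
LONE = '\ud800'  # a lone high surrogate (no matching low surrogate)

def encodes_lone_surrogate(codec: str) -> bool:
    """Hypothetical helper: True if `codec` encodes LONE with the strict handler."""
    try:
        LONE.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

for codec in ('utf-8', 'utf-16-le', 'utf-16-be', 'utf-32-le', 'utf-32-be', 'utf-7'):
    print(f'{codec:10} accepts lone surrogate: {encodes_lone_surrogate(codec)}')

# 'surrogatepass' lets callers round-trip surrogates explicitly when needed.
raw = LONE.encode('utf-16-le', 'surrogatepass')
print(raw, raw.decode('utf-16-le', 'surrogatepass') == LONE)
```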

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12892>
_______________________________________


More information about the Python-bugs-list mailing list