[issue13333] utf-7 inconsistent with surrogates
Martin v. Löwis
report at bugs.python.org
Thu Nov 3 18:28:59 CET 2011
Martin v. Löwis <martin at v.loewis.de> added the comment:
RFC 2152 talks about encoding 16-bit unicode, and clarifies
Surrogate pairs (UTF-16) are converted by treating each half
of the pair as a separate 16 bit quantity (i.e., no special
treatment).
So lone surrogates clearly should be supported.
This text could be interpreted as saying that decoding surrogate pairs should also keep them (rather than combining them). However, the RFC also assumes that the decoded form will use 16-bit code units; for Python, I think we should continue combining surrogate pairs on decoding UTF-7 when we find them.
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue13333>
_______________________________________
More information about the Python-bugs-list
mailing list