[issue8438] Codecs: "surrogateescape" error handler in Python 2.7

Ezio Melotti report at bugs.python.org
Mon Apr 19 10:55:48 CEST 2010


Ezio Melotti <ezio.melotti at gmail.com> added the comment:

> I consider this an important missing backport for 2.7, since
> without this handler, the UTF-8 codecs in 2.7 and 3.x are
> incompatible and there's no other way to work around this
> other than to make use of the errorhandler conditionally
> depend on the Python version.
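For readers less familiar with the handler under discussion, here is a minimal sketch of what "surrogateescape" does. It assumes Python 3.1 or later, because 2.7 has no such handler, which is the gap this issue is about. On decode, each undecodable byte in the range 0x80-0xFF becomes a lone surrogate U+DC80-U+DCFF. On encode, that surrogate turns back into the original byte, so arbitrary bytes survive a round trip.

```python
# Python 3.1+ only: "surrogateescape" is not available in Python 2.7.
raw = b"caf\xe9"  # Latin-1 bytes; 0xE9 here is not valid UTF-8

# Decoding smuggles the bad byte through as the lone surrogate U+DCE9.
text = raw.decode("utf-8", "surrogateescape")
print(repr(text))  # 'caf\udce9'

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode("utf-8", "surrogateescape")
assert restored == raw
```

Without this handler in 2.7, code that must run on both versions has to choose the error handler by checking `sys.version_info`, which is the workaround the quoted text objects to.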

FWIW I tried to update the UTF-8 codec on trunk from RFC 2279 to RFC 3629 while working on #8271, and found this difference in the handling of surrogates (only on 3.x are they treated as invalid).
I didn't change the behavior of the codec in the patch I attached to #8271 because it was out of the scope of the issue, but I consider the fact that surrogates can be encoded in Python 2.x to be a bug, because it doesn't follow RFC 3629.
IMHO Python 2.x should provide an RFC-3629-compliant UTF-8 codec; however, I haven't had time yet to investigate how Python 3 handles this and what the best solution is (e.g. adding another codec or changing the default behavior).
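A minimal sketch of the surrogate difference described above follows. It assumes Python 3.1 or later, which provides the "surrogatepass" handler. Python 3's UTF-8 codec refuses to encode a lone surrogate, as RFC 3629 requires. "surrogatepass" opts back into the 2.x-style output: the three bytes ED A0 80 that 2.7 produces for u"\ud800".encode("utf-8").

```python
lone = "\ud800"  # a lone high surrogate, not a valid Unicode scalar value

# Strict UTF-8 in Python 3 rejects surrogates (RFC 3629 behavior).
try:
    lone.encode("utf-8")
    strict_reason = None
except UnicodeEncodeError as exc:
    strict_reason = exc.reason  # 'surrogates not allowed'

# "surrogatepass" reproduces what Python 2.x emits for the same string.
encoded = lone.encode("utf-8", "surrogatepass")
assert encoded == b"\xed\xa0\x80"

# Strictly decoding those bytes fails in 3.x, because they are not valid UTF-8...
try:
    encoded.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

# ...but the same handler accepts them and round-trips the surrogate.
assert encoded.decode("utf-8", "surrogatepass") == lone
```

The design choice in 3.x is to keep strict mode RFC 3629 compliant and make the lenient behavior an explicit opt-in through an error handler, rather than to add a separate codec.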

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8438>
_______________________________________


More information about the Python-bugs-list mailing list