[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

John Machin report at bugs.python.org
Thu Apr 1 15:47:38 CEST 2010


John Machin <sjmachin at users.sourceforge.net> added the comment:

@lemburg: RFC 2279 was obsoleted by RFC 3629 over six years ago. The current standard limits UTF-8 to 21 bits (code points up to U+10FFFF), and the bytes F5-FF are declared invalid. I don't understand what you mean by "supporting those possibilities"; the code is correctly issuing an error message. However, the goal of supporting the new resynchronization and U+FFFD-emitting rules might be better met by throwing away the code in the default clause and simply setting the entries for F5-FF in the utf8_code_length array to zero.
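To illustrate the table-driven idea, here is a minimal Python sketch (not CPython's actual C code). The names UTF8_CODE_LENGTH, _second_byte_range and decode_utf8_replace are hypothetical. Also zeroing C0-C1 and restricting the second byte after E0, ED, F0 and F4 is my own assumption, following the well-formed byte-sequence rules of the Unicode Standard. Every byte with length 0, F5-FF included, takes the same path: emit one U+FFFD and resync at the next byte. A sequence that breaks partway yields one U+FFFD for its valid prefix, and decoding resumes at the offending byte.

```python
# Hypothetical lead-byte length table modelled on the utf8_code_length idea
# (NOT CPython's source). 0 = this byte can never start a sequence.
UTF8_CODE_LENGTH = (
    [1] * 0x80     # 00-7F: ASCII
    + [0] * 0x40   # 80-BF: continuation bytes
    + [0] * 2      # C0-C1: could only form overlong encodings (assumption)
    + [2] * 0x1E   # C2-DF: two-byte sequences
    + [3] * 0x10   # E0-EF: three-byte sequences
    + [4] * 5      # F0-F4: four-byte sequences
    + [0] * 0x0B   # F5-FF: invalid under RFC 3629 (beyond U+10FFFF)
)
assert len(UTF8_CODE_LENGTH) == 256


def _second_byte_range(lead: int) -> tuple[int, int]:
    """Allowed range for the byte after `lead` (hypothetical helper)."""
    if lead == 0xE0:
        return 0xA0, 0xBF  # reject overlong three-byte forms
    if lead == 0xED:
        return 0x80, 0x9F  # reject UTF-16 surrogates D800-DFFF
    if lead == 0xF0:
        return 0x90, 0xBF  # reject overlong four-byte forms
    if lead == 0xF4:
        return 0x80, 0x8F  # stay at or below U+10FFFF
    return 0x80, 0xBF


def decode_utf8_replace(data: bytes) -> str:
    """Decode UTF-8, substituting U+FFFD for ill-formed subsequences."""
    out = []
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        length = UTF8_CODE_LENGTH[lead]
        if length == 0:
            # Invalid lead byte (including F5-FF): one U+FFFD, resync at i + 1.
            out.append("\ufffd")
            i += 1
            continue
        if length == 1:
            out.append(chr(lead))
            i += 1
            continue
        # Advance j over the well-formed continuation bytes that follow.
        j = i + 1
        lo, hi = _second_byte_range(lead)
        while j < i + length and j < n and lo <= data[j] <= hi:
            j += 1
            lo, hi = 0x80, 0xBF  # later bytes only need to be 80-BF
        if j == i + length:
            # Complete sequence: payload bits of the lead, then 6 bits per byte.
            cp = lead & (0x7F >> length)
            for b in data[i + 1:j]:
                cp = (cp << 6) | (b & 0x3F)
            out.append(chr(cp))
        else:
            # Truncated or ill-formed: one U+FFFD for the valid prefix.
            out.append("\ufffd")
        i = j  # resume at the first byte not consumed
    return "".join(out)
```

With this layout the decoder needs no special default clause for F5-FF; the zero entries in the table already route those bytes to the replacement path. For example, `decode_utf8_replace(b"\xf5\x80")` yields two U+FFFD characters, while a truncated `b"\xf0\x90\x80A"` yields a single U+FFFD followed by `A`.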

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8271>
_______________________________________


More information about the Python-bugs-list mailing list