[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

Ezio Melotti report at bugs.python.org
Thu May 17 19:36:09 CEST 2012


Ezio Melotti <ezio.melotti at gmail.com> added the comment:

> Tests fail, but I'm not sure that the tests are correct.

> b'\xe0\x00' raises 'unexpected end of data' and not 'invalid
> continuation byte'. This is a terminological issue.

This might be just because it first checks whether there are two more bytes before checking whether they are valid, but 'invalid continuation byte' works too.
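
To see which message a given interpreter actually produces, one can catch the exception and inspect its attributes. This is only a rough sketch: `decode_reason` is a helper name made up for this example, and the reason string reported for b'\xe0\x00' may differ between Python versions.

```python
def decode_reason(data):
    """Return (reason, start, end) from the UnicodeDecodeError raised by a
    strict UTF-8 decode, or None if the data decodes cleanly."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError as exc:
        # reason is the human-readable message; start/end delimit the bad bytes
        return exc.reason, exc.start, exc.end
    return None

# Lead byte followed by a byte that is not a continuation byte
print(decode_reason(b'\xe0\x00'))
# Lead byte with nothing after it (input really is truncated)
print(decode_reason(b'\xe0'))
```

Comparing the two results shows whether the decoder distinguishes truncated input from an invalid second byte.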

> b'\xe0\x80'.decode('utf-8', 'replace') returns one U+FFFD and not
> two. I don't think that is right.

Why not?
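
For reference, in well-formed UTF-8 a lead byte of 0xE0 must be followed by a second byte in the range 0xA0..0xBF, so 0x80 is not a valid continuation of 0xE0 even though it looks like a continuation byte. A small sketch like the following counts how many U+FFFD characters a given interpreter emits; `count_replacements` is a name invented here, and the numbers printed depend on the Python version in use.

```python
REPLACEMENT = '\ufffd'  # U+FFFD REPLACEMENT CHARACTER

def count_replacements(data):
    """Decode data as UTF-8 with the 'replace' handler and count U+FFFD."""
    return data.decode('utf-8', 'replace').count(REPLACEMENT)

# The sequence discussed above, plus two neighbours for comparison
for sample in (b'\xe0\x80', b'\xe0\x00', b'\xe0'):
    print(sample, count_replacements(sample))
```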

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8271>
_______________________________________

