[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

Thu May 17 19:36:09 CEST 2012

Ezio Melotti <ezio.melotti at gmail.com> added the comment:

> Tests fail, but I'm not sure that the tests are correct.

> b'\xe0\x00' raises 'unexpected end of data' and not 'invalid
> continuation byte'. This is a terminological issue.

This might be just because it first checks whether there are two more bytes before checking whether they are valid, but 'invalid continuation byte' works too.
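
To see which message a given interpreter reports, the reason string can be read off the exception. This is only a sketch for Python 3; decode_error_reason is a hypothetical helper name, not part of the patch or the test suite, and the message for b'\xe0\x00' may differ between versions depending on the order of the checks.

```python
def decode_error_reason(data):
    """Return the reason of the UnicodeDecodeError raised by a strict
    UTF-8 decode of `data`, or None if it decodes cleanly.
    (Hypothetical helper, for illustration only.)"""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError as exc:
        return exc.reason
    return None


if __name__ == '__main__':
    # b'\xe0' is a truncated 3-byte sequence; b'\xe0\x00' has a lead byte
    # followed by a byte that cannot be a continuation byte.
    for sample in (b'\xe0', b'\xe0\x00'):
        print(repr(sample), '->', decode_error_reason(sample))
```

Either reason for b'\xe0\x00' describes the same ill-formed input; only the wording depends on which check runs first.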

> b'\xe0\x80'.decode('utf-8', 'replace') returns one U+FFFD and not
> two. I don't think that is right.

Why not?
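
For reference, the number of replacement characters can be checked directly. The sketch below assumes Python 3; count_replacements is a hypothetical helper name. As I read the "maximal subpart" practice recommended in the Unicode standard, \xe0 and \x80 would each be replaced separately (\x80 is not in the range allowed after \xe0), which gives two U+FFFD, but the count printed depends on the decoder version in use.

```python
REPLACEMENT = '\ufffd'  # U+FFFD REPLACEMENT CHARACTER


def count_replacements(data):
    """Decode `data` as UTF-8 with errors='replace' and count the U+FFFD
    characters in the result. (Hypothetical helper, for illustration;
    it would also count a U+FFFD that was validly encoded in the input.)"""
    return data.decode('utf-8', 'replace').count(REPLACEMENT)


if __name__ == '__main__':
    # One lead byte followed by a byte that is not a valid continuation
    # for it: the decoder emits either one or two U+FFFD here.
    print(count_replacements(b'\xe0\x80'))
```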

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8271>
_______________________________________