UTF16 codec doesn't round-trip?

"Martin v. Löwis" martin at v.loewis.de
Sat May 28 18:33:51 EDT 2005


John Perks and Sarah Mount wrote:
> If the ascii can't be recognized as UTF16, then surely the codec
> shouldn't have allowed it to be encoded in the first place? I could
> understand if it was trying to decode ascii into (native) UTF32.

Please don't call the thing you are trying to decode "ascii". ASCII
is the name of the American Standard Code for Information Interchange;
it is a 7-bit code, and what you are trying to decode certainly isn't
ASCII. Call it "bytes" instead.

So you are trying to decode bytes as UTF-16. The bytes you have
definitely are not UTF-16 - the specific sequence of bytes is invalid
in UTF-16. Therefore, the codec is right to reject it when decoding.
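
For example, a lone high surrogate has no legal UTF-16 representation, so
the decoder raises (a minimal sketch, assuming current CPython 3; the byte
values and little-endian order are just for illustration):

    # Two bytes spelling a lone high surrogate (U+D800) in little-endian
    # order are not valid UTF-16, so decoding fails.
    invalid = b"\x00\xd8"
    try:
        invalid.decode("utf-16-le")
    except UnicodeDecodeError as exc:
        print("decoder rejected the bytes:", exc)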

It might be considered a bug that the codec encoded the characters
in the first place.
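
A sketch of what rejection at encode time looks like, assuming current
CPython 3, whose UTF-16 encoder refuses a lone surrogate outright:

    # An unpaired high surrogate in a str object.  Current CPython 3
    # raises at encode time instead of emitting invalid UTF-16 bytes.
    lone = "\ud800"
    try:
        lone.encode("utf-16")
    except UnicodeEncodeError as exc:
        print("encoder rejected the character:", exc)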

> On a similar note, if you are using UTF32 natively, are you allowed to
> have raw surrogate escape sequences (paired or otherwise) in unicode
> literals?

Python accepts such literals.
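
For example (a sketch, assuming current CPython 3), the escapes are
accepted when the literal is compiled; a problem only appears if the
resulting string is later encoded with a strict codec such as UTF-8 or
UTF-16:

    # Paired and unpaired surrogate escapes are both legal in literals.
    pair = "\ud800\udc00"   # high + low surrogate written as escapes
    lone = "\udfff"         # unpaired low surrogate, also accepted
    print(len(pair), len(lone))   # 2 1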

Regards,
Martin


