UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to <undefined>

Marko Rauhamaa marko at pacujo.net
Sun Oct 21 14:48:04 EDT 2018


pjmclenon at gmail.com:

> not sure why utf-8 gives an error when thats the most wide all caracters
> inclusive right?/

Not all sequences of bytes are legal in UTF-8. For example,

   >>> b'\x80'.decode("utf-8")
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Not all sequences of bytes are legal in ASCII, either.
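
To make the ASCII case concrete, here is a minimal sketch. The sample bytes are my own illustration, not data from the original file: ASCII only defines byte values 0x00 through 0x7F, so any byte above that is rejected.

```python
# Illustrative sample (an assumption, not the poster's data):
# the word "cafe" with an accented e, encoded as UTF-8.
data = b"caf\xc3\xa9"

try:
    data.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError as exc:
    # Bytes 0xC3 and 0xA9 are above 0x7F, so ASCII cannot decode them.
    ascii_ok = False
    print("ascii failed:", exc)

# The same bytes are a valid UTF-8 sequence.
text = data.decode("utf-8")
print(text)
```
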

However, all sequences of bytes are legal in Latin-1 (among others),
because Latin-1 assigns a character to each of the 256 possible byte
values. Of course, decoding with Latin-1 gives you gibberish unless the
data really is Latin-1. But you'll never get a UnicodeDecodeError.
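
A rough sketch of that trade-off follows. The choice of a right double quotation mark (U+201D) as the sample character is my assumption; it is just one character whose UTF-8 encoding happens to contain the byte 0x9d from the subject line. I'm also assuming the 'charmap' codec in the original error was cp1252, which leaves 0x9d undefined.

```python
# Latin-1 maps byte value N to code point N, so decoding never fails
# and the round trip is lossless.
every_byte = bytes(range(256))
decoded = every_byte.decode("latin-1")
round_trip_ok = decoded.encode("latin-1") == every_byte

# Hypothetical sample: U+201D encoded as UTF-8 is b'\xe2\x80\x9d'.
utf8_bytes = "\u201d".encode("utf-8")

# Decoding as Latin-1 "succeeds", but yields three unrelated characters.
gibberish = utf8_bytes.decode("latin-1")

# Decoding as cp1252 fails, because cp1252 has no character at 0x9d.
try:
    utf8_bytes.decode("cp1252")
    cp1252_ok = True
except UnicodeDecodeError as exc:
    cp1252_ok = False
    print("cp1252 failed:", exc)

# Decoding with the encoding the data was actually written in works.
correct = utf8_bytes.decode("utf-8")
```

So Latin-1 is handy when you only need to get bytes through without an exception, but the right fix is to open the file with the encoding it was actually written in.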


Marko


