UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 10: ordinal not in range(128)

Jeff Epler jepler at unpythonic.net
Thu Oct 7 19:54:13 EDT 2004


If you compare a unicode string to a byte string, and the byte-string
has byte values >127, you will get an error like this:
    >>> u'a' == '\xc0'
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc0 in position 0: ordinal not in range(128)

There is no sensible way for Python to perform this comparison, because
the byte string '\xc0' could be in any encoding.  If the encoding of the
byte string is latin-1, it's LATIN CAPITAL LETTER A WITH GRAVE.  If it's
koi8-r encoded, it's CYRILLIC SMALL LETTER YU.  Python refuses to guess
in this case.
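
To see the ambiguity for yourself, here is a small sketch that decodes
the same byte both ways and asks the unicodedata module for the name of
each result.  The variable names are only for illustration, and the b''
prefix assumes Python 2.6 or later:

```python
import unicodedata

raw = b'\xc0'  # one byte, no encoding information attached

# Decode the same byte under two different assumed encodings.
as_latin1 = raw.decode('latin-1')
as_koi8r = raw.decode('koi8-r')

# The two results are different characters with different names.
latin1_name = unicodedata.name(as_latin1)
koi8r_name = unicodedata.name(as_koi8r)

print(latin1_name)
print(koi8r_name)
```

The same byte becomes two unrelated characters, which is why no single
default would be right.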

It doesn't matter whether the unicode string contains any non-ASCII
characters.  The non-ASCII byte in the byte string is enough to trigger
the error.

To correct your function, you'll have to know what encoding the byte
string is in, and convert it to unicode using the decode() method,
and compare that result to the unicode string.
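
Something along these lines, as a minimal sketch.  The helper name
same_text and its parameters are made up for this example, and the
encoding you pass has to come from your own knowledge of where the
bytes originated:

```python
def same_text(text, raw, encoding):
    """Compare a unicode string with a byte string of known encoding.

    same_text is a hypothetical helper, not a library function.
    """
    # Decode the bytes first so both sides are unicode strings.
    return text == raw.decode(encoding)


# The same byte matches different characters under different encodings.
print(same_text(u'\xc0', b'\xc0', 'latin-1'))
print(same_text(u'\u044e', b'\xc0', 'koi8-r'))
print(same_text(u'a', b'\xc0', 'latin-1'))
```

If the bytes turn out not to be valid in the encoding you named,
decode() raises UnicodeDecodeError, so you may want to catch that where
the input is untrusted.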

Jeff