unicode to ascii converting
"Martin v. Löwis"
martin at v.loewis.de
Fri Aug 6 15:46:01 EDT 2004
Peter Wilkinson wrote:
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0:
> ordinal not in range(128)
That error actually says what happened: You have the byte with the
numeric value 0xff in the input, and the ASCII (American Standard
Code for Information Interchange) converter cannot convert that
into a Unicode character. This is because ASCII is a 7-bit character
set, i.e. it goes from 0..127. 0xFF is 255, so it is out of range.
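
A minimal sketch of that range check, written for Python 3 (where byte strings are spelled bytes); the sample bytes are illustrative and not taken from the original file:

```python
# ASCII only defines code points 0..127, so byte 0xFF (255) cannot be decoded.
data = b"\xff"

try:
    data.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(error)

# A byte inside the 7-bit range decodes without trouble.
text = b"\x41".decode("ascii")
```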
Now, the line triggering this is
    bz_file_out.write(line.encode(new_encode))
and it invokes *encode*, not *decode*. Why would it give a decode error
then?
Because:
decode: take a byte string, return a Unicode string
encode: take a Unicode string, return a byte string
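
As a rough illustration of the two directions, here is a Python 3 sketch in which decode turns bytes into text and encode turns text back into bytes. The Latin-1 sample bytes and the variable names are assumptions chosen for the example:

```python
# decode: byte string -> Unicode string
raw = b"L\xf6wis"                  # assumed to be Latin-1 encoded bytes
text = raw.decode("latin-1")       # now a Unicode string

# encode: Unicode string -> byte string
utf8_bytes = text.encode("utf-8")  # the same characters, as UTF-8 bytes
```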
So line should be a Unicode string, for .encode to be a meaningful thing
to do. Unfortunately, Python supports .encode also for byte strings.
If new_encode names a character encoding, the call effectively does this:
    class str:
        def encode(self, encoding):
            unistr = unicode(self)
            return unistr.encode(encoding)
So it first tries to convert the current string into unicode, which
uses the system default encoding, which is us-ascii. Hence the error.
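
The hidden decode step can be emulated in Python 3 to show where the error comes from, and how an explicit decode avoids it. This is only a sketch: py2_style_encode is a hypothetical helper, the input bytes are made up, and latin-1 is an assumed source encoding (the real encoding of the file is not known from the post):

```python
def py2_style_encode(byte_string, encoding, default_encoding="ascii"):
    """Hypothetical helper imitating Python 2's str.encode on a byte string."""
    # Implicit step: decode with the default encoding first.
    unistr = byte_string.decode(default_encoding)
    return unistr.encode(encoding)

line = b"\xff\xfe"  # made-up input containing non-ASCII bytes

try:
    py2_style_encode(line, "utf-8")
    failed_with = None
except UnicodeDecodeError as exc:
    # The *decode* error is raised from inside an *encode* call.
    failed_with = type(exc).__name__

# Possible fix: decode explicitly with the input's actual encoding
# (latin-1 is an assumption here), then encode to the target encoding.
fixed = line.decode("latin-1").encode("utf-8")
```

Decoding explicitly makes the source encoding visible in the code instead of leaving it to the default.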
HTH,
Martin