unicode to ascii converting

Fri Aug 6 15:46:01 EDT 2004

Peter Wilkinson wrote:
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: 
> ordinal not in range(128)

That error actually says what happened: your input contains a byte with
the numeric value 0xff, and the ASCII (American Standard Code for
Information Interchange) codec cannot convert that byte into a Unicode
character. ASCII is a 7-bit character set, i.e. it covers only the
values 0..127. 0xff is 255, so it is out of range.
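For illustration, here is a minimal Python 3 sketch that reproduces the same message by decoding a single 0xff byte with the ASCII codec. The variable names are made up for the example; only the built-in bytes.decode is used.

```python
# A byte string holding one byte with value 0xFF (255).
data = b"\xff"
print(data[0])  # prints 255, above ASCII's maximum of 127

try:
    # The ASCII codec only accepts byte values 0..127, so this fails.
    data.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)  # the same "'ascii' codec can't decode byte 0xff ..." text
```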

Now, the line triggering this is

   bz_file_out.write(line.encode(new_encode))

and it invokes *encode*, not *decode*. Why would it give a decode error
then?

Because:

   decode: take a byte string, return a Unicode string
   encode: take a Unicode string, return a byte string
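A short Python 3 sketch of the two directions, using an arbitrary example word. In Python 3 the types are kept separate (str is Unicode, bytes is a byte string), which makes the direction of each call explicit:

```python
text = "caf\u00e9"           # Unicode string (str): "cafe" with an accented e

raw = text.encode("utf-8")   # encode: Unicode string -> byte string
back = raw.decode("utf-8")   # decode: byte string -> Unicode string

print(raw)                       # b'caf\xc3\xa9'
print(back == text)              # True
print(text.encode("latin-1"))    # b'caf\xe9': same text, different bytes
```

The last line shows why the encoding name matters: the same Unicode string produces different byte strings under different encodings.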

So line should be a Unicode string for .encode to be meaningful.
Unfortunately, Python also supports .encode on byte strings. If
new_encode names a character encoding, str.encode effectively does this:

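class str:
    def encode(self, encoding):
        # implicit step: decode using the default encoding
        # (sys.getdefaultencoding(), normally 'ascii')
        unistr = unicode(self)
        # explicit step: encode the Unicode string as requested
        return unistr.encode(encoding)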

So it first tries to convert the byte string into Unicode, and that
conversion uses the system default encoding, which is us-ascii. That
implicit decode step is what fails, hence the decode error.
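The behaviour can be modelled in Python 3, where bytes no longer has an .encode method at all. In the sketch below, py2_str_encode and reencode are hypothetical helpers, not library functions. DEFAULT_ENCODING = "ascii" stands in for the usual Python 2 default. The fix shown (decode explicitly with the encoding the bytes are really in, then encode) assumes the input is Latin-1, which the original message does not say; substitute the real source encoding.

```python
DEFAULT_ENCODING = "ascii"  # assumption: the usual Python 2 default encoding


def py2_str_encode(data: bytes, encoding: str) -> bytes:
    """Hypothetical model of Python 2's str.encode on a byte string."""
    # Implicit step: decode with the default codec; fails on bytes > 127.
    unistr = data.decode(DEFAULT_ENCODING)
    # Explicit step: encode the resulting Unicode string as requested.
    return unistr.encode(encoding)


def reencode(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Hypothetical fix: decode with the real source encoding, then encode."""
    return data.decode(source_encoding).encode(target_encoding)


line = b"\xff"  # example input, assumed to be Latin-1 (0xff is U+00FF)

try:
    py2_str_encode(line, "utf-8")
except UnicodeDecodeError as exc:
    print("implicit ASCII decode failed:", exc.reason)

print(reencode(line, "latin-1", "utf-8"))  # b'\xc3\xbf'
```

Pure-ASCII input passes through py2_str_encode unchanged, which is why code like this often works until the first non-ASCII byte shows up.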

HTH,
Martin