unicode to ascii converting

Peter Wilkinson pwilkinson at videotron.ca
Fri Aug 6 16:18:02 EDT 2004


Thanks for the clear explanation.

I modified my code and now this works :)


Peter


At 03:46 PM 8/6/2004, Martin v. Löwis wrote:
>Peter Wilkinson wrote:
>>UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: 
>>ordinal not in range(128)
>
>That error actually says what happened: You have the byte with the numeric 
>value 0xff in the input, and the ASCII (American Standard
>Code for Information Interchange) converter cannot convert that
>into a Unicode character. This is because ASCII is a 7-bit character
>set, i.e. it goes from 0..127. 0xFF is 255, so it is out of range.
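A minimal sketch of the same failure, written for Python 3. This is an assumption: the thread concerns Python 2, where the conversion happened implicitly. Decoding the single byte 0xff as ASCII fails. Latin-1, which maps every byte value 0..255 to the code point with the same number, succeeds.

```python
# Python 3 sketch: ASCII covers only code points 0..127, so byte 0xff is rejected.
data = b"\xff"

try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = str(exc)
else:
    ascii_error = None

# Latin-1 maps each byte 0..255 to the Unicode code point with the same value.
latin1_text = data.decode("latin-1")

print(ascii_error)
print(repr(latin1_text), ord(latin1_text))
```

The first line printed is the same message quoted in the original error report.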
>
>Now, the line triggering this is
>
>   bz_file_out.write(line.encode(new_encode))
>
>and it invokes *encode*, not *decode*. Why would it give a decode error
>then?
>
>Because:
>
>   decode: take a byte string, return a Unicode string
>   encode: take a Unicode string, return a byte string
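To make the two directions concrete, here is a short Python 3 sketch. It assumes Python 3, where text (str) and bytes are separate types. In that version bytes objects have no .encode method at all, so the Python 2 pitfall described in this thread cannot occur silently.

```python
# decode: bytes -> str (text); encode: str -> bytes.
text = "caf\u00e9"                  # a Unicode (text) string
encoded = text.encode("utf-8")      # str -> bytes
decoded = encoded.decode("utf-8")   # bytes -> str

print(type(encoded).__name__, encoded)   # bytes b'caf\xc3\xa9'
print(type(decoded).__name__, decoded == text)  # str True

# Python 3 exposes only the meaningful direction on each type.
print(hasattr(b"abc", "encode"))    # False
print(hasattr("abc", "decode"))     # False
```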
>
>So the variable line should be a Unicode string for .encode to make
>sense. Unfortunately, Python also supports .encode on byte strings.
>If new_encode names a character encoding, this does roughly
>
>class str:
>   def encode(self, encoding):
>     # implicit step: decode using sys.getdefaultencoding()
>     unistr = unicode(self)
>     return unistr.encode(encoding)
>
>So it first tries to convert the byte string into Unicode, which
>uses the system default encoding, which is us-ascii. Hence the decode error.
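A minimal Python 3 sketch of the implicit step described above and of the usual remedy: decode explicitly with the encoding the bytes are really in, then encode. The helpers py2_style_encode and explicit_encode are hypothetical names used only for illustration. The Latin-1 sample input is an assumption. This is not necessarily how the original code was changed.

```python
# Python 2's default encoding (sys.getdefaultencoding()) was 'ascii'.
DEFAULT_ENCODING = "ascii"


def py2_style_encode(raw: bytes, target: str) -> bytes:
    """Hypothetical emulation of Python 2's str.encode on a byte string."""
    unistr = raw.decode(DEFAULT_ENCODING)  # implicit decode; fails on bytes >= 0x80
    return unistr.encode(target)


def explicit_encode(raw: bytes, source: str, target: str) -> bytes:
    """Hypothetical fix: decode with the real source encoding, then encode."""
    return raw.decode(source).encode(target)


line = b"\xff caf\xe9\n"  # assumed to be Latin-1 encoded input

try:
    py2_style_encode(line, "utf-8")
except UnicodeDecodeError:
    implicit_failed = True
else:
    implicit_failed = False

converted = explicit_encode(line, "latin-1", "utf-8")
print(implicit_failed)  # True
print(converted)        # b'\xc3\xbf caf\xc3\xa9\n'
```

Naming the source encoding explicitly means the code never relies on the process-wide default, which is the root of the error in this thread.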
>
>HTH,
>Martin
>--
>http://mail.python.org/mailman/listinfo/python-list



