Python 3.0 automatic decoding of UTF16

MRAB google at mrabarnett.plus.com
Sat Dec 6 11:50:24 EST 2008


Johannes Bauer wrote:
> info at orlans-amo.be schrieb:
> 
>> 2 problems: endianness and trailing zero byte.
>> This works for me:
> 
> This is very strange - when using "utf16", endianness should be detected
> automatically. When I simply truncate the trailing zero byte, I receive:
> 
> Traceback (most recent call last):
>   File "./modify.py", line 12, in <module>
>     a = AddressBook("2008_11_05_Handy_Backup.txt")
>   File "./modify.py", line 7, in __init__
>     line = f.readline()
>   File "/usr/local/lib/python3.0/io.py", line 1807, in readline
>     while self._read_chunk():
>   File "/usr/local/lib/python3.0/io.py", line 1556, in _read_chunk
>     self._set_decoded_chars(self._decoder.decode(input_chunk, eof))
>   File "/usr/local/lib/python3.0/io.py", line 1293, in decode
>     output = self.decoder.decode(input, final=final)
>   File "/usr/local/lib/python3.0/codecs.py", line 300, in decode
>     (result, consumed) = self._buffer_decode(data, self.errors, final)
>   File "/usr/local/lib/python3.0/encodings/utf_16.py", line 69, in _buffer_decode
>     return self.decoder(input, self.errors, final)
> UnicodeDecodeError: 'utf16' codec can't decode byte 0x0a in position 0: truncated data
> 
> But I suppose something *is* indeed weird because the file I uploaded
> and which did not yield the "truncated data" error is 1559 bytes, which
> just cannot be.
> 
It might be that the EOF marker (b'\x1A' or '\u001A') was written, or is
being read, as a single byte instead of the 2 bytes it occupies in UTF-16 text.
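A minimal sketch of how one might test that theory, assuming the file is read in binary mode first and that the stray byte is a single trailing 0x1A (Ctrl-Z). The helper name decode_utf16_lenient is hypothetical, not something from the thread or the standard library. It relies on the "utf-16" codec picking the byte order from the BOM, and on the fact that well-formed UTF-16 always has an even byte count, so an odd length such as 1559 bytes means something extra (or missing) is in the file.

```python
import codecs


def decode_utf16_lenient(data: bytes) -> str:
    """Decode UTF-16 bytes, dropping a stray one-byte EOF marker.

    Hypothetical helper. Assumption: if the byte count is odd and the last
    byte is 0x1A (the old DOS Ctrl-Z EOF marker), that byte was written as a
    single byte rather than a two-byte UTF-16 code unit, so it is removed.
    Any other odd-length input is left alone and still fails to decode.
    """
    if len(data) % 2 == 1 and data.endswith(b"\x1a"):
        data = data[:-1]
    # The "utf-16" codec reads the BOM (FF FE or FE FF) to choose the byte
    # order and does not include the BOM in the returned string.
    return data.decode("utf-16")


if __name__ == "__main__":
    text = "Hello\n"
    le = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
    be = codecs.BOM_UTF16_BE + text.encode("utf-16-be")

    # Endianness is detected from the BOM, so both decode to the same text.
    print(le.decode("utf-16") == be.decode("utf-16") == text)

    # One extra byte makes the length odd, and plain decoding fails.
    try:
        (le + b"\x1a").decode("utf-16")
    except UnicodeDecodeError as exc:
        print("plain decode failed:", exc.reason)

    # The lenient helper drops the stray 0x1A and decodes the rest.
    print(repr(decode_utf16_lenient(le + b"\x1a")))
```

If the marker was instead written as a full two-byte code unit, decoding succeeds and the text simply ends with '\x1a', which could be removed afterwards with rstrip('\x1a').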


