Read file that starts with '\xff\xfe'
Colin S. Miller
colinsm.spam-me-not at picsel.com
Mon Sep 8 11:41:05 EDT 2003
Bob Gailer wrote:
> At 07:31 AM 9/8/2003, Duncan Booth wrote:
>
>> Bob Gailer <bgailer at alum.rpi.edu> wrote in
>> news:mailman.1063025195.15280.python-list at python.org:
>>
>> > That's a good start. I presume I need to use codecs.open(filename,
>> > mode[, encoding[, errors[, buffering]]]) to read the file. What is the
>> > actual value of the "encoding" parameter for "Little-endian UTF-16
>> > Unicode character data, with CR line terminators"?
>>
>> Try:
>>
>> myFile = codecs.open(filename, "r", "utf16")
>>
>> If the file starts with a UTF-16 marker (either little or big endian) it
>> will be read correctly. If it doesn't start with either marker, reading
>> from it will throw a UnicodeError.
>
>
> Interesting error:
>
> UnicodeError: UTF-16 decoding error: truncated data
Are you calling readline() on the Unicode file?
I banged my head against this problem a few months ago, and ended up doing
codecs.open(...).read().splitlines()
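
A minimal sketch of that workaround, written for a current Python 3
interpreter rather than the 2003-era one discussed here. The helper name
read_utf16_lines, the sample text and the temporary file are made up for
illustration; only codecs.open, read() and splitlines() come from the thread.

```python
import codecs
import os
import tempfile


def read_utf16_lines(path):
    """Decode the whole file in one go, then split the decoded text."""
    # "utf16" honours the byte-order mark at the start of the file.
    with codecs.open(path, "r", "utf16") as f:
        return f.read().splitlines()


# Hypothetical sample: little-endian UTF-16 with CR line terminators.
sample_text = "first\rsecond\rthird\r"

with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.txt")
    with open(sample_path, "wb") as f:
        # BOM_UTF16_LE is the '\xff\xfe' marker from the subject line.
        f.write(codecs.BOM_UTF16_LE + sample_text.encode("utf-16-le"))
    lines = read_utf16_lines(sample_path)
```

Because the split happens on the decoded text and not on the raw bytes, a
line terminator can never be cut in half. The cost is that the whole file is
held in memory at once.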
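I think what happens is that the codec's readline calls the underlying
byte-oriented readline, which knows nothing about Unicode and simply splits
at the first \r or \n byte it finds. In little-endian UTF-16 each of those
characters is followed by a \x00 byte, so the split leaves a string with an
odd number of bytes, which the decoder then rejects as truncated.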
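
The effect can be reproduced on raw bytes, again as a rough sketch in
Python 3. The sample string and the variable names are assumptions chosen for
illustration, and the slice only imitates what a byte-oriented readline would
return; it is not the actual codecs implementation.

```python
# Two short lines with CR terminators, encoded as little-endian UTF-16.
data = "ab\rcd\r".encode("utf-16-le")

# Imitate a byte-oriented readline: cut right after the first CR byte.
# In UTF-16LE the CR is stored as 0x0D 0x00, so the 0x00 half is left behind.
first_chunk = data[: data.index(b"\r") + 1]

error = None
try:
    first_chunk.decode("utf-16-le")
except UnicodeDecodeError as exc:
    # The odd-length chunk ends in half a code unit and cannot be decoded.
    error = exc
```

Here first_chunk is five bytes long (two bytes each for "a" and "b", plus the
low byte of the CR), and decoding it fails.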
Colin Miller
>
> Bob Gailer
> bgailer at alum.rpi.edu
> 303 442 2625