[Python-Dev] Unicode byte order mark decoding

Walter Dörwald walter at livinglogic.de
Thu Apr 7 23:47:07 CEST 2005


Walter Dörwald sagte:

> Nicholas Bastin sagte:
>
> It should be feasible to implement your own codec for that
> based on Lib/encodings/utf_16.py. Simply replace the line
> in StreamReader.decode():
>   raise UnicodeError,"UTF-16 stream does not start with BOM"
> with:
>   self.decode = codecs.utf_16_be_decode
> and you should be done.

Oops, this only works if you have a big endian system.
Otherwise you have to redecode the input with:
   codecs.utf_16_ex_decode(input, errors, 1, False)

Bye,
   Walter Dörwald





More information about the Python-Dev mailing list