Wrong default endianness in utf-16 and utf-32 !?

jmfauth wxjmfauth at gmail.com
Wed Oct 13 03:07:09 EDT 2010


On 12 oct, 22:00, John Machin <sjmac... at lexicon.net> wrote:
> jmfauth <wxjmfauth <at> gmail.com> writes:
>
> > When an endianness is not specified (BE, LE, unmarked forms),
> > the Unicode Consortium specifies that the default byte serialization
> > should be big-endian.
>
> > See http://www.unicode.org/faq//utf_bom.html
> > Q: Which of the UTFs do I need to support?
> > and
> > Q: Why do some of the UTFs have a BE or LE in their label,
> > such as UTF-16LE?
>
> Sometimes it is necessary to read right to the end of an answer:
>
> Q: Why do some of the UTFs have a BE or LE in their label, such as UTF-16LE?
>
> A: [snip] the unmarked form uses big-endian byte serialization by default, but
> may include a byte order mark at the beginning to indicate the actual byte
> serialization used.



Well, English is not my native language, but I think I read it
correctly.

My question had nothing to do with the BOM, with encoding/decoding,
or with whether a BOM is included. My question was:

What should I understand by "utf-16": "utf-16-le" or "utf-16-be"?

And Antoine gave an answer.
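
For reference, here is a minimal sketch of how the three labels can be
compared in CPython. It assumes, as far as I know, that the unmarked
"utf-16" encoder writes a BOM followed by the platform's native byte
order; the variable names are only illustrative.

```python
import codecs
import sys

text = "a"

# Explicit labels: the byte order is fixed by the name and no BOM is written.
be = text.encode("utf-16-be")
le = text.encode("utf-16-le")

# Unmarked label: assumed to be a BOM followed by native-order code units.
unmarked = text.encode("utf-16")

if sys.byteorder == "little":
    native = codecs.BOM_UTF16_LE + le
else:
    native = codecs.BOM_UTF16_BE + be

# True when the unmarked form is "BOM + native byte order".
matches_native = unmarked == native
print(sys.byteorder, unmarked, matches_native)

# With a BOM present, the "utf-16" decoder follows it in either order
# and removes it from the result.
from_be = (codecs.BOM_UTF16_BE + be).decode("utf-16")
from_le = (codecs.BOM_UTF16_LE + le).decode("utf-16")
print(from_be, from_le)
```

So the byte order produced by the unmarked label depends on the machine,
while "utf-16-le" and "utf-16-be" always give the same bytes.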



