Multibyte Character Support for Python

Martin v. Loewis martin at v.loewis.de
Mon May 13 02:16:00 EDT 2002


"Stephen J. Turnbull" <stephen at xemacs.org> writes:

>     Martin> Why does it help to have "UTF-16" to be a synonym to
>     Martin> either "UTF-16BE" or "UTF-16LE", but not telling anybody
>     Martin> what it is a synonym to?
> 
> Ask whoever implemented a UTF-16 codec for python, not me.  Evidently
> there's a good reason for it.

In Python's codecs, UTF-16 is *not* a synonym for UTF-16LE or UTF-16BE;
instead, it writes a BOM (which the other two don't). You were
suggesting omitting the BOM, so I asked how that would help.
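
To make the difference concrete, here is a minimal sketch. It is written
against a current Python 3 interpreter, which is an assumption on my part;
the variable names are only for illustration.

```python
import codecs

text = "hi"

# "utf-16" uses the platform's native byte order and prepends a BOM.
with_bom = text.encode("utf-16")

# The endian-specific codecs fix the byte order and write no BOM.
big = text.encode("utf-16-be")
little = text.encode("utf-16-le")

# The first two bytes of the "utf-16" output are one of the two BOMs.
has_bom = with_bom[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)

print(with_bom, big, little, has_bom)
```

The "utf-16" result is two bytes longer than the other two, and after the
BOM it is identical to whichever of them matches the native byte order.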

> The fact is that the current implementation is just begging to produce
> broken output that will be invisible to anyone who has a Unicode-
> capable console.  And that the only way to avoid it (without rewriting
> all the APIs to pass Unicode objects instead of pre-encoded strings)
> is really ugly code like the code I presented earlier.

No, that is not the only way. Just use UTF-16BE, and all will be fine.
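
A rough sketch of what I mean, assuming the breakage in question is a
stray BOM that ends up in the middle of the output when separately
encoded strings are joined. That reading is an assumption, as are the
sample strings; the code is written for a current Python 3 interpreter.

```python
pieces = ["Hello, ", "world"]

# Each piece encoded with "utf-16" carries its own BOM, so joining the
# byte strings leaves a BOM in the middle of the stream.
joined_bom = b"".join(p.encode("utf-16") for p in pieces)

# The decoder consumes only the leading BOM; the second one survives as
# U+FEFF, which a Unicode-capable console renders as nothing at all.
decoded_bom = joined_bom.decode("utf-16")

# With "utf-16-be" no BOM is written, so the pieces join cleanly.
joined_be = b"".join(p.encode("utf-16-be") for p in pieces)
decoded_be = joined_be.decode("utf-16-be")

print(repr(decoded_bom))
print(repr(decoded_be))
```

The catch is that the consumer then has to be told the data is UTF-16BE,
since nothing in the stream announces the byte order any more.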

Regards,
Martin


