Conversion of Unicode to ASCII NIGHTMARE

Serge Orlov Serge.Orlov at gmail.com
Wed Apr 5 23:48:55 EDT 2006


Roger Binns wrote:
> "Fredrik Lundh" <fredrik at pythonware.com> wrote in message news:mailman.4102.1144215505.27775.python-list at python.org...
> > Roger Binns wrote:
> >
> >> SQLite only accepts Unicode so a Unicode string has to be supplied.
> >
> > fact or FUD?  let's see:
>
> Note I said SQLite.  For APIs that take/give strings, you can either
> supply/get a UTF-8 encoded sequence of bytes, or two bytes per character
> host byte order sequence.  Any wrapper of SQLite that doesn't do
> Unicode in/out is seriously breaking things.
>
> I ended up using the UTF-8 versions of the API as Python can't quite
> make its mind up how to represent Unicode strings at the C api level.
> You can have two bytes per char or four, and the handling/production
> of byte order markers isn't that clear either.

My impression is that the handling/production of byte order marks is
pretty clear: they are produced and consumed by only two codecs, utf-16
and utf-8-sig. What is not clear?
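
A quick way to check this for yourself is sketched below. It is only an
illustration, written in Python 3 syntax (an assumption; the interpreters
current at the time of this thread spell string literals differently), and
the variable names are made up for the example. It compares the two
BOM-aware codecs with their plain counterparts.

```python
import codecs

text = "abc"

# utf-16 prepends a BOM in the platform's native byte order.
utf16 = text.encode("utf-16")
utf16_has_bom = utf16[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# The explicit-endian variants write no BOM at all.
utf16_le = text.encode("utf-16-le")
utf16_be = text.encode("utf-16-be")

# utf-8-sig prepends the UTF-8 signature; plain utf-8 does not.
utf8_sig = text.encode("utf-8-sig")
utf8 = text.encode("utf-8")

# On decoding, utf-16 and utf-8-sig consume the BOM, while plain utf-8
# leaves it in the result as the character U+FEFF.
decoded_utf16 = utf16.decode("utf-16")
decoded_sig = utf8_sig.decode("utf-8-sig")
decoded_plain = utf8_sig.decode("utf-8")

print(utf16_has_bom, utf8_sig.startswith(codecs.BOM_UTF8))
print(repr(decoded_plain))
```

So if you do not want a BOM in the output, encode with utf-8 or one of
the explicit-endian utf-16 codecs; if you may get one in the input,
decode with utf-16 or utf-8-sig.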

  Serge



