Conversion of Unicode to ASCII NIGHTMARE

Roger Binns rogerb at rogerbinns.com
Tue Apr 4 22:42:35 EDT 2006


"ChaosKCW" <da.martian at gmail.com> wrote in message news:1144150561.389856.302670 at v46g2000cwv.googlegroups.com...
> me. As for SQLite supporting unicode, it probably does,

No, SQLite *ONLY* supports Unicode.  It will *only* accept
strings in Unicode and only produces strings in Unicode.
All the functionality built into SQLite such as comparison
operators operate only on Unicode strings.

> but something
> on the python side (probably in apsw) converts it to ascii at some
> point before it's handed to SQLite.

No.  APSW converts it *to* Unicode.  SQLite only accepts Unicode
so a Unicode string has to be supplied.  If you supply a non-Unicode
string then conversion has to happen.  APSW asks Python to
supply the string in Unicode.  If Python can't do that (e.g.
because it doesn't know the encoding) then you get an error.
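
To illustrate that failure, here is a minimal sketch in Python 3
syntax.  The helper name to_unicode is made up for this example and
is not part of APSW or SQLite; it only mimics the idea of asking
Python for a Unicode string when the encoding is assumed to be ASCII.

```python
# Hypothetical helper, not an APSW function: decode raw bytes to a
# Unicode string, failing loudly if they do not fit the encoding.
def to_unicode(raw: bytes, encoding: str = "ascii") -> str:
    return raw.decode(encoding)

# Pure ASCII input decodes without any extra information.
ok = to_unicode(b"hello")

# A byte above 127 cannot be decoded as ASCII, so Python raises.
try:
    to_unicode(b"caf\xe9")
    failed = False
except UnicodeDecodeError:
    failed = True

# Once the encoding is stated explicitly, the conversion succeeds.
fixed = to_unicode(b"caf\xe9", "latin-1")
```

The point is that the error comes from Python not knowing how to
interpret the bytes, not from SQLite rejecting Unicode.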

I strongly recommend reading this:

  The Absolute Minimum Every Software Developer Absolutely,
  Positively Must Know About Unicode and Character Sets

  http://www.joelonsoftware.com/articles/Unicode.html

> Ok if SQLite uses unicode internally why do you need to ignore
> everything greater than 127,

I never said that.  I said that a special case is made so that
if the string you supply contains only ASCII characters (i.e. every
byte is <= 127) then the ASCII string is converted to Unicode.
(Pure ASCII is already valid UTF-8, hence the shortcut.)
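
A small sketch of why that shortcut is safe, again in Python 3
syntax.  The sample string is arbitrary; the only claim is that
bytes in the range 0 to 127 decode identically as ASCII and as UTF-8.

```python
# Arbitrary sample consisting only of bytes <= 127.
ascii_bytes = b"SELECT name FROM people"

# Check that every byte is in the ASCII range.
is_ascii = all(b <= 127 for b in ascii_bytes)

# Decoding as ASCII and as UTF-8 yields the same Unicode string.
same_text = ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")
```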

> the ascii table (256 bit one) fits into
> unicode just fine as far as I recall?

No.  ASCII is a 7-bit character set, so there are only 128 ASCII
characters, numbered 0 through 127.  Each one has a defined Unicode
codepoint, and the ASCII character number just happens to be the
same as the Unicode codepoint.  Anything above 127 is not ASCII.
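
This can be checked directly.  The sketch below (Python 3 syntax)
decodes every single-byte value as ASCII and records which ones
succeed and whether the resulting codepoint matches the byte value.

```python
valid = []        # byte values that are legal ASCII
mismatched = []   # legal ASCII bytes whose codepoint differs (expected: none)

for n in range(256):
    try:
        ch = bytes([n]).decode("ascii")
    except UnicodeDecodeError:
        continue  # byte value is outside the ASCII range
    valid.append(n)
    if ord(ch) != n:
        mismatched.append(n)
```

After the loop, valid holds exactly the values 0 through 127 and
mismatched is empty.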

> Or did I miss the boat here ?

For bytes greater than 127, what character set is used?  There
are hundreds of character sets that define those characters.
You have to tell the computer which one to use.  See the Unicode
article referenced above.
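
As a concrete sketch (Python 3 syntax), here is one byte above 127
decoded under three character sets.  The three chosen here are just
examples picked for illustration; the same byte comes out as a
different character in each.

```python
raw = b"\xe9"  # a single byte with value 233

# The same byte, interpreted under three different character sets.
decoded = {enc: raw.decode(enc) for enc in ("latin-1", "cp437", "koi8-r")}

# Count how many distinct characters that one byte turned into.
distinct = len(set(decoded.values()))
```

Without being told which character set applies, there is no way to
pick the right one of these.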

Roger 




