sqlite utf8 encoding error

Thu Nov 17 16:35:42 EST 2005

Jarek Zgoda wrote:
> Fredrik Lundh wrote:
>
> >>UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
> >>unsupported Unicode code range
> >>
> >>does anyone have any idea on what could be going wrong?  The string
> >>that I store in the database table is:
> >>
> >>'Keinen Text für Übereinstimmungsfehler gefunden'
> >
> > $ more test.py
> > # -*- coding: iso-8859-1 -*-
> > u = u'Keinen Text für Übereinstimmungsfehler gefunden'
> > s = u.encode("iso-8859-1")
> > u = s.decode("utf-8") # <-- this gives an error
> >
> > $ python test.py
> > Traceback (most recent call last):
> >   File "test.py", line 4, in ?
> >     u = s.decode("utf-8") # <-- this gives an error
> >   File "lib/encodings/utf_8.py", line 16, in decode
> >     return codecs.utf_8_decode(input, errors, True)
> > UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
> > unsupported Unicode code range
>
> I can't wait for the moment when encoded strings go away from Python.
> The more I program in this language, the more confusion this difference
> causes. Most functions and object methods now accept both
> strings and unicode, making it hard to find the sources of Unicode*Errors.
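
For readers following the quoted traceback, here is a minimal sketch of the
mismatch and its fix. It uses Python 3 syntax rather than the Python 2 of the
thread, and it assumes the library decodes stored byte strings as UTF-8; the
variable names are illustrative only.

```python
# Sketch only: Python 3 syntax, variable names are illustrative.
text = 'Keinen Text für Übereinstimmungsfehler gefunden'

latin1_bytes = text.encode('iso-8859-1')  # what the failing script produced
utf8_bytes = text.encode('utf-8')         # what a UTF-8 decoder expects

try:
    latin1_bytes.decode('utf-8')
    error_position = None
except UnicodeDecodeError as exc:
    # The Latin-1 byte for 'ü' (0xFC) is not valid UTF-8, so decoding
    # fails at the offset of that character.
    error_position = exc.start

# Decoding with the encoding that was actually used recovers the text,
# and bytes that really are UTF-8 round-trip without error.
recovered = latin1_bytes.decode('iso-8859-1')
round_trip = utf8_bytes.decode('utf-8')
```

The failure offset is 13, the index of the "ü", which matches the start of
the range reported in the traceback above.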

Library writers can speed up the transition by hiding the 8-bit interface,
for example:

import sqlite
sqlite.I_promise_to_pass_8bit_string_only_in_utf8_encoding(my_signature="sig.gif")

If you don't call this function, 8-bit strings will not be accepted :)
IMHO, if libraries keep accepting both str and unicode until Python
3.0, it will just prolong the confusion of Unicode newbies instead of
guiding them in the right direction _right now_.
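
A rough sketch of what such an opt-in could look like, written in Python 3
syntax (str for text, bytes for 8-bit data). The class TextOnlyStore and its
methods are hypothetical names made up for illustration; this is not the API
of the sqlite module or of any real library.

```python
class TextOnlyStore:
    """Hypothetical stand-in for a database wrapper (not the real sqlite module)."""

    def __init__(self):
        self._accept_utf8_bytes = False  # 8-bit input is off by default
        self._rows = []

    def promise_utf8_bytes(self):
        """Explicit opt-in: the caller asserts that all bytes values are UTF-8."""
        self._accept_utf8_bytes = True

    def insert(self, value):
        if isinstance(value, str):
            self._rows.append(value)
            return
        if isinstance(value, bytes):
            if not self._accept_utf8_bytes:
                raise TypeError(
                    "bytes rejected: pass str or call promise_utf8_bytes() first"
                )
            # Strict decoding, so a broken promise fails at the call site
            # instead of later, when the row is read back.
            self._rows.append(value.decode("utf-8"))
            return
        raise TypeError("expected str or bytes, got %s" % type(value).__name__)

    def rows(self):
        return list(self._rows)
```

With this shape, passing bytes without the opt-in raises TypeError
immediately, and passing Latin-1 bytes after the opt-in raises
UnicodeDecodeError at insert time, so the error points at the code that
produced the badly encoded data.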