sqlite3 decode error

Tue Nov 8 15:49:48 EST 2005

On Tue, 08 Nov 2005 16:27:25 -0400, David Pratt <fairwinds at eastlink.ca> wrote:
>Recently I have run into an issue with sqlite where I encode strings
>going into sqlite3 as utf-8.  I guess by default sqlite3 is converting
>this to unicode since when I try to decode I get an attribute error
>like this:
>
>AttributeError: 'unicode' object has no attribute 'decode'
>
>The code and data I am preparing is to work on postgres as well as
>sqlite so there are a couple of things I could do.  I could always
>store any data as unicode to any db, or test the data to determine
>whether it is a string or unicode type when it comes out of the
>database so I can deal with this possibility without errors. I will
>likely take the first option but I am looking for a simple test to
>determine my object type.
>
>if I do:
>
> >>>type('maybe string or maybe unicode')
>
>I get this:
>
> >>><type 'unicode'>
>
>I am looking for something that I can use in a comparison.
>
>How do I get the type as a string for comparison so I can do something
>like
>
>if type(some_data) == 'unicode':
>	do some stuff
>else:
>	do something else
>

You don't actually want the type as a string.  What you seem to be leaning towards is the builtin function "isinstance":

    if isinstance(some_data, unicode):
        # some stuff
    elif isinstance(some_data, str):
        # other stuff
    ...
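
As a rough sketch of how that check might be wrapped up, here is a small helper that normalizes a value to text. The name "to_text" and the utf-8 default are made up for illustration, and it assumes Python 2.6 or later (where "bytes" is an alias for "str") or Python 3 (where text is "str" and byte strings are "bytes"):

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it first if it is a byte string.

    Hypothetical helper: the name and the utf-8 default are assumptions.
    """
    if isinstance(value, bytes):
        # Byte string: decode it into text using the given encoding.
        return value.decode(encoding)
    # Anything else (already-decoded text, in particular) is passed through.
    return value


from_bytes = to_text(b"caf\xc3\xa9")  # utf-8 encoded bytes
from_text = to_text(u"caf\xe9")       # already text, returned unchanged
```

Both calls end up with the same text value, so the code after the helper never has to call decode() on something that is already text.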

But I think what you actually want is to be slightly more careful about what you place into SQLite3.  If you are storing text data, insert it as a Python unicode string (with no NUL bytes, unfortunately - this is a bug in SQLite3, or maybe the Python bindings, I forget which).  If you are storing binary data, insert it as a Python buffer object (e.g., buffer('1234')).  When you take text data out of the database, you will get unicode objects.  When you take bytes out, you will get buffer objects (which you can convert to str objects with str()).
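
The same round trip can be sketched against the sqlite3 module in the standard library. Note that this is written for Python 3, which is an assumption on my part: there the text type is "str" (the old "unicode") and binary data goes in and comes out as "bytes" rather than as buffer objects. The "round_trip" helper and the "demo" table are made up for the example:

```python
import sqlite3


def round_trip(value):
    """Store one value in an in-memory SQLite database and read it back.

    Hypothetical helper; the table and column names are illustrative only.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # No declared column type, so SQLite stores each value as given.
        conn.execute("CREATE TABLE demo (payload)")
        conn.execute("INSERT INTO demo (payload) VALUES (?)", (value,))
        (stored,) = conn.execute("SELECT payload FROM demo").fetchone()
        return stored
    finally:
        conn.close()


text_out = round_trip(u"caf\xe9")         # text in, text out
blob_out = round_trip(b"\x00\x01binary")  # bytes in, bytes out
```

Text comes back as text and binary data comes back as bytes, so nothing needs to be decoded after the query.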

You may want to look at Axiom (<http://divmod.org/trac/wiki/DivmodAxiom>) to see how it handles each of these cases.  In particular, look at the "text" and "bytes" types defined in the attributes module (<http://divmod.org/trac/browser/trunk/Axiom/axiom/attributes.py>).

By encoding and decoding only at the border between your application and the outside world, and at the border between your application and the data, you will eliminate a whole class of bugs where encodings are forgotten, or encoded strings are accidentally combined with unicode strings.
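
A minimal sketch of that discipline, with made-up function names and utf-8 assumed as the encoding used by the outside world: bytes are decoded once on the way in, everything in between works on text only, and the result is encoded once on the way out.

```python
def read_from_outside(raw_bytes, encoding="utf-8"):
    """Border, inbound: turn bytes from the outside world into text."""
    return raw_bytes.decode(encoding)


def write_to_outside(text, encoding="utf-8"):
    """Border, outbound: turn text back into bytes for the outside world."""
    return text.encode(encoding)


def greet(name):
    """Application logic: sees text only and never encodes or decodes."""
    return u"Hello, " + name


incoming = b"Ren\xc3\xa9"            # utf-8 bytes, as if read from a socket or file
name = read_from_outside(incoming)   # decoded exactly once
outgoing = write_to_outside(greet(name))  # encoded exactly once
```

Because greet() only ever receives text, there is no place inside the application where an encoded string can be mixed with a unicode one.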

Hope this helps,

Jean-Paul