sqlite3 decode error

Tue Nov 8 18:30:26 EST 2005

Hi Jean-Paul, thanks for some really good advice.  I'll take a look at 
the project to see how this is handled.  I was not aware of your wrapper 
project for SQLite, so this is something new to look at too.  I have 
worked with SQLObject and also Django's db wrappers.  In fact, this 
question came out of an SQLObject implementation in RDFlib; that is 
where I discovered the issue in the way this backend behaves with 
SQLite3.  I have got it working now, but I am only starting to warm to 
the idea of unicode throughout.  For example, the backend code I am 
trying to work with contains the helpers shown below.
_tokey is a helper for bringing things into the relational database; 
_fromkey is a helper for extracting data from the database.  
Commenting out the .decode("UTF-8") call and the 
value = value.decode("UTF-8") line allowed me to get this working, but 
I need to make this work with unicode.  My unicode experience is 
limited, and I am confused about writing unicode-compatible 
replacements for things like:
return '<%s>' % ''.join(splituri(term.encode("UTF-8")))

def splituri(uri):
     if uri.startswith('<') and uri.endswith('>'):
         uri = uri[1:-1]
     if uri.startswith('_'):
         uid = ''.join(uri.split('_'))
         return '_', uid
     if '#' in uri:
         ns, local = rsplit(uri, '#', 1)
         return ns + '#', local
     if '/' in uri:
         ns, local = rsplit(uri, '/', 1)
         return ns + '/', local
     return NO_URI, uri

def _fromkey(key):
     if key.startswith("<") and key.endswith(">"):
         key = key[1:-1].decode("UTF-8")     ## Fails here when data extracted from database
         if key.startswith("_"):
             key = ''.join(splituri(key))
             return BNode(key)
         return URIRef(key)
     elif key.startswith("_"):
         return BNode(key)
     else:
         m = _literal.match(key)
         if m:
             d = m.groupdict()
             value = d["value"]
             value = unquote(value)
             value = value.decode("UTF-8")     ## Fails here when data extracted from database
             lang = d["lang"] or ''
             datatype = d["datatype"]
             return Literal(value, lang, datatype)
         else:
             msg = "Unknown Key Syntax: '%s'" % key
             raise Exception(msg)

def _tokey(term):
     if isinstance(term, URIRef):
         term = term.encode("UTF-8")
         if not '#' in term and not '/' in term:
             term = '%s%s' % (NO_URI, term)
         return '<%s>' % term
     elif isinstance(term, BNode):
         return '<%s>' % ''.join(splituri(term.encode("UTF-8")))
     elif isinstance(term, Literal):
         language = term.language
         datatype = term.datatype
         value = quote(term.encode("UTF-8"))
         if language:
             language = language.encode("UTF-8")
             if datatype:
                 datatype = datatype.encode("UTF-8")
                 n3 = '"%s"@%s&<%s>' % (value, language, datatype)
             else:
                 n3 = '"%s"@%s' % (value, language)
         else:
             if datatype:
                 datatype = datatype.encode("UTF-8")
                 n3 = '"%s"&<%s>' % (value, datatype)
             else:
                 n3 = '"%s"' % value
         return n3
     else:
         msg = "Unknown term Type for: %s" % term
         raise Exception(msg)
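
One possible way to keep _fromkey working whether the database driver 
hands back byte strings or unicode is to decode only when the value is 
still bytes.  This is a minimal sketch, not the RDFlib code: the names 
to_text and strip_brackets are hypothetical helpers made up for 
illustration, and UTF-8 is assumed as the stored encoding.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding only if it is a byte string.

    Hypothetical helper (not part of RDFlib or sqlite3).
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already text (what sqlite3 returns for TEXT columns): leave it alone.
    return value


def strip_brackets(key):
    """Normalise a '<...>' key to text and drop the angle brackets."""
    key = to_text(key)
    if key.startswith(u"<") and key.endswith(u">"):
        return key[1:-1]
    return key


# Works the same for a byte string and for a unicode string.
from_bytes = strip_brackets(b"<http://example.org/ns#caf\xc3\xa9>")
from_text = strip_brackets(u"<http://example.org/ns#caf\xe9>")
```

With something like this, the two failing .decode("UTF-8") calls could 
become to_text(...) calls, so the same code path would serve both 
SQLite and Postgres.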

In an unrelated question, it appears SQLite is also extremely flexible 
about what types of data it can contain.  When writing SQL in Postgres 
I use the timestamp type and can use this in SQLite as well.  In my work 
with Django, the same information is mapped to the datetime type.  Would 
you be inclined to recommend the use of one type over the other?  If so, 
can you explain the rationale for this choice?  Many thanks.
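
A small experiment can show how SQLite treats the two declared type 
names.  This is only a sketch: the events table and its column names 
are made up for illustration, and it uses SQLite's built-in typeof() 
SQL function to report how each value was actually stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: one column declared "timestamp", one "datetime".
conn.execute("CREATE TABLE events (created timestamp, updated datetime)")

# Store the same moment first as ISO-style text, then as a Unix epoch integer.
conn.execute(
    "INSERT INTO events VALUES (?, ?)",
    ("2005-11-08 18:30:26", "2005-11-08 18:30:26"),
)
conn.execute("INSERT INTO events VALUES (?, ?)", (1131492626, 1131492626))

# typeof() reports the storage class SQLite chose for each stored value.
storage_classes = conn.execute(
    "SELECT typeof(created), typeof(updated) FROM events ORDER BY rowid"
).fetchall()
conn.close()

print(storage_classes)
```

In this sketch both columns report the same storage class for the same 
value, which suggests that on the SQLite side the choice between the 
two names matters mostly to whatever layer sits above the database.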

Regards,
David

On Tuesday, November 8, 2005, at 04:49 PM, Jean-Paul Calderone wrote:

> On Tue, 08 Nov 2005 16:27:25 -0400, David Pratt 
> <fairwinds at eastlink.ca> wrote:
>> Recently I have run into an issue with sqlite where I encode strings
>> going into sqlite3 as utf-8.  I guess by default sqlite3 is converting
>> this to unicode since when I try to decode I get an attribute error
>> like this:
>>
>> AttributeError: 'unicode' object has no attribute 'decode'
>>
>> The code and data I am preparing is to work on postgres as well as
>> sqlite so there are a couple of things I could do.  I could always
>> store any data as unicode to any db, or test the data to determine
>> whether it is a string or unicode type when it comes out of the
>> database so I can deal with this possibility without errors. I will
>> likely take the first option but I am looking for a simple test to
>> determine my object type.
>>
>> if I do:
>>
>>>>> type('maybe string or maybe unicode')
>>
>> I get this:
>>
>>>>> <type 'unicode'>
>>
>> I am looking for something that I can use in a comparison.
>>
>> How do I get the type as a string for comparison so I can do something
>> like
>>
>> if type(some_data) == 'unicode':
>> 	do some stuff
>> else:
>> 	do something else
>>
>
> You don't actually want the type as a string.  What you seem to be 
> leaning towards is the builtin function "isinstance":
>
>     if isinstance(some_data, unicode):
>         # some stuff
>     elif isinstance(some_data, str):
>         # other stuff
>     ...
>
> But I think what you actually want is to be slightly more careful 
> about what you place into SQLite3.  If you are storing text data, 
> insert it as a Python unicode string (with no NUL bytes, unfortunately 
> - this is a bug in SQLite3, or maybe the Python bindings, I forget 
> which).  If you are storing binary data, insert it as a Python buffer 
> object (eg, buffer('1234')).  When you take text data out of the 
> database, you will get unicode objects.  When you take bytes out, you 
> will get buffer objects (which you can convert to str objects with 
> str()).
>
> You may want to look at Axiom 
> (<http://divmod.org/trac/wiki/DivmodAxiom>) to see how it handles each 
> of these cases.  In particular, the "text" and "bytes" types defined 
> in the attributes module 
> (<http://divmod.org/trac/browser/trunk/Axiom/axiom/attributes.py>).
>
> By only encoding and decoding at the border between your application 
> and the outside world, and the border between your application and the 
> data, you will eliminate the possibility for a class of bugs where 
> encodings are forgotten, or encoded strings are accidentally combined 
> with unicode strings.
>
> Hope this helps,
>
> Jean-Paul