[Baypiggies] Fwd: [sugar] Database Activity
Shannon -jj Behrens
jjinux at gmail.com
Mon Mar 10 16:57:34 CET 2008
On Mon, Mar 10, 2008 at 12:12 AM, Kelly Yancey <kelly at nttmcl.com> wrote:
> Edward Cherlin wrote:
> > Python UTF-8 issue. Can anybody help?
> >
>
> With no code, hence relying on my psychic debugging prowess....
>
>
> > Question 1: Why does pysqlite convert the utf-8 string to latin-1 in
> > the *query process* when my sqlite settings are for utf-8?
> > i.e., sqlite.h macro: #define SQLITE_UTF8 1
>
> It doesn't*. The poster's code did (or a library he is calling did).
> See answer to Question 2.
>
> * Unless you set the text_factory to something other than unicode,
> that is.
>
> http://oss.itsystementwicklung.de/download/pysqlite/doc/usage-guide.html#text_factory
>
>
> >
> > Question 2: Does the pysqlite user (programmer) have any control
> > over this query/conversion? (I see nothing in the Python Database
> > API, or the pysqlite/sqlite documentation, about either automatic or
> > specified conversions. Also, this conversion does not occur using
> > the sqlite3 client.)
> >
>
> Yes, see footnote to Question 1.
> (I guess he didn't read the documentation very closely. :( )
>
> My experience is that if a unicode string is recorded to the
> database, it will be returned as a unicode string.
> Now if you use that unicode object in a string context somewhere,
> python will try to convert it based on your default encoding. Which the
> original poster states is "ascii". Look fishy? I think so. Note that
> the error is a Unicode*Encode*Error, not a Unicode*Decode*Error. The
> string returned by pysqlite was unicode, the poster's code is trying to
> encode it into a non-unicode character set (ascii) somewhere.
>
>
> > Question 3: Is this a pysqlite bug or a feature, and why?"
>
> If my hunch is right, then it has nothing to do with pysqlite. It is
> a python feature. The original poster has a bug in their code. No
> surprises here.
>
> How to confirm if my hunch is right:
>
> 1. Set the default encoding to 'utf-8'. Run test program. Does it
> die? (The original poster seems to recognize that this would be a
> good way to identify where the problem lies but then refuses to
> actually try it?!?)
> 2. If the UnicodeEncodeError goes away, congratulations you have
> identified that the bug is somewhere in your code. You got a
> perfectly-good unicode object back from pysqlite and
> erroneously used it in a non-unicode string context. Python
> tried to convert for you. But you asked it to convert to 'ascii'
> which it can't do. You got an exception. Please do not post
> your code for us to find your bug for you.
> 3. Set the default encoding back to 'ascii' if you like.
> 4. Look a little higher up in your traceback to find *your* code.
> That is where you are erroneously using the unicode object in
> a string context. You may be calling a third-party API that
> expects strings rather than unicode objects. Read the docs. Fix
> your code. Rinse and repeat until all of your UnicodeEncodeError
> exceptions go away.
> 5. Audit your code for further similar bugs.
>
> The worst part about this question is that Christian Boos already
> gave the same answer (albeit more succinctly)!
> http://osdir.com/ml/python.db.pysqlite.user/2006-04/msg00021.html
>
> Kelly
>
> P.S. I was feeling pretty generous when I started my free psychic
> debugging session. But I swear I can't help but think this
> was all a ploy to boost osdir's ad views and now I'm quite
> irritable.
>
> --
> Kelly Yancey
> http://kbyanc.blogspot.com/
>
>
>
> >
> > ---------- Forwarded message ----------
> > From: 7150 <linux.1 at litenverden.com>
> > Date: Sat, Mar 8, 2008 at 4:02 PM
> > Subject: Re: [sugar] Database Activity
> > To: sugar at lists.laptop.org
> >
> > This will be my last post on this topic until I learn a bit more.
> >
> > It appears that pysqlite has some data integrity issues.
> >
> > Interesting link to pysqlite "bug" discussion:
> > http://osdir.com/ml/python.db.pysqlite.user/2006-04/msg00020.html
> >
> > ---
> >
> > "I have built sqlite databases containing utf-8 encoded text. When I
> > access them using pysqlite, utf-8 codepoints appear to be converted to
> > latin-1."
> >
> > It does this.
> >
> > How to stop it?
> >
> > An answer on the list was
> > (http://osdir.com/ml/python.db.pysqlite.user/2006-04/msg00022.html):
> >
> > > This should be:
> > >
> > > req.write(elem.encode('utf-8'))
> > >
> > > write() expects a str object, so what happened
> > > in your code was an implicit conversion of your
> > > unicode object to a str, doing the encoding using
> > > the default system encoding, here 'ascii'.
> >
> > The req.write stuff is Greek to me, but I'll see what I can find out
> > about it.
> >
> > Then from the same post:
> >
> > "So, upon reflection, my situation boils down to this:
> >
> > (1) A utf-8 encoded string in an sqlite database is queried using
> > pysqlite which returns a latin-1 string (a
> > gratuitous/silent/unspecified/unrequested conversion).
> >
> > . . .
> >
> > pysqlite QUESTIONS:
> >
> > Question 1: Why does pysqlite convert the utf-8 string to latin-1 in the
> > *query process* when my sqlite settings are for utf-8?
> > i.e., sqlite.h macro: #define SQLITE_UTF8 1
> >
> > Question 2: Does the pysqlite user (programmer) have any control over
> > this query/conversion? (I see nothing in the Python Database API, or
> > the pysqlite/sqlite documentation, about either automatic or specified
> > conversions. Also, this conversion does not occur using the sqlite3
> > client.)
> >
> > Question 3: Is this a pysqlite bug or a feature, and why?"
> >
> > The group had no further answer for the guy.
> >
> > ---
> >
> > Someone else has noted that the SQLite3 client does not transform the
> > data. If I learn how to query my UTF-8 database on the XO, I'll let you
> > know. But, it's not an XO problem.
I didn't actually look at the problem like Kelly did, but I can summarize:
* When talking to a database, you have to make sure that the database
itself is using the right encoding and that the connection to the
database is using the right encoding. For MySQLdb, I like to
configure it to use UTF-8 and have it automatically take care of
encoding and decoding for me.
* Remember not to call str() on unicode objects; that implicitly encodes
them with the default encoding, which is usually ASCII.
* Call print repr(obj) on whatever object you get back from the query
to see whether you have a string "foo" or a unicode object u"foo".
* Read http://wiki.pylonshq.com/display/pylonsdocs/Unicode. At some
point, a few of us at the Pylons project did our best to explain this
stuff really well.
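
The points above can be tried out with a small sketch. It is an illustration only, not the original poster's code (which was never shown). It assumes the standard library sqlite3 module (the descendant of pysqlite) and is written for Python 3, where text is always unicode and nothing is encoded implicitly, so the ASCII encode that Python 2 would have attempted behind the scenes is spelled out by hand. The table name "notes" and the helper round_trip are made up for the example.

```python
import sqlite3


def round_trip(text):
    """Store text in an in-memory SQLite database and read it back.

    The table name "notes" is a hypothetical name used for illustration.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE notes (body TEXT)")
        conn.execute("INSERT INTO notes (body) VALUES (?)", (text,))
        (value,) = conn.execute("SELECT body FROM notes").fetchone()
        return value
    finally:
        conn.close()


original = "caf\u00e9"  # contains one non-ASCII character
fetched = round_trip(original)

# Inspect what actually came back from the query.
print(repr(fetched))

# Encoding explicitly to UTF-8 at the output boundary works.
utf8_bytes = fetched.encode("utf-8")

# Encoding to ASCII fails. In Python 2 this step happened implicitly
# when a unicode object was used where a byte string was expected.
try:
    fetched.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

print(type(ascii_error).__name__)
```

If the value survives the round trip unchanged and only the ASCII encode fails, the database layer is not the culprit; the fix is to encode explicitly (for example to UTF-8) wherever the text leaves your program.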
Best Regards,
-jj
--
I, for one, welcome our new Facebook overlords!
http://jjinux.blogspot.com/