SQlite none english char

Gerhard Häring gh at ghaering.de
Mon Jun 9 04:06:12 EDT 2008


Gandalf wrote:
> [...] 
> I solved the problem by entering data manually but now the problem is
> that i had to delete this function:
> con.text_factory = str
> and now I can't see the data that I entered thought my sqlite manager

Then apparently there is still non-UTF-8 data in the database. Perhaps 
SQLite Manager allows you to insert invalid data as well?

Try this:

con.text_factory = lambda s: unicode(s, "utf-8", "replace")

This will decode to UTF-8 as good as possible, and for non-decodable 
characters you will get a REPLACEMENT CHARACTER instead.

To demonstrate:

 >>> result = con.execute("select ? || ? || ?", ('text ok ', chr(230), ' 
ok, too')).fetchone()[0]

The chr(230) is just random garbage that isn't valid UTF-8:

 >>> print result
text ok �k, too

As you can see, there is now a strange character, but the second 'ok, 
too' got messed up :-/ But at least you can then find out which rows in 
the database have messed up data. You can then iterate over all rows 
with something like:

Then, assuming the text to check for validity is in result, you can do 
something like:

 >>> import unicodedata
 >>> unicodedata.lookup("REPLACEMENT CHARACTER") in result
True

Does this help?

-- Gerhard

PS: This thread reinforces my believe that I have to make it harder for 
users of pysqlite to make themselves shoot in the foot with non-UTF-8 data.




More information about the Python-list mailing list