Psycopg and queries with UTF-8 data

Alban Hertroys alban at magproductions.nl
Thu Oct 14 07:24:00 EDT 2004


Diez B. Roggisch wrote:
> Alban Hertroys wrote:
>>I have a query that inserts data originating from an utf-8 encoded XML
>>file. And guess what, it contains utf-8 encoded characters...
>>Now my problem is that psycopg will only accept queries of type str, so
>>how do I get my utf-8 encoded data into the DB?
> 
> 
> This sounds like the usual unicode/utf-8 confusion: unicode is an abstract
> specification of characters, utf-8 as well as latin1 and ascii are
> encodings of that specification that allow for certain characters to be
> used - namely, ascii only for the well-known first 128, latin1 for some major
> European languages, and utf-8 defines multi-byte sequences for all possible
> characters defined in unicode - with the result that some characters no
> longer take just one byte.
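A minimal sketch of that point, assuming Python 3 (where `str` plays the role of Python 2's `unicode` and `bytes` the role of Python 2's `str`). The helper name `encoded_lengths` is hypothetical. It encodes the same abstract string with three codecs and reports how many bytes each needs.

```python
# Python 3 sketch: one abstract string, three different byte encodings.
# (Python 3's str corresponds to Python 2's unicode; bytes to Python 2's str.)

def encoded_lengths(text):
    """Return {codec: number of bytes}, or None where the codec cannot encode text."""
    lengths = {}
    for codec in ("utf-8", "latin-1", "ascii"):
        try:
            lengths[codec] = len(text.encode(codec))
        except UnicodeEncodeError:
            # ascii only covers code points 0-127, so 'é' (U+00E9) fails here
            lengths[codec] = None
    return lengths

word = "caf\u00e9"  # 'café': 4 characters, the last one outside ASCII
print(len(word), encoded_lengths(word))
# 4 {'utf-8': 5, 'latin-1': 4, 'ascii': None}
```

Four characters become five bytes in utf-8 and four in latin1, while ascii cannot represent the string at all.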

Ah, I see now. I _thought_ it was odd that unicode('string') resulted in 
a unicode object and 'string'.encode('utf-8') did not. I understand now 
that 'unicode' is actual unicode data, while utf-8 _encoded_ data is 
really a plain byte string, in which each special (non-ASCII) character 
is written as a multi-byte utf-8 sequence instead of a single unicode 
character.
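The same observation translated to Python 3, as a small sketch (in Python 3 the old `unicode` type is `str` and the old byte `str` is `bytes`):

```python
# Python 3 equivalent of the Python 2 observation above:
# unicode -> str (characters), utf-8 encoded str -> bytes.
text = "caf\u00e9"              # characters: c, a, f, é
encoded = text.encode("utf-8")  # bytes: b'caf\xc3\xa9'

print(type(text).__name__, len(text))        # str 4
print(type(encoded).__name__, len(encoded))  # bytes 5
print(encoded.decode("utf-8") == text)       # True: decoding reverses encoding
```

The 'é' is not escaped; it simply becomes the two bytes `0xC3 0xA9`.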

Thanks for clearing up my confusion.

> Please do read a tutorial on unicode and python - there are several good
> ones out there, use google to your advantage.

I did, though some time ago. Apparently I missed the point being made 
(or forgot about it).

> Confusion again - please repeat: 
> 
> unicode is not utf-8!!!
> unicode is not utf-8!!!
> unicode is not utf-8!!!
> unicode is not utf-8!!!

while confused():
	print "unicode is not utf-8!!!"

> Do encode the unicode object in utf-8, and pass that to psycopg. If you
> set client_encoding to latin1, you have to encode the unicode object to that.
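A sketch of that advice, assuming Python 3. `PG_CODECS` and `encode_for_client` are hypothetical names, not part of psycopg; only the PostgreSQL encoding names `UTF8` and `LATIN1` are real. (Recent psycopg2 releases on Python 3 accept `str` parameters and encode them using the connection's encoding themselves, so this mainly shows what the manual step amounts to.)

```python
# Hypothetical helper: map a PostgreSQL client_encoding name to a Python codec
# and encode text for it. Not part of psycopg.
PG_CODECS = {
    "UTF8": "utf-8",      # PostgreSQL's name for UTF-8
    "LATIN1": "latin-1",  # ISO 8859-1
}

def encode_for_client(value, client_encoding):
    """Encode a text value into the byte encoding named by client_encoding."""
    codec = PG_CODECS[client_encoding.upper()]
    # The default errors="strict" raises UnicodeEncodeError rather than
    # silently mangling characters the target encoding cannot represent.
    return value.encode(codec)

title = "Caf\u00e9 \u20ac5"  # 'Café €5'
print(encode_for_client(title, "UTF8"))
try:
    encode_for_client(title, "LATIN1")
except UnicodeEncodeError:
    print("latin1 has no euro sign; pick another client_encoding")
```

The strict error mode is the design choice worth keeping: a failed encode is easier to debug than silently corrupted rows.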

I suppose I won't notice much of that until I read from the DB (which is 
done in PHP mostly), as the data inserted is already a plain byte string 
by itself (with utf-8 encoded characters, though). I'll worry about that 
later ;)
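What "worrying about it later" looks like, sketched in Python 3 rather than PHP: the stored bytes are not ASCII, and whoever reads them must decode with the same encoding they were written in.

```python
# Python 3 sketch: bytes written as UTF-8 must be decoded as UTF-8 on read.
stored = "caf\u00e9".encode("utf-8")  # what actually reaches the DB: b'caf\xc3\xa9'

# Not ASCII: the 'é' became two bytes, both >= 128.
print(all(b < 128 for b in stored))   # False

right = stored.decode("utf-8")    # 'café'
wrong = stored.decode("latin-1")  # 'cafÃ©': each UTF-8 byte read as its own character
print(right, wrong)
```

The 'Ã©' pattern is the usual symptom of a reader assuming latin1 for data that was written as utf-8.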

Many thanks,
Alban.