Psycopg and queries with UTF-8 data

Diez B. Roggisch deetsNOSPAM at web.de
Thu Oct 14 07:55:34 EDT 2004


> Ah, I see now. I _thought_ it was odd that unicode('string') resulted in
> a unicode object and 'string'.encode('utf-8') did not. I understand now
> that 'unicode' is data that is actual unicode data, while 'utf-8'
> _encoded_ data is really a string, but with special characters rewritten
> to specify utf-8 escape sequences instead of the actual unicode bytes.

Exactly.
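
A minimal sketch of that distinction, assuming Python 3 naming, where `str` holds unicode text and `bytes` holds encoded data (in the Python 2 of this thread the same roles are played by `unicode` and `str`). The sample word is only an illustration:

```python
# Unicode text: a sequence of code points, independent of any byte encoding.
text = "Gr\u00fc\u00dfe"            # G, r, u-umlaut, sharp s, e

# UTF-8 is one possible byte representation of that text.
encoded = text.encode("utf-8")

print(type(text).__name__, len(text))        # counts code points
print(type(encoded).__name__, len(encoded))  # counts bytes
print(encoded)

# Decoding with the same codec gives the original text back.
print(encoded.decode("utf-8") == text)
```

The two non-ASCII characters take two bytes each in utf-8, which is why the byte string is longer than the text.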

> 
> Thanks for clearing out my confusion.

You're welcome.
 
> while confused():
>     print "unicode is not utf-8!!!"

Let's hope confused() is True only for a short time, otherwise you'll end up
with quite a lot of output...

>> Encode the unicode object in utf-8, and pass that to psycopg. If you
>> set client_encoding to latin1, you have to encode the unicode object
>> to latin1 instead.
> 
> I suppose I won't notice much of that until I read from the DB (which is
> done in PHP mostly), as the data inserted is already an ascii string by
> itself (with escaped utf-8 characters, though). I'll worry about that
> later ;)
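
The encoding step itself can be sketched like this, assuming Python 3 and leaving the database call out entirely. The mapping table and the helper `encode_for_client` are hypothetical names made up for illustration, not part of psycopg:

```python
# Hypothetical mapping from PostgreSQL client_encoding names to Python codecs.
PG_TO_PYTHON_CODEC = {"UTF8": "utf-8", "LATIN1": "latin-1"}


def encode_for_client(text, client_encoding):
    """Encode unicode text into the bytes the connection expects."""
    codec = PG_TO_PYTHON_CODEC[client_encoding.upper()]
    return text.encode(codec)


name = "Gr\u00fc\u00dfe"

as_utf8 = encode_for_client(name, "UTF8")
as_latin1 = encode_for_client(name, "LATIN1")

# Same text, different bytes: the encoding has to match client_encoding.
print(as_utf8)
print(as_latin1)
```

Sending the utf-8 bytes over a connection set to latin1 (or the other way round) would store the wrong characters, which is the point of matching the two.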

Well, AFAIK PHP doesn't care about unicode - all it knows are strings as
byte sequences, plain old C-style. So if you read from the DB, things should
work if you set your HTTP headers correctly _and_ the other parts of your
HTML page aren't in a different encoding - so make sure your editor of
choice saves them as utf-8.
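
A rough sketch of why the declared charset and the actual bytes have to agree, again assuming Python 3. The page fragment and the header string are illustrative values, and the decoding here only imitates what a browser would do:

```python
# A page fragment as unicode text, and the utf-8 bytes actually sent.
page_text = "<p>Gr\u00fc\u00dfe</p>"
body = page_text.encode("utf-8")

# The header tells the reader which codec to use for the body.
header = "Content-Type: text/html; charset=utf-8"
declared = header.split("charset=")[1]

# Decoding with the declared charset recovers the text.
matched = body.decode(declared)
print(matched == page_text)

# Decoding the same bytes with a different charset garbles the
# non-ASCII characters, even though no error is raised.
mismatched = body.decode("latin-1")
print(mismatched == page_text)
```

Since PHP just passes the bytes through, the same reasoning applies to whatever comes out of the DB and to the static parts of the page.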


-- 
Regards,

Diez B. Roggisch


