How to display Chinese in a list retrieved from database via python

Mon Dec 29 14:19:48 EST 2008

"zxo102" <zxo102 at gmail.com> wrote in message 
news:7e38e76a-d5ee-41d9-9ed5-73a2e2993733 at w1g2000prm.googlegroups.com...
> On Dec 29, 5:06 pm, "Mark Tolonen" <metolone+gm... at gmail.com> wrote:
>> "zxo102" <zxo... at gmail.com> wrote in message
>>
>> news:2560a6e0-c103-46d2-aa5a-8604de4d1968 at b38g2000prf.googlegroups.com...
>>

[snip]

>> That said, learn to use Unicode strings by trying the following program,
>> but set the first line to the encoding *your editor* saves files in.  You
>> can use the actual Chinese characters instead of escape codes this way.
>> The encoding used for the source code and the encoding used for the html
>> file don't have to match, but the charset declared in the file and the
>> encoding used to write the file *do* have to match.
>>
>> # coding: utf8
>>
>> import codecs
>>
>> mydict = {}
>> mydict['JUNK'] = [u'中文',u'中文',u'中文']
>>
>> def conv_list2str(value):
>>     return u'["' + u'","'.join(s for s in value) + u'"]'
>>
>> f_str = u'''<html><head>
>> <META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=gb2312">
>> <title>test</title>
>> <script language=javascript>
>> var test = %s
>> alert(test[0])
>> alert(test[1])
>> alert(test[2])
>> </script>
>> </head>
>> <body></body></html>'''
>>
>> s = conv_list2str(mydict['JUNK'])
>> f=codecs.open('test04.html','wt',encoding='gb2312')
>> f.write(f_str % s)
>> f.close()
>>
>> -Mark
>>
>> P.S.  Python 3.0 makes this easier for what you want to do, because the
>> representation of a dictionary changes.  You'll be able to skip the
>> conv_list2str() function and all strings are Unicode by default.
>
> Thanks for your comments, Mark. I understand it now. The list (escape
> codes): ['\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4'] is
> from a postgresql database with a "select" statement. I will check the
> postgresql database configuration and see if it is possible to return
> ['中文','中文','中文'] directly with a "select" statement.
>
> ouyang

The trick to working with Unicode is to convert anything read into the
program (from a file, database, etc.) to Unicode characters, manipulate it,
then convert it back to a specific encoding when writing it out.  So if
postgresql is returning gb2312 data, use:

data.decode('gb2312') to get the Unicode equivalent:

>>> '\xd6\xd0\xce\xc4'.decode('gb2312')
u'\u4e2d\u6587'
>>> print '\xd6\xd0\xce\xc4'.decode('gb2312')
中文
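
The whole decode, manipulate, encode pattern can be sketched end to end.
This is only a minimal sketch, written in Python 3 syntax (byte strings are
spelled b'...' and decoded text is plain str).  The `rows` list is an assumed
stand-in for whatever the database returns, and `decode_rows` and
`to_js_array` are hypothetical helper names, not part of any library:

```python
# Assumption: the database hands back gb2312-encoded byte strings.
rows = [b'\xd6\xd0\xce\xc4', b'\xd6\xd0\xce\xc4', b'\xd6\xd0\xce\xc4']

def decode_rows(raw_rows, encoding='gb2312'):
    """Decode each byte string to Unicode text at the input boundary."""
    return [raw.decode(encoding) for raw in raw_rows]

def to_js_array(items):
    """Join text items into a JavaScript array literal.

    Quotes inside the items are not escaped; this is enough for the
    sample data but not for arbitrary input.
    """
    return '["' + '","'.join(items) + '"]'

# 1. Decode on the way in.
text_rows = decode_rows(rows)

# 2. Manipulate as Unicode text.
js_literal = to_js_array(text_rows)

# 3. Encode on the way out, using the same charset the html declares.
encoded = js_literal.encode('gb2312')
```

Inside the program everything between steps 1 and 3 is Unicode, so the
encoding only matters at the two boundaries.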

Google for some Python Unicode tutorials.

-Mark