How to display Chinese in a list retrieved from database via python

Mon Dec 29 08:49:42 EST 2008

On Dec 29, 5:06 pm, "Mark Tolonen" <metolone+gm... at gmail.com> wrote:
> "zxo102" <zxo... at gmail.com> wrote in message
>
> news:2560a6e0-c103-46d2-aa5a-8604de4d1968 at b38g2000prf.googlegroups.com...
>
> > I have a list in a dictionary and want to insert it into an HTML
> > file. I tested it with the following scripts: CASE 1, CASE 2 and
> > CASE 3. I can see "中文" in CASE 1, but that is not what I want.
> > CASE 2 does not display correctly.
> > So, in CASE 3, I hacked the script of CASE 2 with a function,
> > conv_list2str(), to 'convert' the list into a string. CASE 3 shows
> > me "中文". I don't know what is wrong with CASE 2 and what is right
> > with CASE 3.
>
> > Without knowing why, I have just hard coded my python application
> > following CASE 3 for displaying Chinese characters from a list in a
> > dictionary in my web application.
>
> > Any ideas?
>
> See below each case... Happy New Year!
>
>
>
> > Happy New Year: 2009
>
> > ouyang
>
> > CASE 1:
> > ########################################################
> > f=open('test.html','wt')
> > f.write('''<html><head>
> > <META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=gb2312">
> > <title>test</title>
> > <script language=javascript>
> > var test = ['\xd6\xd0\xce\xc4', '\xd6\xd0\xce\xc4', '\xd6\xd0\xce\xc4']
> > alert(test[0])
> > alert(test[1])
> > alert(test[2])
> > </script>
> > </head>
> > <body></body></html>''')
> > f.close()
>
> In CASE 1, the *4 bytes* D6 D0 CE C4 are written to the file, which is the
> correct gb2312 encoding for 中文.
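
The byte count above can be checked directly. This is a minimal sketch written for Python 3 (the scripts in this thread are Python 2), and the variable names are only illustrative:

```python
# Encode the two characters with the gb2312 codec and inspect the result.
text = "中文"
encoded = text.encode("gb2312")

print(encoded)       # b'\xd6\xd0\xce\xc4'
print(len(encoded))  # 4 bytes, two per character
```

These are the same four bytes that the '\xd6\xd0\xce\xc4' escapes produce in a Python 2 byte string.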
>
>
>
> > CASE 2:
> > #######################################################
> > mydict = {}
> > mydict['JUNK'] = ['\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4']
> > f_str = '''<html><head>
> > <META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=gb2312">
> > <title>test</title>
> > <script language=javascript>
> > var test = %(JUNK)s
> > alert(test[0])
> > alert(test[1])
> > alert(test[2])
> > </script>
> > </head>
> > <body></body></html>'''
>
> > f_str = f_str%mydict
> > f=open('test02.html','wt')
> > f.write(f_str)
> > f.close()
>
> In CASE 2, the *16 characters* "\xd6\xd0\xce\xc4" (backslashes included)
> are written to the file, which is NOT the correct gb2312 encoding for 中文.
> JavaScript reads each \xNN escape as the single character U+00NN, so the
> alert shows ÖÐÎÄ instead.  This happens because str() of a list in
> Python 2.x calls repr() on each element, and repr() escapes non-ASCII
> bytes, so the str() representation of mydict['JUNK'] is the characters
> "['\xd6\xd0\xce\xc4', '\xd6\xd0\xce\xc4', '\xd6\xd0\xce\xc4']".
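
The difference between the 4 raw bytes and the 16 escape characters can be seen in a short sketch. It assumes Python 3, where a byte string is written b'...' and its repr() carries a b prefix that Python 2 does not show; the escaping itself behaves the same way:

```python
raw = b"\xd6\xd0\xce\xc4"   # the four gb2312 bytes for the two characters

as_text = repr(raw)         # what list formatting uses for each element
escaped = as_text[2:-1]     # drop the leading b' and the trailing '

print(len(raw))             # 4
print(escaped)              # \xd6\xd0\xce\xc4
print(len(escaped))         # 16

# str() of a list is built from repr() of its elements.
print(str([raw]) == "[" + as_text + "]")   # True
```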
>
>
>
> > CASE 3:
> > ###################################################
> > mydict = {}
> > mydict['JUNK'] = ['\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4']
>
> > f_str = '''<html><head>
> > <META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=gb2312">
> > <title>test</title>
> > <script language=javascript>
> > var test = %(JUNK)s
> > alert(test[0])
> > alert(test[1])
> > alert(test[2])
> > </script>
> > </head>
> > <body></body></html>'''
>
> > import string
>
> > def conv_list2str(value):
> >   list_len = len(value)
> >   list_str = "["
> >   for ii in range(list_len):
> >       list_str += '"'+string.strip(str(value[ii])) + '"'
> >       if ii != list_len-1:
> >        list_str += ","
> >   list_str += "]"
> >   return list_str
>
> > mydict['JUNK'] = conv_list2str(mydict['JUNK'])
>
> > f_str = f_str%mydict
> > f=open('test03.html','wt')
> > f.write(f_str)
> > f.close()
>
> CASE 3 works because you build your own, correct, gb2312 representation of
> mydict['JUNK'] (value[ii] above is the correct 4-byte sequence for 中文).
>
> That said, learn to use Unicode strings by trying the following program, but
> set the first line to the encoding *your editor* saves files in.  You can
> use the actual Chinese characters instead of escape codes this way.  The
> encoding used for the source code and the encoding used for the html file
> don't have to match, but the charset declared in the file and the encoding
> used to write the file *do* have to match.
>
> # coding: utf8
>
> import codecs
>
> mydict = {}
> mydict['JUNK'] = [u'中文',u'中文',u'中文']
>
> def conv_list2str(value):
>     return u'["' + u'","'.join(value) + u'"]'
>
> f_str = u'''<html><head>
> <META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=gb2312">
> <title>test</title>
> <script language=javascript>
> var test = %s
> alert(test[0])
> alert(test[1])
> alert(test[2])
> </script>
> </head>
> <body></body></html>'''
>
> s = conv_list2str(mydict['JUNK'])
> f=codecs.open('test04.html','w',encoding='gb2312')
> f.write(f_str % s)
> f.close()
>
> -Mark
>
> P.S.  Python 3.0 makes this easier for what you want to do, because the
> representation of strings changes: all strings are Unicode by default and
> repr() no longer escapes printable non-ASCII characters.  You'll be able
> to skip the conv_list2str() function.
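
A rough Python 3 sketch of that P.S., under some assumptions: the names words, js_array, page and data are illustrative, and json.dumps is a substitute suggested here, not something used in the thread. It also escapes quotes and backslashes inside the strings, which neither str() nor conv_list2str() does.

```python
import json

words = ["中文", "中文", "中文"]

# In Python 3, str() of the list keeps the characters readable:
list_text = str(words)      # "['中文', '中文', '中文']"

# json.dumps builds a JavaScript array literal; ensure_ascii=False keeps
# the characters themselves instead of \uXXXX escapes.
js_array = json.dumps(words, ensure_ascii=False)

page = "<script>var test = %s</script>" % js_array

# The encoding used here must match the charset the page declares.
data = page.encode("gb2312")
print(data.count(b"\xd6\xd0\xce\xc4"))   # 3
```

Writing data to a file opened in binary mode, or writing page to a file opened with encoding='gb2312', gives the same bytes on disk.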

Thanks for your comments, Mark. I understand it now. The list of escape
codes, ['\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4','\xd6\xd0\xce\xc4'], comes
from a PostgreSQL database via a "select" statement. I will check the
PostgreSQL database configuration and see if it is possible to return
['中文','中文','中文'] directly from the "select" statement.
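
If the database side cannot be changed, the byte strings can be decoded after the query instead. A minimal Python 3 sketch, assuming the values really are gb2312-encoded; rows is a hypothetical stand-in for the query result and no database call is shown:

```python
# Hypothetical stand-in for the byte strings returned by the "select".
rows = [b"\xd6\xd0\xce\xc4", b"\xd6\xd0\xce\xc4", b"\xd6\xd0\xce\xc4"]

# Decode each value into a text string using the encoding it was stored in.
decoded = [value.decode("gb2312") for value in rows]

print(decoded == ["中文"] * 3)   # True
```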

ouyang