how to use unicode?(2)

vincent wehren v.wehren at home.nl
Mon Jan 27 15:07:32 EST 2003


"gfu" <gfu at public.nn.gx.cn> wrote in message
news:mailman.1043680929.18267.python-list at python.org...
> >gfu wrote:
> >> >>> u'Ðж¼'
> >> u'\xcb\xc6\xcd\xf8\xd2\xb3'
> >
> >In Python 2.2, you cannot put non-ASCII characters into Unicode
> >literals(*). In Python 2.3, this is possible, but only if you declare
> >the file encoding (i.e. you cannot enter them readily in interactive
> >mode).
> >
> >So if you want those characters in a string, you need to write
> >
> >u'\u884c\u90fd'
> >
> >Here, U+884C and U+90FD are the Unicode code points of the two
> >characters you show above. Alternatively, writing
> >
> >unicode('行都', 'gb2312')
> >
> >should also work, provided you have a codec for gb2312 installed (and
> >provided your input is really encoded in gb2312).
> >
> >HTH,
> >Martin
> >
> >(*) Strictly speaking, you can put any Latin-1 character into a Unicode
> >literal if the file is encoded in Latin-1.
>
> = = = = = = = = = = = = = = = = = = = =
> Thank you.
> In IDLE:
> >>> s = u'\u884c\u90fd'
> >>> print s
> 行都
>
> That works, but:
> >>> s = unicode('行都', 'gb2312')
> UnicodeError: ASCII encoding error: ordinal not in range(128)
>
> How do I install a codec for gb2312?

Martin von Löwis kindly pointed out the Chinese codecs at
ftp://freebsd.sinica.edu.tw/pub/ycheng/python/ChineseCodecs1.2.0.tar.gz
in another post...
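For readers on a current Python, here is a minimal sketch, assuming Python 3. The CJK codecs (gb2312, gbk, big5 and others) have shipped with the standard library since Python 2.4, so the separate ChineseCodecs package is no longer needed there. The snippet checks that the gb2312 codec is registered and that the escaped literal from Martin's reply matches the result of decoding the GB2312 bytes; `bytes.decode` and `str(raw, encoding)` take the place of Python 2's `unicode()`.

```python
import codecs

# The two code points given in the thread: U+884C and U+90FD.
text = "\u884c\u90fd"

# codecs.lookup() raises LookupError when no codec is registered under a name;
# on Python 3 the gb2312 codec is built in, so this lookup succeeds.
info = codecs.lookup("gb2312")

# Encode to GB2312 bytes (two bytes per Chinese character) and decode them back.
raw = text.encode(info.name)
decoded = raw.decode("gb2312")

# Python 3 equivalent of Python 2's unicode(raw, 'gb2312').
same = str(raw, "gb2312")

print(len(raw), decoded == text, same == text)  # 4 True True
```

The key point is that decoding must start from bytes in a known encoding; the escaped literal and the decoded bytes end up as the same Unicode string.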

>How do I know whether my input is really encoded in gb2312?

If you know that you are inputting Simplified Chinese, it is safe to assume
the input is gb2312. On Windows, this corresponds to code page 936 (strictly
speaking, code page 936 is GBK, a superset of GB2312). Run "chcp" in your
command prompt; it should print "Active code page: 936".
This of course only applies if you are relying on Windows 2000's native
support for non-Unicode applications. If you are using a layer on top of
Windows, you are on your own... If you are inputting Traditional Chinese
(for which chcp reports code page 950), it is safe to assume the encoding
is Big5.

Regards

Vincent Wehren
