how to use unicode?

Mon Jan 27 04:15:05 EST 2003

gfu wrote:
> 	>>> u'行都'
> 	 u'\xcb\xc6\xcd\xf8\xd2\xb3'

In Python 2.2, you cannot put non-ASCII characters into a Unicode 
literal (*). In Python 2.3, this is possible, but only if you declare 
the file encoding (i.e. you cannot enter them readily in interactive mode).
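
As a rough illustration of what declaring the file encoding looks like (the PEP 263 "coding" comment on the first line), the sketch below builds a tiny GB2312-encoded source file in memory and compiles it. It assumes a current Python 3 interpreter rather than 2.3, so the literal carries no u prefix; the variable names and the '<demo>' filename are made up for the example.

```python
# Text of a small module. Its first line is the encoding declaration.
source_text = "# -*- coding: gb2312 -*-\ns = '\u884c\u90fd'\n"

# Store it the way an editor configured for GB2312 would: as GB2312 bytes.
source_bytes = source_text.encode('gb2312')

# compile() reads the declaration from the bytes and decodes accordingly.
code = compile(source_bytes, '<demo>', 'exec')
namespace = {}
exec(code, namespace)

decoded = namespace['s']   # the two characters U+884C U+90FD
```

The point is only that the declaration tells the interpreter how to turn the bytes of the file into characters; an interactive session has no such first line to read.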

So if you want those characters in a string, you need to write

u'\u884c\u90fd'

Here, U+884C and U+90FD are the Unicode code points of the two 
characters you show above. Alternatively, writing

unicode('行都', 'gb2312')

should also work, provided you have a codec for gb2312 installed (and 
provided your input is really encoded in gb2312).
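
A minimal sketch of both spellings side by side, assuming a current Python 3 interpreter (where every string is Unicode and the unicode() builtin is gone, so decoding is written as bytes.decode). The names text, raw, decoded and code_points are only for illustration.

```python
# Escape form: spell the characters by their code points.
text = '\u884c\u90fd'

# Decoding form: start from GB2312 bytes and decode them explicitly.
# The bytes are produced by encoding here so the example is self-contained.
raw = text.encode('gb2312')
decoded = raw.decode('gb2312')   # Python 2 spelling: unicode(raw, 'gb2312')

# Both routes give the same two-character string.
same = (decoded == text)

# List the code points in U+XXXX notation.
code_points = ['U+%04X' % ord(ch) for ch in text]
```

If the bytes were not really GB2312, the decode step would either raise UnicodeDecodeError or quietly produce different characters, which is why the encoding of the input matters.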

HTH,
Martin

(*) Strictly speaking, you can put any Latin-1 character into a Unicode 
literal if the file is encoded in Latin-1.