unicode wrap unicode object?

"Martin v. Löwis" martin at v.loewis.de
Sat Apr 8 05:54:18 EDT 2006


ygao wrote:
> I must use utf-8 for chinese.

Sure. But please don't do this:

>>>> import sys
>>>> reload(sys)
>>>> sys.setdefaultencoding("utf-8")

As Fredrik says, you should really avoid changing the
default encoding.
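
A minimal sketch of the alternative, assuming the incoming bytes really are UTF-8: leave the default encoding alone and convert explicitly at the boundaries. The names raw, text and out are only for illustration, and the b'' prefix assumes Python 2.6 or later.

```python
# Decode explicitly on input, encode explicitly on output,
# instead of changing the interpreter-wide default encoding.
raw = b'\xe9\xab\x98'        # bytes as read from a file or socket (assumed UTF-8)
text = raw.decode('utf-8')   # a unicode object; use this inside the program
out = text.encode('utf-8')   # convert back to bytes only when writing out
```

This way the encoding is visible at each conversion, and code that relies on the ASCII default keeps working.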

>>>> s='\xe9\xab\x98' # this is a UTF-8 string
>>>> ss=U'\xe9\xab\x98'
>>>> ss1=ss.encode('unicode_escape').decode('string_escape')
>>>> s1=s.decode('unicode_escape')
>>>> s1==ss 
> True 
>>>> ss1==s 
> True

OK. But how about this:

py> s='\xe9\xab\x98'
py> ss=u'\u9ad8'
py> s1=s.decode('utf-8')
py> s1==ss
True

Here, ss is a single character (U+9AD8), which takes 3 bytes in UTF-8.
In your example, ss has three characters (U+00E9, U+00AB, U+0098),
which are not Chinese, but Latin-1 code points.
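
To make the difference concrete, here is a small sketch comparing the two values. The variable names (three, n_one, n_three, repaired) are made up for this illustration, and the b'' prefix assumes Python 2.6 or later.

```python
s = b'\xe9\xab\x98'        # the three UTF-8 bytes from the example
ss = u'\u9ad8'             # one character, U+9AD8
three = u'\xe9\xab\x98'    # three characters: U+00E9, U+00AB, U+0098

n_one = len(ss)            # number of characters in the Chinese string
n_three = len(three)       # number of characters in the escaped-bytes string

# Latin-1 maps code points 0-255 straight to bytes 0-255, so encoding
# `three` as Latin-1 gives back the original bytes, which can then be
# decoded properly as UTF-8.
repaired = three.encode('latin-1').decode('utf-8')
```

If a unicode object has already been built from raw byte values like this, the Latin-1 round trip is one way to recover the intended text; decoding the byte string as UTF-8 in the first place avoids the problem.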

Regards,
Martin


