convert \uXXXX to native character set?

Bengt Richter bokr at oz.net
Tue Dec 21 17:59:11 EST 2004


On Mon, 20 Dec 2004 12:49:39 +0200, Miki Tebeka <miki.tebeka at zoran.com> wrote:

>Hello Joe,
>
>>     Is there any library to convert HTML page with \uXXXX encoded text to
>>    native character set, e.g. BIG5.
>Try: help("".decode)
>
But I think the OP wants to en-code. For example (I don't know what the Chinese for ichi is ;-):

 >>> ichi = u'\u4e00'
 >>> ichi
 u'\u4e00'
 >>> ichi.encode('big5')
 '\xa4@'

Unless I am mistaken, that created a str of two bytes making up the Big5 code for
the single-horizontal-stroke glyph whose Unicode code point is U+4E00 (u'\u4e00'):

 >>> list(ichi.encode('big5'))
 ['\xa4', '@']
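
The '@' in that result is the second byte of the two-byte Big5 code (0x40), which happens to be printable ASCII. A small sketch that shows both bytes as hex; the names big5_bytes and hex_pairs are made up for illustration, and bytearray is used on the assumption of a newer Python (2.6+ or 3.3+) so that the bytes come out as integers either way:

```python
# Encode U+4E00 to Big5 and look at the raw byte values.
# bytearray yields integers on both Python 2.6+ and Python 3.
big5_bytes = bytearray(u'\u4e00'.encode('big5'))

# Format each byte as two hex digits.
hex_pairs = ['0x%02x' % b for b in big5_bytes]
print(hex_pairs)  # ['0xa4', '0x40']
```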

Going from the Big5-encoded str back to unicode then takes de-coding:

 >>> '\xa4@'.decode('big5')
 u'\u4e00'
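
If the page really contains the literal six characters \uXXXX rather than actual unicode text, one possible approach (a sketch only, not tried on real pages) is to run it through the unicode_escape codec first and then encode. The function name escaped_to_big5 is hypothetical, and the b'' literal assumes Python 2.6+ or 3. Note that unicode_escape also interprets other backslash escapes such as \n and treats unescaped bytes as Latin-1, so this assumes the rest of the page is plain ASCII:

```python
def escaped_to_big5(raw):
    # raw: a byte string holding literal backslash-u escapes, e.g. b'\\u4e00'.
    # Step 1: unicode_escape turns each \uXXXX sequence into a real character.
    text = raw.decode('unicode_escape')
    # Step 2: encode the unicode text as Big5 bytes. Characters that Big5
    # cannot represent raise UnicodeEncodeError.
    return text.encode('big5')

# The input below is the ASCII text "ichi: " followed by backslash, u, 4, e, 0, 0.
print(repr(escaped_to_big5(b'ichi: \\u4e00')))
```

The last two bytes of the result should be the same '\xa4@' pair as in the session above.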

Regards,
Bengt Richter


