unichr() question

Martin v. Löwis martin at v.loewis.de
Thu Oct 16 17:31:47 EDT 2003


"Ezequiel, Justin" <j.ezequiel at spitech.com> writes:

> How do I convert strings such as '1D4AA' to unicode without using
> eval()?  Alternatively, how can I break down the value 119978L into
> 55349 and 56490?

I strongly advise that you don't. Even though a UCS-2 Python build
has some capabilities to represent non-BMP characters, you should use
these facilities only if you know what you are doing, and if you
absolutely need them.

To convert a UCS-4 code point outside the BMP (0x10000 through 0x10FFFF)
into a pair of UTF-16 surrogate code units, use

>>> def ucs4toucs2(codepoint):
...   hi,lo=divmod(codepoint-0x10000,0x400)
...   return 0xd800+hi,0xdc00+lo
...
>>> ucs4toucs2(119978L)
(55349L, 56490L)
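
For the first half of the question (getting from the string '1D4AA' to a
number without eval()), the built-in int() with base 16 should be enough.
The following is only a sketch, written in Python 3 syntax (no L suffix);
the names hex_to_surrogates and surrogates_to_codepoint are made up for
illustration, and the range check is an added assumption, since the
surrogate arithmetic only makes sense for code points above the BMP.

```python
def hex_to_surrogates(hex_string):
    """Parse a hex code point string and split it into a surrogate pair."""
    # int() with base 16 parses the string; no eval() is needed.
    codepoint = int(hex_string, 16)
    # Assumption: reject anything that is not a non-BMP code point.
    if not 0x10000 <= codepoint <= 0x10FFFF:
        raise ValueError("not a non-BMP code point: %r" % hex_string)
    # Same arithmetic as ucs4toucs2: 10 high bits, 10 low bits.
    hi, lo = divmod(codepoint - 0x10000, 0x400)
    return 0xD800 + hi, 0xDC00 + lo


def surrogates_to_codepoint(high, low):
    """Inverse operation: combine a surrogate pair into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


print(hex_to_surrogates('1D4AA'))             # (55349, 56490)
print(surrogates_to_codepoint(55349, 56490))  # 119978
```

The second function is included only as a round-trip check on the first.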


Regards,
Martin



