unichr() question

Ezequiel, Justin j.ezequiel at spitech.com
Wed Nov 5 01:21:48 EST 2003


From: martin at v.loewis.de

I strongly advise that you don't. Even though a UCS-2 Python build has some capabilities to represent non-BMP characters, you should use these facilities only if you know what you are doing, and if you absolutely need it.

>>> def ucs4toucs2(codepoint):
...     # Split a code point above U+FFFF into a UTF-16 surrogate pair.
...     hi, lo = divmod(codepoint - 0x10000, 0x400)
...     return 0xd800 + hi, 0xdc00 + lo
...
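
For illustration only, here is the function above applied to 1D4AA, a code point outside the BMP. This sketch assumes a current Python 3 interpreter, where chr() accepts any code point; the cross-check against the built-in "utf-16-be" codec is an addition and not part of the original reply.

```python
def ucs4toucs2(codepoint):
    # Same arithmetic as the function above; only meaningful for
    # code points in the range 0x10000..0x10FFFF.
    hi, lo = divmod(codepoint - 0x10000, 0x400)
    return 0xD800 + hi, 0xDC00 + lo


high, low = ucs4toucs2(0x1D4AA)
print(hex(high), hex(low))  # 0xd835 0xdcaa

# Cross-check: encode the same character as big-endian UTF-16 (no BOM)
# and read the four bytes back as two 16-bit code units.
encoded = chr(0x1D4AA).encode("utf-16-be")
units = (int.from_bytes(encoded[:2], "big"), int.from_bytes(encoded[2:], "big"))
print(units == (high, low))  # True
```

The two values are the high and low surrogates that a UCS-2 build would store for this single character.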

Dear Martin,

Thanks for taking time to reply and for the function.
Sorry for responding so late (I get the mail digest and currently have 390 digests unread).

I am converting XML files with entities to utf-8 using a lookup table:

⏞	0FE37
⏟	0FE38
<sc>O</sc>	1D4AA
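
A minimal sketch of the conversion described above, assuming a current Python 3 interpreter rather than the UCS-2 build under discussion. The entity names "&ent1;", "&ent2;" and "&ent3;" are hypothetical placeholders; only the hexadecimal code points are taken from the table.

```python
# Hypothetical entity names mapped to the hex code points from the table.
TABLE = {
    "&ent1;": "0FE37",
    "&ent2;": "0FE38",
    "&ent3;": "1D4AA",
}


def hex_to_char(hex_codepoint):
    # Parse the hex string and turn it into a one-character string.
    return chr(int(hex_codepoint, 16))


def replace_entities(text, table):
    # Replace every known entity with its character, then encode as UTF-8.
    for name, hex_codepoint in table.items():
        text = text.replace(name, hex_to_char(hex_codepoint))
    return text.encode("utf-8")


result = replace_entities("a&ent1;b&ent3;c", TABLE)
print(result)  # b'a\xef\xb8\xb7b\xf0\x9d\x92\xaac'
```

Note that 0FE37 and 0FE38 fit in three UTF-8 bytes each, while 1D4AA lies outside the BMP and needs four.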

I have no idea what I am doing, but I do think that I absolutely need it.
Can you explain more about non-BMP characters (and Python's capabilities for representing them) and how this applies, if it does, to my needs?




