[Tutor] unichr() question

Thu Oct 16 22:54:05 EDT 2003

On Tue, 14 Oct 2003, Ezequiel, Justin wrote:

> PythonWin 2.2.2
> Windows XP
> 
> >>> long('1D4AA', 16)
> 119978L
> >>> unichr(long('1D4AA', 16))
> Traceback (most recent call last):
>   File "<interactive input>", line 1, in ?
> ValueError: unichr() arg not in range(0x10000) (narrow Python build)
> >>> x = eval("u'\\U000%s'" % '1D4AA')
> >>> x
> u'\U0001d4aa'
> >>> for c in x:
> ... 	print ord(c)
> ... 
> 55349
> 56490
> >>> unichr(55349) + unichr(56490)
> u'\U0001d4aa'
> >>> 
> 
> How do I convert strings such as '1D4AA' to unicode without using eval()?

Justin, I see you haven't gotten any responses on this yet.  I don't know 
an answer, but I ran into something similar with some of the Unihan 
characters.  Fortunately for me, I found I could just ignore any that were 
over x'FFFF'; it doesn't sound like you can.

I looked into it for a while and determined that it depends on how your 
Python was built.  On a "narrow build", unichr() accepts only values up to 
x'FFFF', and anything higher is stored as a two-unit surrogate pair (that's 
the 55349/56490 pair your loop printed).  On a "wide build", unichr() 
accepts x'10000' and higher as well.  As far as I can tell, which one you 
get depends on whether whoever built Python passed 
"--enable-unicode=ucs4" to configure to get the wide build.
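If you're not sure which kind of build you have, one way to check is 
sys.maxunicode, which is 0xFFFF on a narrow build and 0x10FFFF on a wide 
one.  A minimal sketch (is_narrow_build is just a name I made up, not 
anything built in):

```python
import sys

def is_narrow_build():
    # Hypothetical helper: sys.maxunicode is 0xFFFF on a narrow (UCS-2)
    # build and 0x10FFFF on a wide (UCS-4) build.
    return sys.maxunicode == 0xFFFF

print("narrow" if is_narrow_build() else "wide")
```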

I'm a Windows user, too, and dependent on the ActiveState build, which is
narrow.  In the end, I decided to just avoid the higher Unicode values,
which didn't matter for me.  If you have a way of getting a "wide build", I
suspect this would do the trick for you.
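Short of a wide build, your own session suggests a workaround: since 
unichr(55349) + unichr(56490) gave you u'\U0001d4aa', you could compute 
that surrogate pair yourself with plain arithmetic instead of going 
through eval().  Here's a sketch along those lines, assuming the narrow 
build uses the standard UTF-16 surrogate encoding (which the numbers in 
your session match).  The helper names surrogate_pair and hex_to_unicode 
are hypothetical, and the try/except shim is only there so the same code 
also runs where unichr() has been renamed chr():

```python
import sys

try:
    unichr  # Python 2 name for the function
except NameError:
    unichr = chr  # assumption: later Pythons spell it chr()

def surrogate_pair(code_point):
    """Hypothetical helper: split a code point above 0xFFFF into
    UTF-16 high and low surrogates."""
    offset = code_point - 0x10000       # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low

def hex_to_unicode(hex_str):
    """Hypothetical helper: turn a hex string like '1D4AA' into a
    unicode string without eval()."""
    code_point = int(hex_str, 16)
    if code_point <= sys.maxunicode:
        # BMP character, or any character on a wide build
        return unichr(code_point)
    # Narrow build: build the character from its surrogate pair
    high, low = surrogate_pair(code_point)
    return unichr(high) + unichr(low)

print(surrogate_pair(0x1D4AA))  # (55349, 56490)
```

On a narrow build, hex_to_unicode('1D4AA') should give the same two-unit 
string your eval() version did; on a wide build it's just one unichr() call.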

There's more information in PEP 261, 
http://www.python.org/peps/pep-0261.html ; I think this is the last word 
on it.

Hopefully, some others more informed on Python internals and Unicode can 
give more information on this, but I hope this helps somewhat.

-- 
Terry Carroll
Santa Clara, CA
carroll at tjc.com