break unichr instead of fix ord?

rurpy at yahoo.com rurpy at yahoo.com
Wed Aug 26 19:27:33 EDT 2009


On 08/26/2009 03:10 PM, "Martin v. Löwis" wrote:
>> In Python 2.5 on Windows I could do [*1]:
>>
>>   >>> a = unichr(65600)
>>   >>> a[0], a[1]
>>   (u'\ud800', u'\udc40')
>
> I can't reproduce that. My copy of Python on Windows gives
>
> Traceback (most recent call last):
>   File "<pyshell#0>", line 1, in <module>
>     unichr(65600)
> ValueError: unichr() arg not in range(0x10000) (narrow Python build)
>
> This is
>
> Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit
> (Intel)] on win32

My apologies for the red herring.  I was working from
a comment in my replacement ord() function.  I dug up
an old copy of Python 2.4.3 and could not reproduce it
there either so I have no explanation for the comment
(which I wrote).  Python 2.3 maybe?

But regardless, the significant question is: why do
ord() and unichr() not work with surrogate pairs, which
makes them unusable for the large number of Unicode
characters that Python otherwise supports?
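
For concreteness, here is a minimal sketch of the surrogate-pair
arithmetic that a pair-aware ord()/unichr() would need.  The function
names are hypothetical (they are not part of Python), and the code
works on integer code units only, so it does not depend on whether
the build is narrow or wide:

```python
def to_surrogates(cp):
    """Split a code point above U+FFFF into (high, low) UTF-16 code units.

    Hypothetical helper for illustration; not a standard library function.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary planes")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


def from_surrogates(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

For 65600 (0x10040) this gives the code units 0xD800 and 0xDC40,
which matches the pair in the quoted example above.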


