break unichr instead of fix ord?

rurpy at yahoo.com rurpy at yahoo.com
Sat Aug 29 20:16:30 EDT 2009


On 08/29/2009 01:43 PM, Vlastimil Brom wrote:
> 2009/8/29 <rurpy at yahoo.com>:
>> On 08/28/2009 02:12 AM, "Martin v. Löwis" wrote:
>>
>> So far, it seems not and that unichr/ord
>> is a poster child for "purity beats practicality".
>>
>
> As Mark Tolonen pointed out earlier in this thread, in Python 3 the
> practicality apparently beat purity in this aspect:
>
> Python 3.1.1 (r311:74483, Aug 17 2009, 17:02:12) [MSC v.1500 32 bit
> (Intel)] on win32
> Type "copyright", "credits" or "license()" for more information.
>
> >>> goth_urus_1 = '\U0001033f'
> >>> list(goth_urus_1)
> ['\ud800', '\udf3f']
> >>> len(goth_urus_1)
> 2
> >>> ord(goth_urus_1)
> 66367
> >>> goth_urus_2 = chr(66367)
> >>> len(goth_urus_2)
> 2
> >>> import unicodedata
> >>> unicodedata.name(goth_urus_1)
> 'GOTHIC LETTER URUS'
> >>> goth_urus_3 = unicodedata.lookup("GOTHIC LETTER URUS")
> >>> goth_urus_4 = "\N{GOTHIC LETTER URUS}"
> >>> goth_urus_1 == goth_urus_2 == goth_urus_3 == goth_urus_4
> True
> >>>

Yes, that certainly seems like much more sensible behavior.
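
For anyone wondering where the two list elements in that session
come from, here is a minimal sketch (my own illustration, not part
of the quoted session) that asks the UTF-16 codec for the 16-bit
code units of the same character. It assumes a Python 3 interpreter;
the names goth_urus, data and units are made up for the example.

```python
import struct

goth_urus = '\U0001033f'                  # GOTHIC LETTER URUS, U+1033F

# Encode as big-endian UTF-16 (no BOM): a character outside the BMP
# becomes two 16-bit code units, i.e. a surrogate pair.
data = goth_urus.encode('utf-16-be')

# Unpack the bytes into unsigned 16-bit integers.
units = struct.unpack('>%dH' % (len(data) // 2), data)

print([hex(u) for u in units])            # ['0xd800', '0xdf3f']
```

On a narrow build those two code units are what the string stores
internally, which is why len() reports 2 there.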

> As for the behaviour in python 2.x, it's probably good enough, that
> the surrogates aren't prohibited and the eventually needed behaviour
> can be easily added via custom functions.

Yes, I agree that, given that the current behavior is well documented
and, further, is fixed in Python 3, it can't be changed.

I would pick a nit, though, with "can be easily added via custom
functions." I don't think that is a good criterion for rejecting
functionality from the library, because it is not sufficient: there
are many functions in the library that fail that test.  I think the
criterion should be more like a ratio: (how often needed) / (ease of
writing) [where "ease" is not just the line count but also the
obviousness to someone who is not yet a Python expert].

And I would also dispute that the generalized unichr/ord functions
are "easily" added.  When I ran into the TypeError in ord(), I
thought "surrogate pairs" were something used in sex therapy. :-)
It took a lot of reading and research before I was able to write
a generalized ord() function.
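
For the record, the sort of thing I mean looks roughly like the
sketch below. It is only an illustration of the surrogate-pair
arithmetic, not the exact code I ended up with; the names wide_ord
and wide_chr are invented here, and it is written for Python 3 so
that it can be run as-is (on 2.x narrow builds one would use
unichr() in place of chr()).

```python
def wide_ord(s):
    """Return the code point of a single character or of one
    UTF-16 surrogate pair (a high surrogate followed by a low one)."""
    if len(s) == 2:
        hi, lo = ord(s[0]), ord(s[1])
        if 0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF:
            # Each surrogate carries 10 bits of the value above 0xFFFF.
            return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
        raise TypeError("expected a character or a surrogate pair")
    return ord(s)   # ord() itself raises TypeError for other lengths


def wide_chr(cp):
    """Return code point cp as one character if it fits in 16 bits,
    otherwise as a two-character surrogate pair."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of range")
    if cp < 0x10000:
        return chr(cp)
    cp -= 0x10000
    return chr(0xD800 + (cp >> 10)) + chr(0xDC00 + (cp & 0x3FF))


print(wide_ord('\ud800\udf3f'))                  # 66367
print(wide_chr(66367) == '\ud800\udf3f')         # True
```

On a wide build, or any Python from 3.3 on, the built-in ord() and
chr() already cover the whole range, so the pair handling above only
matters where strings are stored as 16-bit code units.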


