[I18n-sig] Unicode surrogates: just say no!

Machin, John JMachin@Colonial.com.au
Thu, 28 Jun 2001 11:23:55 +1000


> Unfortunately, the surrogate-creating behavior of \U
> is present in 2.0 and 2.1, so I
> think we can't reasonably remove this from narrow Python 2.2, and I
> like the rule that unichr and \U match.  But maybe that's the one that
> should go, and unichr() and ord() should deal with single code points
> only.

My understanding is that very few people noticed that \U was creating
surrogate pairs, and my guess would be that nobody would be affected in
practice by stopping this behaviour.

IOW, I suggest treating "\U -> surrogate pairs" just like the more
esoteric parts of xrange() -- or the "Korean mess" in earlier Unicode --
just bury it and move on.

IMO, the type of people wanting to fiddle with surrogate pairs
in narrow Python would also be capable of whipping up a C extension
to unpack a narrow Unicode string into a list of ints and do the shifting
and masking necessary with surrogates. If this is not so, then the next
preference would be for "someone" to write such a C extension and
publicise it. I would volunteer to be that "someone" in the interests of
not burdening ord() with "magic".
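
For illustration only, the shifting and masking mentioned above can be
sketched in pure Python. This is not the proposed C extension. The function
name combine_surrogates is hypothetical, and it assumes the input has
already been unpacked into a list of 16-bit code units (for example with
[ord(c) for c in s] on a narrow build). Unpaired surrogates are passed
through unchanged, which is one possible policy among several.

```python
def combine_surrogates(units):
    """Convert a list of 16-bit code units into a list of code points.

    Hypothetical helper: a high surrogate (0xD800-0xDBFF) followed by a
    low surrogate (0xDC00-0xDFFF) is merged into one code point at or
    above 0x10000. Anything else, including a lone surrogate, is copied
    through as-is.
    """
    points = []
    i = 0
    n = len(units)
    while i < n:
        hi = units[i]
        is_pair = (
            0xD800 <= hi <= 0xDBFF
            and i + 1 < n
            and 0xDC00 <= units[i + 1] <= 0xDFFF
        )
        if is_pair:
            lo = units[i + 1]
            # Top 10 bits come from the high surrogate, bottom 10 from the low.
            points.append(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00))
            i += 2
        else:
            points.append(hi)
            i += 1
    return points


# 0xD800 0xDC00 is the surrogate pair for U+10000.
print(combine_surrogates([0x41, 0xD800, 0xDC00, 0x42]))
```

With something like this available as a separate helper, ord() and unichr()
could stay restricted to single code units.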

