[Python-Dev] len(chr(i)) = 2?

Stephen J. Turnbull stephen at xemacs.org
Tue Nov 23 17:18:57 CET 2010


Nick Coghlan writes:

 > For practical purposes, UCS2/UCS4 convey far more inherent information
 > than narrow/wide:

That was my stance, but in fact (1) the ISO JTC1/SC2 has deliberately
made them ambiguous by changing their definitions over the years[1],
and (2) the more recent definitions and "interpretations" of UCS-2
*prohibit* use of surrogates in UCS-2 as far as I can tell.  And
that's what you'll see everywhere you look, because Wikipedia and
friends pick up the most recent versions of everything.

 > So don't just think about "what will developers know?", also think
 > about "what will developers know, and what will a quick trip to a
 > search engine tell them?".

It will tell them that UCS-2 cannot even *express* non-BMP characters.
Terry and David are *not* dummies, and that's what they got from more
or less careful study of the issue.
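
To make that concrete, here is a minimal sketch (an illustration, not
something from the thread; the helper names utf16_code_units and
is_surrogate are hypothetical) of why a strict reading of UCS-2 has no
room for a non-BMP character.  It assumes only the standard UTF-16
codec: a BMP character fits in one 16-bit code unit, while the first
code point past the BMP needs two, and both fall in the surrogate
range.

```python
import struct


def utf16_code_units(ch):
    """Return the 16-bit code units UTF-16 uses to encode the string ch."""
    data = ch.encode("utf-16-be")  # big-endian, no byte order mark
    return list(struct.unpack(">%dH" % (len(data) // 2), data))


def is_surrogate(unit):
    """True if a 16-bit code unit lies in the surrogate range."""
    return 0xD800 <= unit <= 0xDFFF


bmp = utf16_code_units("\u20ac")          # EURO SIGN, inside the BMP
astral = utf16_code_units("\U00010000")   # first code point beyond the BMP

print([hex(u) for u in bmp])
print([hex(u) for u in astral])
print(all(is_surrogate(u) for u in astral))
```

A narrow build stores those two units as separate elements of the
string, which is how len(chr(i)) ends up as 2; a definition of UCS-2
that forbids surrogates simply cannot describe that storage.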

 > And once you take that stance, the overly
 > generic narrow/wide terms fail, badly.

I still agree that something more accurate would be nice, but face it:
the ISO will redefine and deprecate such terms as soon as they notice
us using them.<wink>

 > +1 for MAL's suggested tweaks to the Py3k configure options.

Despite my natural sympathy for your arguments, and MAL's, I'm still
-1.  I really wish I could switch back, but it seems to me that
"UCS-2" is a liability we don't need, *especially* on Windows where
the default build is presumably going to be UCS2 forever.

Footnotes: 
[1]  You'd think it would be hard to change the definition of UCS-4,
but they managed. :-(


