[Python-Dev] len(chr(i)) = 2?
Stephen J. Turnbull
stephen at xemacs.org
Tue Nov 23 17:18:57 CET 2010
Nick Coghlan writes:
> For practical purposes, UCS2/UCS4 convey far more inherent information
> than narrow/wide:
That was my stance, but in fact (1) the ISO JTC1/SC2 has deliberately
made them ambiguous by changing their definitions over the years[1],
and (2) the more recent definitions and "interpretations" of UCS-2
*prohibit* use of surrogates in UCS-2 as far as I can tell. And
that's what you'll see everywhere you look, because Wikipedia and
friends pick up the most recent versions of everything.
> So don't just think about "what will developers know?", also think
> about "what will developers know, and what will a quick trip to a
> search engine tell them?".
It will tell them that UCS-2 cannot even *express* non-BMP characters.
Terry and David are *not* dummies, and that's what they got from more
or less careful study of the issue.
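To make that concrete, here is a minimal sketch in Python 3 (the helper name utf16_code_units is made up for this illustration, not anything in the stdlib). It shows that a non-BMP code point needs two 16-bit units in UTF-16, both of them surrogates, which is exactly what the recent definitions of UCS-2 don't allow:

```python
def utf16_code_units(ch):
    """Return the 16-bit UTF-16 code units (as ints) for the string ch.

    Hypothetical helper for illustration only.
    """
    data = ch.encode("utf-16-be")  # big-endian, no BOM
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

bmp = chr(0x20AC)      # EURO SIGN, inside the BMP
astral = chr(0x10000)  # first code point outside the BMP

bmp_units = utf16_code_units(bmp)        # a single 16-bit unit
astral_units = utf16_code_units(astral)  # a high and a low surrogate

print([hex(u) for u in bmp_units])
print([hex(u) for u in astral_units])
```

A narrow build stores those same two units internally, which is where len(chr(i)) == 2 comes from.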
> And once you take that stance, the overly
> generic narrow/wide terms fail, badly.
I still agree that something more accurate would be nice, but face it:
the ISO will redefine and deprecate such terms as soon as they notice
us using them.<wink>
> +1 for MAL's suggested tweaks to the Py3k configure options.
Despite my natural sympathy for your arguments, and MAL's, I'm still
-1. I really wish I could switch back, but it seems to me that
"UCS-2" is a liability we don't need, *especially* on Windows where
the default build is presumably going to be UCS2 forever.
Footnotes:
[1] You'd think it would be hard to change the definition of UCS-4,
but they managed. :-(