[Python-3000] String comparison

Rauli Ruohonen rauli.ruohonen at gmail.com
Sat Jun 9 23:01:57 CEST 2007


On 6/9/07, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Rauli Ruohonen writes:
>  > The ones it absolutely prohibits in interchange are surrogates.
>
> Excuse me?  Surrogates are code points with a specific interpretation
> if it is "purported that the stream is in UTF-16".  Otherwise, Unicode
> 4.0 explicitly says that there is nothing illegal about an isolated
> surrogate (p.75, where an example is given of how such a surrogate
> might occur).

I was talking about interchange, not strings. Anything is allowed in strings.

Chapter 2 (not normative, but clear) explains on page 26:

 Restricted interchange. [...]
  - Surrogate code points cannot be conformantly interchanged using
    Unicode encoding forms. [...]
  - Noncharacter code points are reserved for internal use, such as for
    sentinel values. They should never be interchanged. [...]
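
To illustrate the string/interchange distinction, here is a minimal sketch
of how a present-day Python 3 interpreter behaves. It is not something that
existed when this was written, and the helper name can_interchange is made
up for the example. A lone surrogate is accepted inside a str, but the
standard UTF-8 codec refuses to encode it:

```python
# A lone surrogate is a legal element of a Python 3 str (an "internal" string).
lone_surrogate = "\ud800"
assert len(lone_surrogate) == 1


def can_interchange(text, encoding="utf-8"):
    """Hypothetical helper: True if text encodes with the strict codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        # The strict UTF-8/UTF-16/UTF-32 codecs reject surrogate code points.
        return False
    return True


print(can_interchange("abc"))           # ordinary text
print(can_interchange(lone_surrogate))  # lone surrogate
# Noncharacters such as U+FFFE are a different case: the standard says they
# should not be interchanged, but Python's codec still encodes them.
print(can_interchange("\ufffe"))
```

Under these assumptions the three calls give True, False and True: the
codec enforces the surrogate restriction but leaves the noncharacter
recommendation to the application.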

> My point was precisely that I don't object to this implementation.  I
> want Unicode-ly-correct behavior to be a goal of the language, the
> community disagrees, and Guido disagrees.  That's that.

My understanding is that it is a goal, but practicality beats purity.
I think the only disagreement is on what's practical.

