[I18n-sig] Unicode strings: an alternative

Just van Rossum just@letterror.com
Fri, 5 May 2000 14:17:31 +0100


[Me]
> Exactly. By saying "(wide) strings are not tied to Unicode" the question
> whether wide strings should or should not be sorted according to the
> Unicode spec is answered by a simple "no", instead of "hmm, maybe, but it's
> too hard anyway"...

[Tom Emerson]
>Wait a second.
>
>There is nothing about Unicode that would prevent you from defining
>string equality as byte-level equality.

Agreed.

>This strikes me as the wrong way to deal with the complex collation
>issues of Unicode.

All I was trying to say was that, looked at this way, it is even more
obvious that the built-in comparison should not deal with Unicode sorting
and collation issues. It seems you're saying exactly the same thing:

>It seems to me that by default wide strings compare at the byte level
>(i.e., '=' is a byte level comparison). If you want a normalized
>comparison, then you make an explicit function call for that.

Exactly.
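To make the "plain comparison by default, normalization on request" split
concrete, here is a minimal sketch in today's Python 3 terms. It assumes
the "wide string" is `str` and that the default `==` compares code point
by code point. `normalized_equal` is a hypothetical helper for
illustration, not an existing API; `unicodedata.normalize` is the standard
library call that does the actual normalization.

```python
import unicodedata

def normalized_equal(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two strings only after normalizing both.

    The caller opts in explicitly; plain == never normalizes.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

composed = "\u00e9"        # LATIN SMALL LETTER E WITH ACUTE (one code point)
decomposed = "e\u0301"     # 'e' followed by COMBINING ACUTE ACCENT (two code points)

print(composed == decomposed)                  # False: plain == compares code points
print(normalized_equal(composed, decomposed))  # True: both become U+00E9 under NFC
```

The two strings render identically but are different code point sequences,
so only the explicit call treats them as equal.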

>This is no different from comparing strings in a case sensitive
>vs. case insensitive manner.

Good point. Taken together, all this still means to me that comparisons
between wide and narrow strings should take place at the character level,
which implies that coercion from narrow to wide is done at the character
level, without looking at the encoding. (Which in my book in turn still
implies that, as far as Unicode is concerned, narrow strings are
effectively Latin-1, since Latin-1 is the encoding whose byte values
0-255 map directly onto the code points U+0000-U+00FF.)
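A minimal sketch of what "character-level coercion" amounts to, again in
Python 3 terms, where a narrow string is modelled as `bytes` and a wide
one as `str` (an assumption for illustration; the types under discussion
here are not these). `widen` is a hypothetical name; decoding with the
"latin-1" codec gives each byte the code point with the same value.

```python
def widen(narrow: bytes) -> str:
    # Hypothetical coercion: byte value n becomes code point U+00nn.
    # No encoding is consulted beyond this identity mapping, which is
    # exactly what decoding as Latin-1 does.
    return narrow.decode("latin-1")

raw = b"caf\xe9"
wide = widen(raw)

print(wide == "caf\u00e9")                  # True
print([ord(c) for c in wide] == list(raw))  # True: one character per byte, same values
```

Note that decoding the same bytes as UTF-8 would instead raise
`UnicodeDecodeError`, since 0xE9 on its own is not valid UTF-8. Only the
Latin-1 reading makes the byte-to-character mapping an identity.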

Just