[I18n-sig] Unicode strings: an alternative

Tom Emerson tree@basistech.com
Fri, 5 May 2000 07:46:41 -0400 (EDT)


Just van Rossum writes:
 > At 10:07 AM +0100 05-05-2000, Toby Dickenson wrote:
 > >One other pleasant consequence:
 > >
 > >- String comparisons work character-by-character, even if the
 > >  representations of those characters have different widths.
 > 
 > Exactly. By saying "(wide) strings are not tied to Unicode" the question
 > whether wide strings should or should not be sorted according to the
 > Unicode spec is answered by a simple "no", instead of "hmm, maybe, but it's
 > too hard anyway"...

Wait a second.

There is nothing about Unicode that would prevent you from defining
string equality as byte-level equality.

This strikes me as the wrong way to deal with the complex collation
issues of Unicode.

It seems to me that by default wide strings should compare at the
byte level (i.e., '=' is a byte-level comparison). If you want a
normalized comparison, then you make an explicit function call for that.

This is no different from comparing strings in a case-sensitive
vs. case-insensitive manner.
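
As a rough illustration of that split, here is a minimal sketch. It
assumes a Python with the standard unicodedata module, where '==' on
strings compares code points and so stands in for the byte-level
comparison. The helper names normalized_equal and caseless_equal are
hypothetical, made up for this example, and are not part of any
proposal in this thread.

```python
import unicodedata


def normalized_equal(a: str, b: str, form: str = "NFC") -> bool:
    """Explicit, opt-in comparison after Unicode normalization.

    Hypothetical helper: the name and the NFC default are assumptions.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def caseless_equal(a: str, b: str) -> bool:
    """Explicit, opt-in case-insensitive comparison (hypothetical helper)."""
    return a.casefold() == b.casefold()


# Two spellings of the same user-perceived character, e with acute accent:
composed = "\u00e9"      # single precomposed code point
decomposed = "e\u0301"   # 'e' followed by a combining acute accent

# Default '==' is the cheap, representation-level comparison.
default_result = composed == decomposed

# The normalized comparison has to be requested explicitly.
normalized_result = normalized_equal(composed, decomposed)

# Same pattern as case sensitivity: '==' is exact, caseless is opt-in.
exact_case_result = "Tree" == "TREE"
caseless_result = caseless_equal("Tree", "TREE")

print(default_result, normalized_result, exact_case_result, caseless_result)
```

The default operator stays simple and predictable, and anyone who needs
normalization or caseless matching says so at the call site. Full
locale-aware collation is a separate problem that this sketch does not
attempt.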

       -tree

-- 
Tom Emerson                                          Basis Technology Corp.
Language Hacker                                    http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"