[I18n-sig] Re: [Python-Dev] Unicode debate

Tom Emerson tree@basistech.com
Tue, 2 May 2000 13:14:24 -0400 (EDT)


M.-A. Lemburg writes:
 > The details are on the www.unicode.org web-site buried
 > in some of the tech reports on normalization and
 > collation.

This is described in the Unicode standard itself, and in UTR #15 and
UTR #10. Normalization is an issue with wider implications than just
handling glyph variants: indeed, glyph variants are beside the point.

The question is this: should

U+00DC LATIN CAPITAL LETTER U WITH DIAERESIS

compare equal to

U+0055 LATIN CAPITAL LETTER U
U+0308 COMBINING DIAERESIS

or not? It depends on the application. Certainly in a database system
I would want these to compare equal.

Perhaps normalization form needs to be an option of the string comparator?
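
To make the question concrete, here is a rough sketch of a comparator that
takes the normalization form as an option. It assumes the unicodedata module
from the present-day Python standard library; the helper name
normalized_equal and its form parameter are made up for illustration and are
not an existing or proposed API.

```python
import unicodedata


def normalized_equal(a, b, form="NFC"):
    """Compare two strings after applying the same normalization form.

    Hypothetical helper: `form` is passed straight to
    unicodedata.normalize, so it must be one of "NFC", "NFD",
    "NFKC" or "NFKD".
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


precomposed = "\u00dc"   # U+00DC LATIN CAPITAL LETTER U WITH DIAERESIS
decomposed = "U\u0308"   # U+0055 followed by U+0308 COMBINING DIAERESIS

# Plain comparison works code point by code point.
raw_equal = precomposed == decomposed

# Normalizing both sides to the same form before comparing.
nfc_equal = normalized_equal(precomposed, decomposed, "NFC")
nfd_equal = normalized_equal(precomposed, decomposed, "NFD")
```

With this sketch the plain comparison treats the two spellings as different,
while the normalized comparison treats them as equal under either NFC or NFD.
Which behaviour is wanted is exactly the per-application choice above.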

        -tree

-- 
Tom Emerson                                          Basis Technology Corp.
Language Hacker                                    http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"