Unicode normalisation [was Re: [beginner] What's wrong?]

Rustom Mody rustompmody at gmail.com
Fri Apr 8 14:04:53 EDT 2016


On Friday, April 8, 2016 at 11:14:21 PM UTC+5:30, Marko Rauhamaa wrote:
> Peter Pearson :
> 
> > On Fri, 08 Apr 2016 16:00:10 +1000, Steven D'Aprano  wrote:
> >> They are not, and never have been, in the typesetting business.
> >> Perhaps characters are not the only things easily confused *wink*
> >
> > Defining codepoints that deal with appearance but not with meaning is
> > going into the typesetting business. Examples: ligatures, and spaces
> > of varying widths with specific typesetting properties like being
> > non-breaking.
> >
> > Typesetting done in MS Word using such Unicode codepoints will never
> > be more than a goofy approximation to real typesetting (e.g., TeX),
> > but it will cost a huge amount of everybody's time, with the current
> > discussion of ligatures in variable names being just a straw in the
> > wind. Getting all the world's writing systems into a single, coherent
> > standard was an extraordinarily ambitious, monumental undertaking, and
> > I'm baffled that the urge to broaden its scope in this irrelevant
> > direction was entertained at all.
> 
> I agree completely but at the same time have a lot of understanding for
> the reasons why Unicode had to become such a mess. Part of it is
> historical, part of it is political, yet part of it is in the
> unavoidable messiness of trying to define what a character is.

There are standards and standards.
Being a standard does not by itself make something useful, well-designed,
reasonable, etc.

It's reasonably likely that all our keyboards start QWERT...
That doesn't make it a sane design.

Likewise, using NFKC to define the equivalence relation on identifiers
is analogous to saying: since QWERTY has been in use for over a hundred years,
it's a perfectly good design. Just because NFKC has the stamp of the Unicode
Consortium does not automatically make it useful for all purposes.
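
For anyone who wants to see what that equivalence relation does in
practice, here is a minimal sketch using the standard unicodedata module.
The helper name nfkc and the sample strings are my own choices for
illustration; the only things relied on are unicodedata.normalize and the
fact that Python 3 folds identifiers with NFKC while parsing.

```python
import unicodedata

def nfkc(text):
    """Return text in Unicode Normalization Form KC (hypothetical helper)."""
    return unicodedata.normalize("NFKC", text)

# Compatibility characters collapse onto their plain counterparts.
ligature = "\ufb01le"        # LATIN SMALL LIGATURE FI followed by "le"
bold_x = "\U0001d431"        # MATHEMATICAL BOLD SMALL X
print(nfkc(ligature) == "file")
print(nfkc(bold_x) == "x")

# Python applies the same folding to identifiers while parsing,
# so the name that actually gets bound is the normalised one.
namespace = {}
exec(ligature + " = 1", namespace)
print("file" in namespace)
print(ligature in namespace)
```

The first three lines printed should be True and the last False: the
ligature spelling and the bold x are treated as the same names as their
plain ASCII forms, whether or not that is what the author intended.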


