PEP 3131: Supporting Non-ASCII Identifiers

Tue May 15 21:59:27 EDT 2007

Thus spake Steven D'Aprano (steven at REMOVE.THIS.cybersource.com.au):

> Perhaps you aren't aware that doing something "by eye" is idiomatic
> English for doing it quickly, roughly, imprecisely. It is the opposite of
> taking the time and effort to do the job carefully and accurately. If you
> measure something "by eye", you just look at it and take a guess.

Well, Steve, speaking as someone not entirely unfamiliar with idiomatic
English, I can say with some confidence that that's complete and utter bollocks
(idiomatic usage for "nonsense", by the way). To do something "by eye" means
nothing more nor less than doing it visually. Unless you can provide a citation
to the contrary, please move on from this petty little point of yours, and try
to make a substantial technical argument instead.

> So, as I said, if you're relying on VISUAL INSPECTION (your words _now_)
> you're already vulnerable. Fortunately for you, you're not relying on
> visual inspection, you are actually _reading_ and _comprehending_ the
> code. That might even mean, in extreme cases, you sit down with pencil
> and paper and sketch out the program flow to understand what it is doing.

Please, pick up a dictionary, and look up "visual" and "inspection", then
re-read my message. Ponder the fact that visual inspection is in fact a
necessary precursor to "reading" or "comprehending" code. Now, imagine reading
a piece of code where you can never be sure that a character is what it appears
to be...

> >> If I've understood Martin's post, the PEP states that identifiers are
> >> converted to normal form. If two identifiers look the same, they will
> >> be the same.
> >
> > I'm sorry to have to tell you, but you understood Martin's post no
> > better than you did mine. There is no general way to detect homoglyphs
> > and "convert them to a normal form". Observe:
> >
> > import unicodedata
> > print repr(unicodedata.normalize("NFC", u"\u2160"))
> > print u"\u2160"
> > print "I"
>
> Yes, I observe two very different glyphs, as different as the ASCII
> characters I and |. What do you see?

I recommend that you gain a basic understanding of the relationship between
Unicode code points and the glyphs on your screen before attempting to argue
this point again. The particular glyph your current font-set translates the
character into is irrelevant. Indeed, the fact that there is font variation
from client to client is one of the more obvious problems with your technically
illiterate hope that one could homogenize characters so that everything that
looks the same has the same meaning. Fiddle around with your fontsets a bit -
you only have to find one combination where the two glyphs look the same to
prove my case...
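A rough Python 3 sketch of the same point follows. It is an illustration only: it assumes a Python 3 interpreter (the quoted snippet is Python 2), and the helper `describe` is a hypothetical name made up for it. It compares code points rather than glyphs. NFC leaves U+2160 ROMAN NUMERAL ONE distinct from ASCII "I". The stronger NFKC form does fold that particular character to "I", but no normal form maps CYRILLIC CAPITAL LETTER A (U+0410) onto LATIN CAPITAL LETTER A, even though most fonts draw the two identically. The `exec` call shows two such look-alike names binding two separate variables.

```python
import unicodedata

# Hypothetical helper for this sketch: list the code point and Unicode name
# of every character in a string, so comparisons don't depend on a font.
def describe(s):
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s]

roman_one = "\u2160"   # ROMAN NUMERAL ONE, resembles ASCII "I"
cyrillic_a = "\u0410"  # CYRILLIC CAPITAL LETTER A, resembles ASCII "A"

# NFC (canonical composition) leaves the Roman numeral unchanged.
nfc_roman = unicodedata.normalize("NFC", roman_one)

# NFKC also applies compatibility mappings, and folds this one to "I" ...
nfkc_roman = unicodedata.normalize("NFKC", roman_one)

# ... but Cyrillic A has no decomposition, so every normal form keeps it.
nfkc_cyrillic = unicodedata.normalize("NFKC", cyrillic_a)

# Two identifiers that usually render identically remain two distinct names.
namespace = {}
exec(f"{cyrillic_a} = 1\nA = 2", namespace)

if __name__ == "__main__":
    print(describe(nfc_roman))
    print(describe(nfkc_roman))
    print(describe(nfkc_cyrillic))
    print(namespace[cyrillic_a], namespace["A"])
```

The names printed for `nfkc_cyrillic` and "A" differ, which is exactly what a reader looking only at the screen cannot see.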

Regards,

Aldo

-- 
Aldo Cortesi
aldo at nullcube.com
http://www.nullcube.com
Mob: 0419 492 863