PEP 3131: Supporting Non-ASCII Identifiers
Steven D'Aprano
steven at REMOVE.THIS.cybersource.com.au
Tue May 15 20:46:21 EDT 2007
On Tue, 15 May 2007 20:43:31 +1000, Aldo Cortesi wrote:
> Thus spake Steven D'Aprano (steven at REMOVE.THIS.cybersource.com.au):
>
>> >> Me, I try to understand a patch by reading it. Call me
>> >> old-fashioned.
>> >
>> > I concur, Aldo. Indeed, if I _can't_ be sure I understand a patch, I
>> > don't accept it -- I ask the submitter to make it clearer.
>>
>>
>> Yes, but there is a huge gulf between what Aldo originally said he does
>> ("visual inspection") and *reading and understanding the code*.
>
> Let's set aside the fact that you're guilty of sloppy quoting here,
> since the phrase "visual inspection" is yours, not mine.
Yes, my bad, I apologize, that was sloppy of me. What you actually said
was "I can't reliably verify it by eye".
> Regardless,
> your interpretation of my words is just plain dumb. My phrasing was
> intended to draw attention to the fact that one needs to READ code in
> order to understand it. You know - with one's eyes. VISUALLY. And VISUAL
> INSPECTION of code becomes unreliable if this PEP passes.
Notwithstanding my misquote, I find it ... amusing ... that after
hauling me over the coals for using the term "visual inspection", you're
not only using it, but shouting it.

Perhaps you aren't aware that doing something "by eye" is idiomatic
English for doing it quickly, roughly, imprecisely. It is the opposite of
taking the time and effort to do the job carefully and accurately. If you
measure something "by eye", you just look at it and take a guess.

So, as I said, if you're relying on VISUAL INSPECTION (your words _now_)
you're already vulnerable. Fortunately for you, you're not relying on
visual inspection, you are actually _reading_ and _comprehending_ the
code. That might even mean, in extreme cases, you sit down with pencil
and paper and sketch out the program flow to understand what it is doing.

Now that (I hope!) you understand why I said what I said, can we agree
that _understanding_ is critical to the process? If you don't understand
the code, you don't accept it. If somebody submits a patch with
identifiers like a9472302 and a9473202 you're going to reject it as too
difficult to understand.

How do non-ASCII identifiers change that situation? What will be
different?
>> If I've understood Martin's post, the PEP states that identifiers are
>> converted to normal form. If two identifiers look the same, they will
>> be the same.
>
> I'm sorry to have to tell you, but you understood Martin's post no
> better than you did mine. There is no general way to detect homoglyphs
> and "convert them to a normal form". Observe:
>
> import unicodedata
> print repr(unicodedata.normalize("NFC", u"\u2160"))
> print u"\u2160"
> print "I"
Yes, I observe two very different glyphs, as different as the ASCII
characters I and |. What do you see?
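For anyone who wants to check this without trusting their font, here is a
minimal sketch in Python 3 syntax, using only the standard unicodedata
module. The helper name describe() is made up for illustration. It prints
the code point and name of each character, then compares two normal forms:
NFC leaves U+2160 alone, while the compatibility form NFKC folds it into
the ASCII letter. So whether these two lookalikes end up as the same
identifier depends on which normal form is applied.

```python
import unicodedata

ROMAN_ONE = "\u2160"  # the character from the quoted example
LATIN_I = "I"         # plain ASCII capital i


def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))


# Canonical composition (NFC) versus compatibility composition (NFKC).
nfc = unicodedata.normalize("NFC", ROMAN_ONE)
nfkc = unicodedata.normalize("NFKC", ROMAN_ONE)

print(describe(ROMAN_ONE))
print(describe(LATIN_I))
print("NFC unchanged:", nfc == ROMAN_ONE)
print("NFKC folds to ASCII I:", nfkc == LATIN_I)
```

Printing the character names is a more dependable check than comparing
glyphs, since it does not depend on the font at all.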
> So, a round 0 for reading comprehension this lesson, I'm afraid. Better
> luck next time.
Ha ha, very funny.

So, let's summarize...

Non-ASCII identifiers are bad, because they are vulnerable to the exact
same problems as ASCII identifiers, only we're happy to live with those
problems if they are ASCII, and just install a font that makes I and l
look different, but we won't install a font that makes I and Ⅰ look
different, because that's too hard.

Well, you've convinced me. Obviously expecting Python programmers to cope
with something as complicated as installing a decent set of fonts is such
a major hurdle that people will abandon the language in droves, probably
taking up Haskell and Visual Basic and Lisp and all those other languages
that allow non-ASCII identifiers.
--
Steven.