PEP 3131: Supporting Non-ASCII Identifiers

Tue May 15 05:22:22 EDT 2007

On Sun, 13 May 2007 23:00:17 -0700, Alex Martelli wrote:

> Aldo Cortesi <aldo at nullcube.com> wrote:
> 
>> Thus spake Steven D'Aprano (steven at REMOVE.THIS.cybersource.com.au):
>> 
>> > If you're relying on cursory visual inspection to recognize harmful
>> > code, you're already vulnerable to trojans.
>> 
>> What a daft thing to say. How do YOU recognize harmful code in a patch
>> submission? Perhaps you blindly apply patches, and then run your test
>> suite on a quarantined system, with an instrumented operating system to
>> allow you to trace process execution, and then perform a few weeks
>> worth of analysis on the data?
>> 
>> Me, I try to understand a patch by reading it. Call me old-fashioned.
> 
> I concur, Aldo.  Indeed, if I _can't_ be sure I understand a patch, I
> don't accept it -- I ask the submitter to make it clearer.

Yes, but there is a huge gulf between what Aldo originally said he does 
("visual inspection") and *reading and understanding the code*.

If somebody submits a piece of code where all the variable names, 
functions, classes etc. are like a958323094, a498307913, etc. you're 
going to have a massive problem following the code, even though it is 
all ASCII. You would be sensible to reject the code. If you don't read 
Chinese, and somebody submits a patch in Chinese, you would be sensible 
to reject it, or at least have it vetted by somebody who does read 
Chinese.

But is it really likely that somebody is going to submit a Chinese patch 
to your English or Italian project? I don't think so.

> Homoglyphs would ensure I could _never_ be sure I understand a patch,
> without at least running it through some transliteration tool.  I don't
> think the world of open source needs this extra hurdle in its path.

If I've understood Martin's post, the PEP states that identifiers are 
converted to normal form. If two identifiers look the same, they will be 
the same.
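
A minimal sketch of what that normalization does, assuming the normal 
form is NFKC (my assumption here; the helper name normalize_identifier 
is made up for illustration and is not something the PEP defines):

```python
import unicodedata

def normalize_identifier(name: str) -> str:
    """Return the NFKC normal form of an identifier (hypothetical helper)."""
    return unicodedata.normalize("NFKC", name)

# Full-width Latin letters: these render much like plain "foo".
fullwidth = "\uff46\uff4f\uff4f"

# Two spellings of the same word: precomposed e-acute versus
# plain "e" followed by a combining acute accent.
composed = "caf\u00e9"
decomposed = "cafe\u0301"

# Cyrillic small a (U+0430) in place of Latin "a".
cyrillic = "p\u0430y"

print(normalize_identifier(fullwidth) == "foo")
print(normalize_identifier(composed) == normalize_identifier(decomposed))
print(normalize_identifier(cyrillic) == "pay")
```

The first two comparisons come out equal, which is the case the 
normal form handles. The third does not: normalization alone does not 
merge look-alike letters from different scripts, so that case still 
needs a separate check.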

Except, probably, identifiers using ASCII O and 0, or I, l and 1, or rn 
and m. Depending on your eyesight and your font, they look the same. The 
solution to that isn't to prohibit O and 0 in identifiers, but to use a 
font that makes them look different. 

But even if homoglyphs were a problem, as hurdles go, it's hardly a 
big one. No doubt you already use automated tools for patch management, 
revision control, bug tracking, unit-testing, maybe even spell checking. 
Adding a transliteration tool to your arsenal is not really a disaster.
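
To give a rough idea of how small such a tool could be, here is a 
sketch that does not transliterate anything but simply reports every 
non-ASCII character in a patch along with its Unicode name. The 
function name flag_non_ascii and the sample patch text are invented 
for illustration:

```python
import unicodedata

def flag_non_ascii(text: str):
    """Yield (line_no, column, char, unicode_name) for each non-ASCII character."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                # Fall back to "UNKNOWN" for characters without a Unicode name.
                yield line_no, col, ch, unicodedata.name(ch, "UNKNOWN")

# Hypothetical patch: the second line uses Cyrillic small a (U+0430)
# where a reader would expect Latin "a".
patch = "total = 0\np\u0430y = total + 1\n"

for line_no, col, ch, name in flag_non_ascii(patch):
    print(f"line {line_no}, col {col}: U+{ord(ch):04X} {name}")
# prints: line 2, col 2: U+0430 CYRILLIC SMALL LETTER A
```

A project that expects ASCII-only identifiers could run something like 
this over incoming patches and look more closely at whatever it flags.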

-- 
Steven.