PEP 3131: Supporting Non-ASCII Identifiers - ambiguity issues

Tue May 15 19:51:40 EDT 2007

On May 15, 6:44 pm, John Nagle <n... at animats.com> wrote:
>     There are really two issues here, and they're being
> confused.
>
>     One is allowing non-English identifiers, which is a political
> issue. The other is homoglyphs, two characters which look the same.
> The latter is a real problem in a language like Python with implicit
> declarations.  If a maintenance programmer sees a variable name
> and retypes it, they may silently create a new variable.
>
>     If Unicode characters are allowed, it must be under some
> profile restrictive enough to prohibit homoglyphs.  I'm not sure
> if UTS-39, profile 2, "Highly Restrictive", solves this problem,
> but it's a step in the right direction.  This limits mixing of scripts
> in a single identifier; you can't mix Hebrew and ASCII, for example,
> which prevents problems with mixing right to left and left to right
> scripts.  Domain names have similar restrictions.
>
>     We have to have visually unique identifiers.
>
>     There's also an issue with implementations that interface
> with other languages.  Some Python implementations generate
> C, Java, or LISP code.  Even CPython will call C code.
> The representation of external symbols needs to be standardized
> across those interfaces.
>
Surely it should be possible programmatically to compare the visual
appearance of the characters and highlight ones which are similar, or
colour-code various subsets when required.
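
As a rough illustration of the kind of check I mean, here is a minimal
sketch in Python 3 (which is where PEP 3131 identifiers are accepted).
It does not compare glyph shapes; it only flags identifiers that mix
scripts and marks their non-ASCII characters. The helper names
(script_of, suspicious_identifiers, highlight) are made up for this
post, and taking the first word of the Unicode character name as the
"script" is my own crude assumption, not the UTS-39 algorithm.

```python
import io
import tokenize
import unicodedata


def script_of(ch):
    """Crude script guess: the first word of the Unicode character name.

    Assumption: this is only a stand-in for real script data; the
    standard library does not expose the Unicode Script property.
    """
    if ch == "_" or ch.isdigit():
        return None  # treat underscore and digits as script-neutral
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def suspicious_identifiers(source):
    """Return {identifier: sorted script names} for names mixing scripts."""
    found = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.NAME:
            continue
        scripts = {script_of(ch) for ch in tok.string} - {None}
        if len(scripts) > 1:
            found[tok.string] = sorted(scripts)
    return found


def highlight(name):
    """Replace each non-ASCII character with a visible [U+XXXX] marker."""
    return "".join(ch if ord(ch) < 128 else "[U+%04X]" % ord(ch) for ch in name)


# The second name uses CYRILLIC SMALL LETTER O (U+043E) in place of Latin "o",
# so retyping it from the screen would silently create a new variable.
sample = "score = 1\nsc\u043ere = 2\n"
for name, scripts in suspicious_identifiers(sample).items():
    print(highlight(name), scripts)
```

An editor or lint tool could run something like this over a file and
colour the flagged names. Look-alikes within a single script would
need a proper confusables table, which this sketch does not attempt.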