PEP 3131: Supporting Non-ASCII Identifiers - ambiguity issues

Tue May 15 13:48:21 EDT 2007

John Nagle wrote:
>    There are really two issues here, and they're being
> confused.
> 
>    One is allowing non-English identifiers, which is a political
> issue. The other is homoglyphs, two characters which look the same.
> The latter is a real problem in a language like Python with implicit
> declarations.  If a maintenance programmer sees a variable name
> and retypes it, they may silently create a new variable.
> 
>    If Unicode characters are allowed, they must be done under some
> profile restrictive enough to prohibit homoglyphs.  I'm not sure
> if UTS-39, profile 2, "Highly Restrictive", solves this problem,
> but it's a step in the right direction.  This limits mixing of scripts
> in a single identifier; you can't mix Hebrew and ASCII, for example,
> which prevents problems with mixing right to left and left to right
> scripts.  Domain names have similar restrictions.
> 
>    We have to have visually unique identifiers.

As others stated before, this is unlikely to become a problem in practice.
Project-internal standards will usually define a specific language for a
project, in which case these issues will not arise. In general, programmers
from a specific language/script background will stick to that script and not
magically start typing foreign characters. And projects where multiple
languages are involved will have to define a target language anyway, most
likely (although not necessarily) English.

Note that adherence to a specific script can easily be checked programmatically
through Unicode ranges - if the need ever arises.
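
To illustrate the kind of check I mean, here is a rough sketch. The range
table and the function names (SCRIPT_RANGES, script_of, is_single_script) are
made up for this example and the table is deliberately incomplete; a real
project would list only the scripts it actually permits.

```python
# Hypothetical, incomplete table of code point ranges per script
# (bounds inclusive). Extend or trim it to match the project's policy.
SCRIPT_RANGES = {
    "latin": [(0x0041, 0x005A), (0x0061, 0x007A), (0x00C0, 0x024F)],
    "greek": [(0x0370, 0x03FF)],
    "cyrillic": [(0x0400, 0x04FF)],
    "hebrew": [(0x0590, 0x05FF)],
}


def script_of(ch):
    """Return the script name for a character, or None if it is not in the table."""
    cp = ord(ch)
    for script, ranges in SCRIPT_RANGES.items():
        if any(lo <= cp <= hi for lo, hi in ranges):
            return script
    return None


def is_single_script(identifier):
    """True if all letters of the identifier come from one known script.

    Digits and underscores are treated as script-neutral (an assumption
    of this sketch, not something any standard prescribes here).
    """
    used = {script_of(ch) for ch in identifier
            if not (ch.isdigit() or ch == "_")}
    return len(used) <= 1 and None not in used
```

For example, is_single_script("count") accepts a plain ASCII name, while the
same name with a Cyrillic "о" (U+043E) substituted for the Latin "o" is
rejected because it mixes two scripts. Note that this only catches mixed-script
homoglyphs, not look-alikes within a single script.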

Stefan