PEP 3131: Supporting Non-ASCII Identifiers - ambiguity issues

John Nagle nagle at animats.com
Tue May 15 13:44:37 EDT 2007


    There are really two issues here, and they're being
confused.

    One is allowing non-English identifiers, which is a political
issue. The other is homoglyphs: distinct characters which look the same.
The latter is a real problem in a language like Python with implicit
declarations.  If a maintenance programmer sees a variable name
and retypes it, they may silently create a new variable.
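
    To make the failure concrete, here is a minimal sketch, assuming a
Python 3 interpreter (where non-ASCII identifiers were later accepted).
The names "total" and the lookalike spelling are made up for
illustration, and the Cyrillic letter is written as an escape so the
difference is visible in the source:

```python
# Two spellings that render almost identically in most fonts.
latin_name = "total"        # every letter is ASCII
lookalike = "t\u043etal"    # second letter is CYRILLIC SMALL LETTER O

namespace = {}
exec(f"{latin_name} = 1", namespace)
# The maintainer means to update the existing variable, but the
# assignment silently binds a second, different name instead.
exec(f"{lookalike} = 2", namespace)

# exec() adds "__builtins__" to the namespace, so filter it out.
user_names = [k for k in namespace if not k.startswith("__")]
print(len(user_names))          # two bindings, not one
print(namespace[latin_name])    # the original variable is unchanged
```

No error is raised at any point, which is the whole problem: the
original variable keeps its old value.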

    If Unicode characters are allowed, they must be allowed only under
some profile restrictive enough to prohibit homoglyphs.  I'm not sure
if UTS-39, profile 2, "Highly Restrictive", solves this problem,
but it's a step in the right direction.  This limits mixing of scripts
in a single identifier; you can't mix Hebrew and ASCII, for example,
which prevents problems with mixing right to left and left to right
scripts.  Domain names have similar restrictions.
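
    A rough sketch of what a "no mixed scripts" check might look like.
This is not an implementation of UTS-39: Python's standard library does
not expose the Unicode Script property, so the sketch guesses the script
from the first word of each character's Unicode name.  That heuristic,
and the helper names scripts_in and is_single_script, are assumptions
made for illustration only.

```python
import unicodedata

def scripts_in(identifier):
    """Approximate the set of scripts used in an identifier.

    Heuristic only: takes the first word of each character's Unicode
    name (e.g. "LATIN", "CYRILLIC", "HEBREW") as its script.
    """
    scripts = set()
    for ch in identifier:
        if ch.isdigit() or ch == "_":
            continue  # treat digits and underscore as script-neutral
        name = unicodedata.name(ch, "")
        scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    return scripts

def is_single_script(identifier):
    """True if the identifier draws on at most one script."""
    return len(scripts_in(identifier)) <= 1

print(is_single_script("total"))        # pure ASCII
print(is_single_script("t\u043etal"))   # Latin mixed with Cyrillic
print(sorted(scripts_in("x\u05d0")))    # Latin mixed with Hebrew
```

Note that a name written entirely in Cyrillic lookalikes would still
pass this check, so limiting script mixing narrows the problem without
necessarily closing it.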

    We have to have visually unique identifiers.

    There's also an issue with implementations that interface
with other languages.  Some Python implementations generate
C, Java, or LISP code.  Even CPython will call C code.
The representation of external symbols needs to be standardized
across those interfaces.

				John Nagle


