PEP 3131: Supporting Non-ASCII Identifiers - ambiguity issues
Stefan Behnel
stefan.behnel-n05pAM at web.de
Tue May 15 13:48:21 EDT 2007
John Nagle wrote:
> There are really two issues here, and they're being
> confused.
>
> One is allowing non-English identifiers, which is a political
> issue. The other is homoglyphs, two characters which look the same.
> The latter is a real problem in a language like Python with implicit
> declarations. If a maintenance programmer sees a variable name
> and retypes it, they may silently create a new variable.
>
> If Unicode characters are allowed, they must be done under some
> profile restrictive enough to prohibit homoglyphs. I'm not sure
> if UTS-39, profile 2, "Highly Restrictive", solves this problem,
> but it's a step in the right direction. This limits mixing of scripts
> in a single identifier; you can't mix Hebrew and ASCII, for example,
> which prevents problems with mixing right to left and left to right
> scripts. Domain names have similar restrictions.
>
> We have to have visually unique identifiers.
As others stated before, this is unlikely to become a problem in practice.
Project-internal standards will usually define a specific language for a
project, in which case these issues will not arise. In general, programmers
from a specific language/script background will stick to that script and not
magically start typing foreign characters. And projects where multiple
languages are involved will have to define a target language anyway, most
likely (although not necessarily) English.
Note that adherence to a specific script can easily be checked programmatically
through Unicode ranges - if the need ever arises.
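
A minimal sketch of such a check, to illustrate the idea. The range table and
the function names (SCRIPT_RANGES, scripts_used, is_single_script) are made up
for this example, and the ranges are deliberately coarse: they cover only a few
main Unicode blocks, not every character of each script.

```python
# Hypothetical, coarse table of Unicode code point ranges per script.
# Only a few main blocks are listed; a real check would need a fuller table.
SCRIPT_RANGES = {
    "latin": [(0x0041, 0x005A), (0x0061, 0x007A), (0x00C0, 0x024F)],
    "greek": [(0x0370, 0x03FF)],
    "cyrillic": [(0x0400, 0x04FF)],
    "hebrew": [(0x0590, 0x05FF)],
}


def scripts_used(identifier):
    """Return the set of script names found in an identifier."""
    found = set()
    for ch in identifier:
        # ASCII digits and the underscore are neutral in this sketch.
        if ch in "_0123456789":
            continue
        code = ord(ch)
        for script, ranges in SCRIPT_RANGES.items():
            if any(low <= code <= high for low, high in ranges):
                found.add(script)
                break
        else:
            # Not covered by the table above.
            found.add("other")
    return found


def is_single_script(identifier):
    """True if the identifier draws on at most one script."""
    return len(scripts_used(identifier)) <= 1


# "count" spelled with a Cyrillic 'o' (U+043E) in second position.
mixed = "c\u043eunt"
print(scripts_used("count"), is_single_script("count"))
print(scripts_used(mixed), is_single_script(mixed))
```

A project could run something like this over its identifiers as a lint step
and reject names that mix scripts, or names outside the script it has agreed on.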
Stefan