PEP 3131: Supporting Non-ASCII Identifiers - ambiguity issues
MRAB
google at mrabarnett.plus.com
Tue May 15 19:51:40 EDT 2007
On May 15, 6:44 pm, John Nagle <n... at animats.com> wrote:
> There are really two issues here, and they're being
> confused.
>
> One is allowing non-English identifiers, which is a political
> issue. The other is homoglyphs, two characters which look the same.
> The latter is a real problem in a language like Python with implicit
> declarations. If a maintenance programmer sees a variable name
> and retypes it, they may silently create a new variable.
>
> If Unicode characters are allowed, they must be done under some
> profile restrictive enough to prohibit homoglyphs. I'm not sure
> if UTS-39, profile 2, "Highly Restrictive", solves this problem,
> but it's a step in the right direction. This limits mixing of scripts
> in a single identifier; you can't mix Hebrew and ASCII, for example,
> which prevents problems with mixing right to left and left to right
> scripts. Domain names have similar restrictions.
>
> We have to have visually unique identifiers.
>
> There's also an issue with implementations that interface
> with other languages. Some Python implementations generate
> C, Java, or LISP code. Even CPython will call C code.
> The representation of external symbols needs to be standardized
> across those interfaces.
>
Surely it should be possible to compare the visual appearance of the
characters programmatically and highlight those which are similar, or
to colour-code various subsets when required.
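
As a rough sketch of what such a check might look like: the code below
maps each character to a look-alike representative and groups identifiers
that collapse to the same form. The CONFUSABLES table and the function
names skeleton, scripts and find_lookalikes are made up for illustration.
The table is a tiny hand-picked sample, not the Unicode confusables data,
and the script guess simply takes the first word of the character's
Unicode name, which is only an approximation.

```python
import unicodedata
from collections import defaultdict

# Tiny illustrative homoglyph table (an assumption for this sketch; the
# real Unicode confusables data is far larger).
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}


def skeleton(identifier):
    """Replace each character with its look-alike representative."""
    normalized = unicodedata.normalize("NFKC", identifier)
    return "".join(CONFUSABLES.get(ch, ch) for ch in normalized)


def scripts(identifier):
    """Rough script guess: first word of each letter's Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in identifier
        if ch.isalpha()
    }


def find_lookalikes(identifiers):
    """Return groups of distinct identifiers that share a skeleton."""
    groups = defaultdict(set)
    for name in identifiers:
        groups[skeleton(name)].add(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    # The second name uses CYRILLIC SMALL LETTER ES in place of "c".
    names = ["scope", "s\u0441ope", "count"]
    for group in find_lookalikes(names):
        print("look-alikes:", [ascii(n) for n in group])
    print("scripts:", sorted(scripts("s\u0441ope")))
```

An editor could use the groups returned by find_lookalikes to highlight
clashing names, and the result of scripts to colour-code identifiers that
mix more than one script.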