unicode as valid naming symbols

Chris Angelico rosuav at gmail.com
Tue Apr 1 06:58:14 EDT 2014


On Tue, Apr 1, 2014 at 9:37 PM, Antoon Pardon
<antoon.pardon at rece.vub.ac.be> wrote:
> Python also uses symbols for names of operations, like '+'. And when
> someone suggested python might consider increasing the number of
> operations and gave some symbols for those extra operations, nobody
> suggested that would make python unreadable, though it would be far
> more like the path taken by APL than what we are discussing now.

Actually, people did. But mainly the thread (look up "Time we switched
to unicode?") went off looking at how hard it'd be to type those
operators, which led to the more serious point: there would either be
hard-to-type language elements or duplicate syntactic tokens ("lambda"
as well as "λ", etc). That isn't an issue with names, because any name
has only one, well, name. If you choose to use both "alpha" and "α" as
names, that's fine, and they're distinct names. You can make your code
unreadable, and it doesn't impact my code at all. Language-level
features like operators affect everyone, so they raise stronger
concerns.
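
A minimal sketch of the point about names, assuming Python 3 (the
values bound to the two names are arbitrary and only there for
illustration):

```python
import keyword

# Two spellings, two unrelated names: binding one does not touch the other.
alpha = 1
α = 2
names_are_distinct = (alpha, α) == (1, 2)

# "lambda" is a reserved word, whereas "λ" is just an ordinary identifier,
# so there is no duplicate syntactic token for the keyword.
lambda_is_keyword = keyword.iskeyword("lambda")
greek_lambda_is_keyword = keyword.iskeyword("λ")
greek_lambda_is_name = "λ".isidentifier()

print(names_are_distinct, lambda_is_keyword,
      greek_lambda_is_keyword, greek_lambda_is_name)
```

Nothing here changes the language itself; the only thing affected is
the module that chose to use both spellings.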

But because Python may choose to create new operators in the future,
the simplest and safest approach is to put a boundary between what can
be operators and what can be names; Unicode character classes are
perfect for this. It's also possible that all Unicode whitespace
characters might become legal for indentation and separation (maybe
they are already?), so obviously they're ruled out as identifiers;
anyway, I honestly do not think people would want to use U+2007 FIGURE
SPACE inside a name. So if we deny whitespace, and accept letters and
digits, it makes good sense to deny mathematical symbols so as to keep
them available for operators. (It would also make reasonable sense to
*permit* mathematical symbols, thus allowing you to use them for
functions/methods, in the same way that you can use "n", "o", and "t",
but not "not". The difference is that with a word operator, the entire
word has to be used as-is before there's a collision, whereas with a
symbolic one, any instance of that symbol inside a name will change
parsing entirely. It's a trade-off, and Python's made a decision one
way and not the other.)
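
The boundary described above can be poked at from the interpreter. This
is only a sketch, assuming Python 3: the helper name describe() is made
up for illustration, and the real identifier rule (PEP 3131) is defined
via the XID_Start/XID_Continue properties rather than the raw general
categories shown here, so treat the categories as a rough guide.

```python
import keyword
import unicodedata

def describe(ch):
    """Return (Unicode general category, usable as a name?) for one character."""
    return unicodedata.category(ch), ch.isidentifier()

letter = describe("α")             # a lowercase letter
math_symbol = describe("∑")        # N-ARY SUMMATION, a math symbol
figure_space = describe("\u2007")  # FIGURE SPACE, a space separator

# A symbol anywhere inside a would-be name disqualifies the whole name.
mixed_is_name = "total∑".isidentifier()

# A word operator only collides when the entire word is used:
# "n", "o" and "t" are all fine as names, but "not" is reserved.
pieces_usable = all(c.isidentifier() and not keyword.iskeyword(c) for c in "not")
whole_word_reserved = keyword.iskeyword("not")

print(letter, math_symbol, figure_space)
print(mixed_is_name, pieces_usable, whole_word_reserved)
```

Note that str.isidentifier() says nothing about keywords, which is why
the last check goes through the keyword module instead.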

ChrisA


