Unicode as valid naming symbols

Mon Mar 31 20:47:48 EDT 2014

On Mon, Mar 31, 2014 at 1:31 PM, Antoon Pardon
<antoon.pardon at rece.vub.ac.be> wrote:
> On 31-03-14 19:40, Ian Kelly wrote:
>> That was an exaggeration on my part.  It wouldn't affect my job, as I
>> wouldn't expect to ever actually have to maintain anything like the
>> above.  My greater point though is that it damages Python's
>> readability for no actual gain in my view.  There is nothing useful
>> you can do with a name that is the U+1F4A9 character that you can't do
>> just as easily with alphanumeric identifiers like pile_of_poo (or
>> куча_фекалий if one prefers; that's auto-translated, so don't blame me
>> if it's a poor translation). The kinds of symbols that we're talking
>> about here aren't part of any writing systems, and so to incorporate
>> them in *names* as if they were is an abuse of Unicode.
>
> Your argument doesn't have much weight. First of all, it can be used
> for just restricting names to the ASCII range.

I disagree.  Non-ASCII written names are useful to anybody who prefers
not to do all their programming in English.
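
A minimal sketch of where the line currently sits, assuming Python 3
and only the standard library. The three sample strings are the names
mentioned above; the variable names are just for illustration.

```python
import unicodedata

# The three spellings discussed above; the last is U+1F4A9.
candidates = ["pile_of_poo", "куча_фекалий", "\U0001F4A9"]

# str.isidentifier() reports whether Python 3 would accept the string
# as a name (keywords aside).
results = {name: name.isidentifier() for name in candidates}

# Letters from any script are accepted; U+1F4A9 is in general category
# "So" (Symbol, other), which is not allowed in identifiers.
poo_category = unicodedata.category("\U0001F4A9")

for name, ok in results.items():
    print(ascii(name), ok)
```

So non-ASCII script names already work, and the symbol is what would
need a rule change.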

> Second of all I
> think a well-chosen symbolic name can be more readable than a
> name in a character set you are not familiar with. A well-chosen
> symbol will evoke a meaning for a lot of people. A name in a
> character set you are not familiar with is just gibberish to
> you.

Well, this is the path taken by APL.  It has its supporters.  It's not
known for being readable.

>> I don't think the comparisons to decorators and the if-else operator
>> are apt.
>
> I didn't make such a comparison. I just noted the arguments against
> were similar.

That's the comparison to which I was referring.

>> First, because while those may degrade readability, they do
>> so in a constrained way.  A decorator application is just the @ symbol
>> and an identifier.
>
> And if abused, can totally change the working of your function. There
> is no guarantee that the function returned has any relation to the
> original function. If that can't be a nightmare for readability,
> I don't know what is.

As Terry Reedy noted, this has nothing to do with the decorator
syntax, so it isn't much of an argument against having such syntax.
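
A small sketch of that point. The decorator and function names here are
hypothetical, made up for illustration: the same replacement happens
with or without the @ syntax.

```python
def replace_with_unrelated(func):
    # Hypothetical "abusive" decorator: ignores func entirely and
    # returns a function with no relation to the original.
    def unrelated(*args, **kwargs):
        return "something else entirely"
    return unrelated

# With decorator syntax.
@replace_with_unrelated
def add(a, b):
    return a + b

# Without decorator syntax: plain function call and rebinding.
def add_plain(a, b):
    return a + b

add_plain = replace_with_unrelated(add_plain)

print(add(1, 2))
print(add_plain(1, 2))
```

Both names end up bound to the unrelated function, so the hazard comes
from first-class functions and rebinding, not from the @ line itself.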

>> The if-else is just three expressions separated by
>> keywords.
>
> Yes but if used unrestrained in arbitrary expressions will make those
> expressions hard to understand.

I don't disagree.  I hardly ever use it myself, certainly only if it
can fit comfortably into one line, which is rare.  But it's still
quite limited in syntactic scope.
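
For illustration, a sketch of the two extremes. The function names and
thresholds are made up.

```python
def sign_label(n):
    # Fits comfortably on one line: three expressions, two keywords.
    return "negative" if n < 0 else "non-negative"

def classify_nested(n):
    # Chained conditional expressions are legal but harder to scan.
    # They group to the right: A if c1 else (B if c2 else (...)).
    return "negative" if n < 0 else "zero" if n == 0 else "small" if n < 10 else "large"

print(sign_label(-3), classify_nested(5))
```

Even the chained form is still only expressions and the two keywords,
which is the sense in which the construct is constrained.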

>> In the case of arbitrary Unicode identifiers, we're talking
>> about approximately doubling the number of different characters (out
>> of a continuously growing set) that could be used, many of which are
>> easily confused with other characters. Of course the potential for
>> confusion already exists, but that's no justification for aggravating
>> it.
>
> So what if we double the number of different characters? I don't care
> about the number of them, I care about how meaningful they are. And
> as you say, confusion is already possible. A good programmer knows
> how to deal with such possible confusion; that the number of
> cases increases doesn't need to be a problem for those who care
> about this.

So tell me then, how would you deal with it?  In the case of script
identifiers, it's often not hard to discern from context whether a
particular character is e.g. a Latin h or a Cyrillic һ.  Assuming the
original author wasn't being intentionally obfuscatory, if the rest of
the identifier is Cyrillic then the character is probably also
Cyrillic.  If it's a one-character identifier, then hopefully the rest
of the module is consistent and you can guess from that.  If the
identifier in question is just one symbol though, then you have a lot
less context.
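
A rough sketch of the "guess from context" heuristic, assuming Python 3
and the standard unicodedata module. The helper name scripts_used is
hypothetical, and taking the first word of a character's Unicode name is
only an approximation of its script, not a real script property lookup.

```python
import unicodedata

latin_h = "h"            # U+0068 LATIN SMALL LETTER H
cyrillic_shha = "\u04bb"  # U+04BB CYRILLIC SMALL LETTER SHHA

def scripts_used(identifier):
    """Approximate the scripts in a name from its letters' Unicode names."""
    return {
        unicodedata.name(ch).split()[0]
        for ch in identifier
        if ch.isalpha()
    }

# The two characters look alike but are distinct, and both are valid names.
print(latin_h == cyrillic_shha)
print(latin_h.isidentifier(), cyrillic_shha.isidentifier())

# A longer identifier gives context: mixed scripts stand out.
print(scripts_used("hello"))
print(scripts_used("\u04bbello"))

# A one-character identifier gives only a single data point.
print(scripts_used(cyrillic_shha))
```

With a multi-character name, a mixed result is a hint that something is
off. With a single symbol there is nothing to compare against.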

>
>> Second, at least in the case of decorators, while I don't dispute that
>> they can harm readability, I think that in the majority of cases they
>> actually help it.
>
> But that is not a fair comparison now, is it? What you are doing here
> is comparing actual use to a worst-case doom scenario.

I contend that there is no scenario with arbitrary Unicode identifiers
where readability is improved.