unicode as valid naming symbols

Tue Apr 1 04:19:27 EDT 2014

On 01-04-14 02:47, Ian Kelly wrote:
> On Mon, Mar 31, 2014 at 1:31 PM, Antoon Pardon
> <antoon.pardon at rece.vub.ac.be> wrote:
>> On 31-03-14 19:40, Ian Kelly wrote:
>>> That was an exaggeration on my part.  It wouldn't affect my job, as I
>>> wouldn't expect to ever actually have to maintain anything like the
>>> above.  My greater point though is that it damages Python's
>>> readability for no actual gain in my view.  There is nothing useful
>>> you can do with a name that is the U+1F4A9 character that you can't do
>>> just as easily with alphanumeric identifiers like pile_of_poo (or
>>> куча_фекалий if one prefers; that's auto-translated, so don't blame me
>>> if it's a poor translation). The kinds of symbols that we're talking
>>> about here aren't part of any writing systems, and so to incorporate
>>> them in *names* as if they were is an abuse of Unicode.
>> Your argument doesn't have much weight. First of all, it could equally
>> be used to justify restricting names to the ASCII range.
> I disagree.  Non-ASCII written names are useful to anybody who prefers
> not to do all their programming in English.

Symbols that carry a meaning across different languages are more useful,
because they are meaningful to more people and so make the program
readable to more people.

>> Second of all, I
>> think a well-chosen symbolic name can be more readable than a
>> name in a character set you are not familiar with. A well-chosen
>> symbol will evoke a meaning for a lot of people. A name in a
>> character set you are not familiar with is just gibberish to
>> you.
> Well, this is the path taken by APL.  It has its supporters.  It's not
> known for being readable.

No, that is not the path taken by APL. AFAICS, identifiers in APL are just
like identifiers in Python. The path taken by APL was to provide a lot more
operators, written with non-alphanumeric characters.

AFAICS, APL programs tend to be unreadable because they are mostly written
in a very concise style.

I think this is closer to the path taken by Lisp-like languages, where '+' is
a name just like 'alpha' or 'r2d2'. In Scheme I can simply do the following:

(define √ sqrt)
(√ 4)

This gives the same result as (sqrt 4). Maybe I missed it, but I haven't heard
Scheme being called an unreadable language.
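For comparison, a minimal sketch of the Python side, assuming Python 3: the Scheme line has no direct Python equivalent, because U+221A SQUARE ROOT is a math symbol (Unicode category Sm) rather than a letter, so the tokenizer rejects it, while an alias spelled in Cyrillic letters is accepted. The helper try_compile and the name корень ("root") are hypothetical, chosen only for illustration.

```python
import math
import unicodedata


def try_compile(source: str) -> bool:
    """Return True if `source` compiles as a module, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# U+221A SQUARE ROOT is a math symbol, not a letter...
print(unicodedata.category("\u221a"))      # Sm
# ...so the Python analogue of (define √ sqrt) does not even tokenize.
print(try_compile("\u221a = math.sqrt"))    # False

# An alias made of Cyrillic letters is a perfectly valid identifier.
print(try_compile("корень = math.sqrt"))    # True
корень = math.sqrt
print(корень(4))                            # 2.0
```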

>>> First, because while those may degrade readability, they do
>>> so in a constrained way.  A decorator application is just the @ symbol
>>> and an identifier.
>> And if abused, it can totally change how your function works. There
>> is no guarantee that the function returned has any relation to the
>> original function. If that can't be a nightmare for readability,
>> I don't know what can.
> As Terry Reedy noted, this has nothing to do with the decorator
> syntax, so it isn't much of an argument against having such syntax.

Point taken.
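Terry's point can be made concrete with a small sketch: the @ line is only shorthand for rebinding the name to whatever the decorator returns, so a decorator that throws the original function away behaves the same with or without the syntax. The names replace_with_constant, f and g are hypothetical.

```python
from typing import Callable


def replace_with_constant(func: Callable[[int], int]) -> Callable[[int], int]:
    """A deliberately abusive decorator: it ignores `func` entirely."""
    def unrelated(x: int) -> int:
        return 42
    return unrelated


@replace_with_constant
def f(x: int) -> int:
    return x * 2


def g(x: int) -> int:
    return x * 2


# This is exactly what the @ syntax does, written out by hand.
g = replace_with_constant(g)

print(f(10), g(10))   # 42 42
```

Both names end up bound to the unrelated function, which is why the abuse is a property of decorators as such and not of the @ syntax.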

>>> The if-else is just three expressions separated by
>>> keywords.
>> Yes, but used without restraint in arbitrary expressions, it will make
>> those expressions hard to understand.
> I don't disagree.  I hardly ever use it myself, certainly only if it
> can fit comfortably into one line, which is rare.  But it's still
> quite limited in syntactic scope.

Non-alphanumeric characters in names are even more limited in syntactic
scope. They don't even play a role at the syntactic level, only at the
lexical level.
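To illustrate how lexical the treatment of names is in Python 3 (as described in PEP 3131): whether a character may appear in an identifier is a per-character property check, and every identifier is normalized to NFKC while parsing, before any later stage sees it. A minimal sketch, using U+1D431 MATHEMATICAL BOLD SMALL X as the example; the variable namespace is just an illustrative dict.

```python
import unicodedata

# Identifier validity is decided character by character.
print("куча_фекалий".isidentifier())   # True: Cyrillic letters plus underscore
print("\U0001F4A9".isidentifier())     # False: U+1F4A9 is a symbol (So), not a letter
print("\u221a".isidentifier())         # False: U+221A is a math symbol (Sm)

# Identifiers are NFKC-normalized during parsing, so bold "x" (U+1D431)
# and plain "x" are the same name as far as the compiler is concerned.
namespace: dict = {}
exec("\U0001d431 = 1", namespace)
print("x" in namespace)                 # True
print("\U0001d431" in namespace)        # False
print(unicodedata.normalize("NFKC", "\U0001d431"))   # x
```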

>> So what if we double the number of different characters? I don't care
>> about the number of them, I care about how meaningful they are. And
>> as you say, confusion is already possible. A good programmer knows
>> how to deal with such possible confusion; that the number of
>> cases increases needn't be a problem for those who care
>> about this.
> So tell me then, how would you deal with it?  In the case of script
> identifiers, it's often not hard to discern from context whether a
> particular character is e.g. a Latin h or a Cyrillic һ.  Assuming the
> original author wasn't being intentionally obfuscatory, if the rest of
> the identifier is Cyrillic then the character is probably also
> Cyrillic.  If it's a one-character identifier, then hopefully the rest
> of the module is consistent and you can guess from that.  If the
> identifier in question is just one symbol though, then you have a lot
> less context.

I deal with it just the way I deal with it now. I generally trust the
programmer to know what he is doing and to have made a good-faith effort,
so that I don't have to worry about him having both a variable 'NO' and 'N0'.
I see no reason to be more paranoid about this just because there are more
possibilities.
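And where a look-alike such as Latin h versus Cyrillic һ does need checking, the standard unicodedata module can tell them apart without relying on the font. A rough sketch, assuming that the first word of a character's Unicode name (LATIN, CYRILLIC, GREEK, ...) is a good enough proxy for its script; this is a heuristic, not the real Unicode Script property, and scripts_in and is_mixed_script are hypothetical helpers.

```python
import unicodedata
from typing import Set


def scripts_in(identifier: str) -> Set[str]:
    """Approximate the scripts used by the letters in `identifier`.

    Heuristic: first word of each letter's Unicode name,
    e.g. 'LATIN SMALL LETTER H' -> 'LATIN'. Digits and '_' are skipped.
    """
    scripts = set()
    for ch in identifier:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(identifier: str) -> bool:
    """Flag identifiers whose letters come from more than one script."""
    return len(scripts_in(identifier)) > 1


latin_h = "h"
cyrillic_shha = "\u04bb"   # renders like 'h' in most fonts

print(latin_h == cyrillic_shha)                       # False
print(unicodedata.name(latin_h))                      # LATIN SMALL LETTER H
print(unicodedata.name(cyrillic_shha))                # CYRILLIC SMALL LETTER SHHA
print(is_mixed_script("s" + cyrillic_shha + "ape"))   # True
print(is_mixed_script("куча_фекалий"))                # False
```

Note that this says nothing about 'NO' versus 'N0': both are plain Latin/ASCII, which is exactly the kind of confusion we already live with.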

> Second, at least in the case of decorators, while I don't dispute that
> they can harm readability, I think that in the majority of cases they
> actually help it.
>> But that is not a fair comparison, is it? What you are doing here
>> is comparing actual use to a worst-case doom scenario.
> I contend that there is no scenario with arbitrary Unicode identifiers
> where readability is improved.

At this moment I see no reason to just accept this.

-- 
Antoon Pardon