Benefits of unicode identifiers (was: Allow additional separator in identifiers)

Fri Nov 24 11:33:30 EST 2017

On Fri, Nov 24, 2017 at 8:03 AM, Chris Angelico <rosuav at gmail.com> wrote:

>> and in Python in particular, because they will not only be forced to learn
>> some English, but will also have all the 'pleasures' of multi-script editing.
>> But wait, probably one can write python code in, say Arabic script *only*?
>> How about such feature proposal?
>
> If Python supports ASCII identifiers only, people have no choice but
> to transliterate. As it is, people get to choose which is better for
> them - to transliterate or not to transliterate, that is the
> readability question.

Sure, let them choose.
Transliteration, though, is a far more reasonable solution.

>
>> As for non-English speakers who know some English already, they
>> could of course want to include identifiers in those scripts.
>> But how about libraries?
>
> If you want to use numpy, you have to understand the language of
> numpy. That's a lot of technical jargon, so even if you understand
> English, you have to learn that. So there's ultimately no difference.

That's what I'm saying. Major parts of the code will be in English
anyway, and pretty much every existing module that can help the
developer further will be in English too, like it or not.

>> Ok, so we return to my original question: apart from the
>> ability to do so, how beneficial is it on a pragmatic basis?
>> I mean, e.g. Cyrillic will introduce homoglyph issues.
>> CJK and Arabic scripts are metrically and optically incompatible with
>> Latin, so such mixing will end up with a messy look. So just for
>> the experiment, yes, it's fun.
>
> Does it really introduce homoglyph issues in real-world situations,
> though? Are there really cases where people can't figure out from
> context what's going on? I haven't seen that happening. Usually there
> are *entire words* (and more) in a single language, making it pretty
> easy to figure out.

The issues can be discussed at length, but I have no doubt that even placing
words in two different scripts on one line of text is a bad idea, and not only
for source code. Mixing Cyrillic and Latin does cause extra issues due to
homoglyphs in many cases; I know this in practice from everyday work with
Cyrillic filenames and from past experience with English-Russian textbooks.
In textbooks I can at least mitigate it with proper layout: separating the
scripts in tables, or putting inline words in quotes or bold.
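
To make the homoglyph point concrete, here is a minimal sketch in Python.
The helper name scripts_used is made up for illustration, and taking the
first word of each character's Unicode name as its "script" is only a rough
heuristic, not a proper script-property lookup. It shows that two names
which look identical in many fonts are different strings, and that a
simple check can at least flag which scripts a name uses.

```python
import unicodedata


def scripts_used(identifier):
    """Return the set of rough script labels for the letters in a name.

    Assumption: the first word of the Unicode character name
    (e.g. 'LATIN', 'CYRILLIC') is used as a stand-in for the script.
    """
    scripts = set()
    for ch in identifier:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch).split()[0])
    return scripts


latin_name = "cop"                    # Latin c, o, p
cyrillic_name = "\u0441\u043e\u0440"  # Cyrillic es, o, er: renders like "cop"
mixed_name = "c\u043ep"               # Latin c and p around a Cyrillic o

# The names look alike on screen but are not the same string.
print(latin_name == cyrillic_name)

# Report the scripts found in each name.
for name in (latin_name, cyrillic_name, mixed_name):
    print(ascii(name), sorted(scripts_used(name)))
```

A name that reports more than one script, like the third one here, is the
case that is hardest to spot by eye.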

Mikhail