Benefits of unicode identifiers (was: Allow additional separator in identifiers)

Peter J. Holzer hjp at hjp.at
Mon Nov 27 14:35:44 EST 2017


On 2017-11-24 04:52:57 +0100, Mikhail V wrote:
> On Fri, Nov 24, 2017 at 4:13 AM, Chris Angelico <rosuav at gmail.com> wrote:
> > On Fri, Nov 24, 2017 at 1:44 PM, Mikhail V <mikhailwas at gmail.com> wrote:
> >> From my above example, you could probably see that I prefer somewhat
> >> middle-sized identifiers, one-two syllables. And naturally, they tend to
> >> reflect some process/meaning, it is not always achievable,
> >> but yes there is such a natural tendency, although by me personally
> >> not so strong, and quite often I use totally meaningless names,
> >> mainly to avoid visual similarity to already created names.
> >> So for very expanded names, it ends up with a lot of underscores :(
> >
> > Okay. So if it makes sense for you to use English words instead of
> > individual letters, since you are fluent in English, does it stand to
> > reason that it would make sense for other programmers to use Russian,
> > Norwegian, Hebrew, Korean, or Japanese words the same way?
> 
> I don't know. Probably, especially if those *programmers* don't know Latin
> letters, then they would want to write code with their letters and their
> language. This target group, as I said, will have a really hard time
> with programming,

I don't think that's the target group. As you say, if you don't know
Latin letters you'll have a hard time with Python (or almost any
programming language). You can't read the keywords or the standard
library function names.

I think the target group is people who can read the Latin alphabet and
probably also at least a bit of English, but who are working on in-house
projects.

As a very simple example, many years ago, when I was still at the
university, we decided that we needed a program to manage our students.
So we got some students to write one ;-). As a general rule, identifiers
and comments for all projects had to be in English, which generally made
a lot of sense since we collaborated with institutes in other countries.
But for that project that rule wasn't really appropriate, as we noticed
when one of the students asked us what "Matrikelnummer" is in English.
Nobody knew, so we consulted a dictionary and apparently it's
"enrollment number". Simple enough, but is that intelligible to all
English speakers or is it specific to British universities? And even
worse - whoever is going to maintain that code would be either a staff
member or a student of our institute - they would certainly know what a
"Matrikelnummer" is, but would they understand that enrollment_number is
supposed to contain that? So we decided that domain-specific jargon
should not be translated. A bit of bilingual mishmash (first_name and
course_title, but matrikelnummer and kennummer) was better than using
words that nobody understood.
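
To illustrate the mishmash, a record in such a program might look roughly
like the sketch below. Only the four field names come from the project;
the class name, the field types, and the sample values are made up for
illustration.

```python
from dataclasses import dataclass


@dataclass
class Student:
    # Generic terms stay in English ...
    first_name: str
    course_title: str
    # ... while institute-specific jargon keeps its German name.
    matrikelnummer: str
    kennummer: str


student = Student(
    first_name="Erika",
    course_title="Introduction to Programming",
    matrikelnummer="0123456",  # made-up value
    kennummer="033 534",       # made-up value
)
print(student.matrikelnummer)
```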

Now that particular word doesn't contain any non-ASCII characters and
German has only 4 letters not in ASCII, and for all of them there are
official ASCII substitutes, so writing German words in ASCII isn't a
problem.
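
As a rough sketch, the substitution can be done mechanically. The function
name and the sample words are my own choices for illustration, and I have
added the uppercase umlauts for convenience.

```python
# Conventional ASCII substitutes for the German non-ASCII letters.
# Uppercase umlauts are included for convenience.
ASCII_SUBSTITUTES = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
})


def to_ascii_german(word: str) -> str:
    """Replace umlauts and sharp s with their ASCII substitutes."""
    return word.translate(ASCII_SUBSTITUTES)


print(to_ascii_german("Prüfungsgebühr"))
print(to_ascii_german("Größe"))
```

Note that this only goes one way: going back from "ue" to "ü" is not
always correct, since some words contain a genuine "ue".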

But for languages with non-Latin alphabets (or just a higher density of
accented letters) that's different. If my native language were Russian
and I were writing some in-house application for a Russian company which
contained a lot of Russian company jargon which can't be easily
translated to English (and back), I'm quite sure that I would prefer to
write that jargon in Cyrillic and not in some transliteration.
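
Python 3 accepts such identifiers (PEP 3131), so something like the
following sketch works as written. The names are ordinary Russian words
that I picked as stand-ins for real company jargon, not terms from any
actual project.

```python
# Hypothetical in-house names, written in Cyrillic rather than transliterated.
цена = 250                  # price
количество = 4              # quantity
сумма = цена * количество   # total

print(сумма)

# The keywords and the standard library stay in English and Latin script,
# so the reader still needs both alphabets.
print("количество".isidentifier())
```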

> and in Python in particular, because they will be not only forced to learn
> some English, but also will have all 'pleasures' of multi-script editing.
> But wait, probably one can write Python code in, say Arabic script *only*?
> How about such feature proposal?

There is a source filter which lets you write Perl in traditional Chinese.
This even changes the syntax to be closer to Chinese syntax. There is
also one which lets you write Perl in Latin (obviously that uses the
Latin alphabet, but it changes the syntax even more). Don't know whether
something like this is possible in Python, but arguably the result
wouldn't be Python any more (just like Lingua::Romana::Perligata isn't
really Perl - it just happens to be implemented using the Perl
interpreter).


> Ok, so we return back to my original question: apart from
> ability to do so, how beneficial is it on a pragmatical basis?

When I use German identifiers (which I generally don't) I do use
umlauts. When I need to do some physical computations, I might use Greek
letters (or maybe not - as a vim user I can type Δt easily enough, but
can the colleague using PyCharm on Windows? I have no idea). So for me
the benefit is rather small. But as I said, German is almost
ASCII-compatible.
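
For what it's worth, this is the kind of thing I have in mind. It is only
a sketch: the variable names, values, and units are made up for
illustration.

```python
# Constant-velocity step: Δx = v * Δt
v = 3.0      # velocity in m/s
Δt = 0.5     # time step in s
Δx = v * Δt  # displacement in m

# A German identifier with an umlaut and a sharp s.
größe = 1.80  # height in m

print(Δx, größe)
```

Whether that is more readable than delta_t and groesse depends on whether
everybody on the team can type it.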

        hp

-- 
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | hjp at hjp.at         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>