Using non-ascii symbols

Wed Jan 25 18:43:21 EST 2006

On Tue, 24 Jan 2006 04:09:00 +0100
Christoph Zwerschke <cito at online.de> wrote:
> On the page
> http://wiki.python.org/moin/Python3%2e0Suggestions
> I noticed an interesting suggestion:
> 
> "These operators ≤ ≥ ≠ should be added to the
> language having the  following meaning:
> 
>        <= >= !=
> 
> this should improve readability (and make language more
> accessible to  beginners).
> 
> This should be an evolution similar to the digraph and
> trigraph  (digramme et trigramme) from C and C++
> languages."
> 
> How do people on this group feel about this suggestion?

In principle, and in the long run, I am definitely for it.

Pragmatically, though, there are still a lot of places
where it would cause me pain. For example, it exposes
problems even in reading this thread in my mail client
(which is ironic, considering that it manages to correctly
render Russian and Japanese spam messages. Grrr.).

OTOH, there will *always* be backwards systems, so you
can't wait forever to move to using newer features.

> The symbols above are not even latin-1, you need utf-8.

> And while they are better readable, they are not better
> typable (at  least with most current editors).

They're not that bad. I manage to get kana and kanji working
correctly when I really need them.
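
The latin-1 point is easy to check. Here is a minimal sketch,
assuming a Python 3 interpreter; the helper name fits_latin1 is
made up for illustration. It also shows how a client that decodes
UTF-8 bytes with the wrong codec ends up displaying junk instead
of the symbol.

```python
def fits_latin1(ch):
    """Return True if the character can be encoded as latin-1."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


symbols = "≤≥≠"

# None of the three comparison symbols exists in latin-1.
latin1_ok = [fits_latin1(ch) for ch in symbols]

# UTF-8 can represent them, using three bytes for each symbol.
utf8_bytes = [ch.encode("utf-8") for ch in symbols]

# Decoding those UTF-8 bytes with a Windows-1252 codec produces
# three unrelated characters instead of the intended symbol.
garbled = "≤".encode("utf-8").decode("cp1252")
```

An accented letter such as "é" passes the same check, which is
why ordinary Western European text tends to survive where these
operators do not.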

> Are there similar attempts in other languages? I can only
> think of APL,  but that was a long time ago.

I'm pretty sure that there are. The idea of allowing UTF-8 in
identifiers and the like has been around for a while in
Python.  I'm pretty sure you can do this already in Java,
can't you?  (I think I read this somewhere, but I don't
think it gets used much).

> Once you open your mind for using non-ascii symbols, I'm
> sure one can  find a bunch of useful applications.
> Variable names could be allowed to  be non-ascii, as in
> XML. Think class names in Arabian... Or you could  use
> Greek letters if you run out of one-letter variable names,
> just as  Mathematicians do. Would this be desirable or
> rather a horror scenario?  Opinions?

Greek letters would be a real relief in writing scientific
software. There's something deeply annoying about variables
named THETA, theta, and Theta.  Or "w" meaning "omega".

People coming from other programming backgrounds may object
that these uses are less informative. But in the sciences,
some of these symbols have as much recognizability as "+" or
"$" do to other people.  Reading math notation from a
scientist, I can be pretty darned certain that "c" is "the
speed of light" or that "epsilon" is a small, allowable
variation in a variable. And so on. It's true that there are
occasional problems when problem domains merge, but that's
true of words, too.

It would also reduce the difficulty of going back and forth
between the paper describing the math, and the program
using it.
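
As a rough sketch of what Greek-letter names could look like,
assuming an interpreter that accepts non-ASCII identifiers
(Python 3 does, following PEP 3131). The function names and the
numbers are invented for the example.

```python
import math


def angular_frequency(T):
    """Return ω = 2π / T for a period T given in seconds."""
    π = math.pi
    ω = 2 * π / T
    return ω


def nearly_equal(a, b, ε=1e-9):
    """Return True if a and b differ by at most the tolerance ε."""
    return abs(a - b) <= ε


ω = angular_frequency(2.0)  # period of 2 seconds
θ = ω * 0.5                 # angle swept after half a second

# Letters are accepted as identifiers; operator symbols are not.
letter_ok = "θ".isidentifier()
symbol_ok = "≤".isidentifier()
```

The body of angular_frequency reads almost exactly like the
formula in its docstring, which is the whole attraction.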

One thing that I also think would be good is to open up the
operator set for Python. Right now you can overload the
existing operators, but you can't easily define new ones.
And even if you do, you are very limited in what you can
use, and understandability suffers.

But Unicode provides code blocks for the special operators
that mathematicians use ("circle-times", etc.).  That would
both reduce confusion for people bothered by weird choices
of overloading "*" and "+" and give people who need these
features the ability to use them.
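
To make the overloading complaint concrete, here is a small
sketch. The Matrix class is hypothetical and deliberately
minimal. Python offers no way to define a "circle-times"
operator, so the Kronecker product below borrows "@" (available
as an overloadable operator since Python 3.5), which is exactly
the kind of weird choice a reader then has to learn.

```python
class Matrix:
    """Hypothetical minimal matrix type, for illustration only."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __mul__(self, other):
        # "*" is used here for the element-wise product.
        return Matrix([
            [a * b for a, b in zip(ra, rb)]
            for ra, rb in zip(self.rows, other.rows)
        ])

    def __matmul__(self, other):
        # "@" stands in for circle-times: the Kronecker product.
        return Matrix([
            [a * b for a in ra for b in rb]
            for ra in self.rows
            for rb in other.rows
        ])

    def __eq__(self, other):
        return isinstance(other, Matrix) and self.rows == other.rows


A = Matrix([[1, 2], [3, 4]])
B = Matrix([[0, 1], [1, 0]])

elementwise = A * B  # 2x2 result
kronecker = A @ B    # 4x4 result
```

Nothing in "A @ B" tells the reader that a Kronecker product is
meant; a dedicated symbol would.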

It's also relevant that scientists in China and Saudi Arabia
probably use a roman "c" for the speed of light, or a "mu"
to represent a mass, so it's likely more understandable
internationally than using, say "lightspeed" and "mass".

OTOH, using identifiers in many different languages would
have the opposite effect. Right now, English is accepted as
a lingua franca for programming (and I admit that as a
native speaker of English, I benefit from that), but if it
became common practice to use lots of different languages,
cooperation might suffer.

But then, that's probably why English still dominates in
Java code.  I suspect that just means people wouldn't use
the feature much.  And I've certainly dealt with source code commented
in Spanish or German.  It didn't kill me.

So, I'd say that in the long run:

1) Yes it will be adopted

2) The math and Greek-letter type symbols will be the big
win

3) Localized variable names will be useful to some people,
but not widely popular, especially for cooperative free
software projects (of course, in the Far East, for example,
Han character names might become very popular as they span
several languages).  But I bet it will remain underused so
long as English remains the most popular international trade
language.

In the meantime, though, I predict many luddites will
scream "But it doesn't work on my vintage VT-220 terminal!"
(And I may even be one of them).

Cheers,
Terry

-- 
Terry Hancock (hancock at AnansiSpaceworks.com)
Anansi Spaceworks http://www.AnansiSpaceworks.com