Problems of Symbol Congestion in Computer Languages

Fri Feb 18 20:01:57 EST 2011

On Fri, 18 Feb 2011 11:16:30 -0800, Westley Martínez wrote:

> Allowing non-ascii characters as operators is a silly idea simply
> because if xorg breaks, which it's very likely to do with the current
> video drivers, I'm screwed. 

And if your hard drive crashes, you're screwed too. Why stop at "xorg 
breaks"?

Besides, Windows and MacOS users will be scratching their heads asking 
"xorg? Why should I care about xorg?"

Programming languages are perfectly allowed to rely on the presence of a 
working environment. I don't think general purpose programming languages 
should be designed with reliability in the presence of a broken 
environment in mind.

Given the use-cases people put Python to, it is important for the 
language to *run* without a GUI environment. It's also important (but 
less so) to allow people to read and/or write source code without a GUI, 
which means continuing to support the pure-ASCII syntax that Python 
already supports. But Python already supports non-ASCII source files, 
with an optional encoding declaration on the first or second line of the 
file, so it is already possible to write Python code that runs without X 
but can't be easily edited without a modern, Unicode-aware editor.
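
To make that concrete, here is a rough sketch (Python 3 syntax) of a 
source file with an encoding declaration being compiled from raw bytes, 
no GUI involved. The two-line "module" is made up for illustration; only 
the declaration format itself is Python's.

```python
# A made-up two-line module, held as raw bytes the way it would sit on disk.
source = (
    b"# -*- coding: latin-1 -*-\n"   # encoding declaration on line 1
    b"degree = '\xb0'\n"              # byte 0xB0 is the degree sign in Latin-1
)

namespace = {}
# compile() honours the declaration when it is handed bytes.
exec(compile(source, "<demo>", "exec"), namespace)

# The single Latin-1 byte has become the Unicode degree sign.
print(namespace["degree"] == "\u00b0")
```

Running it needs nothing more than an interpreter; it is only *editing* 
the file comfortably that needs a terminal or editor able to show the 
character.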

> Not only does the Linux virtual terminal not
> support displaying these special characters, but there's also no way of
> inputting them. 

That's a limitation of the Linux virtual terminal. In 1984 I used to use 
a Macintosh which was perfectly capable of displaying and inputting non-
ASCII characters with a couple of key presses. Now that we're nearly a 
quarter of the way into 2011, I'm using a Linux PC that makes entering a 
degree sign or a pound sign a major undertaking, if it's even possible at 
all. It's well past time for Linux to catch up with the 1980s.

> On top of that, these special characters require more
> bytes to display than ascii text, which would bloat source files
> unnecessarily.

Oh come on now, now you're just being silly. "Bloat source files"? From a 
handful of multi-byte characters? Cry me a river!

This is truly a trivial worry:

>>> s = "if x >= y:\n"
>>> u = u"if x ≥ y:\n"
>>> len(s)
11
>>> len(u.encode('utf-8'))
12

The source code to the decimal module in Python 3.1 is 205470 bytes in 
size. It contains 63 instances of ">=" and 62 of "<=". Let's suppose 
every one of those were changed to ≥ or ≤ characters. This would "bloat" 
the file by 0.06%.
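
For anyone who wants to check the arithmetic, here is a small sketch 
using only the figures quoted above (the file size and the two counts). 
It assumes the file is stored as UTF-8; the variable names are mine.

```python
# Figures quoted above for decimal.py in Python 3.1.
FILE_SIZE = 205470   # bytes
N_GE = 63            # occurrences of ">="
N_LE = 62            # occurrences of "<="

# Each swap replaces a 2-byte ASCII digraph with one UTF-8 encoded character.
extra_per_ge = len("≥".encode("utf-8")) - len(">=".encode("utf-8"))
extra_per_le = len("≤".encode("utf-8")) - len("<=".encode("utf-8"))

extra_bytes = N_GE * extra_per_ge + N_LE * extra_per_le
growth_percent = 100.0 * extra_bytes / FILE_SIZE

print(extra_bytes, round(growth_percent, 2))
```

Each replacement costs exactly one extra byte, so the whole file grows by 
125 bytes.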

Oh the humanity!!! How will my 2TB hard drive cope?!?!

> You say we have symbol congestion, but in reality we have our own symbol
> bloat. Japanese has more or less than three punctuation marks, while
> English has perhaps more than the alphabet! The fundamental point here
> is using non-ascii operators violates the Zen of Python. It violates
> "Simple is better than complex," as well as "There should be one-- and
> preferably only one --obvious way to do it."

Define "simple" and "complex" in this context.

It seems to me that single character symbols such as ≥ are simpler than 
digraphs such as >=, simply because the parser knows what the symbol is 
after reading a single character. It doesn't have to read on to tell 
whether you meant > or >=. 
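
Here is a toy illustration of what I mean. This is a hypothetical 
hand-written scanner for two operators, not how Python's real tokenizer 
is implemented, and the token names GT and GE are made up.

```python
def scan_comparison(text, pos):
    """Return (token_name, next_pos) for the operator starting at text[pos]."""
    ch = text[pos]
    if ch == "≥":
        # One character read, and the token is already known.
        return "GE", pos + 1
    if ch == ">":
        # Must peek at the next character to tell ">" from ">=".
        if pos + 1 < len(text) and text[pos + 1] == "=":
            return "GE", pos + 2
        return "GT", pos + 1
    raise ValueError("no comparison operator at position %d" % pos)


print(scan_comparison("x ≥ y", 2))
print(scan_comparison("x >= y", 2))
print(scan_comparison("x > y", 2))
```

The ≥ branch never looks ahead; the > branch always has to.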

You can add complexity to one part of the language (hash tables are more 
complex than arrays) in order to simplify another part (dict lookup is 
simpler and faster than managing your own data structure in a list).

And as for one obvious way, there's nothing obvious about using a | b for 
set union. Why not a + b? The mathematician in me wants to spell set 
union and intersection as a ⋃ b ⋂ c, which is the obvious way to me (even 
if my lousy editor makes it a PITA to *enter* the symbols).
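
For reference, this is how the ASCII spelling behaves today, as a quick 
sketch with three throwaway sets:

```python
a, b, c = {1, 2}, {2, 3}, {3, 4}

union = a | b               # same result as a.union(b)
mixed = a | b & c           # & binds tighter than |, so this is a | (b & c)
grouped = (a | b) & c       # parentheses change the answer

try:
    a + b
    plus_works = True
except TypeError:
    plus_works = False      # sets do not define +

print(union, mixed, grouped, plus_works)
```

Nothing about | or & tells a newcomer that they mean union and 
intersection, and + is simply not defined for sets.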

The lack of good symbols for operators in ASCII is a real problem. Other 
languages have solved it in various ways, sometimes by using digraphs (or 
higher-order symbols), and sometimes by using Unicode (or some platform/
language specific equivalent). I think that given the poor support for 
Unicode in the general tools we use, the use of non-ASCII symbols will 
have to wait until Python 4. Hopefully by 2020 input methods will have 
improved, and maybe xorg will even have been replaced by something less sucky. 

I think that the push for better and more operators will have to come 
from the NumPy community. Further reading:

http://mail.python.org/pipermail/python-dev/2008-November/083493.html

-- 
Steven