Problems of Symbol Congestion in Computer Languages

Westley Martínez anikom15 at gmail.com
Fri Feb 18 21:14:32 EST 2011


On Sat, 2011-02-19 at 01:01 +0000, Steven D'Aprano wrote:
> On Fri, 18 Feb 2011 11:16:30 -0800, Westley Martínez wrote:
> 
> > Allowing non-ascii characters as operators is a silly idea simply
> > because if xorg breaks, which it's very likely to do with the current
> > video drivers, I'm screwed. 
> 
> And if your hard drive crashes, you're screwed too. Why stop at "xorg 
> breaks"?
Because I can still edit text files in the terminal.

I guess you could manually control the magnet in the hard drive if it
failed, but that'd be horrifically tedious.


> Besides, Windows and MacOS users will be scratching their head asking 
> "xorg? Why should I care about xorg?"
Why should I care if my programs run on Windows and Mac? Because I'm a
nice guy, I guess....


> Programming languages are perfectly allowed to rely on the presence of a 
> working environment. I don't think general purpose programming languages 
> should be designed with reliability in the presence of a broken 
> environment in mind.
> 
> Given the use-cases people put Python to, it is important for the 
> language to *run* without a GUI environment. It's also important (but 
> less so) to allow people to read and/or write source code without a GUI, 
> which means continuing to support the pure-ASCII syntax that Python 
> already supports. But Python already supports non-ASCII source files, 
> with an optional encoding line in the first two lines of the file, so it 
> is already possible to write Python code that runs without X but can't be 
> easily edited without a modern, Unicode-aware editor.
> 
> > Not only does the Linux virtual terminal not
> > support displaying these special characters, but there's also no way of
> > inputting them. 
> 
> That's a limitation of the Linux virtual terminal. In 1984 I used to use 
> a Macintosh which was perfectly capable of displaying and inputting non-
> ASCII characters with a couple of key presses. Now that we're nearly a 
> quarter of the way into 2011, I'm using a Linux PC that makes entering a 
> degree sign or a pound sign a major undertaking, if it's even possible at 
> all. It's well past time for Linux to catch up with the 1980s.
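
(As an aside, the encoding line mentioned above is just a comment near
the top of the file. A minimal, hypothetical sketch of such a file, with
a made-up helper; Python 3 assumes UTF-8 even without the declaration:)

# -*- coding: utf-8 -*-
# The comment above is the optional encoding declaration (PEP 263).
# It must sit on the first or second line of the file; with it, even a
# Python 2 interpreter decodes the non-ASCII docstring below correctly.

def is_at_least(x, y):
    """Return True if x ≥ y (illustrative only, not from the thread)."""
    return x >= y

print(is_at_least(2, 1))  # True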

I feel it's unnecessary for Linux to "catch up" simply because we have
no need for these special characters! When I read Python code, I only
see text from Latin-1, which is easy to input and which every *decent*
font supports. When I read C code, I only see text from Latin-1. When I
read code from just about everything else that's plain text, I only see
text from Latin-1. Even LaTeX, which is designed for typesetting
mathematical formulas, only allows ASCII in its input. Languages that
accept non-ASCII input have always been somewhat esoteric. There's
nothing wrong with being different, but there is something wrong with
being so different that you're causing problems, or at least speed
bumps, for particular users.


> > On top of that, these special characters require more
> > bytes to display than ascii text, which would bloat source files
> > unnecessarily.
> 
> Oh come on now, now you're just being silly. "Bloat source files"? From a 
> handful of double-byte characters? Cry me a river!
> 
> This is truly a trivial worry:
> 
> >>> s = "if x >= y:\n"
> >>> u = u"if x ≥ y:\n"
> >>> len(s)
> 11
> >>> len(u.encode('utf-8'))
> 12
> 
> 
> The source code to the decimal module in Python 3.1 is 205470 bytes in 
> size. It contains 63 instances of ">=" and 62 of "<=". Let's suppose 
> every one of those were changed to ≥ or ≤ characters. This would "bloat" 
> the file by 0.06%.
> 
> Oh the humanity!!! How will my 2TB hard drive cope?!?!
A byte saved is a byte earned. What about embedded systems trying to
conserve as many resources as possible?
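
(For concreteness, the arithmetic behind that 0.06% figure, done in a
Python 2 session to match Steven's example above and using his counts
for decimal.py; ≥ takes three bytes in UTF-8 where ">=" takes two:)

>>> len(u"≥".encode('utf-8')) - len(">=")   # one extra byte per operator
1
>>> occurrences = 63 + 62                   # ">=" plus "<=" instances
>>> round(100.0 * occurrences / 205470, 2)  # growth of the 205470-byte file, in %
0.06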


> > You say we have symbol congestion, but in reality we have our own symbol
> > bloat. Japanese has more or less than three punctuation marks, while
> > English has perhaps more than the alphabet! The fundamental point here
> > is using non-ascii operators violates the Zen of Python. It violates
> > "Simple is better than complex," as well as "There should be one-- and
> > preferably only one --obvious way to do it."
> 
> Define "simple" and "complex" in this context.
> 
> It seems to me that single character symbols such as ≥ are simpler than 
> digraphs such as >=, simply because the parser knows what the symbol is 
> after reading a single character. It doesn't have to read on to tell 
> whether you meant > or >=. 
> 
> You can add complexity to one part of the language (hash tables are more 
> complex than arrays) in order to simplify another part (dict lookup is 
> simpler and faster than managing your own data structure in a list).
I believe dealing with ASCII is simpler than dealing with Unicode, for
reasons on both the developer's and user's side.
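
(For what it's worth, Steven's hash-table analogy is easy to see in
code; a toy example, nothing specific to this thread:)

>>> pairs = [('spam', 1), ('eggs', 2)]        # managing the lookup yourself in a list
>>> next(v for k, v in pairs if k == 'eggs')  # linear scan every time
2
>>> table = dict(pairs)                       # the more complex hash table...
>>> table['eggs']                             # ...makes each lookup simpler and faster
2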


> And as for one obvious way, there's nothing obvious about using a | b for 
> set union. Why not a + b? The mathematician in me wants to spell set 
> union and intersection as a ⋃ b ⋂ c, which is the obvious way to me (even 
> if my lousy editor makes it a PITA to *enter* the symbols).
Not all programmers are mathematicians (in fact I'd say most aren't). I
know what those symbols mean, but some people might think "a u b n c ...
what?" The | operator actually makes sense because it relates to bitwise
OR, in which bits are turned on. Here's an example just for context:

01010101 | 10101010 = 11111111
{1, 2, 3} | {4, 5, 6} = {1, 2, 3, 4, 5, 6}

As someone who is deeply familiar with bitwise operations but not very
familiar with sets, I found the set syntax quite easy to understand.
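
(Spelled out in a Python 3 session, in case it helps; just the operators
the language already has, and note Python 2.7 would print the sets as
set([...]):)

>>> (0b01010101 | 0b10101010) == 0b11111111   # bitwise OR merges the set bits
True
>>> {1, 2, 3} | {4, 5, 6}                     # the same operator spells set union
{1, 2, 3, 4, 5, 6}
>>> {1, 2, 3} & {2, 3, 4}                     # and & doubles as set intersection
{2, 3}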


> The lack of good symbols for operators in ASCII is a real problem. Other 
> languages have solved it in various ways, sometimes by using digraphs (or 
> higher-order symbols), and sometimes by using Unicode (or some platform/
> language specific equivalent). I think that given the poor support for 
> Unicode in the general tools we use, the use of non-ASCII symbols will 
> have to wait until Python4. Hopefully by 2020 input methods will have 
> improved, and maybe even xorg be replaced by something less sucky. 
> 
> I think that the push for better and more operators will have to come 
> from the Numpy community. Further reading:
> 
> 
> http://mail.python.org/pipermail/python-dev/2008-November/083493.html
> 
> 
> -- 
> Steven
> 

You have provided me with some well-thought-out arguments and have
stimulated my young programmer's mind, but I think we're coming from
different angles. You seem to come from a more math-minded, idealist
angle, while I come from a more practical one. Being a person who has
had to deal with the í in my last name and with Japanese text on a
variety of platforms, I've found the current methods of non-ASCII input
to be largely platform-dependent and, for lack of a better word, crappy,
i.e. not suitable for a 'wide-audience' language like Python.



