Problems of Symbol Congestion in Computer Languages

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sat Feb 19 01:29:39 EST 2011


On Fri, 18 Feb 2011 18:14:32 -0800, Westley Martínez wrote:

>> Besides, Windows and MacOS users will be scratching their head asking
>> "xorg? Why should I care about xorg?"
> Why should I care if my programs run on Windows and Mac? Because I'm a
> nice guy I guess....

Python is a programming language that is operating system independent, 
and not just a Linux tool. So you might not care about your Python 
programs running on Windows, but believe me, the Python core developers 
care about Python running on Windows and Mac OS. (Even if sometimes their 
lack of resources makes Windows and Mac somewhat second-class citizens.)


>> That's a limitation of the Linux virtual terminal. In 1984 I used to
>> use a Macintosh which was perfectly capable of displaying and inputting
>> non-ASCII characters with a couple of key presses. Now that we're
>> nearly a quarter of the way into 2011, I'm using a Linux PC that makes
>> entering a degree sign or a pound sign a major undertaking, if it's
>> even possible at all. It's well past time for Linux to catch up with
>> the 1980s.
> 
> I feel it's unnecessary for Linux to "catch up" simply because we have
> no need for these special characters!

Given that your name is Westley Martínez, that's an astonishing claim! 
How do you even write your name in your own source code???

Besides, speak for yourself, not for "we". I have need for them.


> When I read Python code, I only
> see text from Latin-1, which is easy to input 

Hmmm. I wish I knew an easy way to input it. All the solutions I've come 
across are rubbish. How do you enter (say) í at the command line of an 
xterm?

But in any case, ASCII != Latin-1, so you're already using more than 
just ASCII characters.


> Languages that
> accept non-ASCII input have always been somewhat esoteric.

Then I guess Python is esoteric, because with source code encodings it 
supports non-ASCII literals and even variables:

[steve@sylar ~]$ cat encoded.py
# -*- coding: utf-8 -*-
résumé = "Some text here..."
print(résumé)

[steve@sylar ~]$ python3.1 encoded.py
Some text here...
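
(In Python 3 the coding declaration is actually optional, since UTF-8 is 
the default source encoding there (PEP 3120), and it's PEP 3131 that 
allows the non-ASCII identifier in the first place. Assuming a UTF-8 
terminal, the same thing works interactively:

>>> résumé = "Some text here..."
>>> print(résumé)
Some text here...
)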

 
[...]
> A byte saved is a byte earned. What about embedded systems trying to
> conserve as much resources as possible?

Then they don't have to use multi-byte characters, just as they can 
leave out comments and .pyo files, and use `ed` as their standard text 
editor instead of something bloated like vi or emacs.

[...]
> I believe dealing with ASCII is simpler than dealing with Unicode, for
> reasons on both the developer's and user's side.

Really? Well, I suppose if you want to define "you can't do this AT ALL" 
as "simpler", then, yes, ASCII is simpler.

Using pure ASCII means I am forced to write extra code because there 
aren't enough operators to go around, e.g. element-wise addition versus 
concatenation. It means I'm forced to spell out symbols in full, like 
"British pound" instead of £, use legally dubious work-arounds like 
"(c)" instead of ©, and misspell words (including people's names) because 
I can't use the correct characters, and am forced to use unnecessarily 
long and clumsy English longhand for standard mathematical notation.
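
To make the operator shortage concrete, here's a minimal sketch (plain 
Python 3, with made-up values): + on lists is already taken for 
concatenation, so element-wise addition has to be spelled out by hand:

>>> [1, 2, 3] + [10, 20, 30]        # + means concatenation here
[1, 2, 3, 10, 20, 30]
>>> [a + b for a, b in zip([1, 2, 3], [10, 20, 30])]  # element-wise, longhand
[11, 22, 33]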

If by simple you mean "I can't do what I want to do", then I agree 
completely that ASCII is simple.


>> And as for one obvious way, there's nothing obvious about using a | b
>> for set union. Why not a + b? The mathematician in me wants to spell
>> set union and intersection as a ⋃ b ⋂ c, which is the obvious way to me
>> (even if my lousy editor makes it a PITA to *enter* the symbols).
> 
> Not all programmers are mathematicians (in fact I'd say most aren't). I
> know what those symbols mean, but some people might think "a u b n c ...
> what?" | actually makes sense because it relates to bitwise OR in which
> bits are turned on.

Not all programmers are C programmers who have learned that | represents 
bitwise OR. Some will say "a | b ... what?". I know I did, when I was 
first learning Python, and I *still* need to look them up to be sure I 
get them right.

In other languages, | might be spelled as any of

bitor() OR .OR. || ∨
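
And Python itself already overloads | beyond the C meaning: the very same 
operator does bitwise OR on ints and set union on sets. A quick sketch:

>>> bin(0b1100 | 0b1010)    # bitwise OR: bits turned on
'0b1110'
>>> {1, 2} | {2, 3}         # same operator, but now it spells set union
{1, 2, 3}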


[...]
> Being a person who has
> had to deal with the í in my last name and Japanese text on a variety of
> platforms, I've found the current methods of non-ASCII input to be
> largely platform-dependent and---for lack of a better word---crappy,

Agreed one hundred percent! Until there are better input methods for 
non-ASCII characters, without the need for huge keyboards, Unicode is 
hard and ASCII easy, and Python can't *rely* on Unicode tokens.

That doesn't mean that languages like Python can't support Unicode 
tokens, only that they shouldn't be the only way to do things. For a long 
time Pascal included (* *) as a synonym for { } because not all keyboards 
had the { } characters, and C supports trigraphs, such as ??< and ??> 
for { and }:

http://publications.gbdirect.co.uk/c_book/chapter2/alphabet_of_c.html

Eventually, perhaps in another 20 years, digraphs like != and <= will go 
the same way as trigraphs. Just as people today find it hard to remember 
a time when keyboards didn't include { and }, hopefully they will find it 
equally hard to remember a time that you couldn't easily enter non-ASCII 
characters.


-- 
Steven


