Problems of Symbol Congestion in Computer Languages

Westley Martínez anikom15 at gmail.com
Sat Feb 19 02:41:20 EST 2011


On Sat, 2011-02-19 at 06:29 +0000, Steven D'Aprano wrote:
> On Fri, 18 Feb 2011 18:14:32 -0800, Westley Martínez wrote:
> 
> >> Besides, Windows and MacOS users will be scratching their head asking
> >> "xorg? Why should I care about xorg?"
> > Why should I care if my programs run on Windows and Mac? Because I'm a
> > nice guy I guess....
> 
> Python is a programming language that is operating system independent, 
> and not just a Linux tool. So you might not care about your Python 
> programs running on Windows, but believe me, the Python core developers 
> care about Python running on Windows and Mac OS. (Even if sometimes their 
> lack of resources make Windows and Mac somewhat second-class citizens.)

You didn't seem to get my humor. It's ok; most people don't.


> >> That's a limitation of the Linux virtual terminal. In 1984 I used to
> >> use a Macintosh which was perfectly capable of displaying and inputting
> >> non- ASCII characters with a couple of key presses. Now that we're
> >> nearly a quarter of the way into 2011, I'm using a Linux PC that makes
> >> entering a degree sign or a pound sign a major undertaking, if it's
> >> even possible at all. It's well past time for Linux to catch up with
> >> the 1980s.
> > 
> > I feel it's unnecessary for Linux to "catch up" simply because we have
> > no need for these special characters!
> 
> Given that your name is Westley Martínez, that's an astonishing claim! 
> How do you even write your name in your own source code???
> 
> Besides, speak for yourself, not for "we". I have need for them.

The í is easy to input (Vim has a digraph feature for exactly this).
It's the funky mathematical symbols that are difficult.
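
For instance, in Vim's insert mode you type Ctrl-K followed by a
two-character digraph (this assumes a Vim built with +digraphs, which
stock builds are):

Ctrl-K i ' = í
Ctrl-K c , = ç
Ctrl-K s s = ß

:help digraphs lists the whole table.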


> > When I read Python code, I only
> > see text from Latin-1, which is easy to input 
> 
> Hmmm. I wish I knew an easy way to input it. All the solutions I've come 
> across are rubbish. How do you enter (say) í at the command line of 
> an xterm?

I use this in my xorg.conf:

Section "InputDevice"
	Identifier  "Keyboard0"
	Driver      "kbd"
    Option      "XkbLayout"     "us"
    Option      "XkbVariant"    "dvorak-alt-intl"
EndSection

Simply remove 'dvorak-' to get QWERTY. The variant lets you use the
right Alt key as AltGr. For example:
AltGr+' i = í
AltGr+c = ç
AltGr+s = ß
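
If you want to try it without editing xorg.conf, setxkbmap can switch
the variant at runtime (assuming a standard X.Org setup; the change
only lasts for the session):

setxkbmap -layout us -variant dvorak-alt-intl

Again, drop 'dvorak-' for the QWERTY version.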

I don't work on Windows or Mac enough to have figured out how to do it
on those platforms, but I'm sure there's a simple way.
Again, it's the funky symbols that would be difficult to input.

> But in any case, ASCII != Latin-1, so you're already using more than 
> ASCII characters.
> 
> 
> > Languages that
> > accept non-ASCII input have always been somewhat esoteric.
> 
> Then I guess Python is esoteric, because with source code encodings it 
> supports non-ASCII literals and even variables:
> 
> [steve at sylar ~]$ cat encoded.py
> # -*- coding: utf-8 -*-
> résumé = "Some text here..."
> print(résumé)
> 
> [steve at sylar ~]$ python3.1 encoded.py
> Some text here...

I should reword that to "Languages that require non-ASCII input have
always been somewhat esoteric", e.g. APL.


> [...]
> > A byte saved is a byte earned. What about embedded systems trying to
> > conserve as much resources as possible?
> 
> Then they don't have to use multi-byte characters, just like they can 
> leave out comments, and .pyo files, and use `ed` for their standard text 
> editor instead of something bloated like vi or emacs.

Hey, I've heard of jobs where all you do is remove comments from source
code, believe it or not!


> [...]
> > I believe dealing with ASCII is simpler than dealing with Unicode, for
> > reasons on both the developer's and user's side.
> 
> Really? Well, I suppose if you want to define "you can't do this AT ALL" 
> as "simpler", then, yes, ASCII is simpler.
> 
> Using pure-ASCII means I am forced to write extra code because there 
> aren't enough operators to be useful, e.g. element-wise addition versus 
> concatenation. It means I'm forced to spell out symbols in full, like 
> "British pound" instead of £, and use legally dubious work-arounds like 
> "(c)" instead of ©, and mispell words (including people's names) because 
> I can't use the correct characters, and am forced to use unnecessarily 
> long and clumsy English longhand for standard mathematical notation.
> 
> If by simple you mean "I can't do what I want to do", then I agree 
> completely that ASCII is simple.

I guess it's a matter of taste. I don't mind seeing my name as
westley_martinez, and I'm so used to seeing **, sqrt(), and / that
seeing the original symbols is a bit foreign!
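
For what it's worth, here's the ASCII spelling I'm used to next to the
mathematical notation (plain Python 3; math.sqrt is the only library
call):

import math

x = 2.0
print(x ** 2)        # x² in math notation
print(math.sqrt(x))  # √x
print(x / 4)         # x÷4, or a fraction bar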


> >> And as for one obvious way, there's nothing obvious about using a | b
> >> for set union. Why not a + b? The mathematician in me wants to spell
> >> set union and intersection as a ⋃ b ⋂ c, which is the obvious way to me
> >> (even if my lousy editor makes it a PITA to *enter* the symbols).
> > 
> > Not all programmers are mathematicians (in fact I'd say most aren't). I
> > know what those symbols mean, but some people might think "a u b n c ...
> > what?" | actually makes sense because it relates to bitwise OR in which
> > bits are turned on.
> 
> Not all programmers are C programmers who have learned that | represents 
> bitwise OR. Some will say "a | b ... what?". I know I did, when I was 
> first learning Python, and I *still* need to look them up to be sure I 
> get them right.
> 
> In other languages, | might be spelled as any of
> 
> bitor() OR .OR. || ∨

Good point, but C is a very popular language. I'm not saying we should
follow C, but we should be aware that that's probably where the
majority of Python's users are coming from (or from languages with
C-like syntax).
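
The overloading is visible in Python itself: | is bitwise OR on ints
and union on sets, so the same operator covers both readings:

>>> 0b0101 | 0b0011
7
>>> {1, 2} | {2, 3}
{1, 2, 3}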


> [...]
> > Being a person who has
> > had to deal with the í in my last name and Japanese text on a variety of
> > platforms, I've found the current methods of non-ascii input to be
> > largely platform-dependent and---for lack of a better word---crappy,
> 
> Agreed one hundred percent! Until there are better input methods for non-
> ASCII characters, without the need for huge keyboards, Unicode is hard 
> and ASCII easy, and Python can't *rely* on Unicode tokens.
> 
> That doesn't mean that languages like Python can't support Unicode 
> tokens, only that they shouldn't be the only way to do things. For a long 
> time Pascal included (* *) as a synonym for { } because not all keyboards 
> included the { } characters, and C has support for trigraphs:
> 
> http://publications.gbdirect.co.uk/c_book/chapter2/alphabet_of_c.html
> 
> Eventually, perhaps in another 20 years, digraphs like != and <= will go 
> the same way as trigraphs. Just as people today find it hard to remember 
> a time when keyboards didn't include { and }, hopefully they will find it 
> equally hard to remember a time that you couldn't easily enter non-ASCII 
> characters.
> 
> 
> -- 
> Steven

That was good info. I think there's a possibility for more symbols,
but not for a long while, and I'll probably never use them if they do
become available, because I don't really care.



