Time we switched to unicode? (was Explanation of this Python language feature?)

Rustom Mody rustompmody at gmail.com
Tue Mar 25 02:35:50 EDT 2014


On Tuesday, March 25, 2014 11:42:50 AM UTC+5:30, Chris Angelico wrote:
> On Tue, Mar 25, 2014 at 4:47 PM, Steven D'Aprano wrote:
> > On Tue, 25 Mar 2014 14:57:02 +1100, Chris Angelico wrote:
> >> No, I'm not missing that. But the human brain is a tokenizer, just as
> >> Python is. Once you know what a token means, you comprehend it as that
> >> token, and it takes up space in your mind as a single unit. There's not
> >> a lot of readability difference between a one-symbol token and a
> >> one-word token.
> > Hmmm, I don't know about that. Mathematicians are heavy users of symbols.
> > Why do they write ∀ instead of "for all", or ⊂ instead of "subset"?
> > Why do we write "40" instead of "forty"?

> Because the shorter symbols lend themselves better to the
> "super-tokenization" where you don't read the individual parts but the
> whole. The difference between "40" and "forty" is minimal, but the
> difference between "86400" and "eighty-six thousand [and] four
> hundred" is significant; the first is a single token, which you could
> then instantly recognize as the number of seconds in a day (leap
> seconds aside), but the second is a lengthy expression.

> There's also ease of writing. On paper or blackboard, it's really easy
> to write little strokes and curvy lines to mean things, and to write a
> bolded letter R to mean "Real numbers". In Python, it's much easier to
> use a few more ASCII letters than to write ⊂ ℝ.

> >> Also, since the human brain works largely with words,
> > I think that's a fairly controversial opinion. The Chinese might have
> > something to say about that.

> Well, all the people I interviewed (three of them: me, myself, and I)
> agree that the human brain works with words. My research is 100%
> scientific, and is therefore unassailable. So there. :)

> > I think that heavy use of symbols is a form of Huffman coding -- common
> > things should be short, and uncommon things longer. Mathematicians tend
> > to be *extremely* specialised, so they're all inventing their own Huffman
> > codings, and the end result is a huge number of (often ambiguous) symbols.

> Yeah. That's about the size of it. Usually, each symbol has some read
> form; "ℕ ⊂ ℝ" would be read as "Naturals are a subset of Reals" (or
> maybe "Naturals is a subset of Reals"?), and in program code, using
> the word "subset" or "issubset" wouldn't be much worse. It would be
> some worse, and the exact cost depends on how frequently your code
> does subset comparisons; my view is that the worseness of words is
> less than the worseness of untypable symbols. (And I'm about to be
> arrested for murdering the English language.)

> > Personally, I think that it would be good to start accepting, but not
> > requiring, Unicode in programming languages. We can already write:
> > from math import pi as π
> > Perhaps we should be able to write:
> > setA ⊂ setB

> It would be nice, if subset testing is considered common enough to
> warrant it. (I'm not sure it is, but I'm certainly not sure it isn't.)
> But it violates "one obvious way". Python doesn't, as a general rule,
> offer us two ways of spelling the exact same thing. So the bar for
> inclusion would be quite high: it has to be so much better than the
> alternative that it justifies the creation of a duplicate notation.
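
For concreteness, and sticking to stock Python 3: set comparisons
already have compact spellings today, so a ⊂ operator really would be
duplicate notation:

    >>> {1, 2} < {1, 2, 3}           # proper subset, today's ⊂
    True
    >>> {1, 2} <= {1, 2, 3}          # subset or equal, today's ⊆
    True
    >>> {1, 2}.issubset({1, 2, 3})   # the spelled-out form
    True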

I don't think we're anywhere near making real suggestions for real changes,
which would need talk of compatibility, portability, editor support
and all such other good stuff.

Just a bit of brainstorming to see what an alternative Python might look like:

Here's a quick list of symbols it might be nice to have support for, each
glossed with its current spelling (a note on feasibility follows the list):

× (for *)
÷ (for /)
≤ (for <=)
≥ (for >=)
∧ (for and)
∨ (for or)
¬ (for not)
π (for math.pi)
λ (for lambda)
∈ (for in)
∉ (for not in)
⊂ (proper subset, for <)
⊃ (proper superset, for >)
⊆ (subset, for <=)
⊇ (superset, for >=)
∅ (for the empty set, set())
∩ (intersection, for &)
∪ (union, for |)
← (for assignment, =)
… (ellipsis instead of range)
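
Two of these, π and λ, differ in kind from the rest: Python 3 already
accepts them in identifiers (PEP 3131), so only the operator rows would
need actual grammar changes. A rough sketch of what works today:

    # Already valid Python 3 (PEP 3131: Unicode identifiers):
    from math import pi as π

    def circumference(r):
        return 2 * π * r      # π is just a name bound to math.pi

    λ = 0.5                   # fine as a variable name...
    # ...but λ can't replace the 'lambda' keyword, and ≤, ∈, ⊂ and
    # friends are operators, so each would need parser-level support.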



