Time we switched to unicode? (was Explanation of this Python language feature?)

Rustom Mody rustompmody at gmail.com
Tue Mar 25 11:19:43 EDT 2014


On Tuesday, March 25, 2014 7:53:23 PM UTC+5:30, Steven D'Aprano wrote:
> On Tue, 25 Mar 2014 05:53:45 -0700, Rustom Mody wrote:

> > And if we had hyphen '‐' distinguished from minus '-' then we could have
> > lispish names like call‐with‐current‐continuation properly spelt. And
> > then generations of programmers will thank us for increasing their
> > debugging overtime!!

> :-)

> Full Unicode support in a language is, alas, a double-edged sword. While 
> it has advantages, it also has disadvantages.

> py> А = 1
> py> A = А + 1
> py> assert A == А
> Traceback (most recent call last):
> AssertionError

Even with 'good' ol' ASCII giving enough trouble between O and 0, 1 and
l, we certainly don't want more such headaches!
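
For what it's worth, here is a rough sketch of how such look-alikes could
be flagged, using only the standard unicodedata module. The helper name
scripts_in is made up for illustration, and treating the first word of a
character's Unicode name as its 'script' is a crude assumption, not a
real script-property lookup:

```python
import unicodedata

def scripts_in(identifier):
    """Return the set of first words of each character's Unicode name.

    Crude heuristic (an assumption for this sketch): the first word of the
    name, e.g. 'LATIN' or 'CYRILLIC', stands in for the script.
    """
    found = set()
    for ch in identifier:
        name = unicodedata.name(ch, "UNKNOWN")
        found.add(name.split()[0])
    return found

latin_a = "A"          # U+0041 LATIN CAPITAL LETTER A
cyrillic_a = "\u0410"  # U+0410 CYRILLIC CAPITAL LETTER A

print(latin_a == cyrillic_a)       # False: different code points
print(scripts_in(latin_a))         # {'LATIN'}
print(scripts_in(cyrillic_a))      # {'CYRILLIC'}
print(scripts_in("p" + "\u0430"))  # mixed: Latin p followed by Cyrillic a
```

A name that yields more than one 'script' is a candidate for a warning;
whether a linter or the compiler should do that is a separate question.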

In apps, a serious consideration of Unicode entails a cycle of i18n
and l10n.  The l10n for programming languages is arguably harder -- if
Python is 'localized' to some (human) language L, maybe all the
builtins should also be translated?  And what's the use of that without
full translation of the docs?  It's not a pleasing thought...

Something intermediate needs to be found...

Some thoughts (quite half-cooked):
1. Human -- 'natural' -- languages and math are not in the same category.
If Cyrillic A gets in, Roman probably shouldn't.

2. There needs to be some intermediate binding time -- e.g. when the
system is installed -- when suitable char-tables are set up.
Case in point: many folks in Haskell-land have wanted to replace
the '\' with the 'λ'. This has not worked out so far because λ belongs to the
same Unicode class as other lower-case letters. So since this is possible:

Prelude> let λ = 1
Prelude> λ
1
Prelude> 

the other, more-natural-to-Haskell usage is precluded.
So these classes need to be changeable and late-bindable:
not as late as runtime, but later than build-time,
probably the same as the choice of locales on a system.
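
The same classification can be seen from Python. This is only a small
sketch using the standard unicodedata module and str.isidentifier(); the
sample characters are my own choice. It shows that λ sits in the same
general category (Ll) as an ordinary lower-case letter, so it is accepted
as a name, while a math operator and the Unicode hyphen are not:

```python
import unicodedata

# Sample characters (chosen for illustration):
#   x        ordinary lower-case letter
#   U+03BB   Greek small lambda
#   U+222B   integral sign
#   U+2010   Unicode hyphen (as opposed to ASCII hyphen-minus)
samples = ["x", "\u03bb", "\u222b", "\u2010"]

for ch in samples:
    print(
        "U+%04X" % ord(ch),
        unicodedata.category(ch),  # general category, e.g. 'Ll', 'Sm', 'Pd'
        ch.isidentifier(),         # can it be used as a Python name?
    )
```

In Python these tables are fixed by the language definition rather than
chosen at install time, which is exactly the inflexibility described
above.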

> While I can see the appeal of certain Unicode symbols, I really wouldn't 
> like to have to deal with code that looks like this:

> x∫2*y+∬e**3∺z≹(x+1)≽y⋝w

> If I wanted line-noise, I know where to get Perl :-)

Yes... neither Perl nor Cobol is pleasant


