[beginner] What's wrong?

Rustom Mody rustompmody at gmail.com
Sun Apr 3 10:30:47 EDT 2016


On Sunday, April 3, 2016 at 5:17:36 PM UTC+5:30, Thomas 'PointedEars' Lahn wrote:
> Rustom Mody wrote:
> 
> > On Saturday, April 2, 2016 at 10:42:27 PM UTC+5:30, Thomas 'PointedEars'
> > Lahn wrote:
> >> Marko Rauhamaa wrote:
> >> > Steven D'Aprano :
> >> >> So you're saying that learning to be a fluent speaker of English is a
> >> >> pre-requisite of being a programmer?
> >> > 
> >> > No more than learning Latin is a prerequisite of being a doctor.
> >> 
> >> Full ACK.  Probably starting with the Industrial Revolution enabled by
> the improvements of the steam engine in England, English has become the
> >> /lingua franca/ of technology (even though the French often still
> >> disagree, preferring words like « ordinateur » and « octet » over
> >> “computer” and
> >> “byte”, respectively¹).  (With the Internet at the latest, then, it has
> >> also become the /lingua franca/ of science, although Latin terms are
> >> common in medicine.)
> > 
> > IMHO the cavalier usage of random alphabet-soup for identifiers
> 
> Straw man.  Nobody has suggested that.  Suggested were words in natural 
> languages other than English as (parts of) names in Python programs.
> 
> The suggestion was rejected by some (including me) on the grounds that 
> source code is not written only for the person writing it, but also for 
> other developers to read, and that English is the /lingua franca/ of 
> software development at least.  So it is reasonable to expect a software 
> developer to understand English, and more software developers are going to 
> understand the source code if it is written in English.
> 
> Another argument that was made in favor of English-language names (albeit on 
> the grounds of “nausea” instead of the logical reason of practicality) is 
> that the (Python) programming language’s keywords (e.g., False, None, True, 
> and, as, assert [1]) and built-in identifiers (e.g., NotImplemented, 
> Ellipsis, abs, all, int, float, complex, iterator [2]) are (abbreviations or 
> concatenations of) *English* words; therefore, mixing keywords with names
> in a natural language other than English causes source code to be more 
> difficult to read than an all-English source code (string values 
> notwithstanding).  This is particularly true with Python because a lot of 
> (well-written) Python code can easily be read as if it were pseudocode.  (I 
> would not be surprised at all to learn that this was Guido van Rossum’s 
> intention.)
> 
> As for the “Chinese” argument, I did some research recently, indicating that 
> it is a statistical fallacy:
> 
> <http://yaleglobal.yale.edu/content/english-craze-hits-chinese-language-standards>
> <http://yaleglobal.yale.edu/content/asians-offer-region-lesson-%E2%80%93-english>
> 
> From personal experience, I can say that I had no great difficulty 
> communicating in English with my Chinese flatmates and classmates at a 
> German technical university when all of us were studying computer science 
> there 16 years ago.  It was natural.  At least the boys even preferred self-
> chosen English first names for themselves (e.g., in instant messaging) 
> since, as they explained to me, their original names were difficult to 
> pronounce correctly for Europeans (or Europeans might mistakenly call them 
> by their family name since it would come first), and to type on European 
> keyboards (although I observed them to be proficient in using IMEs when 
> chatting with their folks back home).
> 
> ____________
> [1] <https://docs.python.org/3/reference/lexical_analysis.html#identifiers>
> [2] <https://docs.python.org/3/library/>
> 
> > can lead to worse than just aesthetic unpleasantness:
> > https://en.wikipedia.org/wiki/IDN_homograph_attack
> 
> Relevance?
> 
> > When python went to full unicode identifers it should have also added
> > pragmas for which blocks the programmer intended to use -- something like
> > a charset declaration of html.
> > 
> > This way if the programmer says "I want latin and greek"
> > and then A and Α get mixed up well he asked for it.
> > If he didn't ask then springing it on him seems unnecessary and uncalled
> > for
> 
> Nonsense.

It looks like there is some misunderstanding of what I said
[Guessing also from Marko's "...silly..."]

So here are some examples to illustrate what I am saying:

Example 1 -- Ligatures:

Python 3 gets it right (the 'ﬂag' below is spelled with the U+FB02
fl-ligature; 'flag' is plain ASCII):
>>> ﬂag = 1
>>> flag
1

Whereas Haskell gets it wrong:
Prelude> let ﬂag = 1
Prelude> flag

<interactive>:3:1: Not in scope: ‘flag’
Prelude> ﬂag
1
Prelude> 
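Aside: what makes Python 3 "get it right" here is PEP 3131, which
specifies that identifiers are NFKC-normalized while parsing; NFKC
folds U+FB02 down to 'fl'. A minimal sketch of the same normalization
done by hand with the stdlib:

import unicodedata

ligature = "\ufb02ag"   # 'flag' spelled with the U+FB02 fl-ligature
plain = "flag"          # plain ASCII

print(ligature == plain)                                  # False: distinct strings
print(unicodedata.normalize("NFKC", ligature) == plain)   # True: what Python compares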

Example 2 -- Case Sensitivity:
Scheme¹ gets it right

> (define a 1)
> A
1
> a
1

Python gets it wrong
>>> a=1
>>> A
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'A' is not defined
>>> a
1
[Likewise with filenames: Windows gets it right; Unix gets it wrong]
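Aside: if one wanted to flag the collisions that a case-insensitive
system (Scheme up to R5RS, Windows filenames) would merge, a
hypothetical helper -- not any standard tool, just a sketch built on
str.casefold -- could look like this:

from collections import defaultdict

def case_collisions(names):
    # Group names that are identical up to case.
    groups = defaultdict(set)
    for name in names:
        groups[name.casefold()].add(name)
    return {k: v for k, v in groups.items() if len(v) > 1}

print(case_collisions(["a", "A", "flag", "Flag", "total"]))
# {'a': {'a', 'A'}, 'flag': {'flag', 'Flag'}}   (set order may vary)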

Example 3 -- Unicode identifiers, in the spirit of the IDN homograph attack:
Every language that 'supports' Unicode gets it wrong.

Python 3:
>>> A=1
>>> Α
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'Α' is not defined
>>> A
1

Can you make out why A both is and is not defined?
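If not, the standard unicodedata module makes the homograph visible --
the three A's below are three different code points:

import unicodedata

for ch in "A\u0391\u0410":   # Latin A, Greek Alpha, Cyrillic A
    print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))

# U+0041 LATIN CAPITAL LETTER A
# U+0391 GREEK CAPITAL LETTER ALPHA
# U+0410 CYRILLIC CAPITAL LETTER A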

When the language does not support Unicode identifiers at all, e.g. Python 2,
the behavior is better:

>>> A=1
>>> Α
  File "<stdin>", line 1
    Α
    ^
SyntaxError: invalid syntax
>>> A
1
>>> 

So what's the point?
The notion of 'variable' in a programming language is inherently based on that
of 'identifier'.
With ASCII the problems are minor: case-distinct identifiers are distinct --
they don't IDENTIFY. This contradicts standard English usage and practice,
but it's not such a big deal.
With Unicode there are zillions of look-alikes -- e.g. Latin A, Greek Α and
Cyrillic А -- that appear identical until you ferret out the details.

A language that allows this without some red flag is storing up future grief
for unsuspecting programmers.
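To make the 'red flag' concrete, here is a rough sketch of the sort of
check I mean -- a hypothetical audit (the names, the ALLOWED set and the
script_of heuristic are all mine, not any existing tool). Guessing the
script from the first word of the character's Unicode name is only an
approximation; a real tool would consult the Unicode Scripts property:

import ast
import unicodedata

ALLOWED = {"LATIN", "DIGIT", "UNDERSCORE"}   # blocks the programmer opted into

def script_of(ch):
    # Crude heuristic: first word of the Unicode character name.
    if ch == "_":
        return "UNDERSCORE"
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def audit(source):
    # Report any name drawn (even partly) from blocks outside ALLOWED.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            extra = {script_of(c) for c in node.id} - ALLOWED
            if extra:
                print("line %d: %r uses %s" % (node.lineno, node.id, sorted(extra)))

audit("A = 1\n\u0391 = 2\n")   # second name is GREEK CAPITAL LETTER ALPHA
# line 2: 'Α' uses ['GREEK']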

¹ Ironically, up to the R5RS version of Scheme this was true.
Thereafter the Unix nerds won out over good old standard English practice.


