Unicode in Python

Steven D'Aprano steve at pearwood.info
Tue Apr 22 02:11:56 EDT 2014


On Mon, 21 Apr 2014 20:57:39 -0700, Rustom Mody wrote:


> As a unicode user (ok wannabe unicode user :D ) Ive written up some
> unicode ideas that have been discussed here in the last couple of weeks:
> 
> http://blog.languager.org/2014/04/unicoded-python.html

What you are talking about is not handling Unicode with Python, but 
extending the programming language to allow non-English *letters* to be 
used as if they were *symbols*.

That's very problematic, since it assumes that nobody would ever want to 
use non-English letters in an alphanumeric context. You write:

    [quote]
    Now to move ahead!
    We dont[sic] want

    >>> λ = 1
    >>> λ
    1

    We want

    >>> (λx : x+1)(2)
    3
    [end quote]



(Speak for yourself.) But this is a problem. Suppose I want to use a 
Greek word as a variable, as Python allows me to do:


λόγος = "a word"


Or perhaps as the parameter to a function. Take the random.expovariate 
function, which currently takes an argument "lambd" (since lambda is a 
reserved word). I might write instead:

def expovariate(self, λ): ...
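
As a quick check under Python 3 (a minimal sketch; the wrapper function below is hypothetical and not part of the random module), "lambda" is a keyword while the bare letter λ is accepted as an ordinary name:

```python
import keyword
import random

# "lambda" is a reserved word, so it cannot be used as a parameter name...
lambda_is_keyword = keyword.iskeyword("lambda")

# ...but the Greek letter itself is not a keyword and is a valid identifier.
greek_is_identifier = "λ".isidentifier() and not keyword.iskeyword("λ")


def expovariate(λ):
    """Hypothetical wrapper: forwards to random.expovariate under a Greek name."""
    return random.expovariate(λ)


# A whole Greek word works as a variable name too.
λόγος = "a word"
sample = expovariate(2.0)
```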


After all, λ is an ordinary letter of the (Greek) alphabet, so why 
shouldn't it be used in variable names? But if "λx" is syntax for 
"lambda x", then I'm going to get syntax errors:

λόγος = "a word"
=> like:  lambda όγος = "a word"

def expovariate(self, λ):
=> like:  def expovariate(self, lambda):


both of which are obviously syntax errors.
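
This can be confirmed with the built-in compile(), which parses source without running it. A rough sketch (the helper name "compiles" is my own invention), comparing the lines as they parse today against what they would turn into if λ were read as lambda:

```python
def compiles(source):
    """Return True if `source` parses as Python, False on a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# Valid today: λ is just a letter.
today = [
    'λόγος = "a word"',
    "def expovariate(self, λ): ...",
]

# What the same lines would mean if λ were shorthand for lambda.
proposed = [
    'lambda όγος = "a word"',
    "def expovariate(self, lambda): ...",
]

results_today = [compiles(s) for s in today]
results_proposed = [compiles(s) for s in proposed]
```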

This is as hostile to Greek-using programmers as deciding that "f" should 
be reserved for functions would be to English-using programmers:

# space between the f and the function name is not needed
fspam(x, y):
    ...

class Thingy:
    f__init__(selF):
        ...
    fmethod(selF, arg):
        return arg + 1


Notice that I can't even write "self" any more, since that gives a syntax 
error. Presumably "if" is okay, as it is a keyword.

Using Unicode *symbols* rather than non-English letters is less of a 
problem, since they aren't valid in identifiers.
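
To illustrate the difference (a small sketch; the particular symbols are just ones I picked, and other characters may behave differently), the standard unicodedata module and str.isidentifier() show that λ is a lowercase letter while typical mathematical symbols are not letters and are rejected as identifiers:

```python
import unicodedata

# One letter followed by a few mathematical symbols, chosen for illustration.
samples = ["λ", "∑", "→", "∀"]

# Map each character to (Unicode general category, usable as an identifier?).
report = {
    ch: (unicodedata.category(ch), ch.isidentifier())
    for ch in samples
}

for ch, (category, ok) in report.items():
    print(ch, category, "identifier" if ok else "not an identifier")
```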


More comments to follow later.


-- 
Steven


