Well, I finally ran into a Python Unicode problem, sort of

Rustom Mody rustompmody at gmail.com
Sun Jul 3 03:26:55 EDT 2016


On Sunday, July 3, 2016 at 12:29:14 PM UTC+5:30, John Ladasky wrote:
> A while back, I shared my love for using Greek letters as variable names in my Python (3.4) code -- when, and only when, they are warranted for improved readability.  For example, I like to see the following:
> 
> 
> from math import pi as π
> 
> c = 2 * π * r
> 
> 
> When I am copying mathematical formulas from publications, and Greek letters are used in that publication, I prefer to follow the text exactly as written.
> 
> Up until today, every character I've tried has been accepted by the Python interpreter as a legitimate character for inclusion in a variable name.  Now I'm copying a formula which defines a gradient.  The nabla symbol (∇) is used in the naming of gradients.  Python isn't having it.  The interpreter throws a "SyntaxError: invalid character in identifier" when it encounters the ∇.
> 
> I am now wondering what constitutes a valid character for an identifier, and how they were chosen.  Obviously, the Western alphabet and standard Greek letters work.  I just tried a few very weird characters from the Latin Extended range, and some Cyrillic characters.  These are also fine.

https://docs.python.org/3.5/reference/lexical_analysis.html
points to
https://www.dcl.hpi.uni-potsdam.de/home/loewis/table-3131.html

In other words, the set of allowed characters is quite hardwired.
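
To test a name against those rules without waiting for the SyntaxError, something like the following should do. It is only a sketch, assuming Python 3; check_names is a made-up helper name, and str.isidentifier() is the built-in that applies the identifier rules.

```python
def check_names(names):
    """Map each candidate name to whether Python 3 accepts it as an identifier."""
    return {name: name.isidentifier() for name in names}


if __name__ == "__main__":
    # Greek, Cyrillic and plain ASCII names alongside two that use the nabla symbol
    for name, ok in check_names(["π", "∇", "∇f", "grad_f", "Жx"]).items():
        print(repr(name), ok)
```

The two names containing ∇ should come back False, the rest True.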

> 
> A philosophical question.  Why should any character be excluded from a variable name, besides the fact that it might also be an operator?
> 
> This might be a problem I can solve, I'm not sure.  Is there a file that the Python interpreter refers to which defines the accepted variable name characters?  Perhaps I could just add ∇.

You need to try something like

>>> import unicodedata as ud
>>> ud.category("∇")
'Sm'
>>> ud.category("A")
'Lu'
>>> ud.category("π")
'Ll'
>>> ud.category("a")
'Ll'

followed by figuring out why/what etc from (say)
https://en.wikipedia.org/wiki/Unicode_character_property
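
The same check as a small script, in case it helps. This is a rough sketch, not the interpreter's actual logic: START_CATEGORIES is my own name for the general categories that the language reference lists for the first character of an identifier, and the reference also allows the underscore and a few special-cased characters that this set ignores.

```python
import unicodedata as ud

# General categories the Python 3 reference lists for identifier start characters
# (the underscore and the Other_ID_Start exceptions are deliberately left out here)
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}


def describe(ch):
    """Return (general category, whether ch alone is a valid identifier)."""
    return ud.category(ch), ch.isidentifier()


if __name__ == "__main__":
    for ch in "∇πaA":
        category, ok = describe(ch)
        print(ch, category, category in START_CATEGORIES, ok)
```

∇ is a math symbol (Sm), which is not in that set, so it is rejected; π is a lowercase letter (Ll), so it is accepted.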

This is the way it IS
Not saying it SHOULD BE…


