[Python-ideas] .pyu nicode syntax symbols (was Re: Empty set, Empty dict)

Terry Reedy tjreedy at udel.edu
Sun Jun 22 22:18:57 CEST 2014


Problem: For years, various people have suggested that they would like 
to use syntactically significant unicode symbols in Python code. A prime 
example is using U+2205, EMPTY SET, ∅, instead of 'set()'. On the other 
hand, the conservative, overwhelmed core development group is not much 
interested and would rather do other things.

Solution: Act instead of ask.

One or more of the people who really want this could get themselves 
together and produce a working system. (If multiple people are 
involved, ask for a new SIG and mailing list.)

1. Ask core development to reserve '.pyu' for Python with unicode 
symbols. (If refused, choose something else.)

2. Write pyu.py. It should first translate x.pyu to the equivalent x.py. 
If x.py already exists, compare the dates (as with .py and .pyc) and 
retranslate only if x.pyu is newer. Optionally, but probably by default, 
run x.py.

Translation requires two operations: masking comments and string 
literals from translation and translating the remainder. I personally 
would start by doing the two operations separately, with separately 
testable functions.

def codechunk(unisymcode):
   '''Yield code_or_not, code_chunk pairs for code with unicode symbols.

   Chunks are comments or string literals (code_or_not == False),
   and code that might have unicode symbols that need translation
   (code_or_not == True).
   '''
   <Simplified parser, possibly derived from tokenize.tokenize(),
   which already knows how to recognize comments and strings.>

unisym = <dict mapping unicode ordinals to ascii replacements>

def unisym2ascii(unisymcode):
   blocklist = []
   for code, block in codechunk(unisymcode):
     if code:
       block = block.translate(unisym)
     blocklist.append(block)
   return ''.join(blocklist)
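
To make the two functions above concrete, here is a rough, runnable sketch. 
It is only one way to do it: instead of deriving the chunker from 
tokenize, it swaps in a small regular expression that recognizes comments 
and (non-nested) string literals. The regex, the name _MASKED, and the 
particular entries in unisym are my assumptions for illustration, not a 
settled design.

```python
import re

# Assumed simplification: a regex stands in for a tokenize-based parser.
# It matches triple-quoted strings, single-line strings, and comments.
# String prefixes (r, b, f, ...) are left in the code chunks, which is
# harmless because they are ASCII and untouched by translation.
_MASKED = re.compile(
    "|".join([
        r"'''(?:\\.|[^\\])*?'''",      # '''...'''
        r'"""(?:\\.|[^\\])*?"""',      # """..."""
        r"'(?:\\.|[^\\'\n])*'",        # '...'
        r'"(?:\\.|[^\\"\n])*"',        # "..."
        r"#[^\n]*",                    # comment up to end of line
    ]),
    re.DOTALL,
)

# Illustrative mapping only; the real table would be much larger.
unisym = {
    ord('∅'): 'set()',
    ord('≤'): '<=',
    ord('≥'): '>=',
    ord('≠'): '!=',
    ord('λ'): 'lambda',
}


def codechunk(unisymcode):
    """Yield (code_or_not, chunk) pairs covering all of unisymcode."""
    pos = 0
    for match in _MASKED.finditer(unisymcode):
        if match.start() > pos:
            # Text between masked spans is code that may need translation.
            yield True, unisymcode[pos:match.start()]
        # Comments and string literals pass through unchanged.
        yield False, match.group()
        pos = match.end()
    if pos < len(unisymcode):
        yield True, unisymcode[pos:]


def unisym2ascii(unisymcode):
    """Translate unicode symbols in code, leaving comments/strings alone."""
    blocklist = []
    for code, block in codechunk(unisymcode):
        if code:
            block = block.translate(unisym)
        blocklist.append(block)
    return ''.join(blocklist)
```

With this, unisym2ascii("s = ∅  # ∅ stays") leaves the comment alone and 
only rewrites the assignment. Known gaps in the sketch: expressions nested 
inside f-strings are masked along with the rest of the string, and 'λx: x' 
(no space) would become 'lambdax: x', so a real version would need to 
handle spacing around word-like replacements.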

3. Upload pyu.py to PyPI, *along with instructions on the various ways 
to enter unicode symbols on various systems*. Announce and promote.
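
For the driver part of step 2 (translate, check the date, optionally 
run), something along these lines might do. The function names is_stale 
and run_pyu are hypothetical, and the translate argument is assumed to be 
whatever str -> str translation function pyu.py ends up with; the date 
check simply compares file modification times.

```python
import os
import runpy


def is_stale(pyu_path, py_path):
    """Return True if py_path is missing or older than pyu_path."""
    if not os.path.exists(py_path):
        return True
    return os.path.getmtime(pyu_path) > os.path.getmtime(py_path)


def run_pyu(pyu_path, translate, run=True):
    """Translate x.pyu to x.py if needed, then optionally run x.py.

    'translate' is an assumed callable taking and returning source text.
    Returns the path of the generated .py file.
    """
    py_path = os.path.splitext(pyu_path)[0] + '.py'
    if is_stale(pyu_path, py_path):
        with open(pyu_path, encoding='utf-8') as f:
            source = f.read()
        with open(py_path, 'w', encoding='utf-8') as f:
            f.write(translate(source))
    if run:
        # Execute the translated file as if it were the main script.
        runpy.run_path(py_path, run_name='__main__')
    return py_path
```

Keeping the generated x.py next to x.pyu means people without pyu.py 
can still read and run the ascii version.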


On 6/22/2014 10:41 AM, Philipp A. wrote:
> if people are too lazy to find a input method that works for them (Alt
> Gr, compose key, copy&paste), they should just continue to type ASCII,
> and leave the more elegant unicode variants for others.

Being snarky can be fun, but if I wrote and distributed pyu.py, I would 
want as many users as possible.

> ∅ and λ seem like good ideas to me as un-redefinable empty
> set literal and shorter/more elegant lambda. And “…” for “Ellipsis”.
>
> there’s also ∀, ¬, ×, ∧,∨, ∩, ∪, ∈, ∉, ≠, ≡, ≤, and ≥, but i think those
> are a bit much:

I think the unisym dict should be inclusive and let people choose to use 
the symbols they want. I suspect I would use ≤ and ≥ sooner than λ. A 
mathematician who used most of those symbols for a math audience 
could still use the ascii translation for other audiences.

On 6/22/2014 11:01 AM, MRAB wrote:
> λ is a valid identifier in Python 3 because it's a letter.

Overall, I see this as less of a problem than the possibility of 
rebinding builtin names. The program could have a 'translate_lambda' 
(default True) parameter. But I would be willing to say that if you use 
unicode symbols, then you cannot also use λ as an identifier. (If one 
did, the resulting .py would stop with SyntaxError where 'lambda' 
replaced identifier λ.)
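
A quick way to see the clash, and what a 'translate_lambda' switch could 
look like. The helper names make_unisym and compiles are made up for this 
illustration, and the table is cut down to two entries.

```python
def make_unisym(translate_lambda=True):
    """Build a (tiny, illustrative) translation table."""
    table = {ord('∅'): 'set()'}
    if translate_lambda:
        table[ord('λ')] = 'lambda'
    return table


def compiles(source):
    """Return True if source compiles as a Python 3 module."""
    try:
        compile(source, '<pyu>', 'exec')
    except SyntaxError:
        return False
    return True


# λ used as an ordinary identifier, which Python 3 allows.
source = 'λ = 2\nempty = ∅\n'

# With λ translated, the assignment becomes 'lambda = 2'.
with_lambda = source.translate(make_unisym(translate_lambda=True))

# With the switch off, λ is left alone and only ∅ is rewritten.
without_lambda = source.translate(make_unisym(translate_lambda=False))
```

Here compiles(with_lambda) is False, because 'lambda = 2' is a syntax 
error, while compiles(without_lambda) is True.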

-- 
Terry Jan Reedy



