[Python-ideas] .pyu nicode syntax symbols (was Re: Empty set, Empty dict)
Terry Reedy
tjreedy at udel.edu
Sun Jun 22 22:18:57 CEST 2014
Problem: For years, various people have suggested that they would like
to use syntactically significant unicode symbols in Python code. A prime
example is using U+2205, EMPTY SET, ∅, instead of 'set()'. On the other
hand, the conservative, overwhelmed core development group is not much
interested and would rather do other things.
Solution: Act instead of ask.
One or more of the people who really want this could get themselves
together and produce a working system. (If multiple people, ask for a
new SIG and mailing list.)
1. Ask core development to reserve '.pyu' for python with unicode
symbols. (If refused, choose something else.)
2. Write pyu.py. It should first translate x.pyu to the equivalent x.py.
If x.py exists, check the date (as with .py and .pyc). Optionally, but
probably by default, run x.py.
Translation requires two operations: masking comments and string
literals from translation and translating the remainder. I personally
would start by doing the two operations separately, with separately
testable functions.
def codechunk(unisymcode):
    '''Yield code_or_not, code_chunk pairs for code with unicode symbols.

    Chunks are comments or string literals (code_or_not == False),
    and code that might have unicode symbols that need translation
    (code_or_not == True).
    '''
    <Simplified parser, possibly derived from tokenize.tokenize(),
    which already knows how to recognize comments and strings.>

unisym = <dict mapping unicode ordinals to ascii replacements>

def unisym2ascii(unisymcode):
    blocklist = []
    for code, block in codechunk(unisymcode):
        if code:
            block = block.translate(unisym)
        blocklist.append(block)
    return ''.join(blocklist)
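
To make the outline above concrete, here is one rough way the two
placeholders might be filled in. This is only a sketch under stated
assumptions: the regex-based masking is a stand-in I chose for the
"simplified parser" (it is not derived from tokenize), and the contents
of the unisym table are hypothetical examples, not a proposed standard.
It does not look inside f-string expressions.

```python
import re

# Hypothetical mapping for illustration; the real table would be larger.
unisym = {
    ord('∅'): 'set()',
    ord('≤'): '<=',
    ord('≥'): '>=',
    ord('≠'): '!=',
}

# Assumption: a regex is good enough to find comments and string literals.
# Triple-quoted forms are listed first so they win over single-quoted ones.
_masked = re.compile(
    r'#[^\n]*'                    # comment up to end of line
    r'|"""(?:\\.|[^\\])*?"""'     # triple double-quoted string
    r"|'''(?:\\.|[^\\])*?'''"     # triple single-quoted string
    r'|"(?:\\.|[^"\\\n])*"'       # double-quoted string
    r"|'(?:\\.|[^'\\\n])*'",      # single-quoted string
    re.DOTALL,
)

def codechunk(unisymcode):
    '''Yield (code_or_not, chunk) pairs covering all of unisymcode.'''
    pos = 0
    for match in _masked.finditer(unisymcode):
        if match.start() > pos:
            yield True, unisymcode[pos:match.start()]   # translatable code
        yield False, match.group()                      # comment or string
        pos = match.end()
    if pos < len(unisymcode):
        yield True, unisymcode[pos:]

def unisym2ascii(unisymcode):
    '''Translate unicode symbols in code, leaving comments and strings.'''
    return ''.join(
        chunk.translate(unisym) if code else chunk
        for code, chunk in codechunk(unisymcode)
    )

if __name__ == '__main__':
    print(unisym2ascii('s = ∅  # ∅ stays\nt = "∅"\nok = 1 ≤ 2\n'))
```

Keeping the chunker separate from the translator means each can be
tested alone, and the regex can later be swapped for something based on
tokenize without touching unisym2ascii.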
3. Upload pyu.py to PyPI, *along with instructions on the various ways
to enter unicode symbols on various systems*. Announce and promote.
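
The file-handling part of step 2 could be sketched roughly as follows.
The names needs_translation and translate_file are hypothetical, the
'translate' argument stands for whatever function turns .pyu text into
.py text, and comparing modification times is my reading of "check the
date"; a real tool might prefer a stricter check.

```python
import os

def needs_translation(pyu_path, py_path):
    '''Return True if py_path is missing or older than pyu_path.'''
    try:
        py_mtime = os.path.getmtime(py_path)
    except FileNotFoundError:
        return True
    return os.path.getmtime(pyu_path) > py_mtime

def translate_file(pyu_path, translate):
    '''Write x.py next to x.pyu when it is stale; return the .py path.

    'translate' is any callable mapping .pyu source text to .py text.
    '''
    py_path = os.path.splitext(pyu_path)[0] + '.py'
    if needs_translation(pyu_path, py_path):
        with open(pyu_path, encoding='utf-8') as src:
            text = translate(src.read())
        with open(py_path, 'w', encoding='utf-8') as dst:
            dst.write(text)
    return py_path
```

Running the result by default could then be as simple as passing the
returned path to runpy.run_path(py_path, run_name='__main__').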
On 6/22/2014 10:41 AM, Philipp A. wrote:
> if people are too lazy to find a input method that works for them (Alt
> Gr, compose key, copy&paste), they should just continue to type ASCII,
> and leave the more elegant unicode variants for others.
Being snarky can be fun, but if I wrote and distributed pyu.py, I would
want as many users as possible.
> ∅ and λ seem like good ideas to me as un-redefinable empty
> set literal and shorter/more elegant lambda. And “…” for “Ellipsis”.
>
> there’s also ∀, ¬, ×, ∧,∨, ∩, ∪, ∈, ∉, ≠, ≡, ≤, and ≥, but i think those
> are a bit much:
I think the unisym dict should be inclusive and let people choose to use
the symbols they want. I suspect I would use ≤ and ≥ sooner than λ. A
mathematician who used most of those symbols, for a math audience,
could still use the ascii translation for other audiences.
On 6/22/2014 11:01 AM, MRAB wrote:
> λ is a valid identifier in Python 3 because it's a letter.
Overall, I see this as less of a problem than the possibility of
rebinding builtin names. The program could have a 'translate_lambda'
(default True) parameter. But I would be willing to say that if you use
unicode symbols, then you cannot also use λ as an identifier. (If one
did, the resulting .py would stop with SyntaxError where 'lambda'
replaced identifier λ.)
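
A small check of that claim, as a sketch only. The helper names
compiles and to_ascii are hypothetical, and mapping λ to the bare word
'lambda' is an assumption about how the table would be written.

```python
def compiles(source):
    '''Return True if source compiles as a Python 3 module.'''
    try:
        compile(source, '<example>', 'exec')
    except SyntaxError:
        return False
    return True

def to_ascii(source):
    '''Blindly replace every λ with the keyword (assumed mapping).'''
    return source.translate({ord('λ'): 'lambda'})

as_identifier = 'λ = 2\nprint(λ + 1)\n'   # λ used as an ordinary name
as_keyword = 'f = λ x: x + 1\n'           # λ used in place of lambda

print(compiles(as_identifier))            # valid before translation
print(compiles(to_ascii(as_identifier)))  # 'lambda = 2' is rejected
print(compiles(to_ascii(as_keyword)))     # 'lambda x: x + 1' is fine
```

So a file that uses λ as a name fails loudly at compile time after
translation rather than silently changing meaning.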
--
Terry Jan Reedy