[Python-ideas] allow `lambda' to be spelled λ
Neil Girdhar
mistersheik at gmail.com
Tue Jul 19 06:32:07 EDT 2016
Sounds like a bug in the lexer? Or maybe a feature request.
On Tuesday, July 19, 2016 at 3:32:39 AM UTC-4, Rustom Mody wrote:
>
>
>
> On Tuesday, July 19, 2016 at 12:39:04 PM UTC+5:30, Neil Girdhar wrote:
>>
>> One solution would be to restrict identifiers to only Unicode characters
>> in appropriate classes. The open quotation mark is in the code class for
>> punctuation, so it doesn't make sense to have it be part of an identifier.
>>
>> http://www.fileformat.info/info/unicode/category/index.htm
>>
>
> Python (3) is doing that alright as far as I can see:
> https://docs.python.org/3/reference/lexical_analysis.html#identifiers
>
> The point is that when a character doesn’t fall in the classification(s), the
> error it raises suggests that the lexer is not really Unicode-aware.
>
>
>>
>>
>> On Tuesday, July 19, 2016 at 1:29:35 AM UTC-4, Rustom Mody wrote:
>>>
>>> On Tuesday, July 19, 2016 at 10:20:29 AM UTC+5:30, Nick Coghlan wrote:
>>>>
>>>> On 18 July 2016 at 13:41, Rustom Mody <rusto... at gmail.com> wrote:
>>>> > Do consider:
>>>> >
>>>> > >>> Α = 1
>>>> > >>> A = 2
>>>> > >>> Α + 1 == A
>>>> > True
>>>> > >>>
>>>> >
>>>> > Can (IMHO) go all the way to
>>>> > https://en.wikipedia.org/wiki/IDN_homograph_attack
>>>>
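For anyone who wants to see the confusable pair from the quoted session, here is a minimal sketch using only the standard unicodedata module. The helper name describe is mine, purely for illustration. It shows that the two names are distinct code points, that both are valid identifiers, and that NFKC normalization (which Python applies to identifiers) does not fold one into the other.

```python
import unicodedata


def describe(name):
    """Return 'U+XXXX NAME' for each character of an identifier (illustrative helper)."""
    return ["U+{:04X} {}".format(ord(ch), unicodedata.name(ch)) for ch in name]


greek = "\u0391"  # GREEK CAPITAL LETTER ALPHA, looks like a Latin A in most fonts
latin = "A"       # LATIN CAPITAL LETTER A

print(describe(greek))
print(describe(latin))

# Both are accepted as identifiers, and NFKC normalization keeps them distinct,
# so they name two different variables.
print(greek.isidentifier(), latin.isidentifier())
print(unicodedata.normalize("NFKC", greek) == latin)
```

A check like this could sit in a linter or code-review hook; it does nothing to the lexer itself.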
>>>> Yes, we know - that dramatic increase in the attack surface is why
>>>> PyPI is still ASCII only, even though full Unicode support is
>>>> theoretically possible.
>>>>
>>>> It's not a major concern once an attacker already has you running
>>>> arbitrary code on your system though, as the main problem there is
>>>> that they're *running arbitrary code on your system*. That means the
>>>> usability gains easily outweigh the increased obfuscation potential,
>>>> as worrying about confusable attacks at that point is like worrying
>>>> about a dripping tap upstairs when the Brisbane River is already
>>>> flowing through the ground floor of your house :)
>>>>
>>>> Cheers,
>>>>
>>>>
>>> There was this question on the python list a few days ago:
>>> Subject: SyntaxError: Non-ASCII character
>>>
>>> Chris Angelico pointed out the offending line:
>>> wf = wave.open(“test.wav”, “rb”)
>>> (should be wf = wave.open("test.wav", "rb") instead)
>>>
>>> Since he also said:
>>> > The solution may be as simple as running "python3 script.py" rather
>>> > than "python script.py".
>>>
>>> I pointed out that the Python 2 error was more helpful (to my eyes) than
>>> Python 3's.
>>>
>>>
>>> Python3
>>>
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>>   File "/home/ariston/foo.py", line 31
>>>     wf = wave.open(“test.wav”, “rb”)
>>> ^
>>> SyntaxError: invalid character in identifier
>>>
>>> Python2
>>>
>>>
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>>   File "foo.py", line 31
>>> SyntaxError: Non-ASCII character '\xe2' in file foo.py on line 31, but
>>> no encoding declared; see http://python.org/dev/peps/pep-0263/ for
>>> details
>>>
>>> IOW
>>> 1. The lexer is internally (evidently, from the error message) so
>>> ASCII-oriented that any “unicode-junk” just defaults out to identifiers
>>> (presumably comments are dealt with earlier), and then if that lexing action
>>> fails it mistakenly pinpoints a wrong *identifier* rather than just an
>>> impermissible character, as Python 2 does.
>>> Combine that with
>>> 2. matrix mult (@): OK to emulate Perl, but not to go outside ASCII
>>>
>>> and it makes it seem (to me) that Python's Unicode support is somewhat wrongheaded.
>>>
>>>
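To make the point about character classes concrete, here is a rough sketch of the kind of diagnostic the quoted message is asking for. It is not how CPython's tokenizer works; explain_non_ascii is a hypothetical helper name, and it relies only on the standard unicodedata module. It reports each non-ASCII character in the offending line with its column, code point, general category, and name, then shows that compile() rejects the line (the exact SyntaxError wording depends on the Python version, so it is printed rather than assumed).

```python
import unicodedata


def explain_non_ascii(line):
    """List (column, code point, category, name) for every non-ASCII character."""
    report = []
    for col, ch in enumerate(line):
        if ord(ch) > 127:
            report.append((
                col,
                "U+{:04X}".format(ord(ch)),
                unicodedata.category(ch),          # e.g. 'Pi' = initial-quote punctuation
                unicodedata.name(ch, "<unnamed>"),
            ))
    return report


# The offending line from the thread, with curly quotes written as escapes.
line = "wf = wave.open(\u201ctest.wav\u201d, \u201crb\u201d)"

for entry in explain_non_ascii(line):
    print(entry)

# Punctuation characters are not valid identifier characters.
print("\u201c".isidentifier())

# The compiler rejects the line; the message text varies between versions.
raised = False
try:
    compile(line, "foo.py", "exec")
except SyntaxError as exc:
    raised = True
    print("SyntaxError:", exc.msg)
```

The curly quotes fall in categories Pi and Pf (initial and final quote punctuation), which is why they cannot be part of an identifier under the rules linked above.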