Unicode normalisation [was Re: [beginner] What's wrong?]

Chris Angelico rosuav at gmail.com
Fri Apr 8 00:54:00 EDT 2016


On Fri, Apr 8, 2016 at 2:43 PM, Rustom Mody <rustompmody at gmail.com> wrote:
> No I am not clever/criminal enough to know how to write a text that is visually
> close to
> print "Hello World"
> but is internally closer to
> rm -rf /
>
> For me this:
> >>> Α = 1
> >>> A = 2
> >>> Α + 1 == A
> True
> >>>
>
>
> is cute enough that I am not amused

To me, the above is a contrived example. And you can contrive examples
that are just as confusing while still being ASCII-only, like
swimmer/swirnmer in many fonts, or I and l, or any number of other
visually-confusing glyphs. I propose that we ban the letters 'r' and
'l' from identifiers, to ensure that people can't mess with
themselves.
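
For anyone who does get bitten by the quoted example, the two names can
be told apart by asking for the code points. A minimal sketch (the
variable names are mine, purely for illustration):

```python
import unicodedata

greek = "\u0391"  # the first name in the quoted session
latin = "A"       # the second name, plain ASCII

# Print the code point and official Unicode name of each character.
for ch in (greek, latin):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# NFKC normalisation leaves both characters unchanged, so they stay
# two distinct identifiers as far as Python is concerned.
same_after_nfkc = (
    unicodedata.normalize("NFKC", greek) == unicodedata.normalize("NFKC", latin)
)
print(same_after_nfkc)
```

The same check works on 'rn' versus 'm', except that there nothing
beyond looking at the code points will help you.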

> Specifically as far as I am concerned if python were to throw back say
> a ligature in an identifier as a syntax error -- exactly what python2 does --
> I think it would be perfectly fine and a more sane choice

The ligature is handled straightforwardly: Python 3 NFKC-normalises
identifiers, so it gets decomposed into its component letters. I'm not
seeing a problem here.
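
A quick way to see the decomposition for yourself, assuming Python 3
(the identifier 'file' spelled with the U+FB01 ligature is just an
example I picked):

```python
import unicodedata

ligature_name = "\ufb01le"  # 'file' written with U+FB01 LATIN SMALL LIGATURE FI
plain_name = "file"

# NFKC replaces the ligature with its component letters 'f' and 'i'.
normalised = unicodedata.normalize("NFKC", ligature_name)
print(normalised == plain_name)

# The compiler applies the same normalisation to identifiers, so an
# assignment written with the ligature binds the plain ASCII name.
namespace = {}
exec(ligature_name + " = 42", namespace)
print(namespace[plain_name])
```

Only the normalised spelling ends up in the namespace; the ligature
form never appears as a key.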

ChrisA


