[Python-ideas] Visually confusable unicode characters in identifiers

Mathias Panzenböck grosser.meister.morti at gmx.net
Mon Oct 1 18:07:19 CEST 2012


I still don't understand why Unicode characters are allowed in identifier names at all. Is the
reason for this written down somewhere?

On 10/01/2012 05:43 PM, Jim Jewett wrote:
> On 9/30/12, Steven D'Aprano <steve at pearwood.info> wrote:
>> On 01/10/12 00:00, Oscar Benjamin wrote:
>
>> py> A = 42
>> py> Α = 23
>> py> A == Α
>> False
>
> It will never be possible to catch all confusables, which is one
> reason that the unicode property stalled.
>
> It seems like it would be reasonable to at least warn when identifiers
> are not all in the same script -- but real-world examples from Emacs
> Lisp made it clear that this is often intentional.  There were still
> clear word-boundaries, but it wasn't clear how that word-boundary
> detection could be properly automated in the general case.
>
>> Besides, just because you and I can't distinguish A from Α in my editor,
>> using one particular choice of font, doesn't mean that the author or his
>> intended audience (Greek programmers perhaps?) can't distinguish them,
>
> In many cases, it does -- for the letters to look different requires
> an unnatural font choice, though perhaps not so extreme as the
> print-the-hex-code font.
>
>> I would welcome "confusable detection" in the standard library, possibly a
>> string method "skeleton" or some other interface to the Confusables file,
>> perhaps in unicodedata.
>
> I would too, and agree that it shouldn't be limited to identifiers.
>
> -jJ
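
For reference, a rough sketch of the "warn when identifiers are not all in the same script" idea quoted above. It rests on assumptions: the standard library does not expose the Unicode Script property, so the first word of each character's Unicode name stands in for the script, and the helper names (rough_script, scripts_in, is_mixed_script) are made up for illustration. It is not the skeleton algorithm from the Confusables file, and as noted above it would also flag identifiers that mix scripts on purpose.

```python
import unicodedata


def rough_script(ch):
    """Approximate a character's script by the first word of its Unicode name.

    Heuristic only: 'LATIN CAPITAL LETTER A' gives 'LATIN' and
    'GREEK CAPITAL LETTER ALPHA' gives 'GREEK'. Unnamed characters
    fall back to 'UNKNOWN'.
    """
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def scripts_in(identifier):
    """Return the set of rough scripts used by the letters of an identifier."""
    # Digits and underscores are skipped; only alphabetic characters count.
    return {rough_script(ch) for ch in identifier if ch.isalpha()}


def is_mixed_script(identifier):
    """True if the identifier's letters come from more than one rough script."""
    return len(scripts_in(identifier)) > 1


if __name__ == "__main__":
    latin_a = "A"       # U+0041
    greek_a = "\u0391"  # U+0391, looks like Latin A in many fonts
    for name in (latin_a, greek_a, "p" + greek_a + "yload", "payload_1"):
        print(ascii(name), sorted(scripts_in(name)), is_mixed_script(name))
```

A check like this could only ever be a warning: it separates A from Α when they appear in the same identifier, but it says nothing about two single-script identifiers that merely look alike.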


