PEP 3131: Supporting Non-ASCII Identifiers

Duncan Booth duncan.booth at invalid.invalid
Mon May 14 09:44:04 EDT 2007


Stefan Behnel <stefan.behnel-n05pAM at web.de> wrote:

>> Just to confirm that: IronPython does accept non-ascii identifiers.
>> From "Differences between IronPython and CPython":
>> 
>>> IronPython will compile files whose identifiers use non-ASCII
>>> characters if the file has an encoding comment such as "# -*-
>>> coding: utf-8 -*-".  CPython will not compile such a file in any
>>> case. 
> 
> Sounds like CPython would better follow IronPython here.

I cannot find any documentation which says exactly which non-ASCII 
characters IronPython will accept. 
I would guess that it probably follows C# in general, but it doesn't 
follow C# identifier syntax exactly (in particular the leading @ to 
quote keywords is not supported).

The C# identifier syntax below is taken from
http://msdn2.microsoft.com/en-us/library/aa664670(VS.71).aspx
As far as I can tell, it differs from the PEP only in also allowing the Cf
class of characters:

identifier:
    available-identifier
    @   identifier-or-keyword
available-identifier:
    An identifier-or-keyword that is not a keyword
identifier-or-keyword:
    identifier-start-character   identifier-part-characters(opt)
identifier-start-character:
    letter-character
    _ (the underscore character U+005F) 
identifier-part-characters:
    identifier-part-character
    identifier-part-characters   identifier-part-character
identifier-part-character:
    letter-character
    decimal-digit-character
    connecting-character
    combining-character
    formatting-character
letter-character:
    A Unicode character of classes Lu, Ll, Lt, Lm, Lo, or Nl
    A unicode-escape-sequence representing a character of classes Lu, Ll, Lt, Lm, Lo, or Nl
combining-character:
    A Unicode character of classes Mn or Mc
    A unicode-escape-sequence representing a character of classes Mn or Mc
decimal-digit-character:
    A Unicode character of the class Nd
    A unicode-escape-sequence representing a character of the class Nd
connecting-character:
    A Unicode character of the class Pc
    A unicode-escape-sequence representing a character of the class Pc
formatting-character:
    A Unicode character of the class Cf
    A unicode-escape-sequence representing a character of the class Cf

For information on the Unicode character classes mentioned above, see 
The Unicode Standard, Version 3.0, section 4.5.
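
To make the grammar above concrete, here is a rough Python sketch of the
character-class check it describes. The names (is_csharp_style_identifier,
LETTER_CLASSES, PART_CLASSES) are my own and purely illustrative. It
deliberately ignores the keyword test, the leading @ form, and
unicode-escape-sequences. It also relies on whatever Unicode version
Python's unicodedata module ships with, which is not Version 3.0, so some
characters may be classified differently.

```python
import unicodedata

# General categories allowed for the first character (besides "_").
LETTER_CLASSES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}

# Categories allowed for every later character: letters, decimal digits (Nd),
# connectors (Pc), combining marks (Mn, Mc) and formatting characters (Cf).
PART_CLASSES = LETTER_CLASSES | {"Nd", "Pc", "Mn", "Mc", "Cf"}


def is_csharp_style_identifier(name):
    """Return True if name matches the identifier-or-keyword production.

    Assumption: this only checks character classes; it does not reject
    keywords and does not handle the @ prefix or \\uXXXX escapes.
    """
    if not name:
        return False
    first = name[0]
    if first != "_" and unicodedata.category(first) not in LETTER_CLASSES:
        return False
    # "_" is itself in class Pc, so it needs no special case after the start.
    return all(unicodedata.category(ch) in PART_CLASSES for ch in name[1:])
```

For example, is_csharp_style_identifier("a\u200db") accepts a name
containing ZERO WIDTH JOINER, which is in class Cf; that is exactly the kind
of character where this grammar appears to be more permissive than the PEP.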


