PEP 3131: Supporting Non-ASCII Identifiers
Duncan Booth
duncan.booth at invalid.invalid
Mon May 14 09:44:04 EDT 2007
Stefan Behnel <stefan.behnel-n05pAM at web.de> wrote:
>> Just to confirm that: IronPython does accept non-ascii identifiers.
>> From "Differences between IronPython and CPython":
>>
>>> IronPython will compile files whose identifiers use non-ASCII
>>> characters if the file has an encoding comment such as "# -*-
>>> coding: utf-8 -*-". CPython will not compile such a file in any
>>> case.
>
> Sounds like CPython would better follow IronPython here.

I cannot find any documentation which says exactly which non-ASCII
characters IronPython will accept.

I would guess that it probably follows C# in general, but it doesn't
follow C# identifier syntax exactly (in particular, the leading @ to
quote keywords is not supported).

The C# identifier syntax below is from
http://msdn2.microsoft.com/en-us/library/aa664670(VS.71).aspx

I think it differs from the PEP only in also allowing the Cf class of
characters:

identifier:
    available-identifier
    @ identifier-or-keyword

available-identifier:
    An identifier-or-keyword that is not a keyword

identifier-or-keyword:
    identifier-start-character identifier-part-characters(opt)

identifier-start-character:
    letter-character
    _ (the underscore character U+005F)

identifier-part-characters:
    identifier-part-character
    identifier-part-characters identifier-part-character

identifier-part-character:
    letter-character
    decimal-digit-character
    connecting-character
    combining-character
    formatting-character

letter-character:
    A Unicode character of classes Lu, Ll, Lt, Lm, Lo, or Nl
    A unicode-escape-sequence representing a character of classes Lu, Ll, Lt, Lm, Lo, or Nl

combining-character:
    A Unicode character of classes Mn or Mc
    A unicode-escape-sequence representing a character of classes Mn or Mc

decimal-digit-character:
    A Unicode character of the class Nd
    A unicode-escape-sequence representing a character of the class Nd

connecting-character:
    A Unicode character of the class Pc
    A unicode-escape-sequence representing a character of the class Pc

formatting-character:
    A Unicode character of the class Cf
    A unicode-escape-sequence representing a character of the class Cf

For information on the Unicode character classes mentioned above, see
The Unicode Standard, Version 3.0, section 4.5.
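
To make the grammar above a little more concrete, here is a rough sketch in Python of the identifier-or-keyword production, using the general categories that the unicodedata module reports. The function names are my own invention, not anything from C# or IronPython. It is only an approximation under some assumptions: it ignores unicode-escape-sequences, the leading @ form and the keyword check, and it uses whatever Unicode database your Python ships with rather than Unicode 3.0.

```python
import unicodedata

# Unicode general categories named in the quoted C# grammar.
LETTER = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}   # letter-character
COMBINING = {"Mn", "Mc"}                         # combining-character
DECIMAL_DIGIT = {"Nd"}                           # decimal-digit-character
CONNECTING = {"Pc"}                              # connecting-character
FORMATTING = {"Cf"}                              # formatting-character

# identifier-part-character is the union of all five classes.
PART = LETTER | COMBINING | DECIMAL_DIGIT | CONNECTING | FORMATTING


def is_identifier_start(ch):
    """identifier-start-character: a letter-character or the underscore."""
    return ch == "_" or unicodedata.category(ch) in LETTER


def is_identifier_part(ch):
    """identifier-part-character (the underscore is already covered by Pc)."""
    return unicodedata.category(ch) in PART


def is_identifier_or_keyword(text):
    """Hypothetical helper: does text match identifier-or-keyword?

    Does not handle unicode-escape-sequences, the @ prefix, or the rule
    that an available-identifier must not be a keyword.
    """
    if not text:
        return False
    return is_identifier_start(text[0]) and all(
        is_identifier_part(ch) for ch in text[1:]
    )
```

With this sketch, a name containing U+00AD SOFT HYPHEN (class Cf) after the first character is accepted, which is the kind of case where the formatting-character rule matters.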