[Python-Dev] Is this safe enough? Re: [Python-checkins] cpython: _Py_Identifier are always ASCII strings

"Martin v. Löwis" martin at v.loewis.de
Tue Feb 7 09:23:34 CET 2012


> _Py_IDENTIFIER(xxx) defines a variable called PyId_xxx, so xxx can
> only be ASCII: the C language doesn't accept non-ASCII identifiers.

That's not exactly true. In C89, source code is in the "source character
set", which is implementation-defined, except that it must contain
the "basic character set". I believe that it allows for
implementation-defined characters in identifiers. In C99, this is
extended to include "universal character names" (\u escapes). They may
appear in identifiers as long as the characters named are listed in
annex D.59 (which I cannot locate).
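
To illustrate the C99 point, here is a minimal sketch of universal
character names used inside identifiers. The names caf\u00e9_count and
na\u00efve are hypothetical, chosen only for illustration. This assumes a
compiler that accepts \u escapes in identifiers; some older compilers
needed an extra option for this, or did not support it at all.

```c
/* Sketch: C99 universal character names (UCNs) inside identifiers.
   \u00e9 is LATIN SMALL LETTER E WITH ACUTE,
   \u00ef is LATIN SMALL LETTER I WITH DIAERESIS.
   Both names below are hypothetical examples. */
static int caf\u00e9_count(void)
{
    int na\u00efve = 41;      /* a local variable spelled with a UCN */
    return na\u00efve + 1;    /* same identifier, same spelling */
}
```

So an identifier built this way is not restricted to ASCII characters,
even though the source file itself contains only ASCII bytes.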

In C 2011, annexes D.1 and D.2 specify the characters that you can use
in an identifier:

D.1 Ranges of characters allowed
1. 00A8, 00AA, 00AD, 00AF, 00B2−00B5, 00B7−00BA, 00BC−00BE, 00C0−00D6,
00D8−00F6, 00F8−00FF
2. 0100−167F, 1681−180D, 180F−1FFF
3. 200B−200D, 202A−202E, 203F−2040, 2054, 2060−206F
4. 2070−218F, 2460−24FF, 2776−2793, 2C00−2DFF, 2E80−2FFF
5. 3004−3007, 3021−302F, 3031−303F
6. 3040−D7FF
7. F900−FD3D, FD40−FDCF, FDF0−FE44, FE47−FFFD
8. 10000−1FFFD, 20000−2FFFD, 30000−3FFFD, 40000−4FFFD, 50000−5FFFD,
60000−6FFFD, 70000−7FFFD, 80000−8FFFD, 90000−9FFFD, A0000−AFFFD,
B0000−BFFFD, C0000−CFFFD, D0000−DFFFD, E0000−EFFFD

D.2 Ranges of characters disallowed initially
1. 0300−036F, 1DC0−1DFF, 20D0−20FF, FE20−FE2F
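
The two lists above can be turned into a small lookup, sketched here
under some assumptions. The function names c11_ucn_allowed and
c11_ucn_allowed_initially are hypothetical, not part of any library, and
the tables cover only the extended characters listed in D.1 and D.2. The
basic letters, digits and underscore are handled by the ordinary
identifier grammar and are not included.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive range of code points. */
struct cp_range { uint32_t lo, hi; };

/* D.1 items 1-7, copied from the list above (BMP ranges only). */
static const struct cp_range d1_allowed[] = {
    {0x00A8, 0x00A8}, {0x00AA, 0x00AA}, {0x00AD, 0x00AD}, {0x00AF, 0x00AF},
    {0x00B2, 0x00B5}, {0x00B7, 0x00BA}, {0x00BC, 0x00BE}, {0x00C0, 0x00D6},
    {0x00D8, 0x00F6}, {0x00F8, 0x00FF},
    {0x0100, 0x167F}, {0x1681, 0x180D}, {0x180F, 0x1FFF},
    {0x200B, 0x200D}, {0x202A, 0x202E}, {0x203F, 0x2040}, {0x2054, 0x2054},
    {0x2060, 0x206F},
    {0x2070, 0x218F}, {0x2460, 0x24FF}, {0x2776, 0x2793}, {0x2C00, 0x2DFF},
    {0x2E80, 0x2FFF},
    {0x3004, 0x3007}, {0x3021, 0x302F}, {0x3031, 0x303F},
    {0x3040, 0xD7FF},
    {0xF900, 0xFD3D}, {0xFD40, 0xFDCF}, {0xFDF0, 0xFE44}, {0xFE47, 0xFFFD},
};

/* D.2: allowed in an identifier, but not as its first character. */
static const struct cp_range d2_not_initial[] = {
    {0x0300, 0x036F}, {0x1DC0, 0x1DFF}, {0x20D0, 0x20FF}, {0xFE20, 0xFE2F},
};

static bool in_ranges(uint32_t cp, const struct cp_range *r, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (cp >= r[i].lo && cp <= r[i].hi)
            return true;
    return false;
}

/* True if cp is one of the extended characters listed in D.1. */
static bool c11_ucn_allowed(uint32_t cp)
{
    /* D.1 item 8: planes 1 through 14, excluding the last two code
       points (xFFFE and xFFFF) of each plane. */
    if (cp >= 0x10000 && cp <= 0xEFFFD)
        return (cp & 0xFFFF) <= 0xFFFD;
    return in_ranges(cp, d1_allowed,
                     sizeof d1_allowed / sizeof d1_allowed[0]);
}

/* True if cp is listed in D.1 and not excluded by D.2. */
static bool c11_ucn_allowed_initially(uint32_t cp)
{
    return c11_ucn_allowed(cp) &&
           !in_ranges(cp, d2_not_initial,
                      sizeof d2_not_initial / sizeof d2_not_initial[0]);
}
```

For example, U+00E9 is accepted by both functions, while U+0301 (a
combining accent) is accepted by c11_ucn_allowed but rejected by
c11_ucn_allowed_initially, and U+00D7 is rejected by both because it
falls in the gap between 00C0-00D6 and 00D8-00F6.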

Regards,
Martin

