PEP 3131: Supporting Non-ASCII Identifiers

Mon May 14 18:14:17 EDT 2007

> Not providing an explicit listing of allowed characters is inexcusable
> sloppiness.

That is a deliberate part of the specification. The PEP intentionally
does *not* give a precise list of characters, but instead defers to
the version of the Unicode standard in use (as shipped in the
unicodedata module).
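
As a rough illustration of what "deferring to the Unicode database"
means in practice, here is a minimal sketch. It assumes a Python 3
interpreter (where str.isidentifier and unicodedata.unidata_version
are available); the helper name describe is made up for this example
and is not part of the PEP.

```python
import unicodedata

# Version of the Unicode Character Database bundled with this interpreter.
# The set of valid identifier characters follows from this version,
# not from a list written into the language specification.
print(unicodedata.unidata_version)


def describe(ch):
    """Return (general category, whether ch alone is a valid identifier).

    Hypothetical helper for illustration only.
    """
    return unicodedata.category(ch), ch.isidentifier()


for ch in ["a", "λ", "9", "-"]:
    print(repr(ch), describe(ch))
```

A character assigned in a later Unicode version becomes usable once
the interpreter ships the corresponding database, with no change to
the specification text.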

> The XML standard is an example of how listings of large parts of the
> Unicode character set can be provided clearly, exactly and (almost)
> concisely.

And, indeed, this is now recognized as one of the bigger mistakes
of the XML recommendation: it provides an explicit list, and so fails
to account for characters that were unassigned at the time. XML 1.1
tries to address this issue by allowing unassigned characters in
XML names, even though it is not yet certain what those characters
will mean (until they are assigned).

>> ``ID_Continue`` is defined as all characters in ``ID_Start``, plus
>> nonspacing marks (Mn), spacing combining marks (Mc), decimal number
>> (Nd), and connector punctuations (Pc).
> 
> Am I the first to notice how unsuitable these characters are?

Probably. Nobody in the Unicode Consortium noticed, but what
do they know about the suitability of Unicode characters...
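
For anyone who wants to look at the categories quoted above, here is
a small sketch using unicodedata. It assumes Python 3; the names
EXTRA_CONTINUE and is_extra_continue are invented for this example,
and the check covers only the "plus" part of the quoted definition
(Mn, Mc, Nd, Pc), not the full ID_Continue property.

```python
import unicodedata

# General categories that the quoted text adds on top of ID_Start.
EXTRA_CONTINUE = {"Mn", "Mc", "Nd", "Pc"}


def is_extra_continue(ch):
    """True if ch falls in one of the four added categories.

    Hypothetical helper; a simplification of ID_Continue.
    """
    return unicodedata.category(ch) in EXTRA_CONTINUE


samples = {
    "combining acute accent": "\u0301",   # Mn, nonspacing mark
    "devanagari sign visarga": "\u0903",  # Mc, spacing combining mark
    "digit nine": "9",                    # Nd, decimal number
    "low line": "_",                      # Pc, connector punctuation
}

for name, ch in samples.items():
    print(name, unicodedata.category(ch), is_extra_continue(ch))

# A combining mark may continue an identifier but not start one.
print(("a" + "\u0301").isidentifier(), "\u0301".isidentifier())
```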

Regards,
Martin