[Python-3000] PEP 3131 - the details

James Y Knight foom at fuhm.net
Thu May 17 07:50:17 CEST 2007


On May 16, 2007, at 10:30 PM, Talin wrote:
> While there has been a lot of discussion as to whether to accept PEP
> 3131 as a whole, there has been little discussion as to the specific
> details of the PEP. In particular, is it generally agreed that the
> Unicode character classes listed in the PEP are the ones we want to
> include in identifiers?

One issue I see is that the PEP defines ID_Start and ID_Continue
itself. It should not do that, but should instead reference as
authoritative the Unicode properties ID_Start and ID_Continue defined
in the Unicode property database.

ID_Start is officially: Lu+Ll+Lt+Lm+Lo+Nl+Other_ID_Start
and ID_Continue is officially: ID_Start + Mn+Mc+Nd+Pc +  
Other_ID_Continue

The only difference between PEP 3131's definition and the official
one is the Other_* terms. Those exist to guarantee that anything
currently in ID_Start/ID_Continue will remain in those properties in
all future versions. That is an important feature, and should not be
overlooked. Without the supplemental list, a future version of
Unicode that changes the general category of a character could make a
previously valid identifier become invalid. The list currently
includes the following entries:

2118          ; Other_ID_Start # So       SCRIPT CAPITAL P
212E          ; Other_ID_Start # So       ESTIMATED SYMBOL
309B..309C    ; Other_ID_Start # Sk   [2] KATAKANA-HIRAGANA VOICED SOUND MARK..KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
1369..1371    ; Other_ID_Continue # No   [9] ETHIOPIC DIGIT ONE..ETHIOPIC DIGIT NINE
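
As a rough illustration of why the supplemental list matters, the
sketch below checks each listed code point against the general
categories in the two formulas above. It assumes the category sets
exactly as quoted in this message and hard-codes only the entries
shown here; the names START_CATEGORIES, CONTINUE_CATEGORIES, LISTED,
covered_by_categories and report are made up for the example, and the
category reported for a character depends on the Unicode version
bundled with the interpreter.

```python
import unicodedata

# General categories from the formulas quoted above (without the Other_* terms).
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
CONTINUE_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}

# Only the entries listed in this message; newer PropList.txt versions may differ.
LISTED = {
    "Other_ID_Start": [0x2118, 0x212E, 0x309B, 0x309C],
    "Other_ID_Continue": list(range(0x1369, 0x1371 + 1)),
}


def covered_by_categories(code_point, prop):
    """Return True if the general category alone would admit the code point."""
    allowed = START_CATEGORIES if prop == "Other_ID_Start" else CONTINUE_CATEGORIES
    return unicodedata.category(chr(code_point)) in allowed


def report():
    """Return (code point, property, category, covered) for every listed entry."""
    rows = []
    for prop, code_points in LISTED.items():
        for cp in code_points:
            category = unicodedata.category(chr(cp))
            rows.append((f"U+{cp:04X}", prop, category, covered_by_categories(cp, prop)))
    return rows


if __name__ == "__main__":
    for row in report():
        print(*row)
```

If the last column is False for an entry, a definition built from
general categories alone would reject that character, which is the
gap the Other_* terms are meant to close.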

This list is available as part of the PropList.txt file in the
Unicode data, which ought to be included automatically in Python's
Unicode database so as to pick up future changes.
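
A minimal sketch of reading entries in that format, assuming lines
shaped like the ones quoted above (a code point or range, a semicolon,
a property name, then a "#" comment). The function name parse_proplist
and the SAMPLE data are hypothetical; this is not how Python's Unicode
database is actually generated.

```python
def parse_proplist(lines):
    """Map each property name to the set of code points assigned to it."""
    props = {}
    for raw in lines:
        # Drop the trailing "# ..." comment and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line or ";" not in line:
            continue  # blank line, pure comment, or not a data line
        code_range, prop = (part.strip() for part in line.split(";", 1))
        # "309B..309C" is an inclusive range; "2118" is a single code point.
        first, _, last = code_range.partition("..")
        low = int(first, 16)
        high = int(last, 16) if last else low
        props.setdefault(prop, set()).update(range(low, high + 1))
    return props


# Sample data: the entries quoted in this message.
SAMPLE = [
    "2118          ; Other_ID_Start # So       SCRIPT CAPITAL P",
    "212E          ; Other_ID_Start # So       ESTIMATED SYMBOL",
    "309B..309C    ; Other_ID_Start # Sk   [2] KATAKANA-HIRAGANA VOICED SOUND MARK..",
    "1369..1371    ; Other_ID_Continue # No   [9] ETHIOPIC DIGIT ONE..ETHIOPIC DIGIT NINE",
]

if __name__ == "__main__":
    for name, code_points in parse_proplist(SAMPLE).items():
        print(name, sorted(f"{cp:04X}" for cp in code_points))
```

Reading the sets from the data file rather than hard-coding them means
a new Unicode release only requires dropping in the updated file.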

> My preference is to be conservative in terms of what's allowed.

I do not believe it is a good idea for Python to define its own
identifier rules. The rules defined in UAX #31 make sense and should
be used directly, with only the minor amendment of _ as an allowable
start character.
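
To make the proposal concrete, here is a hedged sketch of an
identifier check built from the formulas quoted earlier plus the "_"
amendment. It assumes the category sets and the Other_* entries
exactly as given in this message, and it deliberately ignores
normalization and reserved words; is_id_start, is_id_continue and
is_identifier are illustrative names, not an existing API.

```python
import unicodedata

# General categories from the formulas quoted earlier in this message.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
CONTINUE_CATEGORIES = {"Mn", "Mc", "Nd", "Pc"}

# Supplemental entries as listed in this message; a real implementation
# would read them from PropList.txt instead of hard-coding them.
OTHER_ID_START = {0x2118, 0x212E, 0x309B, 0x309C}
OTHER_ID_CONTINUE = set(range(0x1369, 0x1371 + 1))


def is_id_start(ch):
    """ID_Start as quoted: Lu+Ll+Lt+Lm+Lo+Nl+Other_ID_Start."""
    return unicodedata.category(ch) in START_CATEGORIES or ord(ch) in OTHER_ID_START


def is_id_continue(ch):
    """ID_Continue as quoted: ID_Start + Mn+Mc+Nd+Pc + Other_ID_Continue."""
    return (
        is_id_start(ch)
        or unicodedata.category(ch) in CONTINUE_CATEGORIES
        or ord(ch) in OTHER_ID_CONTINUE
    )


def is_identifier(name):
    """Apply the rules above, additionally allowing "_" as a start character."""
    if not name:
        return False
    first_ok = name[0] == "_" or is_id_start(name[0])
    return first_ok and all(is_id_continue(ch) for ch in name[1:])


if __name__ == "__main__":
    for candidate in ["_private", "na\u00efve", "1abc", "a-b"]:
        print(repr(candidate), is_identifier(candidate))
```

Note that "_" needs no special case after the first character, since
its general category is Pc and it is therefore already in ID_Continue.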

James


