[Python-ideas] Allow using symbols from Unicode block "Superscripts and Subscripts" in identifiers

Sun May 4 09:10:56 CEST 2014

On Sunday 04 May 2014 12:40:44 Steven D'Aprano wrote:
> On Sun, May 04, 2014 at 03:34:32AM +0900, Stephen J. Turnbull wrote:
> 
> > Note that Unicode itself considers them *compatibility* characters and
> > says:
> > 
> >     Superscripts and subscripts have been included in the Unicode
> >     Standard only to provide compatibility with existing character
> >     sets.  In general, the Unicode character encoding does not attempt
> >     to describe the positioning of a character above or below the
> >     baseline in typographical layout.
> > 
> > In other words, Unicode is reluctant to guarantee that x2, x², and x₂
> > are actually different identifiers!
> [...]
> 
> I don't think this is a valid interpretation of what the Unicode 
> standard is trying to say, but the point is moot. I think you've just 
> identified (pun intended) a major objection to the proposal, one serious 
> enough to change my mind from limited support to opposition.
> 
> Python identifiers are treated by their NFKC normalised form:
> 
>     All identifiers are converted into the normal form NFKC while 
>     parsing; comparison of identifiers is based on NFKC.
> 
> https://docs.python.org/3/reference/lexical_analysis.html
> 
> And superscripts and subscripts normalise to standard characters:
> 
> py> [unicodedata.normalize('NFKC', s) for s in 'x² x₂ x2'.split()]
> ['x2', 'x2', 'x2']
> 
> So that categorically rules out allowing superscripts and subscripts as 
> *distinct* characters in identifiers. So even if they were allowed, it 
> would mean that x² and x₂ would be treated as the same identifier as x2.
> 
> For my use-case, I would want x² and x₂ to be treated as distinct 
> identifiers, not just as a funny way of writing x2. So from my 
> perspective, *at best* there is now insufficient benefit to bother 
> allowing them.
> 
> It's actually stronger than that: allowing superscripts and subscripts 
> would be an attractive nuisance for my use-case. If they were allowed, I 
> would be tempted to write x² and x₂, which could end up being a subtle 
> source of bugs if I accidentally used them both in the same namespace, 
> thinking that they were distinct when they actually aren't. So I am now 
> -1 on allowing superscripts and subscripts.
> 
> 
> 
That's the strongest point against allowing superscripts and subscripts in the whole discussion, IMHO. I would want x² and x₂ to be treated as distinct identifiers too.
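The same folding can already be observed with compatibility characters that Python does accept in identifiers today. Below is a minimal sketch, assuming CPython 3 and using the double-struck "ℍ" (U+210D) purely as an illustrative stand-in for a superscript or subscript; the names `namespace` and `user_names` are made up for the example:

```python
# Code passed to exec() goes through the normal compiler, so its
# identifiers are NFKC-normalized exactly like those in a source file.
namespace = {}
exec("\u210d = 1", namespace)  # the executed source text is "ℍ = 1"
exec("H = 2", namespace)       # plain ASCII H

# exec() also adds "__builtins__", so filter out dunder names.
user_names = sorted(k for k in namespace if not k.startswith("__"))

print(user_names)        # a single name: both spellings refer to H
print(namespace["H"])    # the second assignment overwrote the first
```

Both assignments land on the same name, which is the silent collision x² and x₂ would suffer if they were allowed and normalized the same way.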

I've tried this use case in Julia and it works:
julia> x₂ = 1
1

julia> x² = 2
2

julia> x₂
1

julia> x²
2

But then I found a thread in Julia's bug tracker covering Unicode identifier normalization[1]. As I understand it, they don't use NFKC. As a consequence, the symbols "µ" (U+00B5, micro sign) and "μ" (U+03BC, Greek small letter mu) are treated as different identifiers. They recognize that this is weird and that something needs to be done about it. Some of them don't want to use NFKC for the same reason discussed here (plus, for example, "H" and "ℍ" would become equal identifiers). Others decided to give a warning when a new identifier is equal to an already defined one in terms of NFKC normalization.
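To make the warning idea concrete, here is a rough sketch in Python of how such a check could look. It is not taken from the Julia thread; the helper name `nfkc_collisions` and the sample names are hypothetical, and only `unicodedata.normalize` from the standard library is assumed:

```python
import unicodedata
from collections import defaultdict


def nfkc_collisions(names):
    """Group spellings that differ as written but are equal after NFKC.

    Hypothetical helper for illustration only.
    """
    groups = defaultdict(set)
    for name in names:
        groups[unicodedata.normalize("NFKC", name)].add(name)
    # Keep only normal forms reached by more than one distinct spelling.
    return {
        norm: sorted(spellings)
        for norm, spellings in groups.items()
        if len(spellings) > 1
    }


# Escapes are used so the code points are unambiguous:
# U+00B5 micro sign, U+03BC Greek mu, U+210D double-struck H,
# U+00B2 superscript two, U+2082 subscript two.
names = ["\u00b5", "\u03bc", "H", "\u210d", "x2", "x\u00b2", "x\u2082", "y"]

for norm, spellings in nfkc_collisions(names).items():
    print(f"warning: {spellings} all normalize to {norm!r}")
```

With these sample names the check reports three groups: the two mu characters, "H" with "ℍ", and the three spellings of x2. The name "y" has no colliding spelling and is not reported.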

Now I understand that things are more complicated than I thought when I made the proposal. I think there is no "good way" to add support for subscripts and superscripts, so it's better to leave the situation as is.

-- 
Regards, Roman Inflianskas

--------
[1] covering unicode identifiers normalization