[Python-Dev] Allowing non-ASCII identifiers

François Pinard pinard at iro.umontreal.ca
Mon Feb 9 17:43:59 EST 2004


[Martin von Löwis]
> François Pinard wrote:
> >>1. At run-time, identifiers are represented as Unicode objects unless
> >>they are pure ASCII.  IOW, they are converted from the source encoding
> >>to Unicode objects in the process of parsing.

> >This is already the case, isn't it?

> Currently, all identifiers are byte strings, at run-time, representing
> ASCII characters. IOW, you currently won't observe Unicode strings
> as identifiers.

Oops, sorry.  I misread your sentence as limiting itself to identifiers.
I thought I had read that the effect of `coding:' was to convert the
whole source to Unicode before the scanner pass.  This is all from fuzzy
memory.

> >I do not much know the internals, yet I suspect one more thing to
> >consider is whether Unicode strings looking like non-ASCII identifiers
> >should be interned or not, the same as currently done for ASCII.

> Indeed; I had not thought about this.

This is only an optimisation consideration, which might be premature.
On the other hand, in the long run, speed considerations should not
weigh seriously against someone who wants to write national identifiers.
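
As a rough illustration of what interning means for such a name, here is
a minimal sketch.  It assumes a present-day Python 3, where `sys.intern'
is available; it is not the behaviour of the interpreter discussed in
this thread, and the variable names are made up for the example.

```python
import sys

# Build two equal strings at run time; "\u00e9l\u00e8ve" spells "élève".
# Built this way they are typically separate objects with equal contents.
first = "".join(["\u00e9l", "\u00e8ve"])
second = "".join(["\u00e9l\u00e8", "ve"])

# Interning maps equal strings to one canonical object, so later
# comparisons (for example dictionary lookups of names) can succeed on
# identity alone instead of comparing character by character.
canon_first = sys.intern(first)
canon_second = sys.intern(second)
same_object = canon_first is canon_second
```

Whether this pays off for non-ASCII identifiers is exactly the
optimisation question above; the sketch only shows the mechanism.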

> ># -*- coding: Latin-1 -*-
> >élève = 3
> >print élève
> [...]
> >So, the Python compiler is sensitive to the active locale.

> Yes, that's a bug. It will use byte strings as identifiers (without
> running your example, I'd expect that dir() shows they are UTF-8)

They seem to be Latin-1.  Consider that the characters could not be
classified correctly as letters in a Latin-1 environment if they were
UTF-8.
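
A minimal sketch of that reasoning, written for a present-day Python 3
rather than the interpreter at hand (the helper `all_letters' is
hypothetical, and `str.isalpha' stands in for the locale's character
classification):

```python
# "\u00e9l\u00e8ve" spells "élève".
name = "\u00e9l\u00e8ve"

latin1_bytes = name.encode("latin-1")  # one byte per character
utf8_bytes = name.encode("utf-8")      # two bytes per accented letter

# Read both byte sequences through a Latin-1 character table, as a
# Latin-1 environment would.
as_latin1 = latin1_bytes.decode("latin-1")
utf8_seen_as_latin1 = utf8_bytes.decode("latin-1")

def all_letters(text):
    """Return True if every character is classified as a letter."""
    return all(ch.isalpha() for ch in text)

latin1_ok = all_letters(as_latin1)           # every byte is a letter
utf8_ok = all_letters(utf8_seen_as_latin1)   # stray non-letter bytes appear
```

Seen as Latin-1, the UTF-8 bytes contain characters such as the
copyright sign, which are not letters, so the name would not pass as an
identifier.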

> >This is kind of a happy bug!  May we count on it being supported in the
> >interim? :-) :-)

> I would think so: this bug has been present for quite some time,
> and nobody complained :-)

Would Guido be willing to pronounce on that? :-) As much as we long to
write our things in better French, we would not allow ourselves to write
code that will likely break later.  We work in a production environment.

A Python command-line option to set the locale from environment
variables, before compilation of `__main__' starts, would help, but that
might be too much effort in the wrong direction.  Best and simplest would
be that the `coding:' pragma really drives the character set used in a
file.
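
For illustration only, here is a sketch of the pragma driving the
character set independently of the locale.  It assumes a present-day
Python 3, where non-ASCII identifiers are accepted; it does not describe
the interpreter discussed in this message, and the file name "<demo>" is
a placeholder.

```python
# Source bytes as they would sit in a Latin-1 file: \xe9 is "é" and
# \xe8 is "è", so the second line reads "élève = 3".
source = b"# -*- coding: latin-1 -*-\n\xe9l\xe8ve = 3\n"

# compile() reads the coding line and decodes the bytes accordingly,
# without consulting the active locale.
code = compile(source, "<demo>", "exec")

namespace = {}
exec(code, namespace)

# The identifier is stored under its Unicode spelling.
value = namespace["\u00e9l\u00e8ve"]
```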

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard


