[Python-3000] PEP: Supporting Non-ASCII Identifiers

Jim Jewett jimjjewett at gmail.com
Tue Jun 5 17:37:48 CEST 2007


On 6/5/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > Always normalizing would have the advantage of simplicity (no
> > matter what the encoding, the result is the same), and I think
> > that is the real path of least surprise if you sum over all
> > surprises.

> I'd like to repeat that this is out of scope of this PEP, though.
> This PEP doesn't, and shouldn't, specify how string literals get
> from source to execution.

I see that as a gray area.

Unicode does say pretty clearly that (at least) canonical equivalents
must be treated the same.

In theory, this could be done only to identifiers, but then getattr
needs to do the same normalization inline at runtime, or dynamic
attribute lookups will miss names that the parser normalized.
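
To make that concrete, here is a hedged sketch (the class and
attribute names are made up; this assumes today's behavior, where
string arguments are matched by raw code points):

    class Obj:
        pass

    obj = Obj()
    # "café" spelled in NFC: the accented e is one code point, U+00E9.
    setattr(obj, "caf\u00e9", 1)

    # The same name spelled in NFD: "e" plus combining acute (U+0301).
    # It renders identically, but the raw strings differ, so the
    # lookup fails unless getattr normalizes its argument first.
    try:
        getattr(obj, "cafe\u0301")
    except AttributeError:
        print("NFD spelling missed the NFC attribute")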

Since we don't want the results of (str1 == str2) to change based on
context, I think string equality also needs to compare the
canonicalized (though probably not compatibility-normalized) forms.
Because equal objects must hash equal, this in turn means that
hashing a unicode string should first canonicalize it.  (I believe
that is a change from 2.x.)
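
For concreteness, a small sketch of the status quo and of what
canonical comparison would do (canon_eq is just a hypothetical
helper, not a proposed API):

    import unicodedata

    nfc = "\u00e9"     # e-acute as a single code point
    nfd = "e\u0301"    # e plus combining acute; canonically equivalent

    print(nfc == nfd)              # False today: raw code-point comparison
    print(hash(nfc) == hash(nfd))  # almost certainly False today as well

    # Canonical comparison treats both spellings as the same string:
    def canon_eq(a, b):
        return (unicodedata.normalize("NFC", a)
                == unicodedata.normalize("NFC", b))

    print(canon_eq(nfc, nfd))      # True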

This means that all literal unicode characters are subject to
normalization unless they appear in a comment.

At that point, it might be simpler to just canonicalize the whole
source file up front.
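
Something like the following, where load_source and its parameters
are only illustrative:

    import unicodedata

    def load_source(path, encoding="utf-8"):
        with open(path, "rb") as f:
            text = f.read().decode(encoding)
        # Normalize the entire file, comments and all, before it
        # ever reaches the tokenizer.
        return unicodedata.normalize("NFC", text)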

-jJ

