[Python-3000] Support for PEP 3131

Sat Jun 2 22:39:53 CEST 2007

On 6/2/07, Josiah Carlson <jcarlson at uci.edu> wrote:
> Whether or not there exists a tool to convert from Python 2.6 to
> Python 3.0 (2to3), every tool that currently handles Python source
> code encodings via the method specified in the documentation
> (just about every Python-centric editor I know) would need to be
> changed.

How so? The old regexp can still match the encoding tag unless
the user insists on using it in an incompatible way. As syntax
changes go, this one causes little trouble for editors.
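
To illustrate, here is a minimal sketch of how a tool might look for the
encoding tag, assuming it uses the regular expression given in PEP 263
(real editors may use their own variants). The helper name
declared_encoding and the sample lines are made up for this example.

```python
import re

# Pattern from PEP 263 for the encoding declaration.
# Individual editors may use a variant of it (assumption).
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def declared_encoding(source_lines):
    """Return the encoding named on the first or second line, else None."""
    for line in source_lines[:2]:
        match = CODING_RE.search(line)
        if match:
            return match.group(1)
    return None


# Hypothetical sample inputs, not taken from any real module.
old_style = ["#!/usr/bin/env python", "# -*- coding: utf-8 -*-", "x = 1"]
no_tag = ["x = 1"]

print(declared_encoding(old_style))
print(declared_encoding(no_tag))
```

As long as the tag keeps this shape, a tool written this way goes on
finding the encoding without any change.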

> Guido doesn't always overrule everyone.

Yet he makes the decisions. That's why I used his latest comments
on the topic to set the defaults in the suggestion. These are
easily changed when necessary, and the whole issue of
defaults is quite minor. What matters more is having a convenient
way of setting the character set restrictions of a module. The
reason I quoted him at such length was that I thought that you
might have missed some of his posts because you simply ignored
what he had to say (and no, I generally don't remember people's
names).

> There are other solutions (global registry of individual module
> allowed identifiers, in-module with a different syntax, etc.).

These are more to the point. Do you have anything concrete?
A global registry sounds unwieldy, and most users would probably
enable everything instead of going through the trouble of using it.
What kind of in-module syntax would you use?

> Adding a tool to an arbitrarily large or small previously existing
> toolchain, so that the majority of users can verify that their code
> doesn't contain characters that shouldn't be allowed in the first
> place, isn't a very good solution.

I doubt the majority of users care, so the verifiers would be
a minority. You're exaggerating the amount of work caused
by Guido's solution. I made my suggestion because in my opinion
it or something like it is a more convenient solution for most cases,
but Guido's isn't as bad as you make it out to be.
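
For a sense of scale, a checker of this kind can be quite small. The
following is only a sketch, assuming a Python 3 tokenize module that
accepts non-ASCII identifiers; the function name non_ascii_identifiers
and the sample source are invented for the example, and this is not a
tool anyone in this thread has proposed or written.

```python
import io
import tokenize


def non_ascii_identifiers(source):
    """Return (line_number, name) pairs for names with non-ASCII characters."""
    found = []
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        # NAME tokens cover identifiers and keywords; string literals
        # and comments are separate token types and are ignored here.
        if tok.type == tokenize.NAME and any(ord(ch) > 127 for ch in tok.string):
            found.append((tok.start[0], tok.string))
    return found


# Hypothetical sample: one non-ASCII identifier used on two lines, plus
# a non-ASCII string literal that should not be reported.
sample = "größe = 1\nprint(größe)\nname = 'é'\n"

for line_number, name in non_ascii_identifiers(sample):
    print(line_number, name)
```

Those who care about the restriction could run something like this over
their own modules; everyone else would not need to do anything.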

> Only because it is so rarely used that no one really runs into
> unicode identifiers.

It doesn't really matter why they're not a problem in practice,
just that they aren't. A non-issue is a non-issue, no matter why.

> As such, the only sane position is to require
> the explicit enabling of unicode identifiers.

Neither default would cause big problems, so there are
at least two sane positions. One may be better than the other,
or they may be equally good; it's hard to say which.

> where else in Python has the tiny minority defined the defaults for
> the vast majority of users?

I'm sure you will find tinier minorities if you search for them, but
most users don't use extended slice notation to its full extent, yet
it's enabled by default even though it silently accepts a probable
typo. Confusing non-ascii characters are also accepted by
default in strings, even though only a tiny minority uses those
particular characters in strings (I'm sure you've seen the examples).

> yet you still don't understand that ascii is the only sane default.

It is not the default in Java, which is a major language, and I don't
hear constant complaints about it having to be changed, so there
are quite a few people who think that the above statement is not
true for programming languages in general. The claim that
static typing makes a big enough difference here is less than
convincing.