[Python-3000] Support for PEP 3131

Sat Jun 2 09:14:58 CEST 2007

"Rauli Ruohonen" <rauli.ruohonen at gmail.com> wrote:
> On 5/27/07, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> >James Y Knight writes:
> >> a 'pyidchar.txt' file with a list of character ranges, and now that
> >> pyidchar.txt file is going to have separate sections based on module
> >> name? Sorry, but are you !@# kidding me?!?
> >
> >The scalability issue was raised by Guido, not the ASCII advocates.
> 
> He did not say that such files or command-line options would be
> scalable either. They are fine tools for auditing, but not for using
> finished products. One should provide both auditing tools and ease
> of use of already audited code.
> 
> One possibility for providing both:
> 
> (1) Add a mandatory ASCII-only special comment at the beginning of
>     each module. The comment would continue until the first empty
>     line and would contain only valid directives matching some
>     regular expression. Only whitespace is allowed before the
>     comment. Anything else is a syntax error.

"""
If a comment in the first or second line of the Python script matches
the regular expression coding[=:]\s*([-\w.]+), this comment is processed
as an encoding declaration; the first group of this expression names the
encoding of the source code file.
"""
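For concreteness, here is a simplified sketch of the rule quoted above: only a comment on the first or second line is checked against the documented regular expression. The function name `find_encoding_declaration` is hypothetical, and the real tokenizer applies a few extra conditions that this sketch omits.

```python
import re

# The regular expression quoted above from the language reference (PEP 263).
CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")


def find_encoding_declaration(source):
    """Return the declared encoding name, or None if there is none.

    Hypothetical helper: only comment lines among the first two lines are
    examined, following the quoted rule in simplified form.
    """
    for line in source.splitlines()[:2]:
        stripped = line.lstrip()
        if not stripped.startswith("#"):
            continue  # the declaration must live in a comment
        match = CODING_RE.search(stripped)
        if match:
            return match.group(1)
    return None


src = "#!/usr/bin/env python\n# -*- coding: latin-1 -*-\nprint('hi')\n"
print(find_encoding_declaration(src))
```

A mandatory comment block that runs "until the first empty line" would have to coexist with this two-line rule, which is why it amounts to changing the semantics of existing declarations.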

Your suggestion would unnecessarily change the semantics of the encoding
declarations.  I would call this gratuitous breakage.

> (2) Allow directives in the special comment to change encoding and
>     tab/space rules. Also allow them to restrict the identifier
>     character set and the string character set.

Sounds like the application of vim settings as a solution to a whole
bunch of completely unrelated "problems" in Python (especially with 4
space indents being the "one true way to indent" and the encoding
declaration already being established).  Please keep your vim out of my
Python ;) .

> (3) Defaults: utf-8 encoding, no mixed tabs and spaces, identifier
>     and string content is not restricted.

Everything except the identifier content is already going to be the default
in Python 3.0.  I've never heard a particularly good reason to allow
mixing tabs and spaces, and the current encoding declaration works
just fine (except for the whole Unicode character thing).
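As a quick illustration of the tab point: Python 3 rejects indentation whose meaning depends on the tab width, raising `TabError` at compile time. The helper name `indentation_is_rejected` is hypothetical; `compile` and `TabError` are standard built-ins.

```python
def indentation_is_rejected(source):
    """Return True if compiling source fails because of ambiguous tabs/spaces."""
    try:
        compile(source, "<example>", "exec")
    except TabError:
        return True
    return False


# One line indented with a tab, the next with eight spaces: equal only if a
# tab is assumed to be eight columns wide, so the compiler refuses it.
ambiguous = "if True:\n\tx = 1\n        y = 2\n"
consistent = "if True:\n    x = 1\n    y = 2\n"
print(indentation_is_rejected(ambiguous), indentation_is_rejected(consistent))
```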

And as stated by basically everyone, the only *sane* default is ASCII
identifiers.  Since the vast majority of users will have no use for
Unicode identifiers in the short or long term, making them the default
is overzealous at best.
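For auditing code under an ASCII-identifier policy, a minimal sketch (assuming Python 3.7+ for `str.isascii`) can use the standard `tokenize` module to list every name token containing non-ASCII characters. The function `non_ascii_identifiers` is hypothetical; note that string literals are separate tokens and are not flagged.

```python
import io
import tokenize


def non_ascii_identifiers(source):
    """Return the NAME tokens in source that contain non-ASCII characters."""
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Keywords and identifiers both arrive as NAME tokens; strings do not.
        if tok.type == tokenize.NAME and not tok.string.isascii():
            found.append(tok.string)
    return found


sample = "caf\u00e9 = 1\nprint(caf\u00e9)\nx = 'na\u00efve'\n"
print(non_ascii_identifiers(sample))
```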

> (4) Have a command line parameter for restricting the character sets
>     of all modules. Every module must satisfy both this and its own
>     directives simultaneously. A default value for this could be set
>     in site.py, but it must be immutable after first assignment.
> Example 3 (inclusion from a file, similar to import):
> 
> # identifier_charset: fooproject.codingstyle.identifier_charset

I really don't like the idea of adding a *different* import-like thing.
We already have imports, which are evaluated at run time rather than
compile time, and because of those semantics they can't be used for a
mechanism like the above.

Obviously I'm -1 overall.  I don't see this as a good solution to the
character set problem.  And I think it's a step back regarding encodings,
indentation, etc.

 - Josiah