[Python-3000] Support for PEP 3131

Sun Jun 3 15:42:23 CEST 2007

Rauli Ruohonen writes:

 > He did not say that such files or command-line options would be
 > scalable either. They are fine tools for auditing, but not for using
 > finished products. One should provide both auditing tools and ease
 > of use of already audited code.

Ease of use of audited code is trivial; turn the checks off.

The question is how to do that.

 > (1) Add a mandatory ASCII-only special comment at the beginning of
 >     each module. The comment would continue until the first empty
 >     line and would contain only valid directives matching some
 >     regular expression. Only whitespace is allowed before the
 >     comment. Anything else is a syntax error.

-1

You still need command-line options or local configuration files to
decide *what* to audit.  We *don't* trust the file!  Even if an audit
confirms that the file uses only the character sets it claims, it may
still use constructs we want to prohibit.  Merely defining those is
non-trivial, and it is absolutely out of the question to expect that
the average Python user will know what the character set
"strictly-conforms-to-UTR39-restrictions-allows-confusables" is.  So
those character sets are basically meaningless for ease of use; ease
of use is "globally restrict to what my students can read = ASCII +
Japanese".
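
To make "globally restrict to ASCII + Japanese" concrete, here is a
rough sketch of such a check as an external utility.  Everything in it
is an assumption for illustration: the names ALLOWED_NAME_PREFIXES,
char_allowed and audit_identifiers are hypothetical, and matching on
Unicode character-name prefixes is a crude stand-in for a real script
property lookup (it is not what UTR39 specifies).

```python
import io
import tokenize
import unicodedata

# Hypothetical local policy: ASCII plus the scripts used to write Japanese.
# Matching on character-name prefixes is an approximation, not UTR39.
ALLOWED_NAME_PREFIXES = ("HIRAGANA", "KATAKANA", "CJK UNIFIED IDEOGRAPH")


def char_allowed(ch):
    """Return True if ch is ASCII or belongs to an allowed script."""
    if ord(ch) < 128:
        return True
    return unicodedata.name(ch, "").startswith(ALLOWED_NAME_PREFIXES)


def audit_identifiers(source):
    """Return (line, identifier) pairs that fall outside the policy."""
    violations = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # NAME tokens cover identifiers and keywords; keywords are ASCII.
        if tok.type == tokenize.NAME and not all(map(char_allowed, tok.string)):
            violations.append((tok.start[0], tok.string))
    return violations
```

The policy lives entirely on the auditor's side (here a module-level
tuple, but it could as well come from a flag or a local file); nothing
in the audited file has to declare anything.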

Now, the same code that would be needed to audit the declarations you
propose could easily be generalized to *generate* them.  Once you've
got that, who needs the auditing code in the Python translator?  As I
understand the implementation of PEP 263, you could just substitute an
auditing UTF-8 codec based on that code for the standard PEP 263 UTF-8
codec.  This codec is Python code, and thus could be configured using
a file, which could be generated by the codec and compared with the
old version; the possibilities are endless ... and in no way need to
be defined in the language if I'm correct about the implementation.[1]
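
As a sketch of the "generate, then compare with the old version" idea,
assuming a standalone script rather than an actual codec: the function
names below (script_guess, generate_profile, profile_changed) are
hypothetical, and taking the first word of the Unicode character name
as a script label is only a rough approximation.

```python
import io
import tokenize
import unicodedata


def script_guess(ch):
    """Crude script label: the first word of the Unicode character name."""
    if ord(ch) < 128:
        return "ASCII"
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def generate_profile(source):
    """Return the sorted script labels used by identifiers in source."""
    labels = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            labels.update(script_guess(ch) for ch in tok.string)
    return sorted(labels)


def profile_changed(old_profile, source):
    """Return labels present in source but absent from the old profile."""
    return sorted(set(generate_profile(source)) - set(old_profile))
```

The generated profile could be written to a local file on first audit;
a later run that reports a non-empty difference is the signal that a
new script has crept in and a human should look.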

The reason I favor the single command line flag (perhaps even
restricted to the binary choice of compatibility ASCII vs. PEP 3131
Unicode) is as a transition strategy.  I do not agree with Ka-Ping
inter alia that there are bogeymen under the bed, but then I live in
Japan, and there *is* no "under the bed" (we sleep on mats on the
floor<wink>).  I think it's quite reasonable to provide a
non-invasive, *simple* auditing facility for those who want it.  When
you're talking about security holes, the burden of proof should *not*
be on the paranoid, especially when the backward-compatibility cost of
security is *zero* (there are *no* Python programs containing
non-ASCII identifiers in the wild yet!)

As James Knight says, the "configure the world in one file" strategy
that jJ and I were batting around is a bit nuts, but it might not be a
bad strategy for configuring a loadable auditing codec or external
utility; I don't think that's wasted mental effort at all.

Footnotes:
[1]  Caveat: the implementation will be much more heavyweight than a
standard codec, since it must contain a Python parser.