[Python-3000] Support for PEP 3131

Wed Jun 13 00:15:49 CEST 2007

--- "Stephen J. Turnbull" <turnbull at sk.tsukuba.ac.jp> wrote:
> Ka-Ping Yee writes:
>  > Both of these come down to the wastefulness of redoing something
>  > that the Python interpreter itself already knows how to do very
>  > well, and is, in some sense by definition, the authority on how
>  > to do it correctly.
> 
> True.  However, Guido has already indicated that he favors some
> approach like this, as an external lint utility.  My question is how
> to minimize impact on users who desire flexible automatic auditing.
> 

I would like to comment on both points.

I am somebody who would use such an external lint
utility, even if it were just out of idle curiosity
about the code I was importing (in other words, no
fear involved).

It seems like such a utility would need to be able to
do the following.

   1) The utility would need to tokenize my code.  It
seems like this could be done by the tokenize module
pretty easily, even under PEP 3131.  tokenize.py does
not tap into Python internals right now AFAIK, and I
don't think it would need to under Py3K.

   2) The utility should triage my identifiers
according to their alphabet content.  In an ideal
world, since I'm not a Unicode expert, I would like it
somewhat simplified for me -- e.g. the utility would
classify identifiers as ASCII-only, totally mixed,
definitely Cyrillic, definitely French, German, mixed
Latin variant, Japanese, etc.  To the extent that
Python knows how to classify strings on those general
levels, I would hope that those functions would be
exposed at the Python level.

   But to the extent that CPython really shouldn't
care, I don't see a big problem with some third-party
library implementing a routine that can deduce
languages from Unicode spellings.  It's basically a
big dictionary, maybe a small tree structure, and
something like a forty-line algorithm: walk through
the letters, look up the most specific language for
each, then walk up the tree until you find the most
specific species/phylum/kingdom, etc., of languages
that encompasses all the letters.

   3) The utility should be able to efficiently figure
out which files I want to inspect by statically
walking the import structure.  To Ping's point, I
think this is one area where you lose something by
having to do this outside of the interpreter, but it
doesn't seem to be a terribly difficult problem to
solve.  (To the extent that Python can dynamically
import stuff at runtime, I'm willing to accept that
limitation in an external lint utility, since even if
CPython were doing the auditing for me, I'd still only
find out at runtime.)
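
For what it's worth, here is a rough sketch of steps 1
and 2 in present-day Python 3 (3.7 or later, for
str.isascii), not anything from the Py3K branch.
Python's unicodedata module does not expose the Unicode
Script property, so the sketch guesses each letter's
group from the first word of its Unicode character
name, which is only a heuristic.  The PARENT taxonomy
and its group names (ASCII, ALPHABETIC, EAST_ASIAN, ANY
and so on) are invented for illustration; the real "big
dictionary" would be far larger and would need to know
about languages, not just scripts.

```python
import io
import keyword
import tokenize
import unicodedata

# Hypothetical, deliberately tiny taxonomy (child -> parent).  A real tool
# would need a much bigger table; these entries are made up for the sketch.
PARENT = {
    "ASCII": "LATIN",
    "LATIN": "ALPHABETIC",
    "GREEK": "ALPHABETIC",
    "CYRILLIC": "ALPHABETIC",
    "HIRAGANA": "JAPANESE",
    "KATAKANA": "JAPANESE",
    "JAPANESE": "EAST_ASIAN",
    "CJK": "EAST_ASIAN",
    "ALPHABETIC": "ANY",
    "EAST_ASIAN": "ANY",
}


def identifiers(source):
    """Step 1: pull the non-keyword NAME tokens out of Python source text."""
    names = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            names.append(tok.string)
    return names


def letter_group(ch):
    """Most specific group for one character (name-prefix heuristic)."""
    if ch.isascii():
        return "ASCII"
    # e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC"; unlisted scripts -> "ANY"
    first_word = unicodedata.name(ch, "ANY").split()[0]
    return first_word if first_word in PARENT else "ANY"


def ancestors(group):
    """Chain from a group up to the root, e.g. ASCII, LATIN, ALPHABETIC, ANY."""
    chain = [group]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def common_group(identifier):
    """Step 2: walk up the tree to the most specific group covering every letter."""
    chains = [ancestors(letter_group(ch)) for ch in set(identifier)]
    for candidate in chains[0]:
        if all(candidate in chain for chain in chains):
            return candidate
    return "ANY"


def audit(source):
    """Map each identifier in the source to its group."""
    return {name: common_group(name) for name in identifiers(source)}
```

Walking up any one letter's chain and stopping at the
first group shared by every chain finds the lowest
common ancestor, i.e. the most specific group covering
the whole identifier.  For example,
`audit("p\u0430ypal = 1\n")` reports that identifier
(spelled with a Cyrillic a) as ALPHABETIC rather than
ASCII, which is exactly the kind of thing I would want
pointed out.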

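Step 3, again only as a sketch: this reads each file
with ast.parse instead of importing it, so no module
code runs.  It assumes (my assumption, nothing Python
prescribes) that everything worth auditing lives under
one project directory, and it does not follow relative
imports, submodules pulled in via "from pkg import
submodule", implicit parent packages, or imports built
at runtime with __import__ or importlib, which matches
the limitation I mentioned above.  The function names
are hypothetical.

```python
import ast
from collections import deque
from pathlib import Path


def imported_modules(source):
    """Absolute module names imported anywhere in the source; nothing is executed."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 means an absolute import; relative ones are skipped
            names.append(node.module)
    return names


def resolve(module, root):
    """Map a dotted name to a .py file under root, or None if it lives elsewhere."""
    base = Path(root).joinpath(*module.split("."))
    for candidate in (base.with_suffix(".py"), base / "__init__.py"):
        if candidate.is_file():
            return candidate.resolve()
    return None  # stdlib, site-packages, or simply missing


def files_to_inspect(entry, root):
    """Breadth-first walk of the static import graph starting at entry."""
    start = Path(entry).resolve()
    seen = {start}
    queue = deque([start])
    while queue:
        path = queue.popleft()
        for module in imported_modules(path.read_text(encoding="utf-8")):
            target = resolve(module, root)
            if target is not None and target not in seen:
                seen.add(target)  # the seen set also protects against import cycles
                queue.append(target)
    return seen
```

Each path it returns could then be tokenized and
triaged as in steps 1 and 2.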