[Python-Dev] Re: [Python-checkins] python/dist/src/Lib pyclbr.py,1.26,1.27

Guido van Rossum guido@python.org
Fri, 23 Aug 2002 10:39:09 -0400


> > Rewritten using the tokenize module, which gives us a real tokenizer
> > rather than a number of approximating regular expressions.
> > Alas, it is 3-4 times slower.  Let that be a challenge for the
> > tokenize module.
> 
> Was this just for purity, or did it fix a bug?  The regexps there
> were close to being heroically careful, and even so it was sometimes
> uncomfortably slow using the class browser in IDLE (based on
> pyclbr), and even on a fast machine.  A factor of 3 or 4 might make
> that unbearable.
> 
> If it was for purity, note that tokenize is also based on mounds of
> regexp tricks <wink>.

It was for purity, with an eye towards future improvements (I want to
teach it more about packages and import-aliasing).  While tokenize
uses regexp tricks, they are much closer to 100% correct than those in
pyclbr.  E.g. the pyclbr regexps don't cope with continuation
backslashes (which often occur in long import statements), or comments
or expressions inside the list of superclasses.  They also didn't cope
well with 'import M as N' which is showing up more and more
frequently.  I think there are still bugs in that area, but they will
be much simpler to fix now.
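
To illustrate the cases named above, here is a minimal sketch (not the
actual pyclbr code) that uses the tokenize module to group tokens into
logical lines.  It assumes a current Python 3; the helper name
logical_lines and the sample source are made up for this example.  The
continuation backslash produces no token at all, and the comment inside
the superclass list is simply skipped, so both statements come out whole.

```python
import io
import tokenize

# Sample input (hypothetical): a backslash-continued import with an alias,
# and a class whose superclass list contains a comment and a dotted name.
SOURCE = (
    "from os.path import join, \\\n"
    "    split as sp\n"
    "class C(Base,  # a comment inside the superclass list\n"
    "        mod.Other):\n"
    "    pass\n"
)

# Token types that carry no meaning for a class browser.
SKIP = (
    tokenize.COMMENT,
    tokenize.NL,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
)


def logical_lines(source):
    """Return one list of token strings per logical line of source."""
    lines, current = [], []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIP:
            continue
        if tok.type == tokenize.NEWLINE:
            # NEWLINE ends a logical line, however many physical lines it spans.
            lines.append(current)
            current = []
        else:
            current.append(tok.string)
    return lines


if __name__ == "__main__":
    for line in logical_lines(SOURCE):
        print(" ".join(line))
```

With the statement already assembled this way, recognizing
'import M as N' becomes a matter of looking for the NAME token "as"
rather than extending a regular expression.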

I was going to use this as an excuse to learn how to use the hotshot
profiler to find out if there are any bottlenecks in the tokenize
module.
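
A rough sketch of what such a profiling run might look like.  Note the
substitution: hotshot is not available in current Python versions, so
this uses cProfile and pstats from the standard library instead.  The
function name profile_tokenize and its parameters are invented for the
example.

```python
import cProfile
import io
import pstats
import tokenize


def profile_tokenize(source, top=5):
    """Profile one full tokenize pass over source and return the report text."""

    def run():
        # Exhaust the generator so every token is actually produced.
        for _ in tokenize.generate_tokens(io.StringIO(source).readline):
            pass

    profiler = cProfile.Profile()
    profiler.runcall(run)

    # Write the statistics to a string instead of stdout.
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    # Profile tokenizing the tokenize module's own source file.
    with open(tokenize.__file__, encoding="utf-8") as f:
        print(profile_tokenize(f.read()))
```

Sorting by cumulative time is only one choice; it tends to show which
top-level helpers dominate, which is a reasonable first look for
bottlenecks.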

pyclbr.readmodule_ex('Tkinter') takes under 1.2 seconds on my home
machine now.  I find that acceptable (it's a lot quicker than IDLE
takes to colorize Tkinter.py :-).
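
For anyone wanting to repeat this kind of measurement, a minimal timing
sketch follows.  It assumes a current Python 3, where the module is
spelled 'tkinter'; the helper name time_readmodule is hypothetical, and
the numbers will of course depend on the machine.

```python
import pyclbr
import time


def time_readmodule(module_name):
    """Time one pyclbr.readmodule_ex() call; return (seconds, sorted names)."""
    start = time.perf_counter()
    descriptors = pyclbr.readmodule_ex(module_name)
    elapsed = time.perf_counter() - start
    return elapsed, sorted(descriptors)


if __name__ == "__main__":
    # Any importable pure-Python module works; pyclbr reads the source
    # without importing it.  'tkinter' could be used where it is installed.
    elapsed, names = time_readmodule("pyclbr")
    print("%d top-level names in %.3f seconds" % (len(names), elapsed))
```

pyclbr caches its results per module, so only the first call for a given
module name reflects the real parsing cost.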

--Guido van Rossum (home page: http://www.python.org/~guido/)