[Python-Dev] Re: Automatic flex interface for Python?

Guido van Rossum guido@python.org
Tue, 20 Aug 2002 23:57:51 -0400


> Lexers are painful in Python.  They hit the language in a weak spot
> created by the immutability of strings.  I've found this an obstacle
> more than once, but then I'm a battle-scarred old compiler jock who
> attacks *everything* with lexers and parsers.

I think you're exaggerating the problem, or at least underestimating
the re module.  The re module is pretty fast!  Reading a file
line-by-line is very fast in Python 2.3 with the new "for line in
open(filename)" idiom.  I just scanned nearly a megabyte of ugly data
(a Linux kernel) in 0.6 seconds using the regex '\w+', finding 177,000
words.  The regex '(?:\d+|[a-zA-Z_]+)' took 1 second, finding 190,000
words.  I expect that the list creation (one hit at a
time) took more time than the matching.
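
For anyone who wants to try a similar measurement, here is a minimal
sketch of that kind of scan.  It is not the script used for the numbers
above; the names WORD, TOKEN, scan_lines and scan_file are made up for
illustration, and it is written for current Python rather than 2.3.

```python
import re

# The two patterns mentioned above, compiled once up front.
WORD = re.compile(r"\w+")
TOKEN = re.compile(r"(?:\d+|[a-zA-Z_]+)")


def scan_lines(lines, pattern=WORD):
    """Return a list of every match of `pattern`, scanning line by line."""
    hits = []
    for line in lines:
        # findall returns the matched strings for this line.
        hits.extend(pattern.findall(line))
    return hits


def scan_file(filename, pattern=WORD):
    """Scan a file with the "for line in open(filename)" idiom."""
    # errors="replace" is an assumption so that odd bytes do not abort the scan.
    with open(filename, errors="replace") as f:
        return scan_lines(f, pattern)
```

Calling scan_file("some_file", TOKEN) returns the list of hits, and
len() of that list gives the count.  Note that the second pattern
splits a token such as "x_1" into "x_" and "1", so it can report more
hits than '\w+' on the same input.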

--Guido van Rossum (home page: http://www.python.org/~guido/)