Lexing in Python 2
Tim Peters
tim_one at email.msn.com
Sun Jan 23 22:55:26 EST 2000
[Paul Prescod]
> At the last XML conference I told someone that the reason
> that re doesn't take a stream instead of string parameter
> was because anyone sane working on a large file would use
> a proper tokenizer. Shouldn't such a tokenizer come with
> Python? With all due respect, what the hell is shlex and
> how did it get into the standard distribution?
I've wondered that myself <0.9 wink>.
> I mean the standard distribution alone must contain half a
> dozen hand-coded lexers and in a few places, the weirdness
> you need to apply regular expressions to streams. Surely we
> can do better for Python 2?
If nobody was motivated enough to write the code for Python 1, I don't know
why that would change for Python 3000 (that's what Guido insists on calling
it now <wink>). If you want a *fast* Python lexer today, mxTextTools is
your best hope.
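
For readers who haven't written one, the hand-coded lexers under discussion mostly boil down to one master regular expression applied repeatedly to an in-memory string. Here's a minimal sketch in present-day Python, not code from any module named in this thread; the token names and patterns are made up for illustration. Note that it needs the whole input as a string, which is exactly the re limitation Paul is pointing at.

```python
import re

# Hypothetical token set, chosen only for illustration.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),   # integer or decimal literal
    ("NAME",   r"[A-Za-z_]\w*"),    # identifier
    ("OP",     r"[-+*/=()]"),       # single-character operator or paren
    ("SKIP",   r"\s+"),             # whitespace, matched but not reported
]

# One alternation of named groups; the name of the group that matched
# tells us which kind of token we just saw.
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize(text):
    """Yield (kind, value) pairs; raise ValueError on unmatchable input."""
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(
                "unexpected character %r at offset %d" % (text[pos], pos)
            )
        pos = m.end()
        if m.lastgroup != "SKIP":
            yield m.lastgroup, m.group()


if __name__ == "__main__":
    for kind, value in tokenize("x = 3.14 * (y + 2)"):
        print(kind, value)
```

This is fine for small inputs, but every token costs a trip through the regex engine plus a Python-level loop iteration, which is why something written in C is the better bet when speed matters.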
> It is my unconsidered, uneducated opinion that lexers do not
> vary as widely as parsers (LL(1), LR(1), LR(N) etc.) so we
> could just choose one at random and start building modules
> around it.
Curiously, mxTextTools is nothing like lex/flex. Flex does such a good job
it's hard to get motivated to duplicate all that effort (it's not easy)
solely to get something releasable under a more Python-like license. I
don't know how Marc-Andre would feel about folding mxTextTools into the
distribution.
> All in favor? Opposed? Carried.
I'm almost never opposed to someone else doing work <wink>.
BTW, there are several interesting parsing projects going on in the Java
world; at least JPython should be able to exploit them.
IDLE-adds-at-least-two-more-hand-crafted-lexers-ly y'rs - tim