Lexing in Python 2

Glen Starchman glen at electricorb.com
Fri Jan 28 07:18:40 EST 2000


Even though the code is a bit hairy (blatant understatement!), the tokenize
module works very well with a little bit of tweaking. I am currently using it for
a handcrafted language that is translated into Python. I had previously tried
kjBuckets (good but SLOW) and pyBison (I think that is the name -- never got it
to work quite the way I wanted).
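For the curious, the tokenize approach looks roughly like this (a minimal sketch in today's standard library; the helper name tokens_of and the sample "let" statement are my own, and the module's interface has changed considerably since 2000):

```python
import io
import tokenize

def tokens_of(source):
    """Lex a source string with Python's tokenize module.

    Returns (token_type_name, token_text) pairs, dropping the
    whitespace/bookkeeping tokens a translator usually ignores.
    """
    skip = {"NEWLINE", "NL", "ENDMARKER", "INDENT", "DEDENT", "ENCODING"}
    result = []
    # generate_tokens() wants a readline callable, so it works on streams too.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        name = tokenize.tok_name[tok.type]
        if name not in skip:
            result.append((name, tok.string))
    return result

# A small-language statement that happens to follow Python's lexical rules:
print(tokens_of("let x = 3 + y\n"))
# -> [('NAME', 'let'), ('NAME', 'x'), ('OP', '='),
#     ('NUMBER', '3'), ('OP', '+'), ('NAME', 'y')]
```

The tweaking comes in when your language's tokens diverge from Python's; as long as they don't, you get the lexer for free.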

However, I would stand on my feet and cheer for a nice, open lexer (meaning one not
tied directly into Python's lexing rules) to be part of Python 2.


Paul Prescod wrote:

> Tim Peters wrote:
> >
> > Yes, but that's really a different topic.  The Python world has no good
> > approach to that now, paying attention to the "fast" part, and where "good"
> > means "enough like Flex and Bison that you don't feel you've been stranded
> > on some strange alien planet" <wink>.
>
> At the last XML conference I told someone that the reason that re
> doesn't take a stream instead of string parameter was because anyone
> sane working on a large file would use a proper tokenizer. Shouldn't
> such a tokenizer come with Python? With all due respect, what the hell
> is shlex and how did it get into the standard distribution?
>
> I mean the standard distribution alone must contain half a dozen
> hand-coded lexers and in a few places, the weirdness you need to apply
> regular expressions to streams. Surely we can do better for Python 2?
>
> It is my unconsidered, uneducated opinion that lexers do not vary as
> widely as parsers (LL(1), LR(1), LR(N) etc.) so we could just choose one
> at random and start building modules around it.
>
> All in favor? Opposed? Carried.
> --
>  Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
> Earth will soon support only survivor species -- dandelions, roaches,
> lizards, thistles, crows, rats. Not to mention 10 billion humans.
>         - Planet of the Weeds, Harper's Magazine, October 1998
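To partly answer the shlex question above: it is a small lexer for shell-like quoting rules, and, unlike re, its class form will happily read from a stream. A sketch (the sample strings are my own):

```python
import io
import shlex

# shlex.split() handles shell-style quoting in one call:
print(shlex.split('convert "my photo.png" -resize 50% out.png'))
# -> ['convert', 'my photo.png', '-resize', '50%', 'out.png']

# The shlex.shlex class takes any file-like object, so it can
# tokenize a stream incrementally rather than a whole string:
lexer = shlex.shlex(io.StringIO('alpha "quoted token" beta'), posix=True)
print(list(lexer))
```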



