re vs. sgmllib (was: Moving from Perl to Python)
Tim Peters
tim_one at email.msn.com
Sun Oct 3 17:16:48 EDT 1999
[Bob Horvath]
> I have not had the time to check out the Python parsers as much as I
> would like, but I have played around with JavaCC and liked it. If there
> were something in Python that was similar to JavaCC, this would be
> very nice.
I've said before that Python parsers tend to be "excessively novel" <0.9
wink> -- very creative, but unclear that's an overall advantage over a
traditional parsing approach.
> None of the Python parsers seem to jump out as being "the one to use".
> Perhaps I need to spend some more time - a nudge in the right direction
> would be appreciated?
I rarely save URLs, so you'll have to trudge thru DejaNews to find these (or
maybe their authors will pipe up):
Most conventional: TRAP (this builds on Aaron Watters's kjParsing, in more
than less conventional ways).
Least conventional, & fastest: Marc-Andre Lemburg's mxTextTools. You
basically build your own state machine out of Python tuples, which are
executed by a C extension module; very fast, very delicate. MikeF put a
more conventional layer on top of it, in his own unconventional <wink> way.
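To give a feel for the "state machine out of Python tuples" idea, here is a minimal pure-Python sketch. The table layout, the action names "mark" and "emit", and the run() driver are all made up for illustration; this is not mxTextTools's actual table format or API, and it has none of the speed of a C engine.

```python
import string

DIGITS = frozenset(string.digits)
SPACE = frozenset(" \t")

# Hypothetical table format (NOT the real mxTextTools layout):
# each state maps to a tuple of (charset, action, next_state) transitions.
# action is "mark" (remember where a token starts), "emit" (cut a token), or None.
TABLE = {
    "start": (
        (SPACE, None, "start"),
        (DIGITS, "mark", "number"),
    ),
    "number": (
        (DIGITS, None, "number"),
        (SPACE, "emit", "start"),
    ),
}

def run(table, text):
    """Return the runs of digits in text; raise ValueError on any other input."""
    state, begin, tokens = "start", 0, []
    # A trailing space is appended so the last number gets flushed.
    for pos, ch in enumerate(text + " "):
        for charset, action, next_state in table[state]:
            if ch in charset:
                if action == "mark":
                    begin = pos
                elif action == "emit":
                    tokens.append(text[begin:pos])
                state = next_state
                break
        else:
            # No transition matched: the table has no rule for this character.
            raise ValueError("unexpected %r at position %d" % (ch, pos))
    return tokens
```

The "delicate" part shows up even at this size: forget one transition and the machine either rejects valid input or never emits a token, and nothing in the table tells you which.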
Most general, most elegant, & slowest: John Aycock's framework, which is
thoroughly OO and handles any context-free grammar using an Earley parser.
Least general, easiest for a parsing newbie to understand: Amit Patel's
YAPPS, which generates readable recursive descent parsers (i.e., the kind
God intended people to write if they have to write one by hand).
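For anyone who hasn't seen one, a hand-written recursive descent parser looks roughly like the sketch below: one method per grammar rule, each calling the others. This is not YAPPS output, just an illustration; the grammar (integer arithmetic with parentheses) and the names tokenize, Parser, and evaluate are assumptions chosen for the example.

```python
import re

# Grammar assumed for this sketch:
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    """Split text into number and operator tokens; reject anything else."""
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError("bad character at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # Look at the next token without consuming it; None at end of input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.next() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.next() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        tok = self.next()
        if tok == "(":
            value = self.expr()
            if self.next() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError("unexpected token %r" % (tok,))

def evaluate(text):
    """Parse and evaluate text, insisting that all input is consumed."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("trailing input")
    return value
```

Each method mirrors one rule of the grammar, which is what makes this style easy to read and debug. The loops in expr and term also make the operators left-associative, so evaluate("10 - 4 - 3") is computed as (10 - 4) - 3.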
Most invisible: PyBison, a Python interface to GNU flex/bison, to which you
can find scattered references but never the actual code.
Most commonly used: Everyone eventually learns to regret trying to use
regexps for parsing <wink>.
Most tantalizingly unusable: Python's own Guido-grown parser generator,
which generates the parse tables Python itself uses. This is tied into
Python rather more deeply than it "should be".
combine-the-best-features-of-each-and-it-probably-wouldn't-work-ly y'rs -
tim