re vs. sgmllib (was: Moving from Perl to Python)

Tim Peters tim_one at email.msn.com
Sun Oct 3 17:16:48 EDT 1999


[Bob Horvath]
> I have not had the time to check out the Python parsers as much as I
> would like, but I have played around with JavaCC and liked it.  If there
> were something in Python that was similar to JavaCC, this would be
> very nice.

I've said before that Python parsers tend to be "excessively novel" <0.9
wink> -- very creative, but unclear that's an overall advantage over a
traditional parsing approach.

> None of the Python parsers seem to jump out as being "the one to use".
> Perhaps I need to spend some more time - a nudge in the right direction
> would be appreciated?

I rarely save URLs, so you'll have to trudge thru DejaNews to find these (or
maybe their authors will pipe up):

Most conventional:  TRAP (this builds on Aaron Watters's kjParsing, in more
than less conventional ways).

Least conventional, & fastest:  Marc-Andre Lemburg's mxTextTools.  You
basically build your own state machine out of Python tuples, which are
executed by a C extension module; very fast, very delicate.  MikeF put a
more conventional layer on top of it, in his own unconventional <wink> way.
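
To make the "state machine out of Python tuples" idea concrete, here is a minimal pure-Python sketch. It is not mxTextTools' actual table format or API; the names TABLE, ACCEPTING and run are made up for illustration, and the interpreter loop here is plain Python rather than a C extension.

```python
import string

DIGITS = set(string.digits)

# Hypothetical table layout: (current state, accepted characters, next state).
# This machine recognizes optionally signed decimal numbers like "-12.5".
TABLE = (
    ("start", set("+-"), "sign"),
    ("start", DIGITS, "int"),
    ("sign", DIGITS, "int"),
    ("int", DIGITS, "int"),
    ("int", set("."), "dot"),
    ("dot", DIGITS, "frac"),
    ("frac", DIGITS, "frac"),
)
ACCEPTING = {"int", "frac"}


def run(table, accepting, text, state="start"):
    """Feed text through the table one character at a time."""
    for ch in text:
        for src, chars, dst in table:
            if src == state and ch in chars:
                state = dst
                break
        else:
            # No tuple matched this character in the current state.
            return False
    return state in accepting


print(run(TABLE, ACCEPTING, "-12.5"))
print(run(TABLE, ACCEPTING, "12."))
```

The "delicate" part shows up quickly even at this size: drop or mistype one tuple and the machine still runs, it just quietly accepts or rejects the wrong strings.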

Most general, most elegant, & slowest:  John Aycock's framework, which is
thoroughly OO and handles any context-free grammar using an Earley parser.

Least general, easiest for a parsing newbie to understand:  Amit Patel's
YAPPS, which generates readable recursive descent parsers (i.e., the kind
God intended people to write if they have to write one by hand).
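
For readers who have not seen one, a hand-written recursive descent parser looks roughly like the sketch below: one method per grammar rule, each calling the others. This is not YAPPS output, just an illustration under an assumed toy grammar for arithmetic; tokenize, Parser and parse are names invented here.

```python
import re

# A token is either a run of digits or any single non-space character.
TOKEN = re.compile(r"\s*(?:(\d+)|(.))")


def tokenize(text):
    tokens = []
    for number, op in TOKEN.findall(text):
        if number:
            tokens.append(("NUM", int(number)))
        elif op.strip():
            tokens.append(("OP", op))
    return tokens


class Parser:
    # Assumed grammar:
    #   expr   := term (('+' | '-') term)*
    #   term   := factor (('*' | '/') factor)*
    #   factor := NUM | '(' expr ')'
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("EOF", None)

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            _, op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            _, op = self.advance()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        tok = self.advance()
        if tok[0] == "NUM":
            return tok[1]
        if tok == ("OP", "("):
            value = self.expr()
            if self.advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError("unexpected token %r" % (tok,))


def parse(text):
    parser = Parser(text)
    value = parser.expr()
    if parser.peek()[0] != "EOF":
        raise SyntaxError("trailing input")
    return value


print(parse("2 + 3 * (4 - 1)"))
```

Each method mirrors one rule of the grammar, which is why this style is easy to read and debug. It is also why it is the least general: left-recursive or ambiguous grammars have to be rewritten before they fit this shape.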

Most invisible:  PyBison, a Python interface to GNU flex/bison, to which you
can find scattered references but never the actual code.

Most commonly used:  Everyone eventually learns to regret trying to use
regexps for parsing <wink>.
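
One small illustration of where the regret usually starts, assuming a made-up task of pulling the argument text out of function calls: a pattern that works on flat input quietly does the wrong thing once the input nests. FLAT_CALL and call_args are hypothetical names for this sketch.

```python
import re

# Matches name(...) only when the parentheses contain no nested parentheses.
FLAT_CALL = re.compile(r"\w+\(([^()]*)\)")


def call_args(text):
    """Return the argument text of every call the pattern can see."""
    return FLAT_CALL.findall(text)


print(call_args("f(a, b)"))
print(call_args("f(g(x), y)"))
```

On the flat call the result is the full argument text. On the nested call only the inner g(x) is found, and the outer call's arguments are lost without any error being raised. Python's re module has no recursive patterns, so matching arbitrarily deep nesting needs a real parser or at least an explicit depth counter.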

Most tantalizingly unusable:  Python's own Guido-grown parser generator,
which generates the parse tables Python itself uses.  This is tied into
Python rather more deeply than it "should be".

combine-the-best-features-of-each-and-it-probably-wouldn't-work-ly y'rs  -
tim





