parser recommendation

Paul McGuire ptmcg at austin.rr.com
Tue Jun 3 10:41:46 EDT 2008


On Jun 3, 8:43 am, "Filipe Fernandes" <fernandes... at gmail.com> wrote:
>
> I've briefly looked at PLY and pyparsing.  There are several others,
> but too many to enumerate.  My understanding is that PLY (although
> more difficult to use) has much more flexibility than pyparsing.  I'm
> basically looking to make an informed choice.  Not just for this
> project, but for the long haul.  I'm not afraid of using a difficult
> (to use or learn) parser either if it buys me something like
> portability (with other languages) or flexibility.
>

Short answer: try them both.  Learning curve on pyparsing is about a
day, maybe two.  And if you are already familiar with regex, PLY
should not seem too much of a stretch.  PLY parsers will probably be
faster running than pyparsing parsers, but I think pyparsing parsers
will be quicker to work up and get running.

Longer answer: PLY is of the lex/yacc school of parsing libraries
(PLY=Python Lex/Yacc).  Use regular expressions to define terminal
token specifications (a la lex).  Then use "t_XXX" and "p_XXX" methods
to build up the parsing logic - docstrings in these methods capture
regex or BNF grammar definitions.  In contrast, pyparsing is of the
combinator school of parsers.  Within your Python code, you compose
your parser using '+' and '|' operations, building up the parser using
pyparsing classes such as Literal, Word, OneOrMore, Group, etc.  Also,
pyparsing is 100% Python, so you won't have any portability issues
(don't know about PLY).
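To make the combinator style concrete, here is a tiny sketch of my own (the grammar and names are just illustrative, not from either library's docs): a parser for assignments like "x = 42", composed entirely with '+' and '|':

```python
from pyparsing import Word, alphas, alphanums, nums, Literal

identifier = Word(alphas, alphanums + "_")
number = Word(nums)
eq = Literal("=").suppress()

# '+' sequences expressions; '|' tries alternatives left to right.
assignment = identifier + eq + (number | identifier)

print(assignment.parseString("x = 42").asList())   # ['x', '42']
```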

Here is a link to a page with a PLY and pyparsing example (although
not strictly a side-by-side comparison): http://www.rexx.com/~dkuhlman/python_201/.
For comparison, here is a pyparsing version of the PLY parser on that
page (this is a recursive grammar, not necessarily a good beginner's
example for pyparsing):
===============
from pyparsing import (Word, alphas, alphanums, Forward, Literal,
                       Group, Optional, OneOrMore, restOfLine)

term = Word(alphas, alphanums)

func_call = Forward()
func_call_list = Forward()
comma = Literal(",").suppress()
func_call_list << Group( func_call + Optional(comma + func_call_list) )

lpar = Literal("(").suppress()
rpar = Literal(")").suppress()
func_call << Group( term + lpar + Optional(func_call_list, default=[""]) + rpar )
command = func_call

prog = OneOrMore(command)

comment = "#" + restOfLine
prog.ignore( comment )
================
With the data set given at Dave Kuhlman's web page, here is the
output:
[['aaa', ['']],
 ['bbb', [['ccc', ['']]]],
 ['ddd',
  [['eee', ['']],
   [['fff', [['ggg', ['']], [['hhh', ['']], [['iii', ['']]]]]]]]]]
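Kuhlman's data file isn't reproduced here, but the same nesting shows up on a small input of my own devising; this is the grammar above, repeated in self-contained, runnable form:

```python
from pyparsing import (Word, alphas, alphanums, Forward, Literal,
                       Group, Optional, OneOrMore, restOfLine)

term = Word(alphas, alphanums)

func_call = Forward()
func_call_list = Forward()
comma = Literal(",").suppress()
func_call_list <<= Group(func_call + Optional(comma + func_call_list))

lpar = Literal("(").suppress()
rpar = Literal(")").suppress()
# When the arg list is absent, Optional's default inserts [""] in its place.
func_call <<= Group(term + lpar + Optional(func_call_list, default=[""]) + rpar)

prog = OneOrMore(func_call)
prog.ignore("#" + restOfLine)

print(prog.parseString("aaa()  bbb(ccc())").asList())
# -> [['aaa', ['']], ['bbb', [['ccc', ['']]]]]
```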

Pyparsing makes some judicious assumptions about how you will want to
parse, most significant being that whitespace can be ignored during
parsing (this *can* be overridden in the parser definition).
Pyparsing also supports token grouping (for building parse trees),
parse-time callbacks (called 'parse actions'), and assigning names
within subexpressions (called 'results names'), which really helps in
working with the tokens returned from the parsing process.
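A quick sketch of my own (an illustrative date parser, not from the post above) showing results names and a parse action together:

```python
from pyparsing import Word, nums, Literal

integer = Word(nums)
# Parse action: convert matched digit strings to ints at parse time.
integer.setParseAction(lambda toks: int(toks[0]))

dash = Literal("-").suppress()
# Results names (here via the call syntax) let you pull fields out by
# name instead of by position.
date = integer("year") + dash + integer("month") + dash + integer("day")

result = date.parseString("2008-06-03")
print(result.year, result.month, result.day)   # 2008 6 3
```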

If you learn both, you may find that pyparsing is a good way to
quickly prototype a particular parsing problem, which you can then
convert to PLY for performance if necessary.  The pyparsing prototype
will be an efficient way to work out what the grammar "kinks" are, so
that when you get around to PLY-ifying it, you already have a clear
picture of what the parser needs to do.

But, really, "more flexible"?  I wouldn't really say that was the big
difference between the two.

Cheers,
-- Paul

(More pyparsing info at http://pyparsing.wikispaces.com.)


