parser recommendation

Tue Jun 3 10:32:19 EDT 2008

On 3 Jun., 15:43, "Filipe Fernandes" <fernandes... at gmail.com> wrote:

> I have a project that uses a proprietary format and I've been using
> regex to extract information from it.  I haven't hit any roadblocks
> yet, but I'd like to use a parsing library rather than maintain my own
> code base of complicated regexes.  I've been intrigued by the parsers
> available in python, which may add some much needed flexibility.
>
> I've briefly looked at PLY and pyparsing.  There are several others,
> but too many to enumerate.  My understanding is that PLY (although
> more difficult to use) has much more flexibility than pyparsing.  I'm
> basically looking to make an informed choice.  Not just for this
> project, but for the long haul.  I'm not afraid of using a difficult
> (to use or learn) parser either if it buys me something like
> portability (with other languages) or flexibility.
>
> I've been to a few websites that enumerate the parsers, but they weren't
> very helpful when it came to comparisons...
>
> http://nedbatchelder.com/text/python-parsers.html
> http://www.python.org/community/sigs/retired/parser-sig/towards-stand...
>
> I'm not looking to start a flame war... I'd just like some honest opinions.. ;)
>
> thanks,
> filipe

Trail [1], which comes with EasyExtend 3, is not on Batchelder's list.
Trail is an EBNF-based, top-down, non-backtracking parser with 1 token
of lookahead, and it is strictly more powerful than LL(1): for LL(1)
languages Trail *is* an LL(1) parser. Trail isn't well researched yet,
so I can say little about its performance characteristics when parsing
non-LL(1) languages. They vary somewhat with the size of the automata
Trail generates. There are also classes of grammars that Trail does not
accept; only a few of them are known.
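For readers unfamiliar with the LL(1) baseline mentioned above, here is a plain recursive-descent parser. It is not Trail (see [1] for Trail's actual algorithm and grammar format); it is a generic, hand-written LL(1) parser for a hypothetical toy arithmetic grammar. Every branch is chosen by peeking at exactly one token, and nothing is ever backtracked. The grammar, `tokenize`, `Parser`, and `parse` are illustrative names only.

```python
import re

# Hypothetical toy grammar (EBNF), not taken from Trail:
#   expr   ::= term (("+" | "-") term)*
#   term   ::= factor (("*" | "/") factor)*
#   factor ::= NUM | "(" expr ")"

TOKEN_RE = re.compile(r"\s*(?:(?P<NUM>\d+)|(?P<OP>[-+*/()]))")


def tokenize(text):
    """Split text into (kind, value) pairs, ending with an EOF token."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"cannot tokenize at position {pos}")
        # Exactly one named group matched; lastgroup tells us which one.
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    tokens.append(("EOF", ""))
    return tokens


class Parser:
    """LL(1) recursive descent: one token of lookahead, never backtracks."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def expect(self, kind, value=None):
        tok = self.peek()
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise SyntaxError(f"expected {value or kind}, got {tok[1]!r}")
        self.pos += 1
        return tok

    def expr(self):
        value = self.term()
        # The single lookahead token decides whether the loop continues.
        while self.peek() in (("OP", "+"), ("OP", "-")):
            op = self.expect("OP")[1]
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            op = self.expect("OP")[1]
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        tok = self.peek()
        if tok[0] == "NUM":
            self.pos += 1
            return int(tok[1])
        if tok == ("OP", "("):
            self.pos += 1
            value = self.expr()
            self.expect("OP", ")")
            return value
        raise SyntaxError(f"unexpected token {tok[1]!r}")


def parse(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    parser.expect("EOF")  # reject trailing input
    return value
```

For example, `parse("(1 + 2) * 3")` evaluates to 9, while `parse("1 +")` raises `SyntaxError` because `factor` sees EOF where it needs a number or "(".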

I used a Trail-based parser to replace the regular-expression-based
tokenizer used for Python tokenization in EasyExtend 3. The pure Python
implementation definitely has performance issues: Trail is an order of
magnitude slower than tokenizer.py. On the other hand, EBNF grammars are
compositional, and one can easily add new rules.
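To illustrate the compositionality point, here is a minimal sketch assuming a grammar stored as plain data: each rule is a nested tuple built from hypothetical EBNF operators (`seq`, `alt`, `rep`, `opt`, `tok`, `lit`), interpreted by a small LL(1) driver that picks alternatives by their FIRST sets. This is not Trail's grammar format or engine; `GrammarParser`, `BASE`, and `EXTENDED` are illustrative names. Adding function-call syntax means adding one rule (`call`) and one alternative to `atom`, while `expr` and `term` stay untouched.

```python
import re


def tokenize(text):
    """Hypothetical lexer: numbers, identifiers, single-character operators."""
    tokens = []
    for m in re.finditer(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\S))", text):
        num, name, op = m.groups()
        if num is not None:
            tokens.append(("NUM", num))
        elif name is not None:
            tokens.append(("NAME", name))
        else:
            tokens.append(("OP", op))
    tokens.append(("EOF", ""))
    return tokens


# Rules: ("seq", ...), ("alt", ...), ("rep", x) for x*, ("opt", x) for x?,
# ("tok", KIND), ("lit", "text"); a bare string refers to another rule.
BASE = {
    "expr": ("seq", "term", ("rep", ("seq", ("alt", ("lit", "+"), ("lit", "-")), "term"))),
    "term": ("seq", "atom", ("rep", ("seq", ("lit", "*"), "atom"))),
    "atom": ("alt", ("tok", "NUM"), ("seq", ("lit", "("), "expr", ("lit", ")"))),
}

# Composition: a new rule plus one new alternative in "atom".
EXTENDED = dict(BASE)
EXTENDED["call"] = ("seq", ("tok", "NAME"), ("opt", ("seq", ("lit", "("), "expr", ("lit", ")"))))
EXTENDED["atom"] = ("alt", ("tok", "NUM"), "call", ("seq", ("lit", "("), "expr", ("lit", ")")))


class GrammarParser:
    def __init__(self, grammar):
        self.grammar = grammar

    def nullable(self, e):
        if isinstance(e, str):
            return self.nullable(self.grammar[e])
        if e[0] in ("tok", "lit"):
            return False
        if e[0] == "seq":
            return all(self.nullable(x) for x in e[1:])
        if e[0] == "alt":
            return any(self.nullable(x) for x in e[1:])
        return True  # "rep" and "opt" may match nothing

    def first(self, e):
        """Set of terminals that can start e (recomputed, not cached)."""
        if isinstance(e, str):
            return self.first(self.grammar[e])
        if e[0] in ("tok", "lit"):
            return {e}
        if e[0] == "seq":
            result = set()
            for x in e[1:]:
                result |= self.first(x)
                if not self.nullable(x):
                    break
            return result
        if e[0] == "alt":
            return set().union(*(self.first(x) for x in e[1:]))
        return self.first(e[1])  # "rep" / "opt"

    def starts(self, e, tok):
        first = self.first(e)
        return ("tok", tok[0]) in first or ("lit", tok[1]) in first

    def parse(self, text, start):
        self.tokens, self.pos = tokenize(text), 0
        tree = self.match(start)
        if self.tokens[self.pos][0] != "EOF":
            raise SyntaxError(f"trailing input at {self.tokens[self.pos][1]!r}")
        return tree

    def match(self, e):
        tok = self.tokens[self.pos]
        if isinstance(e, str):
            return (e, self.match(self.grammar[e]))
        if e[0] in ("tok", "lit"):
            if not self.starts(e, tok):
                raise SyntaxError(f"expected {e[1]!r}, got {tok[1]!r}")
            self.pos += 1
            return tok[1]
        if e[0] == "seq":
            return [self.match(x) for x in e[1:]]
        if e[0] == "alt":
            # One token of lookahead selects the branch; no backtracking.
            for x in e[1:]:
                if self.starts(x, tok):
                    return self.match(x)
            raise SyntaxError(f"unexpected token {tok[1]!r}")
        if e[0] == "rep":
            items = []
            while self.starts(e[1], self.tokens[self.pos]):
                items.append(self.match(e[1]))
            return items
        return self.match(e[1]) if self.starts(e[1], tok) else None  # "opt"
```

With this sketch, `GrammarParser(BASE).parse("f(1)", "expr")` raises `SyntaxError`, while `GrammarParser(EXTENDED).parse("f(1) * 2", "expr")` returns a nested `("expr", ...)` tree. The driver assumes the grammar is LL(1) (disjoint FIRST sets among alternatives, no left recursion, no nullable alternatives) and does not verify that.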

[1] http://www.fiber-space.de/EasyExtend/doc/EE.html
    http://www.fiber-space.de/EasyExtend/doc/trail/Trail.html