Choosing the right parser for parsing C headers

Jean de Largentaye jlargentaye at gmail.com
Tue Feb 8 06:10:23 EST 2005


Hi,

I need to parse a subset of C (a header file), and generate some unit
tests for the functions listed in it. I thus need to parse the code,
then rewrite function calls with wrong parameters. What I call "shaking
the broken tree" :)
I chose to make my UT-generator in Python 2.4. However, I am now
encountering problems in choosing the right parser for the job. I
struggle in choosing between the inappropriate, the out-of-date, the
alpha, or the too-big-for-the task...

So far I've indentified 9(!) potential candidates (Mostly taken from
the http://www.python.org/moin/LanguageParsing page) :

- Plex:
Only a lexical analyser as far as I understand. Kinda RE++, no syntax
processing
- ply:
Lex / Yacc for python! Tackle the Beast! Syntax processing looks
complex..
- Pyggy:
Lex / Yacc -styled too. More recent, but will a 0.4 version be good
enough?
- PyLR:
fast parser with core functions in C... hasn't moved since '97
- Pyparsing:
quick and easy parser... but I don't think it does more than lexical
analysis
- spark:
Here's some wood. Now build your house.
- yapps2 :
yapps2+ (I hesitate to call it yapps3):
chosen by http://www.python.org/sigs/parser-sig/towards-standard.html.
Is the choice up-to-date?
But will it do for parsing C?
- TPG (Toy Parser Generator):
looks cool
- ANTLR (latest version from Jan 28 produces Python code) :
Seems powerful and has a lot of support, but I don't want to have to
use an exterior Java tool. Furthermore, does it let me control what
happens at each stage easily, or does it just make me a compiler?

I've omitted these: shlex, kwparsing (webpage?), PyBison, Trap
(webpage?), DParser, and SimpleParse (I don't want the extra
dependancy).

I was hoping for a quick and easy choice, but got caught in the tar pit
of Too Much Information. Parsing is a large and complex field. As an
added handicap, I'm new to the dark minefield of parsers... I've had
some experience with Lex/Yacc, and have some knowledge of parser
theory, through a course on compilators. I am thus used to EBNF-style
grammar.
I was disappointed to see that Parser-SIG has died out.
Would you have any ideas on which parser is best suited for the task?

John




More information about the Python-list mailing list