Parser suggestion

Fri Sep 30 14:39:30 EDT 2005

[Jorge Godoy]

> You're someone [...]

You make me shy! :-)  Nevertheless, thanks for the appreciation! :-)

> > > It looks like it stopped being developed circa 2002...

> How can I be sure that if I find a bug I'll be able to discuss it with
> the developer if it's 3 years since the last release of his code?

SPARK is rock solid for me, and for the few doubts or improvements I
had, I remember writing to John Aycock, who always replied.  But of
course, without having experienced it yourself, I understand that one
may have doubts.  I have also not needed to write to John for a few
years.  In any case, SPARK is a single module file, and not such a big
one after all.  I adapted it very slightly to my own needs and habits,
and have merely copied it from project to project since then.

One warning about SPARK is worth giving.  As it accepts a wider
variety of grammars, it uses algorithms that may be slow depending
on your grammar design, and that may also become slow when the source
contains errors.  Compromises are needed.  In all the cases where I
have used it so far, whenever the input to parse was sizeable, it was
easy for me to split the source into smaller chunks with boundaries
recognisable by other means, and to call the parser on each chunk
instead of globally on the whole thing.  This yielded reasonable
parsing speed in production code.
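
As a rough sketch of that chunking idea (not SPARK's own API): the
code below assumes, purely for illustration, that blank lines are the
recognisable boundaries, and that `parse_chunk` is whatever callable
runs your parser on one piece of text.  Both `split_chunks` and
`parse_all` are hypothetical names.

```python
import re


def split_chunks(text, boundary=r"\n\s*\n"):
    """Split text on an assumed boundary (blank lines by default).

    Empty chunks are dropped, and surrounding whitespace is stripped.
    """
    return [chunk.strip() for chunk in re.split(boundary, text)
            if chunk.strip()]


def parse_all(text, parse_chunk):
    """Call parse_chunk (a hypothetical parser entry point) on each
    chunk in turn, and collect the results in source order."""
    return [parse_chunk(chunk) for chunk in split_chunks(text)]
```

With this shape, a syntax error only costs the time needed to parse
the chunk that holds it, and not the whole input.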

With SPARK, you have to provide a tokenizer.  SPARK offers one based
on Python regular expressions, but I found that I often prefer writing
my own instead.  If you have to process FORTRAN code, you may have
some difficulty in this area (yet I may be all wrong in saying so, as
my FORTRAN is very rusty, and I did not keep up with the evolution of
FORTRAN standards).  At one time, FORTRAN allowed spurious, ignored
whitespace almost everywhere outside strings, even within identifiers.
Whitespace can (could) also be omitted between tokens where it would
be clearly mandatory in any other language I know.
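
To make the difficulty concrete, here is a small sketch, assuming
old fixed-form FORTRAN where blanks outside strings carry no meaning.
The classic case is `DO 10 I = 1, 5` (a loop) against `DO 10 I = 1.5`
(an assignment to a variable named DO10I).  The names `squeeze_blanks`
and `looks_like_do_loop` are hypothetical, and the second function is
only a naive heuristic, not a real FORTRAN statement classifier.

```python
import re


def squeeze_blanks(statement):
    """Drop blanks outside single-quoted strings."""
    out = []
    in_string = False
    for ch in statement:
        if ch == "'":
            # A doubled quote inside a string toggles twice, which
            # leaves us inside the string, as wanted.
            in_string = not in_string
        if ch == " " and not in_string:
            continue
        out.append(ch)
    return "".join(out)


def looks_like_do_loop(statement):
    """Naive guess: DO, a label, a name, '=', then a comma before any
    parenthesis.  Only the comma tells the loop from the assignment."""
    squeezed = squeeze_blanks(statement).upper()
    pattern = r"DO\d+[A-Z][A-Z0-9]*=[^,()]+,"
    return re.match(pattern, squeezed) is not None
```

Both statements squeeze to text starting with `DO10I=1`, so a
tokenizer cannot decide what `DO` is before it has seen what follows.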

So, the split between lexical and syntactic analysis for FORTRAN
is (or at least once was) fairly fuzzy.  But this is theoretical.  In
practice, FORTRAN programmers almost never resort to insane use of
whitespace.  So you can probably settle for an easier, standard
two-level analysis, rather than handling FORTRAN as formally defined,
and still be winning!
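
For the first of those two levels, a hand-written tokenizer can stay
quite small.  The sketch below is an assumption-laden illustration and
not SPARK's scanner: the token kinds, `TOKEN_RE` and `tokenize` are
made up for the example, and it presumes that whitespace separates
tokens sanely.

```python
import re

# One alternative per token kind; the first one that matches wins.
TOKEN_RE = re.compile(r"""
    (?P<number>\d+(?:\.\d*)?)
  | (?P<name>[A-Za-z][A-Za-z0-9]*)
  | (?P<op>[=+\-*/(),])
  | (?P<space>\s+)
  | (?P<error>.)
""", re.VERBOSE)


def tokenize(line):
    """Return a list of (kind, text) pairs, skipping whitespace."""
    tokens = []
    for match in TOKEN_RE.finditer(line):
        kind = match.lastgroup
        if kind == "space":
            continue
        if kind == "error":
            raise ValueError("unexpected character %r" % match.group())
        tokens.append((kind, match.group()))
    return tokens
```

The resulting list of pairs is then what the second level, the parser
proper, would consume.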

-- 
François Pinard   http://pinard.progiciels-bpi.ca