When Good Regular Expressions Go Bad

Ionel Simionescu ionel at psy.uva.nl
Sun Oct 3 15:39:18 EDT 1999


|
| Okay, okay, you've mostly convinced me.  Now, here's the hard part: got
| any recommendations for a parsing book similar to the O'Reilly Regex
| book?  (By "similar", I mean a good combined tutorial/reference that's
| aimed at the ignorant but generally well-educated techie.  Two or three
| books that combine to that goal would be fine.)
| --

While I am neither a specialist nor a practitioner of language design, I
have recently read about parsing, and searched for and evaluated several
free tools designed to make the job of writing compilers easier.

I list below some free resources that I found valuable.

(I guess that Paislei will, at least in concept, be enjoyed by many of the
people already using a lot of REs in their work. In fact, that hunch is why
I posted this message.)

ionel
----------------------------------------------------------------------------

Book:

Parsing Techniques - A Practical Guide
--------------------------------------
by Dick Grune and Ceriel J.H. Jacobs
http://www.cs.vu.nl/~dick/PTAPG.html
'''
This 320-page book treats parsing in its own right, in greater depth than is
found in most computer science and linguistics books. It offers a clear,
accessible, and thorough discussion of many different parsing techniques
with their interrelations and applicabilities, including error recovery
techniques. Unlike most books, it treats (almost) all parsing methods, not
just the popular ones. See Preface + Introduction and/or Table of Contents
for a quick impression.
The book features a 48 page systematic bibliography containing over 400
entries. A general context-free parser is supplied (Figure 12.1 and Figure
12.2) and discussed in detail.
No advanced mathematical knowledge is required; the book is based on an
intuitive and engineering-like understanding of the processes involved in
parsing, rather than on the set manipulations used in practice.
'''


Tools:

An index:
---------
Catalog of Compiler Construction Tools
FREEWARE AND COMMERCIAL RESOURCES FOR COMPILER WRITERS
SEPTEMBER 1999
http://www.first.gmd.de/cogent/catalog/


ANTLR/PCCTS
-----------
http://www.antlr.org/
Solidly established, active user base. Recursive descent parsers, LL(k)
grammars.
'''
What's An ANTLR?
ANTLR, ANother Tool for Language Recognition, (formerly PCCTS) is a language
tool that provides a framework for constructing recognizers, compilers, and
translators from grammatical descriptions containing C++ or Java actions
[You can use PCCTS 1.xx to generate C-based parsers].
'''
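
For readers coming from REs, here is a minimal sketch of what "recursive
descent" means: one function per grammar rule, each deciding what to do by
looking at the next token. This is hand-written Python, not ANTLR output,
and the toy grammar and all names in it (tokenize, Parser, parse) are my own
assumptions for illustration. It handles nested parentheses, which is the
kind of input where plain REs usually give up.

```python
import re

# Hypothetical toy grammar (not taken from ANTLR or its documentation):
#   expr   : term (('+' | '-') term)*
#   term   : factor (('*' | '/') factor)*
#   factor : NUMBER | '(' expr ')'

TOKEN = re.compile(r"\s*(?:(\d+)|(.))")


def tokenize(text):
    """Split text into ('num', int) and ('op', char) tokens."""
    tokens = []
    for number, op in TOKEN.findall(text):
        if number:
            tokens.append(("num", int(number)))
        elif op.strip():  # skip stray whitespace matched by '.'
            tokens.append(("op", op))
    tokens.append(("end", None))
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def expect(self, kind, value=None):
        token = self.advance()
        if token[0] != kind or (value is not None and token[1] != value):
            raise SyntaxError(f"unexpected token {token!r}")
        return token

    def expr(self):
        value = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            op = self.advance()[1]
            right = self.term()
            value = value + right if op == "+" else value - right
        return value

    def term(self):
        value = self.factor()
        while self.peek() in (("op", "*"), ("op", "/")):
            op = self.advance()[1]
            right = self.factor()
            value = value * right if op == "*" else value / right
        return value

    def factor(self):
        kind, val = self.peek()
        if kind == "num":
            self.advance()
            return val
        self.expect("op", "(")
        value = self.expr()  # recursion handles arbitrary nesting
        self.expect("op", ")")
        return value


def parse(text):
    parser = Parser(text)
    result = parser.expr()
    parser.expect("end")  # reject trailing garbage
    return result
```

Calling parse("2 + 3 * (4 - 1)") evaluates the expression while parsing it.
A generator such as ANTLR produces code of roughly this shape from a grammar
file, so you do not write the functions by hand.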


Elegant
--------
http://www.research.philips.com/generalinfo/special/elegant/elegant.html
Almost unknown, but with a definite elegance in concept and implementation
that makes it worth checking out. Attribute grammars with lazy evaluation.
Outputs unfriendly but quite portable C code. Has relatively few users.
'''
What is Elegant
Elegant started as a compiler generator based on attributed grammars (the
name stands for Exploiting Lazy Evaluation for the Grammar Attributes of
Non-Terminals) and has grown into a full programming language. It has been
inspired by the abstraction mechanisms found in modern functional languages,
yet Elegant is an imperative language (Elegant has assignment).
Elegant is written in Elegant. (Beware of any language implementation not
written in that same language!) and has been used for internal use within
Philips for about 10 years now. In this period, dozens of compilers have
been built with Elegant. Elegant release 7 is distributed under the Gnu
Public License Agreement.
'''
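
To give a rough idea of lazily evaluated attributes, here is a small Python
sketch. It is not Elegant code and does not reflect how Elegant is
implemented; the classes Num and Add and the evaluations counter are
hypothetical, made up for this illustration. Each tree node carries a
"value" attribute that is computed only when first asked for, then cached.

```python
from functools import cached_property  # Python 3.8+


class Num:
    """Leaf node: its value attribute is derived from the token text."""

    def __init__(self, text):
        self.text = text

    @cached_property
    def value(self):
        return int(self.text)


class Add:
    """Inner node: its value attribute depends on its children's values."""

    def __init__(self, left, right):
        self.left, self.right = left, right
        self.evaluations = 0  # counts how often the attribute is computed

    @cached_property
    def value(self):
        self.evaluations += 1
        return self.left.value + self.right.value


# Building the tree computes nothing yet.
tree = Add(Num("1"), Add(Num("2"), Num("3")))
```

Nothing is evaluated until tree.value is read; after that the result is
cached, so reading it again does not recompute the subtree.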

Paislei
--------
http://www.qtj.net/~lpm/index.html
IDE for developing *and* testing compilers.
The grammar rules are specified as pseudo regular expressions. The docs are
pretty clear.
Outputs C++ classes that are not especially easy to work with, but you can
plug in your own code to implement the semantics, so it shouldn't matter.
'''
PAISLEI is an experimental system that allows you to design adaptive
recursive descent predicate grammars using the LPM notation.  To view
screenshots, click here.
Once you have graphically designed and debugged your grammar in PAISLEI and
have convinced yourself that it is behaving on test data as you wish it to,
you can generate a potentially cross-platform compatible C++ class (derived
from the LPM class CLpmGrammar) which will accept the same data as you
tested in the IDE.  Since the IDE uses the CLpmGrammar class internally
during debugging, your class will behave identically and accept the same
files.
The LPM language has several features which allow it to parse certain
"classic" difficult items in the grammar, rather than in the attached code.
The examples included with PAISLEI demonstrate this.
Both The PAISLEI IDE and LPM Cross Platform are currently offered completely
free of charge for all use, but the author reserves the exclusive right to
distribute the source.   What this means is that you absolutely must not
distribute PAISLEI or the LPM source files.  You hold the copyright on C++
grammar classes generated by PAISLEI, and on the contents of files in the
.LPM and .PSL formats.
'''







