Stackless & String-processing

Fernando Pereira pereira at research.att.com
Sat Jul 17 18:35:33 EDT 1999


In article <7mqaht$r27$2 at cronkite.cc.uga.edu>, Graham Matthews
<graham at sloth.math.uga.edu> wrote:

> <graham at sloth.math.uga.edu> wrote:
> : > But the class of 
> : > languages recognisable by automaton doesn't cover all the languages of
> : > interest (in fact my guess is it covers very few ... context ...).
> Fernando Pereira (pereira at research.att.com) wrote:
> : In what sense of "very few"? Most programming languages are designed to
> : have an LR(k) or LL(k) backbone, and regular lexical structure 
> 
> So the regular expressions help you with the regular lexical structure.
> But they don't help with the LR(n) bit. They also don't help with context
> dependencies. That leaves out a lot of languages.
By "automaton" you seem to have misunderstood "finite-state automaton."
I was referring more generally to determinizable classes of automata,
eg. determinizable PDAs. Hence my reference to LR and LL constructions,
which effectively attempt to determinize PDAs (and fail with conflicts
if they cannot). 
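For the regular case, "determinization" is just the textbook subset
construction. A minimal sketch in Haskell (my own types and names, not
any particular library):

    import qualified Data.Set as Set
    import Data.Set (Set)

    -- An NFA given by its start states, transition relation, and
    -- accepting predicate.
    data NFA s c = NFA
      { start  :: Set s
      , step   :: s -> c -> Set s   -- all states reachable on one symbol
      , accept :: s -> Bool
      }

    -- Subset construction: track the *set* of live NFA states; the
    -- reachable sets are exactly the states of the equivalent DFA.
    dstep :: Ord s => NFA s c -> Set s -> c -> Set s
    dstep nfa ss c = Set.unions [ step nfa s c | s <- Set.toList ss ]

    runDFA :: Ord s => NFA s c -> [c] -> Bool
    runDFA nfa = any (accept nfa) . Set.toList . foldl (dstep nfa) (start nfa)

The LR and LL constructions play the same game over stack automata,
which is exactly why they can fail: not every PDA is determinizable.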
> 
> <graham at sloth.math.uga.edu> wrote:
> : > And then
> : > there are software engineering issues ...
> Fernando Pereira (pereira at research.att.com) wrote:
> : Agreed. However, more traditional lexer- and parser-generators had
> : sensible answers to many of those issues, answers that might not be as
> : elegant as monadic approaches but that paid serious attention to
> : optimization in the regular and CF backbones. 
> 
> I don't really see why using parser combinators means that you cannot
> optimise. You are not precluded from using regular expression parsers
> as primitive parsers, nor are you precluded from using LALR(n) parsers
> as primitive parsers. Hence you can optimise those parts of your
> parsing with automata if you want. So I don't really understand your
> criticism.
The monadic approach mixes up the search structure for pattern matching
or parsing (what is elsewhere represented as an automaton) with other
data processing. Unless your optimizer can solve the halting problem,
it will not be able to determinize otherwise determinizable control
structures because it cannot figure out the data dependencies.
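To see what the optimizer is up against, here is a monadic parser
stripped to essentials (my own toy definitions, not any particular
combinator library):

    newtype Parser a = Parser { runParser :: String -> [(a, String)] }

    item :: Parser Char
    item = Parser $ \s -> case s of { (c:cs) -> [(c, cs)]; [] -> [] }

    instance Functor Parser where
      fmap f p = Parser $ \s -> [ (f a, r) | (a, r) <- runParser p s ]

    instance Applicative Parser where
      pure a    = Parser $ \s -> [(a, s)]
      pf <*> pa = Parser $ \s ->
        [ (f a, r') | (f, r) <- runParser pf s, (a, r') <- runParser pa r ]

    -- Sequencing feeds each intermediate result to an arbitrary
    -- function before parsing continues.
    instance Monad Parser where
      p >>= f = Parser $ \s ->
        concat [ runParser (f a) r | (a, r) <- runParser p s ]

    -- Alternation simply tries both branches.
    (<|>) :: Parser a -> Parser a -> Parser a
    p <|> q = Parser $ \s -> runParser p s ++ runParser q s

The grammar is no longer data: the control structure an optimizer
would need to inspect is hidden inside the closures passed to >>=, so
there is no automaton left to determinize.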
> 
> Also as I said parser combinators are really more of a sequencing 
> technique -- one that applies nicely to parsing.
> 
Sure, but so what? Automata, too, are a sequencing technique, usable
for many other things besides pattern matching and parsing. The
fundamental difference is that automata-theoretic approaches neatly
separate the representation of sequences from what one wants to do with
the parts of those sequences, allowing optimizations that are difficult
to obtain within general-purpose programs.
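Compare the automaton style, where the matching structure is inert data
that can be composed, minimized, or compiled without ever inspecting
the semantic actions (again a sketch with invented names):

    -- A scanner as pure data: a transition table plus, separately,
    -- an action for each accepting state.
    data Lexer tok = Lexer
      { trans   :: Int -> Char -> Maybe Int   -- transition table
      , actions :: [(Int, String -> tok)]     -- accepting state to action
      }

    -- The action runs only after the automaton half has done all the
    -- matching; nothing about the actions constrains the table.
    runLexer :: Lexer tok -> String -> Maybe tok
    runLexer lx = go 0 ""
      where
        go st seen (c:cs) | Just st' <- trans lx st c = go st' (seen ++ [c]) cs
        go st seen _      = ($ seen) <$> lookup st (actions lx)

Everything interesting about the input language lives in trans, which a
tool can analyze; everything interesting about the output lives in
actions, which it never needs to look at.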
> Fernando Pereira (pereira at research.att.com) wrote:
> : By the way, I find it faintly amusing
> : to see the functional programming community rediscovering and dressing
> : in categorial garb a simple idea that was a main motivator for the
> : development of logic programming in the early 70s (see
> : <http://akpublic.research.att.com:9000/~pereira/bib.html#Pereira+Warren-
> : 80:DCGs> for a survey). 
> 
> Yes, combinators are much like definite clause grammars in Prolog. But
> if I remember correctly from my use of DCGs, they gave you a fixed set
> of combinators?
It's easy enough to define one's own combinators, but not as elegantly
as in a higher-order functional language. 
> Moreover they were limited to string parsing. 
Not true. There's nothing in DCGs that ties them to strings. The
underlying programming pattern (often called "difference lists") is in
fact a general accumulator pattern that has been used in many other
applications, e.g. code generation.
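In Haskell the same accumulator trick is the familiar "difference
list": a list represented by the function that prepends it, so that
appending is composition. A sketch of the code-generation use, with
expression and instruction types invented for the example:

    -- A list represented as "what it prepends"; append is (.), O(1).
    type DList a = [a] -> [a]

    data Expr  = Lit Int | Add Expr Expr
    data Instr = Push Int | AddOp
      deriving Show

    -- Each node contributes its instructions between the two list
    -- "ends" threaded through, exactly the DCG accumulator pattern.
    emit :: Expr -> DList Instr
    emit (Lit n)   = (Push n :)
    emit (Add a b) = emit a . emit b . (AddOp :)

    compile :: Expr -> [Instr]
    compile e = emit e []
    -- compile (Add (Lit 1) (Lit 2)) == [Push 1, Push 2, AddOp]

Nothing in the pattern mentions strings; the accumulator can carry
whatever is being built up.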
> Fernando Pereira (pereira at research.att.com) wrote:
> : It *is* a good idea for rapid prototyping, but
> : interspersing pattern matching/parsing with arbitrary computations
> : (even purely functional ones) effectively blocks optimization, which is
>                                 -------------------------------
> 
> Can you explain why? I don't see it, since you can have reg expr parsers
> and other such fast parsers as primitive parsers that you combine ...
Either you put all the interesting stuff in separate primitive parsers,
in which case I'm not sure what the point of the monadic approach is,
or you combine parsing and other data processing more tightly, which
exploits the monadic approach to achieve greater conciseness, but then
the parser is hard to optimize because the other data processing is in
the way of the necessary analyses. Focusing on determinization (a
crucial part of the regular and LR constructions), you need to
determine whether two alternative computation paths that consume the
same input sequence can be merged into a single path. In general this
is undecidable, since it requires figuring out whether two arbitrary
functions (those associated with the two paths) produce the same
output. Now, it might be interesting to see whether compiler
optimization and code-rewriting techniques could be used to recognize
possible optimizations, but the general point stands: entangling
pattern matching or parsing with general computation makes optimization
very difficult.
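To make the merging problem concrete, a toy sketch (again my own
definitions, in the same style as above):

    newtype P a = P { runP :: String -> [(a, String)] }

    lit :: String -> a -> P a
    lit w x = P $ \s -> [ (x, drop (length w) s) | take (length w) s == w ]

    alt :: P a -> P a -> P a
    alt p q = P $ \s -> runP p s ++ runP q s

    -- Both alternatives consume exactly the same input, "ab", so a
    -- determinizer would like to merge them into a single path. That
    -- is sound only if the embedded computations produce the same
    -- output, i.e. only if f 1 == g 2, and deciding that for
    -- arbitrary functions is undecidable.
    example :: (Int -> Int) -> (Int -> Int) -> P Int
    example f g = lit "ab" (f 1) `alt` lit "ab" (g 2)

A regular-expression or LR compiler merges such paths routinely,
because in its input the alternatives carry no embedded computation to
worry about.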

-- F



