Stackless & String-processing

Neel Krishnaswami neelk at brick.cswv.com
Thu Jul 22 21:46:26 EDT 1999


In article <3795F4E4.8330755F at appliedbiometrics.com>,
Christian Tismer  <tismer at appliedbiometrics.com> wrote:
>
>I believe this thread would not have evolved if I had not
>begun the Stackless Python / continuation/coroutine/generator
>talk. Now I'm asking to excuse my limited understanding.
>
>Can you please be all a little more concrete?
>How would your patterns look in Python?
>Please show some small example code pieces.

I am working through the Hutton/Meijer paper, and (slowly!)
implementing the parsers they describe. The important part isn't so
much the "monadic" part as the "combinator" part. The idea is that
you write small primitive parsers, and then write higher-order
functions to weave those little parsers into a bigger parser. It's a
nice way to write recursive descent parsers without sinking into
spaghetti code.

(This is kind of hard to do right in Python, because the lack of
lexical scoping and of easy-to-use metaclasses makes it trickier to
mimic the combinators they describe exactly. But it's still a fun
project.)
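To make the combinator idea concrete, here is a rough sketch of a few
of the primitives, written as plain functions in a Python with proper
closures (in today's Python you'd need the default-argument trick I
alluded to above; the names follow the paper's roughly, but take them
as my own approximations). A parser here is a function from an input
string to a list of (value, remaining-input) pairs, with the empty
list meaning failure:

```python
# A parser :: string -> list of (value, remaining-input) pairs.
# The empty list signals that the parse failed.

def result(v):
    """Primitive: consume nothing, succeed with value v."""
    return lambda inp: [(v, inp)]

def item(inp):
    """Primitive: consume the first character, if there is one."""
    return [(inp[0], inp[1:])] if inp else []

def plus(p, q):
    """Combinator: try both parsers, concatenating their parse lists."""
    return lambda inp: p(inp) + q(inp)

def sat(pred):
    """Combinator: accept a single character satisfying pred."""
    def parser(inp):
        return [(c, rest) for (c, rest) in item(inp) if pred(c)]
    return parser

digit = sat(str.isdigit)
```

So `digit("42")` gives `[("4", "2")]` -- one successful parse,
consuming one character -- while `digit("x")` gives `[]`.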

One of the consequences of this approach is that outputting all the
parses for an ambiguous grammar as a stream of parse trees is very
straightforward. In Haskell or Gofer, you could rely on the laziness
of the language to avoid unnecessary computations, but in Python
it would be not-fun to wait for Python to compute the whole list. 

Here's a (simple, rough-draft) combinator I've written:

class Bind:
    # Combinator: feed each parse of parser1 into func, which builds
    # the parser to run on the remaining input.
    def __init__(self, parser1, func):
        self.parser1 = parser1
        self.func = func # func(v :: AST) -> Parser; e.g. Result
    def __call__(self, input):
        parse_list = self.parser1(input)
        lst = []
        for ast, s in parse_list:
            for elt in self.func(ast)(s):
                lst.append( elt ) # <-- a generator would help a lot here!
        return lst

I would love to rewrite the __call__ method so that it returns a
generator that returns parses one at a time rather than returning a
list. Note that the combinator approach magnifies the benefits of
generators, since parsers are normally built through composition, and
each sub-parser can potentially generate a sequence of parses.
Laziness is a real win here. 
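If Python did grow generator support, the rewrite would be tiny.
Here's a sketch of what I have in mind (the class name `LazyBind` and
the `yield` syntax are hypothetical -- this assumes a Python where a
function containing `yield` produces its results one at a time on
demand):

```python
class LazyBind:
    # Hypothetical generator-based Bind: same composition as above,
    # but parses are produced one at a time instead of all at once.
    def __init__(self, parser1, func):
        self.parser1 = parser1
        self.func = func  # func(v :: AST) -> Parser
    def __call__(self, input):
        for ast, s in self.parser1(input):
            for elt in self.func(ast)(s):
                yield elt  # hand back one parse, then suspend
```

No intermediate list is ever built, so a caller who only wants the
first parse of an ambiguous grammar pays only for the first parse.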

Also, it might make sense to add an xreadlines() method to file
objects, which returns the lines of a file one at a time, because then
you can write code in the natural style of:

for line in file.xreadlines():
    blah_blah_blah(line)

without having to read the whole file into memory. (It would
also likely eliminate about 80% of the push for assignment
expressions.)


Neel
