Stackless & String-processing

Tim Peters tim_one at email.msn.com
Thu Jul 15 03:00:22 EDT 1999


[Neel Krishnaswami]
> ...
> I've been looking at Icon, and it occurs to me that if coroutines and
> generators were available at the Python level, it might yield a way of
> doing string processing that is more "Pythonic" than regexps.

Yes, and you can find much more about this in the very early days of the
String SIG archives.

> Regexps are nice because when you have a pattern that they can
> directly represent, then you can simply specify a pattern and then you
> don't have to worry about the tedious bookkeeping of looping over the
> string and keeping track of the state variable.
>
> However, when you need to match a pattern that a regexp can't match,
> then suddenly you need to break the string into tokens with regexps
> and then loop over the pieces and keep track of a bunch of state
> variables that don't necessarily correspond to the pieces you are
> actually interested in.
>
> This is unpleasant, because a) clumsy bookkeeping is bad, and b)
> there's two different sorts of syntax needed to do basically
> similar tasks.

The latter is one of the arguments Griswold gave in the paper that
introduced Icon, contrasting Icon's uniform approach to SNOBOL4's distinct
pattern and procedural sublanguages.

I don't think he'd make the same argument today!  Icon is an idiosyncratic
delight, but most string pattern matching was in fact easier (to write,
modify, or read) in SNOBOL4.  Once you start using Icon for complex pattern
matching, you'll soon discover that it's very hard to go back to old code
and disentangle "the pattern" from "the processing" -- *because* everything
looks the same, and it's all intermixed.  The distinct sublanguages in
SNOBOL4 enforced a clean separation; and that was more often an aid than a
burden.

Have noted before that writing string matching code in Icon feels a lot like
writing in an assembly language for SNOBOL4.  Of course the latter didn't
have Icon's power in other areas (e.g. it's at least possible to program
structural pattern-matchers in Icon, using generators to match e.g. tree
patterns; SNOBOL4's patterns started & ended with strings).

> If we could compose generators just like functions, then the bookkeeping
> can be abstracted away and the same approach will work for arbitrarily
> complicated parsing tasks.

The real advantage of regexps is speed, & that's probably while they'll
always be much more popular.  SNOBOL4 and Icon didn't define "bal" builtins
because you couldn't code that pattern yourself <wink>.  bal is beyond a
regexp's abilities, but it's still a simple kind of pattern, and just runs
too darned slow if implemented via the recursive definition as run with the
general backtracking machinery.

> So I think it would be nice if these lovely toys were available at
> the Python level. Is this a possibility?

I've been bugging Guido about generators since '91, but for the billion and
one uses other than string matching.  Anything is possible -- except that
I'll ever stop bugging him <wink>.

> (I'd love to be able to define coroutines in Python like so:
>
> def evens(z):
>     for elt in z:
>         if z % 2 == 0:
>             suspend(z)

How do you intend to resume it?

> It would probably require some rethinking of Python's iteration
> protocol though.)

That's not a hangup; Guido already knows how to do it cleanly.  Like any Big
New Feature there are real questions about costs vs benefits, and so far
this stuff has been widely viewed as "too esoteric".  I think it's natural
as breathing, to the point that a *non*-resumable function strikes me as
never inhaling <wink>.

For now, there's a working implementation of a generator protocol (somewhat
more flexible than Icon's) in the source distribution, under
Demo/threads/Generator.py.  I also posted a general (thread-based) coroutine
implementation a bit over 5 years ago.  Building these on Christian's
stackless Python instead should run much faster.

BTW, generators are much easier to implement than coroutines -- the former
don't require threads or stacklessness or continuations under the covers,
just some relatively straightforward changes to Python's current
implementation (Steven Majewski noted this 5 years ago).  This is why
generators work in all ports of Icon, but coroutines are an optional feature
that's supported only if a platform guru has taken the time to write the
platform-dependent context-switching assembly code Icon coexps require.

degeneratoredly y'rs  - tim






More information about the Python-list mailing list