SnoPy 0.1 - SNOBOL pattern matching for Python

Tim Peters tim.one at home.com
Sun Dec 23 14:32:09 EST 2001


[Paul Rubin]
> Wow, cool.  A lot of SNOBOL patterns can be transformed into regular
> expressions.  However, a lot of them can't be.  Regular expressions
> have caught on as a pattern language because they can be interpreted
> very efficiently.

But few of the popular "regular expression" packages restrict themselves to
the theoretical notion of "regular expression" that *can* be matched very
efficiently.  Perl/Python regexps are not "regular" in the efficient
theoretical sense, and the regexp engines underlying Perl and Python are as
general as SNOBOL4's pattern-matching engine (they're general backtracking
engines, not linear-time DFAs).  They're also just as slow in some cases.
For example, try

def sloooow(m, n):
    import re
    re.match('.*x' * m + 'y', 'x' * n) # can't match, but sure tries <wink>

in Python for various values of m and n.  Say, sloooow(8, 50).  That's a
theoretically pure regular expression, and *can* be matched very
efficiently, but the regexp engines don't use the theoretical DFA approach
because it's inadequate to deal with non-regular gimmicks like
backreferences.  So, in the end, non-regular regexps give a bizarre
pattern-matching language that's also slow and feeble.  I've been trying for
a decade to account for their popularity anyway, but have given up.

> Completely general SNOBOL patterns can be extremely slow to interpret.

The simple regexp above takes exponential time in Python/Perl, same as in a
SNOBOL4 rewrite, and for the same reason.

> When SNOBOL was invented, these things weren't understood as well as they
> are now.

All the relevant portions of finite automata theory were well-known then.
The fascination with trying to *use* regexps started (AFAICT) with Unix, and
they have a certain seduction in that very simple patterns are very easy to
express.  The SNOBOL4 folks knew beforehand that (theoretical) regexps were
too feeble for the tasks they wanted to tackle, so didn't even bother with
them.

> The successor to SNOBOL is called ICON: http://www.cs.arizona.edu/icon
> Python draws some ideas from Icon and looking at Icon will probably
> be interesting to Python fans, even if they don't take up programming
> in it.

s/ICON/Icon/ (one *pleasant* use of regexps <wink>), and I fully agree.
SNOBOL4 remains a superior pattern-matching language, though (Icon met its
design goal of "opening up" the SNOBOL4 pattern engine and integrating
goal-directed bactracking into the semantics of the language, but in doing
so it lost the economy of expression afforded by SNOBOL4's distinct
pattern-matching sublanguage; there isn't a SNOBOL4 pattern you can't write
an Icon program to emulate, but it *feels* like emulation, not direct
specification, in Icon -- in Icon you have to spell out more of "how to do
it", not just "what it is".  That's both a strength and a weakness, of
course, but more of a weakness for casual pattern users.).

pattern-matching-is-hard-ly y'rs  - tim





More information about the Python-list mailing list