Matching templates against a tree - any ideas?

Tim Peters tim_one at email.msn.com
Sun Oct 3 17:01:15 EDT 1999


[Phil Hunt]
> People say Earley parsing is slow, but my experience is that it is
> fast enough. My Parrot program uses a modified version of John
> Aycock's framework (the modifications slow it down, by remembering
> the line and column numbers of the start of each token), and it
> can read in a 150 line input file, parse it, process it, generate code
> from it, and save the results to disk, in 1.5 seconds. (This is
> on a cheapish PC with a 300 MHz AMD K6-2 processor).
>
> I guess it depends on the grammar you are parsing.

It's "slow" compared to the best methods known for restricted grammars,
whether in an expected-case sense or worst-case.  It *can* take time cubic
in the length of the input, and consume memory quadratic in that length.
The best methods known for suitably friendly grammars (like, say, Python's
or C's) take worst-case=expected-case linear time in the length of the
input, and require constant working space regardless of input length.  The
generality of the Earley algorithm also requires use of fancier supporting
data structures, which even if all else were equal would take longer to
create and manipulate.  Write it all in Python, and use re's NFA regexps for
tokenization, and, ya, it's s-l-o-w <wink>.  That's relative to what's
possible.
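To make that concrete, here's a toy Earley recognizer -- a from-scratch sketch, *not* Aycock's framework -- run against the maximally ambiguous grammar S -> S S | 'a'.  The per-position state sets ("charts") are exactly the fancier supporting data structures in question; on an ambiguous grammar like this one, the completion step is what drives the algorithm toward its cubic bound.

```python
# Toy Earley recognizer (illustrative sketch only; grammar and
# representation are made up for this example, not from SPARK).
# A state is (lhs, rhs, dot, origin): a production, how far into
# its right side we've matched, and the chart where it started.

GRAMMAR = {"S": [("S", "S"), ("a",)]}   # S -> S S | 'a' (ambiguous)
START = "S"

def earley_recognize(tokens, grammar=GRAMMAR, start=START):
    n = len(tokens)
    chart = [set() for _ in range(n + 1)]   # one state set per position
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))
    for i in range(n + 1):
        # Run predict/complete to a fixed point over chart[i].
        changed = True
        while changed:
            changed = False
            for lhs, rhs, dot, origin in list(chart[i]):
                if dot < len(rhs) and rhs[dot] in grammar:
                    # Predict: expand the nonterminal after the dot.
                    for prod in grammar[rhs[dot]]:
                        new = (rhs[dot], prod, 0, i)
                        if new not in chart[i]:
                            chart[i].add(new)
                            changed = True
                elif dot == len(rhs):
                    # Complete: advance every state that was waiting
                    # on this lhs back in the origin chart.
                    for l2, r2, d2, o2 in list(chart[origin]):
                        if d2 < len(r2) and r2[d2] == lhs:
                            new = (l2, r2, d2 + 1, o2)
                            if new not in chart[i]:
                                chart[i].add(new)
                                changed = True
        if i < n:
            # Scan: shift states whose dot sits on the next token.
            for lhs, rhs, dot, origin in chart[i]:
                if dot < len(rhs) and rhs[dot] == tokens[i]:
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[n])
```

An LR table, by contrast, is just an array indexed by (state, symbol) -- constant-time to consult, nothing to search; the set-of-states bookkeeping above is the overhead the generality buys you, before Python's own constant factors even enter into it.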

100 lines a second is fine for infrequent processing of small files.  If,
like MikeF, you're parsing multiple megabytes routinely, or even if you're
just running a compiler, it would be intolerably slow.
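The arithmetic is brutal.  Assuming a (made-up but plausible) average of 40 bytes per source line:

```python
# Back-of-the-envelope: what 100 lines/second means at scale.
# bytes_per_line is an assumed average, not a measurement.
bytes_per_line = 40
lines_per_sec = 100          # Phil's 150 lines / 1.5 seconds

megabytes = 5
lines = megabytes * 1024 * 1024 // bytes_per_line   # 131,072 lines
seconds = lines / lines_per_sec                     # ~1311 seconds
print("%d lines -> %.0f s (~%.0f minutes)"
      % (lines, seconds, seconds / 60.0))
```

Twenty-odd minutes for a 5-megabyte input -- nobody sits still for that between edit and run.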

just-another-tradeoff-ly y'rs  - tim

More information about the Python-list mailing list