Parsing speed (was: Matching templates against a tree - any ideas?)

Mike Fletcher mcfletch at vrtelecom.com
Thu Sep 23 21:04:18 EDT 1999


Hmm, speed is in the eye of the text getting parsed as well :) .  I use
"simple" grammars exclusively; I just use them on larger files ;) .

I often need to parse (and process) 2MB VRML files and need to be able to do
it as many as two or three times/minute (i.e. a designer makes multiple
changes/minute to a model and needs to see the result in the target system
before deciding what to do next).  mcf.pars and simpleparse are able to
handle those needs.  We work with a real-time, network-oriented language, so
people in the same domain (3D graphics) but a different field
(non-real-time graphics) might be looking at much larger files (where my
parsers still aren't fast enough).
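
A back-of-the-envelope sketch of what those figures imply, assuming the load
is steady and each run covers the whole file. The function names are
hypothetical, chosen only for illustration:

```python
def required_throughput_mb_per_s(file_mb, runs_per_minute):
    """Sustained parse+process rate needed to keep up, in MB/s."""
    return file_mb * runs_per_minute / 60.0


def seconds_per_run(runs_per_minute):
    """Time budget for one complete parse+process pass, in seconds."""
    return 60.0 / runs_per_minute


# Figures from the text: 2MB files, up to three runs per minute.
print(required_throughput_mb_per_s(2, 3))  # MB/s needed
print(seconds_per_run(3))                  # seconds available per run
```

With those inputs the budget comes to about 0.1 MB/s, or 20 seconds per
pass, including whatever processing follows the parse.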

I know many people who frequently parse 15MB XML files representing a good
chunk of structured database information.  There are even people working
with 2GB XML files (though they generally get yelled at when they tell
people that).  I won't even try 15MB with either of my systems (which are
strongly biased toward 100k to 5MB file sizes)... without a
highly-optimising stream-oriented parser/processor scheme the 2GB range is
way out of reach.
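
To show what "stream-oriented" can look like in practice, here is a minimal
sketch using the standard library's xml.etree.ElementTree.iterparse. It is
not how mcf.pars or simpleparse work; the "record" tag name and the
count_records helper are assumptions made up for the example:

```python
import io
import xml.etree.ElementTree as ET


def count_records(source, tag="record"):
    """Count <tag> elements while reading the document incrementally."""
    count = 0
    # iterparse yields each element as its end tag is seen, so the
    # parser never needs the whole file before work can start.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
            # Drop the element's children, text and attributes once it
            # has been handled, so memory use stays small.
            elem.clear()
    return count


# Tiny in-memory stand-in for a large file (hypothetical data).
sample = io.BytesIO(b"<db><record id='1'/><record id='2'/><record id='3'/></db>")
print(count_records(sample))
```

The point of the design is that each record is processed and discarded as
it arrives, rather than building a full tree first. The cleared elements
stay attached to the root as empty shells, so a very large file would need
further pruning than this sketch does.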

Which is to say, for some tasks, such as reading and intelligently examining
a few hundred lines of high-level text, Earley is blazingly fast (sort of).
For others, such as reading MBs of highly-structured (easily parsed) data,
it can be painful.

Enjoy,
Mike

-----Original Message-----
From: python-list-request at cwi.nl [mailto:python-list-request at cwi.nl] On
Behalf Of Phil Hunt
Sent: September 23, 1999 6:02 PM
To: python-list at cwi.nl
Subject: RE: Matching templates against a tree - any ideas?
...
People say Earley parsing is slow, but my experience is that it is
fast enough. My Parrot program uses a modified version of John
Aycock's framework (the modifications slow it down, by remembering
the line and column numbers of the start of each token), and it
can read in a 150 line input file, parse it, process it, generate code
from it, and save the results to disk, in 1.5 seconds. (This is
on a cheapish PC with a 300 MHz AMD K6-2 processor).

I guess it depends on the grammar you are parsing.
...




