[XML-SIG] qp_xml

Andy Robinson andy@reportlab.com
Fri, 11 Aug 2000 08:24:50 -0700 (PDT)


> PyExpat has the additional cost of having to cross the C->Python line
> all of the time. Still, I haven't heard that anyone has made a
> reasonably complete pure-Python parser that is as fast as PyExpat. I
> don't know anything about Aaron's.

That comment of Robin's wasn't supposed to have leaked
yet!

The cat is out of the bag, so here's what is
happening:  ReportLab (me, Robin and Aaron) have been
looking at all the parsers around and trying to figure
out a natural way to map XML to Python object models
for a whole load of customer projects.  So we wanted
the easiest way to get a tree structure, without
caring if it was DOM or not.

Aaron sat down to write a simple recursive-descent
parser using string.find and nothing else, which
outputs a tree of dictionaries.  It handles tags,
text, CDATA and very little else.  This was mostly a
learning exercise and took half a day.  Amazingly, it
parses Hamlet at about the same speed as qp_xml.  We
reckon this is because essentially the same thing is
going on: C code (string.find) grabs the next token,
then control returns to Python to do something with
it.  We've found in the past that C extensions don't
give much of a speedup when you make lots of little
calls to them.

Don't get too excited, as there are probably a whole
bunch of occasional cases it doesn't handle yet, and
handling them may slow it down.  It may not be
"reasonably complete" by your definition - unlike
PyExpat, which is extremely well proven.

It should hopefully get released in a week or two, but
there's some more to do first.


- Andy


=====
Andy Robinson
ReportLab, Inc.
