[XML-SIG] I am confused...
Uche Ogbuji
uche.ogbuji@fourthought.com
Mon, 29 Jan 2001 13:39:48 -0700
> > I remember I was doing queries in the form
> > "/article/author/name"
> > - and it was so slow... (0.5 - 1 sec per query on Celeron 400)
>
> What kind of API did you use? For simple queries like this, a SAX
> ContentHandler may be sufficient. Using Uche's bigxml file, you can
> try
>
> import time
> import xml.sax
>
> class NameRetriever(xml.sax.ContentHandler):
>     def __init__(self):
>         self.authors = []
>         self.in_author = self.in_name = 0
>
>     def startElement(self, tag, attrs):
>         if tag == "author":
>             self.in_author = 1
>         elif self.in_author and tag == "name":
>             # Start collecting text for a <name> inside an <author>
>             self.in_name = 1
>             self.txt = ""
>
>     def characters(self, content):
>         if self.in_name:
>             self.txt = self.txt + content
>
>     def endElement(self, tag):
>         if self.in_name and tag == "name":
>             self.authors.append(self.txt)
>             self.in_name = 0
>         elif self.in_author and tag == "author":
>             self.in_author = 0
>
> h = NameRetriever()
> start = time.time()
> xml.sax.parse("bigxml", handler=h)
> end = time.time()
> print(end - start)
> print(len(h.authors))
This one needs to go into the XML HOWTO as an example. We now have an XPath
and SAX approach. It would be easy to add a DOM approach. I'll try to do it
with the extra 3 hours the Devil offered me today in exchange for the pinkie
fingernail of my soul.
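
For reference, a DOM version might look roughly like the sketch below. It
assumes the standard-library xml.dom.minidom rather than 4DOM or cDomlette,
and the inline SAMPLE document and the author_names() helper are made up for
illustration; they are not part of the bigxml test. Unlike the SAX handler,
it only picks up <name> elements that are direct children of an <author>,
which is closer to what the XPath query asks for.

```python
import xml.dom.minidom

# Hypothetical stand-in for the real test file.
SAMPLE = """<article>
  <author><name>Alice</name></author>
  <author><name>Bob</name></author>
  <title><name>not an author</name></title>
</article>"""

def author_names(document):
    """Collect the text of every <name> that is a direct child of an <author>."""
    names = []
    for author in document.getElementsByTagName("author"):
        for child in author.childNodes:
            if child.nodeType == child.ELEMENT_NODE and child.tagName == "name":
                # Join the text nodes sitting directly under <name>.
                text = "".join(n.data for n in child.childNodes
                               if n.nodeType == n.TEXT_NODE)
                names.append(text)
    return names

doc = xml.dom.minidom.parseString(SAMPLE)
print(author_names(doc))
```

The whole tree is built in memory before the search starts, so on a large
file this is likely to cost more than the SAX pass, which never holds more
than the current name.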
> To my own surprise, this is not as fast as the cDomlette; probably
> because the latter links directly with expat, and thus avoids a number
> of indirections. Still, it takes only three times as long (0.5s vs
> 1.4s on my machine), and it will work on any Python 2.0 installation.
Cool! I must confess that I would have guessed that SAX was close to
cDomlette. Yes, PySAX does add quite a bit of overhead (which was one of the
motivations for the PyExpat reader and cDomlette), but I would have thought
that the integration of the processing with the parsing would make up the
advantage.
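
One way to see how much of the cost comes from the SAX layer is to hook the
same logic straight onto expat's callbacks. This is only a sketch using the
standard xml.parsers.expat module; it is not how cDomlette is implemented,
and author_names_expat() is a hypothetical name used here for illustration.

```python
import xml.parsers.expat

def author_names_expat(data):
    """Same state machine as the SAX handler, driven directly by expat."""
    names = []
    state = {"in_author": False, "in_name": False, "text": []}

    def start(tag, attrs):
        if tag == "author":
            state["in_author"] = True
        elif state["in_author"] and tag == "name":
            state["in_name"] = True
            state["text"] = []

    def chars(content):
        # expat may deliver one run of text in several pieces.
        if state["in_name"]:
            state["text"].append(content)

    def end(tag):
        if state["in_name"] and tag == "name":
            names.append("".join(state["text"]))
            state["in_name"] = False
        elif state["in_author"] and tag == "author":
            state["in_author"] = False

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(data, True)  # True marks this as the final chunk
    return names

print(author_names_expat(
    "<article><author><name>Alice</name></author></article>"))
```

Timing this against the SAX handler on the same file would show the overhead
of the extra dispatch layer on a given machine; no figures are claimed here.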
Looks as if we might want to consider expanding cDomlette into a full-blown
mutable DOM, though Mike and I are still discussing the best internal data
structures.
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python