[XML-SIG] I am confused...

Martin v. Loewis martin@mira.cs.tu-berlin.de
Sun, 28 Jan 2001 23:05:11 +0100


> I remember I was doing queries in the form
> "/article/author/name"
> - and it was so slow... (0.5 - 1 sec per query on Celeron 400)

What kind of API did you use? For simple queries like this, a SAX
ContentHandler may be sufficient. Using Uche's bigxml file, you can
try:

import time
import xml.sax
class NameRetriever(xml.sax.ContentHandler):
    def __init__(self):
        self.authors = []
        self.in_author = self.in_name = 0

    def startElement(self, tag, attrs):
        if tag=="author":
            self.in_author = 1
        else:
            if self.in_author and tag == "name":
                self.in_name = 1
                self.txt = ""

    def characters(self, content):
        # may be called several times for one text node, so accumulate
        if self.in_name:
            self.txt = self.txt + content

    def endElement(self,tag):
        if self.in_name and tag=="name":
            self.authors.append(self.txt)
            self.in_name=0
        elif self.in_author and tag=="author":
            self.in_author=0

h = NameRetriever()
start = time.time()
xml.sax.parse("bigxml", handler=h)
end = time.time()
print(end - start)
print(len(h.authors))

To my own surprise, this is not as fast as the cDomlette, probably
because the latter links directly with expat and thus avoids a number
of indirections. Still, it takes only about three times as long (1.4s
for SAX vs 0.5s for cDomlette on my machine), and it will work on any
Python 2.0 installation.
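
If you need more than one fixed query, the same idea can be made a
little more general by tracking the stack of open elements and
comparing it with the path. The following is only a sketch: the class
name PathTextCollector and the SAMPLE document are made up for
illustration, and it handles plain child paths only, not real XPath.

```python
import xml.sax

class PathTextCollector(xml.sax.ContentHandler):
    """Collect the text of every element whose path equals target_path."""

    def __init__(self, target_path):
        xml.sax.ContentHandler.__init__(self)
        self.target = target_path.strip("/").split("/")
        self.stack = []      # names of the currently open elements
        self.matches = []    # text of each matching element
        self.buffer = None   # list of text chunks while inside a match

    def startElement(self, tag, attrs):
        self.stack.append(tag)
        if self.stack == self.target:
            self.buffer = []

    def characters(self, content):
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, tag):
        # only the matching element itself closes the buffer,
        # not elements nested inside it
        if self.buffer is not None and self.stack == self.target:
            self.matches.append("".join(self.buffer))
            self.buffer = None
        self.stack.pop()

# Hypothetical stand-in for a real input file.
SAMPLE = b"""<article>
  <author><name>Ann</name></author>
  <author><name>Bob</name></author>
  <editor><name>Eve</name></editor>
</article>"""

collector = PathTextCollector("/article/author/name")
xml.sax.parseString(SAMPLE, collector)
print(collector.matches)
```

Here collector.matches should end up holding the two author names and
skip the editor, since /article/editor/name does not match the path.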

> Please, tell me if I did it wrong:
> 
> - parsed xml-file
> - queried each variable in a template-file from the xml-file
> - filled template with values found to produce web-page
>   (some variables go to other pages, for example, content page)

In general, that is ok - except that the description is imprecise. How
did you parse? How did you query? How did you fill the template?
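
For reference, here is one way the scheme you describe could look if
the file is parsed only once and the resulting tree is reused for every
template variable. This is a sketch under assumptions, not a guess at
what you actually did: it uses xml.etree.ElementTree and
string.Template, which come with later Python releases rather than 2.0,
and the SAMPLE document, the QUERIES mapping and the PAGE template are
invented for the example.

```python
import xml.etree.ElementTree as ET
from string import Template

# Hypothetical input document.
SAMPLE = """<article>
  <title>Fast XML</title>
  <author><name>Ann</name></author>
</article>"""

# Hypothetical mapping from template variable to a path below the root.
QUERIES = {"title": "title", "author": "author/name"}

# Hypothetical page template.
PAGE = Template("<h1>$title</h1><p>by $author</p>")

root = ET.fromstring(SAMPLE)          # parse once

values = {}
for var, path in QUERIES.items():     # one query per template variable
    values[var] = root.findtext(path, default="")

html = PAGE.substitute(values)        # fill the template
print(html)
```

The point of the sketch is only that the parse happens once; if your
code re-parsed the file for each variable, that alone could explain
much of the time per query.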

> Anyway, before claiming XML tools for Python slow I need to recheck
> with new versions - if there are no objections to the above
> scheme. (And what is the preferable tool for queries?  XPath?)

It depends. A SAX ContentHandler may do in many cases - although it is
apparently not necessarily faster than XPath over a fast DOM
implementation.
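
For comparison, the tree-based version of the same query might look
like the sketch below. Note the assumptions: it uses
xml.etree.ElementTree, which is not part of Python 2.0 (it entered the
standard library in 2.5) and supports only a subset of XPath, so it is
a stand-in for an XPath engine over a DOM, not the cDomlette itself.
The SAMPLE document is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document.
SAMPLE = """<article>
  <author><name>Ann</name></author>
  <author><name>Bob</name></author>
  <editor><name>Eve</name></editor>
</article>"""

root = ET.fromstring(SAMPLE)   # root is the <article> element

# The path is relative to the root, so "/article/author/name"
# becomes "author/name" here.
names = [el.text for el in root.findall("author/name")]
print(names)
```

The query itself is a single line, at the cost of building the whole
tree in memory first.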

> Is there any on-line tutorial (?) or just example code
> to learn how to work efficiently with XML from Python?

To learn PyXML, there is an online tutorial in the PyXML topic
guide. Working efficiently is probably not something that can be
taught in a tutorial - it is largely a matter of experience.

Regards,
Martin