[XML-SIG] Performance question

Fred L. Drake, Jr. fdrake@acm.org
Tue, 5 Nov 2002 09:23:52 -0500


Henry S. Thompson writes:
 > If you want _another_ factor of 10, go to PyLTXML.  The report below
 > is from Python 2.2.1 on RedHat Linux 7.2 using PyXML 0.8.1 and
 > PyLTXML-1.3-2.

Wow!  That's fast!

 > I used Fred's driver, added two new functions to test bit-level and
 > tree-level access via PyLTXML.
 > 
 > parser performance test
 > 100 parses took 3.88 seconds, or 0.04 seconds/parse
 > 100 parses took 0.25 seconds, or 0.00 seconds/parse
 > 100 parses took 0.02 seconds, or 0.00 seconds/parse
 > 100 parses took 0.03 seconds, or 0.00 seconds/parse
 > 
 > The first measurement is the original 4DOM DOM builder, the second is
 > the expatbuilder, the third is PyLTXML returning the whole tree, the
 > fourth is PyLTXML returning every bit (start tag, end tag, text).  I
 > guess the tree is faster because it's slightly lazy wrt Python
 > structures, i.e. only the root is in Python form as returned, the rest
 > gets converted from the native C structs as you walk the Python tree.
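
The lazy conversion described above can be illustrated with a small pure-Python sketch. `RawNode`, `LazyElement` and `count_elements` are hypothetical names invented for this illustration; `RawNode` merely stands in for a native C structure, and this is not how PyLTXML is actually implemented. It only shows the general technique: the caller receives a wrapper for the root, and each level of children is wrapped on first access and then cached.

```python
# Hypothetical sketch of lazy conversion; NOT PyLTXML's implementation.
# RawNode stands in for a native (C) tree node.

class RawNode:
    """Stand-in for a native tree node: a tag name and raw children."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


class LazyElement:
    """Python-side view of a RawNode; children are wrapped on first access."""
    def __init__(self, raw):
        self._raw = raw
        self._children = None  # nothing below this node converted yet

    @property
    def name(self):
        return self._raw.name

    @property
    def children(self):
        # Convert the native children the first time they are requested,
        # then cache the wrappers so later walks reuse them.
        if self._children is None:
            self._children = [LazyElement(c) for c in self._raw.children]
        return self._children


def count_elements(elem):
    """Walk the whole tree; each visit converts one more level."""
    return 1 + sum(count_elements(c) for c in elem.children)


raw = RawNode("doc", [RawNode("a"), RawNode("b", [RawNode("c")])])
root = LazyElement(raw)        # only the root wrapper exists at this point
print(root._children is None)  # True: children not converted yet
print(count_elements(root))    # 4
print(root._children is None)  # False: the walk converted them
```

The cost of conversion is thus paid only for the parts of the tree a caller actually visits, which is consistent with the whole-tree case timing faster than the every-bit case.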

So is the resulting object compliant (or at least close) to the Python
DOM, as defined in the Python Library Reference?

    http://www.python.org/doc/current/lib/module-xml.dom.html

(Lazy building of structures is fine, of course, since that's
implementation.)  If it doesn't support the DOM API, does it support
something with an equivalent model and functionality?
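
To make the question concrete: code written against the Python DOM interface relies only on standard attributes such as `childNodes`, `nodeType` and `tagName`, so any builder that provides them can be swapped in without changing the caller. The sketch below uses the standard library's `xml.dom.minidom` as the example; whether PyLTXML's tree exposes these attributes is exactly what is being asked, so nothing here assumes it does. `element_names` is a hypothetical helper name.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def element_names(node):
    """Collect element tag names in document order using only DOM attributes."""
    names = []
    for child in node.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            names.append(child.tagName)
            names.extend(element_names(child))  # descend into the element
    return names


doc = parseString("<doc><a/><b><c>text</c></b></doc>")
print(element_names(doc))  # ['doc', 'a', 'b', 'c']
```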

 > Here are the additions I made to Fred's version of the script:
...
 > def allBits(s):
 >   # Pull every bit (start tag, end tag, text) until the stream is exhausted.
 >   f=PyLTXML.OpenString(s,PyLTXML.NSL_read|PyLTXML.NSL_read_namespaces)
 >   b=PyLTXML.GetNextBit(f)
 >   while b:
 >     b=PyLTXML.GetNextBit(f)
 >   PyLTXML.Close(f)
 > 
 > def itemParse(s):
 >   # Skip ahead to the first start tag, then build the tree for that element.
 >   f=PyLTXML.OpenString(s,PyLTXML.NSL_read|PyLTXML.NSL_read_namespaces)
 >   b=PyLTXML.GetNextBit(f)
 >   while b.type!='start':
 >     b=PyLTXML.GetNextBit(f)
 >   d=PyLTXML.ItemParse(f,b.item)
 >   PyLTXML.Close(f)
 >   return d
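
For readers without PyLTXML installed, a timing loop in the spirit of the driver can be sketched with the standard library alone. This is not Fred's actual script: the sample document, and the names `SAMPLE`, `dom_parse`, `event_parse` and `time_parses`, are hypothetical. `xml.dom.minidom.parseString` (which builds its tree via `xml.dom.expatbuilder`) stands in for the tree-level case, and a bare `xml.parsers.expat` pass stands in for the bit-level case, so the numbers will not be comparable to those reported.

```python
import time
import xml.parsers.expat
from xml.dom.minidom import parseString

# Hypothetical sample input; the original benchmark document is not shown.
SAMPLE = "<doc>" + "<item id='1'>text</item>" * 200 + "</doc>"


def dom_parse(s):
    """Build a full DOM tree (minidom builds via expatbuilder)."""
    return parseString(s)


def event_parse(s):
    """Visit every start tag, end tag and text chunk without building a tree."""
    counts = {"start": 0, "end": 0, "text": 0}

    def on_start(name, attrs):
        counts["start"] += 1

    def on_end(name):
        counts["end"] += 1

    def on_text(data):
        counts["text"] += 1

    p = xml.parsers.expat.ParserCreate()
    p.StartElementHandler = on_start
    p.EndElementHandler = on_end
    p.CharacterDataHandler = on_text
    p.Parse(s, True)  # True: this is the final (and only) chunk
    return counts


def time_parses(func, s, n=100):
    """Run func(s) n times and report in the same style as the results above."""
    start = time.perf_counter()
    for _ in range(n):
        func(s)
    elapsed = time.perf_counter() - start
    print("%d parses took %.2f seconds, or %.2f seconds/parse"
          % (n, elapsed, elapsed / n))
    return elapsed


if __name__ == "__main__":
    print("parser performance test")
    for f in (dom_parse, event_parse):
        time_parses(f, SAMPLE)
```

Note that the "%.2f seconds/parse" format rounds small per-parse times to 0.00, as in the quoted results; a finer format would be needed to tell the fast cases apart.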

Ouch!  Very inscrutable code... at least to me.  I must confess that
I've not had time to dig into the LTXML API (C or Python), though I've
stashed a copy of the documentation on my desk somewhere, meaning to
get to it.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Zope Corporation