[XML-SIG] pyexpat & xmlproc : irreproducible results!
Lars Marius Garshol
larsga@garshol.priv.no
11 Nov 1999 11:13:22 +0100
* Alain Michard
|
| This time I tried to be systematic! You'll find the results below.
| To make it short, on these tests, pyexpat is always faster than
| xmlproc, with a relative difference which can vary enormously
| depending on the type of xml file and on the application programme.
I've done more or less the same tests myself, and my results seem to
agree with yours. (I think your results when using Canonizer are due
to the program spending most of its time in the DocumentHandler rather
than the parser.)
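One rough way to check this is to subtract the empty-handler time from the full-application time. The sketch below assumes that handler cost simply adds on top of bare parsing, which is only an approximation. The Canonizer runs are not timed in this message, so it uses the othello figures from Test #1 and Test #3 below as an illustration. The function name is hypothetical.

```python
def handler_share(t_empty: float, t_app: float) -> float:
    """Fraction of run time not accounted for by bare parsing.

    Assumes handler cost is additive on top of parser cost; this is only
    an approximation (a parser may skip work when no handler is set).
    """
    if t_app <= 0:
        raise ValueError("t_app must be positive")
    return (t_app - t_empty) / t_app


# othello timings: Test #1 (empty handlers) vs. Test #3 (stats collector)
othello = [("sgmlop", 0.053, 4.636),
           ("pyexpat", 2.766, 5.189),
           ("xmlproc", 8.935, 11.42)]
for name, t_empty, t_app in othello:
    print(f"{name:8s} {handler_share(t_empty, t_app):.0%}")
```

With these numbers the handler accounts for roughly 99% of the sgmlop run, 47% of the pyexpat run and 22% of the xmlproc run. The faster the parser, the more the application dominates.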
I did these tests on my home PC, a Pentium 166MHz with 80 MB of RAM,
running Windows 95. The documents used are:
othello:  The othello.xml document from Jon Bosak's Shakespeare
          collection. No attributes. 248 K
newlist:  An old version of the data document used for my free
          XML tools index. Heavy on markup and with quite a few
          attributes. 74 K
teij31:   A small document marked up in the XML version of TEI
          Lite. Some attributes, comments and CDATA sections. 56 K
rec-xml:  The XML specification in XML. Heavy use of attributes,
          comments, CDATA marked sections and entities. 158 K
nt:       The New Testament, from Bosak's religious collection.
          Very simple structure, no attributes. 1 MB
WD-xslt:  Similar to the XML specification. 182 K
I would consider these results (except for #2) accurate to the two
most significant digits only.
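A minimal timing sketch, assuming each parser run is wrapped in a zero-argument callable. It repeats the run and keeps the best time, which damps the run-to-run noise that limits these figures to two significant digits. The helper names are hypothetical and not part of any of the parsers tested here.

```python
import timeit
from typing import Callable


def best_time(parse: Callable[[], object], repeat: int = 5) -> float:
    """Run `parse` `repeat` times and return the fastest time in seconds.

    Taking the minimum rather than the mean reduces the effect of
    background activity on the machine.
    """
    return min(timeit.repeat(parse, repeat=repeat, number=1))


def round_sig(x: float, digits: int = 2) -> float:
    """Round x to `digits` significant digits (e.g. 8.935 -> 8.9)."""
    return float(f"{x:.{digits}g}")


print(round_sig(best_time(lambda: sum(range(100_000)))))
```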
Test #1, with empty DocumentHandlers
              othello  newlist   teij31  rec-xml       nt  WD-xslt    TOTAL
sgmlop          0.053    0.022    0.016    0.038    0.172    0.038    0.341
pyexpat         2.766    0.144    0.232    0.017    4.230    1.388    8.780
xmlproc         8.935    2.399    0.950    5.188    12.00    4.056    33.53
xmlproc_val     15.65    4.387    19.07    5.309    21.14    12.00    77.57
xmllib          32.29    6.982    1.902    14.80    45.14    12.22    113.3
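For reference, an "empty handler" run with pyexpat amounts to creating a parser and feeding it the document without registering any callbacks. This is a minimal sketch using the standard library's xml.parsers.expat interface, which is today's wrapper around pyexpat and not necessarily the same API as the 1999 module benchmarked above. The synthetic document stands in for a real test file.

```python
import time
import xml.parsers.expat


def parse_empty(data: bytes) -> float:
    """Parse `data` with expat, registering no handlers; return seconds taken."""
    parser = xml.parsers.expat.ParserCreate()
    start = time.perf_counter()
    parser.Parse(data, True)  # True marks this as the final (only) chunk
    return time.perf_counter() - start


# A synthetic document standing in for a real test file such as othello.xml.
doc = b"<play>" + b"<line>To be, or not to be</line>" * 10_000 + b"</play>"
print(f"{parse_empty(doc):.3f} s")
```

Malformed input still raises xml.parsers.expat.ExpatError even with no handlers set, so an empty-handler run does the full well-formedness check.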
Test #2, with xbel2html.py on an 862 K XBEL document
sgmlop          6.539
pyexpat         11.82
xmlproc         41.55
xmlproc_val     59.14
xmllib          80.62
Test #3, with xml.com stats collector (goes through all attributes)
              othello  newlist   teij31  rec-xml       nt  WD-xslt    TOTAL
sgmlop          4.636    0.967    0.296    1.877    5.942    1.670    13.51*
pyexpat         5.189    1.348    0.459        -    7.354    2.461    16.83
xmlproc         11.42    2.990    1.162    5.298   15.174    5.064    35.81*
xmlproc_val     18.59    5.108    20.16    5.560   24.791    13.33    81.97*
xmllib          34.93    7.739    2.112    17.87    47.65    15.28    107.7*
* rec-xml is not included in the sum
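The xml.com stats collector itself is not reproduced here. As a rough, hypothetical stand-in, a handler that counts element types and visits every attribute might look like this. It uses the modern xml.sax ContentHandler interface, not the SAX 1 DocumentHandler used in these tests, so it is a sketch and not the program that was benchmarked.

```python
import xml.sax
from collections import Counter


class StatsHandler(xml.sax.ContentHandler):
    """Count element types, attribute names and characters (hypothetical)."""

    def __init__(self) -> None:
        super().__init__()
        self.elements = Counter()
        self.attributes = Counter()
        self.chars = 0

    def startElement(self, name, attrs):
        self.elements[name] += 1
        # Visit every attribute, as the Test #3 collector is described as doing.
        for attr_name in attrs.getNames():
            self.attributes[attr_name] += 1

    def characters(self, content):
        self.chars += len(content)


def collect_stats(data: bytes) -> StatsHandler:
    handler = StatsHandler()
    xml.sax.parseString(data, handler)
    return handler


stats = collect_stats(b'<doc><p id="a" lang="en">Hi</p><p id="b"/></doc>')
print(stats.elements, stats.attributes, stats.chars)
```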
This third test unearthed a small non-conformance in xmllib version
0.2: it reports character data outside the document (root) element.
I don't know whether this still applies to the latest version of
xmllib.
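A conforming parser should not report character data outside the root element. One way to catch this in a test run is sketched below, assuming a SAX-style handler. It tracks element depth and records any characters() call made at depth zero. The class name is hypothetical, and the sketch targets the modern xml.sax interface, not xmllib's own API.

```python
import xml.sax


class OutsideRootChecker(xml.sax.ContentHandler):
    """Record any character data reported outside the root element."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0
        self.stray = []  # character data seen at depth 0

    def startElement(self, name, attrs):
        self.depth += 1

    def endElement(self, name):
        self.depth -= 1

    def characters(self, content):
        if self.depth == 0:
            self.stray.append(content)


# Drive the handler by hand to simulate a parser that misbehaves.
checker = OutsideRootChecker()
checker.characters("\n")          # before the root element: should not happen
checker.startElement("doc", {})
checker.characters("text")        # inside the root: fine
checker.endElement("doc")
print(checker.stray)
```

To check a real parser, pass an OutsideRootChecker instance as the handler and assert that stray is empty afterwards.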
Also note that the xmlproc version used here is the one currently in
my CVS tree, which is a little faster than release 0.61.
The only conclusions I can draw from this are that sgmlop is indeed
the fastest parser (very much so when it is given no DocumentHandler),
that pyexpat is at least twice as fast as xmlproc, and that this
ordering holds regardless of the document or application.
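The pyexpat/xmlproc ratios can be recomputed directly from the totals in the three tests. Here is a small sketch with the figures copied from the tables in this message. The Test #3 totals exclude rec-xml for every parser.

```python
# Totals from Tests #1-#3 above (Test #3 totals exclude rec-xml).
results = {
    "Test #1": {"sgmlop": 0.341, "pyexpat": 8.780, "xmlproc": 33.53},
    "Test #2": {"sgmlop": 6.539, "pyexpat": 11.82, "xmlproc": 41.55},
    "Test #3": {"sgmlop": 13.51, "pyexpat": 16.83, "xmlproc": 35.81},
}


def speedup(times: dict, slow: str, fast: str) -> float:
    """How many times faster `fast` ran than `slow` on the same test."""
    return times[slow] / times[fast]


for test, times in results.items():
    print(f"{test}: pyexpat is {speedup(times, 'xmlproc', 'pyexpat'):.1f}x "
          f"faster than xmlproc")
```

This gives ratios of about 3.8, 3.5 and 2.1 for the three tests. The gap is smallest in Test #3, where the handler does the most work.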
--Lars M.