[DOC-SIG] Re: What does this mean for Python?

Sjoerd Mullender Sjoerd.Mullender@cwi.nl
Fri, 13 Mar 1998 12:38:56 +0100


On Thu, Mar 12 1998 Lars Marius Garshol wrote:

> With SAX support in both xmllib and my own incomplete xmlproc I was
> able to do some speed comparisons. For good measure I threw in James
> Clark's XP[3] parser written in Java (and written to be as fast as
> possible) and DataChannel's DXP[4] Java parser.
> 
> Here are the results on my 166 MHz Pentium:
> 
> Time to run hamlet.xml through validation and grove building via SAX:
> 
> Parser        1st     2nd     3rd     Avg
> xmllib.py     50.1    48.4    49.8    49.4
> xmlproc.py    40.8    39.4    39.5    39.9
> xp.java       1.49    1.43    1.43    1.45
> dxpcl.java    14      -       -       14
> 
> With no validation or grove building (empty document handler):
> 
> Parser        1st     2nd     3rd     Avg
> xmllib.py     38.6    37.2    38.7    38.2
> xmlproc.py    32.5    33      32      32.5
> 
> The numbers speak for themselves, I think. I'll have to read the XP
> sources closely to see whatever James Clark did to XP to make it that
> fast. 

I have a question about the timings here.  How was the data fed to the 
XML parser in xmllib.py?  If you do
	python xmllib.py hamlet.xml
the data is fed to the parser one character at a time.  But it is
also possible to feed everything at once.  There are very significant
performance differences between these two methods:  If the XML parser
sees that a tag is incomplete (usually after parsing the first part of
the tag), it saves the data until you feed more data.  This means that
if you feed the data one character at a time, tags will be parsed
partially many times before they are parsed completely, slowing down
the process quite a bit.
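
To make the effect concrete, here is a rough sketch of the save-and-retry
behaviour described above.  It is a hypothetical toy parser, not xmllib's
actual code: the names ToyParser, parse_in_chunks and the attempts counter
are made up for illustration, and it only recognises complete tags.  It
counts how often a tag match is attempted when the same document is fed
all at once versus one character at a time.

```python
import re

# Hypothetical toy parser for illustration only; this is not xmllib's code.
# It recognises complete tags such as <act> and discards everything else.
TAG = re.compile(r"<[^<>]*>")


class ToyParser:
    def __init__(self):
        self.buffer = ""   # unparsed data saved between feed() calls
        self.attempts = 0  # how many times a tag match was tried
        self.tags = []     # complete tags seen so far

    def feed(self, data):
        self.buffer += data
        pos = 0
        while True:
            start = self.buffer.find("<", pos)
            if start < 0:
                pos = len(self.buffer)  # only character data left; drop it
                break
            self.attempts += 1
            match = TAG.match(self.buffer, start)
            if match is None:
                pos = start  # incomplete tag: save it until more data arrives
                break
            self.tags.append(match.group())
            pos = match.end()
        self.buffer = self.buffer[pos:]


def parse_in_chunks(document, chunk_size):
    """Feed document to a fresh ToyParser chunk_size characters at a time."""
    parser = ToyParser()
    for i in range(0, len(document), chunk_size):
        parser.feed(document[i:i + chunk_size])
    return parser


if __name__ == "__main__":
    doc = "<play><title>Hamlet</title><act>text</act></play>"
    whole = parse_in_chunks(doc, len(doc))
    single = parse_in_chunks(doc, 1)
    print("all at once:", whole.attempts, "match attempts")
    print("one character at a time:", single.attempts, "match attempts")
```

Both runs find the same tags, but the character-at-a-time run retries
every unfinished tag once per character fed, so its attempt count grows
with the total length of the tags rather than with their number.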

> (The comparison between xmllib and xmlproc is not entirely fair since
> I've still got to add some stuff to xmlproc that will slow it down,
> but then I haven't tried optimizing it yet either.)

I haven't done any optimisations in xmllib either.  One obvious
optimization is to use regex instead of re (but I am not planning to
do that).

-- Sjoerd Mullender <Sjoerd.Mullender@cwi.nl>
   <URL:http://www.cwi.nl/~sjoerd/>

_______________
DOC-SIG  - SIG for the Python Documentation Project

send messages to: doc-sig@python.org
administrivia to: doc-sig-request@python.org
_______________