[DOC-SIG] Re: What does this mean for Python?

12 Mar 1998 17:49:40 +0100

* Paul Prescod
|
| Okay, let's play acronym expansion. 

Fredrik: sorry about the headache. I sort of assumed that people were
familiar with XML terminology, which was of course a mistake. Thank
you for doing the expansion, Paul. :)

| xmlproc is Lars' software. 

That's right. It assumes pretty much the same role as xmllib: parsing
a raw XML document and providing hooks for applications that want to
do something with the data.
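
To make the "hooks" idea concrete, here is a minimal sketch. It does
not use xmllib or xmlproc; it assumes the xml.sax module from the
current Python standard library, so the class and method names below
belong to that module and not to either parser discussed here. The
TagCounter class and the sample document are made up for illustration.

```python
import xml.sax


class TagCounter(xml.sax.ContentHandler):
    """Application hooks: the parser calls these while reading the document."""

    def __init__(self):
        super().__init__()
        self.counts = {}     # element name -> number of start tags seen
        self.text_chars = 0  # total length of character data

    def startElement(self, name, attrs):
        # Called once for every start tag (and for every empty-element tag).
        self.counts[name] = self.counts.get(name, 0) + 1

    def characters(self, content):
        # May be called several times for one run of text.
        self.text_chars += len(content)


# Hypothetical sample document, not taken from hamlet.xml.
doc = b"<play><title>Hamlet</title><act><scene/><scene/></act></play>"
handler = TagCounter()
xml.sax.parseString(doc, handler)
print(handler.counts)
print(handler.text_chars)
```

The application never touches the raw text of the document; it only
reacts to the calls the parser makes.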

| When he says he "uses it natively" instead of "through a driver", I
| think he means that his software is not yet set up to drop in
| someone else's parser easily.

Sorry, what I meant was that it doesn't use a SAX driver, but instead
speaks SAX natively, so to speak. It looks like that approach will
become cumbersome as xmlproc becomes more complete, so I may have to
use another approach later.

| I would encourage Lars to use a newer XML linearization format:
| 
| http://www.jclark.com/xml/canonxml.html

Thanks for that pointer, Paul! I'll add support for canonical XML
output to saxlib since that looks like it can be very useful for
testing parsers.
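
As a rough illustration of why such a format helps with testing, the
sketch below writes a simplified canonical form from SAX events:
attributes sorted by name, empty elements written as start and end
tags, and special characters escaped. The escaping table reflects my
reading of the note above and should be treated as an assumption, and
processing instructions are left out. It assumes the modern
standard-library xml.sax module, not saxlib, and the names
CanonicalWriter and canonical are hypothetical.

```python
import io
import xml.sax

# Escapes for character data and attribute values (assumed, simplified).
_ESCAPES = {
    "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;",
    "\t": "&#9;", "\n": "&#10;", "\r": "&#13;",
}


def escape(text):
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


class CanonicalWriter(xml.sax.ContentHandler):
    """Writes one fixed serialization regardless of how the input was typed."""

    def __init__(self, out):
        super().__init__()
        self.out = out

    def startElement(self, name, attrs):
        self.out.write("<" + name)
        for key in sorted(attrs.getNames()):  # fixed attribute order
            self.out.write(' %s="%s"' % (key, escape(attrs.getValue(key))))
        self.out.write(">")

    def endElement(self, name):
        self.out.write("</%s>" % name)

    def characters(self, content):
        self.out.write(escape(content))


def canonical(document):
    out = io.StringIO()
    xml.sax.parseString(document, CanonicalWriter(out))
    return out.getvalue()


# Two documents that differ only in surface syntax.
a = canonical(b"<doc b='2' a='1'><e/>x &amp; y</doc>")
b = canonical(b'<doc a="1"   b="2"><e></e>x &#38; y</doc>')
print(a == b)
```

Two parsers that agree on the content of a document should produce
byte-identical output through such a writer, so their results can be
compared with a plain diff.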

| So where SAX concentrates on generating *events* for stream-based
| handling of documents, the DOM is an API for explicitly traversing
| and navigating an in-memory tree.

It's worth noting here that one can build a DOM implementation using
the information that comes out of the SAX API so that the DOM library
is completely independent of whatever parser is used.

This means that if we have a C XML parser and some Python ones that
all have SAX drivers, the DOM library can use whichever of these
happens to be available in a particular installation.
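
A minimal sketch of the idea follows. It is not a DOM implementation:
Node is a made-up class with only a name, attributes, a parent and
children, and the code assumes the modern standard-library xml.sax
module. The point is only that the tree builder sees SAX events and
never the parser itself, so any parser with a SAX driver can be
passed in.

```python
import io
import xml.sax


class Node:
    """Hypothetical, much-reduced stand-in for a DOM element node."""

    def __init__(self, name, attrs, parent=None):
        self.name = name
        self.attrs = attrs
        self.parent = parent
        self.children = []  # Node instances and str chunks of text


class TreeBuilder(xml.sax.ContentHandler):
    """Builds an in-memory tree purely from SAX events."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.current = None

    def startElement(self, name, attrs):
        node = Node(name, dict(attrs.items()), self.current)
        if self.current is None:
            self.root = node
        else:
            self.current.children.append(node)
        self.current = node  # descend into the new element

    def endElement(self, name):
        self.current = self.current.parent  # climb back up

    def characters(self, content):
        if self.current is not None:
            self.current.children.append(content)


def build_tree(document, parser=None):
    # Any SAX parser object will do; default to whatever is installed.
    parser = parser or xml.sax.make_parser()
    builder = TreeBuilder()
    parser.setContentHandler(builder)
    parser.parse(io.BytesIO(document))
    return builder.root


tree = build_tree(b'<play><title>Hamlet</title><act n="1"/></play>')
print(tree.name, [child.name for child in tree.children])
```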

Don Park has already made such a DOM implementation on top of SAX in
Java, called SAXDOM[1].

I've now made a naive SAX driver for xmllib and added it to my web
page[2] together with the ESIS outputter. It's not complete since I
don't know how complete xmllib is, but once I add the canonical XML
outputter I can test that easily.

It's all extremely simple, but should provide a reasonable
demonstration of the potential of SAX for now. I will try to improve
this to comply more fully with the spec later.

With SAX support in both xmllib and my own incomplete xmlproc I was
able to do some speed comparisons. For good measure I threw in James
Clark's XP[3] parser, written in Java (and written to be as fast as
possible), and DataChannel's DXP[4] Java parser.

Here are the results on my 166 MHz Pentium:

Time to run hamlet.xml through validation and grove building via SAX:

Parser        1st     2nd     3rd     Avg
xmllib.py     50.1    48.4    49.8    49.4
xmlproc.py    40.8    39.4    39.5    39.9
xp.java       1.49    1.43    1.43    1.45
dxpcl.java    14      -       -       14

With no validation or grove building (empty document handler):

Parser        1st     2nd     3rd     Avg
xmllib.py     38.6    37.2    38.7    38.2
xmlproc.py    32.5    33      32      32.5

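The numbers speak for themselves, I think: with validation and grove
building, XP comes out roughly 34 times faster than xmllib and 28
times faster than xmlproc. I'll have to read the XP sources closely
to see what James Clark did to make it that fast.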

(The comparison between xmllib and xmlproc is not entirely fair since
I've still got to add some stuff to xmlproc that will slow it down,
but then I haven't tried optimizing it yet either.)
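
For anyone who wants to repeat the "empty document handler" kind of
measurement, here is a rough sketch of a timing harness. It is not the
script used for the numbers above: it assumes the modern
standard-library xml.sax module, the function name time_parse is made
up, and a synthetic document stands in for hamlet.xml.

```python
import time
import xml.sax


def time_parse(document, handler_factory=xml.sax.ContentHandler, runs=3):
    """Parse `document` `runs` times; return the timings and their mean."""
    timings = []
    for _ in range(runs):
        # The base ContentHandler ignores every event, so this measures
        # parsing and event dispatch only.
        handler = handler_factory()
        start = time.perf_counter()
        xml.sax.parseString(document, handler)
        timings.append(time.perf_counter() - start)
    return timings, sum(timings) / len(timings)


# Synthetic stand-in document (an assumption, not the real test file).
document = b"<play>" + b"<line>To be, or not to be</line>" * 1000 + b"</play>"
timings, average = time_parse(document)
print(["%.4f" % t for t in timings], "avg %.4f" % average)
```

Passing a different handler_factory (for instance one that builds a
tree) gives the second kind of measurement with the same harness.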

[1] <URL:http://users.quake.net/donpark/saxdom.html>
[2] <URL:http://www.stud.ifi.uio.no/~larsga/download/python/xml/>
[3] <URL:http://www.jclark.com/xml/xp/index.html>
[4] <URL:http://www.datachannel.com/products/xml/DXP/>

-- 
"These are, as I began, cumbersome ways / to kill a man. Simpler, direct, 
and much more neat / is to see that he is living somewhere in the middle /
of the twentieth century, and leave him there."     -- Edwin Brock

 http://www.stud.ifi.uio.no/~larsga/      http://birk105.studby.uio.no/
