[XML-SIG] saxlib.py, package structure, & HOWTO outline

Andrew Kuchling akuchlin@cnri.reston.va.us
Mon, 16 Mar 1998 15:28:37 -0500 (EST)


This message bounces around all over the place.

The current status of the XML-SIG is quite promising; we've already
got prototype implementations of the two XML APIs (SAX and DOM), and a
prototype interface to XMLTok.

A bit of explanation (that will probably get recycled into the HOWTO):
SAX and DOM are two sides of the same coin; they're different ways to
access representations of XML documents.  DOM is a tree-based
representation, so you have the whole document in memory at once
(unless you either do something extremely clever with lazy
construction of the tree, or place constraints on how you can traverse
the tree and build it on the fly).  SAX is an event-based API, so you
write callbacks, which get called by the XML parser as elements begin
and end.  Both are useful for different tasks; you can wander all over
the tree at random with DOM, but SAX is lower-level and lets you
construct only the data structures you require--perhaps none at all.
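
To make the contrast concrete, here is a rough sketch that pulls the same
data out of a tiny document both ways.  It assumes the xml.sax and
xml.dom.minidom modules from the Python standard library rather than the
saxlib.py and DOM prototypes discussed here; the sample document and the
names TitleCollector, titles_with_sax and titles_with_dom are made up for
illustration.

```python
import xml.sax
from xml.dom import minidom

# A made-up sample document, used only for this illustration.
DOCUMENT = b"<catalog><book>SAX</book><book>DOM</book><cd>XMLTok</cd></catalog>"


class TitleCollector(xml.sax.ContentHandler):
    """SAX style: the parser calls these methods as elements begin and end."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # None while we are outside a <book> element

    def startElement(self, name, attrs):
        if name == "book":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "book":
            self.titles.append("".join(self._buffer))
            self._buffer = None


def titles_with_sax(data):
    """Build only the list we need; no tree is ever constructed."""
    handler = TitleCollector()
    xml.sax.parseString(data, handler)
    return handler.titles


def titles_with_dom(data):
    """DOM style: the whole tree is in memory, so any visiting order works."""
    tree = minidom.parseString(data)
    books = tree.getElementsByTagName("book")
    # Walk the books back to front, just to show the access is not sequential.
    return [book.firstChild.data for book in reversed(books)]
```

The SAX handler sees each book once, in document order, and keeps nothing
but the titles.  The DOM version can revisit the tree in any order it
likes, at the cost of holding the whole document in memory.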

This distinction is nicely explained at
<http://www.microstar.com/XML/SAX/event.html>.  I'll add a link to
this page to the XML-SIG's Resources page, at
<http://www.python.org/sigs/xml-sig/links.html>. Suggestions for more
links are welcome.

I've taken a brief look at saxlib.py, and it looks very neat and
understandable; I quite agree with Paul Prescod's favorable impression
of it.  What's missing from it?  As far as I can tell, documentation
is the only thing missing, but I'm no XML expert.  Tutorial
information on SAX seems hard to come by, but that's what the HOWTO
will be for...

One minor nit: saxdemo.py has a problem with the following lines.

import xmlproc
p=xmlproc.Parser()

There doesn't seem to be a Parser class or function in xmlproc.py, so 
an AttributeError is raised.  Have I messed something up?

I've done no more than download Stefane Fermigier's DOM code; I haven't
actually looked at it yet.  One thing I've noticed is that it uses
packages ("from dom.transformer import *") while the SAX library just
uses top-level modules.  Perhaps we should try to pin down the layout
of the XML package first.  Should there be subpackages (XML.SAX.foo,
XML.DOM.foo, ...) or is it enough to put everything in a package named
'XML'?
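
A rough sketch of the two layouts, built in a scratch directory so the
imports can actually be tried.  The package names xmlsig_nested and
xmlsig_flat and the module names foo, saxfoo and domfoo are placeholders
for illustration, not proposals for the real names.

```python
import importlib
import os
import sys
import tempfile


def write_module(root, dotted_name):
    """Create dotted_name under root, adding __init__.py to each package."""
    *packages, module = dotted_name.split(".")
    directory = root
    for package in packages:
        directory = os.path.join(directory, package)
        os.makedirs(directory, exist_ok=True)
        # An (empty) __init__.py marks the directory as a package.
        open(os.path.join(directory, "__init__.py"), "a").close()
    with open(os.path.join(directory, module + ".py"), "w") as f:
        f.write("NAME = %r\n" % dotted_name)


root = tempfile.mkdtemp()

# Layout 1: subpackages, so SAX and DOM can each have their own 'foo'.
write_module(root, "xmlsig_nested.sax.foo")
write_module(root, "xmlsig_nested.dom.foo")

# Layout 2: one flat package, so every module name must be unique.
write_module(root, "xmlsig_flat.saxfoo")
write_module(root, "xmlsig_flat.domfoo")

sys.path.insert(0, root)
importlib.invalidate_caches()

nested_sax = importlib.import_module("xmlsig_nested.sax.foo")
nested_dom = importlib.import_module("xmlsig_nested.dom.foo")
flat_sax = importlib.import_module("xmlsig_flat.saxfoo")
flat_dom = importlib.import_module("xmlsig_flat.domfoo")
```

With subpackages, the SAX and DOM code can reuse module names without
colliding; with a single flat package the imports are shorter, but the
module names have to carry the distinction themselves.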

Fuzzily thinking about the organization of an XML-HOWTO, my outline
looks like:

Overview: (a few paragraphs)
	What is XML?  Why do you care?  
Introduction to XML: (a few pages)
	Extremely brief intro to XML syntax & ideas, w/ pointers to complete
	resources
Glossary:
	Glossaries usually come at the end, but there are enough
	acronyms and concepts that it might be better placed here.
DOM:
	The tree-based interface to XML documents.  Explanations,
	sample code, ...
SAX:
	The event-based interface.  Explanations, sample code, ...


A.M. Kuchling			http://starship.skyport.net/crew/amk/
Technology is a gift of God. After the gift of life it is perhaps the greatest
of God's gifts. It is the mother of civilizations, of arts and of sciences.
	-- Freeman Dyson, _Infinite in All Directions_