[XML-SIG] Functional interface to XML?

Cees de Groot cg@cdegroot.com
Thu, 11 Feb 1999 22:13:05 +0100


Hi,

I'm busy evaluating the best way for SGMLtools to convert DocBook SGML into
groff (for high-quality ASCII output and manpage source), and I have decided
to convert DocBook into XML (with help from Jade and/or sgmlnorm) and 
work from there with Python. In case you don't know SGMLtools: it's a 
repackaging of well-known SGML components (Jade validating parser and
DSSSL engine, DocBook DTD, Norm Walsh' DocBook stylesheets, and JadeTeX)
plus a shell around them in Python.

I've been wondering weither something functional/stylesheet-ish exists on
top of Python's XML stuff. I have thrown together a quick example (see
below) and for what I'll need to do I have the feeling that such an 
approach would be the easiest. My example, though, is a tad slow I think
so I rather use someone elses quality stuff before hacking up my own :-)

Here's what I have in mind:

--- 8< --- snip
from xml.sax import saxexts, saxlib, saxutils

import sys,string

driver=None
    
p=saxexts.make_parser(driver)
p.setErrorHandler(saxutils.ErrorPrinter())

class FunctionalDocumentHandler(saxlib.DocumentHandler):
    
    def __init__(self):
        self.stack = []

    def _makeName(self, startwith, numelems):
        retval = ''
        for i in self.stack[0:numelems]:
            retval = i + '_' + retval
        return startwith + '_' + retval[0:len(retval)-1]

    def startElement(self, name, atts):
        self.stack.insert(0, name)
        for numelems in range(len(self.stack), 0, -1):
            funcname = self._makeName('start', numelems)
            if hasattr(self, funcname):
                getattr(self, funcname)(atts)
                break

    def endElement(self, name):
        for numelems in range(len(self.stack), 0, -1):
            funcname = self._makeName('end', numelems)
            if hasattr(self, funcname):
                getattr(self, funcname)()
                break
        del self.stack[0]

    def characters(self, ch, start, length):
        print string.strip(ch[start:start+length])

#
#  This is what this is all about - a kinda stylesheet interface
#
class MyDocumentHandler(FunctionalDocumentHandler):
    def start_PARA(self, atts):
        print "<P>"
    def end_PARA(self):
        print "</P>"

    def start_AUTHORBLURB_PARA(self, atts):
        print "<P><FONT SIZE=-1>"
    def end_AUTHORBLURB_PARA(self):
        print "</FONT></P>";

dh = MyDocumentHandler()

try:
    p.setDocumentHandler(dh)
    p.parse('test.xml')
except IOError,e:
    print str(e)
except saxlib.SAXException,e:
   print str(e)

---- 8< --- snap ---

Is this a sensible way to process XML in Python?