[XML-SIG] The Zen of DOM

Andy Robinson andy@reportlab.com
Wed, 5 Apr 2000 15:28:10 +0100


Looking for spiritual guidance about the right way to do things...

I've been slogging my way through the current XML package looking at many
different ways of parsing XML documents into my own Python object models.
The target is currently "PythonPoint Markup Language", a markup for creating
PDF presentation slides in ReportLab; but I'll need to do many similar
parsers in future.

At the moment, I have a Python class hierarchy with things like
Presentation, Slide, Frame, Paragraph, and various primitive shapes to
decorate pages.  I use a parser derived from xmllib which walks through a
document, and I wrote start_slide/end_slide, start_para/end_para handlers
which construct instances of my own objects and build a tree.

It seems to me that one could use Python's extreme flexibility to take a
generic approach to tree-building: check whether a class is available for a
particular tag and fall back to a generic node if not.  If a class exists,
create an instance, pass it the available attributes, then pass child nodes
to an add() method so it can organize them itself.  Then I could magically
end up with a notation like...
    presentation.slides[3].frames[1].paragraphs[0].text
...without having to write lots of new stuff in the parser as well as the
application class hierarchy every time.  Or at least to navigate the tree
using generic node/child notation, but get my own class instances attached
at each point.
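
A minimal sketch of this generic approach, for illustration only.  It uses
the standard-library xml.parsers.expat module rather than xmllib, and the
tag names "presentation" and "frame", the REGISTRY dict, and build_tree()
are all assumptions made up for the example, not part of any existing
package:

```python
import xml.parsers.expat


class Node:
    """Generic fallback node, used when no class is registered for a tag."""

    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = dict(attrs)
        self.children = []
        self.text = ""

    def add(self, child):
        # Subclasses override this to file children wherever they like.
        self.children.append(child)


class Paragraph(Node):
    pass


class Frame(Node):
    def __init__(self, tag, attrs):
        super().__init__(tag, attrs)
        self.paragraphs = []

    def add(self, child):
        super().add(child)
        if isinstance(child, Paragraph):
            self.paragraphs.append(child)


class Slide(Node):
    def __init__(self, tag, attrs):
        super().__init__(tag, attrs)
        self.frames = []

    def add(self, child):
        super().add(child)
        if isinstance(child, Frame):
            self.frames.append(child)


class Presentation(Node):
    def __init__(self, tag, attrs):
        super().__init__(tag, attrs)
        self.slides = []

    def add(self, child):
        super().add(child)
        if isinstance(child, Slide):
            self.slides.append(child)


# Hypothetical tag-to-class mapping; unknown tags become plain Node objects.
REGISTRY = {"presentation": Presentation, "slide": Slide,
            "frame": Frame, "para": Paragraph}


def build_tree(xml_text, registry=REGISTRY):
    stack, roots = [], []

    def start(tag, attrs):
        # Look up a class for this tag, falling back to the generic node.
        stack.append(registry.get(tag, Node)(tag, attrs))

    def end(tag):
        node = stack.pop()
        node.text = node.text.strip()
        # Hand the finished node to its parent, or keep it as the root.
        (stack[-1].add if stack else roots.append)(node)

    def chars(data):
        if stack:
            stack[-1].text += data

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return roots[0]
```

With something like this, build_tree(text).slides[0].frames[0].paragraphs[0].text
would work on a suitably nested document, and supporting a new tag means
adding one class and one registry entry rather than new parser handlers.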

To turn this on its head, there must be a generic way to serialize a Python
class instance to XML, and deserialize it again later.
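
One possible shape for such a round trip, sketched with the standard-library
xml.etree.ElementTree module.  The conventions here are assumptions chosen
for the example: the element name is the class name, list attributes become
wrapper elements, and everything else is stored as a string attribute, so
non-string values do not come back with their original type.  The helper
names to_element() and from_element() are hypothetical:

```python
import xml.etree.ElementTree as ET


class Paragraph:
    def __init__(self, text):
        self.text = text


class Frame:
    def __init__(self, name, paragraphs=None):
        self.name = name
        self.paragraphs = list(paragraphs or [])


def to_element(obj):
    """Serialize an instance: lists become wrapper elements named after the
    attribute, all other values become string XML attributes."""
    elem = ET.Element(type(obj).__name__)
    for name, value in vars(obj).items():
        if isinstance(value, list):
            wrapper = ET.SubElement(elem, name)
            for item in value:
                wrapper.append(to_element(item))
        else:
            elem.set(name, str(value))
    return elem


def from_element(elem, classes):
    """Rebuild an instance from an element produced by to_element()."""
    cls = classes[elem.tag]
    obj = cls.__new__(cls)  # skip __init__; state is restored directly
    for name, value in elem.attrib.items():
        setattr(obj, name, value)
    for wrapper in elem:
        items = [from_element(child, classes) for child in wrapper]
        setattr(obj, wrapper.tag, items)
    return obj


frame = Frame("body", [Paragraph("Hello")])
xml_text = ET.tostring(to_element(frame), encoding="unicode")
restored = from_element(ET.fromstring(xml_text),
                        {"Frame": Frame, "Paragraph": Paragraph})
```

Bypassing __init__ keeps the sketch generic, at the cost of skipping any
validation the constructors would normally do.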

Has anyone actually worked on this?  Is there a solution lurking in the
package somewhere?  Or is the preferred approach to get a DOM tree, then
walk through it building my own objects?
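
For comparison, the DOM-then-walk alternative might look roughly like the
sketch below, using the standard-library xml.dom.minidom module.  The
"title" attribute, the simplified Slide and Paragraph classes, and the
slides_from_dom() helper are assumptions for the example only:

```python
from xml.dom import minidom


class Paragraph:
    def __init__(self, text):
        self.text = text


class Slide:
    def __init__(self, title, paragraphs):
        self.title = title
        self.paragraphs = paragraphs


def element_text(element):
    """Concatenate the text nodes directly under a DOM element."""
    return "".join(
        child.data for child in element.childNodes
        if child.nodeType == child.TEXT_NODE
    ).strip()


def slides_from_dom(xml_text):
    """Parse into a DOM tree first, then walk it to build our own objects."""
    dom = minidom.parseString(xml_text)
    slides = []
    for slide_el in dom.getElementsByTagName("slide"):
        paragraphs = [Paragraph(element_text(para_el))
                      for para_el in slide_el.getElementsByTagName("para")]
        slides.append(Slide(slide_el.getAttribute("title"), paragraphs))
    return slides
```

This keeps the parsing generic, but every application class still needs its
own walking code, which is the duplication described above.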

Thanks very much,

Andy Robinson