SAX questions...

Alan Kennedy alanmk at hotmail.com
Tue Jul 22 15:01:31 EDT 2003


Timothy Grant wrote:

> I've worked with DOM before, but never SAX, I have a script that
> seems to work quite well, but every time I look at it I think it's
> amazingly unweildly and that I must be doing something wrong.
> 
> def startElement(self, name, attr):
>     if name == "foo":
>         do_foo_stuff()
>     elif name == "bar":
>         do_bar_stuff()
>     elif name == "baz":
>         do_baz_stuff()
> 
> There's similar code in endElement() and characters() and of course,
> the more tags that need processing the more unweildly each method
> becomes.
> 
> I could create a dictionary and dispatch to the correct method based on a
> dictionary key. But it seems to me there must be a better way.

Tim,

No, you're not doing anything wrong in your above code. But, yes, code
like that can get pretty unwieldy when you've got deeply nested
structures. A couple of suggested alternative approaches:

1. As you already suggested, build up a dictionary mapping tag names
to functions. But that still means that you have to maintain state
information between your function calls.

2. Represent each new element that comes in as a python object
(object()), i.e. construct a "Python Object Model" or POM. All xml
attributes on the element become python attributes of the python
object. That object then goes onto a simple stack. Each object() also
has a list of children. When a new xml child element is passed,
through a nested startElement call, it too gets turned into a python
object, and made a child of the python object on the top of the stack.
Pop objects off the stack as you receive endElement events.

This paradigm is described here, with working code

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/149368

3. Have a specialised python class for each element type (i.e. tag
name), that implements the "business logic" you want associated with
the element type. Write descriptors or properties for each specialised
class that, for example, only allow child python objects of a certain
type. Create a dictionary mapping tag names to your classes, and
instantiate a new python object of that class everytime you receive
the relevant tag name through a startElement call. Put it onto a
stack, as above, store nested xml elements as children of the
top-of-stack, and pop the stack when you receive endElement calls.

That's just a couple of approaches. There might be other, perhaps more
appropriate, methods, if you could describe what you're trying to
achieve. Are you extracting data from the documents? Or are you
checking that the documents comply with some form of hierarchical
structure? Or something else entirely?

HTH,

-- 
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan:              http://xhaus.com/mailto/alan




More information about the Python-list mailing list