[Pythonmac-SIG] XML handler design

David Reed dreedmac at columbus.rr.com
Thu Mar 24 15:13:16 CET 2005


On Mar 24, 2005, at 8:35 AM, Henning.Ramm at mediapro-gmbh.de wrote:

> David Reed wrote:
>
>> There's probably a better mailing list with XML parsing experts. I'm
>> certainly not an expert but have done a little XML parsing.
>> I've always
>> followed the pattern of using startElement, characters and endElement
>> to grab all the data. In the startElement method you set an instance
>> variable to keep track of the current tag you are processing. You use
>> the characters method to build up the values and then in the
>> endElement
>> method you store the data in your data structure. See the pyxml HOWTO
>> for an example - specifically this section:
>> http://pyxml.sourceforge.net/topics/howto/node14.html
>
> Yes, sure. Thanks, but
> that's not what I wanted to know.
> Perhaps I wasn't clear enough.
> It's not really so much XML related...
>
>>> def startElement(self, name, attrs):
>>>     self._queue.append(name) # keep the order of processed tags
>>>     handler = str('_start_'+name)
>>>     if hasattr(self, handler):
>>>         self.__class__.__dict__[handler](self, attrs)
>
> Is there a better syntax for self.__class__.__dict__[handler]?


You should be able to use getattr to get the method and then call it. 
That's a little cleaner IMO.
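
A minimal sketch of the getattr approach, assuming Python 3 and the 
standard xml.sax module. The class name, the 'page' tag, its 'no' 
attribute, and the list used for self.pages are made up for illustration 
and are not taken from your code:

```python
import xml.sax


class DispatchingHandler(xml.sax.ContentHandler):
    """Route each start tag to a method named _start_<tag>, if one exists."""

    def __init__(self):
        super().__init__()
        self._queue = []  # keep the order of processed tags
        self.pages = []   # hypothetical storage filled by the tag handlers

    def startElement(self, name, attrs):
        self._queue.append(name)
        # getattr returns a bound method, so self is passed automatically.
        # The default of None covers tags that have no dedicated handler.
        handler = getattr(self, '_start_' + name, None)
        if handler is not None:
            handler(attrs)

    def _start_page(self, attrs):
        # 'page' and 'no' are assumed names, used only for this example.
        self.pages.append(attrs.get('no'))


handler = DispatchingHandler()
xml.sax.parseString(b'<doc><page no="1"/><page no="2"/></doc>', handler)
```

Passing a default to getattr also replaces the separate hasattr check, 
and because the lookup goes through the instance, handlers defined in a 
base class are found as well, which the __class__.__dict__ lookup would 
miss.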


> And where should the "output" go to?
> All examples use print statements in the element handlers.


I'm not certain we understand each other. Instead of print statements, 
you store the data in an instance variable; in your case it appears that 
self.pages is the instance variable containing the data. So your 
endElement method would set something in self.pages based on the current 
tag, the data built up in the characters method, and any of the attrs 
from the start tag. If all your data is in the attrs that you get in the 
startElement method, then there's no need to do anything in the 
characters or endElement methods. If you want to use the 
startElement/characters/endElement approach, I can try to find a small 
example I've written and send it to you off-list.
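
In the meantime, here is a rough sketch of that pattern, assuming 
Python 3 and the standard xml.sax module. The 'page' tag, its 'no' 
attribute, and the plain dict used for self.pages are assumptions for 
illustration; your own pages container clearly has a different 
interface:

```python
import xml.sax


class PageHandler(xml.sax.ContentHandler):
    """Collect the text of each <page> element into self.pages."""

    def __init__(self):
        super().__init__()
        self.pages = {}          # result: page number -> text
        self._current_no = None  # set while inside a <page> element
        self._text = []          # text chunks of the current element

    def startElement(self, name, attrs):
        if name == 'page':
            self._current_no = attrs.get('no')
            self._text = []

    def characters(self, content):
        # The parser may deliver one element's text in several chunks.
        if self._current_no is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == 'page':
            self.pages[self._current_no] = ''.join(self._text).strip()
            self._current_no = None


page_handler = PageHandler()
xml.sax.parseString(
    b'<doc><page no="1">First</page><page no="2">Second</page></doc>',
    page_handler,
)
```

After parsing, the caller reads the results from page_handler.pages, so 
no print statements are needed inside the handler.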


> I wrote those get... methods - but I guess they don't belong in the 
> XML handler, but perhaps in the parser or somewhere else.
> It works, but I don't think it's good design.
>
>>> def getPages(self):
>>>     return self.pages.getSortedArray()
>>>
>>> def getPage(self, no):
>>>     return self.pages[no]
>
>>> parser = xml.sax.make_parser()
>>> parser.setFeature(xml.sax.handler.feature_namespaces, 0)
>>> pxh = MyHandler()
>>> parser.setContentHandler(pxh)
>>> parser.parse(dateiname)
>>> for p in pxh.getPages(): ...
>
> I should ask the last question on the twisted ML, I guess:
>
>>> Further, if I'd like to use it in a twisted driven asynchronous app,
>>> would I let the parser run in a thread? (Or how can I make
>>> the parser non-blocking?)
>>>

I've never looked into Twisted, so I can't answer this.

Dave


