[Pythonmac-SIG] XML handler design

Henning.Ramm at mediapro-gmbh.de
Thu Mar 24 14:35:33 CET 2005


David Reed wrote:

>There's probably a better mailing list with XML parsing experts. I'm
>certainly not an expert but have done a little XML parsing. I've always
>followed the pattern of using startElement, characters and endElement
>to grab all the data. In the startElement method you set an instance
>variable to keep track of the current tag you are processing. You use
>the characters method to build up the values, and then in the endElement
>method you store the data in your data structure. See the PyXML HOWTO
>for an example, specifically this section:
>http://pyxml.sourceforge.net/topics/howto/node14.html

Yes, sure, thanks, but that's not what I wanted to know.
Perhaps I wasn't clear enough; my question isn't really XML-specific.

>> def startElement(self, name, attrs):
>>     self._queue.append(name) # keep the order of processed tags
>>     handler = str('_start_'+name)
>>     if hasattr(self, handler):
>>         self.__class__.__dict__[handler](self, attrs)

Is there a better syntax for self.__class__.__dict__[handler]?
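
One candidate, as a minimal sketch rather than a definitive answer: the built-in getattr() with a default returns the bound method (or None), so the hasattr() test and the class __dict__ lookup collapse into one call. Unlike the __dict__ lookup, it also finds handler methods inherited from a base class. The names DispatchHandler, titles and _start_page are made up for this illustration.

```python
import xml.sax
import xml.sax.handler


class DispatchHandler(xml.sax.handler.ContentHandler):
    """Routes each element to an optional _start_<name> method."""

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self._queue = []   # order of processed tags, as in the quoted code
        self.titles = []   # hypothetical attribute, only for this demo

    def startElement(self, name, attrs):
        self._queue.append(name)
        # getattr with a default replaces hasattr plus the
        # self.__class__.__dict__ lookup and sees inherited methods too.
        handler = getattr(self, '_start_' + name, None)
        if handler is not None:
            handler(attrs)  # bound method, so self is passed implicitly

    def _start_page(self, attrs):
        # Hypothetical per-tag handler: remember each page's title.
        self.titles.append(attrs.get('title'))


demo = DispatchHandler()
xml.sax.parseString(b'<doc><page title="A"/><page title="B"/></doc>', demo)
```

Tags without a matching _start_ method (here, doc) are still recorded in _queue but otherwise ignored.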

And where should the "output" go?
All the examples I've seen use print statements in the element handlers.
I wrote those get... methods, but I suspect they don't belong in the XML handler; perhaps they belong with the parser or somewhere else.
It works, but I don't think it's good design.

>> def getPages(self):
>>     return self.pages.getSortedArray()
>>
>> def getPage(self, no):
>>     return self.pages[no]

>> parser = xml.sax.make_parser()
>> parser.setFeature(xml.sax.handler.feature_namespaces, 0)
>> pxh = MyHandler()
>> parser.setContentHandler(pxh)
>> parser.parse(dateiname)
>> for p in pxh.getPages(): ...
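
On the question of where the output should go, one possible arrangement (a sketch under assumptions, not the only reasonable design) is to let the handler do nothing but fill a plain data structure, and to hide parser and handler behind a small function that returns that structure. Sorting and lookup then happen in ordinary code outside the handler. The names PageHandler and load_pages, and the page element with a no attribute, are invented for this example.

```python
import io
import xml.sax
import xml.sax.handler


class PageHandler(xml.sax.handler.ContentHandler):
    """Only collects data; knows nothing about how it is used later."""

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.pages = {}        # page number -> text content
        self._current = None   # number of the page being read, if any

    def startElement(self, name, attrs):
        if name == 'page':
            self._current = int(attrs['no'])
            self.pages[self._current] = ''

    def characters(self, content):
        # characters() may be called several times per element.
        if self._current is not None:
            self.pages[self._current] += content

    def endElement(self, name):
        if name == 'page':
            self._current = None


def load_pages(source):
    """Parse source (file name or file object); return a dict of pages."""
    handler = PageHandler()
    parser = xml.sax.make_parser()
    parser.setFeature(xml.sax.handler.feature_namespaces, 0)
    parser.setContentHandler(handler)
    parser.parse(source)
    return handler.pages


sample = io.BytesIO(b'<doc><page no="2">two</page><page no="1">one</page></doc>')
pages = load_pages(sample)
ordered = [pages[no] for no in sorted(pages)]
```

With this split the caller never touches the handler, so the get... accessors are no longer needed on it.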

I should probably ask the last question on the Twisted mailing list:

>> Further, if I'd like to use it in a twisted driven asynchronous app, 
>> would I let the parser run in a thread? (Or how can I make 
>> the parser non-blocking?)
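
One option that avoids a thread, sketched here under the assumption that the data arrives in chunks: the parser returned by xml.sax.make_parser() also implements the incremental interface (feed() and close()), so each chunk can be handed over as it arrives and control returns after every call. In a Twisted protocol the feed() call could presumably sit in dataReceived(); running the blocking parse() through deferToThread would be the threaded alternative. TagCounter and the hard-coded chunks are made up for the demo.

```python
import xml.sax
import xml.sax.handler


class TagCounter(xml.sax.handler.ContentHandler):
    """Hypothetical handler that just counts start tags."""

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


parser = xml.sax.make_parser()
parser.setFeature(xml.sax.handler.feature_namespaces, 0)
counter = TagCounter()
parser.setContentHandler(counter)

# Chunks may split the document anywhere, even in the middle of a tag.
chunks = [b'<doc><pa', b'ge/><page', b'/></doc>']
for chunk in chunks:
    parser.feed(chunk)   # returns once this chunk has been processed
parser.close()           # signals end of input and finishes the parse
```

Each feed() call only does the work for its own chunk, so a single large document no longer blocks the event loop for the whole parse.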

Best regards,
Henning Hraban Ramm
Südkurier Medienhaus / MediaPro
Support/Admin/Development Dept.

