[Pythonmac-SIG] XML handler design

Eric Nieuwland eric.nieuwland at xs4all.nl
Thu Mar 24 15:20:11 CET 2005


  <Henning.Ramm at mediapro-gmbh.de> wrote:
>>> def startElement(self, name, attrs):
>>>     self._queue.append(name) # keep the order of processed tags
>>>     handler = str('_start_'+name)
>>>     if hasattr(self, handler):
>>>         self.__class__.__dict__[handler](self, attrs)
>
> Is there a better syntax for self.__class__.__dict__[handler]?
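
getattr() is probably the simpler spelling. A minimal sketch, assuming the
_start_<tag> naming from the quoted code; the class name DispatchHandler, the
"seen" list and the sample document are made up for illustration. Unlike the
__class__.__dict__ lookup, getattr() also finds handler methods inherited from
a base class.

```python
import xml.sax


class DispatchHandler(xml.sax.ContentHandler):
    """Illustrative handler that dispatches on the element name."""

    def __init__(self):
        super().__init__()
        self._queue = []   # order of processed tags, as in the quoted code
        self.seen = []     # hypothetical: collects page numbers

    def startElement(self, name, attrs):
        self._queue.append(name)
        # Look up a bound method called _start_<name>; None if it is missing.
        handler = getattr(self, '_start_' + name, None)
        if handler is not None:
            handler(attrs)

    def _start_page(self, attrs):
        self.seen.append(attrs.get('no'))


dispatcher = DispatchHandler()
xml.sax.parseString(b'<doc><page no="1"/><page no="2"/></doc>', dispatcher)
```

Tags without a matching _start_ method (here: doc) are still recorded in
_queue but otherwise ignored.
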
>
> And where should the "output" go to?
> All examples use print statements in the element handlers.
> I wrote those get... methods - but I guess they don't belong in the 
> XML handler, but perhaps in the parser or somewhere else.
> It works, but I don't think it's good design.
>
>>> def getPages(self):
>>>     return self.pages.getSortedArray()
>>>
>>> def getPage(self, no):
>>>     return self.pages[no]
>
>>> parser = xml.sax.make_parser()
>>> parser.setFeature(xml.sax.handler.feature_namespaces, 0)
>>> pxh = MyHandler()
>>> parser.setContentHandler(pxh)
>>> parser.parse(dateiname)
>>> for p in pxh.getPages(): ...

My style is to create/build a data structure in the parser and have a 
single get... method that will give me the result.
Your getPage/getPages would be part of the objects in the data 
structure.

So:

class MyDoc(...):
	...
	def getPage(self, no):
		...
	def getPages(self):
		...
	...

class MyXmlParser(...):
	...
	def reset(self):
		super(MyXmlParser,self).reset()
		self._result = MyDoc(...)
		...
	def startElement(self, name, attrs):
		# ... add to self._result ...
	def getResult(self):
		# usually something more advanced than this
		return self._result
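
A more concrete version of the outline, as a sketch only. It assumes the
standard xml.sax.ContentHandler as base class, so the result is created in
startDocument() rather than reset(). The <page no="..."> document layout, the
addPage() helper and the SAMPLE data are invented for the example.

```python
import xml.sax


class MyDoc:
    """Hypothetical result object: pages keyed by page number."""

    def __init__(self):
        self._pages = {}

    def addPage(self, no, attrs):
        self._pages[no] = attrs

    def getPage(self, no):
        return self._pages[no]

    def getPages(self):
        # Pages in ascending order of page number.
        return [self._pages[no] for no in sorted(self._pages)]


class MyHandler(xml.sax.ContentHandler):
    def startDocument(self):
        # Start every parse run with a fresh result object.
        self._result = MyDoc()

    def startElement(self, name, attrs):
        # Assumed layout: <page no="..." title="..."/>
        if name == "page":
            self._result.addPage(int(attrs["no"]), dict(attrs.items()))

    def getResult(self):
        return self._result


SAMPLE = b'<doc><page no="2" title="two"/><page no="1" title="one"/></doc>'

handler = MyHandler()
xml.sax.parseString(SAMPLE, handler)
doc = handler.getResult()
titles = [page["title"] for page in doc.getPages()]
```

The handler only builds the MyDoc; the calling code gets it through
getResult() and uses getPage()/getPages() on it, so no print statements are
needed in the element handlers.
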

> I should ask the last question on the twisted ML, I guess:
>
>>> Further, if I'd like to use it in a twisted driven asynchronous app,
>>> would I let the parser run in a thread? (Or how can I make
>>> the parser non-blocking?)

Sorry, dunnoh!

--eric


