[XML-SIG] pulldom CHARACTERS problem

Grant Morganryuuguu grant at ryuuguu.com
Fri Mar 11 10:16:48 CET 2005


I solved the problem and am responding to myself for the benefit of future googlers.
SAX parsers may split a single run of text into several CHARACTERS events, so the pieces have to be joined back together. Since pulldom is built on a SAX parser, it may do this too. My method to find and join the next run of CHARACTERS nodes is below. It assumes that
self.event,self.node  = iter.next()
was executed previously.

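     def getCharacterNode(self,iter):
         # Skip ahead to the next CHARACTERS event.
         while self.event != 'CHARACTERS':
             self.event,self.node  = iter.next()
         chars=[]
         chars.append(self.node.nodeValue)
         self.event,self.node  = iter.next()
         # Collect any further CHARACTERS events that directly follow.
         while self.event == 'CHARACTERS':
             chars.append(self.node.nodeValue)
             self.event,self.node  = iter.next()
         # self.event,self.node now hold the first event after the text.
         return ''.join(chars)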
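
For anyone reading this on Python 3, where iter.next() is gone, the same joining idea can be sketched as a generator. This is only a rough sketch under my own assumptions: the function name iter_text_runs and the sample XML are made up for illustration, and it reads from a string with pulldom.parseString rather than from a file.

```python
from xml.dom import pulldom


def iter_text_runs(xml_text):
    """Yield the joined text of each run of consecutive CHARACTERS events.

    iter_text_runs is a hypothetical helper name, not part of pulldom.
    """
    chars = []
    for event, node in pulldom.parseString(xml_text):
        if event == pulldom.CHARACTERS:
            # The parser may deliver one piece of text in several events.
            chars.append(node.nodeValue)
        elif chars:
            # Any other event ends the current run of text.
            yield "".join(chars)
            chars = []
    if chars:
        yield "".join(chars)


# Made-up sample input, modelled on the <key>Name</key> case.
sample = "<dict><key>Name</key><key>Other</key></dict>"
runs = list(iter_text_runs(sample))
print(runs)
```

Because the pieces are joined before being handed back, the caller gets 'Name' whether the parser delivered it in one event or several.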

Cheers,
Grant

> I am having a problem where I only get part of the characters in a CHARACTERS node.
> I am using code like this:
>
> doc = xml.dom.pulldom.parse(inFile)
> iter=iter(doc)
> event,node  = iter.next()
> if event == 'CHARACTERS':
>      char = node.nodeValue
>
> In my small tests it works fine, but with a large file (2MB) errors start occurring.
> XML like
>
> <key>Name</key>
>
> sometimes produces char == 'N' or 'Na'. Where this happens, and what it produces, varies if I remove some nodes at the beginning of the file. The nodes I remove seem to parse fine, but which later node parses wrongly changes. I thought it might be a buffering problem, but this is only a 4-character string. I tried changing the buffering to line buffering, parse(inFile,None,1), as the phrase <key>Name</key> always occurs on one line, but this had no effect.
> I tried this with both Python 2.3.5 and 2.4. I have not installed PyXML.
>
> Any suggestions would be appreciated.
>
> Cheers,
> Grant




