xml SAX Parsing in python

Stefan Behnel stefan_ml at behnel.de
Wed Dec 17 05:34:22 EST 2014


Hi,

Abubakar Roko schrieb am 17.12.2014 um 07:30:
> Please I am new in using python to write program. I am trying to parse an XML document using sax parse  and store the parsed result in a tree like definedbelow. XNode class define an xml element which has an ID , a tag, a text value, children element and a parent element
>                    class XNode(object):
>                             def __init__(self, ID ="", elmName="", elmValue="", parent=None):
>                                  self.ID = ID                                 self.elmName=elmName                                 self.elmValue=elmValue                                 self.childs=[]                                self.parent=parent
> 
>                            def  getPath(self):                                  if self.parent is None:                                       return self.elmName                                 else:                                       return self.parent.getPath()+"/"+ self.elmName
> I  wrote a program that parse an XML document ,  convert the  document  into the tree like structure defined above and then return the parsed result tothe program that call it.  The program shown below.
> 
> import xml.saximport XMLnode as n
> 
> class XML_Handler ( xml.sax.ContentHandler):
>     def __init__(self, root):        self.root = root        self.tmp =  n.XNode()
>     def startElement(self, tag, attributes):        #if self.root != None:
>         if self.root is not None:
>             if len(self.tmp.childs) < 10:                ID = self.tmp.ID +"." + "0" + str( len(self.tmp.childs))            else:                ID = self.tmp.ID +"." + str( len(self.tmp.childs))                         self.tmp.childs.append( n.XNode(ID,tag,"",self.tmp))             
>             self.tmp= self.tmp.childs[len(self.tmp.childs)-1]        else:            print "0", tag, self.tmp.getPath()            self.root= n.XNode("0", tag,"",None)            self.tmp=self.root
>     def characters(self, content):        self.tmp.elmValue += content.strip()
>     def endElement(self, tag):        self.tmp= self.tmp.parent
> 
>     def parse(self, f):        xml.sax.parse(self,f)        return self.root
> 
> if ( __name__ == "__main__"):
>      parser = xml.sax.make_parser()     parser.setFeature(xml.sax.handler.feature_namespaces, 0)     root = None    Handler = XML_Handler(root)    parser.setContentHandler( Handler )    treRoot= parser.parse("Movies.xml")    print treRoot
> Can somebody help me answer the following questionMy Question is how do I return the parsed result through the root instance variable of  of XML_Handler classI try to do it but i always get None as answerI am using Window 7 professional and python 2.7

The formatting of your code example was heavily screwed up, please send a
plain text email next time.

My general advice is to use ElementTree instead of SAX. It's way easier to
use (even for simple tasks). Use iterparse() to get event driven
incremental parsing.

https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.iterparse

http://effbot.org/zone/element-iterparse.htm

Stefan





More information about the Python-list mailing list