Accessing "sub elements" with xml.sax ?

Diez B. Roggisch deets at nospam.web.de
Mon Feb 25 18:05:47 EST 2008


erikcw wrote:
> Hi,
> 
> I'm trying to use xml.sax (from xml.sax.handler import ContentHandler)
> to process the following data:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <report name="yahoo" masterAccountID="666831"
> masterAccountName="CraftShowSuccess.Com-US"
> dateStart="2008-02-24-0600" dateEnd="2008-02-24-0600"
> booksClosedTimestamp="2008-02-25T01:15:00.000-0600" booksClosed="true"
> createDate="2008-02-25T02:00:27.041-0600" sortColumn="cost"
> sortOrder="desc">
> <totals><analytics numImpr="951" ctr="0.0" numClick="0" cost="0.0"
> averagePosition="9.305993690851736"/></totals>
> <row adName="Craftshows" adGrpName="Craftshows" cmpgnName="craftshows"
> tacticName="Paid Placement" qualityScore="2"><analytics numImpr="951"
> ctr="0.0" numClick="0" cost="0.0" averagePosition="9.305993690851736"/>
> </row>
> 
> </report>
> 
> I've figured out how to access the attributes in "row" - but I want to
> also access the "analytics" child element.
> 
> I've tried:
> class YahooHandler(ContentHandler):
> ccountNum)
> 
>     def startElement(self, name, attrs):
>         if name == 'row' or name == 'analytics':
>             self.campaign = attrs.get('cmpgnName',"")
>             self.adgroup = attrs.get('adGrpName',"")
>             self.headline = attrs.get('adName',"")
>             self.imps = attrs.get('numImpr',None)
>             self.clicks = attrs.get('numClick',None)
>             self.cost = attrs.get('cost',"")
> 
>     def endElement(self, name):
>         if name == 'row':
>             if self.campaign not in self.data:
>                 self.data[self.campaign] = {}
>             if self.adgroup not in self.data[self.campaign]:
>                 self.data[self.campaign][self.adgroup] = []
>             self.data[self.campaign][self.adgroup].append({
>                 'campaign': self.campaign,
>                 'adgroup': self.adgroup,
>                 'headline': self.headline,
>                 'imps': self.imps,
>                 'clicks': self.clicks,
>                 'ctr': self.ctr,
>                 'cost': self.cost,
>             })
>             print self.data
> 
> But the data comes out as separate dictionaries - I want the
> analytics and the row elements in one dictionary.
> 
> What am I doing wrong?

With SAX, you can't access a child directly - the parser only reports
start and end events, so you need to build up that hierarchy yourself,
using a stack of elements.
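
A minimal sketch of the stack approach, assuming Python 3 and a trimmed
copy of your report. RowHandler and the dictionary keys are names made up
for this example. Note that in your handler startElement runs once for
"row" and again for "analytics", and the second call overwrites the row
values with the defaults, so the two elements are handled separately here:

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Trimmed copy of the report from the question (assumption: same structure).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<report name="yahoo">
<totals><analytics numImpr="951" ctr="0.0" numClick="0" cost="0.0"/></totals>
<row adName="Craftshows" adGrpName="Craftshows" cmpgnName="craftshows">
<analytics numImpr="951" ctr="0.0" numClick="0" cost="0.0"/>
</row>
</report>"""


class RowHandler(ContentHandler):
    """Collects one dict per <row>, merged with its child <analytics>."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.stack = []      # names of the elements that are currently open
        self.current = None  # dict for the <row> being parsed, if any
        self.rows = []       # finished rows

    def startElement(self, name, attrs):
        # The top of the stack is the parent of the element just opened.
        parent = self.stack[-1] if self.stack else None
        self.stack.append(name)
        if name == 'row':
            self.current = {
                'campaign': attrs.get('cmpgnName', ''),
                'adgroup': attrs.get('adGrpName', ''),
                'headline': attrs.get('adName', ''),
            }
        elif name == 'analytics' and parent == 'row':
            # Only merge analytics that sit inside a row, not inside totals.
            self.current.update({
                'imps': attrs.get('numImpr'),
                'clicks': attrs.get('numClick'),
                'ctr': attrs.get('ctr'),
                'cost': attrs.get('cost'),
            })

    def endElement(self, name):
        self.stack.pop()
        if name == 'row':
            self.rows.append(self.current)
            self.current = None


handler = RowHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.rows)
```

The stack is what tells the handler that an "analytics" element belongs
to a "row" and not to "totals"; grouping the finished rows by campaign
and ad group can then happen in endElement as in your code.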

Better to go for DOM or, better still, ElementTree - these do that work
for you, and you can easily access child elements.
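
For comparison, a rough sketch of the same job with ElementTree, again
assuming Python 3 and a trimmed copy of the report. The variable names
are only illustrative, and the raw attribute names are kept as keys:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the report from the question (assumption: same structure).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<report name="yahoo">
<totals><analytics numImpr="951" ctr="0.0" numClick="0" cost="0.0"/></totals>
<row adName="Craftshows" adGrpName="Craftshows" cmpgnName="craftshows">
<analytics numImpr="951" ctr="0.0" numClick="0" cost="0.0"/>
</row>
</report>"""

root = ET.fromstring(SAMPLE)

rows = []
for row in root.findall('row'):
    # Start with the attributes of <row> itself ...
    record = dict(row.attrib)
    # ... then merge in the attributes of its <analytics> child, if present.
    analytics = row.find('analytics')
    if analytics is not None:
        record.update(analytics.attrib)
    rows.append(record)

print(rows)
```

Here findall('row') only looks at direct children of the root, and
row.find('analytics') only at children of that row, so the "analytics"
element under "totals" is never touched.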

Diez


