Trouble using XML Reader

Mike D 42flicks at gmail.com
Mon Mar 3 03:59:07 EST 2008


Hello,

I'm using XML Reader (xml.sax.xmlreader.XMLReader) to create an RSS reader.

I can parse the file but am unsure how to extract the elements I require.
For example: For each <item> element I want the title and description.

I have some stub code; I want to create a list of objects which include a
title and description.

I have the following code (a bit hacked up):

import sys
from xml.sax import make_parser
from xml.sax import handler

class rssObject(object):
    objectList=[]
    def addObject(self,obj):
        rssObject.objectList.append(obj)

class rssObjectDetail(object):
    title = ""
    content = ""


class SimpleHandler(handler.ContentHandler):
    def startElement(self,name,attrs):
        print name

    def endElement(self,name):
        print name

    def characters(self,data):
        print data


class SimpleDTDHandler(handler.DTDHandler):
    def notationDecl(self,name,publicid,systemid):
        print "Notation: " , name, publicid, systemid

    def unparsedEntityDecl(self,name,publicid,systemid,ndata):
        print "UnparsedEntity: " , name, publicid, systemid, ndata

p= make_parser()
c = SimpleHandler()
p.setContentHandler(c)
p.setDTDHandler(SimpleDTDHandler())
p.parse('topstories.xml')

And am using this xml file:

<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Stuff.co.nz - Top Stories</title>
    <link>http://www.stuff.co.nz</link>
    <description>Top Stories from Stuff.co.nz. New Zealand, world, sport,
business &amp; entertainment news on Stuff.co.nz. </description>
    <language>en-nz</language>
    <copyright>Fairfax New Zealand Ltd.</copyright>
    <ttl>30</ttl>
    <image>
      <url>/static/images/logo.gif</url>
      <title>Stuff News</title>
      <link>http://www.stuff.co.nz</link>
    </image>

<item id="4423924" count="1">
<title>Prince Harry 'wants to live in Africa'</title>
<link>http://www.stuff.co.nz/4423924a10.html?source=RSStopstories_20080303
</link>
<description>For Prince Harry it must be the ultimate dark irony: to be in
such a privileged position and have so much opportunity, and yet be unable
to fulfil a dream of fighting for the motherland.</description>
<author>EDMUND TADROS</author>
<guid isPermaLink="false">stuff.co.nz/4423924</guid>
<pubDate>Mon, 03 Mar 2008 00:44:00 GMT</pubDate>
</item>

  </channel>
</rss>

Is there something I'm missing? I can't figure out how to correctly
interpret the document using the SAX parser. I'm sure I'm missing something
obvious :)

Any tips would be appreciated, as would advice on the right way to
implement what I want to achieve; using objectList=[] in the
ContentHandler seems like a hack.
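
A minimal sketch of one possible approach, not a definitive answer: the
handler remembers whether it is inside an <item>, buffers the text of the
<title> and <description> children, and keeps the results in a list on the
handler instance rather than in a class-level objectList. The names RssItem,
ItemHandler, SAMPLE and rss_handler are hypothetical and not part of the code
above, and SAMPLE is a cut-down copy of the feed so the sketch runs on its own.

```python
import xml.sax
from xml.sax import handler


class RssItem(object):
    """Hypothetical container for one <item> (not in the original code)."""

    def __init__(self):
        self.title = ""
        self.description = ""


class ItemHandler(handler.ContentHandler):
    """Collects the title and description of each <item> element."""

    WANTED = ("title", "description")

    def __init__(self):
        handler.ContentHandler.__init__(self)
        self.items = []        # finished RssItem objects, one per <item>
        self._current = None   # RssItem being built, or None outside <item>
        self._buffer = None    # text chunks of the wanted child being read

    def startElement(self, name, attrs):
        if name == "item":
            self._current = RssItem()
        elif self._current is not None and name in self.WANTED:
            self._buffer = []

    def characters(self, data):
        # SAX may deliver one text node in several chunks, so accumulate.
        if self._buffer is not None:
            self._buffer.append(data)

    def endElement(self, name):
        if name == "item":
            self.items.append(self._current)
            self._current = None
        elif self._buffer is not None and name in self.WANTED:
            # Collapse the line breaks left over from wrapped text.
            text = " ".join("".join(self._buffer).split())
            setattr(self._current, name, text)
            self._buffer = None


# Cut-down version of the feed, inlined so the sketch is self-contained.
SAMPLE = b"""<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Stuff.co.nz - Top Stories</title>
    <item id="4423924" count="1">
      <title>Prince Harry 'wants to live in Africa'</title>
      <description>For Prince Harry it must be the
ultimate dark irony.</description>
    </item>
  </channel>
</rss>
"""

rss_handler = ItemHandler()
xml.sax.parseString(SAMPLE, rss_handler)
```

After parsing, rss_handler.items holds one RssItem per <item>, and the
channel-level <title> is skipped because no item is open when it is seen. To
read the real file instead of SAMPLE, the same handler could presumably be
passed to p.setContentHandler() before calling p.parse('topstories.xml').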

Thanks!