Trouble using XML Reader
Mike D
42flicks at gmail.com
Mon Mar 3 03:59:07 EST 2008
Hello,
I'm using the SAX XMLReader (xml.sax.xmlreader.XMLReader) to create an RSS
reader. I can parse the file but am unsure how to extract the elements I
require.
For example: For each <item> element I want the title and description.
I have some stub code; I want to create a list of objects which include a
title and description.
I have the following code (a bit hacked up):
from xml.sax import make_parser
from xml.sax import handler

class rssObject(object):
    # Class-level list, so it is shared by every rssObject instance.
    objectList = []

    def addObject(self, item):
        rssObject.objectList.append(item)

class rssObjectDetail(object):
    title = ""
    content = ""

class SimpleHandler(handler.ContentHandler):
    # Stub callbacks: for now they only echo what the parser reports.
    def startElement(self, name, attrs):
        print(name)

    def endElement(self, name):
        print(name)

    def characters(self, data):
        print(data)

class SimpleDTDHandler(handler.DTDHandler):
    def notationDecl(self, name, publicid, systemid):
        print("Notation: ", name, publicid, systemid)

    # ndata is the notation name associated with the unparsed entity.
    def unparsedEntityDecl(self, name, publicid, systemid, ndata):
        print("UnparsedEntity: ", name, publicid, systemid, ndata)

p = make_parser()
c = SimpleHandler()
p.setContentHandler(c)
p.setDTDHandler(SimpleDTDHandler())
p.parse('topstories.xml')
I am using this XML file:
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Stuff.co.nz - Top Stories</title>
    <link>http://www.stuff.co.nz</link>
    <description>Top Stories from Stuff.co.nz. New Zealand, world, sport,
business &amp; entertainment news on Stuff.co.nz. </description>
    <language>en-nz</language>
    <copyright>Fairfax New Zealand Ltd.</copyright>
    <ttl>30</ttl>
    <image>
      <url>/static/images/logo.gif</url>
      <title>Stuff News</title>
      <link>http://www.stuff.co.nz</link>
    </image>
    <item id="4423924" count="1">
      <title>Prince Harry 'wants to live in Africa'</title>
      <link>http://www.stuff.co.nz/4423924a10.html?source=RSStopstories_20080303</link>
      <description>For Prince Harry it must be the ultimate dark irony: to be in
such a privileged position and have so much opportunity, and yet be unable
to fulfil a dream of fighting for the motherland.</description>
      <author>EDMUND TADROS</author>
      <guid isPermaLink="false">stuff.co.nz/4423924</guid>
      <pubDate>Mon, 03 Mar 2008 00:44:00 GMT</pubDate>
    </item>
  </channel>
</rss>
Is there something I'm missing? I can't figure out how to correctly
interpret the document using the SAX parser. I'm sure I'm missing something
obvious :)
Any tips or advice would be appreciated! I would also welcome advice on the
right way to implement what I want to achieve, as using objectList=[] in the
ContentHandler seems like a hack.
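
For what it is worth, here is a rough sketch of the direction I think this
needs to go, offered under a few assumptions rather than as a known-good
answer. It assumes the handler has to remember whether it is currently inside
an <item>, that characters() may deliver the text of one element in several
chunks, and that the results can live on the handler instance instead of in a
class-level list. RssItem, ItemHandler and parse_items are names I made up for
illustration, and the code uses the print()-free, Python 3 friendly style.

```python
import xml.sax
from xml.sax import handler


class RssItem(object):
    """Hypothetical container for one <item>: a title and a description."""

    def __init__(self, title="", description=""):
        self.title = title
        self.description = description


class ItemHandler(handler.ContentHandler):
    """Collects the <title> and <description> of every <item> element."""

    WANTED = ("title", "description")

    def __init__(self):
        handler.ContentHandler.__init__(self)
        self.items = []       # finished RssItem objects, one list per handler
        self._current = None  # RssItem being built, or None outside <item>
        self._buffer = None   # text chunks of the wanted child being read

    def startElement(self, name, attrs):
        if name == "item":
            self._current = RssItem()
        elif self._current is not None and name in self.WANTED:
            self._buffer = []

    def characters(self, data):
        # The parser may split one text node across several calls.
        if self._buffer is not None:
            self._buffer.append(data)

    def endElement(self, name):
        if name == "item":
            self.items.append(self._current)
            self._current = None
        elif self._buffer is not None and name in self.WANTED:
            # Join the chunks and collapse the line wrapping in the feed text.
            text = " ".join("".join(self._buffer).split())
            setattr(self._current, name, text)
            self._buffer = None


def parse_items(source):
    """Parse a filename or file-like object and return a list of RssItem."""
    content_handler = ItemHandler()
    parser = xml.sax.make_parser()
    parser.setContentHandler(content_handler)
    parser.parse(source)
    return content_handler.items
```

With the file above, I would expect parse_items('topstories.xml') to return a
one-element list. The channel and image titles should be skipped because they
sit outside any <item>.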
Thanks!