Trouble writing to database: RSS-reader

Arne arne.k.h at gmail.com
Wed Jan 23 11:06:10 EST 2008


On Jan 21, 11:25 pm, "Gabriel Genellina" <gagsl-... at yahoo.com.ar>
wrote:
> On Mon, 21 Jan 2008 18:38:48 -0200, Arne <arne.... at gmail.com> wrote:
>
> > On 21 Jan, 19:15, Bruno Desthuilliers
> > <bruno.42.desthuilli... at wtf.websiteburo.oops.com> wrote:
>
> >> This should not prevent you from learning how to properly parse XML
> >> (hint: with an XML parser). XML is *not* a line-oriented format, so you
> >> just can't get anywhere trying to parse it this way.
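To illustrate the point above, here is a minimal sketch (modern Python 3, not code from this thread) comparing a line-by-line scan with a real parser on a small, made-up feed fragment. The sample data and the helper names `titles_line_by_line` and `titles_with_parser` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed fragment: the first <title> spans two lines, and the
# second <item> opens on the same line where the first one closes.
SAMPLE = """<rss><channel>
<item><title>First
headline</title><link>http://example.com/1</link></item><item>
<title>Second headline</title><link>http://example.com/2</link></item>
</channel></rss>"""

def titles_line_by_line(text):
    """Naive approach: only finds titles that open and close on one line."""
    found = []
    for line in text.splitlines():
        start = line.find("<title>")
        end = line.find("</title>")
        if start != -1 and end != -1:
            found.append(line[start + len("<title>"):end])
    return found

def titles_with_parser(text):
    """Parser approach: line breaks inside the markup do not matter."""
    root = ET.fromstring(text)
    return [item.findtext("title") for item in root.iter("item")]

print(titles_line_by_line(SAMPLE))  # ['Second headline']
print(titles_with_parser(SAMPLE))   # ['First\nheadline', 'Second headline']
```

The line-based scan silently misses the first title, while the parser sees both because it works on the element tree rather than on lines.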
>
> >> HTH
>
> > Do you think I should use xml.dom.minidom for this? I've never used
> > it, and I don't know how to use it, but I've heard it's useful.
>
> > So I shouldn't use this technique to parse XML? Should I use minidom
> > instead?
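For comparison, here is a minimal xml.dom.minidom sketch (Python 3; the sample feed and the helper `text_of` are hypothetical) that extracts each item's title and link. It works, but the DOM API needs noticeably more code than ElementTree for the same job.

```python
from xml.dom import minidom

# Hypothetical, well-formed sample feed.
SAMPLE = """<rss version="2.0"><channel>
<item><title>Hello</title><link>http://example.com/hello</link></item>
<item><title>World</title><link>http://example.com/world</link></item>
</channel></rss>"""

def text_of(parent, tag):
    """Join the text/CDATA children of the first <tag> element below parent."""
    nodes = parent.getElementsByTagName(tag)  # searches all descendants
    if not nodes:
        return ""
    return "".join(
        child.data for child in nodes[0].childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )

doc = minidom.parseString(SAMPLE)
items = [(text_of(item, "title"), text_of(item, "link"))
         for item in doc.getElementsByTagName("item")]
print(items)
# [('Hello', 'http://example.com/hello'), ('World', 'http://example.com/world')]
```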
>
> > Thank you for answering; I've learnt a lot from both of you,
> > Desthuilliers and Genellina! :)
>
> Try ElementTree instead; there is an implementation included with Python
> 2.5 (documentation at http://effbot.org/zone/element.htm), and another
> implementation is available at http://codespeak.net/lxml/
>
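> import xml.etree.cElementTree as ET
> import urllib2
>
> rssurl = 'http://www.jabber.org/news/rss.xml'
> rssdata = urllib2.urlopen(rssurl).read()
> # Crude fix for the feed's unescaped ampersands. It also escapes
> # ampersands in valid references, e.g. &lt; becomes &amp;lt;
> rssdata = rssdata.replace('&', '&amp;') # ouch!
>
> tree = ET.fromstring(rssdata)
> # Print link, title and description for every <item> in the feed
> for item in tree.getiterator('item'):
>     print item.find('link').text
>     print item.find('title').text
>     print item.find('description').text
>     print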
>
> Note that this particular RSS feed is NOT a well-formed XML document: I
> had to replace the bare & with &amp; to make the parser happy.
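Because `replace('&', '&amp;')` also rewrites ampersands that already start a valid reference (so `&amp;` becomes `&amp;amp;`), a somewhat safer variant escapes only "bare" ampersands. This is a Python 3 sketch that assumes a regular-expression heuristic is good enough; the pattern and the helper name `escape_bare_ampersands` are illustrative, not a standard API.

```python
import re
import xml.etree.ElementTree as ET

# Matches an '&' that does NOT begin something shaped like an entity or
# character reference, e.g. "&amp;", "&#233;" or "&#xE9;".
_BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

def escape_bare_ampersands(text):
    """Replace stray '&' characters with '&amp;', leaving references alone."""
    return _BARE_AMP.sub("&amp;", text)

broken = "<item><title>Fish & Chips &amp; more &#233;</title></item>"
fixed = escape_bare_ampersands(broken)
print(fixed)
# <item><title>Fish &amp; Chips &amp; more &#233;</title></item>
print(ET.fromstring(fixed).findtext("title"))
# Fish & Chips & more é
```

Named references that XML does not define, such as `&nbsp;`, are left untouched by this pattern, so a parser will still report them as undefined entities.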
>
> --
> Gabriel Genellina

This looks very interesting! But it seems that none of the documents are
well-formed! I've tried several RSS feeds, but they all fail with either
"undefined entity" or "not well-formed". This is not how it should be,
right? :)
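One common cause of the "undefined entity" error (an assumption about these particular feeds, not something verified here) is that feeds use HTML named entities such as `&nbsp;` or `&eacute;`, while XML only predefines `&amp;`, `&lt;`, `&gt;`, `&quot;` and `&apos;`. Below is a minimal Python 3 sketch that rewrites the HTML names as numeric character references, which any XML parser accepts; the helper name `html_entities_to_numeric` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The only named entities XML defines without a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
_NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def html_entities_to_numeric(text):
    """Rewrite HTML named entities (e.g. &nbsp;) as numeric references (&#160;)."""
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave XML's own or unknown names untouched
        return "&#%d;" % name2codepoint[name]
    return _NAMED_REF.sub(repl, text)

broken = "<title>Caf&eacute;&nbsp;news &amp; views</title>"
fixed = html_entities_to_numeric(broken)
print(fixed)  # <title>Caf&#233;&#160;news &amp; views</title>
print(ET.fromstring(fixed).text)
```

This only addresses entity names; a feed that is malformed in other ways (unclosed tags, stray ampersands) still needs separate handling.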
