Simple elementtree question

Stefan Behnel stefan.behnel-n05pAM at web.de
Thu Aug 30 15:28:30 EDT 2007


IamIan wrote:
> This is in Python 2.3.5. I've had success with elementtree and other
> RSS feeds, but I can't get it to work with this format:
> 
> <?xml version="1.0"?><rdf:RDF
>  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>  xmlns:dc="http://purl.org/dc/elements/1.1/"
>  xmlns:fr="http://ASPRSS.com/fr.html"
>  xmlns:pa="http://ASPRSS.com/pa.html"
>  xmlns="http://purl.org/rss/1.0/">
> <channel rdf:about="http://www.sample.com">
> <title>Example feed</title>
[...]
> </rdf:RDF>
> 
> What I want to extract is the text in the title and link tags for each
> item (eg. <title>First story</title> and <link>http://www.sample.com/
> news/20000/news.htm</link>). Starting with the title, my test script
> is:
> 
> import sys
> from urllib import urlopen
> 
> import elementtree.ElementTree as ET
> 
> news = urlopen("http://www.sample.com/rss/rss.xml")
> nTree = ET.parse(news)
> for item in nTree.getiterator("title"):
>   print item.text
> 
> Whether I try this for title or link, nothing is printed.

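Your document uses namespaces. You are searching for the tag "title" with no
namespace, but the elements in your feed live in the default namespace declared
by xmlns="http://purl.org/rss/1.0/", so ElementTree names them
"{http://purl.org/rss/1.0/}title". The same applies to "link".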
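Here is a minimal sketch of extracting title and link per item by spelling the tags in this "{uri}localname" form. It assumes a modern Python where ElementTree ships as xml.etree.ElementTree and elements have .iter(). With the standalone elementtree package on 2.3, nTree.getiterator("{http://purl.org/rss/1.0/}title") is the equivalent of your loop. The SAMPLE feed below is a made-up, trimmed-down stand-in, since the real feed is elided above.

```python
# Sketch for modern Python; on 2.3 the import was elementtree.ElementTree
# and .getiterator() played the role of .iter().
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down RSS 1.0 sample (not the poster's real feed).
SAMPLE = b"""<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://www.sample.com">
    <title>Example feed</title>
  </channel>
  <item rdf:about="http://www.sample.com/news/1">
    <title>First story</title>
    <link>http://www.sample.com/news/1.htm</link>
  </item>
  <item rdf:about="http://www.sample.com/news/2">
    <title>Second story</title>
    <link>http://www.sample.com/news/2.htm</link>
  </item>
</rdf:RDF>"""

# The default namespace declared by xmlns="..." on the root element,
# written as the "{uri}" prefix ElementTree puts in front of tag names.
RSS = "{http://purl.org/rss/1.0/}"

def item_titles_and_links(xml_bytes):
    """Return (title, link) pairs for every RSS 1.0 <item>."""
    root = ET.fromstring(xml_bytes)
    pairs = []
    # A bare "item" or "title" would match nothing; the namespace is required.
    for item in root.iter(RSS + "item"):
        pairs.append((item.findtext(RSS + "title"),
                      item.findtext(RSS + "link")))
    return pairs

if __name__ == "__main__":
    for title, link in item_titles_and_links(SAMPLE):
        print(title, link)
```

Note that iterating over RSS + "title" alone would also pick up the channel title, which is why the sketch walks the items and looks up title and link inside each one.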

http://effbot.org/zone/element.htm#xml-namespaces
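If typing the full URI everywhere gets tedious, newer ElementTree versions (Python 2.7 and 3.x, as far as I know) let find(), findall() and findtext() take a dict mapping prefixes of your choosing to namespace URIs. The prefixes "rss" and "rdf" below are arbitrary; only the URIs have to match the feed. The one-item SAMPLE is again a made-up illustration.

```python
import xml.etree.ElementTree as ET

# Our own prefix choices; only the URIs must match the document.
NS = {
    "rss": "http://purl.org/rss/1.0/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

# Hypothetical one-item sample in the same shape as the feed quoted above.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="http://www.sample.com/news/1">
    <title>First story</title>
    <link>http://www.sample.com/news/1.htm</link>
  </item>
</rdf:RDF>"""

root = ET.fromstring(SAMPLE)
# RSS 1.0 items are direct children of rdf:RDF.
for item in root.findall("rss:item", NS):
    title = item.findtext("rss:title", namespaces=NS)
    link = item.findtext("rss:link", namespaces=NS)
    # Namespaced attributes are read with .get() using the {uri}name form.
    about = item.get("{%s}about" % NS["rdf"])
    print(title, link, about)
```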

Stefan



More information about the Python-list mailing list