[Tutor] rss feed reader, but having trouble with unicode

Tom tavspamnofwd at googlemail.com
Tue Feb 26 19:36:53 CET 2008


I'm trying to write a little rss feed reader, but having trouble with
unicode. I would appreciate some help as I feel I'm going round in
circles.

Whenever I get the save to work, ElementTree fails to parse, or
vice versa. You can see what I've been trying from my commented-out
lines. I think there is a problem with my understanding of unicode, so
feel free to enlighten me. What encoding is the xml string in before I
do anything to it? Does my approach below make any sense?

import urllib, re, os, sys
os.environ['DJANGO_SETTINGS_MODULE'] = 'djsite.settings'
from djsite.djapp.models import Feed
from xml.etree import ElementTree

url = 'http://www.osirra.com/rss/rss20/1'
#'http://www.michaelmoore.com/rss/mikeinthenews.xml'
#'http://www.michaelmoore.com/rss/mustread.xml'

f = urllib.urlopen(url)
xml = f.read()
f.close()

feed = Feed.objects.get(url=url)

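# Look for an encoding in the XML declaration; fall back to UTF-8,
# which XML assumes when no encoding is declared.
# (re.findall returns [] for an empty string, so ms is always defined.)
ms = re.findall(r'<\?xml version="[^"]+" encoding="([^"]+)"\?>', xml)
if ms:
    encoding = ms[0]
else:
    encoding = 'utf-8'
print 'using encoding:', encoding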

#xml = xml.encode(encoding, 'replace')
##xml = xml.decode(encoding, 'replace')
#xml = unicode(xml, encoding)
#xml = unicode(xml)

elem = ElementTree.fromstring(xml)
#do stuff with elem...

feed.xml = xml
feed.save()
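
For reference, here is a minimal sketch of one way to keep the two
concerns apart. It assumes that what urlopen().read() returns is an
undecoded byte string in whatever encoding the feed declares. As far as
I understand, the parser wants those raw bytes and reads the declaration
itself, while storing the feed as text means decoding the bytes once.
The names raw and declared_encoding and the sample feed are hypothetical
stand-ins, not part of the script above.

```python
# -*- coding: utf-8 -*-
import re
from xml.etree import ElementTree

# Hypothetical stand-in for the bytes returned by f.read().
raw = (u'<?xml version="1.0" encoding="iso-8859-1"?>'
       u'<rss><title>caf\xe9</title></rss>').encode('iso-8859-1')


def declared_encoding(data, default='utf-8'):
    """Return the encoding named in the XML declaration, or the default."""
    m = re.match(br'<\?xml[^>]*encoding="([^"]+)"', data)
    if m:
        return m.group(1).decode('ascii')
    return default


# Give the parser the undecoded bytes; it honours the declaration.
elem = ElementTree.fromstring(raw)
title = elem.findtext('title')

# Decode separately to get text suitable for saving in a text field.
text = raw.decode(declared_encoding(raw), 'replace')
```

With this split, title and text are both unicode text, and raw is never
re-encoded or decoded twice.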


Thanks for your time :-)

