Expat issue

diwanh at monica.cs.rpi.edu diwanh at monica.cs.rpi.edu
Mon Dec 23 18:05:38 EST 2002


I'm trying to write a simple script to gather news from several RSS sites. The
code is given below:
from xml.dom import minidom
import urllib
def load(url):
        return minidom.parse(urllib.urlopen(url))
DEFAULT_NAMESPACES= (None, 'http://purl.org/rss/1.0/',
'http://my.netscape.com/rdf/simple/0.9')
def getElementsByTagName(node,tagName,possibleNamespaces=DEFAULT_NAMESPACES):
        for namespace in possibleNamespaces:
                children = node.getElementsByTagNameNS(namespace,tagName)
                if len(children):return children
        return []
def first(node,tagName,possibleNameSpaces=DEFAULT_NAMESPACES):
        children=getElementsByTagName(node,tagName,possibleNameSpaces)
        return len(children) and children[0] or None
def textOf(node):
        return node and "".join([child.data for child in node.childNodes]) or ""
DUBLIN_CORE=('http://purl.org/dc/elements/1.1/',)
if __name__ == '__main__':
        import sys
        for s in sys.argv:
                rssDocument = load(s)
                for item in getElementsByTagName(rssDocument,'item'):
                        print 'title:', textOf(first(item, 'title'))
                        print 'link:', textOf(first(item, 'link'))
                        print 'description:', textOf(first(item, 'description'))
                        print 'date:', textOf(first(item, 'date', DUBLIN_CORE))
                        print 'author:', textOf(first(item, 'author',
DUBLIN_CORE))
                        print '-----------------------'
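
The namespace-aware lookups that the helpers above are reaching for can be tried on a small inline document. In minidom the namespace-aware call is getElementsByTagNameNS(namespaceURI, localName); getElementsByTagName takes only a tag name and compares it against the full qualified name, prefix included. The following is only a sketch: SAMPLE_FEED is a made-up RSS 1.0 fragment, text_of is a hypothetical stand-in for textOf, and the code is written for a current Python rather than 2.2.

```python
from xml.dom import minidom

# Hypothetical sample feed, invented purely for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item rdf:about="http://example.com/1">
    <title>First post</title>
    <link>http://example.com/1</link>
    <dc:date>2002-12-23</dc:date>
  </item>
</rdf:RDF>"""

RSS_10 = 'http://purl.org/rss/1.0/'
DUBLIN_CORE = 'http://purl.org/dc/elements/1.1/'

def text_of(node):
    # Join only the text children; element children have no .data attribute.
    return "".join([c.data for c in node.childNodes
                    if c.nodeType == c.TEXT_NODE])

doc = minidom.parseString(SAMPLE_FEED)

# Namespace-aware lookup: match on (namespace URI, local name).
items = doc.getElementsByTagNameNS(RSS_10, 'item')
titles = [text_of(i.getElementsByTagNameNS(RSS_10, 'title')[0]) for i in items]
dates = [text_of(i.getElementsByTagNameNS(DUBLIN_CORE, 'date')[0]) for i in items]

# Plain lookup compares against the qualified name, so 'date' does not
# match an element written as <dc:date>.
plain_dates = items[0].getElementsByTagName('date')
```

Matching on the namespace URI rather than the prefix means the lookup keeps working if a feed binds Dublin Core to a prefix other than dc.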

It fails on every XML file I hand it, with:
Traceback (most recent call last):
  File "./rss.py", line 21, in ?
    rssDocument = load(s)
  File "./rss.py", line 5, in load
    return minidom.parse(urllib.urlopen(url))
  File "/usr/local/lib/python2.2/site-packages/_xmlplus/dom/minidom.py", line
1595, in parse
    return expatbuilder.parse(file)
  File "/usr/local/lib/python2.2/site-packages/_xmlplus/dom/expatbuilder.py",
line 931, in parse
    result = builder.parseFile(file)
  File "/usr/local/lib/python2.2/site-packages/_xmlplus/dom/expatbuilder.py",
line 173, in parseFile
    parser.Parse(buffer, 0)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 1

I'm reasonably sure that the issue is not with the XML (as several different
feeds have exhibited the same behavior). Thanks in advance!
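
One possible explanation worth ruling out (a guess, since the traceback alone does not prove it): the main loop runs over all of sys.argv, and sys.argv[0] is the script's own path, ./rss.py in the traceback. urllib.urlopen will open a plain local path, so the first thing handed to the parser may be the script itself rather than a feed. A file that begins with a #! line is not XML, and that would fail on line 1 no matter which feed comes next. The sketch below assumes this; feed_arguments and parse_error are hypothetical helpers, script_source is a made-up stand-in for the script text, and the code is written for a current Python rather than 2.2.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

def feed_arguments(argv):
    # argv[0] is the script path; only the remaining entries are feeds.
    return argv[1:]

def parse_error(text):
    # Return the ExpatError raised while parsing text, or None if it parses.
    try:
        minidom.parseString(text)
    except ExpatError as err:
        return err
    return None

# Hypothetical opening lines of a script started as ./rss.py
script_source = "#!/usr/bin/python\nfrom xml.dom import minidom\n"
err = parse_error(script_source)

# A trivially well-formed document parses without error.
ok = parse_error("<rss/>")
```

If this is the cause, looping over sys.argv[1:] instead of sys.argv would skip the script name, and printing each argument before loading it would show which input the parser rejects.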

-- 
Hasan =)
PGP Key: http://www.cs.rpi.edu/~diwanh/pgp.key


