Splitting SAX results

IamIan iansan at gmail.com
Tue Jun 12 15:16:45 EDT 2007


I do know how split works, but thank you for the response. The end
result I want is a dictionary built from the title text coming
through SAX, looking like {'Title1': 'Description',
'Title2': 'Description'}.

The XML data looks like:
<item>
<title>Title1:Description</title>
<link>Link</link>
<description>Desc</description>
<author>Author</author>
<pubDate>Date</pubDate>
</item>
<item>
<title>Title2:Description</title>
<link>Link</link>
<description>Desc</description>
<author>Author</author>
<pubDate>Date</pubDate>
</item>

I've tried different approaches, a couple of which I've added to the
code below (only running one option at a time):

from xml.sax import make_parser
from xml.sax.handler import ContentHandler

tracker = [] # Option 1
tracker = {} # Option 2

class reportHandler(ContentHandler):

  def __init__(self):
    self.isReport = 0

  def startElement(self, name, attrs):
    if name == 'title':
      self.isReport = 1
      self.reportText = ''

  def characters(self, ch):
    if self.isReport:
      self.reportText += ch
      tracker.append(ch) # Option 1
      key, value = ch.split(':') # Option 2
      tracker[key] = value

  def endElement(self, name):
    if name == 'title':
      self.isReport = 0
      print self.reportText

parser = make_parser()
parser.setContentHandler(reportHandler())
parser.parse('http://www.some.com/rss/')

print tracker


Option 1 produces a list of text fragments with the line breaks and
tabs included, looking like:
[u'Title1:', u'\n', u'Description ', u'\n', u'\t\t\t', u'Title2:',
u'\n', u'Description ', u'\n', u'\t\t\t', etc]

Option 2 fails with the traceback:
File "C:\test.py", line 21, in characters
    key, value = ch.split(':')
ValueError: need more than 1 value to unpack
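
The ValueError is consistent with how SAX delivers text: a parser is
allowed to pass one element's character data to characters() in several
chunks, so a single chunk (a bare newline, say) may contain no colon at
all. Below is a minimal sketch of one possible way around that, assuming
it is acceptable to buffer the chunks and split only once in endElement.
The TitleHandler name, the SAMPLE string, its <rss> wrapper and the line
breaks inside <title> are made up for illustration and are not taken from
the real feed.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Stand-in for the real feed (assumption): the <rss> wrapper and the line
# breaks inside <title> only mimic the fragments seen with Option 1.
SAMPLE = b"""<rss>
<item>
<title>Title1:
Description
</title>
<link>Link</link>
</item>
<item>
<title>Title2:
Description
</title>
<link>Link</link>
</item>
</rss>"""


class TitleHandler(ContentHandler):
    """Collects each <title> as a key/value pair split on the first colon."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.in_title = False
        self.chunks = []   # text pieces of the <title> being read
        self.titles = {}   # finished {key: value} pairs

    def startElement(self, name, attrs):
        if name == 'title':
            self.in_title = True
            self.chunks = []

    def characters(self, ch):
        # May be called several times per element, so only buffer here.
        if self.in_title:
            self.chunks.append(ch)

    def endElement(self, name):
        if name == 'title':
            self.in_title = False
            text = ''.join(self.chunks)
            # Split on the first colon only, so any later colons stay
            # in the value.
            key, sep, value = text.partition(':')
            if sep:  # skip titles that contain no colon at all
                self.titles[key.strip()] = value.strip()


handler = TitleHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.titles)
```

With the sample above, handler.titles should map both Title1 and Title2
to Description. Keeping the dictionary on the handler instead of in a
module-level tracker is just a design choice for the sketch; a global
would work the same way.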

Thank you for the help!



