URL listers

P. Daniell pdaniell at ign.com
Mon Nov 17 02:21:19 EST 2003


I have the following HTML document:

<html>
<body>
<a href="http://www.yahoo.com">I don't give a hoot</a>
</body>
</html>

I want my HTMLParser subclass (code below) to output a single line:

http://www.yahoo.com I don't give a hoot

Instead it outputs three lines, one per fragment of the link text:

http://www.yahoo.com I don
http://www.yahoo.com  '
http://www.yahoo.com t give a hoot


Would anyone care to give me some guidance on how to fix this?

Thanks, 
PD



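from htmllib import HTMLParser
import formatter

class URLLister(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self, formatter.NullFormatter())
        self.in_a = 0        # 1 while inside an <a> element
        self.tempurl = ''    # href of the anchor currently open

    def anchor_bgn(self, href, name, type):
        self.in_a = 1
        self.tempurl = href

    def anchor_end(self):
        self.in_a = 0

    def handle_data(self, data):
        # Prints once per call, so link text that arrives in several
        # pieces produces several output lines.
        if self.in_a == 1:
            print self.tempurl, data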
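
One direction that might work, offered as a rough sketch rather than a confirmed fix: since handle_data can apparently be called more than once for a single run of text, buffer the pieces while inside the anchor and emit one line when the anchor closes. The sketch below assumes Python 3 and uses html.parser.HTMLParser (htmllib was removed in Python 3), so handle_starttag/handle_endtag stand in for anchor_bgn/anchor_end. The names chunks, links and list_links are made up for illustration.

```python
from html.parser import HTMLParser


class URLLister(HTMLParser):
    """Collect (href, link text) pairs, one per <a> element."""

    def __init__(self):
        super().__init__()
        self.in_a = False
        self.tempurl = ''
        self.chunks = []   # hypothetical: text pieces seen inside the current <a>
        self.links = []    # hypothetical: finished (url, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.in_a = True
            # attrs is a list of (name, value) pairs; value may be None
            self.tempurl = dict(attrs).get('href') or ''
            self.chunks = []

    def handle_data(self, data):
        # Only buffer here; the text may arrive in more than one call.
        if self.in_a:
            self.chunks.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self.in_a:
            self.in_a = False
            self.links.append((self.tempurl, ''.join(self.chunks)))


def list_links(html):
    """Hypothetical helper: parse html and return the collected pairs."""
    parser = URLLister()
    parser.feed(html)
    parser.close()
    return parser.links


doc = '''<html>
<body>
<a href="http://www.yahoo.com">I don't give a hoot</a>
</body>
</html>'''

for url, text in list_links(doc):
    print(url, text)   # prints: http://www.yahoo.com I don't give a hoot
```

The same buffering idea should carry over to the htmllib version: collect data in a list from anchor_bgn onward and do the print in anchor_end instead of in handle_data.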
   




