Help w/ HTMLParser lib

Kevin T. Ryan kevryan0701 at yahoo.com
Thu May 20 23:23:05 EDT 2004


Hi all - 

I'm somewhat new to Python (about 1 year), and I'm trying to write a program
that opens a file-like object with urllib.urlopen and then parses the data by
passing it to a class that subclasses HTMLParser.HTMLParser.  The web page
contains JavaScript, however, and I think that is what causes the parse
error.  Here's the error:

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "html_helper.py", line 30, in parse_data
    p.feed(data)
  File "//usr/lib/python2.2/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "//usr/lib/python2.2/HTMLParser.py", line 150, in goahead
    k = self.parse_endtag(i)
  File "//usr/lib/python2.2/HTMLParser.py", line 329, in parse_endtag
    self.error("bad end tag: %s" % `rawdata[i:j]`)
  File "//usr/lib/python2.2/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: bad end tag: "</scr' + 'ipt>", at line 411, column 7

I've tried a try/except clause, both within my class and within a function
that wraps the class for easy access, but to no avail.  The code works on
other websites, so I know that it's not *completely* off.  Any help would
be greatly appreciated!  TIA :)
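
A minimal sketch of the try/except wrapper described above, not the actual
html_helper.py: LinkCollector is a hypothetical stand-in for the real
subclass, and parse_data only borrows its name from the traceback.  The
import fallback and the "except ... as" syntax are assumptions so that the
sketch also runs on newer Python versions, where the parser lives in
html.parser and HTMLParseError may not exist at all.

```python
try:
    from HTMLParser import HTMLParser   # Python 2 module name, as in the traceback
except ImportError:
    from html.parser import HTMLParser  # same class under its Python 3 name


class LinkCollector(HTMLParser):
    """Hypothetical subclass: records the href of every <a> tag."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


def parse_data(data):
    """Feed data to the parser and return (links, error).

    error is None on success; otherwise it is the exception raised by
    feed() or close(), and links holds whatever was collected before it.
    """
    p = LinkCollector()
    error = None
    try:
        p.feed(data)
        p.close()
    except Exception as exc:
        # Assumption: catching Exception is broader than ideal. On the
        # Python 2.2 setup above this would be HTMLParser.HTMLParseError.
        error = exc
    return p.links, error
```

Calling parse_data(urllib.urlopen(url).read()) would match the setup in
this post (url being whatever page is fetched).  Note that the try/except
sits around feed() in the caller: feed() is where the traceback shows the
exception surfacing, so a try/except placed only inside a handler method
such as handle_starttag would not see it.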

Kevin


