Help w/ HTMLParser lib
Kevin T. Ryan
kevryan0701 at yahoo.com
Thu May 20 23:23:05 EDT 2004
Hi all -
I'm somewhat new to Python (about 1 year), and I'm trying to write a program
that opens a file-like object w/ urllib.urlopen and then parses the data by
passing it to a class that subclasses HTMLParser.HTMLParser. The web page,
however, contains JavaScript, and I think that is causing an error when
parsing the data. Here's the error:
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "html_helper.py", line 30, in parse_data
    p.feed(data)
  File "//usr/lib/python2.2/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "//usr/lib/python2.2/HTMLParser.py", line 150, in goahead
    k = self.parse_endtag(i)
  File "//usr/lib/python2.2/HTMLParser.py", line 329, in parse_endtag
    self.error("bad end tag: %s" % `rawdata[i:j]`)
  File "//usr/lib/python2.2/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: bad end tag: "</scr' + 'ipt>", at line 411, column 7
I've tried to use a try/except clause both w/in my class and w/in a function
that wraps the class for easy access, but to no avail. The code works on
other websites, so I know that it's not *completely* off. Any help would
be greatly appreciated! TIA :)
Kevin
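
For readers following along, here is a minimal sketch of the kind of setup the
post describes: a parse_data function that feeds markup to an HTMLParser
subclass. The original html_helper.py is not shown, so the LinkCollector class,
the body of parse_data, and the sample page are all assumptions made for
illustration. The sketch is also written against Python 3's html.parser module
rather than the Python 2.2 HTMLParser module from the traceback, and an inline
string stands in for the data that would come from urlopen. It only illustrates
the structure and is not meant as a fix for the error above.

```python
# Sketch only: Python 3 spelling of the setup described in the post.
# LinkCollector and SAMPLE are hypothetical; the post does not include
# the original html_helper.py.
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical subclass: records the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def parse_data(data):
    """Feed markup to the parser and return the collected links."""
    p = LinkCollector()
    p.feed(data)
    p.close()
    return p.links


# Inline stand-in for the downloaded page. The script body splits its end
# tag the same way as the fragment quoted in the error message; the src
# value and the link are made up for this example.
SAMPLE = """<html><body>
<script type="text/javascript">
document.write('<scr' + 'ipt src="x.js"></scr' + 'ipt>');
</script>
<a href="http://www.python.org/">Python</a>
</body></html>"""

links = parse_data(SAMPLE)
print(links)
```

Keeping the parser behind a small wrapper function like parse_data matches the
"function that wraps the class for easy access" mentioned in the post, and
gives one place to put any error handling around the feed call.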