Help w/ HTMLParser lib

Kevin T. Ryan ktr46 at hotmail.com
Fri May 21 10:51:38 EDT 2004


Thanks to both of you - I will try to incorporate the regexes and I'll check
out tidy.  Take care,

Kevin

"Kevin T. Ryan" <kevryan0701 at yahoo.com> wrote in message
news:40ad7619$0$3114$61fed72c at news.rcn.com...
> Hi all -
>
> I'm somewhat new to python (about 1 year), and I'm trying to write a program
> that opens a file like object w/ urllib.urlopen, and then parse the data by
> passing it to a class that subclasses HTMLParser.HTMLParser.  On the web
> page, however, there is javascript - and I think that is causing an error
> in parsing the data.  Here's the error:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "html_helper.py", line 30, in parse_data
>     p.feed(data)
>   File "//usr/lib/python2.2/HTMLParser.py", line 108, in feed
>     self.goahead(0)
>   File "//usr/lib/python2.2/HTMLParser.py", line 150, in goahead
>     k = self.parse_endtag(i)
>   File "//usr/lib/python2.2/HTMLParser.py", line 329, in parse_endtag
>     self.error("bad end tag: %s" % `rawdata[i:j]`)
>   File "//usr/lib/python2.2/HTMLParser.py", line 115, in error
>     raise HTMLParseError(message, self.getpos())
> HTMLParser.HTMLParseError: bad end tag: "</scr' + 'ipt>", at line 411, column 7
>
> I've tried to use a try/except clause both w/in my class and w/in a function
> that wraps the class for easy access, but to no avail.  The code works on
> other websites, so I know that it's not *completely* off.  Any help would
> be greatly appreciated!  TIA :)
>
> Kevin
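
For anyone finding this thread later, here is a rough sketch of the regex
idea mentioned above: strip the <script> blocks out of the page before
handing it to the parser, so the "</scr' + 'ipt>" fragment never reaches
parse_endtag.  Everything in it is an assumption for illustration only:
the names SCRIPT_RE, strip_scripts, LinkCollector, collect_links and the
SAMPLE page are made up, and it uses the Python 3 module name html.parser
rather than the Python 2.2 HTMLParser module from the traceback.  A regex
like this is a blunt tool and will not cope with every page.

```python
import re
from html.parser import HTMLParser  # assumption: Python 3 name for the old HTMLParser module

# Hypothetical pattern: an opening <script ...> tag, anything after it
# (non-greedy), up to the first literal </script>.  A split-up string such
# as "</scr' + 'ipt>" does not match the closing part, so it is swallowed
# along with the rest of the block.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>",
                       re.IGNORECASE | re.DOTALL)


def strip_scripts(html):
    """Return html with every <script>...</script> block removed."""
    return SCRIPT_RE.sub("", html)


class LinkCollector(HTMLParser):
    """Illustrative subclass that just records the href of each <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def collect_links(html):
    """Strip the scripts first, then feed the cleaned text to the parser."""
    parser = LinkCollector()
    parser.feed(strip_scripts(html))
    parser.close()
    return parser.links


# Made-up page containing the same kind of fragment as in the traceback.
SAMPLE = """<html><body>
<script type="text/javascript">
document.write('<scr' + 'ipt src="ad.js"></scr' + 'ipt>');
</script>
<a href="http://www.python.org/">Python</a>
</body></html>"""

if __name__ == "__main__":
    print(collect_links(SAMPLE))
```

The cleaning step is kept separate from the parser class so the same
subclass can still be fed pages that need no pre-processing.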




