[ python-Bugs-1076070 ] HTMLParser can't handle page with javascript

Tue May 2 22:20:34 CEST 2006

Bugs item #1076070, was opened at 2004-11-30 10:22
Message generated for change (Settings changed) made by fdrake
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1076070&group_id=5470

Category: Python Library
Group: None
>Status: Closed
>Resolution: Wont Fix
Priority: 5
Submitted By: Jeremy Hylton (jhylton)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: HTMLParser can't handle page with javascript

Initial Comment:
Perhaps the page is malformed -- I notice at least one
other problem with it -- but I'd like to parse it. 
Relevant excerpts appear to be:

<script language="JavaScript">
<!--
um.menuCode[i].replace(/<\/(li|ul)>/ig,'</$1>\n');
-->
</script>

goahead() identifies the next interesting part of the
page as the </$1> inside the javascript.  It's not
seeing the comment.  Should it?  I changed
interesting_cdata to look for <! as one interesting
possibility and it parsed the comment successfully.
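
A minimal reproduction sketch of the excerpt above, assuming the Python 3
html.parser module rather than the HTMLParser module this report was filed
against. TagLogger and parse_page are hypothetical names introduced only for
this illustration. It records every end tag the parser reports and the text
it hands back as script data, so you can check whether the </$1> inside the
JavaScript string is taken for a real end tag:

```python
from html.parser import HTMLParser

# The excerpt from the report, kept verbatim in a raw string so the
# backslashes in the JavaScript survive unchanged.
PAGE = r"""<script language="JavaScript">
<!--
um.menuCode[i].replace(/<\/(li|ul)>/ig,'</$1>\n');
-->
</script>
"""


class TagLogger(HTMLParser):
    """Hypothetical helper: logs end tags and the text seen inside <script>."""

    def __init__(self):
        super().__init__()
        self.end_tags = []
        self.script_chunks = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        self.end_tags.append(tag)
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script content may arrive in more than one chunk, so collect it all.
        if self._in_script:
            self.script_chunks.append(data)


def parse_page(page):
    """Return (end tags reported, concatenated script text) for a page."""
    parser = TagLogger()
    parser.feed(page)
    parser.close()
    return parser.end_tags, "".join(parser.script_chunks)


if __name__ == "__main__":
    end_tags, script_text = parse_page(PAGE)
    print("end tags:", end_tags)
    print("script text:", repr(script_text))
```

If the parser treats the script body as opaque text, the only end tag logged
should be the closing script tag, and the </$1> should show up inside the
script text instead.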

----------------------------------------------------------------------

>Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2006-05-02 16:20

Message:

The "<!--" hackery is used to deal with really old browsers
that don't understand <script>.  Technically, all it should
have to look for is "</".
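
To illustrate the point about "</", here is a small sketch comparing two
ways of finding the end of the script content in the excerpt from the
initial comment. Both patterns and the end_of_cdata helper are assumptions
made for this example, not the regular expressions HTMLParser itself uses:

```python
import re

# Script content from the report followed by the real closing tag.
TEXT = r"""
<!--
um.menuCode[i].replace(/<\/(li|ul)>/ig,'</$1>\n');
-->
</script>"""

# Hypothetical strict rule: the content ends at the first "</".
ANY_END_TAG_OPEN = re.compile(r"</")

# Hypothetical lenient rule: the content ends only at "</script".
SCRIPT_CLOSE = re.compile(r"</script\b", re.IGNORECASE)


def end_of_cdata(pattern, text):
    """Return the offset where `pattern` says the content ends, or -1."""
    match = pattern.search(text)
    return match.start() if match else -1


if __name__ == "__main__":
    strict = end_of_cdata(ANY_END_TAG_OPEN, TEXT)
    lenient = end_of_cdata(SCRIPT_CLOSE, TEXT)
    print("strict rule stops at: ", TEXT[strict:strict + 5])
    print("lenient rule stops at:", TEXT[lenient:lenient + 9])
```

Under the strict rule the content ends at the </$1> inside the JavaScript
string, which is where the parser in this report stopped. The <\/ in the
regular expression literal is not matched, because the backslash separates
the < from the /.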

Closing as not worth changing.

----------------------------------------------------------------------
