[ python-Bugs-1144533 ] htmllib quote parse error within a <script>

SourceForge.net noreply at sourceforge.net
Sat Feb 19 22:02:09 CET 2005


Bugs item #1144533, was opened at 2005-02-19 13:02
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1144533&group_id=5470

Category: Python Library
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Allan Hoeltje (ahoeltje)
Assigned to: Nobody/Anonymous (nobody)
Summary: htmllib quote parse error within a <script>

Initial Comment:
I am using htmllib to parse web pages for plain-text content.  I 
came across a web page that contained a script construct similar 
to the example below.  Note that the script itself writes a script.  
The parser appears to be confused by the single and double quotes 
used between the real <script> and </script> tags.

I am using "Python 2.3 (#1, Sep 13 2003, 00:49:11) [GCC 3.3 20030304
(Apple Computer, Inc. build 1495)] on darwin" on a PowerBook G4
running Mac OS X 10.3.8.


<html>
  <body>
    <h1> This is a test </h1>
    <br>
    <blockquote>
      <script language="JavaScript">
        rnum = Math.round( Math.random() * 100000 );
        document.write( '<scr' + 'ipt src="http://www.a.org/' +
                        rnum + '/"></scr' + 'ipt>' );
      </script>
    </blockquote>
  </body>
</html>
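
One way to see what trips the parser: the string literal in the
document.write() call contains the text </scr' + 'ipt>, which starts
like an end tag but has a quote where the tag name should continue.
The sketch below is only an illustration. The pattern END_TAG and the
helper looks_like_end_tag are assumptions made for this example, not
code taken from the library.

```python
import re

# Hypothetical end-tag pattern: "</", a tag name, optional whitespace, ">".
# This is an assumed, simplified pattern for illustration only.
END_TAG = re.compile(r"</\s*([a-zA-Z][-.a-zA-Z0-9:_]*)\s*>")


def looks_like_end_tag(token):
    """Return True if the whole token is a well-formed end tag."""
    return END_TAG.fullmatch(token) is not None


# A real end tag matches the pattern.
print(looks_like_end_tag("</script>"))
# The text from the string concatenation does not: a quote follows "scr".
print(looks_like_end_tag("</scr' + 'ipt>"))
```

A parser that treats the body of <script> as ordinary markup will
reach this text and have to decide what to do with it.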

Here is the Python trace:

Traceback (most recent call last):
  File "cleanFeed.py", line 26, in ?
    clean = stripHtml.strip( feed )
  File "/Users/allan/Desktop/Mood for Today/stripHtml.py", line 144, in strip
    parser.feed(s)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/HTMLParser.py", line 150, in goahead
    k = self.parse_endtag(i)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/HTMLParser.py", line 327, in parse_endtag
    self.error("bad end tag: %s" % `rawdata[i:j]`)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: bad end tag: "</scr' + 'ipt>", at line 1, column 309
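
For comparison, here is a minimal sketch of the same plain-text
extraction. It assumes Python 3 and its html.parser module, not the
Python 2.3 HTMLParser module shown in the trace. The names
TextExtractor, strip_html and SAMPLE are made up for this example, and
the original stripHtml.py is not shown in this report, so this is not
a reconstruction of it. The sketch simply ignores everything between
<script> and </script>.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, skipping data inside <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # greater than 0 while inside a skipped element
        self._chunks = []      # text fragments seen outside skipped elements

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse all runs of whitespace into single spaces.
        return " ".join(" ".join(self._chunks).split())


def strip_html(source):
    """Return the visible text of an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(source)
    parser.close()
    return parser.text()


# The page from this report, as a test input.
SAMPLE = """<html>
  <body>
    <h1> This is a test </h1>
    <br>
    <blockquote>
      <script language="JavaScript">
        rnum = Math.round( Math.random() * 100000 );
        document.write( '<scr' + 'ipt src="http://www.a.org/' +
                        rnum + '/"></scr' + 'ipt>' );
      </script>
    </blockquote>
  </body>
</html>
"""

print(strip_html(SAMPLE))
```

Counting depth rather than using a boolean flag is a design choice
that keeps the skip state sane if a stray end tag shows up.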



