[ python-Bugs-1144533 ] htmllib quote parse error within a <script>
SourceForge.net
noreply at sourceforge.net
Sat Feb 19 22:02:09 CET 2005
Bugs item #1144533, was opened at 2005-02-19 13:02
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1144533&group_id=5470
Category: Python Library
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Allan Hoeltje (ahoeltje)
Assigned to: Nobody/Anonymous (nobody)
Summary: htmllib quote parse error within a <script>
Initial Comment:
I am using htmllib to parse web pages for their plain-text content. I
came across a web page containing a script construct similar
to the example below. Note that the script itself writes out a second
script element. htmllib appears to be confused by the mix of single and
double quotes between the real <script> and </script> tags.
I am using "Python 2.3 (#1, Sep 13 2003, 00:49:11) [GCC
3.3 20030304 (Apple Computer, Inc. build 1495)] on darwin" on a
PowerBook G4 running OSX 10.3.8.
<html>
<body>
<h1> This is a test </h1>
<br>
<blockquote>
<script language="JavaScript">
rnum = Math.round( Math.random() * 100000 );
document.write( '<scr' + 'ipt src="http://www.a.org/' +
rnum + '/"></scr' + 'ipt>' );
</script>
</blockquote>
</body>
</html>
Here is the Python trace:
Traceback (most recent call last):
  File "cleanFeed.py", line 26, in ?
    clean = stripHtml.strip( feed )
  File "/Users/allan/Desktop/Mood for Today/stripHtml.py", line 144, in strip
    parser.feed(s)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/HTMLParser.py", line 150, in goahead
    k = self.parse_endtag(i)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/HTMLParser.py", line 327, in parse_endtag
    self.error("bad end tag: %s" % `rawdata[i:j]`)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: bad end tag: "</scr' + 'ipt>", at line 1, column 309
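
For anyone trying to reproduce this, here is a minimal sketch of a plain-text
extractor run against the sample page above. It is not the stripHtml.py from
the report (that file is not shown), and it assumes Python 3's html.parser
module rather than the Python 2.3 HTMLParser module in the traceback. The
names TextExtractor, strip_html and SAMPLE are made up for illustration. The
idea is simply to ignore all character data seen between <script> and
</script>.

```python
from html.parser import HTMLParser

# The sample page from the report, reproduced as a string.
SAMPLE = """<html>
<body>
<h1> This is a test </h1>
<br>
<blockquote>
<script language="JavaScript">
rnum = Math.round( Math.random() * 100000 );
document.write( '<scr' + 'ipt src="http://www.a.org/' +
rnum + '/"></scr' + 'ipt>' );
</script>
</blockquote>
</body>
</html>"""


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text, skipping script/style bodies."""

    SKIPPED = ("script", "style")

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []      # text fragments seen outside skipped elements

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace so the result is a single line.
        return " ".join(" ".join(self._chunks).split())


def strip_html(markup):
    """Return the plain-text content of markup (illustrative only)."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    print(strip_html(SAMPLE))  # This is a test
```

With Python 3's parser the quoted "</scr' + 'ipt>" fragment is passed to
handle_data as script content instead of being treated as an end tag, so no
exception is raised for this input. This says nothing about how the 2.3
module should be fixed.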