Found a parsing bug in HTMLParser

Grzegorz Adam Hankiewicz gradha at terra.es
Sun Feb 9 12:06:56 EST 2003


Hi.

I've found a bug in HTMLParser while parsing some of my web pages. The
problem appears when an attribute with a double-quoted value is
immediately followed by another attribute, with no whitespace between
the closing quote and the next attribute name. I've created a small
test case, which you can see below. The W3C validator says the page is OK
(http://validator.w3.org/check?uri=http://www.terra.es/personal7/gradha/test.html),
and browsers render it without problems. Does this still happen with
newer Python versions? What's the procedure for reporting bugs?

PS: Please don't CC me on your replies.

$ cat test.html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><head><title>t</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head><body>
<a href="http://ss"title="pe">P</a>
</body></html>

$ python
Python 2.2.1 (#1, Apr 21 2002, 08:38:44)
[GCC 2.95.4 20011002 (Debian prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from HTMLParser import HTMLParser
>>> p = HTMLParser()
>>> file = open("test.html", "rt")
>>> p.feed("".join(file.readlines()))
>>> file.close()
>>> p.close()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.2/HTMLParser.py", line 112, in close
    self.goahead(1)
  File "/usr/lib/python2.2/HTMLParser.py", line 166, in goahead
    self.error("EOF in middle of construct")
  File "/usr/lib/python2.2/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: EOF in middle of construct, at line 5, column 1
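
For anyone who wants to retry the test case on a newer interpreter, here is
a minimal sketch. It assumes Python 3, where the module is html.parser
rather than the HTMLParser module used in the 2.2.1 session above. The
StartTagCollector class and the collect_start_tags function are hypothetical
helpers written for this example, not part of the library.

```python
from html.parser import HTMLParser

# Same markup as test.html above; note the missing space between
# the closing quote of href and the title attribute.
TEST_HTML = """<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><head><title>t</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head><body>
<a href="http://ss"title="pe">P</a>
</body></html>
"""


class StartTagCollector(HTMLParser):
    """Hypothetical helper: records every start tag with its attributes."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.start_tags.append((tag, attrs))


def collect_start_tags(markup):
    """Feed the markup to a fresh parser and return the recorded start tags."""
    parser = StartTagCollector()
    parser.feed(markup)
    parser.close()  # the call that raised HTMLParseError in the 2.2.1 session
    return parser.start_tags


if __name__ == "__main__":
    for tag, attrs in collect_start_tags(TEST_HTML):
        print(tag, attrs)
```

If the parser tolerates the missing space, the a tag should be listed with
both its href and title attributes, and close() should return without
raising.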




