[Python-bugs-list] [ python-Bugs-705983 ] simple HTMLParser doesn't ignore < within pre-formatted text

SourceForge.net noreply@sourceforge.net
Tue, 18 Mar 2003 16:32:36 -0800


Bugs item #705983, was opened at 2003-03-19 00:32
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=705983&group_id=5470

Category: Python Library
Group: Python 2.2.2
Status: Open
Resolution: None
Priority: 5
Submitted By: David C. Fox (dcfox)
Assigned to: Nobody/Anonymous (nobody)
Summary: simple HTMLParser doesn't ignore < within pre-formatted text

Initial Comment:
The simple HTMLParser class in the HTMLParser module does
not treat angle brackets (such as less-than signs) as
literal text within preformatted text delimited by
<PRE> ... </PRE> or example text delimited by <XMP> ... </XMP>.

For example, if I use HTMLParser.HTMLParser to parse
the contents of

http://www.ataword.com/programming/dragons.html,

I get the following (incorrect) error message:

Traceback (most recent call last):
  File "<pyshell#6>", line 1, in ?
    p.close()
  File "E:\PYTHON22\lib\HTMLParser.py", line 112, in close
    self.goahead(1)
  File "E:\PYTHON22\lib\HTMLParser.py", line 166, in goahead
    self.error("EOF in middle of construct")
  File "E:\PYTHON22\lib\HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParseError: EOF in middle of construct, at line 50, column 31


The more advanced parser in htmllib deals with these
cases properly.

Even if this isn't worth fixing, it would be nice if
this limitation were noted in the library documentation.
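
In the meantime, one possible workaround is to escape the
contents of PRE and XMP blocks before feeding the page to the
parser. The following is only a rough sketch, not a fix for the
module: it is written against a current Python 3 (where the
module is named html.parser), the names escape_literal_blocks
and TextCollector are hypothetical, and the regular expression
assumes the blocks are not nested and are properly closed.

```python
import re
from html import escape
from html.parser import HTMLParser  # HTMLParser.HTMLParser in Python 2

# Matches <pre>...</pre> or <xmp>...</xmp>, case-insensitively, across lines.
# Assumption: blocks are not nested and every block has a closing tag.
_LITERAL_BLOCK = re.compile(
    r"(<(pre|xmp)\b[^>]*>)(.*?)(</\2\s*>)",
    re.IGNORECASE | re.DOTALL,
)


def escape_literal_blocks(source):
    """Escape <, > and & inside PRE/XMP blocks (hypothetical helper)."""
    def _escape(match):
        open_tag, _name, body, close_tag = match.groups()
        return open_tag + escape(body, quote=False) + close_tag
    return _LITERAL_BLOCK.sub(_escape, source)


class TextCollector(HTMLParser):
    """Hypothetical subclass that records the text given to handle_data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


# A made-up page with stray angle brackets inside a PRE block.
page = "<html><body><pre>if a <b and b> c:</pre></body></html>"
cleaned = escape_literal_blocks(page)

parser = TextCollector()
parser.feed(cleaned)
parser.close()
text = "".join(parser.chunks)
```

With this preprocessing the parser only ever sees character
references inside the block, so the stray brackets can no
longer be mistaken for the start of a tag.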



----------------------------------------------------------------------
