HTMLParser chokes on bad end tag in comment

Fredrik Lundh fredrik at pythonware.com
Mon May 29 02:35:29 EDT 2006


Rene Pijlman wrote:

> The code below results in an exception (Python 2.4.2):
> 
> HTMLParser.HTMLParseError: bad end tag: "</foo' + 'bar>", at line 4,
> column 6
> 
> Should it? The end tag it chokes on is in comment, isn't it?

no.  STYLE and SCRIPT elements contain character data, not parsed 
character data, so comments are treated as characters, and the first 
"</" ends the element.

if you have broken documents, you can tweak this by setting the 
CDATA_CONTENT_ELEMENTS parser attribute (by default a tuple naming the 
script and style elements) before you start parsing.
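
a minimal sketch of that tweak, written for Python 3, where the module is 
html.parser rather than HTMLParser.  the EventLogger class, the parse() 
helper and the sample document are made up for illustration.  note that 
later Python versions look for the matching end tag instead of stopping 
at the first "</", so the default run should not raise there as it does 
in 2.4.2; the sketch only shows how the attribute changes what gets 
reported as a comment.

```python
from html.parser import HTMLParser  # named "HTMLParser" in Python 2


class EventLogger(HTMLParser):
    """Hypothetical helper that records the events the parser reports."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.comments = []
        self.data = []
        self.end_tags = []

    def handle_comment(self, text):
        self.comments.append(text)

    def handle_data(self, text):
        self.data.append(text)

    def handle_endtag(self, tag):
        self.end_tags.append(tag)


# Assumed sample input: an end tag hidden in a comment inside SCRIPT.
DOC = "<script><!--\ndocument.write('</foo' + 'bar>');\n// --></script>"


def parse(cdata_elements=None):
    parser = EventLogger()
    if cdata_elements is not None:
        # Override the attribute on the instance before feeding any data.
        parser.CDATA_CONTENT_ELEMENTS = cdata_elements
    parser.feed(DOC)
    parser.close()
    return parser


default = parse()                    # SCRIPT content is character data
tweaked = parse(cdata_elements=())   # SCRIPT content is parsed as markup

print("default:", default.comments, default.end_tags)
print("tweaked:", tweaked.comments, tweaked.end_tags)
```

with an empty tuple no element gets the character-data treatment, so the 
comment is recognized as a comment.  the price is that any "<" in script 
or style content is then parsed as markup, which is why this is only a 
workaround for broken documents.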

</F>



