HTMLParser chokes on bad end tag in comment
Fredrik Lundh
fredrik at pythonware.com
Mon May 29 02:35:29 EDT 2006
Rene Pijlman wrote:
> The code below results in an exception (Python 2.4.2):
>
> HTMLParser.HTMLParseError: bad end tag: "</foo' + 'bar>", at line 4,
> column 6
>
> Should it? The end tag it chokes on is in comment, isn't it?
no, it isn't. STYLE and SCRIPT elements contain character data (CDATA), not
parsed character data (PCDATA), so a comment inside them is treated as plain
characters, and the first "</" ends the element.
if you have broken documents, you can tweak this by setting the
CDATA_CONTENT_ELEMENTS parser attribute before you start parsing.
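
a minimal sketch of that tweak, assuming the attribute is overridden on a
parser subclass with an empty tuple so that no element is treated as CDATA.
the sample document and the Recorder, LenientRecorder and comments_seen names
are made up for illustration and are not taken from the original post; the
import fallback is there because the module is called html.parser on Python 3.

```python
try:
    from html.parser import HTMLParser   # Python 3
except ImportError:
    from HTMLParser import HTMLParser    # Python 2, as in the original post

# Hypothetical sample document: a script body wrapped in an HTML comment.
DOC = """<html>
<script>
<!--
document.write('</foo' + 'bar>');
// -->
</script>
</html>"""


class Recorder(HTMLParser):
    """Collects the comments the parser reports."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


class LenientRecorder(Recorder):
    # Assumption: an empty tuple means no element is handled as CDATA,
    # so <script> content is parsed like ordinary markup and the
    # <!-- ... --> block is seen as a comment.
    CDATA_CONTENT_ELEMENTS = ()


def comments_seen(parser_class, document=DOC):
    """Parse the document and return the comments the parser reported."""
    parser = parser_class()
    parser.feed(document)
    parser.close()
    return parser.comments
```

with the override in place, the script body is parsed like ordinary markup, so
the <!-- ... --> block is reported through handle_comment instead of being
scanned for end tags. the trade-off is that any markup-like text inside a
script or style element will now be parsed as markup too.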
</F>