HTMLParser chokes on bad end tag in comment

Rene Pijlman reply.in.the.newsgroup at my.address.is.invalid
Mon May 29 03:05:17 EDT 2006


Fredrik Lundh:
>Rene Pijlman:
>[end tag in html comment in script element]
>The end tag it chokes on is in comment, isn't it?
>
>no.  STYLE and SCRIPT elements contain character data, not parsed 
>character data, so comments are treated as characters, and the first 
>"</" ends the element.

Ah, I see. I'll report the problem to the developers of the application
that's generating this broken code (the vBulletin forum software)...

>if you have broken documents, you can tweak this by setting the 
>CDATA_CONTENT_ELEMENTS parser attribute before you start parsing.

... and in the meantime that's a good workaround.
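
A minimal sketch of that workaround, assuming Python 3's html.parser
module rather than the Python 2 HTMLParser module this thread was about.
Current versions tolerate a stray end tag inside SCRIPT instead of
raising an error, so the sketch only shows how the setting changes what
the parser reports. EventCollector, parse and SAMPLE are hypothetical
names made up for illustration; only CDATA_CONTENT_ELEMENTS and the
handler methods come from the library.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records each parser callback as a tuple."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_comment(self, data):
        self.events.append(("comment", data))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse(markup, cdata_elements=None):
    """Parse markup, optionally overriding CDATA_CONTENT_ELEMENTS."""
    parser = EventCollector()
    if cdata_elements is not None:
        # Set the attribute before feeding any input, as suggested above.
        parser.CDATA_CONTENT_ELEMENTS = cdata_elements
    parser.feed(markup)
    parser.close()
    return parser.events


# An end tag inside an HTML comment inside a SCRIPT element.
SAMPLE = "<script><!-- </b> --></script>"

# Default: SCRIPT content is character data, so no comment is reported.
default_events = parse(SAMPLE)

# Workaround: with no CDATA elements, SCRIPT content is parsed as markup.
relaxed_events = parse(SAMPLE, cdata_elements=())

print(default_events)
print(relaxed_events)
```

With the empty tuple the comment should reach handle_comment instead of
being passed through as script text. The trade-off is that real script
code containing "<" or "&" then gets parsed as markup too, so this is
probably only worth doing for documents known to be broken in this way.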

Thank you very much Fredrik.

-- 
René Pijlman


