[New-bugs-announce] [issue7114] HTMLParser doesn't handle <![CDATA[ ... ]]>

Mon Oct 12 23:32:53 CEST 2009

New submission from Greg Baker <ggbaker at sfu.ca>:

I believe what I'm seeing here is somewhat related to issue 670664, but
it should be easier to handle because CDATA sections have explicit
delimiters.  Basically, HTMLParser doesn't recognize CDATA sections at
all, so their content is incorrectly parsed as ordinary markup.

The following is an attempt to parse (a snippet of) valid XHTML, but it
raises an HTMLParseError.

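from HTMLParser import HTMLParser  # Python 2 module name

# A script element whose body is wrapped in a CDATA section (valid XHTML).
data = """<script type="text/javascript">
//<![CDATA[
function foo() {
document.write('"></' + 'script>');}
//]]>
</script>"""

parser = HTMLParser()
parser.feed(data)  # raises HTMLParseError on Python 2.6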
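
To illustrate why the CDATA structure makes this easier to handle, here is a minimal sketch (not HTMLParser's implementation, and not a proposed patch) that locates CDATA sections by their delimiters so the content could be treated as opaque text. The function name find_cdata_sections and the SAMPLE constant are hypothetical names chosen for this example, and it assumes the sections are not nested.

```python
import re

# A CDATA section starts with "<![CDATA[" and ends at the first "]]>".
# DOTALL lets the body span several lines; the non-greedy group stops
# at the earliest closing delimiter.
CDATA_RE = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)


def find_cdata_sections(text):
    """Return the raw content of every CDATA section found in text."""
    return CDATA_RE.findall(text)


# The snippet from the report above.
SAMPLE = """<script type="text/javascript">
//<![CDATA[
function foo() {
document.write('"></' + 'script>');}
//]]>
</script>"""

sections = find_cdata_sections(SAMPLE)
for body in sections:
    # Everything in body, including the '"></' fragment, is plain
    # character data and should not be interpreted as tags.
    print(repr(body))
```

A parser that recognized these delimiters could hand each body to its data handler unchanged instead of trying to interpret the quote and angle-bracket characters inside it.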

----------
components: Library (Lib)
messages: 93905
nosy: ggbaker
severity: normal
status: open
title: HTMLParser doesn't handle <![CDATA[ ... ]]>
type: behavior
versions: Python 2.6

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue7114>
_______________________________________