[New-bugs-announce] [issue7114] HTMLParser doesn't handle <![CDATA[ ... ]]>

Greg Baker report at bugs.python.org
Mon Oct 12 23:32:53 CEST 2009


New submission from Greg Baker <ggbaker at sfu.ca>:

I believe what I'm seeing here is related to issue 670664, but it should be
easier to handle because CDATA sections have explicit delimiters.
HTMLParser doesn't recognize CDATA sections at all, so their content is
parsed as ordinary markup instead of being passed through as character data.

The following is an attempt to parse (a snippet of) valid XHTML, but it
raises an HTMLParseError.

data = """<script type="text/javascript">
//<![CDATA[
function foo() {
document.write('"></' + 'script>');}
//]]>
</script>"""

from HTMLParser import HTMLParser
parser = HTMLParser()
parser.feed(data)
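
To see which callbacks the parser fires for this input, a small logging subclass can help. The following is only a sketch and assumes Python 3, where the class lives in html.parser (the report itself targets Python 2.6 and its HTMLParser module). EventLogger is a hypothetical helper name, not part of the library. On interpreters that treat script content as raw text, the CDATA markers should show up inside a data event rather than being interpreted as markup.

```python
# Assumption: Python 3 module name; the report uses Python 2.6's HTMLParser module.
from html.parser import HTMLParser

data = """<script type="text/javascript">
//<![CDATA[
function foo() {
document.write('"></' + 'script>');}
//]]>
</script>"""


class EventLogger(HTMLParser):
    """Hypothetical helper: records each parser callback as a tuple."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, text):
        self.events.append(("data", text))


logger = EventLogger()
logger.feed(data)
logger.close()  # flush any text still buffered by the parser

for event in logger.events:
    print(event)
```

Comparing the printed events across interpreter versions shows whether the script body reaches handle_data intact or is split at the "</" inside the string literal.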

----------
components: Library (Lib)
messages: 93905
nosy: ggbaker
severity: normal
status: open
title: HTMLParser doesn't handle <![CDATA[ ... ]]>
type: behavior
versions: Python 2.6

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue7114>
_______________________________________

