HTMLParser fix
Dan Walton
dusenetw4 at opti.cgi.net
Mon Aug 26 03:12:49 EDT 2002
I ran into a problem this evening getting HTMLParser to parse an
HTML page with an embedded script tag that contains end tags
inside the script. Take the following HTML, for instance:
<html>
<body>
<script>
<!--
document.write('<h1>testing</h1>');
-->
</script>
</body>
</html>
The HTMLParser module that comes with Python 2.2.1 chops this up
and reports the </h1> as an end-tag event, when it should be part of a
data event.
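A quick way to see which events a parser emits for a page like this is to subclass it and record every callback. The sketch below targets Python 3's html.parser (the module's later name), so it shows the behavior the patch is aiming for rather than reproducing the 2.2.1 bug. EventRecorder and PAGE are illustrative names, not part of the library. Only the documented handle_starttag, handle_endtag and handle_data hooks are overridden.

```python
from html.parser import HTMLParser

# The page from above, with a script that writes out an <h1> element.
PAGE = """<html>
<body>
<script>
<!--
document.write('<h1>testing</h1>');
-->
</script>
</body>
</html>
"""


class EventRecorder(HTMLParser):
    """Hypothetical helper: record every start tag, end tag and data chunk."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


recorder = EventRecorder()
recorder.feed(PAGE)
recorder.close()

# Data may arrive in several chunks, so join it before searching.
end_tags = [value for kind, value in recorder.events if kind == "end"]
data = "".join(value for kind, value in recorder.events if kind == "data")
print(end_tags)
print("document.write('<h1>testing</h1>');" in data)
```

If </h1> were split out of the script as in 2.2.1, "h1" would show up in end_tags and the document.write(...) line would be missing from the joined data.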
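The following patch should fix this problem. Note that the diff runs
from the patched module to the stock 2.2.1 one, so the lines marked '<'
are the new code and the lines marked '>' are the original code (apply
it with patch -R):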
98,99d97
< self.cdata_mode = 0
< self.cdata = []
126d123
< self.cdata_mode = 1
130,132d126
< self.handle_data(''.join(self.cdata))
< self.cdata_mode = 0
< self.cdata = []
148,152c142
< if i < j:
< if(self.cdata_mode):
< self.cdata.append(rawdata[i:j])
< else:
< self.handle_data(rawdata[i:j])
---
> if i < j: self.handle_data(rawdata[i:j])
160a151,152
> if k >= 0:
> self.clear_cdata_mode()
339,345d330
< #print('parse_endtag[%s]' % tag)
< if(self.cdata_mode):
< if(tag.lower() in self.CDATA_CONTENT_ELEMENTS):
< self.clear_cdata_mode()
< else:
< self.cdata.append(rawdata[i:j])
< return j
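Stripped of the HTMLParser plumbing, the patch relies on one rule. Once the parser is inside a script or style element, only that element's own end tag may close it. Every other </...> (such as </h1>) is just more data. The patch enforces this by buffering chunks in self.cdata and flushing them in clear_cdata_mode(). The minimal sketch below reaches the same result by a different route: a single case-insensitive search for the closing tag. scan_cdata is a hypothetical helper, not part of HTMLParser. The CDATA_CONTENT_ELEMENTS tuple mirrors the one the patch checks against.

```python
import re

# Elements whose content is raw text (mirrors HTMLParser's tuple).
CDATA_CONTENT_ELEMENTS = ("script", "style")


def scan_cdata(rawdata, elem, start):
    """Return (data, end) for the content of a CDATA element.

    `start` is the index just past the element's start tag. Only the
    matching end tag (e.g. </script>) closes the element; any other end
    tag such as </h1> stays part of the returned data. `end` is the index
    just past the closing tag.
    """
    if elem.lower() not in CDATA_CONTENT_ELEMENTS:
        raise ValueError("not a CDATA content element: %r" % elem)
    closer = re.compile(r"</%s\s*>" % re.escape(elem), re.IGNORECASE)
    match = closer.search(rawdata, start)
    if match is None:
        # No closing tag yet: treat the rest as data. A streaming parser
        # would instead keep it buffered until more input arrives.
        return rawdata[start:], len(rawdata)
    return rawdata[start:match.start()], match.end()


html = "<script>document.write('<h1>testing</h1>');</script>"
data, end = scan_cdata(html, "script", len("<script>"))
print(data)
```

The whole document.write('<h1>testing</h1>'); call comes back as one data string, and end points just past </script>, where normal tag scanning can resume.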