htmllib.py and parsing malformed HTML [SOLVED]
KC
nskhcarlso at bellsouth.net
Tue Sep 2 10:00:53 EDT 2003
KC wrote:
>
> What would be really nice is a way to tell the parser it was "inside" a
> <TR> when I encountered a <TD> after a closing </TR>. Browsers still
> display the HTML correctly without a starting <TR>, but if the closing
> </TR> is omitted everything gets mangled.
>
I solved this problem. It is perhaps not the most elegant way, but it
works. Any suggestions for improvements are welcome. I added the
following method to my parser class to make this work:
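    def parse_endtag(self, i) :
        rawdata = self.rawdata
        # The two characters after "</" hold the tag name for </TR>.
        tag = rawdata[i+2:i+4].strip().lower()
        if tag == 'tr' :
            # Write the closing tag straight to the output, since the
            # parser may not consider itself inside a <TR> at this point.
            self.fmtr.writer.send_tag('</TR>')
        return htmllib.HTMLParser.parse_endtag(self, i)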
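One possible refinement, offered only as a sketch: the two-character slice also matches any end tag whose name merely starts with "tr", and it does not allow whitespace after "</". A small regex helper could extract the full tag name instead. The names ENDTAG and endtag_name are hypothetical, not part of htmllib or sgmllib, and the pattern is an assumption about what counts as a simple end tag:

```python
import re

# Assumed shape of a simple end tag: "</", optional whitespace,
# a tag name, optional whitespace, ">".
ENDTAG = re.compile(r'</\s*([a-zA-Z][a-zA-Z0-9]*)\s*>')

def endtag_name(rawdata, i):
    """Return the lowercased name of the end tag starting at index i,
    or None if no simple end tag starts there. (Hypothetical helper.)"""
    match = ENDTAG.match(rawdata, i)
    if match is None:
        return None
    return match.group(1).lower()
```

Inside parse_endtag, the test would then read "if endtag_name(self.rawdata, i) == 'tr'" in place of the slice comparison.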
I should also mention that I added a send_tag method to my writer
implementation, which simply writes the given text to the output stream.
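For illustration, a writer with such a method might look roughly like the sketch below. The class name StreamWriter and its constructor are assumptions, not the actual writer from my code; only the behavior of send_tag (write the text as given) follows the description above:

```python
import io

class StreamWriter:
    """Hypothetical minimal writer; a real one would also implement
    the rest of the formatter writer interface."""

    def __init__(self, stream):
        # Any object with a write() method, e.g. an open file.
        self.stream = stream

    def send_tag(self, text):
        # Pass the raw tag text through to the output unchanged.
        self.stream.write(text)

# Example: collect output in memory instead of a file.
buf = io.StringIO()
writer = StreamWriter(buf)
writer.send_tag('</TR>')
```

After this runs, buf.getvalue() holds the tag text exactly as it was passed in.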