htmllib.py and parsing malformed HTML [SOLVED]

KC nskhcarlso at bellsouth.net
Tue Sep 2 10:00:53 EDT 2003


KC wrote:
> 
> What would be really nice is a way to tell the parser it was "inside" a 
> <TR> when I encountered a <TD> after a closing </TR>.  Browsers still 
> display the HTML correctly without a starting <TR>, but if the closing 
> </TR> is omitted everything gets mangled.
> 
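I solved this problem, perhaps not in the most elegant way, but it is 
solved.  Any suggestions for improvement are welcome.  I added the 
following method to my parser class.  It overrides parse_endtag so that 
every closing </TR> is passed straight through to the writer before the 
normal end-tag handling runs: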


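   def parse_endtag(self, i) :
     # i is the index of the '</' that opens the end tag in the raw input
     rawdata = self.rawdata
     # Take the two characters after '</' as the tag name
     tag = rawdata[i+2:i+4].strip().lower()
     if tag == 'tr' :
       # Emit the closing tag directly, whatever the parser's own tag state
       self.fmtr.writer.send_tag('</TR>')
     # Let the base class carry on with its normal end-tag handling
     return htmllib.HTMLParser.parse_endtag(self, i)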
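
One weakness of the two-character slice is that it would also match any 
end tag whose name merely starts with "tr", and it misses "</ tr>".  A 
possible tightening, as a rough sketch only: the helper name endtag_name 
and the regular expression are my own assumptions here, not part of 
htmllib.

```python
import re

# Hypothetical helper (not part of htmllib): match '</', optional
# whitespace, then capture the full tag name.
_ENDTAG_NAME = re.compile(r'</\s*([a-zA-Z][a-zA-Z0-9]*)')

def endtag_name(rawdata, i):
    """Return the lowercased name of the end tag starting at index i, or None."""
    match = _ENDTAG_NAME.match(rawdata, i)
    if match is None:
        return None
    return match.group(1).lower()
```

Inside parse_endtag the check would then read 
"if endtag_name(rawdata, i) == 'tr'", assuming i still points at the '</'.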


I should also mention that I added the send_tag method to my writer 
implementation, which simply writes the given text to the output stream.
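
For anyone wanting to try the same thing, send_tag can be about this 
small.  This is a standalone sketch rather than my actual writer: the 
class name StreamWriter and the stream attribute are made up for 
illustration, and a real writer would also need the usual formatter 
writer methods.

```python
import io

class StreamWriter:
    """Hypothetical minimal writer; only send_tag mirrors the post."""

    def __init__(self, stream):
        # Any object with a write() method will do (file, StringIO, ...)
        self.stream = stream

    def send_tag(self, text):
        # Write the tag text to the output stream unchanged
        self.stream.write(text)

out = io.StringIO()
writer = StreamWriter(out)
writer.send_tag('</TR>')
```

After the call above, out.getvalue() holds the literal tag text that was 
passed in.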





