[XML-SIG] How to get SAX to parse a non-well-formed HTML document?

Fred L. Drake, Jr. fdrake@acm.org
Tue, 17 Jul 2001 07:50:16 -0400 (EDT)


Dirksen writes:
 > I need to parse a bunch of HTML documents, yet the parser is too
 > strict for this task. It stops at places that are considered correct
 > by HTML rules, like unquoted attributes. Can I make the parser more
 > relaxed toward HTML documents?

Martin C Brown writes:
 > The HTML parser is in htmllib and works in much the same way, and it handles
 > unquoted attributes without any problems.

  Another possibility would be to use the HTMLParser module, which is
new in Python 2.2.  It was originally developed for another project
and is stable and well-tested.  Feel free to extract the module from
the Python CVS repository.
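  To make the difference concrete, here is a minimal sketch. It assumes a
modern Python 3 interpreter, where the HTMLParser module lives on as
`html.parser`; the 2.2-era module may differ in detail. The
`TagCollector` class and the sample markup are hypothetical
illustrations, not code from this thread. The same snippet is fed
first to SAX, which rejects the unquoted attributes, and then to
HTMLParser, which accepts them:

```python
import xml.sax
from html.parser import HTMLParser  # Python 2.x: from HTMLParser import HTMLParser


class TagCollector(HTMLParser):
    """Record every start tag and its attributes (hypothetical example class)."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        # tag is lower-cased; attrs is a list of (name, value) pairs,
        # with the value extracted whether or not it was quoted.
        self.start_tags.append((tag, attrs))


# Acceptable HTML, but not well-formed XML: unquoted attribute values
# and an unclosed <p> element.
sample = ('<html><body><p class=intro>Hello '
          '<a href=page.html target=_blank>link</a></body></html>')

# An XML/SAX parser rejects this markup with a fatal error.
sax_rejected = False
try:
    xml.sax.parseString(sample.encode("ascii"), xml.sax.ContentHandler())
except xml.sax.SAXParseException as exc:
    sax_rejected = True
    print("SAX error:", exc)

# HTMLParser tolerates it and reports each start tag with its attributes.
parser = TagCollector()
parser.feed(sample)
parser.close()

for tag, attrs in parser.start_tags:
    print(tag, attrs)
```

  The SAX call fails on the first unquoted attribute, while the
HTMLParser loop prints the four start tags, with `p` carrying
`[('class', 'intro')]` and `a` carrying
`[('href', 'page.html'), ('target', '_blank')]`. HTMLParser is
event-driven like SAX, so handler code ported from a SAX
`ContentHandler` usually maps onto `handle_starttag`,
`handle_endtag` and `handle_data` without much restructuring.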


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations