[XML-SIG] How to get SAX to parse not well formed HTML doc?

Fred L. Drake, Jr. fdrake@acm.org
Wed, 18 Jul 2001 10:28:42 -0400 (EDT)


Martin v. Loewis writes:
 > Of course, a "true" HTML parser should get the DTD right,
 > i.e. generate closing elements where they are missing, expand entities
 > (to unicode strings), etc.

  A "true" HTML parser would do a lot better than the
HTMLParser.HTMLParser class does.  That class reflects the needs of the
project it was created for: editing a file without the parse adding
new lexical tokens to the output as a side effect.  There are
certainly other ways to achieve that goal, but this made the most
sense for the original application.
  It should be fairly easy to write a smarter parser as a subclass,
and arguably that subclass should be added to the current module.
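  As a rough illustration of the subclass idea, here is a minimal
sketch, not a proposed implementation.  It assumes Python 3, where the
class lives in html.parser, and the names BalancingHTMLParser,
parse_events, VOID and IMPLIED_CLOSE are hypothetical.  The two tables
are tiny hand-written stand-ins for what a real parser would take from
the HTML DTD.  The sketch generates the missing end tags and lets the
base class expand entities to Unicode text.

```python
from html.parser import HTMLParser

# Hypothetical, deliberately small tables; a real parser would derive
# this information from the HTML DTD.
VOID = {"br", "hr", "img", "input", "meta", "link"}
# Opening the key element implicitly closes an open element in the value.
IMPLIED_CLOSE = {"p": {"p"}, "li": {"li"}}


class BalancingHTMLParser(HTMLParser):
    """Records (kind, value) events with every start tag balanced."""

    def __init__(self):
        # convert_charrefs=True makes the base class expand entity and
        # character references before calling handle_data().
        super().__init__(convert_charrefs=True)
        self.stack = []   # currently open elements
        self.events = []  # balanced output events

    def _close_top(self):
        # Emit an end event for the innermost open element.
        self.events.append(("end", self.stack.pop()))

    def handle_starttag(self, tag, attrs):
        # Close any open element that this start tag implicitly ends.
        while self.stack and self.stack[-1] in IMPLIED_CLOSE.get(tag, ()):
            self._close_top()
        self.events.append(("start", tag))
        if tag in VOID:
            # Void elements never have content; close them immediately.
            self.events.append(("end", tag))
        else:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag with no matching start: ignore it
        # Close everything nested inside the element, then the element.
        while self.stack[-1] != tag:
            self._close_top()
        self._close_top()

    def handle_data(self, data):
        if data.strip():
            self.events.append(("text", data))

    def close(self):
        # Flush buffered text first, then close whatever is still open.
        super().close()
        while self.stack:
            self._close_top()


def parse_events(html):
    """Parse an HTML string and return the list of balanced events."""
    parser = BalancingHTMLParser()
    parser.feed(html)
    parser.close()
    return parser.events
```

  Calling parse_events("<ul><li>one<li>two</ul>") yields an end event
for each li even though the input has none.  The sketch drops
attributes, comments and whitespace-only text, so it is not suitable
for the editing use case that the base class was written for.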


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations