[Python-Dev] htmllib vs. HTMLParser

Bill Janssen janssen at parc.com
Mon Oct 27 19:53:32 EST 2003


Glad to see you volunteering!

But IMO simply adding some handler methods won't be enough.  You
also need to build in some knowledge of the semantics of the
markup.  For example, a new "block"-level element should close all
"in-line" elements that are currently open, and so on.

It would also be handy to have a version of the parser that takes an
HTML page and returns a parse tree, rather than the halfway solution
we currently have, forcing the user to design and write a lot of code
to get anything done.
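
As a rough illustration of the parse-tree idea, the sketch below builds
a simple tree on top of the Python 3 html.parser module.  Node,
TreeBuilder, parse_html and the VOID set are hypothetical names used
only for this example, and the recovery rules are far simpler than
what real-world HTML needs.

```python
from html.parser import HTMLParser

# Elements that never take an end tag (partial list, assumption).
VOID = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []  # Node instances or text strings


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root  # element that receives new children

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Walk up to the nearest matching open element; ignore strays.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only text
            self.current.children.append(data)


def parse_html(text):
    """Parse an HTML string and return the root Node of the tree."""
    builder = TreeBuilder()
    builder.feed(text)
    builder.close()
    return builder.root


root = parse_html("<html><body><p>Hello <b>world</b><br>bye</p></body></html>")
p = root.children[0].children[0].children[0]
```

With this, the caller gets a tree to walk (p.children holds the text,
the <b> node, the <br> node and more text) instead of having to write
handler methods and keep a stack by hand.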

Bill


