[Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike

Simon Willison cs1spw at bath.ac.uk
Tue Dec 2 11:07:19 EST 2003


Stuart Langridge wrote:
> I don't see that tidy's ability to tidy HTML per se is useful, but I
> think that it's very useful in that it can take invalid HTML and
> convert it to valid XHTML. That way, we can get a DOM tree from invalid
> HTML, which is very useful...

Is there any way we could get a DOM tree from invalid HTML using pure 
Python tools? The HTML tools in the Python standard library at the 
moment are all pure Python. Could we even use the existing sgmllib 
module (or an extension of it) to create our own DOM tree from invalid HTML?
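
A rough sketch of what that might look like, using only the standard library. Two assumptions to flag: sgmllib was removed in Python 3, so this uses html.parser.HTMLParser as a stand-in with a similar callback interface, and the names LenientDOMBuilder, parse_html, VOID_TAGS and the synthetic "root" wrapper element are made up for illustration. It does not apply HTML's implicit-closing rules (an unclosed <p> simply nests the next <p>), so treat it as a starting point rather than a replacement for tidy.

```python
from html.parser import HTMLParser
from xml.dom.minidom import Document

# Elements that never take a closing tag in HTML (illustrative list).
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class LenientDOMBuilder(HTMLParser):
    """Hypothetical helper: build a minidom tree from tag-soup HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.document = Document()
        # Synthetic wrapper so the document always has exactly one root.
        self.root = self.document.createElement("root")
        self.document.appendChild(self.root)
        self.stack = [self.root]  # currently open elements

    def handle_starttag(self, tag, attrs):
        node = self.document.createElement(tag)
        for name, value in attrs:
            # Valueless attributes (e.g. "checked") get their own name.
            node.setAttribute(name, value if value is not None else name)
        self.stack[-1].appendChild(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, implicitly
        # closing anything opened inside it; stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tagName == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].appendChild(self.document.createTextNode(data))


def parse_html(text):
    """Return a minidom Document for possibly invalid HTML."""
    builder = LenientDOMBuilder()
    builder.feed(text)
    builder.close()  # flush any buffered trailing text
    return builder.document


if __name__ == "__main__":
    doc = parse_html("<ul><li>one<li>two</ul><br></i>")
    print(doc.documentElement.toxml())
```

The resulting object is an ordinary xml.dom.minidom Document, so getElementsByTagName and the other DOM methods work on it as usual.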



