create a DOM from an HTML document

Xavier Morel xavier.morel at masklinn.net
Thu Feb 9 17:37:13 EST 2006


Mark Harrison wrote:
> I thought I saw a package that would create a DOM from html, with
> allowances that it would do a "best effort" job to parse
> non-perfectly formed html.
> 
> Now I can't seem to find this... does anybody have a recommendation
> as to a good package to look at?
> 
> Many TIA!
> Mark
While it doesn't generate a W3C DOM, BeautifulSoup is probably your best 
bet for parsing less-than-perfect HTML and getting something usable out of it.

Once you have your parsed document, you can either use it as is or try 
to convert it to a valid W3C DOM.
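
To give an idea of what building a W3C DOM from sloppy markup involves, 
here is a rough sketch that uses only the standard library rather than 
BeautifulSoup: html.parser tokenizes the input and the handlers build an 
xml.dom.minidom document. DomBuilder, VOID and html_to_dom are names made 
up for this illustration, and the recovery rules (an end tag closes 
everything opened inside it, stray end tags and anything outside the first 
root element are dropped) are simplifying assumptions, much cruder than 
what BeautifulSoup does.

```python
from html.parser import HTMLParser
from xml.dom.minidom import Document

# Elements that never take a closing tag in HTML (partial list, assumed).
VOID = {"br", "hr", "img", "input", "link", "meta"}


class DomBuilder(HTMLParser):
    """Best-effort builder: feeds sloppy HTML into a minidom Document."""

    def __init__(self):
        super().__init__()
        self.doc = Document()
        self.stack = [self.doc]  # currently open nodes, innermost last

    def handle_starttag(self, tag, attrs):
        # A DOM document allows only one root element; drop any extras.
        if self.stack[-1] is self.doc and self.doc.documentElement is not None:
            return
        node = self.doc.createElement(tag)
        for name, value in attrs:
            node.setAttribute(name, value or "")  # bare attributes get ""
        self.stack[-1].appendChild(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tagName == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Text cannot sit directly under the document node, so skip it there.
        if self.stack[-1] is not self.doc:
            self.stack[-1].appendChild(self.doc.createTextNode(data))


def html_to_dom(html):
    """Parse possibly malformed HTML and return a minidom Document."""
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    return builder.doc


# The <b> and <html> elements are never closed in this input.
dom = html_to_dom("<html><body><p>Hello <b>world</p><br>bye</body>")
print(dom.documentElement.toxml())
```

The result supports the usual DOM calls such as getElementsByTagName, 
which is the main reason to bother with the conversion at all.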


