HTMLParser fragility

Richie Hindle richie at entrian.com
Fri Apr 7 04:42:42 EDT 2006


[Richie]
> But Tidy fails on huge numbers of real-world HTML pages.  [...]
> Is there a Python HTML tidier which will do as good a job as a browser?

[Walter]
> You can also use the HTML parser from libxml2

[Paul]
> libxml2 will attempt to parse HTML if asked to [...] See how it fixes
> up the mismatching tags.

Great!  Many thanks.
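
For anyone following along, here is a minimal sketch of the approach Walter and Paul describe. It assumes the third-party lxml package, which wraps libxml2 (the thread names only libxml2 itself, so lxml is my choice for illustration), and tidy_html is a hypothetical helper name, not part of any library.

```python
# Assumption: lxml (a Python binding built on libxml2) is installed.
from lxml import etree


def tidy_html(markup):
    """Parse possibly malformed HTML and return it re-serialized.

    lxml's HTMLParser uses libxml2's recovering HTML parser, which
    tries to repair problems such as mismatched or unclosed tags
    rather than raising an error.
    """
    parser = etree.HTMLParser()  # recovery mode is on by default
    root = etree.fromstring(markup, parser)
    # Serialize the repaired tree back to an HTML string.
    return etree.tostring(root, encoding="unicode", method="html")


if __name__ == "__main__":
    # The <b> and <i> tags are closed in the wrong order here.
    broken = "<p>Some <b>bold <i>text</b> here</i>"
    print(tidy_html(broken))
```

The serialized result should have balanced tags, though exactly how libxml2 chooses to repair a given page may differ from what a browser does.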

-- 
Richie Hindle
richie at entrian.com