HTMLParser fragility
Richie Hindle
richie at entrian.com
Fri Apr 7 04:42:42 EDT 2006
[Richie]
> But Tidy fails on huge numbers of real-world HTML pages. [...]
> Is there a Python HTML tidier which will do as good a job as a browser?
[Walter]
> You can also use the HTML parser from libxml2
[Paul]
> libxml2 will attempt to parse HTML if asked to [...] See how it fixes
> up the mismatching tags.
Great! Many thanks.
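
For anyone finding this thread later, here is a rough sketch of the libxml2 approach suggested above. It is not taken from Walter's or Paul's posts. It assumes the third-party lxml package, which wraps libxml2's HTML parser, is installed; the helper name repair_html is made up for illustration.

```python
# Sketch only: assumes the third-party lxml package (a binding to libxml2)
# is installed. The thread mentions libxml2, not lxml specifically.
try:
    from lxml import html as lxml_html
except ImportError:
    # lxml is not available; repair_html() below will refuse to run.
    lxml_html = None


def repair_html(markup):
    """Parse tag soup with libxml2's HTML parser and serialize the result.

    repair_html is a hypothetical helper name, not part of any library.
    """
    if lxml_html is None:
        raise RuntimeError("lxml is required for this sketch")
    # The parser builds a tree even when tags are unclosed or mismatched.
    tree = lxml_html.fromstring(markup)
    # Serializing a tree always yields properly nested, closed tags.
    return lxml_html.tostring(tree, encoding="unicode")
```

Calling repair_html("<p>unclosed <b>bold<i>both</b> tail</i>") should return markup in which every <b> and <i> is closed and properly nested. Exactly how the parser repairs the nesting can differ between libxml2 versions, so treat the output as something to inspect rather than rely on.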
--
Richie Hindle
richie at entrian.com