BeautifulSoup vs. tidy

Ravi Teja webraviteja at gmail.com
Sat Jul 1 01:23:24 EDT 2006


bruce wrote:
> hi...
>
> Never used Perl, but I have an issue trying to resolve some HTML that
> appears to be "dirty/malformed" in its overall structure. While
> researching validators, I came across the BeautifulSoup app and wanted to
> know if anybody could give me the pros/cons of the app compared with the
> other validation apps...
>
> The issue I'm facing involves parsing some websites: I'm trying to
> extract information using the DOM/XPath functions. I'm using Perl to
> handle the extraction...

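1.) XPath is not a good idea at all with "malformed" HTML, or perhaps
with web pages in general. XPath needs a well-formed document tree, and
real-world pages often can't be parsed into one without repair.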
2.) BeautifulSoup is not a validator, but it works well with bad HTML.
Also look at Mechanize and ClientForm for fetching pages and handling
forms.
3.) XMLStarlet is a good XML validator
(http://xmlstar.sourceforge.net/). It's not written in Python, but you
don't need to care which language it is written in.
4.) For simple HTML validation, just use http://validator.w3.org/
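
A minimal sketch of points 1 and 2, in Python since this is the Python
list. A strict XML parser (the standard library's ElementTree) rejects
malformed markup outright, while a lenient HTML parser still pulls the
links out. BeautifulSoup itself is a third-party package, so the lenient
side here uses the standard library's html.parser instead (current
BeautifulSoup versions can use it as their underlying parser). The sample
markup and the function names are made up for illustration, and
ElementTree's findall() supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample page: unclosed <p> and <li>, an unquoted attribute
# value, and a stray </b> end tag.
MALFORMED = """<html><body>
<p>First paragraph
<ul><li>one<li>two</ul>
<a href=page.html>link</a></b>
</body></html>"""

# The same link in well-formed markup, for the XPath-style query.
WELL_FORMED = '<html><body><p>First</p><a href="page.html">link</a></body></html>'


def strict_parse_ok(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class LinkCollector(HTMLParser):
    """Lenient, event-based parser that records href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(markup):
    """Collect links from possibly malformed HTML without raising."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return parser.links


def xpath_links(markup):
    """Query a well-formed tree with ElementTree's limited XPath subset."""
    root = ET.fromstring(markup)
    return [a.get("href") for a in root.findall(".//a[@href]")]


if __name__ == "__main__":
    print(strict_parse_ok(MALFORMED))  # False
    print(extract_links(MALFORMED))    # ['page.html']
    print(xpath_links(WELL_FORMED))    # ['page.html']
```

The point of the split: path queries only become usable once the page has
been turned into a well-formed tree, whereas the lenient parser never
needs that step.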




More information about the Python-list mailing list