beautifulsoup vs. tidy

Matt Good matt.good at gmail.com
Sat Jul 1 18:22:01 EDT 2006


bruce wrote:
> that's exactly what i'm trying to accomplish... i've used tidy, but it seems
> to still generate warnings...
>
>  initFile -> tidy -> cleanFile -> perl app (using xpath/libxml)
>
> the xpath/libxml functions in the perl app complain regarding the file. my
> thought is that tidy isn't cleaning enough, or that the perl xpath/libxml
> functions are too strict!

Clean HTML is not necessarily well-formed XML.  If you want to process the
output with an XML library you'll need to tell Tidy to output XHTML (the
-asxhtml option on the command line).  Then it should be usable for XML
processing.
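
To illustrate the difference, here is a minimal sketch using Python's
standard xml.etree.ElementTree rather than the Perl libxml bindings from the
question. The sample markup and the helper name is_well_formed are made up
for illustration. An unclosed <br> is perfectly legal HTML, but an XML parser
rejects it, while the XHTML spelling of the same fragment parses.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Legal HTML: <br> has no closing tag, which XML does not allow.
html_fragment = "<p>Hello<br>world</p>"
# The XHTML spelling of the same fragment closes the empty element.
xhtml_fragment = "<p>Hello<br />world</p>"

print(is_well_formed(html_fragment))
print(is_well_formed(xhtml_fragment))
```

A strict XML library will probably complain about the first form in much the
same way, no matter how tidy the HTML otherwise is.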

Of course BeautifulSoup is also a very nice library if you need to
extract some information but don't necessarily need XML processing
to do it.
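
As a rough sketch of that kind of lenient extraction: BeautifulSoup is a
third-party package, so this example uses the standard library's html.parser
as a stand-in, not BeautifulSoup's own API. The LinkCollector class and the
sample markup are hypothetical. The point is only that a forgiving HTML
parser can pull data out of markup that an XML parser would refuse.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical example class)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Deliberately sloppy markup: unclosed <a> and <br> tags.
messy_html = (
    '<p>See <a href="http://example.com/a">one<br>'
    '<a href="http://example.com/b">two</p>'
)

collector = LinkCollector()
collector.feed(messy_html)
collector.close()
print(collector.links)
```

No Tidy step is needed for this sort of job, since the parser never insists
on well-formed input.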

-- Matt Good



