[Web-SIG] HTML parsers and DOM; WWW::Mechanize work-alike

John J Lee jjl at pobox.com
Tue Dec 2 14:39:10 EST 2003


[...]
> > wrapper first and then gradually develop a high-level interface to it,
> > mostly written in Python. That might also insulate us from future API
> > changes to tidy better.
> >
> I think we also want to consider seriously whether tidy is what we need.
> Does it really provide a necessary function? And, even if it does, how
> valuable would that function be?

Two things: parsing arbitrary (including broken) HTML reliably, and
processing that HTML with XML tools.

Whether that's "necessary" or valuable is a matter for debate, obviously.
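
To make the two functions concrete, here is a rough sketch of the idea
using only the Python standard library.  It does not use tidy, mxTidy, or
tidylib, and its repair rules are far cruder than theirs.  LenientBuilder,
parse_broken_html, and the VOID set are names made up for this
illustration; the sample markup is invented too.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never have content (hypothetical, deliberately short list).
VOID = {"br", "hr", "img", "input", "meta", "link"}


class LenientBuilder(HTMLParser):
    """Build an ElementTree from possibly broken HTML (illustration only)."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("document")  # synthetic wrapper element
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Valueless attributes arrive as None; store them as "".
        attrib = {name: value or "" for name, value in attrs}
        elem = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        current = self.stack[-1]
        if len(current):
            # Text that follows a child element belongs in that child's tail.
            last = current[-1]
            last.tail = (last.tail or "") + data
        else:
            current.text = (current.text or "") + data


def parse_broken_html(text):
    builder = LenientBuilder()
    builder.feed(text)
    builder.close()
    return builder.root


# Unclosed <b>, <p>, <a> and <html>: not well-formed as written.
broken = '<html><body><p>One <b>bold<p>Two <a href="x.html">link</body>'
tree = parse_broken_html(broken)

# Once it is a tree, ordinary XML tooling applies.
links = [a.get("href") for a in tree.iter("a")]
xml_text = ET.tostring(tree, encoding="unicode")
```

Unclosed elements are simply closed when an enclosing end tag or the end
of input is reached, so the second <p> ends up nested inside <b>.  A real
cleaner makes much smarter choices; the point is only that the output is
well-formed and can be handed to XML tools.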


> I wasn't impressed with tidy in either
> of the two attempts I made to use it.
>
> Then, of course, there's the question of prior art:
>
> 	http://www.lemburg.com/files/python/mxTidy.html
>
> might be worth looking at before you go too much further ...

mxTidy and tidylib are based on the same code (HTMLTidy).  tidylib is
being actively maintained (though that may be a mixed blessing, depending
on the relative proportions of old and newly introduced bugs).


John


