[XML-SIG] A "tolerant" parser for structure-challenged HTML files
Alexandre Fayolle
Alexandre.Fayolle@logilab.fr
Fri, 20 Jul 2001 18:09:13 +0200 (CEST)
On Fri, 20 Jul 2001, Rich Salz wrote:
> Detlef Lannert wrote:
> >
> > A couple of weeks ago I was faced with the problem of processing a few
> > web pages which were generated by Microsoft Word (and post-processed
>
> You might want to look at the "microsoft demoroniser" :)
> http://www.fourmilab.ch/webtools/demoroniser/
You can also use Tidy, which has a special mode for MS Word files.
http://www.w3.org/People/Raggett/tidy/
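
For what it's worth, here is a rough sketch of how one might drive that mode from Python. It assumes a `tidy` binary on the PATH that accepts the `--word-2000` configuration option on the command line (older builds may need it in a config file instead); the helper names `build_tidy_command` and `clean_word_html` are made up for this illustration and are not part of Tidy.

```python
import shutil
import subprocess


def build_tidy_command(src, dst, tidy_bin="tidy"):
    """Return the argv list for a Tidy run in its Word 2000 cleanup mode.

    Hypothetical helper: the function name and defaults are assumptions.
    """
    return [
        tidy_bin,
        "--word-2000", "yes",  # ask Tidy to strip Word-specific markup
        "-clean",              # replace presentational tags with CSS rules
        "-o", dst,             # write the cleaned document here
        src,
    ]


def clean_word_html(src, dst):
    """Run Tidy on src; return False if Tidy is missing or reports errors."""
    if shutil.which("tidy") is None:
        return False
    result = subprocess.run(build_tidy_command(src, dst))
    # Tidy exits with 0 (clean), 1 (warnings only) or 2 (errors).
    return result.returncode in (0, 1)
```

Calling `clean_word_html("word.html", "clean.html")` would then leave a tidied copy next to the original, which a stricter parser has a better chance of accepting.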
Alexandre Fayolle
--
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Narval, the first software agent available as free software (GPL).