[XML-SIG] A "tolerant" parser for structure-challenged HTML files

Alexandre Fayolle Alexandre.Fayolle@logilab.fr
Fri, 20 Jul 2001 18:09:13 +0200 (CEST)


On Fri, 20 Jul 2001, Rich Salz wrote:

> Detlef Lannert wrote:
> > 
> > A couple of weeks ago I was faced with the problem of processing a few
> > web pages which were generated by Microsoft Word (and post-processed
> 
> You might want to look at the "microsoft demoroniser" :)
> 	http://www.fourmilab.ch/webtools/demoroniser/

You can also use Tidy, which has a special mode for MS Word files.
http://www.w3.org/People/Raggett/tidy/
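
As a rough sketch only, assuming a reasonably recent HTML Tidy binary named
"tidy" is on the PATH, that mode could be driven from Python along these
lines. The function names and file paths are made up for illustration;
"--word-2000" is Tidy's option for Word-generated markup, and older builds
may only accept it through a config file.

```python
import shutil
import subprocess


def build_tidy_command(src, dest):
    """Return an argument list asking Tidy to clean Word-generated HTML."""
    return [
        "tidy",
        "-q",                  # quiet: suppress nonessential output
        "-asxhtml",            # emit well-formed XHTML
        "--word-2000", "yes",  # strip Word 2000 specific markup
        "-o", dest,            # write the cleaned document here
        src,                   # input file (hypothetical path)
    ]


def clean_word_html(src, dest):
    """Run Tidy if it is installed; return its exit status, or None if absent."""
    if shutil.which("tidy") is None:
        return None
    # Tidy exits with 0 on success, 1 on warnings, 2 on errors.
    return subprocess.run(build_tidy_command(src, dest)).returncode
```

The XHTML output should then be much easier to feed to a regular XML
parser, though heavily mangled pages may still need hand fixing.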

Alexandre Fayolle
-- 
LOGILAB, Paris (France).
http://www.logilab.com   http://www.logilab.fr  http://www.logilab.org
Narval, the first software agent available as free software (GPL).