SGMLParser problem

Gillou nospam at bigfoot.com
Fri Nov 8 14:10:41 EST 2002


"sanjay" <sanjay2kind at yahoo.com> wrote in message news:
63170f57.0211080754.4d398296 at posting.google.com...
> Hi,
>
> Does anyone have a suggestion for the following problem? Some Word
> documents have been converted to HTML pages in MS Word. I want to filter
> out HTML tags like
> <o:p></o:p>,
> <![if !supportEmptyParas]> <![endif]>, etc. I couldn't solve this
> using SGMLParser. It shows an error like..

I'm not sure that XML namespace notation is valid in strict SGML.
That's probably the reason for your exception.
As Martin v. Loewis writes, Tidy does a pretty good job of cleaning up the
strange MS-Word HTML and removes everything that isn't standard HTML4.
Search for it at www.w3.org
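
If running Tidy is not an option, a rough alternative is to strip the
Word-specific markup with regular expressions before handing the text to a
parser. This is only a sketch: the function name strip_word_markup and both
patterns are my own assumptions, derived from the two examples in the
question, and they will not cover everything Word emits.

```python
import re

# Assumption: Word's non-standard conditional markers look like
# <![if ...]> and <![endif]>, as in the example from the question.
_CONDITIONAL = re.compile(r"<!\[(?:if\b[^\]]*|endif)\]>", re.IGNORECASE)

# Assumption: any tag whose name contains a namespace prefix (o:p, st1:place,
# ...) is Office-specific and can be dropped. The element content is kept.
_NAMESPACED_TAG = re.compile(r"</?[A-Za-z][\w.-]*:[\w.-]+(?:\s[^<>]*)?>")


def strip_word_markup(html):
    """Remove Word conditional markers and namespace-prefixed tags."""
    html = _CONDITIONAL.sub("", html)
    return _NAMESPACED_TAG.sub("", html)


if __name__ == "__main__":
    sample = "<p>Hello<o:p></o:p></p><p><![if !supportEmptyParas]>&nbsp;<![endif]></p>"
    print(strip_word_markup(sample))
```

The cleaned string can then be fed to whichever parser you use. Regular
expressions are fragile on real-world HTML, so treat this as a pre-filter
for these two constructs, not a replacement for Tidy.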

--Gilles





