clean up html document created by Word

Shane Geiger sgeiger at ncee.net
Fri Mar 30 13:44:22 EDT 2007


Tidy can now perform wonders on HTML saved from Microsoft Word 2000! 
Word bulks out HTML files with stuff for round-tripping presentation 
between HTML and Word. If you are more concerned about using HTML on the 
Web, check out Tidy's "Word-2000" 
<http://www.w3.org/People/Raggett/tidy/#word2000> config option! Of 
course Tidy does a good job on Word'97 files as well!
   -- source:  http://www.w3.org/People/Raggett/tidy/
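
For what it's worth, here is a rough sketch of driving Tidy from Python with that option switched on. It assumes a `tidy` executable is on your PATH. The function names are my own, and turning on `bare` and `clean` alongside `word-2000` is only a guess at what helps with Word output, not something the Tidy page above prescribes.

```python
import subprocess


def build_tidy_command(src, dest, tidy_exe="tidy"):
    """Return the argument list for a Tidy run aimed at Word-exported HTML."""
    return [
        tidy_exe,
        "-quiet",              # suppress nonessential output
        "--word-2000", "yes",  # the Word-2000 option mentioned on the Tidy page
        "--bare", "yes",       # assumption: also strip Microsoft-specific markup
        "--clean", "yes",      # assumption: turn legacy presentational tags into style rules
        "-output", dest,       # write the cleaned document here
        src,
    ]


def clean_word_html(src, dest, tidy_exe="tidy"):
    """Run Tidy on src and write the result to dest.

    Tidy exits with 0 (ok), 1 (warnings) or 2 (errors), so only an exit
    status of 2 or more is treated as a failure.
    """
    result = subprocess.run(
        build_tidy_command(src, dest, tidy_exe),
        capture_output=True,
        text=True,
    )
    if result.returncode >= 2:
        raise RuntimeError(result.stderr)
    return result.returncode
```

Something like clean_word_html("report.htm", "report-clean.html") would then do the work (both file names are made up). Compare the output against the original to see how much of the Word cruft is actually gone.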



jkn wrote:
> IIUC, the original poster is asking about 'cleaning up' in the sense
> of removing the swathes of unnecessary and/or redundant 'cruft' that
> Word puts in there, rather than making valid HTML out of invalid HTML.
> Again, IIUC, HTMLtidy does not do this.
>
> If Beautiful Soup does, then I'm interested!
>
>     jon N

-- 
Shane Geiger
IT Director
National Council on Economic Education
sgeiger at ncee.net  |  402-438-8958  |  http://www.ncee.net

Leading the Campaign for Economic and Financial Literacy
