clean up html document created by Word
Shane Geiger
sgeiger at ncee.net
Fri Mar 30 13:44:22 EDT 2007
Tidy can now perform wonders on HTML saved from Microsoft Word 2000!
Word bulks out HTML files with stuff for round-tripping presentation
between HTML and Word. If you are more concerned about using HTML on the
Web, check out Tidy's "Word-2000"
<http://www.w3.org/People/Raggett/tidy/#word2000> config option! Of
course Tidy does a good job on Word'97 files as well!
-- source: http://www.w3.org/People/Raggett/tidy/
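
For what it's worth, here is a minimal sketch of driving that option from Python. It assumes the tidy command-line program is installed and on your PATH. The function names build_tidy_command and clean_word_html and the file names are made up for illustration, and adding --clean is my own choice, not something the page above prescribes.

```python
import shutil
import subprocess


def build_tidy_command(src, dst):
    """Return the argument list for cleaning a Word-exported HTML file."""
    return [
        "tidy",
        "--word-2000", "yes",  # the "Word-2000" config option quoted above
        "--clean", "yes",      # assumption: also replace presentational tags with CSS
        "-quiet",              # suppress nonessential output
        "-o", dst,             # write the result to dst instead of stdout
        src,
    ]


def clean_word_html(src, dst):
    """Run tidy and return its exit status (0 = ok, 1 = warnings, 2 = errors)."""
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy executable not found on PATH")
    return subprocess.call(build_tidy_command(src, dst))
```

Something like clean_word_html("report.htm", "report-clean.htm") would then write the cleaned copy next to the original. A status of 1 only means tidy printed warnings, which is common with Word output.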
jkn wrote:
> IIUC, the original poster is asking about 'cleaning up' in the sense
> of removing the swathes of unnecessary and/or redundant 'cruft' that
> Word puts in there, rather than making valid HTML out of invalid HTML.
> Again, IIUC, HTMLtidy does not do this.
>
> If Beautiful Soup does, then I'm interested!
>
> jon N
--
Shane Geiger
IT Director
National Council on Economic Education
sgeiger at ncee.net | 402-438-8958 | http://www.ncee.net
Leading the Campaign for Economic and Financial Literacy