convert xhtml back to html

bryan rasmussen rasmussen.bryan at gmail.com
Thu Apr 24 14:08:50 EDT 2008


I'll second the recommendation to use XSLT, with the output method set to html.


A basic XSLT stylesheet to do it would be:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="html"/>
    <xsl:template match="/">
        <xsl:copy-of select="/"/>
    </xsl:template>
</xsl:stylesheet>

You would probably want to do more than just copy it out, but
that's another matter.
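
Since this is python-list, here is a rough standard-library analogue of
the same idea. It is not XSLT: it parses the page with
xml.etree.ElementTree and serializes it with method="html", which writes
void elements such as br, hr, img, meta and link without a closing slash.
The function name xhtml_to_html and the sample markup are made up for
illustration, and the sketch assumes the input is well-formed XML with no
undefined entities such as &nbsp;.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def xhtml_to_html(xhtml_text):
    """Hypothetical helper: re-serialize well-formed XHTML as plain HTML."""
    root = ET.fromstring(xhtml_text)
    # Drop the XHTML namespace so the serializer sees plain tag names
    # like "br" instead of "{http://www.w3.org/1999/xhtml}br".
    for elem in root.iter():
        if isinstance(elem.tag, str) and elem.tag.startswith(XHTML_NS):
            elem.tag = elem.tag[len(XHTML_NS):]
    # method="html" omits end tags for void elements (br, hr, img, ...).
    return ET.tostring(root, encoding="unicode", method="html")


# Made-up sample page, for illustration only.
sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>t</title><meta content="b" name="a" /></head>'
    '<body><p>one<br />two</p><img alt="x" src="x.png" /><hr /></body>'
    '</html>'
)
html = xhtml_to_html(sample)
print(html)
```

Any DOCTYPE in the input is not carried over to the output, so it would
have to be written out separately if the CHM compiler needs one.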

Also, as I recall, the way to make XHTML br elements behave correctly
in CHM was to write <br /> rather than <br/>. At any rate, I've done
projects generating CHM, and my output markup was well-formed XML in
every case.
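
If the pages use the <br/> form, a small sketch like the following could
normalize them to <br />. The function name is hypothetical, and the
regular expression is deliberately naive: it rewrites every "/>" in the
text, including any that happen to appear inside comments, CDATA
sections or attribute values.

```python
import re


def space_before_self_close(markup):
    """Hypothetical helper: make every '/>' read ' />' (one space)."""
    # \s* swallows any existing whitespace so the result always has
    # exactly one space before the closing slash.
    return re.sub(r"\s*/>", " />", markup)


print(space_before_self_close('<p>a<br/>b<hr   /><img src="a/b.png"/></p>'))
```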

Cheers,
Bryan Rasmussen

On Thu, Apr 24, 2008 at 5:34 PM, Tim Arnold <tim.arnold at sas.com> wrote:
> hi, I've got lots of xhtml pages that need to be fed to MS HTML Help Workshop to
>  create CHM files. That application really hates xhtml, so I need to convert
>  self-closing tags (e.g. <br />) to plain html (e.g. <br>).
>
>  Seems simple enough, but I'm having some trouble with it. regexps trip up
>  because I also have to take into account 'img', 'meta', 'link' tags, not
>  just the simple 'br' and 'hr' tags. Well, maybe there's a simple way to do
>  that with regexps, but my simpleminded <img[^(/>)]+/> doesn't work. I'm not
>  enough of a regexp pro to figure out that lookahead stuff.
>
>  I'm not sure where to start now; I looked at BeautifulSoup and
>  BeautifulStoneSoup, but I can't see how to modify the actual tag.
>
>  thanks,
>  --Tim Arnold


