convert xhtml back to html

M.-A. Lemburg mal at egenix.com
Thu Apr 24 13:41:43 EDT 2008


On 2008-04-24 19:16, John Krukoff wrote:
>> -----Original Message-----
>> From: python-list-bounces+jkrukoff=ltgc.com at python.org [mailto:python-
>> list-bounces+jkrukoff=ltgc.com at python.org] On Behalf Of Tim Arnold
>> Sent: Thursday, April 24, 2008 9:34 AM
>> To: python-list at python.org
>> Subject: convert xhtml back to html
>>
>> Hi, I've got lots of XHTML pages that need to be fed to MS HTML Workshop to
>> create CHM files. That application really hates XHTML, so I need to convert
>> self-closing tags (e.g. <br />) to plain HTML (e.g. <br>).
>>
>> Seems simple enough, but I'm having some trouble with it. Regexps trip up
>> because I also have to take into account 'img', 'meta', 'link' tags, not
>> just the simple 'br' and 'hr' tags. Well, maybe there's a simple way to do
>> that with regexps, but my simpleminded <img[^(/>)]+/> doesn't work. I'm not
>> enough of a regexp pro to figure out that lookahead stuff.
>>
>> I'm not sure where to start now; I looked at BeautifulSoup and
>> BeautifulStoneSoup, but I can't see how to modify the actual tag.

You could filter the XHTML through mxTidy with the hide_endtags option set to 1:

http://www.egenix.com/products/python/mxExperimental/mxTidy/
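
If a regexp-only route is still preferred, here is a minimal sketch, not a
full parser. One likely reason the quoted pattern fails: inside [...] the
parentheses are literal, so [^(/>)] excludes the four characters ( / > ) and
stops at the first "/" in an attribute value such as src="img/a.png". The
names VOID_TAGS, SELF_CLOSING and xhtml_to_html_voids are made up for this
example, and the tag list is an assumption you would adjust to your pages.

```python
import re

# Tags to rewrite. Hypothetical list for illustration; extend as needed.
VOID_TAGS = ("br", "hr", "img", "meta", "link", "input")

# Match "<tag ...attributes... />" for the listed tags only.
# [^<>]*? lazily consumes the attributes, so a "/" inside an attribute
# value is allowed; only the final "/>" closes the match.
SELF_CLOSING = re.compile(
    r"<(%s)\b([^<>]*?)\s*/>" % "|".join(VOID_TAGS),
    re.IGNORECASE,
)

def xhtml_to_html_voids(text):
    """Rewrite self-closing void tags such as <br /> to <br>."""
    return SELF_CLOSING.sub(r"<\1\2>", text)
```

This assumes attribute values never contain a literal "<" or ">", and it will
also rewrite matching text inside comments or CDATA sections, so a tidy-based
filter is the safer choice for messy input.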

-- 
Marc-Andre Lemburg
eGenix.com



