convert xhtml back to html

Arnaud Delobelle arnodel at googlemail.com
Thu Apr 24 12:23:22 EDT 2008


"Tim Arnold" <tim.arnold at sas.com> writes:

> hi, I've got lots of xhtml pages that need to be fed to MS HTML Workshop to 
> create  CHM files. That application really hates xhtml, so I need to convert 
> self-ending tags (e.g. <br />) to plain html (e.g. <br>).
>
> Seems simple enough, but I'm having some trouble with it. regexps trip up 
> because I also have to take into account 'img', 'meta', 'link' tags, not 
> just the simple 'br' and 'hr' tags. Well, maybe there's a simple way to do 
> that with regexps, but my simpleminded <img[^(/>)]+/> doesn't work. I'm not 
> enough of a regexp pro to figure out that lookahead stuff.

Hi, I'm not sure if this is very helpful but the following works on
the very simple example below.

>>> import re
>>> xhtml = '<p>hello <img src="/img.png"/> spam <br/> bye </p>'
>>> xtag = re.compile(r'<([^>]*?)/>') 
>>> xtag.sub(r'<\1>', xhtml)
'<p>hello <img src="/img.png"> spam <br> bye </p>'


-- 
Arnaud



More information about the Python-list mailing list