convert xhtml back to html

Tim Arnold tim.arnold at sas.com
Thu Apr 24 12:48:18 EDT 2008


"Arnaud Delobelle" <arnodel at googlemail.com> wrote in message 
news:m28wz3cjd1.fsf at googlemail.com...
> "Tim Arnold" <tim.arnold at sas.com> writes:
>
>> hi, I've got lots of xhtml pages that need to be fed to MS HTML Workshop to
>> create CHM files. That application really hates xhtml, so I need to convert
>> self-ending tags (e.g. <br />) to plain html (e.g. <br>).
>>
>> Seems simple enough, but I'm having some trouble with it. regexps trip up
>> because I also have to take into account 'img', 'meta', 'link' tags, not
>> just the simple 'br' and 'hr' tags. Well, maybe there's a simple way to do
>> that with regexps, but my simpleminded <img[^(/>)]+/> doesn't work. I'm not
>> enough of a regexp pro to figure out that lookahead stuff.
>
> Hi, I'm not sure if this is very helpful but the following works on
> the very simple example below.
>
> >>> import re
> >>> xhtml = '<p>hello <img src="/img.png"/> spam <br/> bye </p>'
> >>> xtag = re.compile(r'<([^>]*?)/>')
> >>> xtag.sub(r'<\1>', xhtml)
> '<p>hello <img src="/img.png"> spam <br> bye </p>'
>
>
> -- 
> Arnaud

Thanks for that, it is helpful. I guess I had a brain malfunction. I'm
pretty sure your example will work for me, except in cases where the IMG
alt text contains a '>' sign. I'm not sure that's even possible, so maybe
this will do the job.
Thanks,
--Tim
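
For the '>' case mentioned above: XML only requires '<' and '&' to be
escaped inside a quoted attribute value, so a literal '>' in alt text can
occur, and the simple [^>]*? pattern would leave such a tag unconverted.
Below is a minimal sketch of one way to handle it, not a tested solution for
real-world pages. It assumes attribute values are always quoted and restricts
the rewrite to the five tags named in this thread. The names VOID_TAGS, XTAG
and xhtml_to_html are made up for illustration.

```python
import re

# Tags mentioned in the thread; extend as needed (assumption, not a full list).
VOID_TAGS = ("br", "hr", "img", "meta", "link")

# Match <tag ... /> while treating quoted attribute values as opaque units,
# so a '>' or '/' inside quotes does not end the match early.
XTAG = re.compile(
    r"<(?P<name>%s)\b"
    r"(?P<attrs>(?:[^>\"']|\"[^\"]*\"|'[^']*')*?)"
    r"\s*/>" % "|".join(VOID_TAGS),
    re.IGNORECASE,
)


def xhtml_to_html(text):
    """Rewrite self-closing void tags (<br />) as plain HTML tags (<br>)."""
    return XTAG.sub(lambda m: "<%s%s>" % (m.group("name"), m.group("attrs")), text)


if __name__ == "__main__":
    sample = '<p>hello <img src="/img.png" alt="a > b"/> spam <br /> bye </p>'
    print(xhtml_to_html(sample))
```

Tags outside VOID_TAGS are left alone, so something like <a href="x"/>
passes through unchanged. For anything messier than this (unquoted
attributes, comments, CDATA), a real parser is probably the safer route.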




