Double replace or single re.sub?

Wed Oct 26 09:22:21 EDT 2005

"Iain King" <iainking at gmail.com> writes:

> I have some code that converts html into xhtml.  For example, convert
> all <i> tags into <em>.  Right now I need to do two string.replace calls
> for every tag:
>
> html = html.replace('<i>','<em>')
> html = html.replace('</i>','</em>')
>
> I can change this to a single call to re.sub:
>
> html = re.sub(r'<(/?)i>', r'<\1em>', html)
>
> Would this be a quicker/better way of doing it?

Maybe. You could measure it and see. But neither will work in the face
of attributes or whitespace in the tag: <i class="note"> and </i > both
slip through unchanged.
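
For the measuring part, something like the following rough sketch
would do. The sample document, the repeat count, and the function names
are all made up for illustration, and the numbers will vary by machine
and Python version, so treat it as a starting point rather than a
verdict.

```python
import re
import timeit

# Hypothetical sample input: a small fragment repeated to get measurable work.
SAMPLE = "<p>some <i>italic</i> text</p>" * 1000

# Compile once so the regex timing is not dominated by pattern lookup.
PATTERN = re.compile(r"<(/?)i>")


def with_replace(html):
    """Two plain string replacements, one per tag form."""
    return html.replace("<i>", "<em>").replace("</i>", "</em>")


def with_sub(html):
    """One regex substitution covering both the open and close tag."""
    return PATTERN.sub(r"<\1em>", html)


if __name__ == "__main__":
    for fn in (with_replace, with_sub):
        seconds = timeit.timeit(lambda: fn(SAMPLE), number=200)
        print(f"{fn.__name__}: {seconds:.3f}s for 200 runs")

    # Both approaches leave this input untouched, which is the real problem.
    tricky = '<i class="x">a</i >'
    print(with_replace(tricky))
    print(with_sub(tricky))
```

Whichever one wins the timing, the last two lines show that speed is
the lesser concern here.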

If you're going to parse [X]HTML, you really should use tools that are
designed for the job. If you have well-formed HTML, you can use the
htmllib parser in the standard library. If you have the usual crap one
finds on the web, I recommend BeautifulSoup.
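
To make the parser suggestion concrete, here is a minimal sketch. Note
the substitution: it uses html.parser.HTMLParser, the standard-library
parser in current Python, rather than htmllib or BeautifulSoup. The
TagRenamer class, the RENAME table, and rename_tags are hypothetical
names invented for this example, and the re-serialization is
deliberately simple (it normalizes attribute quoting and drops
processing instructions), so it is not a full HTML-to-XHTML converter.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical mapping of old tag names to new ones.
RENAME = {"i": "em"}


class TagRenamer(HTMLParser):
    """Re-emit the parsed document, renaming tags listed in RENAME."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    @staticmethod
    def _attrs(attrs):
        # The parser hands over unescaped values, so escape them again.
        parts = []
        for name, value in attrs:
            if value is None:
                parts.append(f" {name}")
            else:
                parts.append(f' {name}="{escape(value, quote=True)}"')
        return "".join(parts)

    def handle_starttag(self, tag, attrs):
        self.out.append(f"<{RENAME.get(tag, tag)}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        self.out.append(f"<{RENAME.get(tag, tag)}{self._attrs(attrs)} />")

    def handle_endtag(self, tag):
        self.out.append(f"</{RENAME.get(tag, tag)}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def rename_tags(html):
    parser = TagRenamer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(rename_tags('<p>x &amp; <i class="x">a</i ></p>'))
```

Because the parser does the tokenizing, attributes and stray whitespace
inside the tag no longer matter. For genuinely broken markup,
BeautifulSoup is still likely the more forgiving choice.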

      <mike
-- 
Mike Meyer <mwm at mired.org>			http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.