Double replace or single re.sub?

Wed Oct 26 09:57:39 EDT 2005

Mike Meyer wrote:
> "Iain King" <iainking at gmail.com> writes:
>
> > I have some code that converts html into xhtml.  For example, convert
> > all <i> tags into <em>.  Right now I need to do two string.replace calls
> > for every tag:
> >
> > html = html.replace('<i>','<em>')
> > html = html.replace('</i>','</em>')
> >
> > I can change this to a single call to re.sub:
> >
> > html = re.sub('<([/]*)i>', r'<\1em>', html)
> >
> > Would this be a quicker/better way of doing it?
>
> Maybe. You could measure it and see. But neither will work in the face
> of attributes or whitespace in the tag.
>
> If you're going to parse [X]HTML, you really should use tools that are
> designed for the job. If you have well-formed HTML, you can use the
> htmllib parser in the standard library. If you have the usual crap one
> finds on the web, I recommend BeautifulSoup.
>
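
To make the point about attributes and whitespace concrete, here is a small sketch that runs the re.sub call from the original post over a few hand-made inputs. The function name convert_simple and the sample strings are invented for illustration; they are not from the actual LJ output.

```python
import re

def convert_simple(html):
    # Same pattern and replacement as the re.sub call quoted above.
    return re.sub('<([/]*)i>', r'<\1em>', html)

# Hypothetical inputs: a bare tag, a tag with an attribute,
# and tags with whitespace before the closing bracket.
samples = [
    '<i>plain</i>',
    '<i class="x">attr</i>',
    '<i >space</i >',
]

for s in samples:
    print(repr(s), '->', repr(convert_simple(s)))

# The bare tag converts cleanly.
# With an attribute, only the closing tag matches, so the result is
# '<i class="x">attr</em>', which is worse than leaving it alone.
# With whitespace inside the tags, nothing matches and the input
# comes back unchanged.
```

The two-call string.replace version behaves the same way on these inputs, since it also looks for the exact strings '<i>' and '</i>'.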

Thanks.  My initial post overstates the program a bit - what I actually
have is a CGI script which outputs my LiveJournal, which I then
server-side include in my home page (so my home page also displays the
latest X entries in my LiveJournal).  The only HTML I need to convert
is the stuff that LJ spews out, which, while bad, isn't terrible, and
is fairly consistent.  The stuff I need to convert is mostly stuff I
write myself in journal entries, so it doesn't have to be so
comprehensive that I'd need something like BeautifulSoup.  I'm not
trying to parse it, just clean it up a little.
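
As for "measure it and see", the standard timeit module makes that easy. A minimal sketch follows; the sample string, the function names and the repeat count are made up for illustration, so the numbers it prints say nothing definite about real journal output.

```python
import re
import timeit

# Hypothetical input: a short fragment repeated to give the timer some work.
sample = '<p>Some <i>italic</i> text and <i>more</i>.</p>' * 1000

def with_replace(html):
    # The two string.replace calls from the original post.
    html = html.replace('<i>', '<em>')
    html = html.replace('</i>', '</em>')
    return html

def with_sub(html):
    # The single re.sub call from the original post.
    return re.sub('<([/]*)i>', r'<\1em>', html)

# Both approaches should agree on this input before timing them.
assert with_replace(sample) == with_sub(sample)

for func in (with_replace, with_sub):
    seconds = timeit.timeit(lambda: func(sample), number=100)
    print(func.__name__, seconds)
```

Timings will vary with the Python version and the input, so it is worth running this against a real page rather than trusting a guess.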

Iain