Removing an attribute from html with Regex

Selvam s.selvamsiva at gmail.com
Thu Dec 30 02:30:58 EST 2010


Hi all,

I have an HTML string that I would like to feed to BeautifulSoup.

But one malformed attribute breaks BeautifulSoup.

    <p style='terp_header' wrong_tag=' text1 ' text2 ' and 'para'  '
 class='terp_header'> My String</p>

I would like to replace all occurrences of that attribute with an
empty string.

I am unable to figure out a regex that can do this job.

This is what I have managed so far:

import re
m = re.compile("rml_except='([^']*)")

As you can see, it stops at the first single quote inside the value, so
the rest of the malformed attribute is left behind.
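
One direction that might work is sketched below. It is only a sketch
under assumptions, not a tested solution for real-world HTML: it
assumes the broken value never contains '<' or '>', and that the value
really ends at the last single quote that is followed either by the
next attribute (name=) or by the end of the tag. The helper name
strip_attribute is made up for illustration, and the attribute name is
passed in as a parameter.

```python
import re

def strip_attribute(html, name):
    """Remove every single-quoted `name='...'` attribute from html.

    Hypothetical helper. The value is matched lazily, and a closing
    quote is accepted only if it is followed by another attribute
    (whitespace, a name, then '=') or by the end of the tag.
    """
    pattern = re.compile(
        r"\s+" + re.escape(name) + r"\s*=\s*'"  # leading space, name, opening quote
        r"[^<>]*?'"                             # shortest run that stays inside the tag
        r"(?=\s+[\w:-]+\s*=|\s*/?>)"            # next attribute, or '>' / '/>'
    )
    return pattern.sub("", html)

sample = ("<p style='terp_header' wrong_tag=' text1 ' text2 ' and 'para'  '\n"
          " class='terp_header'> My String</p>")
cleaned = strip_attribute(sample, "wrong_tag")
```

With the sample above, cleaned keeps the style and class attributes and
drops only wrong_tag. A value that itself contains something like
"' foo=" would still fool the lookahead, so this is a cleanup step
before parsing rather than a general fix.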

Any suggestions will be useful.

-- 
Regards,
S.Selvam
SG E-ndicus Infotech Pvt Ltd.
http://e-ndicus.com/

 " I am because we are "