regex remove closest tag

S.Selvam s.selvamsiva at gmail.com
Thu Nov 12 05:55:44 EST 2009


Hi all,


1) I need to remove the <a> tag that comes just before the keyword (i.e. the
one around some_text2), leaving the others untouched.

2) The input string may or may not contain <a> tags.

3) Sample input:

    inputstr = """start <a href="some_url">some_text1</a> <a href="">some_text2</a> keyword anything"""

4) I came up with the following regex:

    p = re.compile(r'(?P<good1>.*?)(\s*<a.*?</a>keyword|\s*keyword)(?P<good2>.*)', re.DOTALL | re.I)
    s = p.search(inputstr)

but the second group matches both <a> tags, while I need it to match only the
one closest to the keyword.

I would like to get your suggestions.

Note:

   If I make group('good1') greedy, then it matches both <a> tags.
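
One direction that might work is sketched below. It is only a rough idea under some assumptions, not a tested solution: the function name remove_closest_anchor is made up for illustration, whitespace between </a> and the keyword is assumed to be optional, and the keyword itself is kept in the output. The idea is to stop the lazy part from running across a second "<a" by checking each character with a negative lookahead, so only the tag nearest the keyword can match.

```python
import re

def remove_closest_anchor(text, keyword):
    """Remove the <a>...</a> element sitting immediately before `keyword`.

    Hypothetical helper: the name and signature are illustrative only.
    """
    pattern = re.compile(
        r'<a\b[^>]*>'        # opening <a ...> tag
        r'(?:(?!<a\b).)*?'   # link text, refusing to cross another "<a"
        r'</a>\s*'           # closing tag plus any trailing whitespace
        r'(?=' + re.escape(keyword) + r')',  # keyword must follow; it is not consumed
        re.DOTALL | re.IGNORECASE,
    )
    return pattern.sub('', text)

inputstr = ('start <a href="some_url">some_text1</a> '
            '<a href="">some_text2</a> keyword anything')

result = remove_closest_anchor(inputstr, 'keyword')
print(result)

# Input without any <a> tag is left as it is.
untouched = remove_closest_anchor('start keyword anything', 'keyword')
print(untouched)
```

A regex like this only copes with simple, non-nested <a> tags; for arbitrary HTML a real parser would probably be safer.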
-- 
Yours,
S.Selvam