regex remove closest tag

S.Selvam s.selvamsiva at gmail.com
Fri Nov 13 00:18:31 EST 2009


On Fri, Nov 13, 2009 at 12:47 AM, MRAB <python at mrabarnett.plus.com> wrote:

> S.Selvam wrote:
>
>> Hi all,
>>
>>
>> 1) I need to remove only the <a> tag that comes just before the keyword
>> (i.e. the one containing some_text2), leaving the others alone.
>>
>> 2) input string may or may not contain <a> tags.
>>
>> 3) Sample input:
>>
>> inputstr = """start <a href="some_url">some_text1</a> <a href="">some_text2</a> keyword anything"""
>>
>> 4) I came up with the following regex,
>>
>>
>> p=re.compile(r'(?P<good1>.*?)(\s*<a.*?</a>keyword|\s*keyword)(?P<good2>.*)',re.DOTALL|re.I)
>> s = p.search(inputstr)
>>
>> but the second group matches both <a> tags, while I need it to match only
>> the most recent one (the one closest to the keyword).
>>
>> I would like to get your suggestions.
>>
>> Note:
>>
>>   If I leave group('good1') greedy, then it matches both of the <a> tags.
>>
> ".*?" can match any number of any character, so it can match any
> intervening "<a>" tags. Try "[^<]*?" instead.
>
>
Thanks a lot,

    p = re.compile(r'(?:<a[^<]*?<\/a>\s*%s)' % (keyword), re.I|re.S)

has done it!
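
A minimal sketch of the difference, using the sample input from the original question. The helper name remove_closest_anchor, the re.escape call, and the lookahead (which keeps the keyword and drops only the tag) are assumptions added for illustration; they are not part of the pattern posted above.

```python
import re

inputstr = ('start <a href="some_url">some_text1</a> '
            '<a href="">some_text2</a> keyword anything')
keyword = "keyword"

# Lazy ".*?" can still run across the first </a>, so the match
# starts at the first <a> and swallows both tags.
lazy = re.compile(r'<a.*?</a>\s*%s' % re.escape(keyword), re.I | re.S)
lazy_match = lazy.search(inputstr).group(0)

# "[^<]*?" cannot cross a "<", so only the tag nearest the keyword matches.
strict = re.compile(r'<a[^<]*?</a>\s*%s' % re.escape(keyword), re.I | re.S)
strict_match = strict.search(inputstr).group(0)


def remove_closest_anchor(text, kw):
    """Hypothetical helper: drop the <a>...</a> right before kw, keep kw."""
    pat = re.compile(r'<a[^<]*?</a>(?=\s*%s)' % re.escape(kw), re.I | re.S)
    return pat.sub('', text, count=1)


cleaned = remove_closest_anchor(inputstr, keyword)
```

If the input has no <a> tag before the keyword, the substitution leaves the string unchanged. Note that "[^<]*?" also refuses to match when the link text itself contains nested markup (for example <b>...</b>), so this only suits simple anchors like the ones in the sample.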

-- 
Yours,
S.Selvam