python regex: variable length of positive lookbehind assertion

alister alister.ware at ntlworld.com
Wed Jun 15 08:27:52 EDT 2016


On Tue, 14 Jun 2016 20:28:24 -0700, Yubin Ruan wrote:

> Hi everyone,
> I am struggling to write a regex that matches what I want:
> 
> Problem Description:
> 
> Given a string like this:
> 
>     >>>string = "false_head <a>aaa</a> <a>bbb</a> false_tail \
>              true_head some_text_here <a>ccc</a> <a>ddd</a> <a>eee</a> \
>              true_tail"
> 
> I want to match all the text surrounded by those "<a> </a>" tags,
> but only if those tags appear **at some distance** after
> "true_head". That is, I expect the result to be like this:
> 
>     >>>import re
>     >>>result = re.findall("the_regex",string)
>     >>>print result
>     ["ccc","ddd","eee"]
> 
> How can I write a regex to match that?
> I have tried to use the **positive lookbehind assertion** in Python regex,
> but it does not allow a variable-length lookbehind.
> 
> Thanks in advance,
> Ruan

Don't try to use regex to parse HTML; it won't work reliably.
I am surprised no one has mentioned BeautifulSoup yet, which is probably
what you require.
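
For illustration, here is a minimal sketch of the parser-based approach.
It uses the standard library's html.parser rather than BeautifulSoup, so it
runs without installing anything. The class name AnchorTextCollector, the
helper anchors_between and the way the "true_head"/"true_tail" markers are
tracked are assumptions made up for this example, not part of either library.

```python
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Collect the text of <a> elements that appear between two markers.

    Hypothetical helper for this example: 'head' switches collection on,
    'tail' switches it off again.
    """

    def __init__(self, head="true_head", tail="true_tail"):
        super().__init__()
        self.head = head
        self.tail = tail
        self.active = False     # True after head has been seen, until tail
        self.in_anchor = False  # True while inside an <a> element
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_anchor = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_anchor = False

    def handle_data(self, data):
        if self.in_anchor:
            # Text inside <a>...</a>: keep it only while collection is on.
            if self.active:
                self.results.append(data)
            return
        # Text outside <a>: whichever marker occurs last decides the state.
        head_pos = data.rfind(self.head)
        tail_pos = data.rfind(self.tail)
        if head_pos > tail_pos:
            self.active = True
        elif tail_pos > head_pos:
            self.active = False


def anchors_between(text, head="true_head", tail="true_tail"):
    """Return the text of every <a> element found between head and tail."""
    parser = AnchorTextCollector(head, tail)
    parser.feed(text)
    parser.close()
    return parser.results


string = ("false_head <a>aaa</a> <a>bbb</a> false_tail "
          "true_head some_text_here <a>ccc</a> <a>ddd</a> <a>eee</a> "
          "true_tail")
print(anchors_between(string))  # ['ccc', 'ddd', 'eee']
```

Because the parser walks the document in order, the distance between
"true_head" and each <a> element no longer matters, so no variable-length
lookbehind is needed. Nested <a> elements are not handled in this sketch.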





-- 
What we anticipate seldom occurs; what we least expect generally happens.
-- Benjamin Disraeli


