python regex: variable length of positive lookbehind assertion

Jussi Piitulainen jussi.piitulainen at helsinki.fi
Wed Jun 15 08:55:42 EDT 2016


alister writes:

> On Tue, 14 Jun 2016 20:28:24 -0700, Yubin Ruan wrote:
>
>> Hi everyone,
>> I am struggling to write a regex that matches what I want:
>> 
>> Problem Description:
>> 
>> Given a string like this:
>> 
>>     >>> string = "false_head <a>aaa</a> <a>bbb</a> false_tail \
>>              true_head some_text_here <a>ccc</a> <a>ddd</a> <a>eee</a> \
>>              true_tail"
>> 
>> I want to match all the text surrounded by those "<a> </a>",
>> but only if those "<a> </a>" are located **at some distance** after
>> "true_head". That is, I expect the result to be like this:
>> 
>>     >>> import re
>>     >>> result = re.findall("the_regex", string)
>>     >>> print result
>>     ["ccc","ddd","eee"]
>> 
>> How can I write a regex to match that?
>> I have tried to use the **positive lookbehind assertion** in Python regex,
>> but it does not allow a variable-length lookbehind.
>> 
>> Thanks in advance,
>> Ruan
>
> Don't try to use regex to parse HTML; it won't work reliably.
> I am surprised no one has mentioned BeautifulSoup yet, which is probably
> what you require.

Nothing in the question indicates that the data is HTML.
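
Treating the input as plain text, Python's re module does require
lookbehind patterns to have a fixed width, but the result asked for can
be had without lookbehind by working in two steps: first cut out the
span between the head and tail markers, then collect the tagged pieces
inside that span. The following is only a sketch under assumptions that
the question does not confirm: the function name tagged_after_head is
made up for illustration, and it assumes that "true_head" and
"true_tail" are literal markers and that the <a>...</a> pairs do not
nest.

```python
import re

string = ("false_head <a>aaa</a> <a>bbb</a> false_tail "
          "true_head some_text_here <a>ccc</a> <a>ddd</a> <a>eee</a> "
          "true_tail")


def tagged_after_head(text, head="true_head", tail="true_tail"):
    """Return the text inside <a>...</a> pairs found between head and tail.

    Hypothetical helper: head and tail are treated as literal strings
    (an assumption), and <a>...</a> pairs are assumed not to nest.
    """
    # Step 1: non-greedy match of everything from a head to the next tail.
    span = re.escape(head) + r"(.*?)" + re.escape(tail)
    results = []
    for segment in re.finditer(span, text, re.DOTALL):
        # Step 2: collect the tagged pieces inside that segment only.
        results.extend(re.findall(r"<a>(.*?)</a>", segment.group(1), re.DOTALL))
    return results


print(tagged_after_head(string))
```

With the sample string from the question this gives the three items
after "true_head" and skips the two after "false_head". If the tail
marker is missing, nothing is returned, which may or may not be what is
wanted.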


