python regex: variable length of positive lookbehind assertion

alister alister.ware at ntlworld.com
Wed Jun 15 11:31:38 EDT 2016


On Wed, 15 Jun 2016 15:55:42 +0300, Jussi Piitulainen wrote:

> alister writes:
> 
>> On Tue, 14 Jun 2016 20:28:24 -0700, Yubin Ruan wrote:
>>
>>> Hi everyone,
>>> I am struggling to write a regex that matches what I want:
>>> 
>>> Problem Description:
>>> 
>>> Given a string like this:
>>> 
>>>     >>> string = "false_head <a>aaa</a> <a>bbb</a> false_tail \
>>>              true_head some_text_here <a>ccc</a> <a>ddd</a> <a>eee</a> \
>>>              true_tail"
>>> 
>>> I want to match all the text surrounded by those "<a> </a>" tags,
>>> but only if those tags are located **at some distance** after
>>> "true_head". That is, I expect the result to be like this:
>>> 
>>>     >>> import re
>>>     >>> result = re.findall("the_regex", string)
>>>     >>> print(result)
>>>     ["ccc","ddd","eee"]
>>> 
>>> How can I write a regex to match that?
>>> I have tried to use the **positive lookbehind assertion** in Python
>>> regex, but it does not allow a variable-length lookbehind.
>>> 
>>> Thanks in advance,
>>> Ruan
>>
>> Don't try to use regex to parse HTML; it won't work reliably. I am
>> surprised no one has mentioned BeautifulSoup yet, which is probably
>> what you require.
> 
> Nothing in the question indicates that the data is HTML.

The <a></a> tags are a pretty good indicator, though.
Even if it is not HTML, the same advice stands for XML (although the quoted
example would be invalid as XML).

If it is neither of these formats but still uses a similar tag
structure, then I would say that regex is still unsuitable, and the OP would
need to write a full parser for the format if one does not already exist.
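
For what it is worth, the parser route can be sketched with nothing but the
standard library's html.parser (Python 3). This is only a rough illustration,
not a tested solution for the OP's real data: the names AnchorTextCollector
and anchor_texts_after are made up here, and treating "true_head" and
"true_tail" as plain text markers between the tags is an assumption based on
the sample string in the question.

```python
from html.parser import HTMLParser

# Sample from the question, written on one logical line (assumed layout).
SAMPLE = (
    "false_head <a>aaa</a> <a>bbb</a> false_tail "
    "true_head some_text_here <a>ccc</a> <a>ddd</a> <a>eee</a> true_tail"
)


class AnchorTextCollector(HTMLParser):
    """Hypothetical helper: collect <a> text found between two text markers."""

    def __init__(self, start_marker, stop_marker):
        super().__init__()
        self.start_marker = start_marker
        self.stop_marker = stop_marker
        self.active = False     # True while we are after start_marker
        self.in_anchor = False  # True while inside <a> ... </a>
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_anchor = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_anchor = False

    def handle_data(self, data):
        if self.in_anchor:
            # Text inside an <a> element: keep it only in the active region.
            if self.active:
                self.found.append(data)
            return
        # Text outside <a>: the last marker seen in this chunk sets the state.
        start = data.rfind(self.start_marker)
        stop = data.rfind(self.stop_marker)
        if start != -1 or stop != -1:
            self.active = start > stop


def anchor_texts_after(text, start_marker="true_head", stop_marker="true_tail"):
    """Return the text of every <a> element between the two markers."""
    collector = AnchorTextCollector(start_marker, stop_marker)
    collector.feed(text)
    collector.close()  # flush any text still buffered at the end
    return collector.found


result = anchor_texts_after(SAMPLE)
print(result)
```

The point of the design is that the "where am I" question (before or after
true_head) is tracked as ordinary state while parsing, so no variable-length
lookbehind is needed at all.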



