deleting texts between patterns

John Machin sjmachin at lexicon.net
Sun Jun 4 19:36:14 EDT 2006


On 5/06/2006 2:51 AM, Baoqiu Cui wrote:
> John Machin <sjmachin at lexicon.net> writes:
> 
>> Uh-oh.
>>
>> Try this:
>>
>>>>> pat = re.compile('(?<=abc\n).*?(?=xyz\n)', re.DOTALL)
>>>>> re.sub(pat, '', linestr)
>> 'blahfubarabc\nxyz\nxyzzy'
> 
> This regexp still has a problem.  It may remove the lines between two
> lines like 'aaabc' and 'xxxyz' (and also removes the first two 'x's in
> 'xxxyz').
> 
> The following regexp works better:
> 
>   pattern = re.compile('(?<=^abc\n).*?(?=^xyz\n)', re.DOTALL | re.MULTILINE)
> 

You are quite correct. Your reply, and the rejoinder below, only add to 
the proposition that regexes are not necessarily the best choice for 
every text-processing job :-)
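To make that point concrete, here is a sketch of the same job done line by line, with no regex at all. It assumes, as the anchored patterns in this thread do, that the markers are whole lines reading exactly 'abc' and 'xyz', that the marker lines themselves are kept, and that an 'abc' with no later 'xyz' leaves the text untouched. The function name delete_between is hypothetical.

```python
def delete_between(text, start="abc", end="xyz"):
    """Drop the lines strictly between a line equal to `start` and the
    next line equal to `end`. The marker lines themselves are kept.
    (Hypothetical helper, not from any library.)"""
    out = []        # lines that survive
    pending = []    # lines seen since the last opening marker
    inside = False  # True once `start` has been seen and `end` has not
    for line in text.splitlines(keepends=True):
        bare = line.rstrip("\n")  # compare without the line terminator
        if not inside:
            out.append(line)
            if bare == start:
                inside = True
                pending = []
        elif bare == end:
            out.append(line)      # keep the closing marker
            pending = []          # and discard everything before it
            inside = False
        else:
            pending.append(line)  # candidate for deletion
    if inside:
        out.extend(pending)       # no closing marker: delete nothing
    return "".join(out)
```

Because the markers are compared as whole lines, 'aaabc' and 'xxxyz' are never mistaken for 'abc' and 'xyz', and a final 'xyz' without a trailing '\n' still closes the block.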

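Just in case the last line is 'xyz' but is not terminated by '\n', use '$' in the lookahead instead; with re.MULTILINE it matches just before a newline or at the very end of the string: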

pattern = re.compile('(?<=^abc\n).*?(?=^xyz$)', re.DOTALL | re.MULTILINE)
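A small script comparing the three patterns from this thread on two test strings; the sample strings and dictionary keys are made up for illustration.

```python
import re

# The three patterns discussed in this thread, oldest first.
patterns = {
    "unanchored": re.compile('(?<=abc\n).*?(?=xyz\n)', re.DOTALL),
    "anchored": re.compile('(?<=^abc\n).*?(?=^xyz\n)', re.DOTALL | re.MULTILINE),
    "final": re.compile('(?<=^abc\n).*?(?=^xyz$)', re.DOTALL | re.MULTILINE),
}

samples = [
    'aaabc\nkeep me\nxxxyz\n',  # markers appear only as parts of longer lines
    'abc\ndrop me\nxyz',         # last line is 'xyz' with no trailing '\n'
]

for text in samples:
    for name, pat in patterns.items():
        print(name, repr(pat.sub('', text)))
```

On the first sample the unanchored pattern removes 'keep me\nxx', leaving 'aaabc\nxyz\n', while both anchored patterns leave it alone. On the second sample only the final pattern removes 'drop me\n', giving 'abc\nxyz'.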

Cheers,
John


