deleting texts between patterns
John Machin
sjmachin at lexicon.net
Fri May 12 05:57:23 EDT 2006
On 12/05/2006 6:11 PM, Ravi Teja wrote:
> mickle... at hotmail.com wrote:
>> Hi,
>> say I have a text file
>>
>> line1
[snip]
>> line6
>> abc
>> line8 <--- to be deleted
[snip]
>> line13 <--- to be deleted
>> xyz
>> line15
[snip]
>> line18
>>
>> I wish to delete the lines between 'abc' and 'xyz' and print
>> the rest of the lines. What is the best way to do it? Should I read
>> everything into a list, get the indices of abc and xyz, then pop the
>> elements out? Or is there a better method?
>> thanks
>
> In other words ...
> lines = open('test.txt').readlines()
> for line in lines[lines.index('abc\n') + 1:lines.index('xyz\n')]:
>     lines.remove(line)
I don't think that's what you really meant.
>>> lines = ['blah', 'fubar', 'abc\n', 'blah', 'fubar', 'xyz\n', 'xyzzy']
>>> for line in lines[lines.index('abc\n') + 1:lines.index('xyz\n')]:
...     lines.remove(line)
...
>>> lines
['abc\n', 'blah', 'fubar', 'xyz\n', 'xyzzy']
Uh-oh.
Try this:
>>> lines = ['blah', 'fubar', 'abc\n', 'blah', 'fubar', 'xyz\n', 'xyzzy']
>>> del lines[lines.index('abc\n') + 1:lines.index('xyz\n')]
>>> lines
['blah', 'fubar', 'abc\n', 'xyz\n', 'xyzzy']
>>>
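Applied to the OP's file rather than a literal list, the same del-slice idea might look like this sketch in Python 3 syntax. The helper name delete_between and the io.StringIO stand-in for open('test.txt') are illustrative assumptions. The markers are compared with their trailing newline, exactly as in the session above, so a marker sitting on a final line with no newline would not be found.

```python
import io


def delete_between(f, start="abc\n", end="xyz\n"):
    """Return the lines of f with everything strictly between the first
    `start` line and the first `end` line removed.

    f can be an open file or any iterable of lines. Like the session
    above, this raises ValueError if either marker is missing.
    """
    lines = list(f)  # each line keeps its trailing "\n"
    # del on a slice removes elements by position, so an identical line
    # elsewhere in the file is left alone (unlike list.remove()).
    del lines[lines.index(start) + 1:lines.index(end)]
    return lines


# io.StringIO stands in for open("test.txt") here.
sample = io.StringIO("line1\nabc\nline3\nline4\nxyz\nline6\n")
print("".join(delete_between(sample)), end="")
```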
Of course, wrapping it in try/except would be a good idea: not for the
slicing, which behaves itself and does nothing if 'abc\n' appears
AFTER 'xyz\n', but for the index() calls, which raise ValueError if the
sought markers aren't there. Perhaps it would be better still to do it
carefully, one piece at a time: is the abc there? Is the xyz there? Is
the xyz after the abc? If so, del lines[index1+1:index2].
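A sketch of that careful, step-by-step version, in Python 3 syntax. The function name and the True/False return convention are assumptions made for illustration, not anything the OP asked for.

```python
def delete_between_carefully(lines, start="abc\n", end="xyz\n"):
    """Delete, in place, the elements strictly between `start` and `end`.

    Returns True if both markers were found with `end` after `start`
    (and the span between them was deleted), otherwise False with the
    list left unchanged.
    """
    # Is the abc there?
    try:
        index1 = lines.index(start)
    except ValueError:
        return False
    # Is the xyz there?
    try:
        index2 = lines.index(end)
    except ValueError:
        return False
    # Is the xyz after the abc?
    if index2 <= index1:
        return False
    del lines[index1 + 1:index2]
    return True


lines = ['blah', 'fubar', 'abc\n', 'blah', 'fubar', 'xyz\n', 'xyzzy']
print(delete_between_carefully(lines), lines)
```

Returning a flag rather than raising leaves it to the caller to decide what a missing or misplaced marker should mean.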
I wonder what the OP wants to happen in a case like this:
guff1 xyz guff2 abc guff2 xyz guff3
or this:
guff1 abc guff2 abc guff2 xyz guff3
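The index()-based code above simply uses the first abc and the first xyz, so in the first case it deletes nothing at all. One alternative reading, and it is only an assumption about what the OP wants, is "the first abc, then the first xyz after it". list.index() takes an optional start position, which makes that easy to express. The guff lines are split into words purely for illustration, and delete_after_first_start is a hypothetical name.

```python
def delete_after_first_start(lines, start="abc", end="xyz"):
    """Find the first `start`, then the first `end` *after it*, and
    delete what lies strictly between them. A repeated `start` inside
    the block is treated as ordinary content and deleted too.
    Returns a new list; the input is not modified.
    """
    result = list(lines)
    try:
        i = result.index(start)
        # Searching from i + 1 means an `end` that occurs before
        # `start` is ignored.
        j = result.index(end, i + 1)
    except ValueError:
        return result  # a marker is missing: leave everything alone
    del result[i + 1:j]
    return result


case1 = "guff1 xyz guff2 abc guff2 xyz guff3".split()
case2 = "guff1 abc guff2 abc guff2 xyz guff3".split()
print(delete_after_first_start(case1))
print(delete_after_first_start(case2))
```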
> for line in lines:
>     print line,
>
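If the file is large, there's no need to hold it in a list at all. A different technique from the ones in this thread, sketched here as an assumption about what might suit the OP, is a simple state machine over the lines. It's written in Python 3 syntax; skip_between is a hypothetical name and io.StringIO stands in for the open file.

```python
import io


def skip_between(lines, start="abc\n", end="xyz\n"):
    """Yield lines, omitting those strictly between a `start` line and
    the next `end` line. The marker lines themselves are kept.

    Semantics differ from the list-based versions: every start/end
    block is handled, and if a `start` line is never followed by an
    `end` line, everything after it is dropped.
    """
    skipping = False
    for line in lines:
        if skipping:
            if line == end:
                skipping = False
                yield line  # keep the closing marker
            # otherwise we are inside the block: drop the line
        else:
            yield line
            if line == start:
                skipping = True


# io.StringIO stands in for open("test.txt").
text = io.StringIO("line1\nabc\nline3\nline4\nxyz\nline6\n")
for line in skip_between(text):
    print(line, end="")
```

Because it's a generator, only one line is in memory at a time, which is the main reason to prefer it over readlines() for big files.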
> Regular expressions are better in this case
Famous last words.
> import re
> pat = re.compile('abc\n.*?xyz\n', re.DOTALL)
> print re.sub(pat, '', open('test.txt').read())
>
I don't think you really meant that either.
>>> lines = ['blah', 'fubar', 'abc\n', 'blah', 'fubar', 'xyz\n', 'xyzzy']
>>> linestr = "".join(lines)
>>> linestr
'blahfubarabc\nblahfubarxyz\nxyzzy'
>>> import re
>>> pat = re.compile('abc\n.*?xyz\n', re.DOTALL)
>>> print re.sub(pat, '', linestr)
blahfubarxyzzy
>>>
Uh-oh.
Try this:
>>> pat = re.compile('(?<=abc\n).*?(?=xyz\n)', re.DOTALL)
>>> re.sub(pat, '', linestr)
'blahfubarabc\nxyz\nxyzzy'
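Another way to keep the markers, offered as an alternative rather than a correction, is to capture them in groups and put them back via the replacement string (Python 3 syntax, same linestr as above):

```python
import re

linestr = 'blahfubarabc\nblahfubarxyz\nxyzzy'

# Capture both markers and reinsert them with \1 and \2, instead of
# using lookbehind/lookahead assertions.
pat = re.compile(r'(abc\n).*?(xyz\n)', re.DOTALL)
print(repr(pat.sub(r'\1\2', linestr)))
```

One reason to prefer this form: Python's re module only allows fixed-width lookbehind, so if the start marker were itself a variable-length pattern (say, trailing whitespace allowed), the lookbehind version would fail to compile while the group version would still work.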
... and I can't imagine why you're using the confusing [IMHO]
undocumented [AFAICT] feature that the first arg of the module-level
functions like sub and friends can be a compiled regular expression
object. Why not use this:
>>> pat.sub('', linestr)
'blahfubarabc\nxyz\nxyzzy'
>>>
One-liner fanboys might prefer this:
>>> re.sub('(?s)(?<=abc\n).*?(?=xyz\n)', '', linestr)
'blahfubarabc\nxyz\nxyzzy'
>>>
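A point worth checking with the inline-flag form: (?s) is the inline spelling of re.DOTALL, while (?i) is re.IGNORECASE, which does not let '.' match a newline. On linestr the difference doesn't show, because nothing between the markers contains a newline, but on newline-terminated lines read from a real file it matters. A quick check in Python 3 syntax, on made-up sample text:

```python
import re

# Newline-terminated lines, as they would come from a real file.
text = "line6\nabc\nline8\nline13\nxyz\nline15\n"

# (?s) is the inline form of re.DOTALL: "." may match "\n".
dotall = re.sub(r'(?s)(?<=abc\n).*?(?=xyz\n)', '', text)
# (?i) is the inline form of re.IGNORECASE: "." still stops at "\n",
# so a block spanning several lines is never matched.
ignorecase = re.sub(r'(?i)(?<=abc\n).*?(?=xyz\n)', '', text)

print(repr(dotall))
print(repr(ignorecase))
```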
HTH,
John
More information about the Python-list mailing list