re.search - just skip it

Wed Jan 26 11:11:29 EST 2005

 wrote:

> Input is this:
> 
> SET1_S_W CHAR(1) NOT NULL,
> SET2_S_W CHAR(1) NOT NULL,
> SET3_S_W CHAR(1) NOT NULL,
> SET4_S_W CHAR(1) NOT NULL,
> ;
> 
> .py says:
> 
> import re, string, sys
> s_ora = re.compile('.*S_W.*')
> lines = open("y.sql").readlines()
> for i in range(len(lines)):
>     try:
>         if s_ora.search(lines[i]): del lines[i]
>     except IndexError:
>         open("z.sql","w").writelines(lines)
> 
> but output is:
> 
> SET2_S_W CHAR(1) NOT NULL,
> SET4_S_W CHAR(1) NOT NULL,
> ;
> 
> It should delete every, not every other!

No, it should delete every other line, since that is what happens if you use
an index to iterate over a list while deleting items from the same list.
Whenever you delete an item, the following items shuffle down one place, and
then incrementing the loop counter skips over the item that just moved into
the deleted slot.

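The fact that you got an IndexError should have been some sort of clue that
your code was going to go wrong: range(len(lines)) is evaluated once, before
anything is deleted, so as the list shrinks the index runs off its end.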
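A minimal reproduction of the loop from the question, run on the sample input, makes both symptoms visible: SET2 and SET4 survive, and the last two iterations hit IndexError. The function name `delete_by_index` is hypothetical; the sketch catches IndexError only so it can count how often it happens.

```python
def delete_by_index(lines, needle):
    """Mimic the original loop: delete by index while iterating forward."""
    lines = list(lines)      # work on a copy so the caller's list is untouched
    index_errors = 0
    # range() is evaluated once, so it still covers the ORIGINAL length
    for i in range(len(lines)):
        try:
            if needle in lines[i]:
                # later items shift down one slot, so the next i skips one
                del lines[i]
        except IndexError:
            index_errors += 1  # i has run past the end of the shrunken list
    return lines, index_errors


data = ["SET1_S_W CHAR(1) NOT NULL,\n",
        "SET2_S_W CHAR(1) NOT NULL,\n",
        "SET3_S_W CHAR(1) NOT NULL,\n",
        "SET4_S_W CHAR(1) NOT NULL,\n",
        ";\n"]
remaining, errors = delete_by_index(data, "S_W")
print(remaining)
print(errors)  # 2
```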

Try one of these:
  iterate backwards
  iterate over a copy of the list but delete from the original
  build a new list containing only those lines you want to keep
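The three options above might be sketched like this. The function names are hypothetical, and each version returns a new list rather than modifying its argument, so they can be compared side by side.

```python
def keep_backwards(lines, needle):
    """Option 1: walk the indices from the end, so a deletion never
    shifts an item that has not been visited yet."""
    lines = list(lines)
    for i in range(len(lines) - 1, -1, -1):
        if needle in lines[i]:
            del lines[i]
    return lines


def keep_via_copy(lines, needle):
    """Option 2: iterate over a copy (lines[:]) but delete from the original."""
    lines = list(lines)
    for line in lines[:]:
        if needle in line:
            lines.remove(line)  # removes the first equal item; fine for a filter
    return lines


def keep_new_list(lines, needle):
    """Option 3: build a new list holding only the lines to keep."""
    return [line for line in lines if needle not in line]


sample = ["SET1_S_W CHAR(1) NOT NULL,\n",
          "SET2_S_W CHAR(1) NOT NULL,\n",
          "SET3_S_W CHAR(1) NOT NULL,\n",
          "SET4_S_W CHAR(1) NOT NULL,\n",
          ";\n"]
for func in (keep_backwards, keep_via_copy, keep_new_list):
    print(func.__name__, func(sample, "S_W"))
```

Option 3 is usually the simplest and, unlike option 2 (whose remove() rescans the list each time), runs in linear time.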

Also, the regex isn't needed here (a plain substring test does the same job),
and you should always close files when you have finished with them.
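A quick check of the regex point, assuming the same pattern as the question: because search() may start matching anywhere and `.*` can match zero characters, the leading and trailing `.*` add nothing, and the pattern accepts exactly the strings that contain "S_W". The helper names here are illustrative only.

```python
import re

pattern = re.compile('.*S_W.*')  # the pattern from the question


def matches_regex(line):
    return pattern.search(line) is not None


def matches_substring(line):
    return 'S_W' in line


samples = ["SET1_S_W CHAR(1) NOT NULL,\n", ";\n", "S_W", "s_w lower case\n"]
for line in samples:
    # both columns agree for every sample
    print(repr(line), matches_regex(line), matches_substring(line))
```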

Something like this should work (untested):

s_ora = 'S_W'
infile = open("y.sql")
try:
    # keep only the lines that do NOT contain the marker
    lines = [line for line in infile if s_ora not in line]
finally:
    infile.close()

output = open("z.sql", "w")
try:
    output.writelines(lines)
finally:
    output.close()
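In Python 2.6 and later the `with` statement closes each file automatically, even if an exception is raised, which replaces the try/finally pairs. A minimal sketch, where the function name `strip_matching_lines` is hypothetical and the usage example writes throwaway copies of y.sql and z.sql into a temporary directory:

```python
import os
import tempfile


def strip_matching_lines(src_path, dst_path, needle='S_W'):
    """Copy src_path to dst_path, dropping every line that contains needle."""
    with open(src_path) as src:          # closed when the block exits
        kept = [line for line in src if needle not in line]
    with open(dst_path, 'w') as dst:
        dst.writelines(kept)


# Usage example with scratch files
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'y.sql')
dst = os.path.join(tmp, 'z.sql')
with open(src, 'w') as f:
    f.write("SET1_S_W CHAR(1) NOT NULL,\n"
            "SET2_S_W CHAR(1) NOT NULL,\n"
            "SET3_S_W CHAR(1) NOT NULL,\n"
            "SET4_S_W CHAR(1) NOT NULL,\n"
            ";\n")
strip_matching_lines(src, dst)
with open(dst) as f:
    print(f.read())
```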