Deleting lines from a file

Larry Bates larry.bates at websafe.com
Mon Dec 17 08:19:57 EST 2007


Horacius ReX wrote:
> Hi,
> 
> I need to write a program which reads an external text file. Each time
> it reads, then it needs to delete some lines, for instance from second
> line to 55th line. The file is really big, so what do you think is the
> fastest method to delete specific lines in a text file ?
> 
> Thanks
> 
One way would be to "mark" the lines as being deleted by either:

1) overwriting them in place with some known character sequence that you treat
as deleted.  This assumes every line is at least as long as that sequence.

or

2) keeping a separate dictionary that maps line numbers to a deleted flag.
Pickle and dump this dictionary before the program exits, and load it again
when the program starts.

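import pickle

# Maps line number -> deleted flag, e.g. {1: False, 2: True, ...}
deletedFlags = {}

def load():
    # Read back the flags saved by a previous run.
    pFiles = "deletedLines.toc"
    fp = open(pFiles, 'rb')
    deletedFlags = pickle.load(fp)
    fp.close()
    return deletedFlags


def dump(deletedFlags):
    # Save the flags before the program exits.
    pFiles = "deletedLines.toc"
    fp = open(pFiles, 'wb')
    pickle.dump(deletedFlags, fp)
    fp.close()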
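
To show how the flags dictionary might be used once it is loaded, here is a minimal sketch. The helper names delete_range and live_lines are made up for illustration, and the code assumes line numbers are 1-based and that a missing key means "not deleted".

```python
def delete_range(deleted_flags, first, last):
    """Flag lines first..last (1-based, inclusive) as deleted."""
    for lineno in range(first, last + 1):
        deleted_flags[lineno] = True


def live_lines(path, deleted_flags):
    """Yield the lines of path that are not flagged as deleted."""
    with open(path) as fp:
        for lineno, line in enumerate(fp, start=1):
            # A line number missing from the dictionary counts as live.
            if not deleted_flags.get(lineno, False):
                yield line
```

The data file itself is never rewritten here; "deleting" lines 2 through 55 is just delete_range(flags, 2, 55) followed by a dump of the dictionary.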

Caveats:

1) With method #1 you must write EXACTLY the same number of bytes (padded with
spaces, etc.) on top of deleted lines.  The method doesn't work if any of the
lines are too short to hold your <DELETED> flag string.

2) With method #2 you must be very careful to keep the deletedFlags dictionary
and the data file consistent (by using try/except/finally around your entire
process).
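
The equal-byte overwrite from caveat 1 could look roughly like the sketch below. The function name mark_deleted and the MARKER constant are assumptions for illustration. The code checks every line before writing anything, so a line that is too short leaves the file untouched, and it overwrites only the line body so the original line endings stay where they were.

```python
MARKER = b"<DELETED>"  # hypothetical flag string; pick one that cannot occur in real data


def mark_deleted(path, first, last, marker=MARKER):
    """Overwrite lines first..last (1-based, inclusive) in place with the marker.

    Each line keeps its exact byte length: the marker is padded with spaces.
    Raises ValueError, before modifying anything, if a line is too short.
    """
    with open(path, "r+b") as fp:
        spans = []  # (byte offset, body length) of each line to overwrite
        lineno = 0
        while lineno < last:
            offset = fp.tell()
            line = fp.readline()
            if not line:
                break  # reached end of file before line number `last`
            lineno += 1
            if lineno < first:
                continue
            body = line.rstrip(b"\r\n")
            if len(body) < len(marker):
                raise ValueError("line %d is too short for the marker" % lineno)
            spans.append((offset, len(body)))
        # Second pass: overwrite the bodies, leaving line endings alone.
        for offset, length in spans:
            fp.seek(offset)
            fp.write(marker.ljust(length))
```

A reader of the file would then skip any line that starts with the marker.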

Personally I would employ method #2 and periodically "pack" the file with a 
separate process.  That could run unattended (e.g. at night). Or, if I did this 
a lot, I would use a database instead.
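
A "pack" step along those lines might be sketched as follows. The function name pack is made up here, and the sketch assumes Python 3.3 or later (for os.replace) and a flags dictionary keyed by 1-based line number. It copies the live lines to a temporary file in the same directory and then swaps it in, so a crash part-way through leaves the original file intact.

```python
import os
import tempfile


def pack(path, deleted_flags):
    """Rewrite path without the lines flagged as deleted.

    Returns the number of lines kept.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    kept = 0
    try:
        with os.fdopen(fd, "wb") as out, open(path, "rb") as src:
            for lineno, line in enumerate(src, start=1):
                if deleted_flags.get(lineno, False):
                    continue  # drop lines flagged as deleted
                out.write(line)
                kept += 1
        os.replace(tmp_path, path)  # swap the packed copy into place
    except BaseException:
        os.remove(tmp_path)  # do not leave a half-written copy behind
        raise
    return kept
```

After packing, the line numbers have shifted, so the flags dictionary should be cleared and dumped again before the next run.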

-Larry


