split large file by string/regex

Jason Rennie jrennie at csail.mit.edu
Mon Nov 22 09:05:45 EST 2004


On Mon, Nov 22, 2004 at 09:38:55AM +0100, Martin Dieringer wrote:
> I am trying to split a file by a fixed string.
> The file is too large to just read it into a string and split this.
> I could probably use a lexer but there maybe anything more simple?

If the pattern is contained within a single line, do something like this:

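import re
myre = re.compile(r'foo')
# f is the input path; f1 and f2 are the two output paths
fh = open(f)
fh1 = open(f1,'w')
s = fh.readline()
# copy lines to the first file until one matches the pattern
while not myre.search(s):
  fh1.write(s)
  s = fh.readline()
fh1.close()
fh2 = open(f2,'w')
# the matching line and everything after it go to the second file;
# readline() returns '' at eof, which ends the loop
while s:
  fh2.write(s)
  s = fh.readline()
fh2.close()
fh.close()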

I'm doing this off the top of my head, so this code almost certainly
has bugs.  Hopefully it's enough to get you started.  Note that only
one line is held in memory at any point in time.  Also, if there's a
chance that the pattern does not appear in the file, you'll need to
check for eof in the first while loop (readline() returns an empty
string at eof); otherwise that loop never ends.
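
For what it's worth, here is a minimal sketch of the same idea with the
eof case handled, wrapped in a function.  The function name
split_at_first_match and its parameter names are made up for
illustration, and it still assumes the pattern fits within a single
line.  It iterates over the file object instead of calling readline(),
so only one line is in memory at a time, and the loop simply ends at eof
whether or not the pattern was found.

```python
import re

def split_at_first_match(src, before, after, pattern=r'foo'):
    """Copy lines of src into `before` until a line matches `pattern`;
    the matching line and everything after it go into `after`.

    Returns True if the pattern was found, False otherwise.
    (Hypothetical helper; names are illustrative only.)
    """
    myre = re.compile(pattern)
    found = False
    with open(src) as fh, open(before, 'w') as out1, open(after, 'w') as out2:
        for line in fh:  # file iteration yields one line at a time
            if not found and myre.search(line):
                found = True  # switch output files from this line on
            (out2 if found else out1).write(line)
    return found
```

If the pattern never appears, the whole input ends up in the first
output file, the second one is left empty, and the function returns
False.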

Jason


