[Tutor] Removing/Handling large blocks of text

Kent Johnson kent37 at tds.net
Wed Dec 8 16:15:13 CET 2004


I would use a loop with a flag to indicate whether you are inside the <foo> block or not. If the
tags are always <foo> and </foo>, each on a line by itself, you don't need a regex:

lines = []
appending = True          # True while we are outside the <foo> block
f = open('foobar.txt', 'r')
for line in f:
    if appending:
        lines.append(line)
        if line.strip() == '<foo>':
            appending = False   # entered the block: skip its contents
    elif line.strip() == '</foo>':
        appending = True        # reached the closing tag: keep it
        lines.append(line)
f.close()

At the end of the loop, lines holds everything you want to write back.
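Putting the loop and the write-back together gives a runnable sketch; the sample text here is invented for illustration, so substitute your real foobar.txt:

```python
# Sample data standing in for the real file (invented for this demo).
sample = """keep this
<foo>
a bunch of text
more text to drop
</foo>
keep that
"""

with open('foobar.txt', 'w') as f:
    f.write(sample)

lines = []
appending = True          # True while we are outside the <foo> block
with open('foobar.txt') as f:
    for line in f:
        if appending:
            lines.append(line)
            if line.strip() == '<foo>':
                appending = False   # entered the block: skip its contents
        elif line.strip() == '</foo>':
            appending = True        # reached the closing tag: keep it
            lines.append(line)

# Write the surviving lines back over the original file.
with open('foobar.txt', 'w') as f:
    f.writelines(lines)
```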

Kent
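For the record, since the question mentioned the re module: if the whole file fits comfortably in memory, a single non-greedy re.sub on the file contents also works. A sketch (my own, not from the reply above), assuming the tags always sit on lines of their own:

```python
import re

# Sample text standing in for the real file contents.
text = """keep this
<foo>
a bunch of text
</foo>
keep that
"""

# Non-greedy .*? with DOTALL eats everything between the tags in one
# pass; MULTILINE lets ^ and $ anchor each tag to its own line.
pattern = re.compile(r'(^<foo>$\n).*?(^</foo>$)',
                     re.MULTILINE | re.DOTALL)
stripped = pattern.sub(r'\1\2', text)
```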

Jesse Noller wrote:
> Hello,
> 
> I'm trying to do some text processing with python on a fairly large
> text file (actually, XML, but I am handling it as plaintext as all I
> need to do is find/replace/move) and I am having problems with trying
> to identify two lines in the text file, and remove everything in
> between those two lines (but not the two lines) and then write the
> file back (I know the file IO part).
> 
> I'm trying to do this with the re module - the two tags look like:
> 
> <foo>
>     ...
>     a bunch of text (~1500 lines)
>     ...
> </foo>
> 
> I need to identify the first tag, and the second, and unconditionally
> strip out everything in between those two tags, making it look like:
> 
> <foo>
> </foo>
> 
> I'm familiar with using read/readlines to pull the file into memory
> and alter the contents via string.replace(str, newstr) but I am not
> sure where to begin with this other than the typical open/readlines.
> 
> I'd start with something like:
> 
> re1 = re.compile(r'^<foo>')
> re2 = re.compile(r'^</foo>')
> 
> f = open('foobar.txt', 'r')
> for line in f.readlines():
>     match = re1.match(line)
> 
> But I'm lost after this point really, as I can identify the two lines,
> but I am not sure how to do the processing.
> 
> thank you
> -jesse
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> http://mail.python.org/mailman/listinfo/tutor
> 

