Python editing .txt file

Tue Jun 15 22:06:58 EDT 2010

187braintrust at berkeley.edu wrote:
> I am trying to write a program in Python that will edit .txt log files 
> that contain regression output from R.  Any thoughts or suggestions 
> would be greatly appreciated.  
> 
> To get an idea of what I am trying to do, note that I include fixed 
> effects in the R regressions, resulting in hundreds of extra lines per 
> regression which I am not interested in right now.  Basically, I want to 
> save a shortened version of the .txt files in which each block of fixed 
> effects coefficients is replaced by a single line saying that the 
> regression includes fixed effects for that variable.  
> 
> All the lines that are to be deleted start with 'factor(xyz)', where xyz 
> is the variable name, so my idea is to have Python copy each line to a 
> new file only if its first seven characters do not match 'factor('.  
> 
> That part I at least know how to approach.  However, I am not sure how 
> to add the line that says, "includes fixed effects for xyz."  There are 
> two difficulties:
> 
> 
>     1. In the resulting file, I will be skipping blocks of lines, say
>     anywhere from 10 to 500 or so, and inserting one line -- i.e.,
>     whether it inserts the line needs to depend on whether it's the
>     first line or one of the remaining 499 lines.  
> 
>     2. the xyz variable name is different lengths depending on what
>     variable it is.  For example, one block might be 'state' and another
>     block might be 'yr'.  Maybe I can use the fact that the var name
>     starts after the first '(' and ends at the first ')' in the line?  I
>     think I can use the re module for this?
> 
> 
> Any suggestions on any aspect of this, but especially the latter part, 
> would be greatly appreciated.  Thank you.  
> 
How's this:

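# input_path is the log file to read; output_path is the shortened copy.
with open(input_path) as input_file, open(output_path, "w") as output_file:
    current_prefix = None  # "factor(xyz)" of the block being skipped, if any
    for line in input_file:
        if current_prefix is not None and line.startswith(current_prefix):
            continue  # still inside the same block of fixed effects
        current_prefix = None
        if line.startswith("factor("):
            close_paren = line.find(")")
            if close_paren != -1:
                # First line of a new block: write one summary line, then
                # remember the prefix so the rest of the block is skipped.
                variable = line[len("factor(") : close_paren]
                output_file.write("*** Factors for %s ***\n" % variable)
                current_prefix = line[: close_paren + 1]
                continue
        output_file.write(line)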
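
On the question of whether the re module can pull out the variable name:
it can.  Here is a minimal sketch, assuming every fixed-effects line has
'factor(xyz)' at the very start of the line.  The names FACTOR_RE and
collapse_factor_blocks are made up for illustration, and the sample lines
are invented, not real R output.  The function takes any iterable of
lines, so it can be tried without touching real log files.

```python
import re

# Matches "factor(xyz)" at the start of a line and captures xyz.
FACTOR_RE = re.compile(r"factor\(([^)]*)\)")


def collapse_factor_blocks(lines):
    """Yield lines, replacing each run of factor(xyz) lines with one line."""
    current = None  # variable name of the block being skipped, if any
    for line in lines:
        match = FACTOR_RE.match(line)
        variable = match.group(1) if match else None
        if variable is None:
            # Ordinary line: copy it through and end any open block.
            current = None
            yield line
        elif variable != current:
            # First line of a new block: emit the summary line once.
            current = variable
            yield "includes fixed effects for %s\n" % variable
        # Otherwise this line belongs to the block already summarized.


# Invented sample lines, for illustration only.
sample = [
    "(Intercept)    1.20\n",
    "factor(state)AL  0.31\n",
    "factor(state)AK  0.12\n",
    "factor(yr)1990  0.05\n",
    "x1    2.40\n",
]
print("".join(collapse_factor_blocks(sample)), end="")
```

To run it on files, something like
output_file.writelines(collapse_factor_blocks(input_file)) should do,
with both files opened as in the code above.  Comparing the captured
name with the previous one handles both of the numbered points: the
summary line is written only for the first line of a block, and the
length of the variable name does not matter.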