Recursive insertion of a line

Tue Nov 20 04:16:53 EST 2007

Please see below.

--- Gabriel Genellina <gagsl-py2 at yahoo.com.ar> wrote:

> On Mon, 19 Nov 2007 21:15:16 -0300, Henry <henry.robinson at gmail.com>
> wrote:
> 
> > On 19/11/2007, Francesco Pietra <chiendarret at yahoo.com> wrote:
> >>
> >> How to insert "TER" records recursively, i.e. some thousand fold,  in a
> >> file
> >> like in the following example? "H2 WAT" is the only constant
> >> characteristic of
> >> the line after which to insert "TER"; that distinguishes also for lines
> >
> > If every molecule is water, and therefore 3 atoms, you can use this fact to
> > insert TER in the right place. You don't need recursion:
> >
> > f = open( "atoms.txt", "rt" )
> > lineCount = 0
> > for line in f.xreadlines( ):
> >     lineCount = lineCount + 1
> >     print line
> >     if lineCount == 3:
> >         lineCount = 0
> >         print "TER"
> > f.close( )
> 
> A small variation can handle the original, more generic condition "insert
> TER after the line containing H2 WAT":
> 
> f = open("atoms.txt", "r")
> for line in f:
>      print line
>      if "H2  WAT" in line:
>          print "TER"
> f.close()
> 
> (also, note that unless you're using Python 2.2 or earlier, the xreadlines  
> call does no good)

I tried the latter script (which also works if there are other molecules in the
file, as is my case) and encountered two problems:

(1) "TER" records were inserted, as seen in the shell window, but the file on
disk was not modified. I named your script "ter_insert.py" and, to get the
modified file, used the classic

$ python ter_insert.py 2>&1 | tee file.out

Now "file.out" has "TER" inserted where I wanted. It may well be that I used
your script incorrectly.

(2) An extra blank line is inserted after every line (this is not an artifact
of how I wrote the output file), except between "TER" and the next line, as
shown below:

TER
ATOM  27400  O   WAT  4178      20.289   4.598  26.491  1.00  0.00      W20  O

ATOM  27401  H1  WAT  4178      19.714   3.835  26.423  1.00  0.00      W20  H

ATOM  27402  H2  WAT  4178      21.173   4.237  26.554  1.00  0.00      W20  H

TER
ATOM  27403  O   WAT  4585      23.340   3.428  25.621  1.00  0.00      W20  O

ATOM  27404  H1  WAT  4585      22.491   2.985  25.602  1.00  0.00      W20  H

ATOM  27405  H2  WAT  4585      23.826   2.999  26.325  1.00  0.00      W20  H

TER
ATOM  27406  O   WAT  4966      22.359   0.555  27.001  1.00  0.00      W20  O

ATOM  27407  H1  WAT  4966      21.820   1.202  27.456  1.00  0.00      W20  H

ATOM  27408  H2  WAT  4966      22.554  -0.112  27.659  1.00  0.00      W20  H

TER
END

Here "END" is how Protein Data Bank (pdb) files end. As these files are
extremely sensitive to formatting, can the script be modified to avoid these
extra lines? I have not yet tested whether the extra lines really cause
problems (it takes time, because I have to go to the big cluster), but they
take up a lot of space in the shell window.
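
Both symptoms have a simple explanation. The script only prints to standard
output, so the file on disk is never touched. Each line read from the file
already ends with a newline, and "print" appends a second one, which gives the
blank lines; "TER" is printed without a newline of its own, so no blank line
follows it. A minimal sketch of one possible fix is given below, assuming
Python 3. The function names and the output file name "atoms_ter.txt" are
hypothetical and chosen only for illustration; the marker "H2  WAT" (two
spaces) is taken from the quoted script.

```python
def insert_ter(lines, marker="H2  WAT"):
    """Yield each line without its trailing newline, adding "TER" after marker lines."""
    for line in lines:
        line = line.rstrip("\n")  # drop the newline that came from the file
        yield line
        if marker in line:
            yield "TER"


def process_file(src_path, dst_path, marker="H2  WAT"):
    """Copy src_path to dst_path, inserting a "TER" record after each marker line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for out_line in insert_ter(src, marker):
            dst.write(out_line + "\n")  # exactly one newline per record


# Hypothetical usage (file names are assumptions):
# process_file("atoms.txt", "atoms_ter.txt")
```

Writing to a separate output file leaves the original untouched, so the result
can be checked before it replaces the input.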

A nearly perfect script. Thank you
francesco

> 
> -- 
> Gabriel Genellina
> 
