Renumbering
Philipp Pagel
pDOTpagel at helmholtz-muenchen.de
Wed Sep 3 06:41:52 EDT 2008
Francesco Pietra <chiendarret at gmail.com> wrote:
> ATOM 3424 N LEU B 428 143.814 87.271 77.726 1.00115.20 2SG3426
> ATOM 3425 CA LEU B 428 142.918 87.524 78.875 1.00115.20 2SG3427
[...]
> As you can see, the number of lines for a particular value in column 6
> changes from situation to situation, and may even be different for the
> same name in column 4. For example, LEU can have a different number of
> lines depending on the position of this amino acid (leucine).
Others have already given good hints, but I would like to add a bit of
advice.
The data you show appears to be a PDB protein structure file. It is
important to realize that these are fixed-width files and that columns
can be empty, so splitting on tabs or whitespace will often fail. It is
also important to know that the residue numbering (cols 23-26) is not
necessarily contiguous and is not even unique without taking into
account the 'insertion code' in column 27, which happens to be empty in
your example. I would recommend using a full-blown PDB parser to read
the data and then iterating over the residues to do whatever you would
like to accomplish. Biopython has such a parser:
www.biopython.org
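
To illustrate why the column positions matter, here is a minimal
standard-library sketch, not a replacement for a real parser. It reads
the chain ID (col 22), residue number (cols 23-26) and insertion code
(col 27) by slicing, and renumbers residues consecutively. The function
names residue_key and renumber_residues are made up for this example,
and it assumes well-formed ATOM/HETATM lines of at least 27 characters.

```python
def residue_key(line):
    """Return (chain ID, residue number, insertion code) for an atom line."""
    # PDB columns are 1-based; Python slices are 0-based and end-exclusive.
    chain = line[21]            # column 22
    resseq = int(line[22:26])   # columns 23-26
    icode = line[26]            # column 27, a blank when unused
    return chain, resseq, icode


def renumber_residues(lines, start=1):
    """Yield lines with residues renumbered consecutively from `start`.

    A new residue begins whenever the (chain, number, insertion code)
    key changes. Records other than ATOM/HETATM pass through unchanged.
    """
    new_number = start - 1
    previous_key = None
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            key = residue_key(line)
            if key != previous_key:
                new_number += 1
                previous_key = key
            # Overwrite columns 23-27: right-aligned number, blank insertion code.
            line = line[:22] + f"{new_number:>4d} " + line[27:]
        yield line
```

Note that this sketch keeps counting across chains rather than
restarting at each new chain, and it ignores alternate locations and
multiple models. A full parser handles those cases for you.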
cu
Philipp
--
Dr. Philipp Pagel
Lehrstuhl f. Genomorientierte Bioinformatik
Technische Universität München
http://mips.gsf.de/staff/pagel