[Tutor] List Indexing Issue

Bala subramanian bala.biophysics at gmail.com
Mon May 14 13:00:08 CEST 2012


Hi,
I would suggest using the Biopython package. It has a PDB parser with
which you can extract specific information such as atom name, residue,
chain, etc. as needed.
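
As a rough sketch of what that could look like, assuming Biopython is
installed (pip install biopython): the function name atom_records and the
tuple layout are my own choices for illustration, not something Biopython
prescribes.

```python
def atom_records(pdb_path, structure_id="model"):
    """Return (atom name, residue name, chain ID, residue number, x, y, z)
    for every atom in a PDB file, using Biopython's PDB parser."""
    # Third-party import, kept inside the function so the module can be
    # loaded even where Biopython is not installed.
    from Bio.PDB import PDBParser

    parser = PDBParser(QUIET=True)  # QUIET=True suppresses parser warnings
    structure = parser.get_structure(structure_id, pdb_path)

    records = []
    for atom in structure.get_atoms():
        residue = atom.get_parent()
        chain = residue.get_parent()
        x, y, z = atom.coord  # NumPy array holding the three coordinates
        records.append((
            atom.get_name(),
            residue.get_resname(),
            chain.id,
            residue.id[1],  # residue.id is (hetero flag, sequence number, insertion code)
            float(x), float(y), float(z),
        ))
    return records

# Hypothetical usage; "protein.pdb" is a placeholder file name:
# for record in atom_records("protein.pdb"):
#     print(record)
```

The parser reads the fixed columns itself, so records where fields run
together should not need special handling.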
Bala

On Wed, May 9, 2012 at 3:19 AM, Jerry Hill <malaclypse2 at gmail.com> wrote:

> On Tue, May 8, 2012 at 4:00 PM, Spyros Charonis <s.charonis at gmail.com>
> wrote:
> > Hello python community,
> >
> > I'm having a small issue with list indexing. I am extracting certain
> > information from a PDB (protein information) file and need certain
> > fields of the file to be copied into a list. The entries look like this:
> >
> > ATOM   1512  N   VAL A 222       8.544  -7.133  25.697  1.00 48.89           N
> > ATOM   1513  CA  VAL A 222       8.251  -6.190  24.619  1.00 48.64           C
> > ATOM   1514  C   VAL A 222       9.528  -5.762  23.898  1.00 48.32           C
> >
> > I am using the following syntax to parse these lines into a list:
> ...
> > charged_res_coord.append(atom_coord[i].split()[1:9])
>
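> You're using split, assuming that there will be blank spaces between
> your fields.  That's not always true, though.  PDB is a fixed-width
> record format, so adjacent fields can run together with no space between
> them (for example, a chain ID followed by a four-digit residue number).
> See the documentation I found here:
> http://www.wwpdb.org/docs.html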
>
> If you just have a couple of items to pull out, you can just slice the
> string at the appropriate places.  Based on those docs, you could pull
> the x, y, and z coordinates out like this:
>
>
> x_coord = atom_line[30:38]
> y_coord = atom_line[38:46]
> z_coord = atom_line[46:54]
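
To illustrate the point about split() versus fixed columns, here is a
small sketch using two of the records from this thread (trailing element
column left off). The helper name xyz is made up for the example. It also
converts the sliced strings to floats, which the slices alone do not do.

```python
# In the second record the chain ID and residue number run together
# ("A1005"), so whitespace splitting yields one field fewer.
lines = [
    "ATOM   1512  N   VAL A 222       8.544  -7.133  25.697  1.00 48.89",
    "ATOM   1617  N   GLU A1005      11.906  -2.722   7.994  1.00 44.02",
]

def xyz(atom_line):
    """Return (x, y, z) as floats from the fixed columns of an ATOM record."""
    # float() ignores the padding spaces around each number.
    return (
        float(atom_line[30:38]),
        float(atom_line[38:46]),
        float(atom_line[46:54]),
    )

for line in lines:
    print(len(line.split()), xyz(line))
# prints:
# 11 (8.544, -7.133, 25.697)
# 10 (11.906, -2.722, 7.994)
```

The field count from split() changes between the two lines, while the
column slices return the right coordinates for both.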
>
> If you need to pull more of the data out, or you may want to reuse
> this code in the future, it might be worth actually parsing the record
> into all its parts.  For a fixed length record, I usually do something
> like this:
>
> pdbdata = """
> ATOM   1512  N   VAL A 222       8.544  -7.133  25.697  1.00 48.89           N
> ATOM   1513  CA  VAL A 222       8.251  -6.190  24.619  1.00 48.64           C
> ATOM   1514  C   VAL A 222       9.528  -5.762  23.898  1.00 48.32           C
> ATOM   1617  N   GLU A1005      11.906  -2.722   7.994  1.00 44.02           N
> """.splitlines()
>
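> atom_field_spec = [
>    slice(0,6),    # record name ("ATOM")
>    slice(6,11),   # atom serial number
>    slice(12,16),  # atom name
>    slice(16,17),  # alternate location indicator (one column wide)
>    slice(17,20),  # residue name
>    slice(21,22),  # chain identifier
>    slice(22,26),  # residue sequence number
>    slice(26,27),  # insertion code
>    slice(30,38),  # x coordinate
>    slice(38,46),  # y coordinate
>    slice(46,54),  # z coordinate
>    slice(54,60),  # occupancy
>    slice(60,66),  # temperature factor
>    slice(76,78),  # element symbol
>    slice(78,80),  # charge
>    ]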
>
> for line in pdbdata:
>    if line.startswith('ATOM'):
>        data = [line[field_spec] for field_spec in atom_field_spec]
>        print(data)
>
>
> You can build all kinds of fancy data structures on top of that if you
> want to.  You could use that extracted data to build a namedtuple for
> convenient access to the data by names instead of indexes into a list,
> or to create instances of a custom class with whatever functionality
> you need.
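
A minimal sketch of the namedtuple idea, for what it's worth. The field
names (serial, res_name, and so on) and the helper parse_atom are my own
labels, not names taken from the PDB documentation. The column ranges
follow the ATOM record layout, with the alternate location indicator
taken as a single column.

```python
from collections import namedtuple

# (illustrative field name, column slice) pairs for an ATOM record
ATOM_FIELDS = [
    ("record", slice(0, 6)),
    ("serial", slice(6, 11)),
    ("name", slice(12, 16)),
    ("alt_loc", slice(16, 17)),
    ("res_name", slice(17, 20)),
    ("chain_id", slice(21, 22)),
    ("res_seq", slice(22, 26)),
    ("i_code", slice(26, 27)),
    ("x", slice(30, 38)),
    ("y", slice(38, 46)),
    ("z", slice(46, 54)),
    ("occupancy", slice(54, 60)),
    ("temp_factor", slice(60, 66)),
    ("element", slice(76, 78)),
    ("charge", slice(78, 80)),
]

Atom = namedtuple("Atom", [name for name, _ in ATOM_FIELDS])

def parse_atom(line):
    """Slice one ATOM record into an Atom; values stay as stripped strings."""
    return Atom(*(line[columns].strip() for _, columns in ATOM_FIELDS))

# Pad the record to column 76, then add the right-justified element symbol.
line = "ATOM   1617  N   GLU A1005      11.906  -2.722   7.994  1.00 44.02".ljust(76) + " N"
atom = parse_atom(line)
print(atom.res_name, atom.chain_id, int(atom.res_seq), float(atom.x))
# prints: GLU A 1005 11.906
```

Values are left as strings on purpose, so the caller decides which
fields to convert with int() or float().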
>
> --
> Jerry



-- 
C. Balasubramanian