[Tutor] List Indexing Issue
Bala subramanian
bala.biophysics at gmail.com
Mon May 14 13:00:08 CEST 2012
Hi,
I would suggest using the Biopython package. Its PDB parser lets you
extract whatever specific information you need, such as atom names,
residues, and chains.
Bala
On Wed, May 9, 2012 at 3:19 AM, Jerry Hill <malaclypse2 at gmail.com> wrote:
> On Tue, May 8, 2012 at 4:00 PM, Spyros Charonis <s.charonis at gmail.com>
> wrote:
> > Hello python community,
> >
> > I'm having a small issue with list indexing. I am extracting certain
> > information from a PDB (Protein Data Bank) file and need certain
> > fields of the file to be copied into a list. The entries look like this:
> >
> > ATOM   1512  N   VAL A 222       8.544  -7.133  25.697  1.00 48.89           N
> > ATOM   1513  CA  VAL A 222       8.251  -6.190  24.619  1.00 48.64           C
> > ATOM   1514  C   VAL A 222       9.528  -5.762  23.898  1.00 48.32           C
> >
> > I am using the following code to parse these lines into a list:
> ...
> > charged_res_coord.append(atom_coord[i].split()[1:9])
>
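> You're using split(), which assumes there will be blank spaces between
> your fields. That isn't guaranteed, though. PDB is a fixed-width record
> format, according to the documentation I found here:
> http://www.wwpdb.org/docs.html
> Adjacent fields can therefore run together with no space between them;
> for example, chain ID A followed by residue number 1005 comes out as A1005.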
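To see the failure concretely, here is a small sketch using the GLU record that appears further down in this thread. The column spacing in the string is an assumption: it is laid out according to the fixed PDB ATOM columns rather than copied from the archived message.

```python
# GLU record from the sample data in this thread, laid out in PDB's fixed columns.
line = "ATOM   1617  N   GLU A1005      11.906  -2.722   7.994  1.00 44.02           N"

# split() glues the chain ID "A" to the residue number "1005", so every
# field after it lands one position earlier than it would on the VAL lines.
tokens = line.split()
print(tokens[4])    # A1005
print(tokens[1:9])  # the slice from the question now ends with the occupancy, 1.00

# Fixed-column slicing (0-based, stop-exclusive) keeps the two fields apart.
chain_id = line[21:22]
res_seq = int(line[22:26])
print(chain_id, res_seq)  # A 1005
```

On the VAL lines the same `[1:9]` slice happens to end at the z coordinate, which is why the bug only shows up on some records.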
>
> If you only have a couple of items to pull out, you can slice the
> string at the appropriate places. Based on those docs, you could pull
> the x, y, and z coordinates out like this:
>
>
> x_coord = atom_line[30:38]
> y_coord = atom_line[38:46]
> z_coord = atom_line[46:54]
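Those slices return strings padded with spaces. If you need numbers, a minimal sketch is to pass each slice to `float()`, which ignores surrounding whitespace. The helper name `parse_xyz` is hypothetical, and the sample line assumes the fixed PDB column layout.

```python
def parse_xyz(atom_line):
    """Return the (x, y, z) coordinates of a fixed-width PDB ATOM line as floats.

    parse_xyz is a hypothetical helper; the column ranges are the slices
    quoted above (PDB columns 31-38, 39-46 and 47-54, 1-based).
    """
    # float() strips the leading spaces that pad each fixed-width field.
    return (float(atom_line[30:38]),
            float(atom_line[38:46]),
            float(atom_line[46:54]))


line = "ATOM   1512  N   VAL A 222       8.544  -7.133  25.697  1.00 48.89           N"
print(parse_xyz(line))  # (8.544, -7.133, 25.697)
```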
>
> If you need more of the data, or might reuse this code later, it's
> worth parsing the record into all of its fields. For a fixed-width
> record, I usually do something like this:
>
> pdbdata = """
> ATOM   1512  N   VAL A 222       8.544  -7.133  25.697  1.00 48.89           N
> ATOM   1513  CA  VAL A 222       8.251  -6.190  24.619  1.00 48.64           C
> ATOM   1514  C   VAL A 222       9.528  -5.762  23.898  1.00 48.32           C
> ATOM   1617  N   GLU A1005      11.906  -2.722   7.994  1.00 44.02           N
> """.splitlines()
>
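> # One slice per ATOM field. PDB columns are 1-based; Python slices are
> # 0-based and exclude the stop index.
> atom_field_spec = [
>     slice(0, 6),    # record name
>     slice(6, 11),   # atom serial number
>     slice(12, 16),  # atom name
>     slice(16, 17),  # alternate location indicator (a single column)
>     slice(17, 20),  # residue name
>     slice(21, 22),  # chain identifier
>     slice(22, 26),  # residue sequence number
>     slice(26, 27),  # insertion code
>     slice(30, 38),  # x
>     slice(38, 46),  # y
>     slice(46, 54),  # z
>     slice(54, 60),  # occupancy
>     slice(60, 66),  # temperature factor
>     slice(76, 78),  # element symbol
>     slice(78, 80),  # charge
> ]
>
> for line in pdbdata:
>     if line.startswith('ATOM'):
>         data = [line[field_spec] for field_spec in atom_field_spec]
>         print(data)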
>
>
> You can build all kinds of fancier data structures on top of that if
> you want to. You could use the extracted data to build a namedtuple for
> convenient access to fields by name instead of by list index, or to
> create instances of a custom class with whatever functionality you
> need.
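As a sketch of the namedtuple idea, you could pair each slice with a name. The names `AtomRecord`, `ATOM_FIELDS` and `parse_atom`, and the field labels themselves, are hypothetical choices; the slices follow the ATOM column layout, with the alternate-location field taken as the single column 16:17.

```python
from collections import namedtuple

# (label, slice) pairs; the labels are illustrative, the slices follow the
# PDB ATOM columns (0-based, stop-exclusive).
ATOM_FIELDS = [
    ("record", slice(0, 6)), ("serial", slice(6, 11)),
    ("name", slice(12, 16)), ("alt_loc", slice(16, 17)),
    ("res_name", slice(17, 20)), ("chain_id", slice(21, 22)),
    ("res_seq", slice(22, 26)), ("i_code", slice(26, 27)),
    ("x", slice(30, 38)), ("y", slice(38, 46)), ("z", slice(46, 54)),
    ("occupancy", slice(54, 60)), ("temp_factor", slice(60, 66)),
    ("element", slice(76, 78)), ("charge", slice(78, 80)),
]

AtomRecord = namedtuple("AtomRecord", [label for label, _ in ATOM_FIELDS])


def parse_atom(line):
    """Slice one fixed-width ATOM line into an AtomRecord of stripped strings."""
    return AtomRecord(*(line[spec].strip() for _, spec in ATOM_FIELDS))


atom = parse_atom(
    "ATOM   1617  N   GLU A1005      11.906  -2.722   7.994  1.00 44.02           N"
)
# Fields are accessed by name; convert to numbers only where you need them.
print(atom.res_name, atom.chain_id, int(atom.res_seq), float(atom.z))
# GLU A 1005 7.994
```

Keeping the values as strings and converting at the point of use means blank optional fields, such as the charge column here, don't raise errors during parsing.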
>
> --
> Jerry
--
C. Balasubramanian