[Tutor] List Indexing Issue

Tue May 8 22:33:03 CEST 2012

On Tue, May 8, 2012 at 4:00 PM, Spyros Charonis <s.charonis at gmail.com> wrote:
> Hello python community,
>
> I'm having a small issue with list indexing. I am extracting certain
> information from a PDB (protein information) file and need certain fields of
> the file to be copied into a list. The entries look like this:
>
> ATOM   1512  N   VAL A 222       8.544  -7.133  25.697  1.00 48.89
> N
> ATOM   1513  CA  VAL A 222       8.251  -6.190  24.619  1.00 48.64
> C
> ATOM   1514  C   VAL A 222       9.528  -5.762  23.898  1.00 48.32
> C
>
> I am using the following syntax to parse these lines into a list:
>
> atom_coord = []        # raw ATOM lines read from the PDB file
> charged_res_coord = [] # store x,y,z of extracted charged residues
> for line in pdb:
>     if line.startswith('ATOM'):
>         atom_coord.append(line)
>
> for i in range(len(atom_coord)):
>     for item in charged_res:
>         if item in atom_coord[i]:
>             charged_res_coord.append(atom_coord[i].split()[1:9])
>
>
> The problem begins with entries such as the following.
>
> ROW1)   ATOM   1572  NH2 ARG A 228       7.890 -13.328  16.363  1.00 59.63
>         N
>
> ROW2)   ATOM   1617  N   GLU A1005      11.906  -2.722   7.994  1.00 44.02
>         N
>
> Here, the code that I use to extract the third spatial coordinate (the last
> of the three consecutive non-integer values) runs into a problem.
>
> Because 'A1005' (second row) is treated as a single list entry, while 'A'
> and '228' (first row) are two list entries, indexing element [7] in a loop
> extracts '16.363' (the entry I want) for the first row and '1.00' (not the
> entry I want) for the second row.
>
> >>> charged_res_coord[1]
> ['1572', 'NH2', 'ARG', 'A', '228', '7.890', '-13.328', '16.363']
>
> >>> charged_res_coord[10]
> ['1617', 'N', 'GLU', 'A1005', '11.906', '-2.722', '7.994', '1.00']
>
>
> The loop I use goes like this:
>
> for i in range(len(lys_charged_group)):
>     lys_charged_group[i][7] = float(lys_charged_group[i][7])
>
> The [7] is the problem: in lines like ROW1 the code extracts the correct
> value, but in lines like ROW2 it extracts the wrong one. Unfortunately, the
> two row formats are interspersed, so I don't know whether I can solve this
> with plain text-processing routines. Would I have to use regular
> expressions?
>
> Many thanks for your help!
>
> Spyros
>
I think regular expressions get overused.  They're great, but they can
get hard to understand.   Python has good built in string functions.
For your case you might want to look at this:
str.replace(old, new[, count])
    Return a copy of the string with all occurrences of substring old
    replaced by new. If the optional argument count is given, only the
    first count occurrences are replaced.

You could replace " A " with " A", which would leave the 4th item of
every row looking like Annnn.  If you don't want the A in your results,
use row[3][1:] to get everything after the A.
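
A minimal sketch of that replace idea, using the two sample rows from the
question. The helper name split_atom_line is made up for illustration, and
the sketch assumes the chain letter is always A and is followed by exactly
one space.

```python
# Sample ATOM records copied from the question above.
rows = [
    "ATOM   1572  NH2 ARG A 228       7.890 -13.328  16.363  1.00 59.63",
    "ATOM   1617  N   GLU A1005      11.906  -2.722   7.994  1.00 44.02",
]


def split_atom_line(line):
    """Glue the chain letter onto the residue number, then split.

    Hypothetical helper: assumes chain 'A' and a single space between
    the chain letter and the residue number.
    """
    # " A 228" becomes " A228"; " A1005" is left untouched.
    return line.replace(" A ", " A").split()[1:9]


fields = [split_atom_line(line) for line in rows]

# Both rows now have the same layout, so z is at the same index in each.
z_values = [float(row[6]) for row in fields]

# Drop the leading chain letter to recover the residue number.
residue_numbers = [row[3][1:] for row in fields]
```

After the replace, both rows split into the same number of fields, so the
z coordinate sits at index 6 in each row rather than index 7. Rows whose
residue number is below 100 have more than one space after the chain
letter in the fixed-width layout, so this replace alone would not merge
them.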

Not a full solution, but check out the built-in string capabilities of
Python.  There is a lot there.
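
Slicing is another built-in string capability that may fit here. This is
a different technique from the replace suggestion: a sketch that assumes
the file follows the standard fixed-width PDB layout (x in columns 31-38,
y in 39-46, z in 47-54, counting from 1) and that the lines still carry
their original spacing. The function name atom_xyz is hypothetical.

```python
def atom_xyz(line):
    """Return (x, y, z) from a PDB ATOM record by column position.

    Assumption: standard fixed-width PDB layout with unmodified spacing.
    Columns 31-38, 39-46 and 47-54 map to the 0-based slices below.
    """
    # float() ignores the padding spaces inside each 8-character field.
    return (float(line[30:38]), float(line[38:46]), float(line[46:54]))


# The two problem rows from the question above.
row1 = "ATOM   1572  NH2 ARG A 228       7.890 -13.328  16.363  1.00 59.63"
row2 = "ATOM   1617  N   GLU A1005      11.906  -2.722   7.994  1.00 44.02"

xyz1 = atom_xyz(row1)
xyz2 = atom_xyz(row2)
```

Because the slices never depend on how split() groups the fields, it does
not matter whether the chain letter touches the residue number.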

-- 
Joel Goldstick