[Tutor] List Indexing Issue
Joel Goldstick
joel.goldstick at gmail.com
Tue May 8 22:33:03 CEST 2012
On Tue, May 8, 2012 at 4:00 PM, Spyros Charonis <s.charonis at gmail.com> wrote:
> Hello python community,
>
> I'm having a small issue with list indexing. I am extracting certain
> information from a PDB (protein information) file and need certain fields of
> the file to be copied into a list. The entries look like this:
>
> ATOM 1512 N VAL A 222 8.544 -7.133 25.697 1.00 48.89 N
> ATOM 1513 CA VAL A 222 8.251 -6.190 24.619 1.00 48.64 C
> ATOM 1514 C VAL A 222 9.528 -5.762 23.898 1.00 48.32 C
>
> I am using the following syntax to parse these lines into a list:
>
> charged_res_coord = [] # store x,y,z of extracted charged residues
> for line in pdb:
>     if line.startswith('ATOM'):
>         atom_coord.append(line)
>
> for i in range(len(atom_coord)):
>     for item in charged_res:
>         if item in atom_coord[i]:
>             charged_res_coord.append(atom_coord[i].split()[1:9])
>
>
> The problem begins with entries such as the following.
>
> ROW1) ATOM 1572 NH2 ARG A 228 7.890 -13.328 16.363 1.00 59.63 N
>
> ROW2) ATOM 1617 N GLU A1005 11.906 -2.722 7.994 1.00 44.02 N
>
> Here, the code that I use to extract the third spatial coordinate (the last
> of the three consecutive non-integer values) produces a problem:
>
> because 'A1005' (second row) is considered as a single list entry, while 'A'
> and '228' (first row) are two list entries, when I
> use a loop to index the 7th element it extracts '16.363' (entry I want) for
> first row and 1.00 (not entry I want) for the second row.
>
> >>> charged_res_coord[1]
> ['1572', 'NH2', 'ARG', 'A', '228', '7.890', '-13.328', '16.363']
>
> >>> charged_res_coord[10]
> ['1617', 'N', 'GLU', 'A1005', '11.906', '-2.722', '7.994', '1.00']
>
>
> The loop I use goes like this:
>
> for i in range(len(lys_charged_group)):
>     lys_charged_group[i][7] = float(lys_charged_group[i][7])
>
> The [7] is the problem - in lines that are like ROW1 the code extracts the
> correct value,
> but in lines that are like ROW2 the code extracts the wrong value.
> Unfortunately, the different formats of rows are interspersed
> so I don't know if I can solve this using text processing routines? Would I
> have to use regular expressions?
>
> Many thanks for your help!
>
> Spyros
>
I think regular expressions get overused. They're great, but they can
get hard to understand. Python has good built-in string methods.
For your case you might want to look at this:
str.replace(old, new[, count])
Return a copy of the string with all occurrences of substring old
replaced by new. If the optional argument count is given, only the
first count occurrences are replaced.
You could replace " A " with " A", which would leave the fourth item
of every row in the form Annnn, so the fields line up. If you don't
want the A in your results, use row[3][1:] to get everything after it.
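
Here is a rough sketch of that idea, not a tested solution for real
files. It assumes the chain letter is always "A" and is separated from
a short residue number by exactly one space, as in the lines quoted
above. The function name extract_fields and its chain parameter are
made up for illustration. Beware that a chain letter that is also an
atom name (N, C, ...) would make the replace hit the wrong field.

```python
# Two sample records, whitespace-collapsed as in the quoted message.
rows = [
    "ATOM 1572 NH2 ARG A 228 7.890 -13.328 16.363 1.00 59.63 N",
    "ATOM 1617 N GLU A1005 11.906 -2.722 7.994 1.00 44.02 N",
]


def extract_fields(line, chain="A"):
    """Return the 8 fields after 'ATOM', with chain and residue number fused.

    Hypothetical helper: assumes a single space between chain and residue.
    """
    # Glue a free-standing chain letter onto the residue number
    # ("A 228" becomes "A228"); count=1 limits this to the first match.
    fused = line.replace(" " + chain + " ", " " + chain, 1)
    return fused.split()[1:9]


for line in rows:
    fields = extract_fields(line)
    residue_number = fields[3][1:]   # drop the leading chain letter
    z = float(fields[6])             # third coordinate, same index in both rows
    print(residue_number, z)
```

With the two fields fused, every row has the same layout, so one
fixed index picks out the same coordinate in both kinds of row.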
Not a full solution, but check out the built-in string capabilities of
Python. There is a lot there.
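
One of those capabilities is plain slicing, which is a different
approach from the replace suggestion. A minimal sketch, assuming your
file follows the standard fixed-column PDB layout, where x, y and z
sit in columns 31-38, 39-46 and 47-54 (counting from 1). The lines
quoted in this message have had their whitespace collapsed, so the
made-up helper make_atom_line rebuilds sample records with the
columns in place; xyz is also just an illustrative name.

```python
def make_atom_line(serial, name, res, chain, resseq, x, y, z):
    """Hypothetical helper: build a fixed-column ATOM record for this demo."""
    return "ATOM  {:>5d} {:<4s} {:>3s} {:1s}{:>4d}    {:8.3f}{:8.3f}{:8.3f}".format(
        serial, name, res, chain, resseq, x, y, z)


def xyz(line):
    """Slice x, y, z out of columns 31-38, 39-46 and 47-54 (1-based)."""
    return float(line[30:38]), float(line[38:46]), float(line[46:54])


row1 = make_atom_line(1572, "NH2", "ARG", "A", 228, 7.890, -13.328, 16.363)
row2 = make_atom_line(1617, "N", "GLU", "A", 1005, 11.906, -2.722, 7.994)

for row in (row1, row2):
    # The slice positions do not care whether "A" and the residue
    # number touch, so both kinds of row give the right coordinates.
    print(xyz(row))
```

Slicing by position avoids split() altogether, so it does not matter
how many whitespace-separated fields a row happens to have.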
--
Joel Goldstick