[Tutor] Text Processing Query

Mark Lawrence breamoreboy at yahoo.co.uk
Thu Mar 14 17:33:20 CET 2013


On 14/03/2013 11:28, taserian wrote:

Top posting fixed

>
> On Thu, Mar 14, 2013 at 6:56 AM, Spyros Charonis <s.charonis at gmail.com> wrote:
>
>     Hello Pythoners,
>
>     I am trying to extract certain fields from a file whose text
>     looks like this:
>
>     COMPND   2 MOLECULE: POTASSIUM CHANNEL SUBFAMILY K MEMBER 4;
>     COMPND   3 CHAIN: A, B;
>     COMPND  10 MOL_ID: 2;
>     COMPND  11 MOLECULE: ANTIBODY FAB FRAGMENT LIGHT CHAIN;
>     COMPND  12 CHAIN: D, F;
>     COMPND  13 ENGINEERED: YES;
>     COMPND  14 MOL_ID: 3;
>     COMPND  15 MOLECULE: ANTIBODY FAB FRAGMENT HEAVY CHAIN;
>     COMPND  16 CHAIN: E, G;
>
>     I would like the chain IDs, but only those following the text
>     heading "ANTIBODY FAB FRAGMENT", i.e. I need to create a list with
>     D,F,E,G  which excludes A,B which have a non-antibody text heading.
>     I am using the following syntax:
>
>     with open(filename) as file:
>
>          scanfile=file.readlines()
>
>          for line in scanfile:
>
>              if line[0:6]=='COMPND' and 'FAB FRAGMENT' in line: continue
>
>              elif line[0:6]=='COMPND' and 'CHAIN' in line:
>
>                  print line
>
>
>     But this yields:
>
>     COMPND   3 CHAIN: A, B;
>     COMPND  12 CHAIN: D, F;
>     COMPND  16 CHAIN: E, G;
>
>     I would like to ignore the first line since A,B correspond to
>     non-antibody text headings, and instead want to extract only D,F &
>     E,G whose text headings are specified as antibody fragments.
>
>     Many thanks,
>     Spyros
>
> Since the identifier and the item that you want to keep are on different
> lines, you'll need to set a "flag".
>
> with open(filename) as file:
>
>      scanfile=file.readlines()
>
>      flag = 0
>
>      for line in scanfile:
>
>          if line[0:6]=='COMPND' and 'FAB FRAGMENT' in line: flag = 1
>
>          elif line[0:6]=='COMPND' and 'CHAIN' in line and flag = 1:
>
>              print line
>
>              flag = 0
>
>
> Notice that the flag is set to 1 only on "FAB FRAGMENT", and it's reset
> to 0 after the next "CHAIN" line that follows the "FAB FRAGMENT" line.
>
>
> AR
>
>

Notice that this code won't run due to a syntax error: the elif test
says "flag = 1" (an assignment) where a comparison, "flag == 1", is
needed.
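
For reference, here is a minimal sketch of the flag approach with the test written as a real comparison, in Python 3 syntax (the code quoted above is Python 2). The names antibody_chains, in_antibody and SAMPLE are made up for illustration. It assumes every CHAIN record is preceded by the MOLECULE record it belongs to, as in the excerpt above; a real file may need more care.

```python
def antibody_chains(lines):
    """Collect chain IDs from CHAIN records that follow a FAB FRAGMENT molecule."""
    chains = []
    in_antibody = False  # the "flag": True while the current molecule is a Fab fragment
    for line in lines:
        if not line.startswith('COMPND'):
            continue
        if 'MOLECULE:' in line:
            # Every MOLECULE record sets or clears the flag.
            in_antibody = 'FAB FRAGMENT' in line
        elif 'CHAIN:' in line and in_antibody:
            # Keep the text after "CHAIN:", drop the trailing ';', split on commas.
            ids = line.split('CHAIN:', 1)[1].strip().rstrip(';')
            chains.extend(c.strip() for c in ids.split(',') if c.strip())
            in_antibody = False
    return chains


# Sample records copied from the original question.
SAMPLE = [
    "COMPND   2 MOLECULE: POTASSIUM CHANNEL SUBFAMILY K MEMBER 4;",
    "COMPND   3 CHAIN: A, B;",
    "COMPND  10 MOL_ID: 2;",
    "COMPND  11 MOLECULE: ANTIBODY FAB FRAGMENT LIGHT CHAIN;",
    "COMPND  12 CHAIN: D, F;",
    "COMPND  13 ENGINEERED: YES;",
    "COMPND  14 MOL_ID: 3;",
    "COMPND  15 MOLECULE: ANTIBODY FAB FRAGMENT HEAVY CHAIN;",
    "COMPND  16 CHAIN: E, G;",
]

print(antibody_chains(SAMPLE))  # ['D', 'F', 'E', 'G']
```

Testing for 'MOLECULE:' and 'CHAIN:' with the colon keeps a heading such as "ANTIBODY FAB FRAGMENT LIGHT CHAIN" from being mistaken for a CHAIN record. To read from a file, pass the open file object in place of SAMPLE.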

-- 
Cheers.

Mark Lawrence


