[Tutor] Text Processing Query
Mark Lawrence
breamoreboy at yahoo.co.uk
Thu Mar 14 17:33:20 CET 2013
On 14/03/2013 11:28, taserian wrote:
Top posting fixed
>
> On Thu, Mar 14, 2013 at 6:56 AM, Spyros Charonis <s.charonis at gmail.com> wrote:
>
> Hello Pythoners,
>
> I am trying to extract certain fields from a file whose text
> looks like this:
>
> COMPND 2 MOLECULE: POTASSIUM CHANNEL SUBFAMILY K MEMBER 4;
> COMPND 3 CHAIN: A, B;
> COMPND 10 MOL_ID: 2;
> COMPND 11 MOLECULE: ANTIBODY FAB FRAGMENT LIGHT CHAIN;
> COMPND 12 CHAIN: D, F;
> COMPND 13 ENGINEERED: YES;
> COMPND 14 MOL_ID: 3;
> COMPND 15 MOLECULE: ANTIBODY FAB FRAGMENT HEAVY CHAIN;
> COMPND 16 CHAIN: E, G;
>
> I would like the chain IDs, but only those following the text
> heading "ANTIBODY FAB FRAGMENT", i.e. I need to create a list with
> D,F,E,G which excludes A,B which have a non-antibody text heading.
> I am using the following syntax:
>
> with open(filename) as file:
>     scanfile=file.readlines()
>
> for line in scanfile:
>     if line[0:6]=='COMPND' and 'FAB FRAGMENT' in line: continue
>     elif line[0:6]=='COMPND' and 'CHAIN' in line:
>         print line
>
>
> But this yields:
>
> COMPND 3 CHAIN: A, B;
> COMPND 12 CHAIN: D, F;
> COMPND 16 CHAIN: E, G;
>
> I would like to ignore the first line since A,B correspond to
> non-antibody text headings, and instead want to extract only D,F &
> E,G whose text headings are specified as antibody fragments.
>
> Many thanks,
> Spyros
>
> Since the identifier and the item that you want to keep are on different
> lines, you'll need to set a "flag".
>
> with open(filename) as file:
>     scanfile=file.readlines()
>
> flag = 0
>
> for line in scanfile:
>     if line[0:6]=='COMPND' and 'FAB FRAGMENT' in line: flag = 1
>     elif line[0:6]=='COMPND' and 'CHAIN' in line and flag = 1:
>         print line
>         flag = 0
>
>
> Notice that the flag is set to 1 only on "FAB FRAGMENT", and it's reset
> to 0 after the next "CHAIN" line that follows the "FAB FRAGMENT" line.
>
>
> AR
>
>
Notice that this code won't run due to a syntax error: the elif test
uses "flag = 1" (assignment) where "flag == 1" (comparison) is needed.
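
For what it's worth, here is a minimal sketch of the flag approach with the
comparison corrected, written for Python 3 and collecting the chain IDs into
a list as the original question asked. The function name antibody_chain_ids
and the in_antibody variable are my own for illustration, and I'm assuming
every record of interest looks like the sample shown above (a "CHAIN:" field
ending in ";"); real files may need more care.

```python
def antibody_chain_ids(lines):
    """Return chain IDs from CHAIN records that follow a FAB FRAGMENT record.

    Hypothetical helper, assuming records shaped like the sample in the post.
    """
    chains = []
    in_antibody = False  # the "flag": True after a FAB FRAGMENT line
    for line in lines:
        if not line.startswith('COMPND'):
            continue
        if 'FAB FRAGMENT' in line:
            in_antibody = True
        elif 'CHAIN:' in line and in_antibody:
            # Take the text after "CHAIN:", drop the trailing ";", split on commas.
            ids = line.split('CHAIN:', 1)[1].strip().rstrip(';')
            chains.extend(c.strip() for c in ids.split(',') if c.strip())
            in_antibody = False  # reset until the next FAB FRAGMENT line
    return chains


sample = [
    "COMPND 2 MOLECULE: POTASSIUM CHANNEL SUBFAMILY K MEMBER 4;",
    "COMPND 3 CHAIN: A, B;",
    "COMPND 10 MOL_ID: 2;",
    "COMPND 11 MOLECULE: ANTIBODY FAB FRAGMENT LIGHT CHAIN;",
    "COMPND 12 CHAIN: D, F;",
    "COMPND 13 ENGINEERED: YES;",
    "COMPND 14 MOL_ID: 3;",
    "COMPND 15 MOLECULE: ANTIBODY FAB FRAGMENT HEAVY CHAIN;",
    "COMPND 16 CHAIN: E, G;",
]
print(antibody_chain_ids(sample))
```

To run it against a real file, pass the open file object itself, e.g.
"with open(filename) as f: antibody_chain_ids(f)"; there is no need to call
readlines() first, since a file can be iterated line by line.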
--
Cheers.
Mark Lawrence