[Tutor] drawing a graph

Fathima Javeed fathimajaveed at hotmail.com
Fri Sep 3 04:20:04 CEST 2004


Hi,

I have managed to get the distances between sequences at each P value, using 
randomization. I now have an HTML output file with two sets of values: the 
distance percentages and the P values from 1 to 100. How would I draw a graph 
in Python, i.e. distance against P value for each sequence? I am completely 
lost now and would really appreciate help. Would it be helpful to paste my 
code here?

Cheers
Fuzzi
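
One way to approach the plotting question, as a minimal sketch using only the standard library, is to build an SVG line chart with P on the x axis and distance on the y axis. The function name svg_line_chart, the canvas size, and the sample values are assumptions for illustration and are not taken from the actual output file. A plotting library such as matplotlib would be the more usual choice.

```python
def svg_line_chart(p_values, distances, width=400, height=300, margin=30):
    """Return an SVG document plotting distance (y) against P (x).

    Hypothetical helper: both axes are assumed to run from 0 to their
    maximum value, and only the two axes and one polyline are drawn.
    """
    if len(p_values) != len(distances) or not p_values:
        raise ValueError("need two non-empty lists of equal length")

    x_max = max(p_values) or 1   # fall back to 1 to avoid dividing by zero
    y_max = max(distances) or 1
    plot_w = width - 2 * margin
    plot_h = height - 2 * margin

    points = []
    for p, d in zip(p_values, distances):
        x = margin + plot_w * p / float(x_max)
        y = height - margin - plot_h * d / float(y_max)  # SVG y grows downward
        points.append("%.1f,%.1f" % (x, y))

    return (
        '<svg xmlns="http://www.w3.org/2000/svg" width="%d" height="%d">\n'
        '  <line x1="%d" y1="%d" x2="%d" y2="%d" stroke="black"/>\n'
        '  <line x1="%d" y1="%d" x2="%d" y2="%d" stroke="black"/>\n'
        '  <polyline fill="none" stroke="blue" points="%s"/>\n'
        '</svg>\n'
    ) % (
        width, height,
        margin, height - margin, width - margin, height - margin,  # x axis
        margin, margin, margin, height - margin,                   # y axis
        " ".join(points),
    )


# Made-up sample data: P from 0 to 100 and a distance percentage for each.
p_values = [0, 25, 50, 75, 100]
distances = [0.0, 20.0, 45.0, 70.0, 95.0]
chart = svg_line_chart(p_values, distances)
```

Writing the returned string to a file with an .svg extension gives something a web browser can display; calling the function once per sequence would give one chart each.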

>From: Kent Johnson <kent_johnson at skillsoft.com>
>To: tutor at python.org
>Subject: Re: [Tutor] need help with comparing list of sequences in  
>Python!!
>Date: Tue, 31 Aug 2004 07:04:09 -0400
>
>Fuzzi,
>
>Here is one way to do this:
>- Use zip() to pair up elements from the two sequences
> >>> s1='aaabbbbcccc'
> >>> s2='aaaccccbcccccccccc'
> >>> zip(s1, s2)
>[('a', 'a'), ('a', 'a'), ('a', 'a'), ('b', 'c'), ('b', 'c'), ('b', 'c'), 
>('b', 'c'), ('c', 'b'), ('c', 'c'), ('c', 'c'), ('c', 'c')]
>
>- Use a list comprehension to compare the elements of the pair and put the 
>results in a new list. I'm not sure if you want to count the matches or the 
>mismatches - your original post says mismatches, but in your example you 
>count matches. This example counts matches but it is easy to change.
> >>> [a == b for a, b in zip(s1, s2)]
>[True, True, True, False, False, False, False, False, True, True, True]
>
>- In Python, True has a value of 1 and False has a value of 0, so adding up 
>the elements of this list gives the number of matches:
> >>> sum([a == b for a, b in zip(s1, s2)])
>6
>
>- min() and len() give you the length of the shortest sequence:
> >>> min(len(s1), len(s2))
>11
>
>- When you divide, you have to convert one of the numbers to a float or 
>Python will use integer division!
> >>> 6/11
>0
> >>> float(6)/11
>0.54545454545454541
>
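
The steps above can be combined into one function. This is a minimal sketch; the name similarity and the empty-sequence check are assumptions added for illustration, not part of the original reply.

```python
def similarity(s1, s2):
    """Fraction of matching positions over the shorter sequence's length."""
    shortest = min(len(s1), len(s2))
    if shortest == 0:
        raise ValueError("both sequences must be non-empty")
    # zip() stops at the end of the shorter sequence, so the extra
    # characters of the longer one are ignored.
    matches = sum(a == b for a, b in zip(s1, s2))
    # float() forces true division on Python versions where / truncates ints.
    return float(matches) / shortest


score = similarity('aaabbbbcccc', 'aaaccccbcccccccccc')
```

Replacing a == b with a != b would count mismatches instead of matches.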
>Put this together with the framework that Alan gave you to create a program 
>that calculates distances. Then you can start on the randomization part.
>
>Kent
>
>
>At 04:03 AM 8/31/2004 +0100, Fathima Javeed wrote:
>>Hi Kent
>>
>>To answer your question:
>>well here is how it works
>>sequence one = aaabbbbcccc
>>length = 11
>>
>>seq 2 = aaaccccbcccccccccc
>>length = 18
>>
>>to get the pairwise similarity score, the program compares the letters
>>of the two sequences up to length = 11, the length of the shorter sequence.
>>
>>so a match gets a score of 1; using + for match and x for mismatch:
>>
>>aaabbbbcccc
>>aaaccccbcccccccccc
>>+++xxxxx+++
>>
>>therefore the score = 6/11 = 0.5454 or about 54.5%
>>
>>so you only score the first 11 letters of each sequence and it is not
>>required to compare the rest of sequence 2. this is what the
>>distance matrix is doing
>>
>>match score == 6
>>
>>The spaces are deleted to make both of them the same length
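
The +/x marking in the example above can be reproduced with a few lines of Python. This is only a sketch; the function name match_string is an assumption for illustration.

```python
def match_string(s1, s2):
    """Mark each compared position: '+' for a match, 'x' for a mismatch."""
    # zip() pairs letters only up to the length of the shorter sequence.
    return "".join("+" if a == b else "x" for a, b in zip(s1, s2))


marks = match_string("aaabbbbcccc", "aaaccccbcccccccccc")
print(marks)             # +++xxxxx+++
print(marks.count("+"))  # 6
```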
>>
>>
>>>From: Kent Johnson <kent_johnson at skillsoft.com>
>>>To: "Fathima Javeed" <fathimajaveed at hotmail.com>, tutor at python.org
>>>Subject: Re: [Tutor] need help with comparing list of sequences in
>>>Python!!
>>>Date: Mon, 30 Aug 2004 13:53:19 -0400
>>>
>>>Fuzzi,
>>>
>>>How do you count mismatches if the lengths of the sequences are 
>>>different? Do you start from the front of both sequences or do you look 
>>>for a best match? Do you count the extra characters in the longer string 
>>>as mismatches or do you ignore them? An example or two would help.
>>>
>>>For example if
>>>s1=ABCD
>>>s2=XABDDYY
>>>how many characters do you count as different?
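
To make two of these options concrete, here is a small sketch that counts differences from the front of both strings, with and without counting the extra characters of the longer string. The function name mismatches and the count_extra switch are hypothetical; the "best match" option would need a proper alignment and is not covered.

```python
def mismatches(s1, s2, count_extra=False):
    """Count differing positions, comparing from the front of both strings.

    count_extra is a hypothetical switch: when True, the unpaired tail of
    the longer string is counted as mismatches too.
    """
    diff = sum(a != b for a, b in zip(s1, s2))
    if count_extra:
        diff += abs(len(s1) - len(s2))
    return diff


print(mismatches("ABCD", "XABDDYY"))                    # 3
print(mismatches("ABCD", "XABDDYY", count_extra=True))  # 6
```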
>>>
>>>Kent
>>>
>>>At 07:00 PM 8/29/2004 +0100, Fathima Javeed wrote:
>>>>Hi,
>>>>I would really appreciate it if someone could help me in Python, as I am 
>>>>new to the language.
>>>>
>>>>I have a list of protein sequences in a text file, e.g. (dummy 
>>>>data)
>>>>
>>>>MVEIGEKAPEIELVDTDLKKVKIPSDFKGKVVVLAFYPAAFTSVCTKEMCTFRDSMAKFNEVNAVVIGISVDP
>>>>PFS
>>>>
>>>>MAPITVGDVVPDGTISFFDENDQLQTVSVHSIAAGKKVILFGVPGAFTPTCSMSHVPGFIGKAEELKSKG
>>>>
>>>>APIKVGDAIPAVEVFEGEPGNKVNLAELFKGKKGVLFGVPGAFTPGCSKTHLPGFVEQAEALKAKGVQVVACL
>>>>SVND
>>>>
>>>>HGFRFKLVSDEKGEIGMKYGVVRGEGSNLAAERVTFIIDREGNIRAILRNI
>>>>
>>>>etc etc
>>>>
>>>>They are not always of the same length,
>>>>
>>>>The first sequence is always the reference sequence which I am trying to 
>>>>investigate. To reach the objective, I need to compare each 
>>>>sequence with the first one, starting with the comparison of the 
>>>>reference sequence with itself.
>>>>
>>>>The objective of the program is to manipulate each sequence, i.e. 
>>>>randomly change characters, and calculate the distance between the 
>>>>sequence in question and the reference sequence. (Distance: number of 
>>>>letters between a pair of sequences that don't match DIVIDED by the 
>>>>length of the shortest sequence.) So I need program code that takes the 
>>>>first sequence (the constant one at the top of the list) as the 
>>>>reference sequence: first it compares it with itself, then with the 
>>>>second sequence, then with the third sequence, etc., one at a time.
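
The distance definition and the "compare everything with the first sequence" loop might look like the sketch below. The function names and the short sample sequences are assumptions for illustration; reading the sequences from the text file is left out.

```python
def distance(ref, other):
    """Mismatches divided by the length of the shorter sequence."""
    shortest = min(len(ref), len(other))
    if shortest == 0:
        raise ValueError("both sequences must be non-empty")
    mismatched = sum(a != b for a, b in zip(ref, other))
    return float(mismatched) / shortest


def distances_to_reference(sequences):
    """Compare every sequence, including the first, against the first."""
    ref = sequences[0]
    return [distance(ref, seq) for seq in sequences]


# Short made-up sequences; the first one is the reference.
seqs = ["MVEIGEKAPE", "MAPITVGDVV", "MVEIG"]
result = distances_to_reference(seqs)
print(result)  # [0.0, 0.8, 0.0]
```

The first entry is always 0.0 because the reference is compared with itself.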
>>>>
>>>>For the first comparison, you take a copy of the reference sequence and 
>>>>manipulate the copied sequence, i.e. randomly change the letters in 
>>>>the sequence, and calculate the distance between them.
>>>>(The letters that are used for this are: A R N D C E Q G H I L K M F P S 
>>>>T W Y V)
>>>>
>>>>The reference sequence is never altered or manipulated; for the first 
>>>>comparison, it's the copied version of the reference sequence that's 
>>>>altered.
>>>>
>>>>Randomization is done using different P values (P = probability of 
>>>>change), for example:
>>>>if P = 0      no random change has been made
>>>>if P = 1.0    all the letters in that particular sequence have been 
>>>>randomly changed, so the number of changes equals the length of the sequence
>>>>
>>>>So it's calculating the distance each time between two sequences (the 
>>>>first is always the reference sequence, the other a second sequence) at 
>>>>each P value (starting from 0, then 0.1, 0.2, ....... 1.0).
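
The randomization step could be sketched as follows. Two assumptions are made here that the description does not settle: each letter is changed independently with probability P, and a changed letter is always replaced by a different one (so P = 1.0 gives a distance of 1.0 against the original). The names randomize, AMINO_ACIDS, and the sample fragment are hypothetical.

```python
import random

AMINO_ACIDS = "ARNDCEQGHILKMFPSTWYV"  # the 20 letters listed in the post


def randomize(seq, p, rng=random):
    """Return a changed copy of seq; each letter is replaced with probability p.

    Assumption: a replaced letter is always different from the original.
    Strings are immutable, so the sequence passed in is never altered.
    """
    out = []
    for letter in seq:
        if rng.random() < p:
            choices = [c for c in AMINO_ACIDS if c != letter]
            out.append(rng.choice(choices))
        else:
            out.append(letter)
    return "".join(out)


def distance(ref, other):
    """Mismatches over the length of the shorter sequence (both non-empty)."""
    mismatched = sum(a != b for a, b in zip(ref, other))
    return float(mismatched) / min(len(ref), len(other))


ref = "MVEIGEKAPEIELVDTDLKK"   # made-up reference fragment
rng = random.Random(0)         # fixed seed so runs are repeatable
p_values = [i / 10.0 for i in range(11)]   # 0.0, 0.1, ..., 1.0

# One (P, distance) pair per P value, comparing a randomized copy
# of the reference with the untouched reference.
curve = [(p, distance(ref, randomize(ref, p, rng))) for p in p_values]
```

For the other sequences in the list, randomize(seq, p, rng) would be applied to each sequence in turn before measuring its distance from ref.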
>>>>
>>>>Note: the number of sequences to be compared could be any number, and 
>>>>they could be of any length.
>>>>
>>>>I don't know how to compare each sequence with the first sequence, or how 
>>>>to randomize the characters in a sequence and then calculate the 
>>>>distance for each pair of sequences. If someone can give me 
>>>>any guidance, I would be grateful.
>>>>
>>>>Cheers
>>>>Fuzzi
>>>>
>>>>
>>>>_______________________________________________
>>>>Tutor maillist  -  Tutor at python.org
>>>>http://mail.python.org/mailman/listinfo/tutor
>>
>



