[Tutor] drawing a graph
Fathima Javeed
fathimajaveed at hotmail.com
Fri Sep 3 04:20:04 CEST 2004
Hi,
I have managed to get distances between sequences at each P value using
randomization, so now I have an HTML output file with two sets of values:
the distance percentages and the P values from 1 to 100. How would I draw a
graph in Python, i.e. distance against P value for each sequence? I'm
completely lost now and would really appreciate help. Would it be helpful to
paste my code here?
Cheers
Fuzzi
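One common way to draw that graph is with the third-party matplotlib package. The sketch below assumes matplotlib is installed and that the results are kept in Python as a dict mapping a sequence label to a list of (P, distance) pairs, rather than being parsed back out of the HTML file. `split_series` and `plot_distances` are hypothetical names.

```python
# Sketch: plot distance against P, one line per sequence.
# Assumption: results is a dict mapping a sequence label to a list of
# (p, distance) pairs; the HTML output file is not parsed here.


def split_series(pairs):
    """Sort (p, distance) pairs by p and return two parallel lists."""
    ordered = sorted(pairs)
    ps = [p for p, _ in ordered]
    distances = [d for _, d in ordered]
    return ps, distances


def plot_distances(results, filename="distance_vs_p.png"):
    """Draw distance against P for every sequence and save it as an image.

    Requires matplotlib (pip install matplotlib). It is imported inside the
    function so that split_series() can be used without it.
    """
    import matplotlib
    matplotlib.use("Agg")  # draw straight to a file, no window needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for label, pairs in results.items():
        ps, distances = split_series(pairs)
        ax.plot(ps, distances, marker="o", label=label)
    ax.set_xlabel("P (probability of change)")
    ax.set_ylabel("Distance")
    ax.legend()
    fig.savefig(filename)
    plt.close(fig)
```

Calling `plot_distances(results)` saves the graph as `distance_vs_p.png`. To open a window instead, drop the `matplotlib.use("Agg")` line and call `plt.show()` in place of `fig.savefig(filename)`.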
>From: Kent Johnson <kent_johnson at skillsoft.com>
>To: tutor at python.org
>Subject: Re: [Tutor] need help with comparing list of sequences in
>Python!!
>Date: Tue, 31 Aug 2004 07:04:09 -0400
>
>Fuzzi,
>
>Here is one way to do this:
>- Use zip() to pair up elements from the two sequences
> >>> s1='aaabbbbcccc'
> >>> s2='aaaccccbcccccccccc'
> >>> list(zip(s1, s2))
>[('a', 'a'), ('a', 'a'), ('a', 'a'), ('b', 'c'), ('b', 'c'), ('b', 'c'),
>('b', 'c'), ('c', 'b'), ('c', 'c'), ('c', 'c'), ('c', 'c')]
>
>- Use a list comprehension to compare the elements of the pair and put the
>results in a new list. I'm not sure if you want to count the matches or the
>mismatches - your original post says mismatches, but in your example you
>count matches. This example counts matches but it is easy to change.
> >>> [a == b for a, b in zip(s1, s2)]
>[True, True, True, False, False, False, False, False, True, True, True]
>
>- In Python, True has a value of 1 and False has a value of 0, so adding up
>the elements of this list gives the number of matches:
> >>> sum([a == b for a, b in zip(s1, s2)])
>6
>
>- min() and len() give you the length of the shortest sequence:
> >>> min(len(s1), len(s2))
>11
>
>- When you divide, you have to convert one of the numbers to a float or
>Python will use integer division! (That is Python 2 behaviour; in Python 3,
>/ always gives a float and // does integer division.)
> >>> 6/11
>0
> >>> float(6)/11
>0.54545454545454541
>
>Put this together with the framework that Alan gave you to create a program
>that calculates distances. Then you can start on the randomization part.
>
>Kent
>
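Kent's steps can be collected into one function. Here is a minimal sketch for Python 3, where `/` between two integers already returns a float, so no `float()` call is needed. `similarity` and `distance` are hypothetical names. The distance is taken as the mismatch count over the shorter length, as in the original question, which equals 1 minus the similarity because `zip()` pairs up exactly that many positions.

```python
def similarity(s1, s2):
    """Fraction of positions that match, up to the length of the shorter sequence."""
    shortest = min(len(s1), len(s2))
    if shortest == 0:
        raise ValueError("cannot score an empty sequence")
    # zip() stops at the end of the shorter sequence, and True counts as 1.
    matches = sum(a == b for a, b in zip(s1, s2))
    return matches / shortest


def distance(s1, s2):
    """Fraction of positions that do not match (mismatches / shorter length)."""
    return 1 - similarity(s1, s2)


print(round(similarity("aaabbbbcccc", "aaaccccbcccccccccc"), 4))  # 0.5455
```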
>
>At 04:03 AM 8/31/2004 +0100, Fathima Javeed wrote:
>>Hi Kent
>>
>>To answer your question, here is how it works:
>>sequence one = aaabbbbcccc
>>length = 11
>>
>>seq 2 = aaaccccbcccccccccc
>>length = 18
>>
>>To get the pairwise similarity score, the program compares the letters
>>of the two sequences up to length = 11, the length of the shorter sequence.
>>
>>A match gets a score of 1. Using + for a match and x for a mismatch:
>>
>>aaabbbbcccc
>>aaaccccbcccccccccc
>>+++xxxxx+++
>>
>>Therefore the score = 6/11 = 0.5454, or about 54.5%.
>>
>>So you only score the first 11 letters of each sequence, and it is not
>>necessary to compare the rest of sequence 2. This is what the
>>distance matrix is doing.
>>
>>match score == 6
>>
>>The spaces are deleted to make both of them the same length
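The +/x line in this example can also be generated, which makes it easy to check a score by eye. A small sketch (`match_line` is a hypothetical name) that, like the scoring described here, only looks at the length of the shorter sequence:

```python
def match_line(s1, s2):
    """Return '+' for each matching position and 'x' for each mismatch,
    covering only the length of the shorter sequence (zip() stops there)."""
    return "".join("+" if a == b else "x" for a, b in zip(s1, s2))


seq1 = "aaabbbbcccc"
seq2 = "aaaccccbcccccccccc"
marks = match_line(seq1, seq2)
print(seq1)
print(seq2)
print(marks)                              # +++xxxxx+++
print(marks.count("+"), "/", len(marks))  # 6 / 11
```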
>>
>>
>>>From: Kent Johnson <kent_johnson at skillsoft.com>
>>>To: "Fathima Javeed" <fathimajaveed at hotmail.com>, tutor at python.org
>>>Subject: Re: [Tutor] need help with comparing list of sequences in
>>>Python!!
>>>Date: Mon, 30 Aug 2004 13:53:19 -0400
>>>
>>>Fuzzi,
>>>
>>>How do you count mismatches if the lengths of the sequences are
>>>different? Do you start from the front of both sequences or do you look
>>>for a best match? Do you count the extra characters in the longer string
>>>as mismatches or do you ignore them? An example or two would help.
>>>
>>>For example if
>>>s1=ABCD
>>>s2=XABDDYY
>>>how many characters do you count as different?
>>>
>>>Kent
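To make the question concrete, here is a sketch of two of the possible conventions for s1 and s2 above, both comparing from the front: ignoring the extra characters of the longer string, or counting each of them as a mismatch. The function names are hypothetical. Finding a best match (for example shifting s2 so that the leading AB lines up) would need a proper alignment algorithm, which this does not attempt.

```python
def mismatches_ignoring_extra(s1, s2):
    """Compare from the front and stop at the end of the shorter string."""
    return sum(a != b for a, b in zip(s1, s2))


def mismatches_counting_extra(s1, s2):
    """Compare from the front; every unmatched extra character is a mismatch."""
    return mismatches_ignoring_extra(s1, s2) + abs(len(s1) - len(s2))


s1 = "ABCD"
s2 = "XABDDYY"
print(mismatches_ignoring_extra(s1, s2))  # 3
print(mismatches_counting_extra(s1, s2))  # 6
```

Fuzzi's answer in this thread uses the first convention.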
>>>
>>>At 07:00 PM 8/29/2004 +0100, Fathima Javeed wrote:
>>>>Hi,
>>>>I would really appreciate it if someone could help me with Python, as I am
>>>>new to the language.
>>>>
>>>>I have a list of protein sequences in a text file, e.g. (dummy
>>>>data):
>>>>
>>>>MVEIGEKAPEIELVDTDLKKVKIPSDFKGKVVVLAFYPAAFTSVCTKEMCTFRDSMAKFNEVNAVVIGISVDPPFS
>>>>
>>>>MAPITVGDVVPDGTISFFDENDQLQTVSVHSIAAGKKVILFGVPGAFTPTCSMSHVPGFIGKAEELKSKG
>>>>
>>>>APIKVGDAIPAVEVFEGEPGNKVNLAELFKGKKGVLFGVPGAFTPGCSKTHLPGFVEQAEALKAKGVQVVACLSVND
>>>>
>>>>HGFRFKLVSDEKGEIGMKYGVVRGEGSNLAAERVTFIIDREGNIRAILRNI
>>>>
>>>>etc etc
>>>>
>>>>They are not always the same length.
>>>>
>>>>The first sequence is always the reference sequence, which I am trying to
>>>>investigate. To reach the objective, I need to compare each
>>>>sequence with the first one, starting with the comparison of the
>>>>reference sequence with itself.
>>>>
>>>>The objective of the program is to manipulate each sequence, i.e.
>>>>randomly change characters, and calculate the distance (Distance: number
>>>>of letters between a pair of sequences that don't match DIVIDED by the
>>>>length of the shorter sequence) between the sequence in question and
>>>>the reference sequence. So I need a program that takes the first
>>>>sequence as the reference sequence (a constant at the top of the list),
>>>>first compares it with itself, then compares it with the second
>>>>sequence, then with the third sequence, and so on, one at a time.
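A minimal sketch of that loop, assuming the file looks like the dummy data above: one sequence per record, records separated by blank lines, and long sequences possibly wrapped over several lines. `read_sequences`, `distance` and `compare_with_reference` are hypothetical names; the distance follows the definition given here (mismatches divided by the length of the shorter sequence).

```python
def read_sequences(text):
    """Split text into sequences: records are separated by blank lines,
    and wrapped lines inside a record are joined back together."""
    sequences = []
    current = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            current.append(line)
        elif current:
            sequences.append("".join(current))
            current = []
    if current:
        sequences.append("".join(current))
    return sequences


def distance(s1, s2):
    """Mismatching positions divided by the length of the shorter sequence."""
    shortest = min(len(s1), len(s2))
    return sum(a != b for a, b in zip(s1, s2)) / shortest


def compare_with_reference(sequences):
    """Yield (index, distance) for every sequence against the first one,
    starting with the reference compared with itself."""
    reference = sequences[0]
    for index, seq in enumerate(sequences):
        yield index, distance(reference, seq)
```

With a file called, say, `sequences.txt` (a hypothetical name), `compare_with_reference(read_sequences(open("sequences.txt").read()))` yields the reference against itself first (distance 0.0), then each following sequence in turn.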
>>>>
>>>>For the first comparison, you take a copy of the reference sequence and
>>>>manipulate the copy, i.e. randomly change the letters in it, and
>>>>calculate the distance between the two.
>>>>(the letters that are used for this are: A R N D C E Q G H I L K M F P S
>>>>T W Y V)
>>>>
>>>>The reference sequence is never altered or manipulated; for the first
>>>>comparison, it is the copy of the reference sequence that is
>>>>altered.
>>>>
>>>>Randomization is done using different P values (P = probability of
>>>>change), for example:
>>>>if P = 0, no random change has been made;
>>>>if P = 1.0, all the letters in that particular sequence have been
>>>>randomly changed, so at P = 1.0 the number of changed letters equals
>>>>the length of the sequence.
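One way to sketch the randomization, assuming each position of the copy is changed independently with probability P, and that a changed position always receives a different letter from the 20 listed above (so at P = 1.0 every position differs). Whether a "change" may pick the same letter again is an assumption; if it may, draw from the full alphabet instead. `mutate` is a hypothetical name. Because Python strings are immutable, the reference string itself can never be altered; `mutate` builds a new string.

```python
import random

# The 20 amino-acid letters listed in the original post.
AMINO_ACIDS = "ARNDCEQGHILKMFPSTWYV"


def mutate(seq, p, rng=random):
    """Return a copy of seq in which each position is replaced, with
    probability p, by a different randomly chosen amino-acid letter."""
    letters = []
    for letter in seq:
        if rng.random() < p:  # random() is in [0, 1): never < 0, always < 1.0
            choices = [c for c in AMINO_ACIDS if c != letter]
            letters.append(rng.choice(choices))
        else:
            letters.append(letter)
    return "".join(letters)
```

Passing a seeded generator, e.g. `mutate(reference, 0.3, random.Random(1))`, makes a run repeatable.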
>>>>
>>>>So it calculates the distance between two sequences (the first is
>>>>always the reference sequence, the second is another sequence) at each P
>>>>value (starting from 0, then 0.1, 0.2, ..., 1.0).
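Putting the pieces together, here is a sketch of the sweep. For each sequence, mutate it at P = 0.0, 0.1, ..., 1.0 and measure the distance to the unchanged reference (for the first sequence, the reference itself, this mutates a copy). It assumes each position changes independently with probability P to a different letter. The P values are built as `i / steps` rather than by adding 0.1 repeatedly, because repeated addition drifts (0.1 + 0.1 + 0.1 gives 0.30000000000000004). `sweep` and its helpers are hypothetical names.

```python
import random

AMINO_ACIDS = "ARNDCEQGHILKMFPSTWYV"  # letters listed in the original post


def distance(ref, seq):
    """Mismatches divided by the length of the shorter sequence."""
    return sum(a != b for a, b in zip(ref, seq)) / min(len(ref), len(seq))


def mutate(seq, p, rng):
    """Change each position with probability p to a different letter."""
    return "".join(
        rng.choice([c for c in AMINO_ACIDS if c != ch]) if rng.random() < p else ch
        for ch in seq
    )


def sweep(sequences, steps=10, seed=None):
    """Return {sequence index: [(p, distance), ...]} for p = 0, 1/steps, ..., 1.

    sequences[0] is the reference; it is never modified, only compared against.
    """
    rng = random.Random(seed)
    reference = sequences[0]
    p_values = [i / steps for i in range(steps + 1)]  # exact 0.0, 0.1, ..., 1.0
    results = {}
    for index, seq in enumerate(sequences):
        results[index] = [
            (p, distance(reference, mutate(seq, p, rng))) for p in p_values
        ]
    return results
```

Each entry of the returned dict is a list of (P, distance) pairs, which is a convenient shape for plotting distance against P per sequence.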
>>>>
>>>>Note: there could be any number of sequences to compare, of any
>>>>length.
>>>>
>>>>I don't know how to compare each sequence with the first sequence, or how
>>>>to randomize the characters in a sequence and then calculate the
>>>>distance for each pair of sequences. If someone can give me any
>>>>guidance, I would be grateful.
>>>>
>>>>Cheers
>>>>Fuzzi
>>>>
>>>>
>>>>_______________________________________________
>>>>Tutor maillist - Tutor at python.org
>>>>http://mail.python.org/mailman/listinfo/tutor
>>
>