[Tutor] Help

Dave Angel davea at ieee.org
Sun May 16 03:48:21 CEST 2010


(You forgot to post to the list. Normally, you can just do a reply-all
to get both the list, and the person who last responded. You also
top-posted, rather than putting your new message at the end. I'll now
continue at the end.)

she haohao wrote:
> I am stuck because I don't know how to extract all the p-values, how to sort the file, and how to open the respective file. Thanks for helping.
>
>   
>> Date: Sat, 15 May 2010 19:58:33 -0400
>> From: davea at ieee.org
>> To: einstein_87 at hotmail.com
>> CC: tutor at python.org
>> Subject: Re: [Tutor] Help
>>
>> she haohao wrote:
>>     
>>> Hi,
>>>
>>> I have some questions that I am unable to figure out. 
>>>
>>> Let's say I have a file named peaks.txt.
>>>
>>> Chr1    7       9          4.5         5.5
>>> chr10   6       9          3.5         4.5
>>> chr1     10     6          2.5         4.4
>>>
>>> Question is how can i sort the file so that it looks like this:
>>>
>>>
>>>
>>> Chr1    7       9          4.5         5.5
>>> chr1     10     6          2.5         4.4
>>> chr10   6       9          3.5         4.5
>>>
>>> Next is how do I extract out the p-values(those highlighted in red)
>>>
>>> After I extracted out all the p-values. for example all the p-values from chr1 is 6,7,9,10 and for chr10 are 6 and 9.
>>>
>>> So for example if the p-value is 7 from chr1, i would open out a file called chr1.fa which look like this:
>>>
>>> >chr1
>>> ATTGTACT
>>> ATTTGTAT
>>> ATTCGTCA
>>>
>>> and I will extract out the subsequence TACTA. Basically the p-value (in this case it's 7) is a position counting from the second line of the chr1.fa file, and I print out the subsequence running from position 7-d to 7+d, where d=2. Thus if the p-value is taken from chr10, then we read from a file named chr10.fa, which can look like:
>>>
>>> chr10
>>> TTAGTACT
>>> GTACTAGT
>>> ACGTATTT
>>>
>>> So the question is how do I do this for all the p-values (i.e. all the p-values from chr1 and all the p-values from chr10) if, let's say, we don't know how many lines the peaks.txt file has.
>>>
>>> And how do i output it to a file such that it will have the following format:
>>>
>>> Chr1
>>>
>>> peak value 6: TTGTA
>>>
>>> peak value 7: TACTA
>>>
>>> etc etc for all the p-values of chr1
>>>
>>> chr10
>>>
>>> peak value 7: TTACT
>>>
>>> etc etc etc...
>>>
>>>
>>> thanks for the help,
>>> Angeline
>>>
>>>
>> Red has no meaning in a text message, which is what this list is
>> comprised of.
>>
>> What does your code look like now? Where are you stuck?
>>
>> str.split() can be used to divide a line up by whitespace into "words".
>> So if you split a line (string), you get a list. You can use [] to
>> extract specific items from that list.
>>
>> The first item in that list is your key, so you can then put it into a
>> dictionary. Don't forget that a dictionary doesn't allow dups, so when
>> you see the dictionary already has a match, append to it, rather than
>> replacing it.
>>
>> DaveA
>>     

I didn't offer to write it for you, but to try to help you fix what
you've written. When you have written something that sort-of works,
please post it, along with a specific question about what's failing.

sort() will sort data, not files. If you read in the original data with
readlines(), and sort() that list, it'll be sorted by the first few
characters of each line. Note that this may not be what you mean by
sorted, since Chr10 will come before Chr2, and the comparison is
case-sensitive. Still, it'll put lines with identical keys together.
After you sort the lines, you can create another file with
open(..."w") and use writelines(). Don't forget to close() it.
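As a rough sketch of that readlines() / sort() / writelines() sequence
(the function name and the file names are made up for illustration, and
it assumes every line of the input ends with a newline):

```python
def sort_lines(in_name, out_name):
    """Sort the lines of in_name as plain strings and write them to out_name."""
    # The with statement closes each file for us, so no explicit close().
    with open(in_name) as infile:
        lines = infile.readlines()

    # Plain string comparison: case-sensitive, and "chr10" sorts before "chr2".
    lines.sort()

    with open(out_name, "w") as outfile:
        outfile.writelines(lines)
    return lines

# Hypothetical usage:
#     sort_lines("peaks.txt", "peaks_sorted.txt")
```

This is only the naive string sort described above. If you want chr2 to
come before chr10, you'd have to pass your own key function to sort().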

But you probably don't want it sorted, you want a dictionary. Of course,
if it's an assignment, then it depends on the wording of the assignment.
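Here's a minimal sketch of the split-and-dictionary idea. Two things in
it are guesses from the sample data, not anything stated outright: that
the p-values are the second and third columns, and that "Chr1" and
"chr1" are meant to be the same key. The function name is made up.

```python
def group_p_values(lines):
    """Map each chromosome name to the list of p-values seen for it."""
    groups = {}
    for line in lines:
        fields = line.split()          # divide the line on whitespace
        if not fields:
            continue                   # skip blank lines
        key = fields[0].lower()        # assumption: Chr1 and chr1 match
        # Assumption: columns 2 and 3 hold the p-values.
        values = [int(fields[1]), int(fields[2])]
        # A dictionary holds one entry per key, so append to the
        # existing list rather than replacing it.
        groups.setdefault(key, []).extend(values)
    return groups

sample = [
    "Chr1    7       9          4.5         5.5",
    "chr10   6       9          3.5         4.5",
    "chr1     10     6          2.5         4.4",
]
print(group_p_values(sample))
```

For the three sample lines this collects 7, 9, 10, 6 under chr1 and
6, 9 under chr10. Call sorted() on each list if you want them in order.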

open() and read() will read data from a file, any file.
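For instance, a sketch of reading one of the .fa files and slicing out a
window around a position might look like the following. It assumes the
first line of the file is a header and the rest is sequence, and that
positions count from 1 across the joined lines. Both function names are
made up, and this is one possible reading of the problem statement, not
the assignment's required approach.

```python
def read_sequence(filename):
    """Return the sequence in filename as one string, skipping line 1."""
    with open(filename) as infile:
        lines = infile.read().splitlines()
    # Assumption: the first line is a header such as ">chr1".
    return "".join(line.strip() for line in lines[1:])

def subsequence(sequence, position, d=2):
    """Return the characters from position-d through position+d (1-based)."""
    # Convert the 1-based position to a 0-based slice, and don't let the
    # start run off the front of the string.
    start = max(0, position - 1 - d)
    return sequence[start:position + d]

# Hypothetical usage:
#     seq = read_sequence("chr1.fa")
#     print(subsequence(seq, 7))
```

With the chr1.fa contents shown in the original message, position 7 and
d=2 give TACTA, which matches the example there.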


DaveA


