[Tutor] Help
she haohao
einstein_87 at hotmail.com
Sat May 15 19:41:32 CEST 2010
Hi,
I have some questions that I am unable to figure out.
Let's say I have a file named peaks.txt:
Chr1 7 9 4.5 5.5
chr10 6 9 3.5 4.5
chr1 10 6 2.5 4.4
My first question is how I can sort the file so that it looks like this:
Chr1 7 9 4.5 5.5
chr1 10 6 2.5 4.4
chr10 6 9 3.5 4.5
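A minimal sketch of one way to get that order, assuming the chromosome name is always the first whitespace-separated field. It compares names case-insensitively and compares the numeric part as an integer, so chr2 would come before chr10. Python's `sorted` is stable, so Chr1 and chr1, which get the same key, keep their order from the file. The names `chrom_key` and `sort_peaks` are only illustrative.

```python
import re

def chrom_key(line):
    """Sort key: chromosome name compared case-insensitively, with the
    numeric part compared as an integer so chr2 sorts before chr10."""
    name = line.split()[0].lower()            # e.g. "Chr1" -> "chr1"
    prefix, number = re.match(r"([a-z]*)(\d*)", name).groups()
    return (prefix, int(number) if number else 0, name)

def sort_peaks(lines):
    """Return the non-blank lines sorted by chromosome.

    sorted() is stable, so lines with equal keys keep their file order.
    """
    return sorted((line for line in lines if line.strip()), key=chrom_key)
```

To sort the real file, something like `with open("peaks.txt") as f: sorted_lines = sort_peaks(f)` should work, since a file object yields its lines one at a time.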
Next, how do I extract the p-values (highlighted in red in the original message, i.e. the second and third columns)?
After extracting them, the p-values for chr1 are 6, 7, 9 and 10, and the p-values for chr10 are 6 and 9.
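One way to collect them, sketched under two assumptions: the p-values are always the second and third columns, and Chr1 and chr1 count as the same chromosome (which is how the chr1 list above is built). The helper name `collect_pvalues` is hypothetical; a `set` is used so any repeated value is listed only once.

```python
from collections import defaultdict

def collect_pvalues(lines):
    """Map each lower-cased chromosome name to its sorted, unique p-values.

    Assumes the p-values are the 2nd and 3rd whitespace-separated columns.
    """
    pvalues = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or short lines
        chrom = fields[0].lower()  # treat "Chr1" and "chr1" as one chromosome
        pvalues[chrom].update([int(fields[1]), int(fields[2])])
    return {chrom: sorted(values) for chrom, values in pvalues.items()}
```

Passing an open peaks.txt file as `lines` gives `{"chr1": [6, 7, 9, 10], "chr10": [6, 9]}` for the three example lines.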
For example, if the p-value is 7 from chr1, I would open a file called chr1.fa, which looks like this:
>chr1
ATTGTACT
ATTTGTAT
ATTCGTCA
and I would extract the subsequence TACTA. Basically, the p-value (in this case 7) is a position, counted from the start of the second line of chr1.fa with the sequence lines joined together, and I print the subsequence from position 7-d to 7+d, where d=2. Likewise, if the p-value is taken from chr10, then we read from a file named chr10.fa, which might look like this:
>chr10
TTAGTACT
GTACTAGT
ACGTATTT
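The position rule can be sketched as below, assuming every line after the first is sequence and the lines are joined with no separators (that is what makes position 7 give TACTA across the line break in chr1.fa). Positions are 1-based, so the slice starts at index `position - 1 - d`; clamping that start at 0 for peaks near the beginning is an added assumption. `read_sequence` and `window` are illustrative names.

```python
def read_sequence(lines):
    """Join every line after the first (the header, e.g. ">chr1") into one
    string; `lines` can be an open file or any iterable of lines."""
    it = iter(lines)
    next(it, None)  # skip the header line
    return "".join(line.strip() for line in it)

def window(sequence, position, d=2):
    """Subsequence from position-d to position+d, 1-based and inclusive."""
    start = max(position - 1 - d, 0)  # clamp so early peaks don't wrap around
    return sequence[start:position + d]  # slicing past the end just truncates
```

For example, `with open("chr1.fa") as f: seq = read_sequence(f)` followed by `window(seq, 7)` should return the TACTA subsequence described above.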
So the question is: how do I do this for all the p-values (i.e. all the p-values from chr1 and all the p-values from chr10) if we don't know how many lines peaks.txt has?
And how do I write the results to a file in the following format:
Chr1
peak value 6: TTGTA
peak value 7: TACTA
etc. for all the p-values of chr1
chr10
peak value 7: TTACT
etc.
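Putting the steps together, a hypothetical `write_report` could look like the sketch below. Iterating over a file object reads it line by line, so the number of lines in peaks.txt never has to be known in advance. Assumptions: the p-values are the 2nd and 3rd columns, chromosome names are lower-cased (so Chr1 and chr1 share chr1.fa, and the section heading is printed as chr1), each .fa file has one header line followed by sequence, and d defaults to 2. The sequence loader is passed in as a function, so the report logic can be tried without real files; `load_from_fasta` is one illustrative loader.

```python
from collections import defaultdict

def load_from_fasta(chrom):
    """Read e.g. chr1.fa: skip the header line, join the rest into one string."""
    with open(chrom + ".fa") as handle:
        handle.readline()  # header line such as ">chr1"
        return "".join(line.strip() for line in handle)

def write_report(peak_lines, load_sequence, out, d=2):
    """Write one section per chromosome, one 'peak value' line per p-value.

    peak_lines    -- iterable of lines like "chr1 10 6 2.5 4.4" (a file works)
    load_sequence -- callable mapping a chromosome name to its sequence
    out           -- writable text stream, e.g. an open output file
    """
    # Group p-values (columns 2 and 3) by lower-cased chromosome name
    peaks = defaultdict(set)
    for line in peak_lines:  # works for any number of lines
        fields = line.split()
        if len(fields) >= 3:
            peaks[fields[0].lower()].update([int(fields[1]), int(fields[2])])

    # (length, name) puts chr2 before chr10 for names of the form chrN
    for chrom in sorted(peaks, key=lambda c: (len(c), c)):
        sequence = load_sequence(chrom)  # each .fa file is read only once
        out.write(chrom + "\n")
        for p in sorted(peaks[chrom]):
            start = max(p - 1 - d, 0)  # 1-based position p, window p-d..p+d
            out.write("peak value %d: %s\n" % (p, sequence[start:p + d]))
```

With real files, something like `with open("peaks.txt") as peaks, open("peaks_out.txt", "w") as out: write_report(peaks, load_from_fasta, out)` should produce the layout above; the output name peaks_out.txt is only an example.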
Thanks for the help,
Angeline