[Tutor] Help

Sat May 15 19:41:32 CEST 2010

Hi,

I have a few questions that I have been unable to figure out.

Let's say I have a file named peaks.txt:

Chr1    7       9          4.5         5.5
chr10   6       9          3.5         4.5
chr1     10     6          2.5         4.4

My first question is how I can sort the file so that it looks like this:

Chr1    7       9          4.5         5.5
chr1     10     6          2.5         4.4
chr10   6       9          3.5         4.5
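One possible way to get this ordering, as a minimal sketch: compare the chromosome names case-insensitively and treat the digits as a number, so that chr1 comes before chr10. Using the second column as a tie-breaker is an assumption based on the desired output above, and the helper names `natural_key` and `sort_peak_lines` are made up for illustration.

```python
import re

def natural_key(name):
    """Case-insensitive key that compares runs of digits as integers."""
    # re.split with a capture group alternates text, digits, text, ...
    parts = re.split(r"(\d+)", name.lower())
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))

def sort_peak_lines(lines):
    """Return the non-blank rows sorted by chromosome, then by column 2."""
    rows = [line.split() for line in lines if line.strip()]
    rows.sort(key=lambda fields: (natural_key(fields[0]), int(fields[1])))
    return rows

sample = [
    "Chr1    7       9          4.5         5.5",
    "chr10   6       9          3.5         4.5",
    "chr1     10     6          2.5         4.4",
]
for fields in sort_peak_lines(sample):
    print("\t".join(fields))
```

For the real file, the lines could come from `open("peaks.txt")` instead of the `sample` list.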

Next, how do I extract the p-values (the ones highlighted in red, i.e. the second and third columns)?

After extracting all the p-values, I would have, for example, 6, 7, 9 and 10 for chr1, and 6 and 9 for chr10.
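A rough sketch of the grouping step, assuming the p-values are the second and third whitespace-separated columns and that Chr1 and chr1 refer to the same chromosome. The function name `collect_positions` is hypothetical, and duplicates are dropped by using a set, which may or may not be what is wanted.

```python
from collections import defaultdict

def collect_positions(lines):
    """Map each lower-cased chromosome name to its sorted p-values."""
    positions = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        chrom = fields[0].lower()
        # Assumption: columns 2 and 3 hold the p-values.
        positions[chrom].update(int(value) for value in fields[1:3])
    return {chrom: sorted(values) for chrom, values in positions.items()}

sample = [
    "Chr1    7       9          4.5         5.5",
    "chr10   6       9          3.5         4.5",
    "chr1     10     6          2.5         4.4",
]
print(collect_positions(sample))
```

Because the loop reads one line at a time, it does not need to know in advance how many lines the file has.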

So, for example, if the p-value is 7 from chr1, I would open a file called chr1.fa, which looks like this:

>chr1
ATTGTACT
ATTTGTAT
ATTCGTCA

and I would extract the subsequence TACTA. Basically, the p-value (7 in this case) is a position counted from the start of the second line of chr1.fa, with the sequence lines read as one continuous string, and I want to print the subsequence running from position 7-d to position 7+d, where d=2. Similarly, if the p-value is taken from chr10, we read from a file named chr10.fa, which might look like this:

chr10
TTAGTACT
GTACTAGT
ACGTATTT

So the question is: how do I do this for all the p-values (i.e. all the p-values from chr1 and all the p-values from chr10) when we don't know in advance how many lines peaks.txt has?

And how do I write the results to a file in the following format:

Chr1

peak value 6: TTGTA

peak value 7: TACTA

and so on for all the p-values of chr1

chr10

peak value 7: TTACT

etc etc etc...
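A minimal sketch of producing a report in roughly this layout, assuming the p-values have already been grouped per chromosome and each sequence has been read into a string. The function names and the output file name mentioned below are hypothetical, and the blank lines between entries are left out here.

```python
def window(sequence, position, d=2):
    """Return the bases from position-d to position+d (1-based, inclusive)."""
    start = max(position - d - 1, 0)
    return sequence[start:position + d]

def format_report(positions, sequences, d=2):
    """Build the report text: one header per chromosome, one line per peak."""
    lines = []
    for chrom, peaks in positions.items():
        lines.append(chrom)
        for peak in peaks:
            subsequence = window(sequences[chrom], peak, d)
            lines.append(f"peak value {peak}: {subsequence}")
    return "\n".join(lines) + "\n"

# Example data taken from the chr1.fa sample in this message.
positions = {"chr1": [6, 7]}
sequences = {"chr1": "ATTGTACTATTTGTATATTCGTCA"}
report = format_report(positions, sequences)
print(report, end="")
```

The text can then be saved with something like `with open("output.txt", "w") as out: out.write(report)`. Note that under the position-d to position+d rule, peak value 6 on the sample chr1 sequence comes out as GTACT, so the subsequences in the format example above are probably only illustrative.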

Thanks for the help,
Angeline
