[Tutor] extracting numbers from a list

kumar s ps_python at yahoo.com
Tue Oct 17 23:42:28 CEST 2006


In continuation to :
Re: [Tutor] extracting numbers from a list



Hello list,

I have coordinates for exons (chunks of sequence). For
instance:

10 - 50  A
10 - 20  B
35 - 50  B
60 - 70  A
60 - 70  B
80 - 100 A
80 - 100 B
(The coordinates and names above are simpler than in my
real data.)

My aim here is to create chunks of exons specific to A
or B.

For instance:
10 - 20 and 35 - 50 are common to both A and B, whereas
21 - 34 is specific to A.

The desired output for me is :

10 \t 20  A,B
21 \t 34  A
35 \t 50  A,B
60 \t 70  A,B
80 \t 100 A,B

I just learned Python from a friend, and he is also a
novice.

What I have managed so far is to break the exons up into
chunks. The problem is that the numbers I get differ from
what I need:
[10, 20] [10, 50]
[21, 35] [10, 50]
[36, 50] [10, 50]
[60, 70] [60, 70]
[80, 100] [80, 100]

The list next to each chunk is the pair (the longer
block) it came from.

Could anyone help me correct [21, 35], [36, 50] to
21 \t 34, 35 \t 50? I tried changing the indexes in the
function chunker, but it is not working for me.
Also, how can I map the chunks to their names?
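
For reference, here is a minimal sketch of one way to get chunks with the
boundaries and names asked about above. It is not the code from this post.
It is written for Python 3, the name label_segments is hypothetical, and it
assumes the coordinates are inclusive integers. The idea is to treat every
exon start, and every position just past an exon end, as a breakpoint, so a
chunk never borrows the start of the next exon as its own end.

```python
def label_segments(records):
    """Split exons into non-overlapping segments and list the names covering each.

    Hypothetical helper; assumes tab-separated 'start<TAB>end<TAB>name'
    records with inclusive integer coordinates.
    """
    exons = []
    for line in records:
        start, end, name = line.split('\t')
        exons.append((int(start), int(end), name))

    # Breakpoints: each exon start, and the position just after each exon end.
    points = set()
    for start, end, _ in exons:
        points.add(start)
        points.add(end + 1)
    points = sorted(points)

    segments = []
    for left, nxt in zip(points, points[1:]):
        right = nxt - 1
        # Names of all exons that fully cover the segment [left, right].
        names = sorted({name for start, end, name in exons
                        if start <= left and right <= end})
        if names:  # skip gaps that no exon covers, e.g. 51 - 59
            segments.append((left, right, names))
    return segments


dat = ['10\t50\tA', '10\t20\tB', '35\t50\tB',
       '60\t70\tA', '60\t70\tB', '80\t100\tA', '80\t100\tB']

for left, right, names in label_segments(dat):
    print('%d\t%d\t%s' % (left, right, ','.join(names)))
```

Note that this sketch does not join neighbouring segments that happen to
carry the same names; that would need an extra pass.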

This is a simplified example of my more complex
coordinates and their sequence names. I want to get the
simple code right first and then move on to the complex one.

Thank you very much for your valuable time. 



Result (what I am getting now):

[10, 20] [10, 50]
[21, 35] [10, 50]
[36, 50] [10, 50]
[60, 70] [60, 70]
[80, 100] [80, 100]



My code:




from sets import Set
dat = ['10\t50\tA', '10\t20\tB', '35\t50\tB',
'60\t70\tA', '60\t70\tB', '80\t100\tA', '80\t100\tB']

############
# creating a dictionary with coordinates as key and NM_ as value
#####

ekda = {}
for j in dat:
        cols = j.split('\t')
        ekda.setdefault(cols[0]+'\t'+cols[1],[]).append(cols[2])
######
#getting tab delim numbers only and not the A,B
bat = []
for j in dat:
        cols = j.split('\t')
        bat.append(cols[0]+'\t'+cols[1])
pairs = [ map(int, x.split('\t')) for x in bat ]


#####################################################################################
# This function takes pairs (from the above result) and merges
# overlapping ones into longer blocks (exons).
# For instance:
# 10 - 20; 14 - 25; 19 - 30; 40 - 50; 45 - 60; 70 - 80
# a = [[10,20],[14,25],[19,30],[40,50],[45,60],[70,80]]
# for j in exoner(a):
#       print j
#The result would be:
#10 - 30; 40 - 60; 70 - 80
#####################################################################################
def exoner(pairs):
        pairs.sort()
        i = iter(pairs)
        last = i.next()
        for current in i:
                if current[0] in xrange(last[0],last[1]):
                        if current[1] > last[1]:
                                last = [last[0], current[1]]
                        else:
                                last = [last[0],last[1]]
                else:
                        yield last
                        last = current
        yield last
lon = exoner(pairs)
#####################################################################################
## Here I am getting all the unique numbers in dat

nums = []
for j in pairs:
        for k in j:
                nums.append(k)
unm = Set(nums)
unums = []
for x in unm:
        unums.append(x)
unums.sort()
#####################################################################################
### This function takes a list of numbers and breaks it into pieces
## For instance [10,15,20,25,30]
#>>> i = [10,15,20,25,30]
#>>> chunker(i)
#[[10, 15], [16, 20], [21, 25], [26, 30]]
####

def chunker(lis):
        res = []
        res.append([lis[0],lis[1]])
        for m in range(2,len(lis)):
                res.append([lis[m-1]+1,lis[m]])
        return res
####
# Here I take each pair (longer block), loop over all the unique
# numbers (unums, from dat) and check whether each number is in the
# range of the pair. If so, I break that set of numbers within the
# pair's range into small blocks.
######
gdic = {}
unums.sort()
for pair in exoner(pairs):
        x = pair[0]
        y = pair[1]+1
        sml = []
        for k in unums:
                if k in range(x,y):
                        sml.append(k)
                else:
                        pass
        for j in chunker(sml):
                print j,pair
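
The merging step that exoner performs can also be sketched without the
xrange membership test. The version below is written for Python 3 and uses
a hypothetical name, merge_overlapping. It assumes inclusive coordinates, so
two blocks that share an endpoint (for example 10 - 20 and 20 - 30) are
merged, whereas the xrange test in exoner would keep them apart.

```python
def merge_overlapping(pairs):
    """Merge overlapping [start, end] pairs into longer blocks.

    Hypothetical helper; assumes inclusive integer coordinates.
    """
    merged = []
    for start, end in sorted(pairs):
        if merged and start <= merged[-1][1]:
            # Overlaps the block being built: extend its end if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # No overlap: start a new block.
            merged.append([start, end])
    return merged


a = [[10, 20], [14, 25], [19, 30], [40, 50], [45, 60], [70, 80]]
for block in merge_overlapping(a):
    print(block)
```

Because it builds new lists, this sketch leaves the input pairs unchanged
and unsorted, unlike exoner, which sorts them in place.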





