find overlapping lines & output times observed

Linsey Raaijmakers lm.raaijmakers at gmail.com
Mon May 6 14:39:23 EDT 2013


Hello,

I have a file like this:
action	start	end
50	5321	5321
7	5323	5347
12	5339	5351
45	5373	5373
45	5420	5420
25	5425	5425
26	5425	5425
50	5451	5451
45	5452	5452
14	5497	5503
26	5513	5513
25	5517	5517
45	5533	5533
50	5533	5533
5	5537	5540
25	5580	5580
45	5586	5586
26	5595	5595
45	5603	5603
50	5634	5634
45	5645	5645
7	5657	5689
25	5682	5682
26	5682	5690
26	5708	5708
45	5717	5717
50	5740	5740
45	5777	5777
45	5804	5804
7	5805	5845

and I want to find how many times combinations of actions occur within the same time frame (between columns 2 and 3). A combination can involve more than two actions, which is my problem now.
I have no problem finding the overlap between 2 actions.
I want to start with the first line, action 50, and check all remaining lines in the file for lines that overlap this action.
So the first line has no overlap, but actions 25 and 26 would be a combination that overlaps, and so would 45 and 50 (5533-5533). Actions 7, 25 and 26 would be a combination of 3 (5657-5689, 5682-5682 and 5682-5690), because these three overlap each other.
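
To make "overlap" concrete, here is a minimal sketch of the test for two rows. It assumes that start and end are both inclusive, so two rows that only share a single time value still count as overlapping. The function name `overlaps` is made up for this illustration.

```python
def overlaps(a, b):
    """Return True if the closed intervals a and b share at least one point.

    a and b are (start, end) pairs with start <= end.
    """
    return a[0] <= b[1] and b[0] <= a[1]

# Rows taken from the file above, as (start, end) pairs.
print(overlaps((5533, 5533), (5533, 5533)))  # actions 45 and 50
print(overlaps((5657, 5689), (5682, 5690)))  # actions 7 and 26
print(overlaps((5321, 5321), (5323, 5347)))  # actions 50 and 7
```

Unlike a containment check, this also accepts rows that overlap only partially, such as 5657-5689 and 5682-5690.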

I have a script now that identifies overlap between two actions (see the bottom of this message), but how can I change it so that it outputs all possible combinations?
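
One possible direction, offered as a sketch rather than a finished solution: sort the rows by start time, and for each row collect the earlier rows that are still running when it starts. All of those rows contain that start time, so any subset of them together with the current row overlaps mutually, and each combination is counted once, at its latest-starting member. The function name `overlapping_combinations`, the inclusive bounds and the sorted-tuple key are assumptions made for this example. The number of subsets grows exponentially with the number of simultaneously active rows, which should be fine for a file of this size.

```python
from collections import Counter
from itertools import combinations

def overlapping_combinations(rows):
    """Count every group of two or more rows whose intervals all share a point.

    rows: list of (action, start, end) tuples with inclusive bounds.
    Returns a Counter keyed by a sorted tuple of action numbers.
    """
    rows = sorted(rows, key=lambda r: r[1])  # order by start time
    counts = Counter()
    for k, (action, start, end) in enumerate(rows):
        # Earlier rows in the sorted order that are still running at 'start'.
        active = [r for r in rows[:k] if r[2] >= start]
        for size in range(1, len(active) + 1):
            for group in combinations(active, size):
                key = tuple(sorted([r[0] for r in group] + [action]))
                counts[key] += 1
    return counts

# A few rows copied from the file above.
rows = [(7, 5657, 5689), (25, 5682, 5682), (26, 5682, 5690),
        (45, 5533, 5533), (50, 5533, 5533), (50, 5321, 5321)]
for combo, n in sorted(overlapping_combinations(rows).items()):
    print(",".join(map(str, combo)) + "\t" + str(n))
```

For these six rows the sketch reports the pairs 7,25 and 7,26 and 25,26 and 45,50, plus the triple 7,25,26, each observed once.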

My desired output would be:

action    times observed    apex
50        5                 5321, 5451, 5533, 5634, 5740
50,45     1                 5533;5533
7         4                 5347, 5689, 5688, 5845
7,25      2                 5347;5425, 5689;5682
7,25,26   1                 5689;5682;5690
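
As an illustration of this layout only, the sketch below prints such a table from results that have already been collected. It assumes the results are held as a list of (actions, apex groups) pairs, where each apex group holds one value per action; the name `format_table` and the column widths are made up for this example, and the two sample entries are copied by hand from the table above.

```python
def format_table(observations):
    """Format (actions, apex_groups) pairs as three aligned columns.

    observations: list of (tuple_of_actions, list_of_apex_tuples) pairs,
    with one apex tuple per time the combination was observed.
    """
    lines = ["%-10s%-18s%s" % ("action", "times observed", "apex")]
    for combo, apex_groups in observations:
        name = ",".join(str(a) for a in combo)
        # Apex values within one observation are joined with ';',
        # separate observations with ', '.
        apex_text = ", ".join(";".join(str(t) for t in group)
                              for group in apex_groups)
        lines.append("%-10s%-18d%s" % (name, len(apex_groups), apex_text))
    return "\n".join(lines)

# Hand-written sample entries, copied from the desired output above.
observations = [
    ((50, 45), [(5533, 5533)]),
    ((7, 25, 26), [(5689, 5682, 5690)]),
]
print(format_table(observations))
```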

CODE: 

from collections import Counter

action_list = []
onset_list = []
apex_list = []
offset_list = []
combination_list = []

# Read the file; 'with' closes it automatically when the block ends.
with open('and.txt', 'r') as f:
  for line in f:
    fields = line.split()
    # Skip blank lines and the "action start end" header line.
    if not fields or not fields[0].isdigit():
      continue
    # Store the values as int so that comparisons are numeric, not string-based.
    action_list.append(int(fields[0]))
    onset_list.append(int(fields[1]))
    apex_list.append(int(fields[2]))
    # The sample above has three columns; keep offset only if a fourth one exists.
    if len(fields) > 3:
      offset_list.append(int(fields[3]))

# Count how often each action occurs.
c = Counter(action_list)
for a in sorted(c):
  print(str(a) + "\t" + str(c[a]))

# Report every pair where the interval of row j lies inside the interval of row i.
for i in range(len(onset_list)):
  combination_list.append(action_list[i])
  for j in range(len(apex_list)):
    if i != j:
      if onset_list[j] >= onset_list[i] and apex_list[j] <= apex_list[i]:
        print("%d,%d\t%d\t%d\t%d\t%d" % (action_list[j], action_list[i], onset_list[j], apex_list[j], onset_list[i], apex_list[i]))


I hope somebody can help me :)


