sorting data

Ricardo Aráoz ricaraoz at gmail.com
Mon Oct 29 12:32:55 EDT 2007


Beema shafreen wrote:
> hi all,
> I have a problem sorting data. The file looks like this:
> chrX:    123343123    123343182    A_16_P41787782
> chrX:    123343417    123343476    A_16_P03762840
> chrX:    123343460    123343519    A_16_P41787783
> chrX:    12334336    12334395    A_16_P03655927
> chrX:    123343756    123343815    A_16_P03762841
> chrX:    123343807    123343866    A_16_P41787784
> chrX:    123343966    123344024    A_16_P21578670
> chrX:    123344059    123344118    A_16_P21578671
> chrX:    12334438    12334497    A_16_P21384637
> chrX:    123344776    123344828    A_16_P21578672
> chrX:    123344811    123344870    A_16_P03762842
> chrX:    123345165    123345224    A_16_P41787789
> chrX:    123345360    123345419    A_16_P41787790
> chrX:    123345380    123345439    A_16_P03762843
> chrX:    123345481    123345540    A_16_P41787792
> chrX:    123345873    123345928    A_16_P41787793
> chrX:    123345891    123345950    A_16_P03762844
> 
> 
> How do I sort the file on columns 1 and 2 by their values? Using sort()
> works for only one column, not the other. How do I sort on both the 1st
> and 2nd columns so that the third column does not change?
> My script:
> #sorting the file
> start_lis = []
> end_lis = []
> fh = open('chromosome_location_346010.bed','r')
> for line in fh.readlines():
>         data = line.strip().split('\t')
>         start = data[1].strip()
>         end = data[2].strip()
>         probe_id  = data[3].strip()
>         start_lis.append(start)
>        end_lis.append(end)
> start_lis.sort()
> end_lis.sort()
> for k in start_lis:
>      for i in end_lis
>                print k , i , probe_id(this doesn't work)
>       result = start#end#probe_id -------> this doesn't work...
>         print result
>  
> What is the error, and how do I sort a file on two columns so that the
> fourth column stays with them?
> regards
> shafreen
> 
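Two things go wrong in your script. First, sorting start_lis and end_lis separately breaks the link between each start, its end and its probe ID, and probe_id in the final loop only holds the value from the last line read (the "for i in end_lis" line also needs a trailing colon). Second, the values are strings, and strings are compared character by character, so a smaller number can sort after a larger one. A small sketch using three start values from your data (the list name starts is just for illustration):

```python
# Three start values copied from the sample data, still as strings.
starts = ["123343123", "12334336", "12334438"]

# String comparison goes character by character, so 123343123 comes first
# even though it is the largest number.
print(sorted(starts))           # ['123343123', '12334336', '12334438']

# Using int as the key compares the values numerically instead.
print(sorted(starts, key=int))  # ['12334336', '12334438', '123343123']
```

Numeric comparison (or fixed-width padding) fixes the ordering; keeping each line's fields together in one tuple fixes the association.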

I don't know if this is what you are looking for:

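dataList = []

# Each line is tab-separated: chromosome, start, end, probe id
with open('chromosome_location_346010.bed') as fh:
    for line in fh:
        data = line.strip().split('\t')
        start = data[1].strip()
        end = data[2].strip()
        probe_id = data[3].strip()
        dataList.append((start, end, probe_id))

# Sort on start (x[0]), then end (x[1]). Right-justifying the digit
# strings to a fixed width makes string order match numeric order.
# Each tuple carries its probe id, so the columns stay together.
dataList.sort(key=lambda x: x[0].rjust(20) + x[1].rjust(20))

for start, end, probe_id in dataList:
    print('Start :', start.rjust(11),
          '  - End :', end.rjust(11),
          '  - Probe :', probe_id)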
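The fixed-width padding works as long as every value is a non-negative integer of at most 20 digits. A sketch of an alternative, assuming the same tab-separated layout: convert the two columns to int inside the sort key and sort whole lines, so every field, including "chrX:" and the probe ID, stays on its row. The function name sort_by_start_end is only illustrative.

```python
def sort_by_start_end(lines):
    """Hypothetical helper: sort tab-separated lines by start, then end.

    Assumes each line looks like 'chrX:<TAB>start<TAB>end<TAB>probe_id'.
    """
    # Split every non-blank line into its fields, dropping the line ending.
    rows = [line.rstrip('\r\n').split('\t') for line in lines if line.strip()]
    # The key is a tuple of ints: compare start numerically, then end on ties.
    rows.sort(key=lambda fields: (int(fields[1]), int(fields[2])))
    # Re-join the fields so each probe ID stays on its own row.
    return ['\t'.join(fields) for fields in rows]


sample = [
    "chrX:\t123343123\t123343182\tA_16_P41787782\n",
    "chrX:\t12334438\t12334497\tA_16_P21384637\n",
    "chrX:\t12334336\t12334395\tA_16_P03655927\n",
]

for row in sort_by_start_end(sample):
    print(row)
```

Since the helper only needs an iterable of lines, an open file object for chromosome_location_346010.bed can be passed in directly, and the result written back with '\n'.join(...).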





More information about the Python-list mailing list