[Tutor] (no subject)

bob gailer bgailer at gmail.com
Fri Jan 8 00:00:33 CET 2010


kumar s wrote:
> dear tutors:
> I have two files. I want to take the coordinates of a row in fileA and find if they are in the range of coordinates in fileB. If they are, I want to map them; otherwise, pass. 
> thanks
> kumar
>
> file a:
> name     loc          x       y
> a	4	40811596	40811620
> b	4	40811619	40811643
> c	4	40811649	40811673
> d	4	40811734	40811758
> e	4	40811797	40811821
> f	4	40811817	40811841
> g	4	40811895	40811919
> h	4	40811938	40811962
>
>
>
> file b:
>
>                               zx       zy
> z1	4	+	40810323	40812000
> z2	4	+	40810323	40812000
> z3	4	+	40810323	40812000
> z4	4	+	40810323	40812000
> z5	4	+	40810323	40812000
> z6	4	+	40810323	40812000
> z7	4	+	40810323	40812000
> z8	4	+	40810323	40812000
>
>
>
>
> I want to take coordinates x and y from each row in file a, and check if they are in range of zx and zy. If they are in range, then I want to write both matched rows as a single tab-delimited row. 
>
>
> my code:
>
> f1 = open('fileA','r')
> f2 = open('fileB','r')
> da = f1.read().split('\n')
> dat = da[:-1]
> ba = f2.read().split('\n')
> bat = ba[:-1]
>
>
> for m in dat:
>         col = m.split('\t')
>         for j in bat:
>                 cols = j.split('\t')
>                 if col[1] == cols[1]:
>                         xc = int(cols[2])
>                         yc = int(cols[3])
>                         if int(col[2]) in xrange(xc,yc):
>                                 if int(col[3]) in xrange(xc,yc):
>                                         print m+'\t'+j
>
> output:
> a	4	40811596	40811620    z1 4 +  40810323     40812000
>
>
>
> This code is too slow. Could you experts help me make the script a lot faster? 
> In each file I have over 50K rows and the script runs very slowly. 
>   

Suggestions:

Translate the values to integer outside the comparison loop.
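
A minimal sketch of that idea, assuming tab-separated rows with no header
line and the column layout of the sample data above. Note that the sample
fileB rows carry a strand column ('+'), so zx and zy sit at indices 3 and 4
rather than the 2 and 3 used in the posted code. The helper names parse_a
and parse_b are made up for illustration.

```python
def parse_a(line):
    """Split one fileA row (name, loc, x, y); convert x and y to int once."""
    name, loc, x, y = line.rstrip("\n").split("\t")
    return name, loc, int(x), int(y)


def parse_b(line):
    """Split one fileB row (name, loc, strand, zx, zy); convert zx and zy once."""
    name, loc, strand, zx, zy = line.rstrip("\n").split("\t")
    return name, loc, strand, int(zx), int(zy)
```

Parsing each row a single time means the int() calls run once per row
instead of once per pair of rows.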

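Test for >= lower value and <= upper value. xrange is overkill: in
Python 2, "x in xrange(lower, upper)" walks the range one value at a
time, and it also excludes the upper value. Be aware of Python's
chained comparison:
lower <= x <= upper.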
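
As a small illustration of the chained comparison, here is a sketch using
the zx and zy values from the sample data. The function name in_range is
hypothetical. It treats both ends as inclusive, which differs slightly from
the posted xrange test at the upper boundary.

```python
def in_range(value, lower, upper):
    """Return True if lower <= value <= upper (both ends inclusive)."""
    return lower <= value <= upper


zx, zy = 40810323, 40812000
print(in_range(40811596, zx, zy))  # True
print(in_range(40812000, zx, zy))  # True here; xrange(zx, zy) would exclude zy
print(in_range(40812001, zx, zy))  # False
```

The comparison costs the same regardless of how wide the interval is.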

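Iterate over the files directly instead of read().split('\n'):
for m in f1:
  ...
  for j in f2:
    ...
A file object is exhausted after one pass, so for the inner loop either
load fileB into a list once or call f2.seek(0) before each pass.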
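
Putting the suggestions together, one possible sketch follows. It assumes
tab-separated files without header lines and the sample column layout
(fileB has a strand column, so zx and zy are at indices 3 and 4). The
function name match_rows is made up for illustration, and the code uses
Python 3 style file handling.

```python
def match_rows(path_a, path_b):
    """Yield 'rowA<TAB>rowB' for each fileA row whose x and y both lie
    within zx..zy of a fileB row with the same loc value."""
    # Parse fileB once up front: the ints are converted a single time,
    # and the list can be re-scanned for every fileA row.
    regions = []
    with open(path_b) as f2:
        for j in f2:
            j = j.rstrip("\n")
            cols = j.split("\t")
            if len(cols) < 5:
                continue  # skip blank or malformed lines
            regions.append((cols[1], int(cols[3]), int(cols[4]), j))

    with open(path_a) as f1:
        for m in f1:
            m = m.rstrip("\n")
            col = m.split("\t")
            if len(col) < 4:
                continue  # skip blank or malformed lines
            loc, x, y = col[1], int(col[2]), int(col[3])
            for zloc, zx, zy, j in regions:
                if loc == zloc and zx <= x <= zy and zx <= y <= zy:
                    yield m + "\t" + j
```

Usage would look like "for row in match_rows('fileA', 'fileB'): print(row)".
The nested loop still compares every pair of rows, so this removes the
per-comparison overhead but not the overall rows-times-rows cost.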


-- 
Bob Gailer
Chapel Hill NC
919-636-4239

