[Tutor] (no subject)
bob gailer
bgailer at gmail.com
Fri Jan 8 00:00:33 CET 2010
kumar s wrote:
> dear tutors:
> I have two files. I want to take the coordinates of a row in fileA and check whether they fall within the range of coordinates in fileB. If they do, I want to map the two rows; otherwise, skip the row.
> thanks
> kumar
>
> file a:
> name loc x y
> a 4 40811596 40811620
> b 4 40811619 40811643
> c 4 40811649 40811673
> d 4 40811734 40811758
> e 4 40811797 40811821
> f 4 40811817 40811841
> g 4 40811895 40811919
> h 4 40811938 40811962
>
>
>
> file b:
>
> zx zy
> z1 4 + 40810323 40812000
> z2 4 + 40810323 40812000
> z3 4 + 40810323 40812000
> z4 4 + 40810323 40812000
> z5 4 + 40810323 40812000
> z6 4 + 40810323 40812000
> z7 4 + 40810323 40812000
> z8 4 + 40810323 40812000
>
>
>
>
> I want to take coordinates x and y from each row in file a, and check if they are in the range of zx and zy. If they are in range, then I want to write both matched rows as a single tab-delimited row.
>
>
> my code:
>
> f1 = open('fileA','r')
> f2 = open('fileB','r')
> da = f1.read().split('\n')
> dat = da[:-1]
> ba = f2.read().split('\n')
> bat = ba[:-1]
>
>
> for m in dat:
>     col = m.split('\t')
>     for j in bat:
>         cols = j.split('\t')
>         if col[1] == cols[1]:
>             xc = int(cols[2])
>             yc = int(cols[3])
>             if int(col[2]) in xrange(xc,yc):
>                 if int(col[3]) in xrange(xc,yc):
>                     print m+'\t'+j
>
> output:
> a 4 40811596 40811620 z1 4 + 40810323 40812000
>
>
>
> This code is too slow. Could you experts help me make the script a lot faster?
> In each file I have over 50K rows and the script runs very slow.
>
Suggestions:
Convert the values to integers once, outside the comparison loop.
Test for >= lower value and <= upper value; building an xrange just to
test membership is overkill. Be aware of Python's chained comparison:
lower <= x <= upper.
Use:
for m in f1:
...
for j in f2:
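
A minimal sketch of the suggestions above, not a definitive implementation. It assumes whitespace-separated columns laid out as in the sample data (file a: name, loc, x, y; file b: name, loc, strand, zx, zy) and Python 3 syntax. The helper names load_ranges and match_rows are hypothetical. File b is parsed and converted to integers once, grouped by loc, so the inner loop only does chained comparisons. Note that lower <= x <= upper includes the upper bound, whereas xrange(xc, yc) in the original excludes it.

```python
import io
from collections import defaultdict


def load_ranges(lines):
    """Parse file-b rows once, grouping (zx, zy, row) tuples by loc."""
    ranges = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank lines and the short header row
        name, loc, strand, lo, hi = fields
        ranges[loc].append((int(lo), int(hi), line.rstrip('\n')))
    return ranges


def match_rows(a_lines, ranges):
    """Yield tab-joined rows where both x and y lie inside a file-b range."""
    for line in a_lines:
        fields = line.split()
        if len(fields) != 4 or not fields[2].isdigit():
            continue  # skip blank lines and the header row
        x, y = int(fields[2]), int(fields[3])
        # Only ranges with the same loc are candidates.
        for lo, hi, b_row in ranges.get(fields[1], ()):
            if lo <= x <= hi and lo <= y <= hi:
                yield line.rstrip('\n') + '\t' + b_row


if __name__ == '__main__':
    # In-memory stand-ins for fileA and fileB; real code would pass open files.
    file_a = io.StringIO("name\tloc\tx\ty\na\t4\t40811596\t40811620\n")
    file_b = io.StringIO("zx zy\nz1\t4\t+\t40810323\t40812000\n")
    for row in match_rows(file_a, load_ranges(file_b)):
        print(row)
```

This still compares each file-a row against every file-b range that shares its loc, so it removes the repeated splitting and int() calls rather than the nested loop itself.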
--
Bob Gailer
Chapel Hill NC
919-636-4239