[Tutor] Code optimisation
Alan Gauld
alan.gauld at btinternet.com
Sat Apr 5 09:44:00 CEST 2008
"yogi" <byogi at yahoo.com> wrote
> #/bin/python
> import sys, os, csv, re
> x = 0 #Define Zero for now
> var = 1000000 #Taking the variation
> # This programme finds the SNPs from the range passed
> # csv splits columns and this file is tab spaced
> fis = csv.reader(open("divs.map", "rb"), delimiter='\t',
> quoting=csv.QUOTE_NONE)
> for row in fis:
> # csv splits columns and this file is "," spaced
> gvalues = csv.reader(open("genvalues", "rb"), delimiter=',',
> quoting=csv.QUOTE_NONE)
Move this outside the loop, otherwise you re-read the file
for every line in the other file - slow! Note that a csv reader
can only be iterated once, so store its rows in a list first.
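A minimal sketch of that change, written for Python 3 (so print is a function) and assuming the column layout of the original script. The function name find_matches and the sample data are made up for illustration, and only the first range check is shown to keep it short.

```python
import csv
import io

VAR = 1000000  # same variation as in the original script


def find_matches(map_file, gen_file, var=VAR):
    """Return rows of map_file whose position falls inside a widened range."""
    # Read the gene values once, up front. A csv.reader is exhausted after
    # a single pass, so the rows are kept in a list for reuse.
    gvalues = list(csv.reader(gen_file, delimiter=",", quoting=csv.QUOTE_NONE))
    matches = []
    for row in csv.reader(map_file, delimiter="\t", quoting=csv.QUOTE_NONE):
        for gvalue in gvalues:
            if row[0] != gvalue[0]:
                continue  # different chromosome
            lo = int(gvalue[1]) - var
            hi = int(gvalue[2]) + var + 1
            if lo <= int(row[3]) <= hi:
                matches.append(row)
    return matches


# Invented sample data standing in for divs.map and genvalues
map_data = io.StringIO("1\trs1\t0\t1500000\n2\trs2\t0\t9000000\n")
gen_data = io.StringIO("1,1000000,2000000,0\n2,100,200,0\n")
for match in find_matches(map_data, gen_data):
    print(match)
```

With real files you would pass open("divs.map", newline="") and open("genvalues", newline="") instead of the StringIO objects.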
> for gvalue in gvalues:
> # To see Columns (chr) Match
> if row[0] == gvalue[0]:
> # If Column 3 (range) is Zero print row
> if int(gvalue[3]) == x:
> a = int(gvalue[1]) - var
> b = int(gvalue[2]) + var + 1
> if int(a <= int(row[3]) <= b):
> print row
I'd probably use names like 'lo' and 'hi' instead of 'a'
and 'b', but that's a nitpick. More importantly, you don't want
to convert the result of the test to an int: the result is a
boolean and you never use the int you create, so it's just
wasted processing power...
> # If Column 3 (range) is not zero find matches and print row
> else:
> a = int(gvalue[1]) - var
> b = int(gvalue[2]) + var + 1
Repeated code: you could move this above the if test
since it's used by both conditions. Easier to maintain if
you change the rules...
> if int(a <= int(row[3]) <= b):
> print row
Again, you don't need the int() conversion.
> c = int(gvalue[3]) - var
> d = int(gvalue[4]) + var + 1
> if int(c <= int(row[3]) <=
> d):
And again. You do this so often I'd consider making it a
helper function:

    def inLimits(minval, maxval, val):
        # minval/maxval avoid shadowing the built-in min() and max()
        lo = int(minval) - var
        hi = int(maxval) + var + 1
        return lo <= int(val) <= hi

Your else clause then becomes:

    else:
        if inLimits(gvalue[1], gvalue[2], row[3]):
            print row
        if inLimits(gvalue[3], gvalue[4], row[3]):
            print row

Which is slightly more readable I think.
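For anyone who wants to try it, here is a self-contained sketch of the same idea in Python 3. The names in_limits and matching_ranges are mine, and passing var as a parameter (rather than reading a global) is an assumption made so the functions can be tested in isolation.

```python
def in_limits(low, high, val, var=1000000):
    """Return True if val lies between low - var and high + var + 1."""
    lo = int(low) - var
    hi = int(high) + var + 1
    return lo <= int(val) <= hi


def matching_ranges(row, gvalue, var=1000000):
    """Count how many of gvalue's ranges contain the position in row[3]."""
    hits = 0
    if in_limits(gvalue[1], gvalue[2], row[3], var):
        hits += 1
    # As in the original script, a zero in column 3 means there is
    # no second range to check.
    if int(gvalue[3]) != 0 and in_limits(gvalue[3], gvalue[4], row[3], var):
        hits += 1
    return hits


# Invented sample rows, as lists of strings like the csv module returns
row = ["1", "rs1", "0", "1500000"]
gvalue = ["1", "1000000", "2000000", "5000000", "6000000"]
print(matching_ranges(row, gvalue))
```

The original prints the row once per matching range; returning a count keeps that information while leaving the printing to the caller.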
> Question1 : Is there a better way ?
There's always a better way.
As a general rule, for processing large volumes of data
I tend to go for a SQL database. But that's mainly based
on my experience that if you want to do one lot of queries
you'll eventually want to do more - and SQL is designed
for doing queries on large datasets, Python isn't (although
Python can do SQL...).
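As a rough illustration of the SQL route, here is a sketch using the sqlite3 module from the standard library with an in-memory database. The table names, column names and sample rows are invented for the example. Only the first range check is expressed in the query.

```python
import sqlite3

VAR = 1000000  # same variation as in the original script

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snps (chrom TEXT, name TEXT, pos INTEGER)")
conn.execute(
    "CREATE TABLE genes (chrom TEXT, range_start INTEGER, range_end INTEGER)"
)
conn.executemany(
    "INSERT INTO snps VALUES (?, ?, ?)",
    [("1", "rs1", 1500000), ("2", "rs2", 9000000)],
)
conn.executemany(
    "INSERT INTO genes VALUES (?, ?, ?)",
    [("1", 1000000, 2000000), ("2", 100, 200)],
)

# The join replaces the nested loops; the WHERE clause is the range test.
query = """
    SELECT s.chrom, s.name, s.pos
    FROM snps AS s
    JOIN genes AS g ON s.chrom = g.chrom
    WHERE s.pos BETWEEN g.range_start - :var AND g.range_end + :var + 1
"""
rows = conn.execute(query, {"var": VAR}).fetchall()
print(rows)
conn.close()
```

For real data you would load the two files into the tables once (the csv module can feed executemany) and then run as many different queries as you like without rereading the files.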
> Question2 : For now I'm using shells time call for calculating
> time required. Does Python provide a more fine grained check.
Try the timeit module...
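A small sketch of how that might look in Python 3. The function being timed is just the range test from earlier in this message, repeated here so the example stands alone, and the call count is arbitrary.

```python
import timeit


def in_limits(low, high, val, var=1000000):
    """The range test being timed."""
    lo = int(low) - var
    hi = int(high) + var + 1
    return lo <= int(val) <= hi


# timeit.timeit runs the callable `number` times and returns the
# total elapsed time in seconds as a float.
calls = 100000
elapsed = timeit.timeit(
    lambda: in_limits("1000000", "2000000", "1500000"), number=calls
)
print("%d calls took %.3f seconds" % (calls, elapsed))
```

The module can also be run from the shell, for example python -m timeit "int('42')", which is handy for quick one-line comparisons.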
> Question 2: If I have convert this code into a function.
> Should I ?
Only if you need to reuse it in a bigger context
or if you want to parameterize it. You could maybe break
it out into smaller helper functions such as the one I
suggested above.
HTH,
--
Alan Gauld
Author of the Learn to Program web site
http://www.freenetpages.co.uk/hp/alan.gauld