[Tutor] improving speed using and recalling C functions

Danny Yoo dyoo at hashcollision.org
Fri Apr 11 03:21:10 CEST 2014


Ok, good.

There's a few things you'll want to fix in your mymain() in order for
the profiler to work more effectively in pinpointing issues.



1.  Move functionality outside of "if __name__ == '__main__':"

At the moment, you've put the entire functionality of your program in
the body of that if statement within mymain.  That structure is
probably not right.

I see that this block is computing values for stepENE, nex, Lemin,
Lemax, and, conditionally, freq.  This should be lifted out into its
own function.  You should also note that, because 'freq' is computed
conditionally, there are certain code paths in which your mymain()
will fail.  This is most likely a bad thing.

Recommendation: have mymain() take in parameters.  Move LEstep()
toplevel.  Restructure to:

################################################
import sys

def mymain(stepENE, nex, Lemin, Lemax, freq):
    ## everything starting after "eel = list(range(nex))..."
    ##


if __name__ == '__main__':
    if len(sys.argv)<=1:
        stepENE, nex, Lemin, Lemax = LEstep(200)
    elif len(sys.argv) <= 2:
        stepENE, nex, Lemin, Lemax = LEstep(int(sys.argv[1]))
    else:
        stepENE, nex, Lemin, Lemax = LEstep(int(sys.argv[1]))
        freq=float(sys.argv[2])
    mymain(stepENE, nex, Lemin, Lemax, freq)
################################################



2.  Do not slice lists unless you really mean to do so.  I see a lot
of slices that do not look right.  Examples like:

################################################
for k in eel[:]:
    # code cut ...
################################################

Just loop over eel.  No copy necessary.   This is doing a lot more
memory copying over and over again.  Instead:

################################################
for k in eel:
    # code cut ...
################################################

Do this everywhere you're creating slices with [:], unless you really
need to copy.  This is happening in multiple places in the code.



3.  Be sure to remove dead variables.  There are variables here that
are not used.  omegacliston is dead code, for example.  You're
appending to it, but doing nothing with its value.


4.  Watch for repetitive code.  There's something about the use of
MYMAP1, MYMAP2, etc. that looks very suspicious.  That is, the block
here:

################################################
                oo = 0
                for k in eel:
                        MYMAP1[i, j, k] = MYMAP1[i, j, k] + myinternet[oo]
                        oo = oo + 1
                for k in eel:
                        MYMAP2[i, j, k] = MYMAP2[i, j, k] + myinternet[oo]
                        oo = oo + 1
                for k in eel:
                        MYMAP3[i, j, k] = MYMAP3[i, j, k] + myinternet[oo]
                        oo = oo + 1
                for k in eel:
                        MYMAP4[i, j, k] = MYMAP4[i, j, k] + myinternet[oo]
                        oo = oo + 1
                for k in eel:
                        MYMAP5[i, j, k] = MYMAP5[i, j, k] + myinternet[oo]
                        oo = oo + 1
################################################

feels repetitive and strange.  Martin Brown identifies this problem as
well, so at least we're on the same page.  Solving this problem takes
a little bit of work.


If you have five maps, with some kind of relationship, represent and
use that.  Ah.  You're already doing this by representing BIGMAPS at
reporting time.  Then move the definition of an array of maps to the
front of the code.  Use a container holding those five maps in a
single variable.  Call it MYMAPS.

################################################
MYMAPS = [np.zeros([npha, nobs, nex], dtype=float),
          np.zeros([npha, nobs, nex], dtype=float),
          np.zeros([npha, nobs, nex], dtype=float),
          np.zeros([npha, nobs, nex], dtype=float),
          np.zeros([npha, nobs, nex], dtype=float)]
################################################

Wen you're tempted to say MYMAP1, use MYMAPS[0].  MYMAP2 -->
MYMAPS[1], and so on.


This will allow you to dissolve a lot of complexity out of the code.
We'll see this in a moment.


5.  Change the computational structure.  The computation of MYMAPS
being done after the processing of gmlis is not right.  It's the whole
reason why there's this awkward intermediate myinternet structure
that's used just to fill in each MYMAP later.  Do the processing as
part of your earlier loop.

We know that gmlis is exactly five elements long.  Just say so:

################################################
               gmlis = [my_parts[7],
                         my_parts[8],
                         my_parts[9],
                         my_parts[10],
                         my_parts[11]]
################################################

Once you have this, and once you have a definition of MYMAPS, then you
can kill a lot of the code by doing a zip loop across them both:

                for gammar, MYMAP in zip(gmlis, MYMAPS):
   ## NOTE 1
                        omC = (1.5)*(gammar**3)*c/(rho*rlc)
                        gig = omC*hcut/eVtoErg
                        for w in eel:
                                omega = (10**(w*stepENE+Lemin))*eVtoErg/hcut
                                x = omega/omC
                                kap = instruments.kappa(x)
                                Iom = (1.732050808/c)*(e**2)*gammar*kap
                                P = Iom*(c/(rho*rlc))/(2*pi)
                                phps = P/(hcut*omega)
                                www = phps/(stepPHA*sin(zobs)*stepOB)
                                MYMAP[i, j, w] += www
              ## NOTE 2

This lets you get rid of the second half of your program, essentially.
 Rather than put intermediate values packed into myinternet, and then
follow that with a loop that unpacks those values into MYMAP, just put
the values in.  There's no need for the intermediate myinternet
accumulation variable.  You know exactly where the values are supposed
to go in MYMAP, so just do it!  :P  myinternet is a red herring: it
does no useful work.

Note: this generalization works if you start thinking of arrays rather
than individual variables.  As a general practice, if you find
yourself naming variables as x_1, x_2, x_3, x_4, ... x_n and dealing
with them case by case, you probably want to make an array instead and
deal with it uniformly.


More information about the Tutor mailing list