[Tutor] improving speed using and recalling C functions

Martin A. Brown martin at linux-ip.net
Fri Apr 11 02:59:15 CEST 2014


Gabriele,

> but main is the program that contains everything.

And, that is precisely the point of profiling the thing that 
contains 'everything'.  Because the bottleneck is almost always 
somewhere inside of 'everything'.  But, you have to keep digging 
until you find it.

I saw that you replied to Danny Yoo with your code, and I have to 
say that this is rather domain-specific, so it may be quite 
difficult for somebody to glance at it and figure out where the 
hotspot is.  It is for this reason that we were asking about 
profiling.
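
For what it's worth, here is a minimal sketch of what I mean by profiling, using the standard library's cProfile and pstats modules.  The mymain() and slow_inner() below are hypothetical stand-ins (I don't know what your real functions do); the point is only that profiling the outermost call shows which inner call eats the time.  You can get a similar report without touching the code by running "python -m cProfile -s cumulative yourscript.py".

```python
import cProfile
import io
import pstats


def slow_inner(n):
    # Hypothetical hotspot: a deliberately wasteful pure-Python loop.
    total = 0
    for i in range(n):
        total += i * i
    return total


def mymain():
    # Hypothetical stand-in for the real mymain(); it "contains everything".
    return sum(slow_inner(20000) for _ in range(20))


# Profile only the call we care about.
profiler = cProfile.Profile()
profiler.enable()
result = mymain()
profiler.disable()

# Sort by cumulative time so that expensive callees show up near the top.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```

In the report, look at the rows with the largest cumulative time that are not mymain() itself; those are the places to keep digging.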

Some follow-on questions:

Code:  for my_line in open('datasm0_60_5s.dat')

Q:  How big is datasm0_60_5s.dat?  Unless there's a whitespace
     pasting issue, it looks like you are reading that file for each
     run through mymain().  Has this file changed in size recently?
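
     If it does turn out that the file is re-read on every call, one option is to read it once and hand the lines to the function.  A rough sketch, assuming a hypothetical load_lines() helper and a mymain() that accepts the lines as an argument (neither is from your code), with a small temporary file standing in for the real data file:

```python
import os
import tempfile


def load_lines(path):
    # Read the whole file once and keep the lines in memory.
    with open(path) as handle:
        return [line.rstrip("\n") for line in handle]


def mymain(lines, scale):
    # Hypothetical body: the real mymain() does something domain-specific.
    return sum(float(line) * scale for line in lines)


# A small temporary file stands in for the real data file.
with tempfile.NamedTemporaryFile("w", suffix=".dat", delete=False) as tmp:
    tmp.write("1.0\n2.0\n3.0\n")
    path = tmp.name

lines = load_lines(path)  # one read, outside the repeated calls
results = [mymain(lines, scale) for scale in (1, 2, 3)]
os.remove(path)
print(results)
```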

Code:  kap = instruments.kappa(x)

Q:  What is instruments?  A module?  Is the performance hit there?

Code:

        for k in eel[:]:
            MYMAP1[i, j, k] = MYMAP1[i, j, k] + myinternet[oo]
            oo = oo + 1

        for k in eel[:]:
            MYMAP2[i, j, k] = MYMAP2[i, j, k] + myinternet[oo]
            oo = oo + 1

         ...

Comment:  You are looping over your sliced eel five times.  Do you
    need to?  I like eel salad a great deal, as well, but, how about:

        for k in eel:
            MYMAP1[i, j, k] = MYMAP1[i, j, k] + myinternet[oo]
            MYMAP2[i, j, k] = MYMAP2[i, j, k] + myinternet[oo]
            MYMAP3[i, j, k] = MYMAP3[i, j, k] + myinternet[oo]
            MYMAP4[i, j, k] = MYMAP4[i, j, k] + myinternet[oo]
            MYMAP5[i, j, k] = MYMAP5[i, j, k] + myinternet[oo]
            oo = oo + 1

That should cut down a bit of looping time.  Especially as the eel 
grows longer.
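
One caveat on merging the loops.  The excerpt does not show whether oo is reset between the original loops.  If it is not, each map reads a different stretch of myinternet, and the merged loop needs an offset to produce the same result.  Here is a small sketch of that case with two maps, using plain dictionaries and made-up values for eel, myinternet, i and j (all assumptions on my part):

```python
from collections import defaultdict

eel = [0, 1, 2]                                      # hypothetical indices
myinternet = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]    # hypothetical data
i, j = 0, 0


def separate_loops():
    MYMAP1, MYMAP2 = defaultdict(float), defaultdict(float)
    oo = 0
    for k in eel:
        MYMAP1[i, j, k] += myinternet[oo]
        oo += 1
    for k in eel:  # oo is NOT reset here, so it keeps counting
        MYMAP2[i, j, k] += myinternet[oo]
        oo += 1
    return MYMAP1, MYMAP2


def merged_loop():
    MYMAP1, MYMAP2 = defaultdict(float), defaultdict(float)
    n = len(eel)
    for oo, k in enumerate(eel):
        MYMAP1[i, j, k] += myinternet[oo]
        MYMAP2[i, j, k] += myinternet[oo + n]  # offset by one full pass
    return MYMAP1, MYMAP2


print(separate_loops() == merged_loop())
```

If oo is reset to the same starting value before each loop, the offset is unnecessary and every map can use myinternet[oo] directly.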

Another suggestion, that is more along the lines of "how do I figure 
out what's broken this time in my code".  I almost always add the 
logging module to any program larger than a few lines.  Why? 
Because then, I can simply add logger lines and see what's going on. 
Since I'm a perfect programmer and, like you, I don't make mistakes, 
I never need this, but I do it anyway to look good around my 
colleagues (best practices and all).

In seriousness, using logging [0] is not at all tricky for 
standalone scripts.  (It does get a bit more involved when you are 
importing modules and libraries.)  Consider the following:

   import sys
   import logging
   logformat = '%(asctime)s %(name)s %(levelname)s %(message)s'
   logging.basicConfig(format=logformat, stream=sys.stderr, level=logging.INFO)
   # -- use the root logger when run as a script, otherwise a logger
   #    named after the importing module
   logger = logging.getLogger({ '__main__': None }.get(__name__, __name__))

With that setup at the top of the program, now you can sprinkle 
lines like this throughout your code with impunity.

   import os
   # -- calling logger.info() will print stuff to STDERR
   logger.info("silly example %r", os.environ)

   # -- calling logger.debug() will not print to STDERR
   #    using the above config
   logger.debug("debug example %d", 1)

   # -- Ok, set it so anything that is set to logging.DEBUG (or
   #    higher) is shown
   logger.setLevel(logging.DEBUG)
   logger.debug("debug example %d", 2)

   # -- and restore the prior pattern; resetting so newer .debug lines
   #    are not shown
   logger.setLevel(logging.INFO)
   logger.debug("debug example %d", 3)

OK, so why is this useful?  Well, timestamps in log lines are one 
reason.  Another reason is the typical diagnostic technique.... 
"What is the value of variable x, y, z, oo, text_contens?"
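
As a rough illustration of both uses, here is a self-contained sketch that logs variable values inside a loop at DEBUG level and an elapsed time at INFO level.  The process() function, the logger name and the in-memory stream are all made up for the example; in a real script you would log to sys.stderr as in the setup above.

```python
import io
import logging
import time

# Hypothetical in-memory stream so the output can be inspected;
# sys.stderr works the same way.
log_buffer = io.StringIO()
handler = logging.StreamHandler(log_buffer)
handler.setFormatter(
    logging.Formatter('%(asctime)s %(name)s %(levelname)s %(message)s'))

logger = logging.getLogger("timing_demo")
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger


def process(values):
    # Hypothetical work loop: log each step, then the total elapsed time.
    start = time.perf_counter()
    total = 0
    for oo, x in enumerate(values):
        total += x * x
        logger.debug("oo=%d x=%r total=%r", oo, x, total)
    elapsed = time.perf_counter() - start
    logger.info("process() handled %d values in %.6f s", len(values), elapsed)
    return total


result = process([1, 2, 3])
print(log_buffer.getvalue())
```

Once the hotspot is found, setting the level back to logging.INFO silences the per-iteration lines without deleting them.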

Good luck tracking down your performance issue!

-Martin

  [0] https://docs.python.org/2/library/logging.html
      https://docs.python.org/3/library/logging.html

-- 
Martin A. Brown
http://linux-ip.net/

