How can I make this piece of code even faster?

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sun Jul 21 01:11:51 EDT 2013


On Sat, 20 Jul 2013 13:22:03 -0700, pablobarhamalzas asked:

"How can I make this piece of code even faster?"

- Use a faster computer.
- Put in more memory.
- If using Unix or Linux, lower the process's "nice" value, which raises 
  its scheduling priority.

I mention these because sometimes people forget that if you have a choice 
between "spend 10 hours at $70 per hour to optimize code", and "spend 
$200 to put more memory in", putting more memory in may be more cost 
effective.

Other than that, what you describe sounds like it could be a good 
candidate for PyPy to speed the code up, although PyPy is still (mostly) 
Python 2. You could take this question to the pypy mailing list and ask 
there.

http://mail.python.org/mailman/listinfo/pypy-dev

You also might like to try Cython or Numba.

As far as pure-Python optimizations, once you have a decent algorithm, 
there's probably not a lot of room for major speed ups. But a couple of 
thoughts and a possible optimized version come to mind...

1) In general, it is better (and faster) to iterate over sequences 
directly than indirectly by index number:

for item in sequence:
    process(item)

rather than:

for i in range(len(sequence)):
    item = sequence[i]
    process(item)


If you need both the index and the value:

for i, item in enumerate(sequence):
    print(i, process(item))


In your specific case, if I have understood your code's logic, you can 
just iterate directly over the appropriate lists, once each.



2) You perform an exponentiation using math.e**(-temp). You will probably 
find that math.exp(-temp) is both faster and more accurate.
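For example, the two spellings agree to within floating-point rounding, 
and math.exp avoids the attribute lookup and the generic power machinery:

```python
import math

x = 0.5
a = math.e ** -x   # exponentiation through the generic ** operator
b = math.exp(-x)   # dedicated exp() function; typically faster
print(math.isclose(a, b))  # True
```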


3) If you need to add up a sequence of numbers, it is better to call 
sum() or math.fsum() than to accumulate them by hand. sum() may be a 
tiny bit faster, or maybe not, but fsum() is more accurate for floats.
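A quick illustration of the accuracy difference:

```python
import math

values = [0.1] * 10
print(sum(values))        # 0.9999999999999999 -- accumulated rounding error
print(math.fsum(values))  # 1.0 exactly
```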


See below for my suggestion on an optimized version.


> Ok, I'm working on a predator/prey simulation, which evolve using
> genetic algorithms. At the moment, they use a quite simple feed-forward
> neural network, which can change size over time. Each brain "tick" is
> performed by the following function (inside the Brain class):
> 
>     def tick(self):
>         input_num = self.input_num
>         hidden_num = self.hidden_num
>         output_num = self.output_num
>          
>         hidden = [0]*hidden_num
>         output = [0]*output_num
>         
>         inputs = self.input
>         h_weight = self.h_weight
>         o_weight = self.o_weight
>         
>         e = math.e
>         
>         count = -1
>         for x in range(hidden_num):
>             temp = 0
>             for y in range(input_num):
>                 count += 1
>                 temp += inputs[y] * h_weight[count]
>             hidden[x] = 1/(1+e**(-temp))
>         
>         count = -1
>         for x in range(output_num):
>             temp = 0
>             for y in range(hidden_num):
>                 count += 1
>                 temp += hidden[y] * o_weight[count]
>             output[x] = 1/(1+e**(-temp))
>              
>         self.output = output
> 
> The function is actually quite fast (~0.040 seconds per 200 calls, using
> 10 input, 20 hidden and 3 output neurons), and used to be much slower
> until I fiddled about with it a bit to make it faster. However, it is
> still somewhat slow for what I need.
>  
> My question to you is if you can see any obvious (or not so obvious) way
> of making this faster. I've heard about numpy and have been reading
> about it, but I really can't see how it could be implemented here.

Here's my suggestion:


    def tick(self):
        exp = math.exp
        fsum = math.fsum  # more accurate than builtin sum
        inputs = self.input

        # h_weight is a flat list of hidden_num*input_num weights: one
        # contiguous block of input_num weights per hidden neuron.
        n = self.input_num
        hidden = [
            1/(1 + exp(-fsum(i*w for i, w in zip(inputs, self.h_weight[k*n:(k+1)*n]))))
            for k in range(self.hidden_num)
            ]

        # o_weight is likewise one block of hidden_num weights per output neuron.
        m = self.hidden_num
        self.output = [
            1/(1 + exp(-fsum(h*w for h, w in zip(hidden, self.o_weight[k*m:(k+1)*m]))))
            for k in range(self.output_num)
            ]


I have neither tested that this works the same as your code (or even 
works at all!) nor that it is faster, but I would expect that it will be 
faster.
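Since you asked about numpy: each of your two loops is just a matrix-
vector product, so if the flat weight lists are reshaped into 2-D arrays 
the whole tick collapses to a couple of numpy calls. A rough sketch, 
untested against your class and assuming the weights really are laid out 
one contiguous block per neuron as above:

```python
import numpy as np

def tick(self):
    # Assumes self.h_weight can be reshaped to (hidden_num, input_num)
    # and self.o_weight to (output_num, hidden_num).
    h_w = np.asarray(self.h_weight).reshape(self.hidden_num, self.input_num)
    o_w = np.asarray(self.o_weight).reshape(self.output_num, self.hidden_num)
    inputs = np.asarray(self.input)

    hidden = 1/(1 + np.exp(-h_w.dot(inputs)))       # weighted sums + sigmoid
    self.output = 1/(1 + np.exp(-o_w.dot(hidden)))  # same for the output layer
```

In practice you would store the weights as arrays once, rather than 
reshaping on every tick.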

Good luck!



-- 
Steven


