How can I make this piece of code even faster?

Sat Jul 20 17:25:52 EDT 2013

In article <6bf4d298-b425-4357-9c1a-192e6e6cd9f0 at googlegroups.com>,
 pablobarhamalzas at gmail.com wrote:

> Ok, I'm working on a predator/prey simulation in which the predators and 
> prey evolve using genetic algorithms. At the moment, they use a quite simple 
> feed-forward neural network, which can change size over time. Each brain 
> "tick" is performed by the following function (inside the Brain class):
> 
>     def tick(self):
>         input_num = self.input_num 
>         hidden_num = self.hidden_num
>         output_num = self.output_num
>          
>         hidden = [0]*hidden_num
>         output = [0]*output_num
>         
>         inputs = self.input
>         h_weight = self.h_weight
>         o_weight = self.o_weight
>         
>         e = math.e
>         
>         count = -1
>         for x in range(hidden_num):
>             temp = 0
>             for y in range(input_num):
>                 count += 1
>                 temp += inputs[y] * h_weight[count]
>             hidden[x] = 1/(1+e**(-temp))  
>         
>         count = -1      
>         for x in range(output_num):
>             temp = 0 
>             for y in range(hidden_num):
>                 count += 1 
>                 temp += hidden[y] * o_weight[count]
>             output[x] = 1/(1+e**(-temp))  
>              
>         self.output = output 
> 
> The function is actually quite fast (~0.040 seconds per 200 calls, using 10 
> input, 20 hidden and 3 output neurons), and used to be much slower until I 
> fiddled about with it a bit to make it faster. However, it is still somewhat 
> slow for what I need.
>  
> My question to you is whether you can see any obvious (or not so obvious) way 
> of making this faster. I've heard about numpy and have been reading about it, 
> but I really can't see how it could be implemented here.

First thing, I would add some instrumentation to see where the most time 
is being spent.  My guess is in the first set of nested loops, where the 
inner loop body gets executed hidden_num * input_num (i.e. 20 * 10 = 200) 
times per call.  But timing data is better than my guess.
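
As a rough sketch of what that instrumentation could look like: the code 
below times the two sets of nested loops separately over 200 calls.  The 
names layer() and time_layers() are made up for illustration, the weights 
are random placeholders rather than anything from your simulation, and it 
assumes Python 3.3+ for time.perf_counter().

```python
import math
import random
import time


def layer(values, weights, size):
    """One set of nested loops from tick(), pulled out so it can be timed."""
    e = math.e
    result = [0] * size
    count = -1
    for x in range(size):
        temp = 0
        for y in range(len(values)):
            count += 1
            temp += values[y] * weights[count]
        result[x] = 1 / (1 + e ** (-temp))
    return result


def time_layers(calls=200, input_num=10, hidden_num=20, output_num=3):
    """Return (seconds in hidden loops, seconds in output loops)."""
    rng = random.Random(0)  # fixed seed so runs are comparable
    # Placeholder data: random inputs and weights, not real network state.
    inputs = [rng.random() for _ in range(input_num)]
    h_weight = [rng.uniform(-1, 1) for _ in range(input_num * hidden_num)]
    o_weight = [rng.uniform(-1, 1) for _ in range(hidden_num * output_num)]

    hidden_time = 0.0
    output_time = 0.0
    for _ in range(calls):
        t0 = time.perf_counter()
        hidden = layer(inputs, h_weight, hidden_num)
        t1 = time.perf_counter()
        layer(hidden, o_weight, output_num)
        t2 = time.perf_counter()
        hidden_time += t1 - t0
        output_time += t2 - t1
    return hidden_time, output_time


if __name__ == "__main__":
    hidden_time, output_time = time_layers()
    print("hidden loops: %.6f s" % hidden_time)
    print("output loops: %.6f s" % output_time)
```

For a finer-grained view (per function or per line), the standard cProfile 
and timeit modules are the usual next step.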

Assuming I'm right, though, you do compute range(input_num) 20 times.  
You don't need to do that.  You might try xrange(), or you might just 
create the list once, outside the outer loop.  But none of that 
seems like it should make much difference.
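
To make the second option concrete, here is a minimal sketch of one set of 
loops with the inner range created once.  The function name layer_hoisted() 
and its arguments are mine, not from the original code.  Note that in 
Python 2, range() builds a new list on every call, which is what this 
avoids; in Python 3, range() is already lazy, so the saving is smaller 
still.

```python
import math


def layer_hoisted(values, weights, size):
    """Same arithmetic as the loops in tick(), with the inner range hoisted."""
    e = math.e
    inner = range(len(values))  # created once, reused on every outer pass
    result = [0] * size
    count = -1
    for x in range(size):
        temp = 0
        for y in inner:
            count += 1
            temp += values[y] * weights[count]
        result[x] = 1 / (1 + e ** (-temp))
    return result
```

It would be called as hidden = layer_hoisted(inputs, h_weight, hidden_num), 
and then again with hidden, o_weight and output_num for the second layer.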

What possible values can temp take?  If it can only take certain 
discrete values and you can enumerate them beforehand, you might want to 
build a dict mapping temp -> 1/(1+e**(-temp)) and then all that math 
becomes just a table lookup.
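
A sketch of that idea, under a big assumption: that temp only ever takes 
values you can list in advance.  Here I pretend it is always a multiple of 
0.5 between -10 and 10, which is purely hypothetical.  The names 
build_sigmoid_table() and sigmoid_lookup() are made up for the example.

```python
import math


def build_sigmoid_table(possible_temps):
    """Precompute 1/(1+e**(-temp)) for every value temp is known to take."""
    e = math.e
    return {t: 1 / (1 + e ** (-t)) for t in possible_temps}


# Hypothetical set of values: multiples of 0.5 from -10.0 to 10.0.
possible_temps = [i * 0.5 for i in range(-20, 21)]
sigmoid_table = build_sigmoid_table(possible_temps)


def sigmoid_lookup(temp):
    """Replace the exponentiation with a dict lookup (KeyError if unseen)."""
    return sigmoid_table[temp]
```

The dict keys are floats compared for exact equality, so this only works if 
temp lands exactly on one of the enumerated values.  With arbitrary 
floating-point weights that is unlikely to hold, and the lookup would raise 
KeyError.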