[Numpy-discussion] Quick Question about Optimization

Anne Archibald peridot.faceted at gmail.com
Mon May 19 15:55:14 EDT 2008


2008/5/19 James Snyder <jbsnyder at gmail.com>:

> First off, I know that optimization is evil, and I should make sure
> that everything works as expected prior to bothering with squeezing
> out extra performance, but the situation is that this particular block
> of code works, but it is about half as fast with numpy as in matlab,
> and I'm wondering if there's a better approach than what I'm doing.
>
> I have a chunk of code, below, that generally iterates over 2000
> iterations, and the vectors that are being worked on at a given step
> generally have ~14000 elements in them.

With arrays this size, I wouldn't worry about Python overhead - things
like range versus xrange or attribute lookups on self.

> Is there anything in practice here that could be done to speed this
> up?  I'm looking more for general numpy usage tips that I can use
> while writing further code, and not so much things that would be
> obscure or difficult to maintain in the future.

Try using a profiler to find which steps are using most of your time.
With such a simple function it may not be very informative, but it's
worth a try.
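For example, cProfile from the standard library gives a per-function
breakdown. A rough sketch, assuming your loop lives in a method called
something like sim.run() (that name is made up here):

    import cProfile
    import pstats

    # profile one call to the simulation and dump the stats to a file
    cProfile.run('sim.run()', 'profile.out')

    # print the ten most expensive calls by cumulative time
    pstats.Stats('profile.out').sort_stats('cumulative').print_stats(10)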

> Also, the result of this is a binary array; I'm wondering if there's
> anything more compact for storing it than using 8 bits to represent
> each single bit.  I've poked around, but I haven't come up with any
> clean and unhackish ideas :-)

There's a tradeoff between compactness and speed here. The *fastest*
is probably one boolean per 32-bit integer. It sounds awful, I know,
but most modern CPUs have to work harder to access bytes individually
than they do to access them four at a time. On the other hand, cache
performance can make a huge difference, so compactness might actually
amount to speed. I don't think numpy has a packed bit array data type
(which is a shame, but would require substantial implementation
effort).
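For what it's worth, if compactness matters more than speed, I believe
numpy does ship packbits/unpackbits functions that pack a boolean array
eight flags to the byte - it's just not a dtype you can index bit by
bit. Roughly:

    import numpy as np

    spikes = np.zeros(14000, dtype=bool)   # 1 byte per flag, fast to index
    spikes[[3, 100, 7000]] = True

    packed = np.packbits(spikes)           # 8 flags per byte, 1750 bytes here
    restored = np.unpackbits(packed)[:spikes.size].astype(bool)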

> I can provide the rest of the code if needed, but it's basically just
> filling some vectors with random and empty data and initializing a few
> things.

It would kind of help, since it would make it clearer what's a scalar
and what's an array, and what the dimensions of the various arrays
are.

>        for n in range(0,time_milliseconds):
>            self.u = self.expfac_m * self.prev_u + (1-self.expfac_m) * self.aff_input[n,:]
>            self.v = self.u + self.sigma * np.random.standard_normal(size=(1,self.naff))

You can use "scale" to rescale the random numbers on creation; that'll
save you a temporary.
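Something like this (untested; the names are the ones from your loop):

    # draw the noise already scaled by sigma instead of multiplying afterwards
    self.v = self.u + np.random.normal(scale=self.sigma, size=(1, self.naff))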

>            self.theta = self.expfac_theta * self.prev_theta - (1-self.expfac_theta)
>
>            idx_spk = np.where(self.v>=self.theta)

You can probably skip the "where"; the result of the expression
self.v>=self.theta is a boolean array, which you can use directly for
indexing.
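A small standalone illustration (v, theta and S here just stand in for
your arrays):

    import numpy as np

    v = np.array([0.2, 1.5, 0.7])
    theta = np.array([1.0, 1.0, 0.5])
    S = np.zeros(3)

    spk = v >= theta    # boolean array([False, True, True])
    S[spk] = 1          # same result as S[np.where(v >= theta)] = 1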

>            self.S[n,idx_spk] = 1
>            self.theta[idx_spk] = self.theta[idx_spk] + self.b

+= here might speed things up by avoiding the temporary array that the
explicit addition creates before it gets assigned back.
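That is, roughly:

    # in-place add on the selected elements; avoids building
    # self.theta[idx_spk] + self.b as a separate temporary
    self.theta[idx_spk] += self.b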

>            self.prev_u = self.u
>            self.prev_theta = self.theta



Anne


