Question about scientific calculations in Python
Martin Kaufmann
martinkaufmann at yahoo.com
Wed Mar 13 06:40:38 EST 2002
On Tue, 12 Mar 2002 22:15:43 -0700, "Andrew Dalke"
<dalke at dalkescientific.com> wrote:
Thanks a lot for your long answer!
>Here's some performance improvements you can make
[lots of very helpful stuff snipped...]
>Some of these changes are only faster for sufficiently long arrays,
>so you should test them out yourself. The biggest trick is to
>precompute as much as possible (Python doesn't optimize constant
>expressions). This includes precomputing lookups of functions
>references, like the "local_sin".
Yes, I've heard about this "local" trick, but I haven't tried it yet.
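For reference, the "local" trick is just binding a frequently called function to a local name once, outside the loop, so Python skips the repeated module-namespace lookup on every iteration. A minimal sketch (function names here are made up for illustration):

```python
import math

def slow(xs):
    # math.sin is looked up in the math module on every iteration
    return [math.sin(x) for x in xs]

def fast(xs):
    # bind the function to a local name once, before the loop
    local_sin = math.sin
    return [local_sin(x) for x in xs]

xs = [i * 0.001 for i in range(100000)]
assert slow(xs) == fast(xs)  # identical results, fewer name lookups
```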
>That code is almost identical with the previous code, so you can do
>the same optimization tweaks. You could also precompute which hist
>elements have entry[1] == 0 so you don't do that check every time
>in the loop. (As it is now, the "if" check is probably more expensive
>than the calculation, even if you know it's zero, unless there are a
>lot of zeros.)
I didn't quite understand this part. How should I precompute the zero
entries? How do I know where they are? Or should I just cut them out
beforehand and access the histogram differently?
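One way to read the suggestion (a sketch only, assuming each hist entry is a pair whose second element is the count, as the quoted "entry[1] == 0" implies): filter out the zero-count entries once, before the loop, so the per-iteration "if" test disappears. Entries with a zero count contribute nothing to the sum anyway.

```python
# Hypothetical histogram: each entry is a (bin_value, count) pair.
hist = [(0.5, 3), (1.0, 0), (1.5, 7), (2.0, 0), (2.5, 1)]

# Precompute once: keep only the entries whose count is nonzero,
# instead of testing entry[1] == 0 inside the hot loop.
nonzero = [entry for entry in hist if entry[1] != 0]

# Any sum over the nonzero entries equals the sum over all entries,
# since zero-count bins contribute nothing.
total = sum(count for _, count in nonzero)
assert total == sum(count for _, count in hist)
```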
>If your lists are really long (O(100,000) or more?) then you might find
>that the append starts taking too long -- append is asymptotically O(n**2)
>in Python -- so that precomputing
>
> i_vector = [None] * (len(s_vector) - 1)
>
>then keeping an index 'n' to replace
> i_vector.append( ...)
>with
> i_vector[n] = ...
>is faster. But this is unlikely.
The lists are not that long, probably only a couple of thousand
entries.
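For what it's worth, the preallocation pattern from the quoted text looks like this in full (a sketch; the squaring is just a placeholder for the real per-element computation, and s_vector here is made-up input):

```python
s_vector = [0.1 * k for k in range(1, 1001)]  # hypothetical input data

# Preallocate the result list once instead of growing it with append().
i_vector = [None] * (len(s_vector) - 1)
n = 0
for s in s_vector[1:]:
    i_vector[n] = s * s  # placeholder for the real calculation
    n += 1

assert len(i_vector) == len(s_vector) - 1
assert None not in i_vector  # every preallocated slot was filled
```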
>Probably the best optimization you can do, if PyInline is an acceptable
>approach, is to put that sinc sum into C code. It reduces to
>taking two lists and returning a float.
I tried to use weave.inline. Here is my code. But put your coffee down
before you read it (you might otherwise spill it over your keyboard),
and _please_ don't laugh out loud... I really didn't know what I was
doing, but it somehow worked...
----------- code begin -------------
code = """
PyObject *intens=PyList_New(s_length);
double sum, x, s, h, f_s, i;
for (long t=1;t<s_length;t++) {
s=py_to_float(PyList_GetItem(s_vector.ptr(),t),"s");
sum=0;
for (long r=1;r<hist_length;r++) {
x=2*PI*s*r/hist_mult;
h=py_to_int(PyList_GetItem(hist.ptr(),r),"h");
sum=sum+(2*h*sin(x)/x);
}
f_s=py_to_float(PyList_GetItem(f_s_vector.ptr(),t),"f_s");
i=atoms*f_s*f_s*(1+sum/atoms);
PyList_SetItem(intens, t, PyFloat_FromDouble(i));
}
return_val = Py::new_reference_to(intens);
"""
intens = weave.inline(code, ['s_length', 's_vector',
'hist_length', 'hist', 'hist_mult', 'PI','atoms',
'f_s_vector'],
verbose=2, force=0, compiler = 'gcc')
------------- code end ------------
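For comparison (and to spell out what the C snippet computes), here is a pure-Python version of the same double loop. The function name and the sample inputs are made up; the variable names follow the weave code. Note that, like the C code, it leaves index 0 of the result unfilled:

```python
from math import sin, pi

def intensity(s_vector, f_s_vector, hist, hist_mult, atoms):
    # Pure-Python equivalent of the weave.inline double loop.
    # Like the C version, index 0 of the result is never assigned.
    intens = [None] * len(s_vector)
    for t in range(1, len(s_vector)):
        s = s_vector[t]
        total = 0.0
        for r in range(1, len(hist)):
            x = 2 * pi * s * r / hist_mult
            total += 2 * hist[r] * sin(x) / x
        f_s = f_s_vector[t]
        intens[t] = atoms * f_s * f_s * (1 + total / atoms)
    return intens

# Tiny made-up example: 3 s-values, 3 histogram bins
result = intensity([0.0, 0.5, 1.0], [0.0, 2.0, 1.5], [0, 4, 1], 10.0, 56)
```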
Today I ran a couple of tests, but only on small clusters. The timing
routine measured not only the double loop but also the radius
calculation before it. Here are the results:
------------- results
56 Atom Bi Cluster
running calculate_nohist 3 times took 26.393 s (8.798 s per run)
running calculate_hist 3 times took 17.300 s (5.767 s per run)
running calculate_hist_sci 3 times took 33.052 s (11.017 s per run)
running calculate_hist_weave 3 times took 3.101 s (1.034 s per run)
124 Atom Bi Cluster
running calculate_nohist 3 times took 126.595 s (42.198 s per run)
running calculate_hist 3 times took 21.728 s (7.243 s per run)
running calculate_hist_sci 3 times took 58.345 s (19.448 s per run)
running calculate_hist_weave 3 times took 4.247 s (1.416 s per run)
------------
The "nohist" is the first solution (slow), the "hist" is my second
solution with histograms, the "hist_sci" is the solution using
Scientific Python histograms, and the "hist_weave" is the solution
presented above. The speed of the weave solution is quite incredible!
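For anyone who wants to reproduce this kind of comparison, the "N times took X s (Y s per run)" report can come from a small helper like this (a sketch; the helper name is made up, and time.perf_counter is used here rather than whatever clock the original timing routine used):

```python
import time

def time_runs(func, runs=3):
    # Run func several times and report total and per-run wall time,
    # in the same format as the results above.
    start = time.perf_counter()
    for _ in range(runs):
        func()
    total = time.perf_counter() - start
    print("running %s %d times took %.3f s (%.3f s per run)"
          % (func.__name__, runs, total, total / runs))
```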
Thanks again,
Martin