Question about scientific calculations in Python

Wed Mar 13 00:04:19 EST 2002

Martin Kaufmann wrote:
> This is the first raw version without histograms. s_vector is an array
> with all the scattering parameters, f_s_vector the respective
> scattering factors, and r_vector an array with all the distances
> between the atoms. The equation is the Debye equation, so no fancy
> integrals...
> 
>     for s in s_vector[1:]:
>         sum = 0
>         for r in r_vector[1:]:
>             x = 2 * pi * r * s
>             sum = sum + (2 * (sin(x))/x)
>         intensity = atoms * pow(f_s_vector[n], 2) * (1 + sum / atoms)
>         i_vector.append(intensity)
>         n = n + 1

I'm going to attempt to explain how you can do this with Numeric
Python, but you'll have to understand I have no clue what I'm
doing scientifically.

So in particular I'm assuming that "atoms" is a scalar, say atoms=1000.
I also don't really know why you're using s_vector[1:] ... intentionally
omitting the zeroth data point for some reason...?

Anyway, here's an interactive session that implements the above
algorithm using Numeric, with *no loops*.  Or rather, the loops are all
hidden away where you don't have to worry about them.  (They should
run faster, too.)

First, I'll import all the nice Numeric functions.

  C:\dev> python
  Python 2.2 (#28, Dec 21 2001, 12:21:22) [MSC 32 bit (Intel)] on win32
  Type "help", "copyright", "credits" or "license" for more information.
  >>>
  >>> from Numeric import *

Next, I'll create a few arrays of random data to work with.  Perhaps all
these numbers are completely unrealistic, but that's your field... :)

  >>> from random import random

  >>> s_vector = array([random()-.5 for i in range(5)])
  >>> s_vector
  array([-0.27223098,  0.24265847,  0.27374879,  0.17003544,  0.44799364])
  >>> r_vector = array([random()-.5 for i in range(5)])
  >>> r_vector
  array([-0.25522363,  0.09085211,  0.32083166,  0.48250771, -0.26395574])
  >>> f_s_vector = arange(0.1, 0.6, 0.1)
  >>> f_s_vector
  array([ 0.1,  0.2,  0.3,  0.4,  0.5])
  >>> atoms = 1000

Now, I need to make a copy of r_vector, because I want it to be
a 2-D matrix -- a column, not a row.  Because r_vector is currently
a 1-D matrix, I'll need to add a new axis to it.  This is done
by using a special slice syntax:

  >>> r_column = r_vector[:, NewAxis]
  >>> r_column
  array([[-0.25522363],
         [ 0.09085211],
         [ 0.32083166],
         [ 0.48250771],
         [-0.26395574]])

Now.  Multiplying r_column and s_vector creates a full 2-D array,
a sort of multiplication table (if that makes sense).

  >>> x = 2 * pi * r_column * s_vector
  >>> x
  array([[ 0.436554, -0.389131, -0.438988, -0.272671, -0.718410],
         [-0.155400,  0.138519,  0.156266,  0.097063,  0.255732],
         [-0.548775,  0.489161,  0.551835,  0.342765,  0.903085],
         [-0.825318,  0.735664,  0.829920,  0.515493,  1.358175],
         [ 0.451490, -0.402444, -0.454007, -0.282000, -0.742989]])

I deleted some extra spaces and decimal places to try and make that
fit nicely into this message.  When you do it at home you'll get more
precision.

Next, I'll calculate the 2 * sin(x) / x part ...

  >>> y = 2 * sin(x) / x
  >>> y
  array([[ 1.937076, 1.949906, 1.936379, 1.975308, 1.832347],
         [ 1.991959, 1.993610, 1.991870, 1.996861, 1.978271],
         [ 1.901115, 1.921189, 1.900027, 1.961066, 1.739018],
         [ 1.780558, 1.824418, 1.778189, 1.912591, 1.439403],
         [ 1.932741, 1.946448, 1.931996, 1.973597, 1.821001]])

Now I want the sum of each column.  There's a special method
that does this...

  >>> sums = add.reduce(y)
  >>> sums
  array([ 9.543451, 9.635572, 9.538462, 9.819425, 8.810042])

Now calculate the intensities from that.

  >>> i_vector = atoms * f_s_vector**2 * (1 + sums/atoms)
  >>> i_vector
  array([ 10.095434, 40.385422, 90.858461, 161.571108, 252.202510])

Let's demonstrate that this is the same result as you'd get with
your algorithm, mostly as written:

  >>> def f(s_vector, f_s_vector, r_vector):
  ...     n = 0
  ...     i_vector = []
  ...     for s in s_vector:
  ...         sum = 0
  ...         for r in r_vector:
  ...             x = 2 * pi * r * s
  ...             sum = sum + (2 * (sin(x))/x)
  ...         intensity = atoms * pow(f_s_vector[n], 2) * (1 + sum/atoms)
  ...         i_vector.append(intensity)
  ...         n = n + 1
  ...     return i_vector
  ...
  >>> f(s_vector, f_s_vector, r_vector)
  [10.095434515144179, 40.38542290296536, 90.858461626142727,
   161.57110800726178, 252.20251050463611]

Here's how that function would look using Numeric:

  >>> def f(S, FS, R, atoms):
  ...     Rcol = R[:, NewAxis]
  ...     x = 2 * pi * Rcol * S
  ...     y = 2 * sin(x) / x
  ...     sums = add.reduce(y)
  ...     I = atoms * FS**2 * (1 + sums/atoms)
  ...     return I
  ...
  >>> f(s_vector, f_s_vector, r_vector, atoms)
  array([  10.09543452,   40.3854229 ,   90.85846163,  161.57110801,
           252.2025105  ])
  >>> list(_)
  [10.095434515144179, 40.38542290296536, 90.858461626142727,
   161.57110800726178, 252.20251050463611]

Note that when Python displays an array, it only shows a few decimal
places; but full precision is stored (and you can see it if you
convert the result back to a Python list).

The histogram version is up to you.  :)

Numeric lets you mix arrays and scalars randomly and it happily
distributes multiplication and so forth.  So the resulting code
looks a bit more like the original formulas.  Some folks like that.
But if you don't, you can always code this in C and it'll be faster
still.  It won't take very long to learn C.  The real problem is
that C programs can be very difficult and time-consuming to debug.

## Jason Orendorff    http://www.jorendorff.com/