[Numpy-discussion] Parallelizable Performance Python Example

Tue Sep 22 13:03:14 EDT 2009

Hi -

I've recently been trying to adjust the performance python example (http://www.scipy.org/PerformancePython 
) so that it could be compared under a parallelized version.  I've  
adjusted the Gauss-Seidel 4 point method to a red-black checkerboarded  
(http://www.cs.colorado.edu/~mcbryan/3656.04/mail/87.htm) version that  
is much more easily parallelized on a shared memory system.

I've got some examples of this working, but I seem to be having  
trouble making it anywhere near efficient for the NumPy example (it 's  
around an order of magnitude slower than the non-red-black version).

Here's essentially what I'm doing with the NumPy solver:

    def numericTimeStep(self, dt=0.0):
        """
        Takes a time step using a NumPy expression.
        This has been adjusted to use checkerboard style indexing.
        """
        g = self.grid
        dx2, dy2 = np.float32(g.dx**2), np.float32(g.dy**2)
        dnr_inv = np.float32(0.5/(dx2 + dy2))
        u = g.u
        g.old_u = u.copy() # needed to compute the error.

        if self.count == 0:
            # Precompute Matrix Indexes
            X, Y = np.meshgrid(range(1,u.shape[0]-1),range(1,u.shape 
[1]-1))
            checker = (X+Y) % 2
            self.idx1 = checker==1
            self.idx2 = checker==0

        # The actual iteration
        g.u[1:-1, 1:-1][self.idx1] = ((g.u[0:-2, 1:-1][self.idx1] + g.u 
[2:, 1:-1][self.idx1])*dy2 +
                         (g.u[1:-1,0:-2][self.idx1] + g.u[1:-1, 2:] 
[self.idx1])*dx2)*dnr_inv

        g.u[1:-1, 1:-1][self.idx2] = ((g.u[0:-2, 1:-1][self.idx2] + g.u 
[2:, 1:-1][self.idx2])*dy2 +
                        (g.u[1:-1,0:-2][self.idx2] + g.u[1:-1, 2:] 
[self.idx2])*dx2)*dnr_inv

        return g.computeError()

Any ideas?  I presume that the double-indexing is maybe what's killing  
this, and I could precompute some boolean indexing arrays, but the  
original version of this solver (plain Gauss-Seidel, 4 point  
averaging) is rather simple and clean :-):

    def numericTimeStep(self, dt=0.0):
        """Takes a time step using a NumPy expression."""
        g = self.grid
        dx2, dy2 = g.dx**2, g.dy**2
        dnr_inv = 0.5/(dx2 + dy2)
        u = g.u
        g.old_u = u.copy() # needed to compute the error.

        # The actual iteration
        u[1:-1, 1:-1] = ((u[0:-2, 1:-1] + u[2:, 1:-1])*dy2 +
                         (u[1:-1,0:-2] + u[1:-1, 2:])*dx2)*dnr_inv

        return g.computeError()

Here's a pure python version of the red-black solver (which is, of  
course, incredibly slow, but not that much slower than the non-red- 
black version):
    def slowTimeStep(self, dt=0.0):
        """Takes a time step using straight forward Python loops."""
        g = self.grid
        nx, ny = g.u.shape
        dx2, dy2 = np.float32(g.dx**2), np.float32(g.dy**2)
        dnr_inv = np.float32(0.5/(dx2 + dy2))
        u = g.u

        err = 0.0
        for offset in range(0,2):
          for i in range(1, nx-1):
              for j in range(1 + (i + offset) % 2, ny-1, 2):
                  tmp = u[j,i]
                  u[j,i] = ((u[j-1, i] + u[j+1, i])*dx2 +
                            (u[j, i-1] + u[j, i+1])*dy2)*dnr_inv
                  diff = u[j,i] - tmp
                  err += diff*diff

        return np.sqrt(err)

--
James Snyder
Biomedical Engineering
Northwestern University
jbsnyder at fanplastic.org
http://fanplastic.org/key.txt
ph: 847.448.0386
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 4012 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20090922/b5e1a41e/attachment.bin>