[SciPy-user] Performance Python with Weave article updated

Bob.Cowdery at CGI-Europe.com Bob.Cowdery at CGI-Europe.com
Wed Sep 29 06:27:32 EDT 2004


Hi Francesc,

Thank you very much for your work. Apologies for the delay in replying; I have
been laid low with a bug for a few days.

It is very interesting to me to see the different ways to do things. I also
worked on my code last week and my approach was very similar, I think, although
I copied the techniques I had seen. I have the advantage, of course, of being
able to test my results in my software receiver. The version I have attached
runs in a similar time to the weave example, although I didn't check the actual
figures.

I am interested in two things: the relative merits of using the ArrayType, and
the technique that you used. Some arrays are output values, and I found that
ArrayType did change the source array, which was what I wanted. Also, when I
tried my code for real it didn't work: the ++ operator did not increment, so
the code ran fast but wasn't executing correctly. I notice you still use ++;
is there some way to make this work in Pyrex?

I have attached the whole file, less the blurb, that will contain my
extensions; there are two functions in there at present.

Regards
Bob  

-----Original Message-----
From: Francesc Alted [mailto:falted at pytables.org] 
Sent: 24 September 2004 17:42
To: scipy-user at scipy.net
Cc: Bob.Cowdery at CGI-Europe.com
Subject: Re: [SciPy-user] Performance Python with Weave article updated


On Friday 24 September 2004 15:23, Bob.Cowdery at CGI-Europe.com wrote:
> Ok I see the error of my ways, I should have read the article more 
> closely!! I need to expose each array using ArrayType - I guess that 
> will make all the difference to the Pyrex time...

In fact, as Prabhu has already said, it is not really necessary to expose the
Array type in Pyrex.

Anyway, I looked into your code and saw that this is the kind of code where
Pyrex can really shine. So I set to work, and this is what I got:

1.- I've converted all the arithmetic that involved Numeric arrays to use C
pointers instead. As all the arrays involved were one-dimensional, I was able
to keep the indexing (Pyrex can do that for one-dimensional pointers, but not
for the general multidimensional case).

For example, the lines:

       for tapup from 0 <= tapup <= ln-1:
           ISum = ISum + filter_phasing[tapdwn] * I_Delay[delay_ptr]
           QSum = QSum + filter_phasing[tapup] * Q_Delay[delay_ptr]

have been replaced by:

    # PyObject_AsWriteBuffer comes from the Python C API, so it needs a
    # module-level declaration along the lines of:
    #
    #     cdef extern from "Python.h":
    #         int PyObject_AsWriteBuffer(object obj, void **buffer, int *buflen)

    cdef int     buflen
    cdef void    *data
    cdef double  *p_I_Delay
    cdef double  *p_Q_Delay
    cdef double  *p_filter_phasing

    # Grab a writable pointer to each Numeric array's data buffer once,
    # outside the loops, so the loop body itself is pure C.
    if PyObject_AsWriteBuffer(I_Delay, &data, &buflen) <> 0:
        raise RuntimeError("Error getting the array data buffer")
    p_I_Delay = <double *>data
    if PyObject_AsWriteBuffer(Q_Delay, &data, &buflen) <> 0:
        raise RuntimeError("Error getting the array data buffer")
    p_Q_Delay = <double *>data
    if PyObject_AsWriteBuffer(filter_phasing, &data, &buflen) <> 0:
        raise RuntimeError("Error getting the array data buffer")
    p_filter_phasing = <double *>data

       for tapup from 0 <= tapup <= ln-1:
           ISum = ISum + p_filter_phasing[tapdwn] * p_I_Delay[delay_ptr]
           QSum = QSum + p_filter_phasing[tapup] * p_Q_Delay[delay_ptr]


and so on and so forth for the other vector operations.

That first optimization gave a speed-up of 6.2x over the original Python code.
You can find the complete code for this step in the attachment named:
src/sigblocks_ext.pyx.versio1

2.- Then, I declared the types of the variables in the main function. If you
don't do that, all the variables are treated as Python objects, and access to
their values is very expensive.

This optimization gave an additional speed-up of 3.7x for a total speed-up
of 27.7x. The code is in the attachment: sigblocks_ext.pyx.versio2
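
To give an idea of what I mean, here is a toy sketch (made-up function, not
the code in the attachment): the cdef lines are what turn the locals into
plain C variables.

    def sum_scaled(int n, double factor):
        # Hypothetical example: n and factor are coerced to C types on
        # entry, and the cdef lines make the locals plain C variables
        # instead of Python objects, so the loop body never touches the
        # interpreter.
        cdef int i
        cdef double acc
        acc = 0.0
        for i from 0 <= i < n:
            acc = acc + factor * i
        return acc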

3.- I've removed all the Python calls in the loops, namely abs() and int().
abs() has been replaced by the fabs() C call, while int() has been removed
completely (why do you convert a double to an int and then assign it back to
a double? If you want to do that, perhaps a C call to roundf would be better).
Also, I've removed the lines:

                if(fabs(usb) > peak):
                    self.m_peak = fabs(usb)

and replaced them with:

    cdef double m_peak

                if(fabs(usb) > peak):
                    m_peak = fabs(usb)

and return this maximum at the end:

    return m_peak

This optimization gave an additional speed-up of 3.7x (yes, again) for a
total speed-up of 161x. The code of this final version is in the attachment:
sigblocks_ext.pyx
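
For reference, this is roughly how fabs() gets pulled in from the C library in
Pyrex (again, a toy sketch with made-up names, not the code in the attachment):

    cdef extern from "math.h":
        double fabs(double x)

    def largest_magnitude(values):
        # Hypothetical helper: values is any Python sequence of floats;
        # the point is only that fabs() here is the C library function,
        # so no Python abs() call happens inside the loop.
        cdef double m_peak, v
        m_peak = 0.0
        for x in values:
            v = fabs(x)        # x is coerced to a C double here
            if v > m_peak:
                m_peak = v
        return m_peak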

Well, this is not quite the more-than-200x speed-up that you achieved with
inline weave, but it is very close. Perhaps the code could be optimized still
further, although I believe the gains would be minimal.

Well, I think I have ended up with a good exercise for my next seminar on
Python and scientific computing :)

Cheers,

-- 
Francesc Alted



-------------- next part --------------
A non-text attachment was scrubbed...
Name: Copy of sigblocks_ext.pyx
Type: application/octet-stream
Size: 8326 bytes
Desc: not available
URL: <http://mail.scipy.org/pipermail/scipy-user/attachments/20040929/56a6a9a5/attachment.obj>

