[Numpy-discussion] Psyco MA?

Tue Feb 11 13:05:05 EST 2003

Perry Greenfield wrote:

>Tim Hochberg writes:
>  
>
>>           Overhead (c)  Overhead (nc)  TimePerElement (c)  TimePerElement (nc)
>>NumPy         10 us          10 us            85 ps                95 ps
>>NumArray     200 us         530 us            45 ps               135 ps
>>Psymeric      50 us          65 us            80 ps                80 ps
>>
>>
>>The times shown above are for Float64s and are pretty approximate, and 
>>they happen to be a particularly favorable array shape for Psymeric. I 
>>have seen Psymeric as much as 50% slower than NumPy for large arrays of 
>>certain shapes.
>>
>>The overhead for NumArray is surprisingly large. After doing this 
>>experiment I'm certainly more sympathetic to Konrad wanting less 
>>overhead for NumArray before he adopts it.
>>
>>    
>>
>Wow! Do you really mean picoseconds? I never suspected that
>either Numeric or numarray were that fast. ;-)
>  
>
My bad, I meant ns. What's a little factor of 10^3 among friends?
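
With the units corrected, the table can be read as a simple linear cost 
model: time for one operation = overhead + n * time per element. The 
sketch below assumes that model holds and plugs in the contiguous-case 
figures to estimate the array size at which one package catches up with 
another. The names TIMINGS, total_time and break_even are made up for 
this illustration; they are not part of any of the packages.

```python
# Linear cost model: total time = fixed overhead + n * per-element time.
# Figures are the contiguous-case values from the table, read as
# microseconds of overhead and nanoseconds per element.
TIMINGS = {
    "NumPy":    (10e-6, 85e-9),
    "NumArray": (200e-6, 45e-9),
    "Psymeric": (50e-6, 80e-9),
}

def total_time(package, n):
    """Estimated seconds for one operation on n elements."""
    overhead, per_element = TIMINGS[package]
    return overhead + n * per_element

def break_even(slow_start, fast_start):
    """Array size at which `slow_start` (more overhead, less time per
    element) catches up with `fast_start`; None if it never does."""
    o1, p1 = TIMINGS[slow_start]
    o2, p2 = TIMINGS[fast_start]
    if p1 >= p2:
        return None
    return (o1 - o2) / (p2 - p1)

numarray_vs_numpy = break_even("NumArray", "NumPy")
psymeric_vs_numpy = break_even("Psymeric", "NumPy")
```

Under these assumptions NumArray catches up with NumPy at about 4750 
elements and Psymeric at about 8000, so for small arrays the overhead 
column is what matters.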

>Anyway, this issue is timely [Err...]. As it turns out we started
>looking at ways of improving small array performance a couple weeks
>ago and are coming closer to trying out an approach that should
>reduce the overhead significantly.
>
>But I have some questions about your benchmarks. Could you show me
>the code that is used to generate the above timings? In particular
>I'm interested in the kinds of arrays that are being operated on.
>It turns out that the numarray overhead depends on more than
>just contiguity and it isn't obvious to me which case you are testing.
>  
>
I'll send you psymeric, including all the tests by private email to 
avoid cluttering up the list. (Don't worry, it's not huge -- only 750 
lines of Python at this point). You can let me know if you find any 
horrible issues with it.

>For example, Todd's benchmarks indicate that numarray's overhead is
>about a factor of 5 larger than numpy when the input arrays are
>contiguous and of the same type. On the other hand, if the array
>is not contiguous or requires a type conversion, the overhead is 
>much larger. (Also, these cases require blocking loops over large
>arrays; we have done nothing yet to optimize the block size or
>the speed of that loop.) If you are doing the benchmark on 
>contiguous, same type arrays, I'd like to get a copy of the benchmark
>program to try to see where the disagreement arises.
>  
>
Basically, I'm operating on two random, contiguous, 3x3, Float64 
arrays. In the noncontiguous case the arrays are indexed using [::2,::2] 
and [1::2,::2] so these arrays are 2x2 and 1x2. Hmmm, that wasn't 
intentional, I'm measuring axis stretching as well. However, using 
[::2,::2] for both arrays doesn't change things a whole lot. The core 
timing part looks like this:

            t0 = clock()
            if op == '+':    c = a + b
            elif op == '-':  c = a - b
            elif op == '*':  c = a * b
            elif op == '/':  c = a / b
            elif op == '==': c = a == b
            else:
                raise ValueError("unknown op %s" % op)
            t1 = clock()

This is done N times; the first M values are thrown away and the 
remaining values are averaged. Currently N is 3 and M is 1, so not a lot 
of averaging is taking place.
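
For anyone who wants to reproduce the scheme, here is a minimal, 
self-contained sketch of the repeat / discard / average loop. It is not 
the Psymeric test code: the names OPS and time_op are hypothetical, 
time.perf_counter stands in for clock(), and plain floats stand in for 
the arrays so that it runs with the standard library alone.

```python
import operator
from time import perf_counter

# Map the operator symbols used in the fragment above to functions.
OPS = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.truediv,
    '==': operator.eq,
}

def time_op(op, a, b, n=3, m=1):
    """Time `a op b` n times, drop the first m samples, average the rest."""
    if op not in OPS:
        raise ValueError("unknown op %s" % op)
    if not 0 <= m < n:
        raise ValueError("need 0 <= m < n")
    func = OPS[op]  # looked up once, outside the timed region
    samples = []
    for _ in range(n):
        t0 = perf_counter()
        func(a, b)
        t1 = perf_counter()
        samples.append(t1 - t0)
    kept = samples[m:]  # the first m runs are treated as warm-up
    return sum(kept) / len(kept)

# Any operands that support the operator will do; array objects would be
# passed in exactly the same way.
elapsed = time_op('*', 2.0, 3.0)
```

Looking up the operation before starting the timer keeps the string 
comparisons out of the measured interval, which matters when the 
operation itself only takes a few microseconds. Raising n would also 
give the average something to work with.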

>The very preliminary indications are that we should be able to bring
>numarray overheads down to approximately 3 times Numeric's for all
>ufunc cases. That's still slower, but not by a factor of 20 as shown
>above. How much work it would take to reduce it further is unclear
>(the main bottleneck at that point appears to be how long it takes to
>create new output arrays).
>  
>
That's good. I think it's important to get people like Konrad on board 
and that will require dropping the overhead.
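
To put rough numbers on that for the 3x3 case I was timing, here is a 
back-of-the-envelope sketch. It assumes the linear model (overhead plus 
n times the per-element cost), the contiguous figures from the table 
read as us and ns, and a numarray overhead of 3 times Numeric's, which 
is only the preliminary target quoted above, not a measurement. The 
constant and function names are invented for the illustration.

```python
# Values from the table, read as microseconds and nanoseconds
# (contiguous case).
NUMERIC_OVERHEAD = 10e-6
NUMERIC_PER_ELEMENT = 85e-9
NUMARRAY_OVERHEAD = 200e-6
NUMARRAY_PER_ELEMENT = 45e-9

# Hypothetical overhead if the "3 times Numeric" target is reached.
TARGET_OVERHEAD = 3 * NUMERIC_OVERHEAD

def op_time(overhead, per_element, n):
    """Estimated seconds for one operation on n elements."""
    return overhead + n * per_element

n = 9  # elements in a 3x3 array
numeric = op_time(NUMERIC_OVERHEAD, NUMERIC_PER_ELEMENT, n)
current = op_time(NUMARRAY_OVERHEAD, NUMARRAY_PER_ELEMENT, n)
projected = op_time(TARGET_OVERHEAD, NUMARRAY_PER_ELEMENT, n)

ratio_now = current / numeric
ratio_projected = projected / numeric
```

On these assumptions a 3x3 operation comes out roughly 19 times slower 
than Numeric today and a little under 3 times slower with the projected 
overhead, since at nine elements the per-element term hardly registers.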

>We are still mainly in the analysis and design phase of how to
>improve performance for small arrays and block looping. We believe
>that this first step will not require moving very much of the
>existing Python code into C (but some will be). Hopefully we
>will have some working code in a couple weeks.
>

I hope it goes well.

-tim