[Numpy-discussion] Psyco MA?
Tim Hochberg
tim.hochberg at ieee.org
Tue Feb 11 13:05:05 EST 2003
Perry Greenfield wrote:
>Tim Hochberg writes:
>
>
>>            Overhead (c)  Overhead (nc)  TimePerElement (c)  TimePerElement (nc)
>>NumPy       10 us         10 us          85 ps               95 ps
>>NumArray    200 us        530 us         45 ps               135 ps
>>Psymeric    50 us         65 us          80 ps               80 ps
>>
>>
>>The times shown above are for Float64s and are pretty approximate, and
>>they happen to be a particularly favorable array shape for Psymeric. I
>>have seen Psymeric as much as 50% slower than NumPy for large arrays of
>>certain shapes.
>>
>>The overhead for NumArray is surprisingly large. After doing this
>>experiment I'm certainly more sympathetic to Konrad wanting less
>>overhead for NumArray before he adopts it.
>>
>>
>>
>Wow! Do you really mean picoseconds? I never suspected that
>either Numeric or numarray were that fast. ;-)
>
>
My bad, I meant ns. What's a little factor of 10^3 among friends?
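With the per-element times read as ns, one way to interpret the table is a simple linear cost model: time = overhead + n * time-per-element. That model is an assumption made here for illustration, not something the benchmark fits explicitly, and the names COSTS, estimated_time and break_even are hypothetical. A rough sketch of where NumArray's lower per-element cost would make up for its larger overhead in the contiguous case:

```python
# Contiguous-case figures from the table above, converted to seconds:
# (overhead, time per element). Per-element values are ns, not ps.
COSTS = {
    "NumPy": (10e-6, 85e-9),
    "NumArray": (200e-6, 45e-9),
    "Psymeric": (50e-6, 80e-9),
}

def estimated_time(package, n):
    """Assumed linear model: fixed overhead plus n elements of work."""
    overhead, per_element = COSTS[package]
    return overhead + n * per_element

def break_even(high_overhead, low_overhead):
    """Array size at which the two packages cost the same under the model."""
    o1, p1 = COSTS[high_overhead]
    o2, p2 = COSTS[low_overhead]
    return (o1 - o2) / (p2 - p1)

if __name__ == "__main__":
    n = break_even("NumArray", "NumPy")
    print("NumArray catches up with NumPy at roughly %d elements" % round(n))
```

Under these approximate numbers the break-even is (200 us - 10 us) / (85 ns - 45 ns), or about 4750 elements, so for 3x3 arrays the overhead term dominates completely.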
>Anyway, this issue is timely [Err...]. As it turns out we started
>looking at ways of improving small array performance a couple weeks
>ago and are coming closer to trying out an approach that should
>reduce the overhead significantly.
>
>But I have some questions about your benchmarks. Could you show me
>the code that is used to generate the above timings? In particular
>I'm interested in the kinds of arrays that are being operated on.
>It turns out that the numarray overhead depends on more than
>just contiguity and it isn't obvious to me which case you are testing.
>
>
I'll send you psymeric, including all the tests by private email to
avoid cluttering up the list. (Don't worry, it's not huge -- only 750
lines of Python at this point). You can let me know if you find any
horrible issues with it.
>For example, Todd's benchmarks indicate that numarray's overhead is
>about a factor of 5 larger than numpy when the input arrays are
>contiguous and of the same type. On the other hand, if the array
>is not contiguous or requires a type conversion, the overhead is
>much larger. (Also, these cases require blocking loops over large
>arrays; we have done nothing yet to optimize the block size or
>the speed of that loop.) If you are doing the benchmark on
>contiguous, same type arrays, I'd like to get a copy of the benchmark
>program to try to see where the disagreement arises.
>
>
Basically, I'm operating on two random, contiguous, 3x3 Float64
arrays. In the noncontiguous case the arrays are indexed using [::2,::2]
and [1::2,::2], so these arrays are 2x2 and 1x2. Hmmm, that wasn't
intentional; I'm measuring axis stretching as well. However, using
[::2,::2] for both arrays doesn't change things a whole lot. The core
timing part looks like this:
    t0 = clock()
    if op == '+': c = a + b
    elif op == '-': c = a - b
    elif op == '*': c = a * b
    elif op == '/': c = a / b
    elif op == '==': c = a==b
    else:
        raise ValueError("unknown op %s" % op)
    t1 = clock()
This is done N times, the first M values are thrown away and the
remaining values are averaged. Currently N is 3 and M is 1, so not a lot of
averaging is taking place.
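The repeat, discard and average procedure described above could be wrapped up roughly as follows. This is a minimal sketch rather than the actual psymeric test code: the names OPS and time_op are hypothetical, time.perf_counter stands in for clock(), and the operands can be any objects that support the operators (arrays in the real benchmark, plain floats work too).

```python
import operator
import time

# Map the op strings used in the fragment above to functions.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "==": operator.eq,
}

def time_op(op, a, b, n_runs=3, n_discard=1):
    """Time `a op b` n_runs times, drop the first n_discard, average the rest."""
    if op not in OPS:
        raise ValueError("unknown op %s" % op)
    if not 0 <= n_discard < n_runs:
        raise ValueError("n_discard must be smaller than n_runs")
    func = OPS[op]
    samples = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        func(a, b)
        t1 = time.perf_counter()
        samples.append(t1 - t0)
    kept = samples[n_discard:]  # early runs include warm-up effects
    return sum(kept) / len(kept)

if __name__ == "__main__":
    # With arrays this would be e.g. time_op('+', a[::2, ::2], b[::2, ::2]).
    print("average time: %.3g s" % time_op("+", 1.5, 2.5))
```

Raising n_runs well above 3 would give a steadier average, at the cost of a longer benchmark run.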
>The very preliminary indications are that we should be able to make
>numarray overheads approximately 3 times higher for all ufunc cases.
>That's still slower, but not by a factor of 20 as shown above. How
>much work it would take to reduce it further is unclear (the main
>bottleneck at that point appears to be how long it takes to create
>new output arrays)
>
>
That's good. I think it's important to get people like Konrad on board
and that will require dropping the overhead.
>We are still mainly in the analysis and design phase of how to
>improve performance for small arrays and block looping. We believe
>that this first step will not require moving very much of the
>existing Python code into C (but some will be). Hopefully we
>will have some working code in a couple weeks.
>
I hope it goes well.
-tim