[Numpy-discussion] Counting array elements

Tim Hochberg tim.hochberg at cox.net
Mon Oct 25 14:03:03 EDT 2004


Peter Verveer wrote:

>
> On 25 Oct 2004, at 19:32, Russell E Owen wrote:
>
>> At 7:08 PM +0200 2004-10-25, Peter Verveer wrote:
>>
>>> On 25 Oct 2004, at 18:51, Gary Strangman wrote:
>>>
>>>>
>>>>>  I'm not sure how feasible it is, but I'd much rather an 
>>>>> efficient, non-copying, 1-D view of a noncontiguous array (from 
>>>>> an enhanced version of flat or ravel or whatever) than a bunch of 
>>>>> extra methods. The former allows all of the standard methods to 
>>>>> just work efficiently using sum(ravel(A)) or sum(A.flat) [ and max 
>>>>> and min, etc]. Making special whole array methods for everything 
>>>>> just leads to method explosion.
>>>>
>>>>
>>>>  I completely agree with this ... an efficient flat/ravel would 
>>>> seem to solve many of the issues being raised. Forgive the 
>>>> potentially naive question here, but is there any reason such an 
>>>> efficient, enhanced view can't be implemented for the .flat method?
>>>
>>>
>>> I believe it is not possible without copying data. The strides 
>>> between elements of a noncontiguous array are not always the same, 
>>> so you cannot efficiently view it as a 1D array.
>>
>>
>> How about providing an iterator that counts through all the elements 
>> of an array (e.g. arr.itervalues()). So long as C extensions could 
>> efficiently make use of such an iterator, I think it'd do the job.
>
>
> It would still be slower, because you would need a function call at 
> each element that returns a value. Not a problem if you do a lot of 
> work at each element, but if you are just adding values you want a 
> custom written C function. You can do it at the C level with macros or 
> so, (I do that in nd_image) but that would not help at the python level.
>
>> One could also imagine:
>> - arr.iteritems(), which returned (index, value) for each item
>> - a mask argument: a boolean array the same shape as the data array; 
>> True means elide the corresponding value from the data array
>> - general support for indexing
>
>
> Essentially you are suggesting to expose iterators at the python level 
> that iterate over an array in some predefined way. That is possible, 
> but I doubt it will be efficient.
>
> At the C level however, it might be worth thinking about as a way of 
> easing writing functions in C. I proposed to do it the other way 
> around in an earlier mail: providing a set of generic functions that 
> take a python or a C function to be applied at each element. I most 
> likely will implement something in that direction, but I should also 
> give your idea some thought.
>
>> More generally, I agree that sum should work the same as a function 
>> and a method, and that an extra axis argument could be a good thing 
>> (it is so common elsewhere, e.g. size). I'd be tempted to break 
>> backwards compatibility to fix this, since numarray is still new and 
>> the current situation is very confusing.
>
>
> I would absolutely vote for such a change. Simply because we would 
> like a range of such functions, e.g. minimum, maximum, and so on. Even 
> if we have to leave sum() as it is, I think we should have the 
> alternatives, we would just have to come up with an alternative name 
> for sum(). In fact I would consider volunteering implementing these 
> functions.

Why the need to break backwards compatibility? If one is going to 
reimplement sum et al. so as to operate on an arbitrary set of axes, 
there's no reason one couldn't maintain the current behaviour as the 
default. All that is required is to allow axis to be a number (current 
behaviour), a tuple (reduce across the designated axes) or some special 
value to sum over all (None?, "all"?).
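
A minimal sketch of the idea, assuming a NumPy-style array library rather 
than numarray itself. The name sum_axes is hypothetical, and None is only 
one of the candidate "sum over everything" markers mentioned above. The 
default stays axis=0, so existing calls keep their current behaviour:

```python
import numpy as np


def sum_axes(a, axis=0):
    """Hypothetical helper: sum over one axis, several axes, or all axes.

    axis may be an int (one axis, the default behaviour), a tuple of ints
    (reduce across each listed axis), or None (reduce across every axis).
    """
    a = np.asarray(a)
    if axis is None:
        axes = tuple(range(a.ndim))   # special value: sum over all axes
    elif isinstance(axis, tuple):
        axes = axis                   # reduce across the designated axes
    else:
        axes = (axis,)                # single axis, as sum does today

    # Normalize negative axes, then reduce from the highest axis down so
    # the numbering of the axes still to be reduced does not shift.
    result = a
    for ax in sorted({ax % a.ndim for ax in axes}, reverse=True):
        result = np.add.reduce(result, axis=ax)
    return result


a = np.arange(6).reshape(2, 3)        # [[0, 1, 2], [3, 4, 5]]
print(sum_axes(a))                    # default axis=0: column sums
print(sum_axes(a, (0, 1)))            # tuple: reduce across both axes
print(sum_axes(a, None))              # None: sum over everything
```

With this shape of interface the function and the method could share one 
signature, and no second name for sum would be needed.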

Having two sum functions with different names is not particularly better 
than the current proposal of a method and a function.

-tim






