[Numpy-discussion] Re: sum and mean methods behaviour

Thu Sep 4 06:35:03 EDT 2003

Peter Verveer writes

> Hi all,
>
> I was thinking a bit more about the changes to reduce() that Todd proposed,
> and have some questions:
>
> The problem that the output may not be able to hold the result of an
> operation is not unique to the reduce() method. For instance, adding two
> arrays of type UInt can also give you the wrong answer:
>
> >>> array(255, UInt8) + array(255, UInt8)
> 254
>
> So, if this is a general problem, why should only the reduce method be
> enhanced to avoid this? If you implement this, should this capability not
> be supported more broadly than only by reduce(), for instance by universal
> functions such as 'add'? Would it not be unexpected for users that only
> reduce() provides such added functionality?
>
Certainly true (and much more likely a problem for integer multiplication
than addition). On the other hand, it is more likely to be only an
occasional problem for binary operations. With reductions, the risk of
overflow is severe. For addition, for example, a normal binary operation
can only grow to the size of a+a, whereas a reduction can grow to
len(a)*a. Arguably, reductions on Int8 and Int16 arrays are more likely
to run into a problem than not.
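The contrast between a single binary operation and a reduction can be sketched with NumPy, the successor to Numeric/numarray. This is an assumption for illustration: numarray itself is not used, and NumPy's default sum() already widens small integer types, so the narrow Int8 accumulator is requested explicitly through the dtype argument.

```python
import numpy as np

# Binary case: uint8 addition wraps modulo 256, as in the UInt8 example above.
pair = np.array([255], dtype=np.uint8) + np.array([255], dtype=np.uint8)
print(pair)  # [254]

# Reduction case: each element (50) and each a + a (100) fit in Int8
# (range -128..127), but the reduction grows to len(a) * 50 = 5000.
a = np.full(100, 50, dtype=np.int8)
doubled = a + a                  # every element is 100: still in range
narrow = a.sum(dtype=np.int8)    # accumulate in int8: wraps silently
wide = a.sum(dtype=np.int64)     # accumulate in a wider integer type

print(doubled[0], narrow, wide)  # 100 -120 5000
```

The binary result only doubles, while the reduction multiplies by the array length, which is why short integer types overflow so readily under sum().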

> However, as Paul Dubois pointed out earlier, the original design
> philosophy of Numeric/numarray was to let the user deal with such problems
> himself and keep the package small and fast. This actually seems a sound
> decision, so would it not be better to avoid complicating numarray with
> these types of changes and also leave reduce as it is?
>
No, I'm inclined to change reductions because of the high potential
for problems, particularly with ints. I don't think ufunc type handling
needs to change, though. Todd believes that changing reduction behavior
would not be difficult (though we will try to finish other work before
doing that). Changing reduction behavior is probably the easiest way
of implementing the improved sum and mean functions. The only thing we
need to determine is what the default behavior should be (Todd proposes
that the defaults remain the same; I'm not so sure).
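One way to picture the proposed change is a reduction that chooses a wider accumulator type, with a switch standing in for the undecided default. The helper reduce_sum, its widen flag, and the WIDE_ACCUMULATOR table are hypothetical and are not numarray's API; the sketch relies on NumPy's np.add.reduce, whose dtype argument selects the type used for accumulation.

```python
import numpy as np

# Hypothetical accumulator choice by dtype kind: bools and signed ints -> int64,
# unsigned ints -> uint64, floats -> float64. Other kinds keep their own type.
WIDE_ACCUMULATOR = {"b": np.int64, "i": np.int64, "u": np.uint64, "f": np.float64}

def reduce_sum(a, widen=True):
    """Hypothetical sum reduction; `widen` models the disputed default."""
    a = np.asarray(a)
    if widen:
        acc = WIDE_ACCUMULATOR.get(a.dtype.kind, a.dtype)
    else:
        acc = a.dtype  # accumulate in the input type (the overflow-prone case)
    return np.add.reduce(a, dtype=acc)

x = np.full(100, 50, dtype=np.int8)
print(reduce_sum(x, widen=False))  # -120: the Int8 accumulator wraps
print(reduce_sum(x))               # 5000
```

Whether widen should default to True (safer results) or False (unchanged results and speed) is exactly the open question above; the table-driven choice just keeps that policy in one place.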

> Personally, I don't have a need for the proposed changes to the reduce
> function. My original complaint that started the whole discussion was that
> the mean() and sum() array methods did not give the correct result in some
> cases. I still think they should return a correct double-precision value,
> even if the universal functions may not. That could be achieved by a
> separate implementation that does not use the universal functions. I would
> be prepared to provide that implementation, either to replace the mean and
> sum methods or as a separate add-on.
>
>
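Peter's alternative, a sum and mean that bypass the universal functions and always return a double-precision result, might look like the following sketch. The names double_sum and double_mean are hypothetical, and NumPy arrays stand in for numarray. Each element is converted to a Python float (a C double) before accumulating with math.fsum, which keeps extra internal precision; iterating in Python makes this correct but slow, so it only illustrates the idea.

```python
import math
import numpy as np

def double_sum(a):
    """Hypothetical: sum every element as a double, without ufuncs."""
    # .flat visits every element regardless of shape; float() converts each
    # one to double before accumulation, so small integer types cannot wrap.
    return math.fsum(float(v) for v in np.asarray(a).flat)

def double_mean(a):
    """Hypothetical: mean built on double_sum; empty input is an error."""
    a = np.asarray(a)
    if a.size == 0:
        raise ValueError("mean of empty array")
    return double_sum(a) / a.size

x = np.full(100, 50, dtype=np.int8)
print(double_sum(x), double_mean(x))  # 5000.0 50.0
```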