[Numpy-discussion] please change mean to use dtype=float
Christopher Barker
Chris.Barker at noaa.gov
Fri Sep 22 12:34:42 EDT 2006
Tim Hochberg wrote:
> It would probably be nice to expose the
> Kahan sum and maybe even the raw_kahan_sum somewhere.
What about using it for .sum() by default? What is the speed hit anyway?
In any case, having it available would be nice.
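
For reference, a minimal sketch of the compensated (Kahan) summation under discussion, written in plain Python rather than as NumPy code. The name kahan_sum is illustrative only and is not an existing NumPy function; math.fsum serves as the accurate reference.

```python
import math

def kahan_sum(values):
    """Sum floats while carrying a running compensation for lost low-order bits."""
    total = 0.0
    compensation = 0.0  # estimate of the rounding error accumulated so far
    for v in values:
        y = v - compensation            # apply the correction to the next term
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost
        total = t
    return total

def naive_sum(values):
    """Plain left-to-right accumulation, for comparison."""
    total = 0.0
    for v in values:
        total += v
    return total

values = [0.1] * 1_000_000
reference = math.fsum(values)  # correctly rounded sum
print("naive error:", abs(naive_sum(values) - reference))
print("kahan error:", abs(kahan_sum(values) - reference))
```

The extra subtractions per element are where any speed hit would come from; how large it is in compiled code would have to be measured.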
> I'm on the fence on using the array dtype for the accumulator dtype
> versus always using at least double precision for the accumulator. The
> former is easier to explain and is probably faster, but the latter is a
> lot more accuracy for basically free.
I don't think the difficulty of explanation is a big issue at all -- I'm
having a really hard time imagining someone getting confused and/or
disappointed that their single precision calculation didn't overflow or
was more accurate than expected. Anyone who did would know enough to
understand the explanation.
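
To make the trade-off concrete, here is a rough pure-Python simulation of the two accumulator choices for single-precision data. It assumes float32 rounding can be imitated with the struct module; the helper names are hypothetical and this is not how NumPy implements its reductions.

```python
import struct

def to_f32(x):
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_f32_accumulator(values):
    """Accumulate in single precision: round after every addition."""
    total = 0.0
    for v in values:
        total = to_f32(total + to_f32(v))
    return total

def sum_f64_accumulator(values):
    """Accumulate in double precision, then round once so the
    result has the same precision as the float32 input."""
    total = 0.0
    for v in values:
        total += to_f32(v)
    return to_f32(total)

n = 200_000
values = [0.1] * n
print("float32 accumulator mean:", sum_f32_accumulator(values) / n)
print("float64 accumulator mean:", sum_f64_accumulator(values) / n)
```

In this simulation the double accumulator stays close to the expected total of 20000 while the single-precision accumulator drifts; the return value is still rounded to single precision, matching the input type.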
In general, users expect things to "just work". They only dig into the
details when something goes wrong, like the mean of a bunch of positive
integers coming out as negative (wasn't that the OP's example?). The
fewer such instances we have, the fewer questions we have.
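
As an illustration of that failure mode (not the OP's actual data), the sketch below simulates a 32-bit integer accumulator in plain Python. It assumes wraparound can be imitated with ctypes.c_int32; the function names are made up for this example.

```python
import ctypes

def wrap_int32(x):
    """Reduce a Python int to the value a signed 32-bit integer would hold."""
    return ctypes.c_int32(x).value

def mean_int32_accumulator(values):
    """Mean computed with a running total that wraps at 32 bits."""
    total = 0
    for v in values:
        total = wrap_int32(total + v)
    return total / len(values)

def mean_float_accumulator(values):
    """Mean computed with a double-precision running total."""
    total = 0.0
    for v in values:
        total += v
    return total / len(values)

values = [2_000_000_000, 2_000_000_000]  # both fit in int32; their sum does not
print(mean_int32_accumulator(values))  # -147483648.0
print(mean_float_accumulator(values))  # 2000000000.0
```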
> speeds shake out I suppose. If the speed of using float64 is comparable
> to that of using float32, we might as well.
Only testing will tell, but my experience is that with double precision
FPUs, doubles are just as fast as floats unless you're dealing with
enough memory to make a difference in caching. In this case, only the
accumulator is double, so that wouldn't be an issue. I suppose the float
to double conversion could take some time though...
> One thing I'm not on the
> fence about is the return type: it should always match the input type,
> or dtype if that is specified.
+1
> Since numpy-scalars are
> generally the results of indexing operations, not literals, I think that
> they should behave like arrays for purposes of determining the resulting
> precision, not like python-scalars.
+1
>> Of course the accuracy is pretty bad at single precision, so
>> the possible, theoretical speed advantage at large sizes probably
>> doesn't matter.
Good point. The larger the array, the more accuracy matters.
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov