[Numpy-discussion] Scalar coercion

Christopher Hanley chanley at stsci.edu
Mon Mar 5 10:23:33 EST 2007


Hello Everyone,

Another behavior we might consider changing for 1.0.2, which I believe 
is related in theme, is the default type of the accumulator used in 
reductions such as the mean() method.

This is best illustrated with the following example:

sparty> python
Python 2.5 (r25:51908, Sep 21 2006, 13:33:15)
[GCC 3.2.3 20030502 (Red Hat Linux 3.2.3-56)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
 >>> import numpy as n
 >>> n.__version__
'1.0.2.dev3568'
 >>> a = n.ones((1000,1000),dtype=n.float32)*132.00005
 >>> print a
[[ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
    132.00004578  132.00004578]
  [ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
    132.00004578  132.00004578]
  [ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
    132.00004578  132.00004578]
  ...,
  [ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
    132.00004578  132.00004578]
  [ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
    132.00004578  132.00004578]
  [ 132.00004578  132.00004578  132.00004578 ...,  132.00004578
    132.00004578  132.00004578]]
 >>> a.min()
132.000045776
 >>> a.max()
132.000045776
 >>> a.mean()
133.96639999999999
 >>>


Having the mean be greater than the max is a tad odd.


The mean is being computed with a single-precision accumulator, so 
rounding error builds up as the running sum grows.  I do understand that 
I can force a double-precision calculation with the following command:

 >>> a.mean(dtype=n.float64)
132.00004577636719
 >>>
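
For anyone who wants to reproduce the effect outside of this particular 
build, here is a minimal sketch.  It is not the code path mean() itself 
uses; I am assuming that cumsum() with dtype=float32 is a fair stand-in 
for a sequential single-precision accumulator, and the variable names 
are only illustrative.  Newer releases may sum differently internally, 
so a.mean() alone may not show the problem there.

```python
import numpy as np

# Same data as in the session above: one million float32 copies of 132.00005.
a = np.ones((1000, 1000), dtype=np.float32) * 132.00005
exact = float(a.max())  # every element is identical, so this is the true mean

# cumsum keeps a running total in the requested dtype; its last entry is a
# sequentially accumulated sum (assumed stand-in for a float32 accumulator).
naive_sum32 = a.ravel().cumsum(dtype=np.float32)[-1]
mean32 = float(naive_sum32) / a.size

# The same reduction with a double-precision accumulator.
mean64 = float(a.mean(dtype=np.float64))

err32 = abs(mean32 - exact)
err64 = abs(mean64 - exact)
print("float32 accumulator:", mean32, "error:", err32)
print("float64 accumulator:", mean64, "error:", err64)
```

Once the running sum is large, the gap between adjacent float32 values 
exceeds the fractional part of each element, so every addition is 
rounded and the error keeps growing.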


I realize that one reason for not doing all calculations in double 
precision is performance.  However, my users would rather have the 
correct answer by default than arrive quickly at the wrong one.


In my opinion we should swap the default behavior.  All calculations 
should be done in double precision.  If you need the performance you can 
then go back and start setting data types.
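
To make the proposal concrete, here is a rough sketch of the behavior I 
have in mind, written as a user-level helper.  The name 
mean_double_default is hypothetical and not part of numpy; it only 
illustrates "double precision unless you ask otherwise" and says nothing 
about how this would be implemented inside the library.

```python
import numpy as np

def mean_double_default(arr, axis=None, dtype=None):
    """Hypothetical helper: accumulate in float64 unless a dtype is given."""
    arr = np.asarray(arr)
    # Only promote floating-point inputs narrower than 8 bytes; an explicit
    # dtype from the caller always wins.
    if dtype is None and arr.dtype.kind == "f" and arr.dtype.itemsize < 8:
        dtype = np.float64
    return arr.mean(axis=axis, dtype=dtype)

a = np.ones((1000, 1000), dtype=np.float32) * 132.00005
safe = mean_double_default(a)                    # float64 accumulator by default
fast = mean_double_default(a, dtype=np.float32)  # explicit opt-in to float32
print(safe, fast)
```

Callers who need the speed pass dtype explicitly; everyone else gets the 
double-precision result.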


Not having to worry about overflow would also be consistent with 
numarray's behavior.


Thank you for considering my opinion,

Chris



