[SciPy-dev] Casting and rtype arguments [Was: Question about 64-bit integers being cast to double precision]

Thu Oct 27 11:38:23 EDT 2005

Charles R Harris wrote:

> On 10/26/05, Fernando Perez <Fernando.Perez at colorado.edu> wrote:
>
>     Charles R Harris wrote:
>
>     Since the 'mental slot' is already in scipy's users heads for saying
>     'modify the default output of this function to accumulate/store data
>     in a different type', I think it would be reasonable to offer
>
>     sqrt(a,rtype=float)
>
>     as an optional way to prevent automatic upcasting in cases where users
>     want that kind of very fine level control.  This can be done uniformly
>     across the library, rather than growing a zillion foof/food/foo*
>     post-fixed forms of every ufunc in the library.
>
>     We would then have:
>
>     - A basic principle for how upcasting is done, driven by the idea of
>     'protecting precision even at the cost of storage'.  This principle
>     forces sqrt(2) to be a double and anint_array.sum() to accumulate to a
>     wider type.
>
>     - A uniform mechanism for overriding upcasting across the library, via
>     the rtype flag.  If most/all of scipy implements this, it seems like a
>     small price of learning to pay for a reasonable balance between
>     convenience, correctness and efficiency.
>
>
> Yes, I think that would work well. Most of us, most of the time, could
> then rely on the unmodified functions to do the right thing. On the
> rare occasion that space really mattered, there would be a fallback
> position. It would also be easy to use a global type string mytype =
> 'Float32' and call everything critical with rtype=mytype. That would
> make it easy to change the behaviour of fairly large programs.

I agree that this would be a nice consistent interface (if we can
implement it) :)

I've added text to the docstrings for a.sum() and a.mean() to reflect
their new behaviour (re. thread on int8 array operations) and the role
of the 'rtype' argument there.  Let me know if you think anything's
wrong.  Otherwise we could aim to migrate gradually to similar behaviour
with other functions.

I'm not sure that 'rtype' (for 'return type'?) is the most accurate
name.  For a.mean() the rtype is currently the type used for
intermediate calculations (in a.sum()), not the return type.  (The
return type is float, even if the 'rtype' is int, and I agree with this
behaviour.)  The same is true, in a sense, for a.sum().  The second
example in the new a.sum() docstring is:

    >>> array([0.5, 1.5]).sum(rtype=int32)
    1

where the floats are downcast to int32 before the sum.  My guess is that
a user who goes to the trouble of specifying a non-default data type for
an operation is at least as interested in the data type of the
intermediate operations as in the return type.  Perhaps we should think
instead about the data types used for intermediate operations, as sum()
and mean() do now, and rename the argument 'itype'.
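
To make the 'itype' reading concrete, here is a rough pure-Python sketch of
the semantics described above.  The names sum_with_itype and mean_with_itype
are hypothetical helpers made up for illustration, not scipy functions, and
plain Python int/float stand in for int32/float64:

```python
# Hypothetical helpers (not scipy API): 'itype' is the type each element
# is converted to before it is accumulated.

def sum_with_itype(values, itype):
    """Convert every element to itype, then accumulate in itype."""
    total = itype(0)
    for v in values:
        total = total + itype(v)
    return total


def mean_with_itype(values, itype):
    """Accumulate in itype, but always return a float."""
    return float(sum_with_itype(values, itype)) / len(values)


data = [0.5, 1.5]
int_sum = sum_with_itype(data, int)      # int(0.5) + int(1.5) == 0 + 1
float_sum = sum_with_itype(data, float)  # 0.5 + 1.5
int_mean = mean_with_itype(data, int)    # integer accumulation, float result
```

Here int_sum is 1, matching the docstring example, while int_mean is still a
float: the argument controls the intermediate type and not the type of the
returned mean.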

Another option would be to change the behaviour of a.sum() and a.mean()
so they really do return the given type.  But I'm not keen on this,
since we can already achieve this without any 'rtype' argument by
casting the output to the desired type, and this leaves us less control
over what actually goes on behind the scenes...
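
A small illustration of why the two are not interchangeable, again in plain
Python with built-in int/float standing in for the array types (this is only
a sketch of the idea, not what a.sum() does internally):

```python
data = [0.5, 1.5]

# Cast the output: accumulate in float, convert the result afterwards.
cast_after = int(sum(data))              # int(2.0)

# Cast the intermediates: convert each element first, then accumulate.
cast_before = sum(int(v) for v in data)  # 0 + 1
```

Casting the output gives 2, whereas casting before the accumulation gives 1,
so an argument that selects the intermediate type offers something a cast on
the result cannot reproduce.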

Comments?!

-- Ed