[Numpy-discussion] Do we want scalar casting to behave as it does at the moment?

Mon Jan 7 15:12:51 EST 2013

Hi,

On Mon, Jan 7, 2013 at 4:33 PM, Andrew Collette
<andrew.collette at gmail.com> wrote:
> Hi Matthew,
>
>> I realized when I thought about it, that I did not have a clear idea
>> of your exact use case.  How does the user specify the thing to add,
>> and why do you need to avoid an error in the case that adding would
>> overflow the type?  Would you mind giving an idiot-level explanation?
>
> There isn't a specific use case I had in mind... from a developer's
> perspective, what bothers me about the proposed behavior is that every
> use of "+" on user-generated input becomes a time bomb.  Since h5py
> deals with user-generated files, I have to deal with all kinds of
> dtypes, including low-precision ones like int8/uint8.  They come from
> user-supplied function and method arguments, sure, but also from
> datasets in files; attributes; virtually everywhere.
>
> I suppose what I'm really asking is that numpy provides (continues to
> provide) a default rule in this situation, as does every other
> scientific language I've used.  One reason to avoid a ValueError in
> favor of default behavior (in addition to the large amount of work
> required to check every use of "+") is so there's an established
> behavior users know to expect.
>
> For example, one feature we're thinking of implementing involves
> adding an offset to a dataset when it's read.  Should we roll over?
> Upcast?  It seems to me there's great value in being able to say "We
> do what numpy does."  If numpy doesn't answer the question, everybody
> makes up their own rules.  There are certainly cases where the answer
> is obvious to the application: you have a huge number of int8's and
> don't want to upcast.  Or you don't want to lose precision.  But if
> numpy provides a default rule, nobody is prevented from making careful
> choices based on their application's requirements, and there's the
> additional value of having a common, documented default behavior.

Just to be clear, you mean you might have something like this?

def my_func(array_name, some_offset):
    arr = load_somehow(array_name)  # dtype hitherto unknown
    return arr + some_offset

?  And the problem is that it fails late?  Is it really better for
the addition to do something bad than for it to raise an error?
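
As a rough sketch of what "something bad" can look like, assuming an int8
array and an offset that has been given the same dtype explicitly (the
values are made up for illustration): the sum keeps the int8 dtype and
wraps around without any error, while upcasting first gives the
arithmetically expected result.  The offset is deliberately not a plain
Python int here, since how those are handled is exactly the casting rule
under discussion.

```python
import numpy as np

# Hypothetical stand-in for data loaded from a file with a small dtype.
arr = np.array([100, 50], dtype=np.int8)
offset = np.array(100, dtype=np.int8)

# int8 + int8 stays int8, so sums above 127 wrap around silently.
wrapped = arr + offset

# Upcasting before the addition avoids the wrap-around.
upcast = arr.astype(np.int16) + offset

print(wrapped.dtype, wrapped.tolist())
print(upcast.dtype, upcast.tolist())
```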

You'll also often get an error when trying to add structured dtypes,
but maybe you can't return these from a 'load'?
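
For the structured dtype case, a minimal sketch (the field names are
invented for illustration): adding the whole record array raises, whereas
adding to a single field works as for any ordinary array.

```python
import numpy as np

# Hypothetical record array with two fields.
rec = np.zeros(2, dtype=[("x", np.int32), ("y", np.float64)])

try:
    rec + rec
except TypeError:
    # No addition is defined for the structured dtype as a whole.
    whole_array_failed = True
else:
    whole_array_failed = False

# Arithmetic on an individual field is fine.
x_plus_one = rec["x"] + 1

print(whole_array_failed, x_plus_one.tolist())
```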

Best,

Matthew