[Numpy-discussion] New numpy functions: filled, filled_like

Mon Jan 14 10:35:49 EST 2013

Hi,

On Mon, Jan 14, 2013 at 9:02 AM, Dave Hirschfeld
<dave.hirschfeld at gmail.com> wrote:
> Robert Kern <robert.kern <at> gmail.com> writes:
>
>>
>> >>> >
>> >>> > One alternative that does not expand the API with two-liners is to let
>> >>> > the ndarray.fill() method return self:
>> >>> >
>> >>> >   a = np.empty(...).fill(20.0)
>> >>>
>> >>> This violates the convention that in-place operations never return
>> >>> self, to avoid confusion with out-of-place operations. E.g.
>> >>> ndarray.resize() versus ndarray.reshape(), ndarray.sort() versus
>> >>> np.sort(), and in the broader Python world, list.sort() versus
>> >>> sorted(), list.reverse() versus reversed(). (This was an explicit
>> >>> reason given for list.sort to not return self, even.)
>> >>>
>> >>> Maybe enabling this idiom is a good enough reason to break the
>> >>> convention ("Special cases aren't special enough to break the rules. /
>> >>> Although practicality beats purity"), but it at least makes me -0 on
>> >>> this...
>> >>>
>> >>
>> >> I tend to agree with the notion that inplace operations shouldn't return
>> >> self, but I don't know if it's just because I've been conditioned this way.
>> >> Not returning self breaks the fluent interface pattern [1], as noted in a
>> >> similar discussion on pandas [2], FWIW, though there's likely some way to
>> >> have both worlds.
>> >
>> > Ah-hah, here's the email where Guido officially proclaims that there
>> > shall be no "fluent interface" nonsense applied to in-place operators
>> > in Python, because it hurts readability (at least for Dutch people):
>> >   http://mail.python.org/pipermail/python-dev/2003-October/038855.html
>>
>> That's a statement about the policy for the stdlib, and just one
>> person's opinion. You, and numpy, are permitted to have a different
>> opinion.
>>
>> In any case, I'm not strongly advocating for it. Its violation of
>> principle ("no fluent interfaces") is roughly in the same ballpark as
>> np.filled() ("not every two-liner needs its own function"), so I
>> thought I would toss it out there for consideration.
>>
>> --
>> Robert Kern
>>
>
> FWIW I'm +1 on the idea. Perhaps because I just don't see many practical
> downsides to breaking the convention but I regularly see a big issue with there
> being no way to instantiate an array with a particular value.
>
> The one obvious way to do it is use ones and multiply by the value you want. I
> work with a lot of inexperienced programmers and I see this idiom all the time.
> It takes a fair amount of numpy knowledge to know that you should do it in two
> lines by using empty and setting a slice.
>
> In [1]: %timeit NaN*ones(10000)
> 1000 loops, best of 3: 1.74 ms per loop
>
> In [2]: %%timeit
>    ...: x = empty(10000, dtype=float)
>    ...: x[:] = NaN
>    ...:
> 10000 loops, best of 3: 28 us per loop
>
> In [3]: 1.74e-3/28e-6
> Out[3]: 62.142857142857146
>
>
> Even when not in the mythical "tight loop", setting an array to one and then
> multiplying uses up a lot of cycles - it's nearly 2 orders of magnitude slower
> than what we know they *should* be doing.
>
> I'm agnostic as to whether fill should be modified or new functions provided but
> I think numpy is currently missing this functionality and that providing it
> would save a lot of new users from shooting themselves in the foot
> performance-wise.
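
A rough way to reproduce the comparison above outside IPython, assuming
numpy is imported as np and using the standard-library timeit module.
The names via_ones and via_empty are mine, for illustration only, and
the timings will vary with machine and numpy version.

```python
import timeit

import numpy as np

n = 10000


def via_ones():
    # Allocates an array of ones, then a second array for the product.
    return np.nan * np.ones(n)


def via_empty():
    # Allocates once, uninitialised, and writes the value in place.
    x = np.empty(n, dtype=float)
    x[:] = np.nan
    return x


# Both idioms give the same result (NaN != NaN, so compare via isnan).
assert np.isnan(via_ones()).all() and np.isnan(via_empty()).all()

t_ones = timeit.timeit(via_ones, number=1000)
t_empty = timeit.timeit(via_empty, number=1000)
print("ones * val:     %.1f us per call" % (t_ones / 1000 * 1e6))
print("empty + assign: %.1f us per call" % (t_empty / 1000 * 1e6))
```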

Is this a fair summary?

=> filled(shape, val), filled_like(arr, val) - new functions, as proposed
For: readable, seems to fit a pattern often used, presence in the
namespace may clue people into using 'fill' rather than ones(shape) * val
or zeros(shape) + val
Con: a very simple alias for a = empty(shape) ; a.fill(val), maybe
cluttering an already full namespace.

=> empty(shape).fill(val) - by allowing return value from arr.fill(val)
For: readable
Con: breaks guideline not to return anything from in-place operations,
no presence in namespace means users may not find this pattern.

=> no new API
For: easy maintenance
Con: harder for users to discover fill pattern, filling a new array
requires two lines instead of one.
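
To make the three options concrete, here is a minimal sketch. The
filled and filled_like definitions below are only my guess at what the
proposed functions would do (the dtype argument in particular is an
assumption), not an agreed implementation. The middle part shows what
happens today, where ndarray.fill() returns None.

```python
import numpy as np


def filled(shape, val, dtype=float):
    # Hypothetical helper: allocate without initialising, then fill in place.
    a = np.empty(shape, dtype=dtype)
    a.fill(val)
    return a


def filled_like(arr, val):
    # Hypothetical helper: same shape and dtype as arr, every element val.
    a = np.empty_like(arr)
    a.fill(val)
    return a


# Option 1: new functions, one call.
a = filled((2, 3), 20.0)

# Option 2 as things stand: fill() returns None, so the chained form
# throws the array away.
b = np.empty((2, 3)).fill(20.0)
print(b)  # None

# Option 3: no new API, the existing two-line pattern.
c = np.empty((2, 3))
c.fill(20.0)
```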

So maybe the decision rests on:

How important is it that users see these function names in the
namespace in order to discover the pattern "a = empty(shape) ;
a.fill(val)"?

How important is it to obey guidelines for no-return-from-in-place?

How important is it to avoid expanding the namespace?

How common is this pattern?

On the last, I'd say that the only common use I have for this pattern
is to fill an array with NaN.
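
A small sketch of that NaN case, assuming the array is a float
placeholder that gets filled in later (integer dtypes cannot hold NaN).
The shape and the values written are arbitrary.

```python
import numpy as np

# Allocate a float array and mark every element as "not set yet".
a = np.empty((3, 4), dtype=float)
a.fill(np.nan)

# Write the values that are known; untouched slots stay NaN.
a[0, :] = 1.0
print(np.isnan(a).sum())  # 8
```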

Cheers,

Matthew