[Numpy-discussion] Add sliding_window_view method to numpy

Zimmermann Klaus klaus.zimmermann at smhi.se
Fri Nov 27 10:10:24 EST 2020


Hi Ralf,

On 26/11/2020 15:17, Ralf Gommers wrote:
> On Fri, Nov 6, 2020 at 4:03 PM Zimmermann Klaus
> <klaus.zimmermann at smhi.se> wrote:
>     I was just wondering if, off the top of your head, an existing, better fit
>     comes to mind?
> 
> 
> Not really. Outside of stride_tricks there's nothing that quite fits.
> This function is more in scope for something like scipy.signal.

Alright, let's keep it as is then.
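
For anyone finding this thread later, here is a minimal sketch of how the
function is reached from that location, assuming NumPy 1.20 or later. The
array and window length are made up for illustration:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(6)

# Each row is one length-3 window onto x; the result is a view, not a copy.
windows = sliding_window_view(x, window_shape=3)
print(windows)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]
#  [3 4 5]]

# A reduction over the last axis gives a rolling statistic.
rolling_mean = windows.mean(axis=-1)
print(rolling_mean)  # [1. 2. 3. 4.]
```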

Thanks and cheers
Klaus


> 
> Cheers,
> Ralf
> 
> 
> 
>     >     The reason
>     >     from my point of view is that stride tricks is really a technical
>     >     (and slightly ominous) name that might throw off more
>     >     application-oriented programmers from finding and using this
>     >     function. Thinking of my scientist colleagues, I think those are
>     >     exactly the kind of users that could benefit from such a
>     >     prototyping tool.
>     >
>     >
>     > That phrasing is one of a number of concerns. NumPy is normally not in
>     > the business of providing things that are okay as a prototyping tool,
>     > but are potentially extremely slow (as pointed out in the Notes section
>     > of the docstring). A function like that would basically not be the right
>     > tool for almost anything in, e.g., SciPy - it requires an iterative
>     > algorithm. In NumPy we don't prefer performance at all costs, but in
>     > general it's pretty decent rather than "Numba or Cython may gain you
>     > 100x here".
> 
>     I still think that the performance concern is a bit overblown. Yes,
>     applications with large windows can need more FLOPs by an equally large
>     factor. But most such applications will use small to moderate windows.
>     Furthermore, this view focuses only on FLOPs. In my current field of
>     climate science (and many others), that is almost never the limiting
>     factor. Memory demands are far more problematic and incidentally, those
>     are more likely to increase in other methods that require the storage of
>     ancillary, temporary data.
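
To make the FLOPs versus memory trade-off concrete, here is a rough sketch
(assuming NumPy 1.20 or later; the array length and window size are arbitrary
illustration values). It compares a rolling mean over the strided view with a
cumulative-sum variant that needs a temporary array:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
x = rng.random(1_000)
w = 100  # illustrative window length

# The view adds an axis but shares the buffer of x; no window is copied.
v = sliding_window_view(x, w)
print(v.shape)                                # (901, 100)
print(np.shares_memory(v, x))                 # True
print(v.strides == (x.itemsize, x.itemsize))  # True

# Rolling mean over the view: roughly w additions per output element.
mean_view = v.mean(axis=-1)

# Cumulative-sum variant: constant work per output element, at the cost of
# a temporary array as long as the input.
c = np.concatenate(([0.0], np.cumsum(x)))
mean_cumsum = (c[w:] - c[:-w]) / w

print(np.allclose(mean_view, mean_cumsum))    # True
```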
> 
>     > Other issues include:
>     > 2) It is very specific to NumPy's memory model (as pointed out by you
>     > and Sebastian) - just like the rest of stride_tricks
>     Not wrong, but on the other hand, that memory model is not exotic. C,
>     Fortran, and any number of other languages play very nicely with this,
>     as do important downstream libraries like dask.
> 
>     > 3) It has "view" in the name, which doesn't quite make sense for the
>     > main namespace (also connected to point 2 above).
>     Ok.
> 
>     > 4) The cost of putting something in the main namespace for other
>     > array/tensor libraries is large. Many other libraries, e.g. CuPy, Dask,
>     > TensorFlow, PyTorch, JAX, MXNet, aim to reimplement part or all of the
>     > main NumPy namespace as well as possible. This would trigger discussions
>     > and likely many person-weeks of work for others.
>     Agreed. Though I have to say that my whole motivation comes from
>     corresponding issues in dask that were specifically waiting for (the
>     older version of) this PR (see [1, 2,...]). But I understand that dask
>     is effectively much closer to the numpy memory model than, say, CuPy, so
>     don't take this to mean it should be in the main namespace.
> 
>     > 5) It's a useful function, but it's very much on the margins of NumPy's
>     > scope. It could easily have gone into, for example, scipy.signal. At
>     > this point the bar for functions going into the main namespace should be
>     > (and is) high.
>     I agree that the bar for the main namespace should be high!
> 
>     > All this taken together means it's not even a toss-up for me. If it were
>     > just one or two of these points, maybe. But given all the above, I'm
>     > pretty confident saying "it does not belong in the main namespace".
>     Again, I am happy with that.
> 
> 
>     Thanks for your thoughts and work! I really appreciate it!
> 
>     Cheers
>     Klaus
> 
>     [1] https://github.com/dask/dask/issues/4659
>     [2] https://github.com/pydata/xarray/issues/3608
>     [3] https://github.com/pandas-dev/pandas/issues/26959
> 
> 
>     >
>     >
>     >     Cheers
>     >     Klaus
>     >
>     >
>     >
>     >     [1] https://github.com/numpy/numpy/pull/17394#issuecomment-700998618
>     >     [2] https://github.com/numpy/numpy/pull/17394#discussion_r498215468
>     >     [3] https://github.com/numpy/numpy/pull/17394#discussion_r498724340
>     >
>     >     On 06/11/2020 01:39, Sebastian Berg wrote:
>     >     > On Thu, 2020-11-05 at 17:35 -0600, Sebastian Berg wrote:
>     >     >> On Thu, 2020-11-05 at 12:51 -0800, Stephan Hoyer wrote:
>     >     >>> On Thu, Nov 5, 2020 at 11:16 AM Ralf Gommers <
>     >     >>> ralf.gommers at gmail.com> wrote:
>     >     >>>
>     >     >>>> On Thu, Nov 5, 2020 at 4:56 PM Sebastian Berg <
>     >     >>>> sebastian at sipsolutions.net> wrote:
>     >     >>>>
>     >     >>>>> Hi all,
>     >     >>>>>
>     >     >>>>> just a brief note that I merged this proposal:
>     >     >>>>>
>     >     >>>>>     https://github.com/numpy/numpy/pull/17394
>     >     >>>>>
>     >     >>>>> adding `np.sliding_window_view` into the 1.20 release of NumPy.
>     >     >>>>>
>     >     >>>>> There was only one public API change, and that is that the
>     >     >>>>> `shape` argument is now called `window_shape`.
>     >     >>>>>
>     >     >>>>> This is still a good time for feedback in case you have a
>     >     >>>>> better idea, e.g. for the function or parameter names.
>     >     >>>>>
>     >     >>>>
>     >     >>>> The old PR had this in the lib.stride_tricks namespace. Seeing
>     >     >>>> it in the main namespace is unexpected and likely will lead to
>     >     >>>> issues/questions, given that such an overlapping view is going
>     >     >>>> to behave in ways the average user will be surprised by. It may
>     >     >>>> also lead to requests for other array/tensor libraries to
>     >     >>>> implement this. I don't see any discussion on this in PR 17394,
>     >     >>>> it looks like a decision by the PR author that no one commented
>     >     >>>> on - reconsider that?
>     >     >>>>
>     >     >>>> Cheers,
>     >     >>>> Ralf
>     >     >>>>
>     >     >>>
>     >     >>> +1 let's keep this in the lib.stride_tricks namespace.
>     >     >>>
>     >     >>
>     >     >> I have no reservations against having it in the main namespace
>     >     >> and am happy either way (it can still be exposed later in any
>     >     >> case). It is the conservative choice and maybe it is an uncommon
>     >     >> enough function that it deserves being a bit hidden...
>     >     >
>     >     >
>     >     > In any case, it's the safe bet for NumPy 1.20 at least so I opened
>     >     > a PR:
>     >     >
>     >     >     https://github.com/numpy/numpy/pull/17720
>     >     >
>     >     > Name changes, etc. are also possible of course.
>     >     >
>     >     > I still think it might be nice to find a better place for this type
>     >     > of function than `np.lib.stride_tricks` though, but dunno...
>     >     >
>     >     > - Sebastian
>     >     >
>     >     >
>     >     >
>     >     >>
>     >     >> But I am curious, it sounds like you have both very strong
>     >     >> reservations, and I would like to understand them better.
>     >     >>
>     >     >> The behaviour can be surprising, but that is why the default is a
>     >     >> read-only view.  I do not think it is worse than `np.broadcast_to`
>     >     >> in this regard. (It is nowhere near as dangerous as `as_strided`.)
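
The read-only default mentioned above can be illustrated with a small sketch,
assuming NumPy 1.20 or later. The example values are invented, and the
`writeable=True` part is only there to show the kind of surprise that the
default guards against:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(5)

# Default: a read-only view, in the same way as np.broadcast_to.
v = sliding_window_view(x, 3)
b = np.broadcast_to(x, (2, 5))
print(v.flags.writeable, b.flags.writeable)  # False False

try:
    v[0, 0] = 99
except ValueError as exc:
    print("write rejected:", exc)

# Opting in to a writeable view: overlapping windows alias the same
# element, so a single assignment shows up in several places.
w = sliding_window_view(x, 3, writeable=True)
w[0, 1] = -1
print(x[1], w[1, 0])  # -1 -1
```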
>     >     >>
>     >     >> It is true that it is specific to NumPy (memory model). So that
>     >     >> is maybe a good enough reason right now.  But I am not sure that
>     >     >> stuffing things into a pretty hidden `np.lib.*` namespace is a
>     >     >> great long term solution either. There is very little useful
>     >     >> functionality hidden away in `np.lib.*` currently.
>     >     >>
>     >     >> Cheers,
>     >     >>
>     >     >> Sebastian
>     >     >>
>     >     >>>>
>     >     >>>>
>     >     >>>>> Cheers,
>     >     >>>>>
>     >     >>>>> Sebastian
>     >     >>>>>
>     >     >>>>>
>     >     >>>>>
>     >     >>>>> On Mon, 2020-10-12 at 08:39 +0000, Zimmermann Klaus wrote:
>     >     >>>>>> Hello,
>     >     >>>>>>
>     >     >>>>>> I would like to draw the attention of this list to PR #17394
>     >     >>>>>> [1] that adds the implementation of a sliding window view to
>     >     >>>>>> numpy.
>     >     >>>>>>
>     >     >>>>>> Having a sliding window view in numpy is a longstanding open
>     >     >>>>>> issue (cf #7753 [2] from 2016). A brief summary of the
>     >     >>>>>> discussions surrounding it can be found in the description of
>     >     >>>>>> the PR.
>     >     >>>>>>
>     >     >>>>>> This PR implements a sliding window view based on stride
>     >     >>>>>> tricks. Following the discussion in issue #7753, a first
>     >     >>>>>> implementation was provided by Fanjin Zeng in PR #10771. After
>     >     >>>>>> some discussion, that PR stalled and I picked up the issue in
>     >     >>>>>> the present PR #17394. It is based on the first
>     >     >>>>>> implementation, but follows the changed API as suggested by
>     >     >>>>>> Eric Wieser.
>     >     >>>>>>
>     >     >>>>>> Code reviews have been provided by Bas van Beek, Stephan
>     >     >>>>>> Hoyer, and Eric Wieser. Sebastian Berg added the "62 - Python
>     >     >>>>>> API" label.
>     >     >>>>>>
>     >     >>>>>>
>     >     >>>>>> Do you think this is suitable for inclusion in numpy?
>     >     >>>>>>
>     >     >>>>>> Do you consider the PR ready?
>     >     >>>>>>
>     >     >>>>>> Do you have suggestions or requests?
>     >     >>>>>>
>     >     >>>>>>
>     >     >>>>>> Thanks for your time and consideration!
>     >     >>>>>> Klaus
>     >     >>>>>>
>     >     >>>>>>
>     >     >>>>>> [1] https://github.com/numpy/numpy/pull/17394
>     >     >>>>>> [2] https://github.com/numpy/numpy/issues/7753
>     >     >>>>>> _______________________________________________
>     >     >>>>>> NumPy-Discussion mailing list
>     >     >>>>>> NumPy-Discussion at python.org
>     >     >>>>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>     >     >>>>>>

