From matti.picus at gmail.com  Sun Sep  1 03:46:47 2019
From: matti.picus at gmail.com (Matti Picus)
Date: Sun, 1 Sep 2019 10:46:47 +0300
Subject: [Numpy-discussion] Allowing Dependabot access to the numpy repo
In-Reply-To: 
References: <51892585-ba6e-94dd-fb73-9d1091231939@gmail.com>
Message-ID: <19a99ca0-fb96-5b08-1838-25452d5e4604@gmail.com>

Discussion has died down; I think the consensus is to use Dependabot. I will
proceed with allowing it access.

Thanks, Matti

On 29/8/19 12:07 pm, Nathaniel Smith wrote:
> AFAICT all these services work by creating branches inside your repo
> and then making a PR from that -- they don't make their own forks.
> (Which makes some sense when you consider they would need tens of
> thousands of forked repos for all the projects they work with.)
>
> I don't think there's any need to worry about giving GitHub Inc. (dba
> Dependabot) write permissions to a GitHub repo, though.
>
> You do maybe want to set up CI so that it doesn't run on these
> branches, since it will also run on the PRs, and running CI twice on
> the same branch is slow and wasteful.
>
> -n
>
> On Thu, Aug 29, 2019, 01:45 Ryan May wrote:
>
>     Hi,
>
>     The answer to why Dependabot needs write permission seems to be that it
>     must be able to work with private repos:
>
>     https://github.com/dependabot/feedback/issues/22
>
>     There doesn't seem to be any way around it... :(
>
>     Ryan
>
>     On Thu, Aug 29, 2019 at 12:04 AM Matti Picus wrote:
>
>         In PR 14378 https://github.com/numpy/numpy/pull/14378 I moved
>         all our python test dependencies to a test_requirements.txt
>         file (for building numpy the only requirement is cython). This
>         is worthwhile since it unifies the different "pip install"
>         commands across the different CI systems we use. Additionally,
>         there are services that monitor the file and will issue a PR
>         if any of those packages have a new release, so we can test
>         out new versions of dependencies in a controlled fashion.
>         Someone suggested Dependabot (thanks Ryan), which turns out to
>         be run by a company bought by GitHub itself.
>
>         When signing up for the service, it asks for permissions:
>         https://pasteboard.co/IuTeWNz.png. The service is in use by
>         other projects like cpython. Does it seem OK to sign up for
>         this service?
>
>         Matti
>
>         _______________________________________________
>         NumPy-Discussion mailing list
>         NumPy-Discussion at python.org
>         https://mail.python.org/mailman/listinfo/numpy-discussion
>
> --
> Ryan May
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at python.org
https://mail.python.org/mailman/listinfo/numpy-discussion

From einstein.edison at gmail.com  Mon Sep  2 05:15:15 2019
From: einstein.edison at gmail.com (Hameer Abbasi)
Date: Mon, 2 Sep 2019 09:15:15 +0000
Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?=
 =?utf-8?q?global_overrides_of_the_NumPy_API?=
Message-ID: 

Hello all,

It was recently brought to my attention that my mails to NumPy-discussion
were probably going into the spam folder for many people, so here I am
trying from another email. Probably Google trying to force people onto
their products as usual.

Me, Ralf Gommers and Peter Bell (both cc'd) have come up with a proposal
on how to solve the array creation and duck array problems.
The solution is outlined in NEP-31, currently in the form of a PR [1],
following the high-level discussion in NEP-22 [2]. It would be nice to get
some feedback.

Full-text of the NEP:

============================================================
NEP 31 — Context-local and global overrides of the NumPy API
============================================================

:Author: Hameer Abbasi
:Author: Ralf Gommers
:Author: Peter Bell
:Status: Draft
:Type: Standards Track
:Created: 2019-08-22

Abstract
--------

This NEP proposes to make all of NumPy's public API overridable via an
extensible backend mechanism, using a library called ``uarray`` `[1]`_.

``uarray`` provides global and context-local overrides, as well as a
dispatch mechanism similar to NEP-18 `[2]`_. First experiences with
``__array_function__`` show that it is necessary to be able to override
NumPy functions that *do not take an array-like argument*, and hence
aren't overridable via ``__array_function__``. The most pressing need is
array creation and coercion functions - see e.g. NEP-30 `[9]`_.

This NEP proposes to allow, in an opt-in fashion, overriding any part of
the NumPy API. It is intended as a comprehensive resolution to NEP-22
`[3]`_, and obviates the need to add an ever-growing list of new protocols
for each new type of function or object that needs to become overridable.

Motivation and Scope
--------------------

The motivation behind ``uarray`` is manifold: First, there have been
several attempts to allow dispatch of parts of the NumPy API, including
(most prominently) the ``__array_ufunc__`` protocol in NEP-13 `[4]`_, and
the ``__array_function__`` protocol in NEP-18 `[2]`_, but this has shown
the need for further protocols to be developed, including a protocol for
coercion (see `[5]`_). The reasons these overrides are needed have been
extensively discussed in the references, and this NEP will not attempt to
go into the details of why these are needed.

Another pain point requiring yet another protocol is the duck-array
protocol (see `[9]`_).

This NEP takes a more holistic approach: It assumes that there are parts
of the API that need to be overridable, and that these will grow over
time. It provides a general framework and a mechanism to avoid designing
a new protocol each time this is required.

This NEP proposes the following: That ``unumpy`` `[8]`_ becomes the
recommended override mechanism for the parts of the NumPy API not yet
covered by ``__array_function__`` or ``__array_ufunc__``, and that
``uarray`` is vendored into a new namespace within NumPy to give users
and downstream dependencies access to these overrides. This vendoring
mechanism is similar to what SciPy decided to do for making ``scipy.fft``
overridable (see `[10]`_).
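To make the two kinds of overrides concrete before the details below, here
is a minimal sketch of a backend participating in both modes. It is
illustrative only: it assumes the public ``uarray`` API (``set_backend``
and ``set_global_backend``) and the standalone ``unumpy`` package, and
``MyBackend`` is an invented stand-in for a real array library::

    import uarray as ua
    import unumpy as np  # the standalone package; ``numpy.overridable`` once vendored

    class MyBackend:
        __ua_domain__ = "numpy"

        @staticmethod
        def __ua_function__(func, args, kwargs):
            # A real backend would compute a result here, or return
            # NotImplemented to defer to the next backend in line.
            return (func.__name__, args, kwargs)

    # Context-local: the override is active only inside the ``with`` block.
    with ua.set_backend(MyBackend):
        print(np.zeros((2, 2)))  # dispatches to MyBackend, not NumPy

    # Global: the override stays active for the rest of the program.
    ua.set_global_backend(MyBackend)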
Detailed description
--------------------

**Note:** *This section will not attempt to explain the specifics or the
mechanism of ``uarray``; that is explained in the ``uarray`` documentation.*
`[1]`_ *However, the NumPy community will have input into the design of
``uarray``, and any backward-incompatible changes will be discussed on the
mailing list.*

The way we propose the overrides will be used by end users is::

    import numpy.overridable as np

    with np.set_backend(backend):
        x = np.asarray(my_array, dtype=dtype)

And a library that implements a NumPy-like API will use it in the
following manner (as an example)::

    import numpy.overridable as np

    _ua_implementations = {}

    __ua_domain__ = "numpy"

    def __ua_function__(func, args, kwargs):
        fn = _ua_implementations.get(func, None)
        return fn(*args, **kwargs) if fn is not None else NotImplemented

    def implements(ua_func):
        def inner(func):
            _ua_implementations[ua_func] = func
            return func
        return inner

    @implements(np.asarray)
    def asarray(a, dtype=None, order=None):
        # Code here
        # Either this method or __ua_convert__ must
        # return NotImplemented for unsupported types,
        # or they shouldn't be marked as dispatchable.

    # Provides a default implementation for ones and zeros.
    @implements(np.full)
    def full(shape, fill_value, dtype=None, order='C'):
        # Code here

The only change this NEP proposes at its acceptance is to make ``unumpy``
the officially recommended way to override NumPy. ``unumpy`` will remain a
separate repository/package for the time being (which we propose to vendor
to avoid a hard dependency, and to use the separate ``unumpy`` package only
if it is installed), and will be developed primarily with the input of
duck-array authors and, secondarily, custom dtype authors, via the usual
GitHub workflow. There are a few reasons for this:

* Faster iteration in the case of bugs or issues.
* Faster design changes, in the case of needed functionality.
* ``unumpy`` will work with older versions of NumPy as well.
* The user and library author opt in to the override process, rather than
  breakages happening when it is least expected. In simple terms, bugs in
  ``unumpy`` mean that ``numpy`` remains unaffected.

Advantages of ``unumpy`` over other solutions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``unumpy`` offers a number of advantages over the approach of defining a
new protocol for every problem encountered: Whenever there is something
requiring an override, ``unumpy`` will be able to offer a unified API with
very minor changes. For example:

* ``ufunc`` objects can be overridden via their ``__call__``, ``reduce``
  and other methods.
* ``dtype`` objects can be overridden via the dispatch/backend mechanism,
  going as far as to allow ``np.float32`` et al. to be overridden by
  overriding ``__get__``.
* Other functions can be overridden in a similar fashion.
* ``np.asduckarray`` goes away, and becomes ``np.asarray`` with a backend
  set.
* The same holds for array creation functions such as ``np.zeros``,
  ``np.empty`` and so on.

This also holds for the future: Making something overridable would require
only minor changes to ``unumpy``.

Another promise ``unumpy`` holds is one of default implementations. Default
implementations can be provided for any multimethod, in terms of others.
This allows one to override a large part of the NumPy API by defining only
a small part of it.
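As a hedged sketch of what such a default implementation could look like,
``ones`` can be expressed in terms of the overridable ``np.full``, so that
any backend implementing ``full`` gets ``ones`` for free. The
``create_multimethod`` and ``Dispatchable`` names follow the ``uarray``
documentation referenced above, and the ``default=`` keyword is assumed to
be the spelling for a fallback implementation::

    import numpy
    import uarray as ua
    import numpy.overridable as np  # the proposed namespace; ``unumpy`` today

    def ones_argreplacer(args, kwargs, dispatchables):
        # Re-insert the (possibly converted) dtype into the argument list.
        def ones(shape, dtype=None, order='C'):
            return (shape,), dict(dtype=dispatchables[0], order=order)
        return ones(*args, **kwargs)

    def ones_default(shape, dtype=None, order='C'):
        # Fall back to the overridable full(), which itself dispatches to
        # whichever backend is active.
        return np.full(shape, 1, dtype=dtype, order=order)

    @ua.create_multimethod(ones_argreplacer, domain="numpy", default=ones_default)
    def ones(shape, dtype=None, order='C'):
        return (ua.Dispatchable(dtype, numpy.dtype),)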
Such defaults ease the creation of new duck-arrays, by providing default
implementations of the many functions that can be easily expressed in terms
of others, as well as a repository of utility functions that most duck-array
implementations would require.

The last benefit is a clear way to coerce to a given backend, and a protocol
for coercing not only arrays, but also ``dtype`` objects and ``ufunc``
objects, with similar ones from other libraries. This is due to the
existence of actual, third-party dtype packages, and their desire to blend
into the NumPy ecosystem (see `[6]`_). This is a separate issue compared to
the C-level dtype redesign proposed in `[7]`_; it's about allowing
third-party dtype implementations to work with NumPy, much like third-party
array implementations.

Mixing NumPy and ``unumpy`` in the same file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Normally, one would want to import only one of ``unumpy`` or ``numpy``, and
would import it as ``np`` for familiarity. However, there may be situations
where one wishes to mix NumPy and the overrides, and there are a few ways to
do this, depending on the user's style::

    import numpy.overridable as unumpy
    import numpy as np

or::

    import numpy as np

    # Use unumpy via np.overridable

Related Work
------------

Previous override mechanisms
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* NEP-18, the ``__array_function__`` protocol. `[2]`_
* NEP-13, the ``__array_ufunc__`` protocol. `[4]`_

Existing NumPy-like array implementations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* Dask: https://dask.org/
* CuPy: https://cupy.chainer.org/
* PyData/Sparse: https://sparse.pydata.org/
* Xnd: https://xnd.readthedocs.io/
* Astropy's Quantity: https://docs.astropy.org/en/stable/units/

Existing and potential consumers of alternative arrays
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* Dask: https://dask.org/
* scikit-learn: https://scikit-learn.org/
* Xarray: https://xarray.pydata.org/
* TensorLy: http://tensorly.org/

Existing alternate dtype implementations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* ``ndtypes``: https://ndtypes.readthedocs.io/en/latest/
* Datashape: https://datashape.readthedocs.io
* Plum: https://plum-py.readthedocs.io/

Implementation
--------------

The implementation of this NEP will require the following steps:

* Implementation of ``uarray`` multimethods corresponding to the NumPy API,
  including classes for overriding ``dtype``, ``ufunc`` and ``array``
  objects, in the ``unumpy`` repository.
* Moving backends from ``unumpy`` into the respective array libraries.

Backward compatibility
----------------------

There are no backward incompatible changes proposed in this NEP.

Alternatives
------------

The current alternative to this problem is NEP-30 plus adding more
protocols (not yet specified) in addition to it. Even then, some parts of
the NumPy API will remain non-overridable, so it's a partial alternative.

The main alternative to vendoring ``unumpy`` is to simply move it into
NumPy completely and not distribute it as a separate package. This would
also achieve the proposed goals; however, we prefer to keep it a separate
package for now, for reasons already stated above.
Discussion
----------

* ``uarray`` blogpost: https://labs.quansight.org/blog/2019/07/uarray-update-api-changes-overhead-and-comparison-to-__array_function__/
* The discussion section of NEP-18: https://numpy.org/neps/nep-0018-array-function-protocol.html#discussion
* NEP-22: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
* Dask issue #4462: https://github.com/dask/dask/issues/4462
* PR #13046: https://github.com/numpy/numpy/pull/13046
* Dask issue #4883: https://github.com/dask/dask/issues/4883
* Issue #13831: https://github.com/numpy/numpy/issues/13831
* Discussion PR 1: https://github.com/hameerabbasi/numpy/pull/3
* Discussion PR 2: https://github.com/hameerabbasi/numpy/pull/4

References and Footnotes
------------------------

.. _[1]:

[1] uarray, A general dispatch mechanism for Python: https://uarray.readthedocs.io

.. _[2]:

[2] NEP 18 — A dispatch mechanism for NumPy's high level array functions: https://numpy.org/neps/nep-0018-array-function-protocol.html

.. _[3]:

[3] NEP 22 — Duck typing for NumPy arrays — high level overview: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html

.. _[4]:

[4] NEP 13 — A Mechanism for Overriding Ufuncs: https://numpy.org/neps/nep-0013-ufunc-overrides.html

.. _[5]:

[5] Reply to Adding to the non-dispatched implementation of NumPy methods: http://numpy-discussion.10968.n7.nabble.com/Adding-to-the-non-dispatched-implementation-of-NumPy-methods-tp46816p46874.html

.. _[6]:

[6] Custom Dtype/Units discussion: http://numpy-discussion.10968.n7.nabble.com/Custom-Dtype-Units-discussion-td43262.html

.. _[7]:

[7] The epic dtype cleanup plan: https://github.com/numpy/numpy/issues/2899

.. _[8]:

[8] unumpy: NumPy, but implementation-independent: https://unumpy.readthedocs.io

.. _[9]:

[9] NEP 30 — Duck Typing for NumPy Arrays - Implementation: https://www.numpy.org/neps/nep-0030-duck-array-protocol.html

.. _[10]:

[10] http://scipy.github.io/devdocs/fft.html#backend-control

Copyright
---------

This document has been placed in the public domain.

Best regards,
Hameer Abbasi

[1] https://github.com/numpy/numpy/pull/14389
[2] https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html

From njs at pobox.com  Mon Sep  2 17:09:02 2019
From: njs at pobox.com (Nathaniel Smith)
Date: Mon, 2 Sep 2019 14:09:02 -0700
Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?=
 =?utf-8?q?global_overrides_of_the_NumPy_API?=
In-Reply-To: 
References: 
Message-ID: 

On Mon, Sep 2, 2019 at 2:15 AM Hameer Abbasi wrote:
> Me, Ralf Gommers and Peter Bell (both cc'd) have come up with a proposal
> on how to solve the array creation and duck array problems. The solution
> is outlined in NEP-31, currently in the form of a PR, [1]

Thanks for putting this together! It'd be great to have more
engagement between uarray and numpy.

> ============================================================
>
> NEP 31 — Context-local and global overrides of the NumPy API
>
> ============================================================

Now that I've read this over, my main feedback is that right now it
seems too vague and high-level to give it a fair evaluation? The idea
of a NEP is to lay out a problem and proposed solution in enough
detail that it can be evaluated and critiqued, but this felt to me
more like it was pointing at some other documents for all the details
and then promising that uarray has solutions for all our problems.
> This NEP takes a more holistic approach: It assumes that there are parts
> of the API that need to be overridable, and that these will grow over
> time. It provides a general framework and a mechanism to avoid a design
> of a new protocol each time this is required.

The idea of a holistic approach makes me nervous, because I'm not sure
we have holistic problems. Sometimes a holistic approach is the right
thing; other times it means sweeping the actual problems under the
rug, so things *look* simple and clean but in fact nothing has been
solved, and they just end up biting us later. And from the NEP as
currently written, I can't tell whether this is the good kind of
holistic or the bad kind of holistic.

Now I'm writing vague handwavey things, so let me follow my own advice
and make it more concrete with an example :-).

When Stephan and I were writing NEP 22, the single thing we spent the
most time discussing was the problem of duck-array coercion, and in
particular what to do about existing code that does
np.asarray(duck_array_obj).

The reason this is challenging is that there's a lot of code written
in Cython/C/C++ that calls np.asarray, and then blindly casts the
return value to a PyArray struct and starts accessing the raw memory
fields. If np.asarray starts returning anything besides a real-actual
np.ndarray object, then this code will start corrupting random memory,
leading to a segfault at best.

Stephan felt strongly that this meant that existing np.asarray calls
*must not* ever return anything besides an np.ndarray object, and
therefore we needed to add a new function np.asduckarray(), or maybe
an explicit opt-in flag like np.asarray(..., allow_duck_array=True).

I agreed that this was a problem, but thought we might be able to get
away with an "opt-out" system, where we add an allow_duck_array= flag,
but make it *default* to True, and document that the Cython/C/C++
users who want to work with a raw np.ndarray object should modify
their code to explicitly call np.asarray(obj, allow_duck_array=False).
This would mean that for a while people who tried to pass duck-arrays
into legacy libraries would get segfaults, but there would be a clear
path for fixing these issues as they were discovered.

Either way, there are also some other details to figure out: how does
this affect the C version of asarray? What about np.asfortranarray --
probably that should default to allow_duck_array=False, even if we did
make np.asarray default to allow_duck_array=True, right?

Now if I understand right, your proposal would be to make it so any
code in any package could arbitrarily change the behavior of
np.asarray for all inputs, e.g. I could just decide that
np.asarray([1, 2, 3]) should return some arbitrary non-np.ndarray
object. It seems like this has a much greater potential for breaking
existing Cython/C/C++ code, and the NEP doesn't currently describe why
this extra power is useful, and it doesn't currently describe how it
plans to mitigate the downsides. (For example, if a caller needs a
real np.ndarray, then is there some way to explicitly request one? The
NEP doesn't say.) Maybe this is all fine and there are solutions to
these issues, but any proposal to address duck array coercion needs to
at least talk about these issues!

And that's just one example... array coercion is a particularly
central and tricky problem, but the numpy API is big, and there are
probably other problems like this. For another example, I don't
understand what the NEP is proposing to do about dtypes at all.
That's why I think the NEP needs to be fleshed out a lot more before it will be possible to evaluate fairly. -n -- Nathaniel J. Smith -- https://vorpus.org From ralf.gommers at gmail.com Tue Sep 3 02:20:36 2019 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 2 Sep 2019 23:20:36 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?= =?utf-8?q?global_overrides_of_the_NumPy_API?= In-Reply-To: References: Message-ID: On Mon, Sep 2, 2019 at 2:09 PM Nathaniel Smith wrote: > On Mon, Sep 2, 2019 at 2:15 AM Hameer Abbasi > wrote: > > Me, Ralf Gommers and Peter Bell (both cc?d) have come up with a proposal > on how to solve the array creation and duck array problems. The solution is > outlined in NEP-31, currently in the form of a PR, [1] > > Thanks for putting this together! It'd be great to have more > engagement between uarray and numpy. > > > ============================================================ > > > > NEP 31 ? Context-local and global overrides of the NumPy API > > > > ============================================================ > > Now that I've read this over, my main feedback is that right now it > seems too vague and high-level to give it a fair evaluation? The idea > of a NEP is to lay out a problem and proposed solution in enough > detail that it can be evaluated and critiqued, but this felt to me > more like it was pointing at some other documents for all the details > and then promising that uarray has solutions for all our problems. > This is fair enough I think. We'll need to put some more thought in where to refer to other NEPs, and where to be more concrete. > > This NEP takes a more holistic approach: It assumes that there are parts > of the API that need to be > > overridable, and that these will grow over time. It provides a general > framework and a mechanism to > > avoid a design of a new protocol each time this is required. > > The idea of a holistic approach makes me nervous, because I'm not sure > we have holistic problems. Sometimes a holistic approach is the right > thing; other times it means sweeping the actual problems under the > rug, so things *look* simple and clean but in fact nothing has been > solved, and they just end up biting us later. And from the NEP as > currently written, I can't tell whether this is the good kind of > holistic or the bad kind of holistic. > > Now I'm writing vague handwavey things, so let me follow my own advice > and make it more concrete with an example :-). > > When Stephan and I were writing NEP 22, the single thing we spent the > most time discussing was the problem of duck-array coercion, and in > particular what to do about existing code that does > np.asarray(duck_array_obj). > > The reason this is challenging is that there's a lot of code written > in Cython/C/C++ that calls np.asarray, Cython code only perhaps? It would surprise me if there's a lot of C/C++ code that explicitly calls into our Python rather than C API. and then blindly casts the > return value to a PyArray struct and starts accessing the raw memory > fields. If np.asarray starts returning anything besides a real-actual > np.ndarray object, then this code will start corrupting random memory, > leading to a segfault at best. > > Stephan felt strongly that this meant that existing np.asarray calls > *must not* ever return anything besides an np.ndarray object, and > therefore we needed to add a new function np.asduckarray(), or maybe > an explicit opt-in flag like np.asarray(..., allow_duck_array=True). 
> I agreed that this was a problem, but thought we might be able to get
> away with an "opt-out" system, where we add an allow_duck_array= flag,
> but make it *default* to True, and document that the Cython/C/C++
> users who want to work with a raw np.ndarray object should modify
> their code to explicitly call np.asarray(obj, allow_duck_array=False).
> This would mean that for a while people who tried to pass duck-arrays
> into legacy libraries would get segfaults, but there would be a clear
> path for fixing these issues as they were discovered.
>
> Either way, there are also some other details to figure out: how does
> this affect the C version of asarray? What about np.asfortranarray --
> probably that should default to allow_duck_array=False, even if we did
> make np.asarray default to allow_duck_array=True, right?
>
> Now if I understand right, your proposal would be to make it so any
> code in any package could arbitrarily change the behavior of
> np.asarray for all inputs, e.g. I could just decide that
> np.asarray([1, 2, 3]) should return some arbitrary non-np.ndarray
> object.

No, definitely not! It's all opt-in, by explicitly importing from
`numpy.overridable` or `unumpy`. No behavior of anything in the existing
numpy namespaces should be affected in any way. I agree with the concerns
below, hence it should stay opt-in.

Cheers,
Ralf

> It seems like this has a much greater potential for breaking
> existing Cython/C/C++ code, and the NEP doesn't currently describe why
> this extra power is useful, and it doesn't currently describe how it
> plans to mitigate the downsides. (For example, if a caller needs a
> real np.ndarray, then is there some way to explicitly request one? The
> NEP doesn't say.) Maybe this is all fine and there are solutions to
> these issues, but any proposal to address duck array coercion needs to
> at least talk about these issues!
>
> And that's just one example... array coercion is a particularly
> central and tricky problem, but the numpy API is big, and there are
> probably other problems like this. For another example, I don't
> understand what the NEP is proposing to do about dtypes at all.
>
> That's why I think the NEP needs to be fleshed out a lot more before
> it will be possible to evaluate fairly.
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

From einstein.edison at gmail.com  Tue Sep  3 05:06:38 2019
From: einstein.edison at gmail.com (Hameer Abbasi)
Date: Tue, 3 Sep 2019 11:06:38 +0200
Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?=
 =?utf-8?q?global_overrides_of_the_NumPy_API?=
In-Reply-To: 
References: 
Message-ID: <8bc4cb7c-3334-2a82-ba1b-94b7ed3425dd@gmail.com>

Hi Nathaniel,

On 02.09.19 23:09, Nathaniel Smith wrote:
> On Mon, Sep 2, 2019 at 2:15 AM Hameer Abbasi wrote:
>> Me, Ralf Gommers and Peter Bell (both cc'd) have come up with a proposal
>> on how to solve the array creation and duck array problems. The solution
>> is outlined in NEP-31, currently in the form of a PR, [1]
> Thanks for putting this together! It'd be great to have more
> engagement between uarray and numpy.
>
>> ============================================================
>>
>> NEP 31 — 
Context-local and global overrides of the NumPy API >> >> ============================================================ > Now that I've read this over, my main feedback is that right now it > seems too vague and high-level to give it a fair evaluation? The idea > of a NEP is to lay out a problem and proposed solution in enough > detail that it can be evaluated and critiqued, but this felt to me > more like it was pointing at some other documents for all the details > and then promising that uarray has solutions for all our problems. > >> This NEP takes a more holistic approach: It assumes that there are parts of the API that need to be >> overridable, and that these will grow over time. It provides a general framework and a mechanism to >> avoid a design of a new protocol each time this is required. > The idea of a holistic approach makes me nervous, because I'm not sure > we have holistic problems. The fact that we're having to design more and more protocols for a lot of very similar things is, to me, an indicator that we do have holistic problems that ought to be solved by a single protocol. > Sometimes a holistic approach is the right > thing; other times it means sweeping the actual problems under the > rug, so things *look* simple and clean but in fact nothing has been > solved, and they just end up biting us later. And from the NEP as > currently written, I can't tell whether this is the good kind of > holistic or the bad kind of holistic. > > Now I'm writing vague handwavey things, so let me follow my own advice > and make it more concrete with an example :-). > > When Stephan and I were writing NEP 22, the single thing we spent the > most time discussing was the problem of duck-array coercion, and in > particular what to do about existing code that does > np.asarray(duck_array_obj). > > The reason this is challenging is that there's a lot of code written > in Cython/C/C++ that calls np.asarray, and then blindly casts the > return value to a PyArray struct and starts accessing the raw memory > fields. If np.asarray starts returning anything besides a real-actual > np.ndarray object, then this code will start corrupting random memory, > leading to a segfault at best. > > Stephan felt strongly that this meant that existing np.asarray calls > *must not* ever return anything besides an np.ndarray object, and > therefore we needed to add a new function np.asduckarray(), or maybe > an explicit opt-in flag like np.asarray(..., allow_duck_array=True). > > I agreed that this was a problem, but thought we might be able to get > away with an "opt-out" system, where we add an allow_duck_array= flag, > but make it *default* to True, and document that the Cython/C/C++ > users who want to work with a raw np.ndarray object should modify > their code to explicitly call np.asarray(obj, allow_duck_array=False). > This would mean that for a while people who tried to pass duck-arrays > into legacy library would get segfaults, but there would be a clear > path for fixing these issues as they were discovered. > > Either way, there are also some other details to figure out: how does > this affect the C version of asarray? What about np.asfortranarray ? > probably that should default to allow_duck_array=False, even if we did > make np.asarray default to allow_duck_array=True, right? > > Now if I understand right, your proposal would be to make it so any > code in any package could arbitrarily change the behavior of > np.asarray for all inputs, e.g. 
I could just decide that
> np.asarray([1, 2, 3]) should return some arbitrary non-np.ndarray
> object. It seems like this has a much greater potential for breaking
> existing Cython/C/C++ code, and the NEP doesn't currently describe why
> this extra power is useful, and it doesn't currently describe how it
> plans to mitigate the downsides. (For example, if a caller needs a
> real np.ndarray, then is there some way to explicitly request one? The
> NEP doesn't say.) Maybe this is all fine and there are solutions to
> these issues, but any proposal to address duck array coercion needs to
> at least talk about these issues!

I believe I addressed this in a previous email, but the NEP doesn't suggest
overriding numpy.asarray or numpy.array. It suggests overriding
numpy.overridable.asarray and numpy.overridable.array, so existing code
will continue to work as-is and overrides are opt-in rather than forced on
you. The argument about this kind of code could be applied to return values
from other functions as well.

That said, there is a way to request a NumPy array object explicitly::

    with ua.set_backend(np):
        x = np.asarray(...)

> And that's just one example... array coercion is a particularly
> central and tricky problem, but the numpy API is big, and there are
> probably other problems like this. For another example, I don't
> understand what the NEP is proposing to do about dtypes at all.

Just as there are other kinds of arrays, there may be other kinds of dtypes
that are not NumPy dtypes. They cannot be attached to a NumPy array object
(as Sebastian pointed out to me in last week's Community meeting), but they
can still provide other powerful features.

> That's why I think the NEP needs to be fleshed out a lot more before
> it will be possible to evaluate fairly.
>
> -n

I just pushed a new version of the NEP to my PR, the full-text of which is
below.

============================================================
NEP 31 — Context-local and global overrides of the NumPy API
============================================================

:Author: Hameer Abbasi
:Author: Ralf Gommers
:Author: Peter Bell
:Status: Draft
:Type: Standards Track
:Created: 2019-08-22

Abstract
--------

This NEP proposes to make all of NumPy's public API overridable via an
extensible backend mechanism, using a library called ``uarray`` `[1]`_.

``uarray`` provides global and context-local overrides, as well as a
dispatch mechanism similar to NEP-18 `[2]`_. First experiences with
``__array_function__`` show that it is necessary to be able to override
NumPy functions that *do not take an array-like argument*, and hence
aren't overridable via ``__array_function__``. The most pressing need is
array creation and coercion functions - see e.g. NEP-30 `[9]`_.

This NEP proposes to allow, in an opt-in fashion, overriding any part of
the NumPy API. It is intended as a comprehensive resolution to NEP-22
`[3]`_, and obviates the need to add an ever-growing list of new protocols
for each new type of function or object that needs to become overridable.

Motivation and Scope
--------------------

The motivation behind ``uarray`` is manifold: First, there have been
several attempts to allow dispatch of parts of the NumPy API, including
(most prominently) the ``__array_ufunc__`` protocol in NEP-13 `[4]`_, and
the ``__array_function__`` protocol in NEP-18 `[2]`_, but this has shown
the need for further protocols to be developed, including a protocol for
coercion (see `[5]`_).
The reasons these overrides are needed have been extensively discussed in
the references, and this NEP will not attempt to go into the details of why
these are needed. Another pain point requiring yet another protocol is the
duck-array protocol (see `[9]`_).

This NEP takes a more holistic approach: It assumes that there are parts of
the API that need to be overridable, and that these will grow over time. It
provides a general framework and a mechanism to avoid designing a new
protocol each time this is required.

This NEP proposes the following: That ``unumpy`` `[8]`_ becomes the
recommended override mechanism for the parts of the NumPy API not yet
covered by ``__array_function__`` or ``__array_ufunc__``, and that
``uarray`` is vendored into a new namespace within NumPy to give users and
downstream dependencies access to these overrides. This vendoring mechanism
is similar to what SciPy decided to do for making ``scipy.fft`` overridable
(see `[10]`_).

Detailed description
--------------------

**Note:** *This section will not attempt to go into too much detail about
``uarray``; that is the purpose of the ``uarray`` documentation.* `[1]`_
*However, the NumPy community will have input into the design of
``uarray``, via the issue tracker.*

``uarray`` Primer
^^^^^^^^^^^^^^^^^

Defining backends
~~~~~~~~~~~~~~~~~

``uarray`` consists of two main protocols: ``__ua_convert__`` and
``__ua_function__``, called in that order, along with ``__ua_domain__``,
which is a string defining the domain of the backend. If any of the
protocols return ``NotImplemented``, we fall back to the next backend.

``__ua_convert__`` is for conversion and coercion. It has the signature
``(dispatchables, coerce)``, where ``dispatchables`` is an iterable of
``ua.Dispatchable`` objects and ``coerce`` is a boolean indicating whether
or not to force the conversion. ``ua.Dispatchable`` is a simple class
consisting of three values: ``type``, ``value``, and ``coercible``.
``__ua_convert__`` returns an iterable of the converted values, or
``NotImplemented`` in the case of failure. Returning ``NotImplemented``
here will cause ``uarray`` to move to the next available backend.

``__ua_function__`` has the signature ``(func, args, kwargs)`` and defines
the actual implementation of the function. It receives the function and
its arguments. Returning ``NotImplemented`` will cause a move to the
default implementation of the function if one exists, and failing that,
the next backend. If all backends are exhausted, a
``ua.BackendNotImplementedError`` is raised.

Backends can be registered for permanent use if required.

Defining overridable multimethods
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To define an overridable function (a multimethod), one needs a few things:

1. A dispatcher that returns an iterable of ``ua.Dispatchable`` objects.
2. A reverse dispatcher that replaces dispatchable values with the
   supplied ones.
3. A domain.
4. Optionally, a default implementation, which can be provided in terms of
   other multimethods.

As an example, consider the following::

    import numpy as np
    import uarray as ua

    def full_argreplacer(args, kwargs, dispatchables):
        def full(shape, fill_value, dtype=None, order='C'):
            return (shape, fill_value), dict(
                dtype=dispatchables[0],
                order=order
            )
        return full(*args, **kwargs)

    @ua.create_multimethod(full_argreplacer, domain="numpy")
    def full(shape, fill_value, dtype=None, order='C'):
        return (ua.Dispatchable(dtype, np.dtype),)
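To see the dispatch in action, here is a hedged sketch that drives the
``full`` multimethod above with a deliberately trivial backend.
``EchoBackend`` is invented for illustration; only the ``uarray`` calls
shown are assumed::

    import uarray as ua

    class EchoBackend:
        __ua_domain__ = "numpy"

        @staticmethod
        def __ua_function__(func, args, kwargs):
            # Instead of computing anything, report which multimethod was
            # called and with what (already argreplaced) arguments.
            return (func.__name__, args, kwargs)

    with ua.set_backend(EchoBackend):
        print(full((2, 3), 0.0))  # a tuple describing the call, not an ndarray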
A large set of examples can be found in the ``unumpy`` repository, `[8]`_.

This simple act of overriding callables allows us to override:

* Methods
* Properties, via ``fget`` and ``fset``
* Entire objects, via ``__get__``

Using overrides
~~~~~~~~~~~~~~~

The way we propose the overrides will be used by end users is::

    import numpy.overridable as np

    with np.set_backend(backend):
        x = np.asarray(my_array, dtype=dtype)

And a library that implements a NumPy-like API will use it in the
following manner (as an example)::

    import numpy.overridable as np

    _ua_implementations = {}

    __ua_domain__ = "numpy"

    def __ua_function__(func, args, kwargs):
        fn = _ua_implementations.get(func, None)
        return fn(*args, **kwargs) if fn is not None else NotImplemented

    def implements(ua_func):
        def inner(func):
            _ua_implementations[ua_func] = func
            return func
        return inner

    @implements(np.asarray)
    def asarray(a, dtype=None, order=None):
        # Code here
        # Either this method or __ua_convert__ must
        # return NotImplemented for unsupported types,
        # or they shouldn't be marked as dispatchable.

    # Provides a default implementation for ones and zeros.
    @implements(np.full)
    def full(shape, fill_value, dtype=None, order='C'):
        # Code here

The only change this NEP proposes at its acceptance is to make ``unumpy``
the officially recommended way to override NumPy. ``unumpy`` will remain a
separate repository/package for the time being (which we propose to vendor
to avoid a hard dependency, and to use the separate ``unumpy`` package only
if it is installed), and will be developed primarily with the input of
duck-array authors and, secondarily, custom dtype authors, via the usual
GitHub workflow. There are a few reasons for this:

* Faster iteration in the case of bugs or issues.
* Faster design changes, in the case of needed functionality.
* ``unumpy`` will work with older versions of NumPy as well.
* The user and library author opt in to the override process, rather than
  breakages happening when it is least expected. In simple terms, bugs in
  ``unumpy`` mean that ``numpy`` remains unaffected.

Duck-array coercion
~~~~~~~~~~~~~~~~~~~

There are inherent problems with returning objects that are not NumPy
arrays from ``numpy.array`` or ``numpy.asarray``, particularly in the
context of C/C++ or Cython code that may get an object with a different
memory layout than the one it expects. However, we believe this problem may
apply not only to these two functions but to all functions that return
NumPy arrays. For this reason, overrides are opt-in for the user, by using
the submodule ``numpy.overridable`` rather than ``numpy``. NumPy will
continue to work unaffected by anything in ``numpy.overridable``.

If the user wishes to obtain a NumPy array, there are two ways of doing it:

1. Use ``numpy.asarray`` (the non-overridable version).
2. Use ``numpy.overridable.asarray`` with the NumPy backend set and
   coercion enabled::

    import numpy.overridable as np

    with ua.set_backend(np):
        x = np.asarray(...)
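As a hedged illustration of that second option, a library function that
must hand real ``ndarray`` memory to C code could pin the NumPy backend for
the duration of the conversion. The ``coerce=True`` keyword follows the
``ua.set_backend`` API described earlier; passing the module itself as the
backend mirrors the snippet above, though the exact object NumPy exposes as
its backend should be treated as an assumption::

    import uarray as ua
    import numpy.overridable as np

    def as_real_ndarray(obj):
        # Inside this context every np.* call dispatches to NumPy itself,
        # and coerce=True forces conversion of coercible inputs.
        with ua.set_backend(np, coerce=True):
            return np.asarray(obj)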
Advantages of ``unumpy`` over other solutions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``unumpy`` offers a number of advantages over the approach of defining a
new protocol for every problem encountered: Whenever there is something
requiring an override, ``unumpy`` will be able to offer a unified API with
very minor changes. For example:

* ``ufunc`` objects can be overridden via their ``__call__``, ``reduce``
  and other methods.
* Other functions can be overridden in a similar fashion.
* ``np.asduckarray`` goes away, and becomes ``np.asarray`` with a backend
  set.
* The same holds for array creation functions such as ``np.zeros``,
  ``np.empty`` and so on.

This also holds for the future: Making something overridable would require
only minor changes to ``unumpy``.

Another promise ``unumpy`` holds is one of default implementations. Default
implementations can be provided for any multimethod, in terms of others.
This allows one to override a large part of the NumPy API by defining only
a small part of it. This eases the creation of new duck-arrays, by
providing default implementations of the many functions that can be easily
expressed in terms of others, as well as a repository of utility functions
that most duck-array implementations would require.

The last benefit is a clear way to coerce to a given backend (via the
``coerce`` keyword in ``ua.set_backend``), and a protocol for coercing not
only arrays, but also ``dtype`` objects and ``ufunc`` objects, with similar
ones from other libraries. This is due to the existence of actual,
third-party dtype packages, and their desire to blend into the NumPy
ecosystem (see `[6]`_). This is a separate issue compared to the C-level
dtype redesign proposed in `[7]`_; it's about allowing third-party dtype
implementations to work with NumPy, much like third-party array
implementations. These can provide features such as, for example, units,
jagged arrays or other such features that are outside the scope of NumPy.

Mixing NumPy and ``unumpy`` in the same file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Normally, one would want to import only one of ``unumpy`` or ``numpy``, and
would import it as ``np`` for familiarity. However, there may be situations
where one wishes to mix NumPy and the overrides, and there are a few ways
to do this, depending on the user's style::

    import numpy.overridable as unumpy
    import numpy as np

or::

    import numpy as np

    # Use unumpy via np.overridable

Related Work
------------

Previous override mechanisms
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* NEP-18, the ``__array_function__`` protocol. `[2]`_
* NEP-13, the ``__array_ufunc__`` protocol. `[4]`_

Existing NumPy-like array implementations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* Dask: https://dask.org/
* CuPy: https://cupy.chainer.org/
* PyData/Sparse: https://sparse.pydata.org/
* Xnd: https://xnd.readthedocs.io/
* Astropy's Quantity: https://docs.astropy.org/en/stable/units/

Existing and potential consumers of alternative arrays
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* Dask: https://dask.org/
* scikit-learn: https://scikit-learn.org/
* xarray: https://xarray.pydata.org/
* TensorLy: http://tensorly.org/

Existing alternate dtype implementations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* ``ndtypes``: https://ndtypes.readthedocs.io/en/latest/
* Datashape: https://datashape.readthedocs.io
* Plum: https://plum-py.readthedocs.io/

Implementation
--------------

The implementation of this NEP will require the following steps:

* Implementation of ``uarray`` multimethods corresponding to the NumPy API,
  including classes for overriding ``dtype``, ``ufunc`` and ``array``
  objects, in the ``unumpy`` repository.
* Moving backends from ``unumpy`` into the respective array libraries
  (a sketch of what this might look like follows below).
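As a sketch of what that second step might look like, a third-party
duck-array library could register its backend at import time, so that
simply importing the library makes it available for dispatch.
``SparseBackend`` and the ``_implementations`` table are hypothetical;
only ``ua.register_backend`` is assumed from the ``uarray`` API::

    import uarray as ua

    _implementations = {}  # filled in via an ``implements`` decorator, as above

    class SparseBackend:
        __ua_domain__ = "numpy"

        @staticmethod
        def __ua_function__(func, args, kwargs):
            impl = _implementations.get(func)
            return impl(*args, **kwargs) if impl is not None else NotImplemented

    # Typically done in the package's __init__.py.
    ua.register_backend(SparseBackend)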
Backward compatibility
----------------------

There are no backward incompatible changes proposed in this NEP.

Alternatives
------------

The current alternative to this problem is NEP-30 plus adding more
protocols (not yet specified) in addition to it. Even then, some parts of
the NumPy API will remain non-overridable, so it's a partial alternative.

The main alternative to vendoring ``unumpy`` is to simply move it into
NumPy completely and not distribute it as a separate package. This would
also achieve the proposed goals; however, we prefer to keep it a separate
package for now, for reasons already stated above.

Discussion
----------

* ``uarray`` blogpost: https://labs.quansight.org/blog/2019/07/uarray-update-api-changes-overhead-and-comparison-to-__array_function__/
* The discussion section of NEP-18: https://numpy.org/neps/nep-0018-array-function-protocol.html#discussion
* NEP-22: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
* Dask issue #4462: https://github.com/dask/dask/issues/4462
* PR #13046: https://github.com/numpy/numpy/pull/13046
* Dask issue #4883: https://github.com/dask/dask/issues/4883
* Issue #13831: https://github.com/numpy/numpy/issues/13831
* Discussion PR 1: https://github.com/hameerabbasi/numpy/pull/3
* Discussion PR 2: https://github.com/hameerabbasi/numpy/pull/4

References and Footnotes
------------------------

.. _[1]:

[1] uarray, A general dispatch mechanism for Python: https://uarray.readthedocs.io

.. _[2]:

[2] NEP 18 — A dispatch mechanism for NumPy's high level array functions: https://numpy.org/neps/nep-0018-array-function-protocol.html

.. _[3]:

[3] NEP 22 — Duck typing for NumPy arrays — high level overview: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html

.. _[4]:

[4] NEP 13 — A Mechanism for Overriding Ufuncs: https://numpy.org/neps/nep-0013-ufunc-overrides.html

.. _[5]:

[5] Reply to Adding to the non-dispatched implementation of NumPy methods: http://numpy-discussion.10968.n7.nabble.com/Adding-to-the-non-dispatched-implementation-of-NumPy-methods-tp46816p46874.html

.. _[6]:

[6] Custom Dtype/Units discussion: http://numpy-discussion.10968.n7.nabble.com/Custom-Dtype-Units-discussion-td43262.html

.. _[7]:

[7] The epic dtype cleanup plan: https://github.com/numpy/numpy/issues/2899

.. _[8]:

[8] unumpy: NumPy, but implementation-independent: https://unumpy.readthedocs.io

.. _[9]:

[9] NEP 30 — Duck Typing for NumPy Arrays - Implementation: https://www.numpy.org/neps/nep-0030-duck-array-protocol.html

.. _[10]:

[10] http://scipy.github.io/devdocs/fft.html#backend-control

Copyright
---------

This document has been placed in the public domain.

From warren.weckesser at gmail.com  Tue Sep  3 08:56:23 2019
From: warren.weckesser at gmail.com (Warren Weckesser)
Date: Tue, 3 Sep 2019 08:56:23 -0400
Subject: [Numpy-discussion] NEP 32: Remove the financial functions from NumPy
Message-ID: 

Github issue 2880 ("Get financial functions out of main namespace",
https://github.com/numpy/numpy/issues/2880) has been open since 2013. In a
recent community meeting, it was suggested that we create a NEP to propose
the removal of the financial functions from NumPy. I have submitted
"NEP 32: Remove the financial functions from NumPy" in a pull request at
https://github.com/numpy/numpy/pull/14399. A copy of the latest version of
the NEP is below.
According to the NEP process document, "Once the PR is in place, the NEP
should be announced on the mailing list for discussion (comments on the PR
itself should be restricted to minor editorial and technical fixes)." This
email is the announcement for NEP 32.

The NEP includes a brief summary of the history of the financial functions,
and has links to several relevant mailing list threads, dating back to when
the functions were added to NumPy in 2008. I recommend reviewing those
threads before commenting here.

Warren

-----

==================================================
NEP 32 — Remove the financial functions from NumPy
==================================================

:Author: Warren Weckesser
:Status: Draft
:Type: Standards Track
:Created: 2019-08-30

Abstract
--------

We propose deprecating and ultimately removing the financial functions [1]_
from NumPy. The functions will be moved to an independent repository, and
provided to the community as a separate package with the name
``numpy_financial``.

Motivation and scope
--------------------

The NumPy financial functions [1]_ are the 10 functions ``fv``, ``ipmt``,
``irr``, ``mirr``, ``nper``, ``npv``, ``pmt``, ``ppmt``, ``pv`` and
``rate``. The functions provide elementary financial calculations such as
future value, net present value, etc. These functions were added to NumPy
in 2008 [2]_.

In May, 2009, a request by Joe Harrington to add a function called ``xirr``
to the financial functions triggered a long thread about these functions
[3]_. One important point that came up in that thread is that a "real"
financial library must be able to handle real dates. The NumPy financial
functions do not work with actual dates or calendars. The preference for a
more capable library independent of NumPy was expressed several times in
that thread.

In June, 2009, D. L. Goldsmith expressed concerns about the correctness of
the implementations of some of the financial functions [4]_. It was
suggested then to move the financial functions out of NumPy to an
independent package.

In a GitHub issue in 2013 [5]_, Nathaniel Smith suggested moving the
financial functions from the top-level namespace to ``numpy.financial``. He
also suggested giving the functions better names. Responses at that time
included the suggestion to deprecate them and move them from NumPy to a
separate package. This issue is still open.

Later in 2013 [6]_, it was suggested on the mailing list that these
functions be removed from NumPy.

The arguments for the removal of these functions from NumPy:

* They are too specialized for NumPy.
* They are not actually useful for "real world" financial calculations,
  because they do not handle real dates and calendars.
* The definition of "correctness" for some of these functions seems to be a
  matter of convention, and the current NumPy developers do not have the
  background to judge their correctness.
* There has been little interest among past and present NumPy developers
  in maintaining these functions.

The main arguments for keeping the functions in NumPy are:

* Removing these functions will be disruptive for some users. Current users
  will have to add the new ``numpy_financial`` package to their
  dependencies, and then modify their code to use the new package.
* The functions provided, while not "industrial strength", are apparently
  similar to functions provided by spreadsheets and some calculators.
  Having them available in NumPy makes it easier for some developers to
  migrate their software to Python and NumPy.
It is clear from comments in the mailing list discussions and in the GitHub
issues that many current NumPy developers believe the benefits of removing
the functions outweigh the costs. For example, from [5]_::

    The financial functions should probably be part of a separate package
    -- Charles Harris

    If there's a better package we can point people to we could just
    deprecate them and then remove them entirely... I'd be fine with that
    too...
    -- Nathaniel Smith

    +1 to deprecate them. If no other package exists, it can be created if
    someone feels the need for that.
    -- Ralf Gommers

    I feel pretty strongly that we should deprecate these. If nobody on
    numpy's core team is interested in maintaining them, then it is purely
    a drag on development for NumPy.
    -- Stephan Hoyer

And from the 2013 mailing list discussion, about removing the functions
from NumPy::

    I am +1 as well, I don't think they should have been included in the
    first place.
    -- David Cournapeau

But not everyone was in favor of removal::

    The fin routines are tiny and don't require much maintenance once
    written. If we made an effort (putting up pages with examples of common
    financial calculations and collecting those under a topical web page,
    then linking to that page from various places and talking it up), I
    would think they could attract users looking for a free way to play
    with financial scenarios. [...]
    So, I would say we keep them. If ours are not the best, we should bring
    them up to snuff.
    -- Joe Harrington

For an idea of the maintenance burden of the financial functions, one can
look for all the GitHub issues [7]_ and pull requests [8]_ that have the
tag ``component: numpy.lib.financial``.

One method for measuring the effect of removing these functions is to find
all the packages on GitHub that use them. Such a search can be performed
with the ``python-api-inspect`` service [9]_. A search for all uses of the
NumPy financial functions finds just eight repositories. (See the comments
in [5]_ for the actual SQL query.)

Implementation
--------------

* Create a new Python package, ``numpy_financial``, to be maintained in the
  top-level NumPy GitHub organization. This repository will contain the
  definitions and unit tests for the financial functions. The package will
  be added to PyPI so it can be installed with ``pip``.
* Deprecate the financial functions in the ``numpy`` namespace, beginning
  in NumPy version 1.18. Remove the financial functions from NumPy version
  1.20.

Backward compatibility
----------------------

The removal of these functions breaks backward compatibility, as explained
earlier. The effects are mitigated by providing the ``numpy_financial``
library.
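To make the migration concrete, here is a sketch of the before and after
for one of the affected functions. The ``numpy_financial`` package name
comes from this NEP; the ``fv`` signature shown matches the current NumPy
documentation::

    import numpy as np
    import numpy_financial as npf

    # Today (to be deprecated in NumPy 1.18 and removed in 1.20):
    x = np.fv(rate=0.05 / 12, nper=10 * 12, pmt=-100, pv=-100)

    # After the split, the same calculation through the new package:
    y = npf.fv(rate=0.05 / 12, nper=10 * 12, pmt=-100, pv=-100)

    assert x == y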
Alternatives
------------

The following alternatives were mentioned in [5]_:

* *Maintain the functions as they are (i.e. do nothing).* A review of the
  history makes clear that this is not the preference of many NumPy
  developers. A recurring comment is that the functions simply do not
  belong in NumPy. When that sentiment is combined with the history of bug
  reports and the ongoing questions about the correctness of the functions,
  the conclusion is that the cleanest solution is deprecation and removal.
* *Move the functions from the ``numpy`` namespace to ``numpy.financial``.*
  This was the initial suggestion in [5]_. Such a change does not address
  the maintenance issues, and doesn't change the misfit that many
  developers see between these functions and NumPy. It causes disruption
  for the current users of these functions without addressing what many
  developers see as the fundamental problem.

Discussion
----------

Links to past mailing list discussions, and to relevant GitHub issues and
pull requests, have already been given.

References and footnotes
------------------------

.. [1] Financial functions,
   https://numpy.org/doc/1.17/reference/routines.financial.html

.. [2] Numpy-discussion mailing list, "Simple financial functions for NumPy",
   https://mail.python.org/pipermail/numpy-discussion/2008-April/032353.html

.. [3] Numpy-discussion mailing list, "add xirr to numpy financial functions?",
   https://mail.python.org/pipermail/numpy-discussion/2009-May/042645.html

.. [4] Numpy-discussion mailing list, "Definitions of pv, fv, nper, pmt, and rate",
   https://mail.python.org/pipermail/numpy-discussion/2009-June/043188.html

.. [5] Get financial functions out of main namespace,
   https://github.com/numpy/numpy/issues/2880

.. [6] Numpy-discussion mailing list, "Deprecation of financial routines",
   https://mail.python.org/pipermail/numpy-discussion/2013-August/067409.html

.. [7] ``component: numpy.lib.financial`` issues,
   https://github.com/numpy/numpy/issues?utf8=%E2%9C%93&q=is%3Aissue+label%3A%22component%3A+numpy.lib.financial%22+

.. [8] ``component: numpy.lib.financial`` pull requests,
   https://github.com/numpy/numpy/pulls?utf8=%E2%9C%93&q=is%3Apr+label%3A%22component%3A+numpy.lib.financial%22+

.. [9] Quansight-Labs/python-api-inspect,
   https://github.com/Quansight-Labs/python-api-inspect/

Copyright
---------

This document has been placed in the public domain.

From sebastian at sipsolutions.net  Tue Sep  3 10:33:58 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 03 Sep 2019 09:33:58 -0500
Subject: [Numpy-discussion] NumPy Community Meeting Wednesday, Sep. 4
Message-ID: 

Hi all,

There will be a NumPy Community meeting Wednesday September 4 at 11 am
Pacific Time. Everyone is invited to join in and edit the work-in-progress
meeting topics and notes: https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg?both

Best wishes

Sebastian

From sebastian at sipsolutions.net  Tue Sep  3 12:35:45 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 03 Sep 2019 11:35:45 -0500
Subject: [Numpy-discussion] NEP 32: Remove the financial functions from NumPy
In-Reply-To: 
References: 
Message-ID: <9067a8f06bc885307d1ec726a55bc5fd906c3c62.camel@sipsolutions.net>

On Tue, 2019-09-03 at 08:56 -0400, Warren Weckesser wrote:
> Github issue 2880 ("Get financial functions out of main namespace",

Very briefly, I am absolutely in favor of this. Keeping the functions in
numpy seems to be more of a liability than a help to anyone. And this push
is more likely to help users by spurring development on a good replacement
than a practically unmaintained corner of NumPy that may seem like it
solves a problem, but probably does so very poorly.

Moving them into a separate pip-installable package seems like the best way
forward until a better replacement, to which we can point users, comes up.

- Sebastian

> https://github.com/numpy/numpy/issues/2880) has been open since 2013.
> In a recent community meeting, it was suggested that we create a NEP
> to propose the removal of the financial functions from NumPy. I have
> submitted "NEP 32: Remove the financial functions from NumPy" in a
> pull request at https://github.com/numpy/numpy/pull/14399. A copy of
> the latest version of the NEP is below.
>
> According to the NEP process document, "Once the PR is in place, the
> NEP should be announced on the mailing list for discussion (comments
> on the PR itself should be restricted to minor editorial and
> technical fixes)." This email is the announcement for NEP 32.
>
> The NEP includes a brief summary of the history of the financial
> functions, and has links to several relevant mailing list threads,
> dating back to when the functions were added to NumPy in 2008. I
> recommend reviewing those threads before commenting here.
>
> Warren
>
> [...]

From Martin.Gfeller at swisscom.com  Wed Sep  4 13:35:28 2019
From: Martin.Gfeller at swisscom.com (Martin.Gfeller at swisscom.com)
Date: Wed, 4 Sep 2019 17:35:28 +0000
Subject: [Numpy-discussion] NEP 32: Remove the financial functions from NumPy
In-Reply-To:
References:
Message-ID:

Dear all

As a user of Numpy in finance, I'm absolutely in favour of removing these
functions. They're too domain-specific, not flexible and general enough for
widespread use, and probably not easy to maintain.

Best regards
Martin

From ilhanpolat at gmail.com  Wed Sep  4 14:10:11 2019
From: ilhanpolat at gmail.com (Ilhan Polat)
Date: Wed, 4 Sep 2019 20:10:11 +0200
Subject: [Numpy-discussion] NEP 32: Remove the financial functions from NumPy
In-Reply-To: <9067a8f06bc885307d1ec726a55bc5fd906c3c62.camel@sipsolutions.net>
References: <9067a8f06bc885307d1ec726a55bc5fd906c3c62.camel@sipsolutions.net>
Message-ID:

+1 on removing them from NumPy. I think there are plenty of alternatives
already; so many, in fact, that we might even consider deprecating them
just like the SciPy misc module, by pointing to the alternatives.

On Tue, Sep 3, 2019 at 6:38 PM Sebastian Berg wrote:

> Very briefly, I am absolutely in favor of this.
>
> Keeping the functions in numpy seems more of a liability than a help to
> anyone. And this push is more likely to help users by spurring
> development on a good replacement than a practically unmaintained corner
> of NumPy that may seem like it solves a problem, but probably does so
> very poorly.
>
> Moving them into a separate pip-installable package seems like the best
> way forward until a better replacement, to which we can point users,
> comes up.
>
> - Sebastian
>
> [...]

From matthew.brett at gmail.com  Wed Sep  4 14:17:01 2019
From: matthew.brett at gmail.com (Matthew Brett)
Date: Wed, 4 Sep 2019 19:17:01 +0100
Subject: [Numpy-discussion] NEP 32: Remove the financial functions from NumPy
In-Reply-To:
References: <9067a8f06bc885307d1ec726a55bc5fd906c3c62.camel@sipsolutions.net>
Message-ID:

Hi,

Maybe worth asking over at the Pandas list? I bet there are more Python /
finance people over there.

Cheers,

Matthew

On Wed, Sep 4, 2019 at 7:11 PM Ilhan Polat wrote:
>
> +1 on removing them from NumPy. I think there are plenty of alternatives
> already; so many, in fact, that we might even consider deprecating them
> just like the SciPy misc module, by pointing to the alternatives.
>
> [...]

From einstein.edison at gmail.com  Thu Sep  5 08:12:04 2019
From: einstein.edison at gmail.com (Hameer Abbasi)
Date: Thu, 5 Sep 2019 14:12:04 +0200
Subject: [Numpy-discussion] NEP 31 — Context-local and global overrides of
 the NumPy API
In-Reply-To:
References:
Message-ID:

Hello everyone,

Thanks to all the feedback from the community, in particular Sebastian Berg,
we have a new draft of NEP-31. Please find the full text quoted below for
discussion and reference. Any feedback and discussion is welcome.

============================================================
NEP 31 — Context-local and global overrides of the NumPy API
============================================================

:Author: Hameer Abbasi
:Author: Ralf Gommers
:Author: Peter Bell
:Status: Draft
:Type: Standards Track
:Created: 2019-08-22

Abstract
--------

This NEP proposes to make all of NumPy's public API overridable via an
extensible backend mechanism. Acceptance of this NEP means NumPy would
provide global and context-local overrides, as well as a dispatch mechanism
similar to NEP-18 [2]_.

First experiences with ``__array_function__`` show that it is necessary to
be able to override NumPy functions that *do not take an array-like
argument*, and hence aren't overridable via ``__array_function__``. The most
pressing need is array creation and coercion functions, such as
``numpy.zeros`` or ``numpy.asarray``; see e.g. NEP-30 [9]_.

This NEP proposes to allow, in an opt-in fashion, overriding any part of the
NumPy API. It is intended as a comprehensive resolution to NEP-22 [3]_, and
obviates the need to add an ever-growing list of new protocols for each new
type of function or object that needs to become overridable.

Motivation and Scope
--------------------

The motivation behind ``uarray`` is manifold: First, there have been several
attempts to allow dispatch of parts of the NumPy API, including (most
prominently) the ``__array_ufunc__`` protocol in NEP-13 [4]_, and the
``__array_function__`` protocol in NEP-18 [2]_, but this has shown the need
for further protocols to be developed, including a protocol for coercion
(see [5]_, [9]_). The reasons these overrides are needed have been
extensively discussed in the references, and this NEP will not attempt to go
into the details of why these are needed; but in short: it is necessary for
library authors to be able to coerce arbitrary objects into arrays of their
own types, such as CuPy needing to coerce to a CuPy array, for example,
instead of a NumPy array.

These kinds of overrides are useful for both the end-user as well as library
authors. End-users may have written or wish to write code that they then
later speed up or move to a different implementation, say PyData/Sparse.
They can do this simply by setting a backend, as sketched in the example
below.
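
For example (an illustrative sketch only: it uses the ``unumpy`` API
described later in this NEP, and assumes that the ``sparse`` module can act
as a backend)::

    import numpy.overridable as unp
    import sparse

    def normalize(x):
        # Plain unumpy code, written once against the NumPy-like API.
        x = unp.asarray(x)
        return x / unp.sum(x)

    # The same function is moved to a sparse implementation simply by
    # setting a backend; the data stays sparse throughout.
    with unp.set_backend(sparse):
        y = normalize(sparse.random((1000, 1000), density=0.01))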

Library authors may also wish to write code that is portable across array
implementations: for example, ``sklearn`` may wish to write the code for a
machine learning algorithm once, in a way that works with any array
implementation, while also using array creation functions.

This NEP takes a holistic approach: It assumes that there are parts of the
API that need to be overridable, and that these will grow over time. It
provides a general framework and a mechanism to avoid designing a new
protocol each time this is required. This was the goal of ``uarray``: to
allow for overrides in an API without needing the design of a new protocol.

This NEP proposes the following: That ``unumpy`` [8]_ becomes the
recommended override mechanism for the parts of the NumPy API not yet
covered by ``__array_function__`` or ``__array_ufunc__``, and that
``uarray`` is vendored into a new namespace within NumPy to give users and
downstream dependencies access to these overrides. This vendoring mechanism
is similar to what SciPy decided to do for making ``scipy.fft`` overridable
(see [10]_).

Detailed description
--------------------

Using overrides
~~~~~~~~~~~~~~~

We propose that end users and libraries use the overrides as follows::

    # On the library side

    import numpy.overridable as unp

    def library_function(array):
        array = unp.asarray(array)
        # Code using unumpy as usual
        return array

    # On the user side:

    import numpy.overridable as unp
    import uarray as ua
    import dask.array as da

    ua.register_backend(da)

    library_function(dask_array)  # works and returns dask_array

    with unp.set_backend(da):
        library_function([1, 2, 3, 4])  # actually returns a Dask array

Here, ``backend`` can be any compatible object defined either by NumPy or an
external library, such as Dask or CuPy. Ideally, it should be the module
``dask.array`` or ``cupy`` itself.

Composing backends
~~~~~~~~~~~~~~~~~~

Some backends may depend on other backends: for example, xarray depending on
``numpy.fft`` to transform a time axis into a frequency axis, or Dask/xarray
holding an array other than a NumPy array inside it. This would be handled
in the following manner inside code::

    with ua.set_backend(cupy), ua.set_backend(dask.array):
        # Code that has distributed GPU arrays here

Proposals
~~~~~~~~~

The only change this NEP proposes at its acceptance is to make ``unumpy``
the officially recommended way to override NumPy. ``unumpy`` will remain a
separate repository/package, which we propose to vendor to avoid a hard
dependency, using the separate ``unumpy`` package only if it is installed,
rather than depending on it, for the time being. In concrete terms,
``numpy.overridable`` becomes an alias for ``unumpy`` if it is available,
with a fallback to a vendored version if not; a sketch of this fallback is
given at the end of this section.

``uarray`` and ``unumpy`` will be developed primarily with the input of
duck-array authors and, secondarily, custom dtype authors, via the usual
GitHub workflow. There are a few reasons for this:

* Faster iteration in the case of bugs or issues.
* Faster design changes, in the case of needed functionality.
* ``unumpy`` will work with older versions of NumPy as well.
* The user and library author opt in to the override process, rather than
  breakages happening when it is least expected. In simple terms, bugs in
  ``unumpy`` mean that ``numpy`` remains unaffected.
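
As a concrete illustration of the aliasing and fallback mentioned above, the
import logic could look roughly like the following sketch (the vendored
module path is hypothetical and not part of this proposal)::

    # numpy/overridable.py -- illustrative only
    try:
        # Prefer the separately installed, up-to-date unumpy package.
        from unumpy import *
    except ImportError:
        # Fall back to the copy vendored into NumPy.
        from numpy._vendored.unumpy import *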

Advantages of ``unumpy`` over other solutions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``unumpy`` offers a number of advantages over the approach of defining a new
protocol for every problem encountered: Whenever there is something
requiring an override, ``unumpy`` will be able to offer a unified API with
very minor changes. For example:

* ``ufunc`` objects can be overridden via their ``__call__``, ``reduce`` and
  other methods.
* Other functions can be overridden in a similar fashion.
* ``np.asduckarray`` goes away, and becomes ``np.overridable.asarray`` with
  a backend set.
* The same holds for array creation functions such as ``np.zeros``,
  ``np.empty`` and so on.

This also holds for the future: Making something overridable would require
only minor changes to ``unumpy``.

Another promise ``unumpy`` holds is one of default implementations. Default
implementations can be provided for any multimethod, in terms of others.
This allows one to override a large part of the NumPy API by defining only a
small part of it. This is meant to ease the creation of new duck-arrays, by
providing default implementations of many functions that can be easily
expressed in terms of others, as well as a repository of utility functions
that help in the implementation of duck-arrays that most duck-arrays would
require.

It also allows one to override functions in a manner which
``__array_function__`` simply cannot, such as overriding ``np.einsum`` with
the version from the ``opt_einsum`` package, or Intel MKL overriding FFT,
BLAS or ``ufunc`` objects. Such a library would define a backend with the
appropriate multimethods, and the user would select it via a ``with``
statement, or by registering it as a backend.

The last benefit is a clear way to coerce to a given backend (via the
``coerce`` keyword in ``ua.set_backend``), and a protocol for coercing not
only arrays, but also ``dtype`` objects and ``ufunc`` objects, with similar
ones from other libraries. This is due to the existence of actual,
third-party dtype packages, and their desire to blend into the NumPy
ecosystem (see [6]_). This is a separate issue from the C-level dtype
redesign proposed in [7]_; it's about allowing third-party dtype
implementations to work with NumPy, much like third-party array
implementations. These can provide features such as, for example, units,
jagged arrays or other such features that are outside the scope of NumPy.

Mixing NumPy and ``unumpy`` in the same file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Normally, one would want to import only one of ``unumpy`` or ``numpy``, and
would import it as ``np`` for familiarity. However, there may be situations
where one wishes to mix NumPy and the overrides, and there are a few ways to
do this, depending on the user's style::

    from numpy import overridable as unp
    import numpy as np

or::

    import numpy as np

    # Use unumpy via np.overridable

Duck-array coercion
~~~~~~~~~~~~~~~~~~~

There are inherent problems with returning objects that are not NumPy arrays
from ``numpy.array`` or ``numpy.asarray``, particularly in the context of
C/C++ or Cython code that may get an object with a different memory layout
than the one it expects. However, we believe this problem may apply not only
to these two functions but to all functions that return NumPy arrays. For
this reason, overrides are opt-in for the user, achieved by using the
submodule ``numpy.overridable`` rather than ``numpy``. NumPy will continue
to work unaffected by anything in ``numpy.overridable``.
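
For instance (an illustrative sketch, reusing the Dask example from above)::

    import numpy as np
    import numpy.overridable as unp
    import dask.array as da

    # The plain numpy namespace is never affected by backends:
    assert type(np.asarray([1, 2, 3])) is np.ndarray

    # Only code that explicitly uses numpy.overridable opts in:
    with unp.set_backend(da):
        x = unp.asarray([1, 2, 3])  # may return a Dask array, not an ndarray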

If the user wishes to obtain a NumPy array, there are two ways of doing it:

1. Use ``numpy.asarray`` (the non-overridable version).
2. Use ``numpy.overridable.asarray`` with the NumPy backend set and coercion
   enabled.

Related Work
------------

Other override mechanisms
~~~~~~~~~~~~~~~~~~~~~~~~~

* NEP-18, the ``__array_function__`` protocol. [2]_
* NEP-13, the ``__array_ufunc__`` protocol. [4]_
* NEP-30, the ``__duck_array__`` protocol. [9]_

Existing NumPy-like array implementations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Dask: https://dask.org/
* CuPy: https://cupy.chainer.org/
* PyData/Sparse: https://sparse.pydata.org/
* Xnd: https://xnd.readthedocs.io/
* Astropy's Quantity: https://docs.astropy.org/en/stable/units/

Existing and potential consumers of alternative arrays
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Dask: https://dask.org/
* scikit-learn: https://scikit-learn.org/
* xarray: https://xarray.pydata.org/
* TensorLy: http://tensorly.org/

Existing alternate dtype implementations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* ``ndtypes``: https://ndtypes.readthedocs.io/en/latest/
* Datashape: https://datashape.readthedocs.io
* Plum: https://plum-py.readthedocs.io/

Implementation
--------------

The implementation of this NEP will require the following steps:

* Implementation of ``uarray`` multimethods corresponding to the NumPy API,
  including classes for overriding ``dtype``, ``ufunc`` and ``array``
  objects, in the ``unumpy`` repository.
* Moving backends from ``unumpy`` into the respective array libraries.

``uarray`` Primer
~~~~~~~~~~~~~~~~~

**Note:** *This section will not attempt to go into too much detail about
uarray; that is the purpose of the uarray documentation* [1]_. *However, the
NumPy community will have input into the design of uarray, via the issue
tracker.*

``unumpy`` is the interface that defines a set of overridable functions
(multimethods) compatible with the NumPy API. To do this, it uses the
``uarray`` library. ``uarray`` is a general-purpose tool for creating
multimethods that dispatch to one of multiple different possible backend
implementations. In this sense, it is similar to the ``__array_function__``
protocol, but with the key difference that the backend is explicitly
installed by the end-user and not coupled to the array type.

Decoupling the backend from the array type gives much more flexibility to
end-users and backend authors. For example, it is possible to:

* override functions not taking arrays as arguments
* create backends out of source from the array type
* install multiple backends for the same array type

This decoupling also means that ``uarray`` is not constrained to dispatching
over array-like types. The backend is free to inspect the entire set of
function arguments to determine if it can implement the function, e.g.
``dtype`` parameter dispatching.

Defining backends
^^^^^^^^^^^^^^^^^

``uarray`` consists of two main protocols: ``__ua_convert__`` and
``__ua_function__``, called in that order, along with ``__ua_domain__``.

``__ua_convert__`` is for conversion and coercion. It has the signature
``(dispatchables, coerce)``, where ``dispatchables`` is an iterable of
``ua.Dispatchable`` objects and ``coerce`` is a boolean indicating whether
or not to force the conversion. ``ua.Dispatchable`` is a simple class with
three attributes: ``type``, ``value``, and ``coercible``. ``__ua_convert__``
returns an iterable of the converted values, or ``NotImplemented`` in the
case of failure, as in the sketch below.
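
As a rough sketch, a backend's ``__ua_convert__`` could look like the
following (illustrative only; a real backend would typically handle more
dispatchable types and edge cases)::

    import numpy as np

    __ua_domain__ = "numpy"

    def __ua_convert__(dispatchables, coerce):
        converted = []
        for d in dispatchables:
            if d.type is np.ndarray:
                if isinstance(d.value, np.ndarray):
                    converted.append(d.value)
                elif coerce and d.coercible:
                    # Force the conversion only when coercion is requested.
                    converted.append(np.asarray(d.value))
                else:
                    return NotImplemented
            elif d.type is np.dtype:
                # dtype dispatchables convert via the NumPy dtype constructor.
                converted.append(np.dtype(d.value) if d.value is not None else None)
            else:
                return NotImplemented
        return converted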

``__ua_function__`` has the signature ``(func, args, kwargs)`` and defines
the actual implementation of the function. It receives the function and its
arguments. Returning ``NotImplemented`` will cause a move to the default
implementation of the function if one exists, and failing that, the next
backend.

Here is what will happen when a ``uarray`` multimethod is called:

1. We canonicalise the arguments so any arguments without a default are
   placed in ``*args`` and those with one are placed in ``**kwargs``.
2. We check the list of backends.

   a. If it is empty, we try the default implementation.

3. We check if the backend's ``__ua_convert__`` method exists. If it exists:

   a. We pass it the output of the dispatcher, which is an iterable of
      ``ua.Dispatchable`` objects.
   b. We feed this output, along with the arguments, to the argument
      replacer. ``NotImplemented`` means we move to 3 with the next backend.
   c. We store the replaced arguments as the new arguments.

4. We feed the arguments into ``__ua_function__``, and return the output,
   exiting if it isn't ``NotImplemented``.
5. If the default implementation exists, we try it with the current backend.
6. On failure, we move to 3 with the next backend. If there are no more
   backends, we move to 7.
7. We raise a ``ua.BackendNotImplementedError``.

Defining overridable multimethods
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To define an overridable function (a multimethod), one needs a few things:

1. A dispatcher that returns an iterable of ``ua.Dispatchable`` objects.
2. A reverse dispatcher that replaces dispatchable values with the supplied
   ones.
3. A domain.
4. Optionally, a default implementation, which can be provided in terms of
   other multimethods.

As an example, consider the following::

    import uarray as ua
    import numpy as np

    def full_argreplacer(args, kwargs, dispatchables):
        def full(shape, fill_value, dtype=None, order='C'):
            return (shape, fill_value), dict(
                dtype=dispatchables[0],
                order=order
            )

        return full(*args, **kwargs)

    @ua.create_multimethod(full_argreplacer, domain="numpy")
    def full(shape, fill_value, dtype=None, order='C'):
        return (ua.Dispatchable(dtype, np.dtype),)

A large set of examples can be found in the ``unumpy`` repository [8]_. This
simple act of overriding callables allows us to override:

* Methods
* Properties, via ``fget`` and ``fset``
* Entire objects, via ``__get__``.

Examples for NumPy
^^^^^^^^^^^^^^^^^^

A library that implements a NumPy-like API will use it in the following
manner (as an example)::

    import numpy.overridable as unp

    _ua_implementations = {}

    __ua_domain__ = "numpy"

    def __ua_function__(func, args, kwargs):
        fn = _ua_implementations.get(func, None)
        return fn(*args, **kwargs) if fn is not None else NotImplemented

    def implements(ua_func):
        def inner(func):
            _ua_implementations[ua_func] = func
            return func

        return inner

    @implements(unp.asarray)
    def asarray(a, dtype=None, order=None):
        # Code here
        # Either this method or __ua_convert__ must
        # return NotImplemented for unsupported types,
        # or they shouldn't be marked as dispatchable.
        ...

    # Provides a default implementation for ones and zeros.
    @implements(unp.full)
    def full(shape, fill_value, dtype=None, order='C'):
        # Code here
        ...

Backward compatibility
----------------------

There are no backward-incompatible changes proposed in this NEP.

Alternatives
------------

The current alternative to this problem is a combination of NEP-18 [2]_,
NEP-13 [4]_ and NEP-30 [9]_ plus adding more protocols (not yet specified)
in addition to it.
Even then, some parts of the NumPy API will remain non-overridable, so it's
only a partial alternative.

The main alternative to vendoring ``unumpy`` is to simply move it into NumPy
completely and not distribute it as a separate package. This would also
achieve the proposed goals; however, we prefer to keep it a separate package
for now, for reasons already stated above.

The third alternative is to move ``unumpy`` into the NumPy organisation and
develop it as a NumPy project. This would also achieve the said goals, and
is also a possibility that can be considered by this NEP. However, the act
of doing an extra ``pip install`` or ``conda install`` may discourage some
users from adopting this method.

Discussion
----------

* ``uarray`` blogpost: https://labs.quansight.org/blog/2019/07/uarray-update-api-changes-overhead-and-comparison-to-__array_function__/
* The discussion section of NEP-18: https://numpy.org/neps/nep-0018-array-function-protocol.html#discussion
* NEP-22: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
* Dask issue #4462: https://github.com/dask/dask/issues/4462
* PR #13046: https://github.com/numpy/numpy/pull/13046
* Dask issue #4883: https://github.com/dask/dask/issues/4883
* Issue #13831: https://github.com/numpy/numpy/issues/13831
* Discussion PR 1: https://github.com/hameerabbasi/numpy/pull/3
* Discussion PR 2: https://github.com/hameerabbasi/numpy/pull/4
* Discussion PR 3: https://github.com/numpy/numpy/pull/14389

References and Footnotes
------------------------

.. [1] uarray, A general dispatch mechanism for Python:
   https://uarray.readthedocs.io

.. [2] NEP 18 — A dispatch mechanism for NumPy's high level array functions:
   https://numpy.org/neps/nep-0018-array-function-protocol.html

.. [3] NEP 22 — Duck typing for NumPy arrays — high level overview:
   https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html

.. [4] NEP 13 — A Mechanism for Overriding Ufuncs:
   https://numpy.org/neps/nep-0013-ufunc-overrides.html

.. [5] Reply to Adding to the non-dispatched implementation of NumPy methods:
   http://numpy-discussion.10968.n7.nabble.com/Adding-to-the-non-dispatched-implementation-of-NumPy-methods-tp46816p46874.html

.. [6] Custom Dtype/Units discussion:
   http://numpy-discussion.10968.n7.nabble.com/Custom-Dtype-Units-discussion-td43262.html

.. [7] The epic dtype cleanup plan:
   https://github.com/numpy/numpy/issues/2899

.. [8] unumpy: NumPy, but implementation-independent:
   https://unumpy.readthedocs.io

.. [9] NEP 30 — Duck Typing for NumPy Arrays - Implementation:
   https://www.numpy.org/neps/nep-0030-duck-array-protocol.html

.. [10] ``scipy.fft`` backend control:
   http://scipy.github.io/devdocs/fft.html#backend-control

Copyright
---------

This document has been placed in the public domain.

From jpivarski at gmail.com  Thu Sep  5 11:51:12 2019
From: jpivarski at gmail.com (Jim Pivarski)
Date: Thu, 5 Sep 2019 10:51:12 -0500
Subject: [Numpy-discussion] Integer array indexing (numpy.take) as function composition
Message-ID:

Hi,

I'm a long-time user of Numpy; I had a question and I didn't know where else
to ask. (It's not a bug — otherwise I would have posted it at
https://github.com/numpy/numpy/issues). Has anyone noticed that indexing an
array with integer arrays (i.e. numpy.take) is a function composition?

For example, suppose you have any two non-negative functions of integers:

    def f(x):
        return x**2 - 5*x + 10

    def g(y):
        return max(0, 2*y - 10) + 3

and you sample them as arrays, as well as their composition g(f(·)):

    F = numpy.array([f(i) for i in range(10)])    # F is f at 10 elements
    G = numpy.array([g(i) for i in range(100)])   # G is g at enough elements
                                                  # to include max(f)
    GoF = numpy.array([g(f(i)) for i in range(10)])  # GoF is g∘f at 10 elements

Indexing G by F (G[F]) returns the same result as the sampled composition
(GoF):

    print("G\u2218F =", G[F])   # integer indexing
    print("g\u2218f =", GoF)    # array of the composed functions

    G∘F = [13 5 3 3 5 13 25 41 61 85]
    g∘f = [13 5 3 3 5 13 25 41 61 85]

This isn't a proof, but I think it's easy to see that it would be true for
any non-negative functions (negative index handling spoils this property).

It might sound like a purely academic point, but I've noticed that I've been
able to optimize and simplify some code by taking advantage of the
associative property of function composition, repeatedly applying numpy.take
on arrays of integers before applying the fully composed index to my data.

As an example of an optimization, if I have to do the same thing to N data
arrays, it helps to prepare a single integer index and apply it to the N
data arrays instead of modifying all N data arrays in multiple steps. As an
example of a simplification, if I need to modify arrays in recursion, it's
easier to reason about the recursion if only the terminal case applies an
index to data, with the non-terminal steps applying indexes to indexes.

This is such a basic property that I bet it has a name, and there's probably
some literature on it, like what you could find if you were interested in
monads in Haskell. But I haven't been able to find the right search
strings — what would you call this property? Is there a literature on it and
its uses?

Thanks!
-- Jim

From charlesr.harris at gmail.com  Thu Sep  5 18:55:30 2019
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu, 5 Sep 2019 16:55:30 -0600
Subject: [Numpy-discussion] 1.17.2 release.
Message-ID:

Hi All,

I'm planning to make a 1.17.2 release Friday or Saturday in order to fix
some newly reported regressions. If there is anything that you think
absolutely needs to be in that release, please yell.

Chuck

From grlee77 at gmail.com  Thu Sep  5 19:27:19 2019
From: grlee77 at gmail.com (Gregory Lee)
Date: Thu, 5 Sep 2019 19:27:19 -0400
Subject: [Numpy-discussion] 1.17.2 release.
In-Reply-To:
References:
Message-ID:

Hi Chuck,

It is not critical, but it would be nice if the fft ZeroDivisionError fix
in https://github.com/numpy/numpy/pull/14279 could make it into 1.17.2. It
has an "approved" review and seems to be ready.

Thanks!
Greg
From charlesr.harris at gmail.com Thu Sep 5 20:32:29 2019
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu, 5 Sep 2019 18:32:29 -0600
Subject: [Numpy-discussion] 1.17.2 release.
In-Reply-To: References: Message-ID:

On Thu, Sep 5, 2019 at 5:27 PM Gregory Lee wrote:

> Hi Chuck,
>
> It is not critical, but it would be nice if the fft ZeroDivisionError fix
> in https://github.com/numpy/numpy/pull/14279 could make it into 1.17.2.
> It has an "approved" review and seems to be ready.
> Thanks!
>

OK, I put that in and copied `pocketfft.py` from master for the backport.
The main argument was over the naming of the new variable and I think Eric
made a valid point, but we can always switch things around. I also thought
it would be nice to check the `inv_norm` directly rather than through `n`,
but there you go. If you would like to clean it up further, feel free to
do so, but at least 1.17 will no longer be an issue in that regard.

Chuck

On Thu, Sep 5, 2019 at 6:56 PM Charles R Harris wrote:
>
>> Hi All,
>>
>> I'm planning to make a 1.17.2 release Friday or Saturday in order to fix
>> some newly reported regressions. If there is anything that you think
>> absolutely needs to be in that release, please yell.
>>
>> Chuck
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>

From njs at pobox.com Fri Sep 6 03:49:15 2019
From: njs at pobox.com (Nathaniel Smith)
Date: Fri, 6 Sep 2019 00:49:15 -0700
Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?=
 =?utf-8?q?global_overrides_of_the_NumPy_API?=
In-Reply-To: References: Message-ID:

On Mon, Sep 2, 2019 at 11:21 PM Ralf Gommers wrote:
> On Mon, Sep 2, 2019 at 2:09 PM Nathaniel Smith wrote:
>> The reason this is challenging is that there's a lot of code written
>> in Cython/C/C++ that calls np.asarray,
>
> Cython code only perhaps? It would surprise me if there's a lot of C/C++ code that explicitly calls into our Python rather than C API.

I think there's also code written as Python-wrappers-around-C-code where
the Python layer handles the error-checking/coercion, and the C code
trusts it to have done so.

>> Now if I understand right, your proposal would be to make it so any
>> code in any package could arbitrarily change the behavior of
>> np.asarray for all inputs, e.g. I could just decide that
>> np.asarray([1, 2, 3]) should return some arbitrary non-np.ndarray
>> object.
>
> No, definitely not! It's all opt-in, by explicitly importing from `numpy.overridable` or `unumpy`. No behavior of anything in the existing numpy namespaces should be affected in any way.

Ah, whoops, I definitely missed that :-). That does change things!

So one of the major decision points for any duck-array API work, is
whether to modify the numpy semantics "in place", so user code
automatically gets access to the new semantics, or else to make a new
namespace, that users have to switch over to manually.

The major disadvantage of doing changes "in place" is, of course, that
we have to do all this careful work to move incrementally and make
sure that we don't break things.
The major (potential) advantage is that we have a much better chance of
moving the ecosystem with us.

The major advantage of making a new namespace is that it's *much*
easier to experiment, because there's no chance of breaking any
projects that didn't opt in. The major disadvantage is that numpy is
super strongly entrenched, and convincing every project to switch to
something else is incredibly difficult and costly. (I just searched
github for "import numpy" and got 17.7 million hits. That's a lot of
imports to update!) Also, empirically, we've seen multiple projects
try to do this (e.g. DyND), and so far they all failed.

It sounds like unumpy is an interesting approach that hasn't been
tried before - in particular, the promise that you can "just switch
your imports" is a much easier transition than e.g. DyND offered. Of
course, that promise is somewhat undermined by the reality that all
these potential backend libraries *aren't* 100% compatible with numpy,
and can't be... it might turn out that this ends up like asanyarray,
where you can't really use it reliably because the thing that comes
out will generally support *most* of the normal ndarray semantics, but
you don't know which part. Is scipy planning to switch to using this
everywhere, including in C code? If not, then how do you expect
projects like matplotlib to switch, given that matplotlib likes to
pass array objects into scipy functions? Are you planning to take the
opportunity to clean up some of the obscure corners of the numpy API?

But those are general questions about unumpy, and I'm guessing no-one
knows all the answers yet... and these questions actually aren't super
relevant to the NEP. The NEP isn't inventing unumpy. IIUC, the main
thing the NEP proposes is simply to make "numpy.overridable" an
alias for "unumpy".

It's not clear to me what problem this alias is solving. If all
downstream users have to update their imports anyway, then they can
write "import unumpy as np" just as easily as they can write "import
numpy.overridable as np". I guess the main reason this is a NEP is
because the unumpy project is hoping to get an "official stamp of
approval" from numpy? But even that could be accomplished by just
putting something in the docs. And adding the alias has substantial
risks: it makes unumpy tied to the numpy release cycle and
compatibility rules, and it means that we're committing to maintaining
unumpy ~forever even if Hameer or Quansight move onto other things.
That seems like a lot to take on for such vague benefits?

On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi wrote:
> The fact that we're having to design more and more protocols for a lot
> of very similar things is, to me, an indicator that we do have holistic
> problems that ought to be solved by a single protocol.

But the reason we've had trouble designing these protocols is that
they're each different :-). If it was just a matter of copying
__array_ufunc__ we'd have been done in a few minutes...

-n

--
Nathaniel J. Smith -- https://vorpus.org

From einstein.edison at gmail.com Fri Sep 6 04:32:25 2019
From: einstein.edison at gmail.com (Hameer Abbasi)
Date: Fri, 6 Sep 2019 10:32:25 +0200
Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?=
 =?utf-8?q?global_overrides_of_the_NumPy_API?=
In-Reply-To: References: Message-ID: <67b837e7-5d3d-337e-49f7-cac078ec4d8f@gmail.com>

That's a lot of very good questions! Let me see if I can answer them
one-by-one.

On 06.09.19 09:49, Nathaniel Smith wrote:
> Ah, whoops, I definitely missed that :-).
That does change things!
> So one of the major decision points for any duck-array API work, is
> whether to modify the numpy semantics "in place", so user code
> automatically gets access to the new semantics, or else to make a new
> namespace, that users have to switch over to manually.
>
> The major disadvantage of doing changes "in place" is, of course, that
> we have to do all this careful work to move incrementally and make
> sure that we don't break things. The major (potential) advantage is
> that we have a much better chance of moving the ecosystem with us.
>
> The major advantage of making a new namespace is that it's *much*
> easier to experiment, because there's no chance of breaking any
> projects that didn't opt in. The major disadvantage is that numpy is
> super strongly entrenched, and convincing every project to switch to
> something else is incredibly difficult and costly. (I just searched
> github for "import numpy" and got 17.7 million hits. That's a lot of
> imports to update!) Also, empirically, we've seen multiple projects
> try to do this (e.g. DyND), and so far they all failed.
>
> It sounds like unumpy is an interesting approach that hasn't been
> tried before - in particular, the promise that you can "just switch
> your imports" is a much easier transition than e.g. DyND offered. Of
> course, that promise is somewhat undermined by the reality that all
> these potential backend libraries *aren't* 100% compatible with numpy,
> and can't be...

This is true; however, with minor adjustments it should be possible to
make your code work across backends, as long as you don't use a few
obscure parts of NumPy.

> it might turn out that this ends up like asanyarray,
> where you can't really use it reliably because the thing that comes
> out will generally support *most* of the normal ndarray semantics, but
> you don't know which part. Is scipy planning to switch to using this
> everywhere, including in C code?

Not at present, I think. However, it should be possible to "re-write"
parts of scipy on top of unumpy in order to make that work, and, where
speed is required and an efficient implementation isn't available in
terms of NumPy functions, to make dispatchable multimethods and allow
library authors to provide those implementations. We'll call this
project uscipy, but that's an endgame at this point. Right now, we're
focusing on unumpy.

> If not, then how do you expect
> projects like matplotlib to switch, given that matplotlib likes to
> pass array objects into scipy functions? Are you planning to take the
> opportunity to clean up some of the obscure corners of the numpy API?

That's a completely different thing, and to answer that question
requires a distinction between uarray and unumpy...

uarray is a backend mechanism, independent of array computing. We hope
that matplotlib will adopt it to switch around its GUI backends, for
example.

> But those are general questions about unumpy, and I'm guessing no-one
> knows all the answers yet... and these questions actually aren't super
> relevant to the NEP. The NEP isn't inventing unumpy. IIUC, the main
> thing the NEP proposes is simply to make "numpy.overridable" an
> alias for "unumpy".
>
> It's not clear to me what problem this alias is solving. If all
> downstream users have to update their imports anyway, then they can
> write "import unumpy as np" just as easily as they can write "import
> numpy.overridable as np". I guess the main reason this is a NEP is
I guess the main reason this is a NEP is > because the unumpy project is hoping to get an "official stamp of > approval" from numpy? That's part of it. The concrete problems it's solving are threefold: 1. Array creation functions can be overridden. 2. Array coercion is now covered. 3. "Default implementations" will allow you to re-write your NumPy array more easily, when such efficient implementations exist in terms of other NumPy functions. That will also help achieve similar semantics, but as I said, they're just "default"... The import numpy.overridable part is meant to help garner adoption, and to prefer the unumpy module if it is available (which will continue to be developed separately). That way it isn't so tightly coupled to the release cycle. One alternative Sebastian Berg mentioned (and I am on board with) is just moving unumpy into the NumPy organisation. What we fear keeping it separate is that the simple act of a pip install unumpy will keep people from using it or trying it out. > But even that could be accomplished by just > putting something in the docs. And adding the alias has substantial > risks: it makes unumpy tied to the numpy release cycle and > compatibility rules, and it means that we're committing to maintaining > unumpy ~forever even if Hameer or Quansight move onto other things. > That seems like a lot to take on for such vague benefits? I can assure you Travis has had the goal of "replatforming SciPy" from as far back as I met him, he's spawned quite a few efforts in that direction along with others from Quansight (and they've led to nice projects). Quansight, as I see it, is unlikely to abandon something like this if it becomes successful (and acceptance of this NEP will be a huge success story). > On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi wrote: >> The fact that we're having to design more and more protocols for a lot >> of very similar things is, to me, an indicator that we do have holistic >> problems that ought to be solved by a single protocol. > But the reason we've had trouble designing these protocols is that > they're each different :-). If it was just a matter of copying > __array_ufunc__ we'd have been done in a few minutes... uarray borrows heavily from __array_function__. It allows substituting (for example) __array_ufunc__ by overriding ufunc.__call__, ufunc.reduce and so on. It takes, as I mentioned, a holistic approach: There are callables that need to be overriden, possibly with nothing to dispatch on. And then it builds on top of that, adding coercion/conversion. > -n > > -- > Nathaniel J. Smith --https://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From tcaswell at gmail.com Fri Sep 6 14:37:26 2019 From: tcaswell at gmail.com (Thomas Caswell) Date: Fri, 6 Sep 2019 14:37:26 -0400 Subject: [Numpy-discussion] Proposal to accept NEP #29: Recommend Python and Numpy version support as a community policy standard Message-ID: https://numpy.org/neps/nep-0029-deprecation_policy.html The outstanding concern in https://github.com/numpy/numpy/pull/14086 was that some projects want to continue to support additional versions of Python and numpy outside of the minimum support windows. The language has been changed to specify that these are _minimum_ support windows and that projects _should_ not _will_ drop support as they can. 
There is one trivial wording change PR open
(https://github.com/numpy/numpy/pull/14444).

If there are no substantive objections within 7 days from this email,
then the NEP will be accepted; see NEP 0 for more details.

Tom

--
Thomas Caswell
tcaswell at gmail.com

From ralf.gommers at gmail.com Fri Sep 6 14:44:19 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Fri, 6 Sep 2019 11:44:19 -0700
Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?=
 =?utf-8?q?global_overrides_of_the_NumPy_API?=
In-Reply-To: <67b837e7-5d3d-337e-49f7-cac078ec4d8f@gmail.com>
References: <67b837e7-5d3d-337e-49f7-cac078ec4d8f@gmail.com>
Message-ID:

On Fri, Sep 6, 2019 at 1:32 AM Hameer Abbasi wrote:

> That's a lot of very good questions! Let me see if I can answer them
> one-by-one.
>
> On 06.09.19 09:49, Nathaniel Smith wrote:
>
> But even that could be accomplished by just
> putting something in the docs. And adding the alias has substantial
> risks: it makes unumpy tied to the numpy release cycle and
> compatibility rules, and it means that we're committing to maintaining
> unumpy ~forever even if Hameer or Quansight move onto other things.
> That seems like a lot to take on for such vague benefits?
>
> I can assure you Travis has had the goal of "replatforming SciPy" from as
> far back as I met him; he's spawned quite a few efforts in that direction
> along with others from Quansight (and they've led to nice projects).
> Quansight, as I see it, is unlikely to abandon something like this if it
> becomes successful (and acceptance of this NEP will be a huge success
> story).
>

Let me address this separately, since it's not really a technical concern.

First, this is not what we say for other contributions. E.g. we didn't say
no to Pocketfft because Martin Reinecke may move on, or __array_function__
because Stephan may get other interests at some point, or a whole new
numpy.random, etc.

Second, this is not about Quansight. At Quansight Labs we've been able to
create time for Hameer to build this, and me and others to contribute -
which is very nice, but the two are not tied inextricably together. In the
end it's still individuals submitting this NEP. I have been a NumPy dev for
~10 years before joining Quansight, and my future NumPy contributions are
not dependent on staying at Quansight (not that I plan to go anywhere!).
I'm guessing the same is true for others.

Third, unumpy is a fairly thin layer over uarray, which already has another
user in SciPy.

Cheers,
Ralf

From ralf.gommers at gmail.com Fri Sep 6 14:52:26 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Fri, 6 Sep 2019 11:52:26 -0700
Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?=
 =?utf-8?q?global_overrides_of_the_NumPy_API?=
In-Reply-To: References: Message-ID:

On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith wrote:

> On Mon, Sep 2, 2019 at 11:21 PM Ralf Gommers
> wrote:
> > On Mon, Sep 2, 2019 at 2:09 PM Nathaniel Smith wrote:
>
> On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi
> wrote:
> > The fact that we're having to design more and more protocols for a lot
> > of very similar things is, to me, an indicator that we do have holistic
> > problems that ought to be solved by a single protocol.
>
> But the reason we've had trouble designing these protocols is that
> they're each different :-).
If it was just a matter of copying
> __array_ufunc__ we'd have been done in a few minutes...
>

I don't think that argument is correct. That we now have two very similar
protocols is simply a matter of history and limited developer time. NEP 18
discusses in several places that __array_function__ should be brought in
line with __array_ufunc__, and that we can migrate a function from one
protocol to the other. There's no technical reason other than backwards
compat and dev time why we couldn't use __array_function__ for ufuncs also.

Cheers,
Ralf

From ralf.gommers at gmail.com Fri Sep 6 17:45:11 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Fri, 6 Sep 2019 14:45:11 -0700
Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?=
 =?utf-8?q?global_overrides_of_the_NumPy_API?=
In-Reply-To: <67b837e7-5d3d-337e-49f7-cac078ec4d8f@gmail.com>
References: <67b837e7-5d3d-337e-49f7-cac078ec4d8f@gmail.com>
Message-ID:

On Fri, Sep 6, 2019 at 1:32 AM Hameer Abbasi wrote:

> That's a lot of very good questions! Let me see if I can answer them
> one-by-one.
>
> On 06.09.19 09:49, Nathaniel Smith wrote:
>
> But those are general questions about unumpy, and I'm guessing no-one
> knows all the answers yet... and these questions actually aren't super
> relevant to the NEP. The NEP isn't inventing unumpy. IIUC, the main
> thing the NEP proposes is simply to make "numpy.overridable" an
> alias for "unumpy".
>
> It's not clear to me what problem this alias is solving. If all
> downstream users have to update their imports anyway, then they can
> write "import unumpy as np" just as easily as they can write "import
> numpy.overridable as np". I guess the main reason this is a NEP is
> because the unumpy project is hoping to get an "official stamp of
> approval" from numpy?
>
Also because we have NEP 30 for yet another protocol, and there's likely
another NEP to follow after that for array creation. Those use cases are
covered by unumpy, so it makes sense to have a NEP for that as well, so
they can be considered side-by-side.

> That's part of it. The concrete problems it's solving are threefold:
>
> 1. Array creation functions can be overridden.
> 2. Array coercion is now covered.
> 3. "Default implementations" will allow you to re-write your NumPy
> array more easily, when such efficient implementations exist in terms of
> other NumPy functions. That will also help achieve similar semantics, but
> as I said, they're just "default"...
>
There may be another very concrete one (that's not yet in the NEP):
allowing other libraries that consume ndarrays to use overrides. An
example is numpy.fft: currently both mkl_fft and pyfftw monkeypatch NumPy,
something we don't like all that much (in particular for mkl_fft, because
it's the default in Anaconda). `__array_function__` isn't able to help
here, because it will always choose NumPy's own implementation for ndarray
input. With unumpy you can support multiple libraries that consume
ndarrays.

Another example is einsum: if you want to use opt_einsum for all inputs
(including ndarrays), then you cannot use np.einsum. And yet another is
using bottleneck (https://kwgoodman.github.io/bottleneck-doc/reference.html)
for nan-functions and partition. There's likely more of these.

The point is: sometimes the array protocols are preferred (e.g.
Dask/Xarray-style meta-arrays), sometimes unumpy-style dispatch works
better.
It's also not necessarily an either or, they can be complementary.

Actually, after writing this I just realized something. With 1.17.x we
have:

```
In [1]: import dask.array as da

In [2]: d = da.from_array(np.linspace(0, 1))

In [3]: np.fft.fft(d)
Out[3]: dask.array<..., chunksize=(50,)>
```

In Anaconda `np.fft.fft` *is* `mkl_fft._numpy_fft.fft`, so this won't
work. We have no bug report yet because 1.17.x hasn't landed in conda
defaults yet (perhaps this is a/the reason why?), but it will be a
problem.

> The import numpy.overridable part is meant to help garner adoption, and
> to prefer the unumpy module if it is available (which will continue to be
> developed separately). That way it isn't so tightly coupled to the release
> cycle. One alternative Sebastian Berg mentioned (and I am on board with) is
> just moving unumpy into the NumPy organisation. What we fear keeping it
> separate is that the simple act of a pip install unumpy will keep people
> from using it or trying it out.
>
Note that this is not the most critical aspect. I pushed for vendoring as
numpy.overridable because I want to not derail the comparison with NEP 30
et al. with a "should we add a dependency" discussion. The interesting part
to decide on first is: do we need the unumpy override mechanism? Vendoring
opt-in vs. making it default vs. adding a dependency is of secondary
interest right now.

Cheers,
Ralf

From njs at pobox.com Fri Sep 6 19:50:46 2019
From: njs at pobox.com (Nathaniel Smith)
Date: Fri, 6 Sep 2019 16:50:46 -0700
Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?=
 =?utf-8?q?global_overrides_of_the_NumPy_API?=
In-Reply-To: References: <67b837e7-5d3d-337e-49f7-cac078ec4d8f@gmail.com>
Message-ID:

On Fri, Sep 6, 2019 at 2:45 PM Ralf Gommers wrote:
> There may be another very concrete one (that's not yet in the NEP): allowing other libraries that consume ndarrays to use overrides. An example is numpy.fft: currently both mkl_fft and pyfftw monkeypatch NumPy, something we don't like all that much (in particular for mkl_fft, because it's the default in Anaconda). `__array_function__` isn't able to help here, because it will always choose NumPy's own implementation for ndarray input. With unumpy you can support multiple libraries that consume ndarrays.

unumpy doesn't help with this either though, does it? unumpy is
double-opt-in: the code using np.fft has to switch to using unumpy.fft
instead, and then someone has to enable the backend. But MKL/pyfftw
started out as opt-in - you could `import mkl_fft` or `import pyfftw`
- and the whole reason they switched to monkeypatching is that they
decided that opt-in wasn't good enough for them.

>> The import numpy.overridable part is meant to help garner adoption, and to prefer the unumpy module if it is available (which will continue to be developed separately). That way it isn't so tightly coupled to the release cycle. One alternative Sebastian Berg mentioned (and I am on board with) is just moving unumpy into the NumPy organisation. What we fear keeping it separate is that the simple act of a pip install unumpy will keep people from using it or trying it out.
>
> Note that this is not the most critical aspect. I pushed for vendoring as numpy.overridable because I want to not derail the comparison with NEP 30 et al. with a "should we add a dependency" discussion. The interesting part to decide on first is: do we need the unumpy override mechanism? Vendoring opt-in vs. making it default vs.
adding a dependency is of secondary interest right now.

Wait, but I thought the only reason we would have a dependency is if
we're exporting it as part of the numpy namespace. If we keep the
import as `import unumpy`, then it works just as well, without any
dependency *or* vendoring in numpy, right?

-n

--
Nathaniel J. Smith -- https://vorpus.org

From njs at pobox.com Fri Sep 6 20:16:04 2019
From: njs at pobox.com (Nathaniel Smith)
Date: Fri, 6 Sep 2019 17:16:04 -0700
Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?=
 =?utf-8?q?global_overrides_of_the_NumPy_API?=
In-Reply-To: References: <67b837e7-5d3d-337e-49f7-cac078ec4d8f@gmail.com>
Message-ID:

On Fri, Sep 6, 2019 at 11:44 AM Ralf Gommers wrote:
>
>
>
> On Fri, Sep 6, 2019 at 1:32 AM Hameer Abbasi wrote:
>>
>> That's a lot of very good questions! Let me see if I can answer them one-by-one.
>>
>> On 06.09.19 09:49, Nathaniel Smith wrote:
>>
>> But even that could be accomplished by just
>> putting something in the docs. And adding the alias has substantial
>> risks: it makes unumpy tied to the numpy release cycle and
>> compatibility rules, and it means that we're committing to maintaining
>> unumpy ~forever even if Hameer or Quansight move onto other things.
>> That seems like a lot to take on for such vague benefits?
>>
>> I can assure you Travis has had the goal of "replatforming SciPy" from as far back as I met him; he's spawned quite a few efforts in that direction along with others from Quansight (and they've led to nice projects). Quansight, as I see it, is unlikely to abandon something like this if it becomes successful (and acceptance of this NEP will be a huge success story).
>
>
> Let me address this separately, since it's not really a technical concern.
>
> First, this is not what we say for other contributions. E.g. we didn't say no to Pocketfft because Martin Reinecke may move on, or __array_function__ because Stephan may get other interests at some point, or a whole new numpy.random, etc.
>
> Second, this is not about Quansight. At Quansight Labs we've been able to create time for Hameer to build this, and me and others to contribute - which is very nice, but the two are not tied inextricably together. In the end it's still individuals submitting this NEP. I have been a NumPy dev for ~10 years before joining Quansight, and my future NumPy contributions are not dependent on staying at Quansight (not that I plan to go anywhere!). I'm guessing the same is true for others.
>
> Third, unumpy is a fairly thin layer over uarray, which already has another user in SciPy.

I'm sorry if that came across as some kind of snipe at Quansight
specifically. I didn't mean it that way. It's a much more general
concern: software projects are inherently risky, and often fail;
companies and research labs change focus and funding shifts around.
This is just a general risk that we need to take into account
when making decisions. And when there are proposals to add new
submodules to numpy, we always put them under intense scrutiny,
exactly because of the support commitments.

The new fft and random code are replacing/extending our existing
public APIs that we already committed to, so that's a very different
situation. And __array_function__ was something that couldn't work at
all without being built into numpy, and even then it was controversial
and merged on an experimental basis. It's always about trade-offs.
My concern here is that the NEP is proposing that the numpy maintainers
take on this large commitment, *and* AFAICT there's no compensating
benefit to justify that: everything that can be done with
numpy.overridable can be done just as well with a standalone unumpy
package... right?

-n

--
Nathaniel J. Smith -- https://vorpus.org

From charlesr.harris at gmail.com Fri Sep 6 20:42:04 2019
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 6 Sep 2019 18:42:04 -0600
Subject: [Numpy-discussion] NumPy 1.17.2 released.
Message-ID:

Hi All,

On behalf of the NumPy team I am pleased to announce that NumPy 1.17.2 has
been released. This release contains fixes for bugs reported against NumPy
1.17.1 along with some documentation improvements. The most important fix
is for lexsort when the keys are of type (u)int8 or (u)int16. If you are
currently using 1.17 you should upgrade.

The Python versions supported in this release are 3.5-3.7; Python 3.8b4
should work with the released source packages, but there are no future
guarantees. Downstream developers should use Cython >= 0.29.13 for Python
3.8 support and OpenBLAS >= 0.3.7 to avoid wrong results on the Skylake
architecture. The NumPy wheels on PyPI are built from the OpenBLAS
development branch in order to avoid those problems. Wheels for this
release can be downloaded from PyPI, source archives and release notes are
available from Github.

*Contributors*

A total of 7 people contributed to this release. People with a "+" by
their names contributed a patch for the first time.

- CakeWithSteak +
- Charles Harris
- Dan Allan
- Hameer Abbasi
- Lars Grueter
- Matti Picus
- Sebastian Berg

*Pull requests merged*

A total of 8 pull requests were merged for this release.

- #14418: BUG: Fix aradixsort indirect indexing.
- #14420: DOC: Fix a minor typo in dispatch documentation.
- #14421: BUG: test, fix regression in converting to ctypes
- #14430: BUG: Do not show Override module in private error classes.
- #14432: BUG: Fixed maximum relative error reporting in assert_allclose.
- #14433: BUG: Fix uint-overflow if padding with linear_ramp and negative...
- #14436: BUG: Update 1.17.x with 1.18.0-dev pocketfft.py.
- #14446: REL: Prepare for NumPy 1.17.2 release.

Cheers,
Charles Harris

From ralf.gommers at gmail.com Sat Sep 7 01:54:08 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Fri, 6 Sep 2019 22:54:08 -0700
Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?=
 =?utf-8?q?global_overrides_of_the_NumPy_API?=
In-Reply-To: References: <67b837e7-5d3d-337e-49f7-cac078ec4d8f@gmail.com>
Message-ID:

On Fri, Sep 6, 2019 at 5:16 PM Nathaniel Smith wrote:

> On Fri, Sep 6, 2019 at 11:44 AM Ralf Gommers
> wrote:
> >
> >
> >
> > On Fri, Sep 6, 2019 at 1:32 AM Hameer Abbasi
> wrote:
> >>
> >> That's a lot of very good questions! Let me see if I can answer them
> one-by-one.
> >>
> >> On 06.09.19 09:49, Nathaniel Smith wrote:
> >>
> >> But even that could be accomplished by just
> >> putting something in the docs. And adding the alias has substantial
> >> risks: it makes unumpy tied to the numpy release cycle and
> >> compatibility rules, and it means that we're committing to maintaining
> >> unumpy ~forever even if Hameer or Quansight move onto other things.
> >> That seems like a lot to take on for such vague benefits?
> >>
> >> I can assure you Travis has had the goal of "replatforming SciPy" from
> as far back as I met him; he's spawned quite a few efforts in that
> direction along with others from Quansight (and they've led to nice
> projects). Quansight, as I see it, is unlikely to abandon something like
> this if it becomes successful (and acceptance of this NEP will be a huge
> success story).
> >
> >
> > Let me address this separately, since it's not really a technical
> concern.
> >
> > First, this is not what we say for other contributions. E.g. we didn't
> say no to Pocketfft because Martin Reinecke may move on, or
> __array_function__ because Stephan may get other interests at some point,
> or a whole new numpy.random, etc.
> >
> > Second, this is not about Quansight. At Quansight Labs we've been able
> to create time for Hameer to build this, and me and others to contribute -
> which is very nice, but the two are not tied inextricably together. In the
> end it's still individuals submitting this NEP. I have been a NumPy dev for
> ~10 years before joining Quansight, and my future NumPy contributions are
> not dependent on staying at Quansight (not that I plan to go anywhere!).
> I'm guessing the same is true for others.
> >
> > Third, unumpy is a fairly thin layer over uarray, which already has
> another user in SciPy.
>
> I'm sorry if that came across as some kind of snipe at Quansight
> specifically. I didn't mean it that way. It's a much more general
> concern: software projects are inherently risky, and often fail;
> companies and research labs change focus and funding shifts around.
> This is just a general risk that we need to take into account
> when making decisions. And when there are proposals to add new
> submodules to numpy, we always put them under intense scrutiny,
> exactly because of the support commitments.
>

Yes, that's fair, and we should be critical here. All code we accept is
indeed a maintenance burden.

> The new fft and random code are replacing/extending our existing
> public APIs that we already committed to, so that's a very different
> situation. And __array_function__ was something that couldn't work at
> all without being built into numpy, and even then it was controversial
> and merged on an experimental basis. It's always about trade-offs. My
> concern here is that the NEP is proposing that the numpy maintainers
> take on this large commitment,

Again, not just the NumPy maintainers. There really isn't that much in
`unumpy` that's all that complicated. And again, `uarray` has multiple
maintainers (note that Peter is also a SciPy core dev) and has another
user in SciPy.

*and* AFAICT there's no compensating
> benefit to justify that: everything that can be done with
> numpy.overridable can be done just as well with a standalone unumpy
> package... right?
>

True, mostly. But at that point, if we say that it's the way to do array
coercion, and creation (and perhaps some other things as well), we're
saying at the same time that every other package that needs this (e.g.
Dask, CuPy) should take unumpy as a hard dependency. Which is a much
bigger ask than when it comes with NumPy. We can discuss it of course.
The major exception is if we want to make it the default for some
functionality, for example numpy.fft (I'll answer your other email for
that).

Cheers,
Ralf
From ralf.gommers at gmail.com Sat Sep 7 02:04:02 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Fri, 6 Sep 2019 23:04:02 -0700
Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?=
 =?utf-8?q?global_overrides_of_the_NumPy_API?=
In-Reply-To: References: <67b837e7-5d3d-337e-49f7-cac078ec4d8f@gmail.com>
Message-ID:

On Fri, Sep 6, 2019 at 4:51 PM Nathaniel Smith wrote:

> On Fri, Sep 6, 2019 at 2:45 PM Ralf Gommers
> wrote:
> > There may be another very concrete one (that's not yet in the NEP):
> allowing other libraries that consume ndarrays to use overrides. An example
> is numpy.fft: currently both mkl_fft and pyfftw monkeypatch NumPy,
> something we don't like all that much (in particular for mkl_fft, because
> it's the default in Anaconda). `__array_function__` isn't able to help
> here, because it will always choose NumPy's own implementation for ndarray
> input. With unumpy you can support multiple libraries that consume ndarrays.
>
> unumpy doesn't help with this either though, does it? unumpy is
> double-opt-in: the code using np.fft has to switch to using unumpy.fft
> instead, and then someone has to enable the backend.

Very good point. It would make a lot of sense to at least make unumpy
default on fft/linalg/random, even if we want to keep it opt-in for the
functions in the main namespace.

But MKL/pyfftw
> started out as opt-in - you could `import mkl_fft` or `import pyfftw`
> - and the whole reason they switched to monkeypatching is that they
> decided that opt-in wasn't good enough for them.
>

No, that's not correct. The MKL team has asked for a proper backend
system, so they can plug into numpy rather than monkeypatch it. Oleksey,
Chuck and I discussed that two years ago already at the NumFOCUS Summit
2017. This has been explicitly on the NumPy roadmap for quite a while:
"A backend system for numpy.fft (so that e.g. fft-mkl doesn't need to
monkeypatch numpy)" (see
https://numpy.org/neps/roadmap.html#other-functionality)

And if Anaconda would like to default to it, that's possible - because one
registered backend needs to be chosen as the default, that could be
mkl-fft. That is still a major improvement over the situation today.

> >> The import numpy.overridable part is meant to help garner adoption, and
> to prefer the unumpy module if it is available (which will continue to be
> developed separately). That way it isn't so tightly coupled to the release
> cycle. One alternative Sebastian Berg mentioned (and I am on board with) is
> just moving unumpy into the NumPy organisation. What we fear keeping it
> separate is that the simple act of a pip install unumpy will keep people
> from using it or trying it out.
> >
> > Note that this is not the most critical aspect. I pushed for vendoring
> as numpy.overridable because I want to not derail the comparison with NEP
> 30 et al. with a "should we add a dependency" discussion. The interesting
> part to decide on first is: do we need the unumpy override mechanism?
> Vendoring opt-in vs. making it default vs. adding a dependency is of
> secondary interest right now.
>
> Wait, but I thought the only reason we would have a dependency is if
> we're exporting it as part of the numpy namespace. If we keep the
> import as `import unumpy`, then it works just as well, without any
> dependency *or* vendoring in numpy, right?
>

Vendoring means "include the code". So no dependency on an external
package.
If we don't vendor, it's going to be either unused or end up as a
dependency for the whole SciPy/PyData stack.

Actually, now that we've discussed the fft issue, I'd suggest changing
the NEP to: vendor, and make it the default for fft, random, and linalg.

Cheers,
Ralf

From PeterBell10 at live.co.uk Sat Sep 7 05:32:41 2019
From: PeterBell10 at live.co.uk (Peter Bell)
Date: Sat, 7 Sep 2019 09:32:41 +0000
Subject: [Numpy-discussion] =?windows-1252?q?NEP_31_=97_Context-local_and?=
 =?windows-1252?q?_global_overrides_of_the_NumPy_API?=
In-Reply-To: References: <67b837e7-5d3d-337e-49f7-cac078ec4d8f@gmail.com>
Message-ID:

>> There may be another very concrete one (that's not yet in the NEP):
allowing other libraries that consume ndarrays to use overrides. An
example is numpy.fft: currently both mkl_fft and pyfftw monkeypatch NumPy,
something we don't like all that much (in particular for mkl_fft, because
it's the default in Anaconda). `__array_function__` isn't able to help
here, because it will always choose NumPy's own implementation for ndarray
input. With unumpy you can support multiple libraries that consume
ndarrays.

> unumpy doesn't help with this either though, does it? unumpy is
double-opt-in: the code using np.fft has to switch to using unumpy.fft
instead, and then someone has to enable the backend. But MKL/pyfftw
started out as opt-in - you could `import mkl_fft` or `import pyfftw`
- and the whole reason they switched to monkeypatching is that they
decided that opt-in wasn't good enough for them.

Because numpy functions are used to write many library functions, the end
user isn't always able to opt in by changing imports. So, for library
functions, monkey patching is not simply convenient but actually
necessary.

Take for example scipy.signal.fftconvolve: SciPy can't change to pyfftw
for licensing reasons, so with SciPy < 1.4 your only option is to monkey
patch scipy.fftpack and numpy.fft. However in SciPy >= 1.4, thanks to the
uarray-based backend support in scipy.fft, I can write

import numpy as np
from scipy import fft, signal
import pyfftw.interfaces.scipy_fft as pyfftw_fft

x = np.random.randn(1024, 1024)

with fft.set_backend(pyfftw_fft):
    y = signal.fftconvolve(x, x)  # Calls pyfftw's rfft, irfft

Yes, we had to opt in in the library function (signal moved from
scipy.fftpack to scipy.fft). But because there can be distance between the
set_backend call and the FFT calls, the library is now much more
configurable. Generally speaking, any library written to use unumpy would
be configurable: (i) by the user, (ii) at runtime, (iii) without changing
library code, and (iv) without monkey patching.

In scipy.fft I actually did it slightly differently than unumpy: the
scipy.fft interface itself has the uarray dispatch and I set SciPy's
version of pocketfft as the default global backend. This means that normal
users don't need to set a backend, and thus don't need to opt in in any
way. For NumPy to follow this pattern as well would require more change to
NumPy's code base than the current NEP's suggestion, mainly in separating
the interface from the implementation that would become the default
backend.
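To make that concrete, a backend here is just an object with a domain
string and a dispatch hook, as in the uarray protocol. A minimal sketch
(MyFFTBackend and the print are made up for illustration; returning
NotImplemented defers to the next backend, i.e. SciPy's own pocketfft):

import numpy as np
import scipy.fft

class MyFFTBackend:
    # uarray routes every scipy.fft call in this domain through
    # __ua_function__
    __ua_domain__ = 'numpy.scipy.fft'

    @staticmethod
    def __ua_function__(method, args, kwargs):
        print('intercepted', method.__name__)
        return NotImplemented  # defer to the default backend

x = np.random.randn(64)
with scipy.fft.set_backend(MyFFTBackend):
    y = scipy.fft.fft(x)  # prints 'intercepted fft'; pocketfft computes it

A real backend would return its own result instead of NotImplemented, and
could also implement __ua_convert__ to coerce non-ndarray inputs.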
- Peter

From sebastian at sipsolutions.net Sat Sep 7 16:06:29 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Sat, 07 Sep 2019 15:06:29 -0500
Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?=
 =?utf-8?q?global_overrides_of_the_NumPy_API?=
In-Reply-To: References: <67b837e7-5d3d-337e-49f7-cac078ec4d8f@gmail.com>
Message-ID:

On Fri, 2019-09-06 at 14:45 -0700, Ralf Gommers wrote:
>
>
> > That's part of it. The concrete problems it's solving are
> > threefold:
> > Array creation functions can be overridden.
> > Array coercion is now covered.
> > "Default implementations" will allow you to re-write your NumPy
> > array more easily, when such efficient implementations exist in
> > terms of other NumPy functions. That will also help achieve similar
> > semantics, but as I said, they're just "default"...
> >
>
> There may be another very concrete one (that's not yet in the NEP):
> allowing other libraries that consume ndarrays to use overrides. An
> example is numpy.fft: currently both mkl_fft and pyfftw monkeypatch
> NumPy, something we don't like all that much (in particular for
> mkl_fft, because it's the default in Anaconda). `__array_function__`
> isn't able to help here, because it will always choose NumPy's own
> implementation for ndarray input. With unumpy you can support
> multiple libraries that consume ndarrays.
>
> Another example is einsum: if you want to use opt_einsum for all
> inputs (including ndarrays), then you cannot use np.einsum. And yet
> another is using bottleneck (
> https://kwgoodman.github.io/bottleneck-doc/reference.html) for nan-
> functions and partition. There's likely more of these.
>
> The point is: sometimes the array protocols are preferred (e.g.
> Dask/Xarray-style meta-arrays), sometimes unumpy-style dispatch works
> better. It's also not necessarily an either or, they can be
> complementary.
>

Let me try to move the discussion from the github issue here (this may
not be the best place). (https://github.com/numpy/numpy/issues/14441
which asked for easier creation functions together with
`__array_function__`).

I think an important note mentioned here is how users interact with
unumpy, vs. __array_function__. The former is an explicit opt-in, while
the latter is implicit choice based on an `array-like` abstract base
class and functional type based dispatching.

To quote NEP 18 on this: "The downsides are that this would require an
explicit opt-in from all existing code, e.g., import numpy.api as np,
and in the long term would result in the maintenance of two separate
NumPy APIs. Also, many functions from numpy itself are already
overloaded (but inadequately), so confusion about high vs. low level
APIs in NumPy would still persist."
(I do think this is a point we should not just ignore, `uarray` is a
thin layer, but it has a big surface area)

Now there are things where explicit opt-in is obvious. And the FFT
example is one of those: there is no way to implicitly choose another
backend (except by just replacing it, i.e. monkeypatching) [1]. And
right now I think these are _very_ different.

Now, for end-users, choosing one array-like over another seems nicer
as an implicit mechanism (why should I not mix sparse, dask and numpy
arrays!?). This is the promise `__array_function__` tries to make.
Unless convinced otherwise, my guess is that most library authors would
strive for implicit support (i.e. sklearn, skimage, scipy).

Circling back to creation and coercion.
In a purely Object type system, these would be classmethods, I guess, but
in NumPy and the libraries above, we are lost.

Solution 1: Create explicit opt-in, e.g. through uarray. (NEP-31)
  * Required end-user opt-in.
  * Seems cleaner in many ways
  * Requires a full copy of the API.

Solution 2: Add some coercion "protocol" (NEP-30) and expose a way to
create new arrays more conveniently. This would practically mean adding
an `array_type=np.ndarray` argument.
  * _Not_ used by end-users! End users should use dask.linspace!
  * Adds "strange" API somewhere in numpy, and possibly a new
    "protocol" (in addition to coercion).[2]

I still feel these solve different issues. The second one is intended
to make array likes work implicitly in libraries (without end users
having to do anything). While the first seems to force the end user to
opt in, sometimes unnecessarily:

def my_library_func(array_like):
    exp = np.exp(array_like)
    idx = np.arange(len(exp))
    return idx, exp

Would have all the information for implicit opt-in/Array-like support,
but cannot do it right now. This is what I have been wondering: whether
uarray/unumpy can in some way help me make this work (even _without_
the end user opting in). The reason is simply that, right now, I am very
clear on the need for this use case, but not sure about the need for end
user opt in, since end users can just use dask.arange().

Cheers,

Sebastian

[1] To be honest, I do think a lot of the "issues" around monkeypatching
exist just as much with backend choosing, the main difference seems to me
that a lot of that:
1. monkeypatching was not done explicitly
   (import mkl_fft; mkl_fft.monkeypatch_numpy())?
2. A backend system allows libraries to prefer one locally?
   (which I think is a big advantage)

[2] There are the options of adding `linspace_like` functions somewhere
in a numpy submodule, or adding `linspace(..., array_type=np.ndarray)`,
or simply inventing a new "protocol" (which is not really a protocol?),
and making it `ndarray.__numpy_like_creation_functions__.arange()`.

> Actually, after writing this I just realized something. With 1.17.x
> we have:
>
> ```
> In [1]: import dask.array as da
>
> In [2]: d = da.from_array(np.linspace(0, 1))
>
> In [3]: np.fft.fft(d)
> Out[3]: dask.array<..., chunksize=(50,)>
> ```
>
> In Anaconda `np.fft.fft` *is* `mkl_fft._numpy_fft.fft`, so this won't
> work. We have no bug report yet because 1.17.x hasn't landed in conda
> defaults yet (perhaps this is a/the reason why?), but it will be a
> problem.
>
> > The import numpy.overridable part is meant to help garner adoption,
> > and to prefer the unumpy module if it is available (which will
> > continue to be developed separately). That way it isn't so tightly
> > coupled to the release cycle. One alternative Sebastian Berg
> > mentioned (and I am on board with) is just moving unumpy into the
> > NumPy organisation. What we fear keeping it separate is that the
> > simple act of a pip install unumpy will keep people from using it
> > or trying it out.
>
> Note that this is not the most critical aspect. I pushed for
> vendoring as numpy.overridable because I want to not derail the
> comparison with NEP 30 et al. with a "should we add a dependency"
> discussion. The interesting part to decide on first is: do we need
> the unumpy override mechanism? Vendoring opt-in vs. making it default
> vs. adding a dependency is of secondary interest right now.
>
> Cheers,
> Ralf
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

From ralf.gommers at gmail.com Sat Sep 7 16:33:35 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sat, 7 Sep 2019 13:33:35 -0700
Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?=
 =?utf-8?q?global_overrides_of_the_NumPy_API?=
In-Reply-To: References: <67b837e7-5d3d-337e-49f7-cac078ec4d8f@gmail.com>
Message-ID:

On Sat, Sep 7, 2019 at 1:07 PM Sebastian Berg wrote:

> On Fri, 2019-09-06 at 14:45 -0700, Ralf Gommers wrote:
> >
> >
> > > That's part of it. The concrete problems it's solving are
> > > threefold:
> > > Array creation functions can be overridden.
> > > Array coercion is now covered.
> > > "Default implementations" will allow you to re-write your NumPy
> > > array more easily, when such efficient implementations exist in
> > > terms of other NumPy functions. That will also help achieve similar
> > > semantics, but as I said, they're just "default"...
> > >
> >
> > There may be another very concrete one (that's not yet in the NEP):
> > allowing other libraries that consume ndarrays to use overrides. An
> > example is numpy.fft: currently both mkl_fft and pyfftw monkeypatch
> > NumPy, something we don't like all that much (in particular for
> > mkl_fft, because it's the default in Anaconda). `__array_function__`
> > isn't able to help here, because it will always choose NumPy's own
> > implementation for ndarray input. With unumpy you can support
> > multiple libraries that consume ndarrays.
> >
> > Another example is einsum: if you want to use opt_einsum for all
> > inputs (including ndarrays), then you cannot use np.einsum. And yet
> > another is using bottleneck (
> > https://kwgoodman.github.io/bottleneck-doc/reference.html) for nan-
> > functions and partition. There's likely more of these.
> >
> > The point is: sometimes the array protocols are preferred (e.g.
> > Dask/Xarray-style meta-arrays), sometimes unumpy-style dispatch works
> > better. It's also not necessarily an either or, they can be
> > complementary.
> >
>
> Let me try to move the discussion from the github issue here (this may
> not be the best place). (https://github.com/numpy/numpy/issues/14441
> which asked for easier creation functions together with
> `__array_function__`).
>
> I think an important note mentioned here is how users interact with
> unumpy, vs. __array_function__. The former is an explicit opt-in, while
> the latter is implicit choice based on an `array-like` abstract base
> class and functional type based dispatching.
>
> To quote NEP 18 on this: "The downsides are that this would require an
> explicit opt-in from all existing code, e.g., import numpy.api as np,
> and in the long term would result in the maintenance of two separate
> NumPy APIs. Also, many functions from numpy itself are already
> overloaded (but inadequately), so confusion about high vs. low level
> APIs in NumPy would still persist."
> (I do think this is a point we should not just ignore, `uarray` is a
> thin layer, but it has a big surface area)
>
> Now there are things where explicit opt-in is obvious.
And the FFT
> example is one of those: there is no way to implicitly choose another
> backend (except by just replacing it, i.e. monkeypatching) [1]. And
> right now I think these are _very_ different.
>
> Now, for end-users, choosing one array-like over another seems nicer
> as an implicit mechanism (why should I not mix sparse, dask and numpy
> arrays!?). This is the promise `__array_function__` tries to make.
> Unless convinced otherwise, my guess is that most library authors would
> strive for implicit support (i.e. sklearn, skimage, scipy).
>
> Circling back to creation and coercion. In a purely Object type system,
> these would be classmethods, I guess, but in NumPy and the libraries
> above, we are lost.
>
> Solution 1: Create explicit opt-in, e.g. through uarray. (NEP-31)
>   * Required end-user opt-in.
>   * Seems cleaner in many ways
>   * Requires a full copy of the API.

Bullets 1 and 3 are not required. If we decide to make it default, then
there's no separate namespace.

> Solution 2: Add some coercion "protocol" (NEP-30) and expose a way to
> create new arrays more conveniently. This would practically mean adding
> an `array_type=np.ndarray` argument.
>   * _Not_ used by end-users! End users should use dask.linspace!
>   * Adds "strange" API somewhere in numpy, and possibly a new
>     "protocol" (in addition to coercion).[2]
>
> I still feel these solve different issues. The second one is intended
> to make array likes work implicitly in libraries (without end users
> having to do anything). While the first seems to force the end user to
> opt in, sometimes unnecessarily:
>
> def my_library_func(array_like):
>     exp = np.exp(array_like)
>     idx = np.arange(len(exp))
>     return idx, exp
>
> Would have all the information for implicit opt-in/Array-like support,
> but cannot do it right now.

Can you explain this a bit more? `len(exp)` is a number, so
`np.arange(number)` doesn't really have any information here.

> This is what I have been wondering: whether
> uarray/unumpy can in some way help me make this work (even _without_
> the end user opting in).

Good question. If that needs to work in the absence of the user doing
anything, it should be something like

with unumpy.determine_backend(exp):
    unumpy.arange(len(exp))  # or np.arange if we make unumpy default

to get the equivalent to `np.arange_like(len(exp), array_type=exp)`.

Note: that `determine_backend` thing doesn't exist today.

> The reason is simply that, right now, I am very
> clear on the need for this use case, but not sure about the need for
> end user opt in, since end users can just use dask.arange().

I don't get the last part. The arange is inside a library function, so a
user can't just go in and change things there.

Cheers,
Ralf

> Cheers,
>
> Sebastian
>
> [1] To be honest, I do think a lot of the "issues" around
> monkeypatching exist just as much with backend choosing, the main
> difference seems to me that a lot of that:
> 1. monkeypatching was not done explicitly
>    (import mkl_fft; mkl_fft.monkeypatch_numpy())?
> 2. A backend system allows libraries to prefer one locally?
>    (which I think is a big advantage)
>
> [2] There are the options of adding `linspace_like` functions somewhere
> in a numpy submodule, or adding `linspace(..., array_type=np.ndarray)`,
> or simply inventing a new "protocol" (which is not really a protocol?),
> and making it `ndarray.__numpy_like_creation_functions__.arange()`.
>
> > Actually, after writing this I just realized something.
> > With 1.17.x we have:
> >
> > ```
> > In [1]: import dask.array as da
> >
> > In [2]: d = da.from_array(np.linspace(0, 1))
> >
> > In [3]: np.fft.fft(d)
> > Out[3]: dask.array<..., chunksize=(50,)>
> > ```
> >
> > In Anaconda `np.fft.fft` *is* `mkl_fft._numpy_fft.fft`, so this won't
> > work. We have no bug report yet because 1.17.x hasn't landed in conda
> > defaults yet (perhaps this is a/the reason why?), but it will be a
> > problem.
> >
> > > The import numpy.overridable part is meant to help garner adoption,
> > > and to prefer the unumpy module if it is available (which will
> > > continue to be developed separately). That way it isn't so tightly
> > > coupled to the release cycle. One alternative Sebastian Berg
> > > mentioned (and I am on board with) is just moving unumpy into the
> > > NumPy organisation. What we fear keeping it separate is that the
> > > simple act of a pip install unumpy will keep people from using it
> > > or trying it out.
> >
> > Note that this is not the most critical aspect. I pushed for
> > vendoring as numpy.overridable because I want to not derail the
> > comparison with NEP 30 et al. with a "should we add a dependency"
> > discussion. The interesting part to decide on first is: do we need
> > the unumpy override mechanism? Vendoring opt-in vs. making it default
> > vs. adding a dependency is of secondary interest right now.
> >
> > Cheers,
> > Ralf
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

From sebastian at sipsolutions.net Sat Sep 7 17:17:57 2019
From: sebastian at sipsolutions.net (sebastian)
Date: Sat, 07 Sep 2019 16:17:57 -0500
Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?=
 =?utf-8?q?global_overrides_of_the_NumPy_API?=
In-Reply-To: References: <67b837e7-5d3d-337e-49f7-cac078ec4d8f@gmail.com>
Message-ID:

On 2019-09-07 15:33, Ralf Gommers wrote:
> On Sat, Sep 7, 2019 at 1:07 PM Sebastian Berg
> wrote:
>
>> On Fri, 2019-09-06 at 14:45 -0700, Ralf Gommers wrote:
>>>
>>>> That's part of it. The concrete problems it's solving are
>>>> threefold:
>>>> Array creation functions can be overridden.
>>>> Array coercion is now covered.
>>>> "Default implementations" will allow you to re-write your NumPy
>>>> array more easily, when such efficient implementations exist in
>>>> terms of other NumPy functions. That will also help achieve
>> similar
>>>> semantics, but as I said, they're just "default"...
>>>>
>>>
>>> There may be another very concrete one (that's not yet in the
>> NEP):
>>> allowing other libraries that consume ndarrays to use overrides.
>> An
>>> example is numpy.fft: currently both mkl_fft and pyfftw
>> monkeypatch
>>> NumPy, something we don't like all that much (in particular for
>>> mkl_fft, because it's the default in Anaconda).
>> `__array_function__`
>>> isn't able to help here, because it will always choose NumPy's own
>>> implementation for ndarray input. With unumpy you can support
>>> multiple libraries that consume ndarrays.
>>>
>>> Another example is einsum: if you want to use opt_einsum for all
>>> inputs (including ndarrays), then you cannot use np.einsum.
And >> yet >>> another is using bottleneck ( >>> https://kwgoodman.github.io/bottleneck-doc/reference.html) for >> nan- >>> functions and partition. There's likely more of these. >>> >>> The point is: sometimes the array protocols are preferred (e.g. >>> Dask/Xarray-style meta-arrays), sometimes unumpy-style dispatch >> works >>> better. It's also not necessarily an either or, they can be >>> complementary. >>> >> >> Let me try to move the discussion from the github issue here (this >> may >> not be the best place). (https://github.com/numpy/numpy/issues/14441 >> which asked for easier creation functions together with >> `__array_function__`). >> >> I think an important note mentioned here is how users interact with >> unumpy, vs. __array_function__. The former is an explicit opt-in, >> while >> the latter is implicit choice based on an `array-like` abstract base >> class and functional type based dispatching. >> >> To quote NEP 18 on this: "The downsides are that this would require >> an >> explicit opt-in from all existing code, e.g., import numpy.api as >> np, >> and in the long term would result in the maintenance of two separate >> NumPy APIs. Also, many functions from numpy itself are already >> overloaded (but inadequately), so confusion about high vs. low level >> APIs in NumPy would still persist." >> (I do think this is a point we should not just ignore, `uarray` is a >> thin layer, but it has a big surface area) >> >> Now there are things where explicit opt-in is obvious. And the FFT >> example is one of those, there is no way to implicitly choose >> another >> backend (except by just replacing it, i.e. monkeypatching) [1]. And >> right now I think these are _very_ different. >> >> Now for the end-users choosing one array-like over another, seems >> nicer >> as an implicit mechanism (why should I not mix sparse, dask and >> numpy >> arrays!?). This is the promise `__array_function__` tries to make. >> Unless convinced otherwise, my guess is that most library authors >> would >> strive for implicit support (i.e. sklearn, skimage, scipy). >> >> Circling back to creation and coercion. In a purely Object type >> system, >> these would be classmethods, I guess, but in NumPy and the libraries >> above, we are lost. >> >> Solution 1: Create explicit opt-in, e.g. through uarray. (NEP-31) >> * Required end-user opt-in. > >> * Seems cleaner in many ways >> * Requires a full copy of the API. > > bullet 1 and 3 are not required. if we decide to make it default, then > there's no separate namespace It does require explicit opt-in to have any benefits to the user. > >> Solution 2: Add some coercion "protocol" (NEP-30) and expose a way >> to >> create new arrays more conveniently. This would practically mean >> adding >> an `array_type=np.ndarray` argument. >> * _Not_ used by end-users! End users should use dask.linspace! >> * Adds "strange" API somewhere in numpy, and possible a new >> "protocol" (additionally to coercion).[2] >> >> I still feel these solve different issues. The second one is >> intended >> to make array likes work implicitly in libraries (without end users >> having to do anything). While the first seems to force the end user >> to >> opt in, sometimes unnecessarily: >> >> def my_library_func(array_like): >> exp = np.exp(array_like) >> idx = np.arange(len(exp)) >> return idx, exp >> >> Would have all the information for implicit opt-in/Array-like >> support, >> but cannot do it right now. > > Can you explain this a bit more? 
`len(exp)` is a number, so > `np.arange(number)` doesn't really have any information here. > Right, but as a library author, I want a way a way to make it use the same type as `array_like` in this particular function, that is the point! The end-user already signaled they prefer say dask, due to the array that was actually passed in. (but this is just repeating what is below I think). >> This is what I have been wondering, if >> uarray/unumpy, can in some way help me make this work (even >> _without_ >> the end user opting in). > > good question. if that needs to work in the absence of the user doing > anything, it should be something like > > with unumpy.determine_backend(exp): > unumpy.arange(len(exp)) # or np.arange if we make unumpy default > > to get the equivalent to `np.arange_like(len(exp), array_type=exp)`. > > Note, that `determine_backend` thing doesn't exist today. > Exactly, that is what I have been wondering about, there may be more issues around that. If it existed, we may be able to solve the implicit library usage by making libraries use unumpy (or similar). Although, at that point we half replace `__array_function__` maybe. However, the main point is that without such a functionality, NEP 30 and NEP 31 seem to solve slightly different issues with respect to how they interact with the end-user (opt in)? We may decide that we do not want to solve the library users issue of wanting to support implicit opt-in for array like inputs because it is a rabbit hole. But we may need to discuss/argue a bit more that it really is a deep enough rabbit hole that it is not worth the trouble. >> The reason is that simply, right now I am very >> clear on the need for this use case, but not sure about the need for >> end user opt in, since end users can just use dask.arange(). > > I don't get the last part. The arange is inside a library function, so > a user can't just go in and change things there. A "user" here means "end user". An end user writes a script, and they can easily change `arr = np.linspace(10)` to `arr = dask.linspace(10)`, or more likely just use one within one script and the other within another script, while both use the same sklearn functions. (Although using a backend switching may be nicer in some contexts) A library provider (library user of unumpy/numpy) of course cannot just use dask conveniently, unless they write their own `guess_numpy_like_module()` function first. > Cheers, > > Ralf > >> Cheers, >> >> Sebastian >> >> [1] To be honest, I do think a lot of the "issues" around >> monkeypatching exists just as much with backend choosing, the main >> difference seems to me that a lot of that: >> 1. monkeypatching was not done explicit >> (import mkl_fft; mkl_fft.monkeypatch_numpy())? >> 2. A backend system allows libaries to prefer one locally? >> (which I think is a big advantage) >> >> [2] There are the options of adding `linspace_like` functions >> somewhere >> in a numpy submodule, or adding `linspace(..., >> array_type=np.ndarray)`, >> or simply inventing a new "protocl" (which is not really a >> protocol?), >> and make it `ndarray.__numpy_like_creation_functions__.arange()`. >> >>> Actually, after writing this I just realized something. With >> 1.17.x >>> we have: >>> >>> ``` >>> In [1]: import dask.array as da >> >>> >>> >>> In [2]: d = da.from_array(np.linspace(0, 1)) >> >>> >>> >>> In [3]: np.fft.fft(d) >> >>> >>> Out[3]: dask.array>> chunksize=(50,)> >>> ``` >>> >>> In Anaconda `np.fft.fft` *is* `mkl_fft._numpy_fft.fft`, so this >> won't >>> work. 
We have no bug report yet because 1.17.x hasn't landed in >> conda >>> defaults yet (perhaps this is a/the reason why?), but it will be a >>> problem. >>> >>>> The import numpy.overridable part is meant to help garner >> adoption, >>>> and to prefer the unumpy module if it is available (which will >>>> continue to be developed separately). That way it isn't so >> tightly >>>> coupled to the release cycle. One alternative Sebastian Berg >>>> mentioned (and I am on board with) is just moving unumpy into >> the >>>> NumPy organisation. What we fear keeping it separate is that the >>>> simple act of a pip install unumpy will keep people from using >> it >>>> or trying it out. >>>> >>> Note that this is not the most critical aspect. I pushed for >>> vendoring as numpy.overridable because I want to not derail the >>> comparison with NEP 30 et al. with a "should we add a dependency" >>> discussion. The interesting part to decide on first is: do we need >>> the unumpy override mechanism? Vendoring opt-in vs. making it >> default >>> vs. adding a dependency is of secondary interest right now. >>> >>> Cheers, >>> Ralf >>> >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion From ralf.gommers at gmail.com Sat Sep 7 17:49:07 2019 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 7 Sep 2019 14:49:07 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?= =?utf-8?q?global_overrides_of_the_NumPy_API?= In-Reply-To: References: <67b837e7-5d3d-337e-49f7-cac078ec4d8f@gmail.com> Message-ID: On Sat, Sep 7, 2019 at 2:18 PM sebastian wrote: > On 2019-09-07 15:33, Ralf Gommers wrote: > > On Sat, Sep 7, 2019 at 1:07 PM Sebastian Berg > > wrote: > > > >> On Fri, 2019-09-06 at 14:45 -0700, Ralf Gommers wrote: > >>> > >>> > >> > >> > >>>> That's part of it. The concrete problems it's solving are > >>>> threefold: > >>>> Array creation functions can be overridden. > >>>> Array coercion is now covered. > >>>> "Default implementations" will allow you to re-write your NumPy > >>>> array more easily, when such efficient implementations exist in > >>>> terms of other NumPy functions. That will also help achieve > >> similar > >>>> semantics, but as I said, they're just "default"... > >>>> > >>> > >>> There may be another very concrete one (that's not yet in the > >> NEP): > >>> allowing other libraries that consume ndarrays to use overrides. > >> An > >>> example is numpy.fft: currently both mkl_fft and pyfftw > >> monkeypatch > >>> NumPy, something we don't like all that much (in particular for > >>> mkl_fft, because it's the default in Anaconda). > >> `__array_function__` > >>> isn't able to help here, because it will always choose NumPy's own > >>> implementation for ndarray input. With unumpy you can support > >>> multiple libraries that consume ndarrays. > >>> > >>> Another example is einsum: if you want to use opt_einsum for all > >>> inputs (including ndarrays), then you cannot use np.einsum. 
And > >> yet > >>> another is using bottleneck ( > >>> https://kwgoodman.github.io/bottleneck-doc/reference.html) for > >> nan- > >>> functions and partition. There's likely more of these. > >>> > >>> The point is: sometimes the array protocols are preferred (e.g. > >>> Dask/Xarray-style meta-arrays), sometimes unumpy-style dispatch > >> works > >>> better. It's also not necessarily an either or, they can be > >>> complementary. > >>> > >> > >> Let me try to move the discussion from the github issue here (this > >> may > >> not be the best place). (https://github.com/numpy/numpy/issues/14441 > >> which asked for easier creation functions together with > >> `__array_function__`). > >> > >> I think an important note mentioned here is how users interact with > >> unumpy, vs. __array_function__. The former is an explicit opt-in, > >> while > >> the latter is implicit choice based on an `array-like` abstract base > >> class and functional type based dispatching. > >> > >> To quote NEP 18 on this: "The downsides are that this would require > >> an > >> explicit opt-in from all existing code, e.g., import numpy.api as > >> np, > >> and in the long term would result in the maintenance of two separate > >> NumPy APIs. Also, many functions from numpy itself are already > >> overloaded (but inadequately), so confusion about high vs. low level > >> APIs in NumPy would still persist." > >> (I do think this is a point we should not just ignore, `uarray` is a > >> thin layer, but it has a big surface area) > >> > >> Now there are things where explicit opt-in is obvious. And the FFT > >> example is one of those, there is no way to implicitly choose > >> another > >> backend (except by just replacing it, i.e. monkeypatching) [1]. And > >> right now I think these are _very_ different. > >> > >> Now for the end-users choosing one array-like over another, seems > >> nicer > >> as an implicit mechanism (why should I not mix sparse, dask and > >> numpy > >> arrays!?). This is the promise `__array_function__` tries to make. > >> Unless convinced otherwise, my guess is that most library authors > >> would > >> strive for implicit support (i.e. sklearn, skimage, scipy). > >> > >> Circling back to creation and coercion. In a purely Object type > >> system, > >> these would be classmethods, I guess, but in NumPy and the libraries > >> above, we are lost. > >> > >> Solution 1: Create explicit opt-in, e.g. through uarray. (NEP-31) > >> * Required end-user opt-in. > > > >> * Seems cleaner in many ways > >> * Requires a full copy of the API. > > > > bullet 1 and 3 are not required. if we decide to make it default, then > > there's no separate namespace > > It does require explicit opt-in to have any benefits to the user. > > > > >> Solution 2: Add some coercion "protocol" (NEP-30) and expose a way > >> to > >> create new arrays more conveniently. This would practically mean > >> adding > >> an `array_type=np.ndarray` argument. > >> * _Not_ used by end-users! End users should use dask.linspace! > >> * Adds "strange" API somewhere in numpy, and possible a new > >> "protocol" (additionally to coercion).[2] > >> > >> I still feel these solve different issues. The second one is > >> intended > >> to make array likes work implicitly in libraries (without end users > >> having to do anything). 
While the first seems to force the end user > >> to > >> opt in, sometimes unnecessarily: > >> > >> def my_library_func(array_like): > >> exp = np.exp(array_like) > >> idx = np.arange(len(exp)) > >> return idx, exp > >> > >> Would have all the information for implicit opt-in/Array-like > >> support, > >> but cannot do it right now. > > > > Can you explain this a bit more? `len(exp)` is a number, so > > `np.arange(number)` doesn't really have any information here. > > > > Right, but as a library author, I want a way a way to make it use the > same type as `array_like` in this particular function, that is the > point! The end-user already signaled they prefer say dask, due to the > array that was actually passed in. (but this is just repeating what is > below I think). > Okay, you meant conceptually:) > >> This is what I have been wondering, if > >> uarray/unumpy, can in some way help me make this work (even > >> _without_ > >> the end user opting in). > > > > good question. if that needs to work in the absence of the user doing > > anything, it should be something like > > > > with unumpy.determine_backend(exp): > > unumpy.arange(len(exp)) # or np.arange if we make unumpy default > > > > to get the equivalent to `np.arange_like(len(exp), array_type=exp)`. > > > > Note, that `determine_backend` thing doesn't exist today. > > > > Exactly, that is what I have been wondering about, there may be more > issues around that. > If it existed, we may be able to solve the implicit library usage by > making libraries use > unumpy (or similar). Although, at that point we half replace > `__array_function__` maybe. > I don't really think so. Libraries can/will still use __array_function__ for most functionality, and just add a `with determine_backend` for the places where __array_function__ doesn't work. > However, the main point is that without such a functionality, NEP 30 and > NEP 31 seem to solve slightly > different issues with respect to how they interact with the end-user > (opt in)? > Yes, I agree with that. Cheers, Ralf > > We may decide that we do not want to solve the library users issue of > wanting to support implicit > opt-in for array like inputs because it is a rabbit hole. But we may > need to discuss/argue a bit > more that it really is a deep enough rabbit hole that it is not worth > the trouble. > > >> The reason is that simply, right now I am very > >> clear on the need for this use case, but not sure about the need for > >> end user opt in, since end users can just use dask.arange(). > > > > I don't get the last part. The arange is inside a library function, so > > a user can't just go in and change things there. > > A "user" here means "end user". An end user writes a script, and they > can easily change > `arr = np.linspace(10)` to `arr = dask.linspace(10)`, or more likely > just use one within one > script and the other within another script, while both use the same > sklearn functions. > (Although using a backend switching may be nicer in some contexts) > > A library provider (library user of unumpy/numpy) of course cannot just > use dask conveniently, > unless they write their own `guess_numpy_like_module()` function first. > > > > Cheers, > > > > Ralf > > > >> Cheers, > >> > >> Sebastian > >> > >> [1] To be honest, I do think a lot of the "issues" around > >> monkeypatching exists just as much with backend choosing, the main > >> difference seems to me that a lot of that: > >> 1. monkeypatching was not done explicit > >> (import mkl_fft; mkl_fft.monkeypatch_numpy())? > >> 2. 
A backend system allows libaries to prefer one locally? > >> (which I think is a big advantage) > >> > >> [2] There are the options of adding `linspace_like` functions > >> somewhere > >> in a numpy submodule, or adding `linspace(..., > >> array_type=np.ndarray)`, > >> or simply inventing a new "protocl" (which is not really a > >> protocol?), > >> and make it `ndarray.__numpy_like_creation_functions__.arange()`. > >> > >>> Actually, after writing this I just realized something. With > >> 1.17.x > >>> we have: > >>> > >>> ``` > >>> In [1]: import dask.array as da > >> > >>> > >>> > >>> In [2]: d = da.from_array(np.linspace(0, 1)) > >> > >>> > >>> > >>> In [3]: np.fft.fft(d) > >> > >>> > >>> Out[3]: dask.array >>> chunksize=(50,)> > >>> ``` > >>> > >>> In Anaconda `np.fft.fft` *is* `mkl_fft._numpy_fft.fft`, so this > >> won't > >>> work. We have no bug report yet because 1.17.x hasn't landed in > >> conda > >>> defaults yet (perhaps this is a/the reason why?), but it will be a > >>> problem. > >>> > >>>> The import numpy.overridable part is meant to help garner > >> adoption, > >>>> and to prefer the unumpy module if it is available (which will > >>>> continue to be developed separately). That way it isn't so > >> tightly > >>>> coupled to the release cycle. One alternative Sebastian Berg > >>>> mentioned (and I am on board with) is just moving unumpy into > >> the > >>>> NumPy organisation. What we fear keeping it separate is that the > >>>> simple act of a pip install unumpy will keep people from using > >> it > >>>> or trying it out. > >>>> > >>> Note that this is not the most critical aspect. I pushed for > >>> vendoring as numpy.overridable because I want to not derail the > >>> comparison with NEP 30 et al. with a "should we add a dependency" > >>> discussion. The interesting part to decide on first is: do we need > >>> the unumpy override mechanism? Vendoring opt-in vs. making it > >> default > >>> vs. adding a dependency is of secondary interest right now. > >>> > >>> Cheers, > >>> Ralf > >>> > >>> > >>> > >>> _______________________________________________ > >>> NumPy-Discussion mailing list > >>> NumPy-Discussion at python.org > >>> https://mail.python.org/mailman/listinfo/numpy-discussion > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at python.org > >> https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Sep 7 19:15:50 2019 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 7 Sep 2019 16:15:50 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?= =?utf-8?q?global_overrides_of_the_NumPy_API?= In-Reply-To: References: <67b837e7-5d3d-337e-49f7-cac078ec4d8f@gmail.com> Message-ID: On Fri, Sep 6, 2019 at 11:04 PM Ralf Gommers wrote: > Vendoring means "include the code". So no dependency on an external package. If we don't vendor, it's going to be either unused, or end up as a dependency for the whole SciPy/PyData stack. If we vendor it then it also ends up as a dependency for the whole SciPy/PyData stack... 
> Actually, now that we've discussed the fft issue, I'd suggest to change the NEP to: vendor, and make default for fft, random, and linalg. There's no way we can have an effective discussion of duck arrays, fft backends, random backends, and linalg backends all at once in a single thread. Can you write separate NEPs for each of these? Some questions I'd like to see addressed: For fft: - fft is an entirely self-contained operation, with no interactions with the rest of the system; the only difference between implementations is speed. What problems are caused by monkeypatching, and how is uarray materially different from monkeypatching? For random: - I thought the new random implementation with pluggable generators etc. was supposed to solve this problem already. Why doesn't it? - The biggest issue with MKL monkeypatching random is that it breaks stream stability. How does the uarray approach address this? For linalg: - linalg already support __array_ufunc__ for overrides. Why do we need a second override system? Isn't that redundant? -n -- Nathaniel J. Smith -- https://vorpus.org From ralf.gommers at gmail.com Sat Sep 7 20:07:32 2019 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 7 Sep 2019 17:07:32 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?= =?utf-8?q?global_overrides_of_the_NumPy_API?= In-Reply-To: References: <67b837e7-5d3d-337e-49f7-cac078ec4d8f@gmail.com> Message-ID: On Sat, Sep 7, 2019 at 4:16 PM Nathaniel Smith wrote: > On Fri, Sep 6, 2019 at 11:04 PM Ralf Gommers > wrote: > > Vendoring means "include the code". So no dependency on an external > package. If we don't vendor, it's going to be either unused, or end up as a > dependency for the whole SciPy/PyData stack. > > If we vendor it then it also ends up as a dependency for the whole > SciPy/PyData stack... > It seems you're just using an unusual definition here. Dependency == a package you have to install, is present in pyproject.toml/install_requires, shows up in https://github.com/numpy/numpy/network/dependencies, etc. > > Actually, now that we've discussed the fft issue, I'd suggest to change > the NEP to: vendor, and make default for fft, random, and linalg. > > There's no way we can have an effective discussion of duck arrays, fft > backends, random backends, and linalg backends all at once in a single > thread. > > Can you write separate NEPs for each of these? Some questions I'd like > to see addressed: > > For fft: > - fft is an entirely self-contained operation, with no interactions > with the rest of the system; the only difference between > implementations is speed. What problems are caused by monkeypatching, > It was already explained in this thread, it's been on our roadmap for ~2 years at least, and monkeypatching is pretty much universally understood to be bad. If that's not enough, please search the NumPy issues for "monkeypatching". You'll find issues like https://github.com/numpy/numpy/issues/12374#issuecomment-438725645. At the moment this is very confusing, and hard to diagnose - you have to install a whole new NumPy and then find that the problem is gone (or not). Being able to switch backends in one line of code and re-test would be very valuable. It seems perhaps more useful to have a call so we can communicate with higher bandwidth, rather than lots of writing new NEPs here? 
In preparation, we need to write up in more detail how __array_function__ and unumpy fit together, rather than treat different pieces all separately (because the problems and pros/cons really won't change much between functions and submodules). I'll defer answering your other questions till that's done, so the discussion is hopefully a bit more structured. Cheers, Ralf and how is uarray materially different from monkeypatching? > > For random: > - I thought the new random implementation with pluggable generators > etc. was supposed to solve this problem already. Why doesn't it? > - The biggest issue with MKL monkeypatching random is that it breaks > stream stability. How does the uarray approach address this? > > For linalg: > - linalg already support __array_ufunc__ for overrides. Why do we need > a second override system? Isn't that redundant? > > -n > > -- > Nathaniel J. Smith -- https://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sun Sep 8 01:40:46 2019 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 7 Sep 2019 22:40:46 -0700 Subject: [Numpy-discussion] shipping sdists without generated C sources from Cython code Message-ID: Hi all, There are several open issues about people not being able to compile the latest release with Python 3.8 betas due to our release containing generated C code with a too old version of Cython. This happened for Python 3.7 as well. With the Python packaging system having improved that build dependencies are no longer insane, I think we should stop shipping the generated C sources. We've discussed this a couple of times before on GitHub, but I've now opened a PR for this (https://github.com/numpy/numpy/pull/14453) so I thought it would be good to mention here in case anyone sees an issue with doing this. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sun Sep 8 02:44:30 2019 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 7 Sep 2019 23:44:30 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?= =?utf-8?q?global_overrides_of_the_NumPy_API?= In-Reply-To: References: <67b837e7-5d3d-337e-49f7-cac078ec4d8f@gmail.com> Message-ID: On Sat, Sep 7, 2019 at 5:08 PM Ralf Gommers wrote: > > On Sat, Sep 7, 2019 at 4:16 PM Nathaniel Smith wrote: >> >> On Fri, Sep 6, 2019 at 11:04 PM Ralf Gommers wrote: >> > Vendoring means "include the code". So no dependency on an external package. If we don't vendor, it's going to be either unused, or end up as a dependency for the whole SciPy/PyData stack. >> >> If we vendor it then it also ends up as a dependency for the whole >> SciPy/PyData stack... > > > It seems you're just using an unusual definition here. Dependency == a package you have to install, is present in pyproject.toml/install_requires, shows up in https://github.com/numpy/numpy/network/dependencies, etc. That's a pretty trivial definition though. Surely the complexity of the installed code and its maintainer structure is what matters, not the exact details of how the install happens. >> > Actually, now that we've discussed the fft issue, I'd suggest to change the NEP to: vendor, and make default for fft, random, and linalg. 
>>
>> There's no way we can have an effective discussion of duck arrays, fft
>> backends, random backends, and linalg backends all at once in a single
>> thread.
>>
>> Can you write separate NEPs for each of these? Some questions I'd like
>> to see addressed:
>>
>> For fft:
>> - fft is an entirely self-contained operation, with no interactions
>> with the rest of the system; the only difference between
>> implementations is speed. What problems are caused by monkeypatching,
>
>
> It was already explained in this thread, it's been on our roadmap for ~2 years at least, and monkeypatching is pretty much universally understood to be bad. If that's not enough, please search the NumPy issues for "monkeypatching". You'll find issues like https://github.com/numpy/numpy/issues/12374#issuecomment-438725645. At the moment this is very confusing, and hard to diagnose - you have to install a whole new NumPy and then find that the problem is gone (or not). Being able to switch backends in one line of code and re-test would be very valuable.

Sure, it's not meant as a trick question, I'm just saying you should
write down the reasons and how you solve them in one place. Maybe some
of the reasons monkeypatching is bad don't apply here, or maybe some
of them do apply, but uarray doesn't solve them -- we can't tell
without doing the work.

The link you gave doesn't involve monkeypatching or np.fft, so I'm not
sure how it's relevant...?

> It seems perhaps more useful to have a call so we can communicate with higher bandwidth, rather than lots of writing new NEPs here? In preparation, we need to write up in more detail how __array_function__ and unumpy fit together, rather than treat different pieces all separately (because the problems and pros/cons really won't change much between functions and submodules). I'll defer answering your other questions till that's done, so the discussion is hopefully a bit more structured.

I don't have a lot of time for calls, and you'd still have to write
it up for everyone who isn't on the call...

-n

--
Nathaniel J. Smith -- https://vorpus.org

From njs at pobox.com  Sun Sep  8 03:53:43 2019
From: njs at pobox.com (Nathaniel Smith)
Date: Sun, 8 Sep 2019 00:53:43 -0700
Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?=
	=?utf-8?q?global_overrides_of_the_NumPy_API?=
In-Reply-To: 
References: 
Message-ID: 

On Fri, Sep 6, 2019 at 11:53 AM Ralf Gommers wrote:
> On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith wrote:
>> On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi wrote:
>> > The fact that we're having to design more and more protocols for a lot
>> > of very similar things is, to me, an indicator that we do have holistic
>> > problems that ought to be solved by a single protocol.
>>
>> But the reason we've had trouble designing these protocols is that
>> they're each different :-). If it was just a matter of copying
>> __array_ufunc__ we'd have been done in a few minutes...
>
> I don't think that argument is correct. That we now have two very similar protocols is simply a matter of history and limited developer time. NEP 18 discusses in several places that __array_ufunc__ should be brought in line with __array_function__, and that we can migrate a function from one protocol to the other. There's no technical reason other than backwards compat and dev time why we couldn't use __array_function__ for ufuncs also.

Huh, that's interesting! Apparently we have a profoundly different
understanding of what we're doing here.
To me, __array_ufunc__ and
__array_function__ are completely different. In fact I'd say
__array_ufunc__ is a good idea and __array_function__ is a bad idea,
and would definitely not be in favor of combining them together.

The key difference is that __array_ufunc__ allows for *generic*
implementations. Most duck array libraries can write a single
implementation of __array_ufunc__ that works for *all* ufuncs, even
new third-party ufuncs that the duck array library has never heard of,
because ufuncs all share the same structure of a loop wrapped around a
core operation, and they can treat the core operation as a black box.
For example:

- Dask can split up the operation across its tiled sub-arrays, and
then for each tile it invokes the core operation.
- xarray can do its label-based axis matching, and then invoke the
core operation.
- bcolz can loop over the array uncompressing one block at a time,
invoking the core operation on each.
- sparse arrays can check the ufunc .identity attribute to find out
whether 0 is an identity, and if so invoke the operation directly on
the non-zero entries; otherwise, it can loop over the array and
densify it in blocks and invoke the core operation on each. (It would
be useful to have a bit more metadata on the ufunc, so e.g.
np.subtract could declare that zero is a right-identity but not a
left-identity, but that's a simple enough extension to make at some
point.)

Result: __array_ufunc__ makes it totally possible to take a ufunc from
scipy.special or a random new one created with numba, and have it
immediately work on an xarray wrapped around dask wrapped around
bcolz, out-of-the-box. That's a clean, generic interface. [1]
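To make the "single generic implementation" point concrete, here is a minimal
sketch for a toy tiled duck array (the `TiledArray` class and its tiling
scheme are invented for illustration; this is not Dask's or any other
library's actual code):

```
import numpy as np

class TiledArray:
    """Toy duck array: a 1-D array stored as a list of ndarray tiles."""

    def __init__(self, tiles):
        self.tiles = list(tiles)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # One implementation covers *every* ufunc, including third-party
        # ones this class has never heard of: loop over the tiles and
        # invoke the core operation as a black box.
        if method != '__call__' or kwargs:
            return NotImplemented  # punt on .reduce, out=, etc.
        if not all(isinstance(x, (TiledArray, int, float)) for x in inputs):
            return NotImplemented
        n = len(next(x for x in inputs if isinstance(x, TiledArray)).tiles)
        return TiledArray(
            ufunc(*[x.tiles[i] if isinstance(x, TiledArray) else x
                    for x in inputs])
            for i in range(n))

a = TiledArray([np.arange(3), np.arange(3, 6)])
b = np.sqrt(a)       # any NumPy ufunc dispatches to __array_ufunc__
c = np.add(a, 1.0)   # and so would scipy.special.erf or a numba ufunc
```

The same dozen lines keep working for a brand-new ufunc published tomorrow,
which is exactly the property being described here.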
OTOH, __array_function__ doesn't allow this kind of simplification: if
we were using __array_function__ for ufuncs, every library would have
to special-case every individual ufunc, which leads to dramatically
more work and more potential for bugs.

To me, the whole point of interfaces is to reduce coupling. When you
have N interacting modules, it's unmaintainable if every change
requires considering every N! combination. From this perspective,
__array_function__ isn't good, but it is still somewhat constrained:
the result of each operation is still determined by the objects
involved, nothing else. In this regard, uarray is even more extreme than
__array_function__, because arbitrary operations can be arbitrarily
changed by arbitrarily distant code. It sort of feels like the
argument for uarray is: well, designing maintainable interfaces is a
lot of work, so forget it, let's just make it easy to monkeypatch
everything and call it a day.

That said, in my replies in this thread I've been trying to stay
productive and focus on narrower concrete issues. I'm pretty sure that
__array_function__ and uarray will turn out to be bad ideas and will
fail, but that's not a proven fact, it's just an informed guess. And
the road that I favor also has lots of risks and uncertainty. So I
don't have a problem with trying both as experiments and learning
more! But hopefully that explains why it's not at all obvious that
uarray solves the protocol design problems we've been talking about.

-n

[1] There are also some cases that __array_ufunc__ doesn't handle as
nicely. One obvious one is that GPU/TPU libraries still need to
special-case individual ufuncs. But that's not a limitation of
__array_ufunc__, it's a limitation of GPUs -- they can't run CPU code,
so they can't use the CPU implementation of the core operations.
Another limitation is that __array_ufunc__ is weak at handling
operations that involve mixed libraries (e.g. np.add(bcolz_array,
sparse_array)) -- to work well, this might require that bcolz have
special-case handling for sparse arrays, or vice-versa, so you still
potentially have some N**2 special cases, though at least here N is
the number of duck array libraries, not the number of ufuncs. I think
this is an interesting target for future work. But in general,
__array_ufunc__ goes a long way to taming the complexity of
interacting libraries and ufuncs.

--
Nathaniel J. Smith -- https://vorpus.org

From einstein.edison at gmail.com  Sun Sep  8 04:03:57 2019
From: einstein.edison at gmail.com (Hameer Abbasi)
Date: Sun, 8 Sep 2019 10:03:57 +0200
Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?=
	=?utf-8?q?global_overrides_of_the_NumPy_API?=
In-Reply-To: 
References: 
Message-ID: 

On 08.09.19 09:53, Nathaniel Smith wrote:
> On Fri, Sep 6, 2019 at 11:53 AM Ralf Gommers
> wrote:
>> On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith wrote:
>>> On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi
>>> wrote:
>>>> The fact that we're having to design more and more protocols for a lot
>>>> of very similar things is, to me, an indicator that we do have
>>>> holistic
>>>> problems that ought to be solved by a single protocol.
>>> But the reason we've had trouble designing these protocols is that
>>> they're each different :-). If it was just a matter of copying
>>> __array_ufunc__ we'd have been done in a few minutes...
>> I don't think that argument is correct. That we now have two very
>> similar protocols is simply a matter of history and limited developer
>> time. NEP 18 discusses in several places that __array_ufunc__ should
>> be brought in line with __array_function__, and that we can migrate a
>> function from one protocol to the other. There's no technical reason
>> other than backwards compat and dev time why we couldn't use
>> __array_function__ for ufuncs also.
> Huh, that's interesting! Apparently we have a profoundly different
> understanding of what we're doing here. To me, __array_ufunc__ and
> __array_function__ are completely different. In fact I'd say
> __array_ufunc__ is a good idea and __array_function__ is a bad idea,
> and would definitely not be in favor of combining them together.
>
> The key difference is that __array_ufunc__ allows for *generic*
> implementations. Most duck array libraries can write a single
> implementation of __array_ufunc__ that works for *all* ufuncs, even
> new third-party ufuncs that the duck array library has never heard of,
> because ufuncs all share the same structure of a loop wrapped around a
> core operation, and they can treat the core operation as a black box.
> For example:
>
> - Dask can split up the operation across its tiled sub-arrays, and
> then for each tile it invokes the core operation.
> - xarray can do its label-based axis matching, and then invoke the
> core operation.
> - bcolz can loop over the array uncompressing one block at a time,
> invoking the core operation on each.
> - sparse arrays can check the ufunc .identity attribute to find out
> whether 0 is an identity, and if so invoke the operation directly on
> the non-zero entries; otherwise, it can loop over the array and
> densify it in blocks and invoke the core operation on each. (It would
> be useful to have a bit more metadata on the ufunc, so e.g.
> np.subtract could declare that zero is a right-identity but not a
> left-identity, but that's a simple enough extension to make at some
> point.)
>
> Result: __array_ufunc__ makes it totally possible to take a ufunc from
> scipy.special or a random new one created with numba, and have it
> immediately work on an xarray wrapped around dask wrapped around
> bcolz, out-of-the-box. That's a clean, generic interface. [1]
>
> OTOH, __array_function__ doesn't allow this kind of simplification: if
> we were using __array_function__ for ufuncs, every library would have
> to special-case every individual ufunc, which leads to dramatically
> more work and more potential for bugs.

But uarray does allow this kind of simplification. You would do the
following inside a uarray backend:

    def __ua_function__(func, args, kwargs):
        with ua.skip_backend(self_backend):
            ...  # Do code here; dispatches to everything but this backend

This is possible today and is done in the dask backend inside unumpy
for example.
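As a rough usage sketch of how such a backend is then selected
(`MyDaskBackend` is a stand-in name for a real backend object such as the one
unumpy ships for dask, and `x` is any array-like; `ua.set_backend` is
uarray's context-local override):

```
import uarray as ua
import unumpy as np  # the uarray-based mirror of the NumPy API

# MyDaskBackend is a hypothetical stand-in for a real backend object.
with ua.set_backend(MyDaskBackend):
    y = np.sum(x)  # routed through MyDaskBackend.__ua_function__
```

The context-manager form is where the "context-local overrides" in the NEP
title come from; there is also a global registration variant.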
>
> To me, the whole point of interfaces is to reduce coupling. When you
> have N interacting modules, it's unmaintainable if every change
> requires considering every N! combination. From this perspective,
> __array_function__ isn't good, but it is still somewhat constrained:
> the result of each operation is still determined by the objects
> involved, nothing else. In this regard, uarray is even more extreme than
> __array_function__, because arbitrary operations can be arbitrarily
> changed by arbitrarily distant code. It sort of feels like the
> argument for uarray is: well, designing maintainable interfaces is a
> lot of work, so forget it, let's just make it easy to monkeypatch
> everything and call it a day.
>
> That said, in my replies in this thread I've been trying to stay
> productive and focus on narrower concrete issues. I'm pretty sure that
> __array_function__ and uarray will turn out to be bad ideas and will
> fail, but that's not a proven fact, it's just an informed guess. And
> the road that I favor also has lots of risks and uncertainty. So I
> don't have a problem with trying both as experiments and learning
> more! But hopefully that explains why it's not at all obvious that
> uarray solves the protocol design problems we've been talking about.
>
> -n
>
> [1] There are also some cases that __array_ufunc__ doesn't handle as
> nicely. One obvious one is that GPU/TPU libraries still need to
> special-case individual ufuncs. But that's not a limitation of
> __array_ufunc__, it's a limitation of GPUs -- they can't run CPU code,
> so they can't use the CPU implementation of the core operations.
> Another limitation is that __array_ufunc__ is weak at handling
> operations that involve mixed libraries (e.g. np.add(bcolz_array,
> sparse_array)) -- to work well, this might require that bcolz have
> special-case handling for sparse arrays, or vice-versa, so you still
> potentially have some N**2 special cases, though at least here N is
> the number of duck array libraries, not the number of ufuncs. I think
> this is an interesting target for future work. But in general,
> __array_ufunc__ goes a long way to taming the complexity of
> interacting libraries and ufuncs.
>
> --
> Nathaniel J. Smith -- https://vorpus.org
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

From njs at pobox.com  Sun Sep  8 04:56:15 2019
From: njs at pobox.com (Nathaniel Smith)
Date: Sun, 8 Sep 2019 01:56:15 -0700
Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?=
	=?utf-8?q?global_overrides_of_the_NumPy_API?=
In-Reply-To: 
References: 
Message-ID: 

On Sun, Sep 8, 2019 at 1:04 AM Hameer Abbasi wrote:
>
> On 08.09.19 09:53, Nathaniel Smith wrote:
>> OTOH, __array_function__ doesn't allow this kind of simplification: if
>> we were using __array_function__ for ufuncs, every library would have
>> to special-case every individual ufunc, which leads to dramatically
>> more work and more potential for bugs.
>
> But uarray does allow this kind of simplification. You would do the following inside a uarray backend:
>
>     def __ua_function__(func, args, kwargs):
>         with ua.skip_backend(self_backend):
>             ...  # Do code here; dispatches to everything but this backend

You can dispatch to the underlying operation, sure, but you can't
implement a generic ufunc loop because you don't know that 'func' is
actually a bound ufunc method, or have any way to access the
underlying ufunc object. (E.g. consider the case where 'func' is
'np.add.reduce'.) The critical part of my example was that it's a new
ufunc that none of these libraries have ever heard of before.

Ufuncs have a lot of consistent structure beyond what generic Python
callables have, and the whole point of __array_ufunc__ is that
implementors can rely on that structure. You get to work at a higher
level of abstraction.

A similar but simpler example would be the protocol we've sketched out
for concatenation: the idea would be to capture the core similarity
between np.concatenate/np.hstack/np.vstack/np.dstack/np.column_stack/np.row_stack/any
other variants, so that implementors only have to worry about the
higher-level concept of "concatenation" rather than the raw APIs of
all those individual functions.

-n

--
Nathaniel J. Smith -- https://vorpus.org

From einstein.edison at gmail.com  Sun Sep  8 05:05:32 2019
From: einstein.edison at gmail.com (Hameer Abbasi)
Date: Sun, 8 Sep 2019 11:05:32 +0200
Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?=
	=?utf-8?q?global_overrides_of_the_NumPy_API?=
In-Reply-To: 
References: 
Message-ID: 

On 08.09.19 10:56, Nathaniel Smith wrote:
> On Sun, Sep 8, 2019 at 1:04 AM Hameer Abbasi wrote:
>> On 08.09.19 09:53, Nathaniel Smith wrote:
>>> OTOH, __array_function__ doesn't allow this kind of simplification: if
>>> we were using __array_function__ for ufuncs, every library would have
>>> to special-case every individual ufunc, which leads to dramatically
>>> more work and more potential for bugs.
>> But uarray does allow this kind of simplification. You would do the following inside a uarray backend:
>>
>>     def __ua_function__(func, args, kwargs):
>>         with ua.skip_backend(self_backend):
>>             ...  # Do code here; dispatches to everything but this backend
> You can dispatch to the underlying operation, sure, but you can't
> implement a generic ufunc loop because you don't know that 'func' is
> actually a bound ufunc method, or have any way to access the
> underlying ufunc object. (E.g. consider the case where 'func' is
> 'np.add.reduce'.) The critical part of my example was that it's a new
> ufunc that none of these libraries have ever heard of before.
>
> Ufuncs have a lot of consistent structure beyond what generic Python
> callables have, and the whole point of __array_ufunc__ is that
> implementors can rely on that structure. You get to work at a higher
> level of abstraction.
>
> A similar but simpler example would be the protocol we've sketched out
> for concatenation: the idea would be to capture the core similarity
> between np.concatenate/np.hstack/np.vstack/np.dstack/np.column_stack/np.row_stack/any
> other variants, so that implementors only have to worry about the
> higher-level concept of "concatenation" rather than the raw APIs of
> all those individual functions.

There's a solution for that too: Default implementations. Implement
concatenate, and you've got a default implementation for all of those
you mentioned. Similarly for transpose/swapaxes/moveaxis and family.
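To make the default-implementations idea concrete, here is a minimal sketch
of how two of the stacking variants can be derived from `concatenate`
(illustrative only, not unumpy's actual code; in unumpy the bodies would call
the overridable multimethods, so they dispatch to whichever backend is
active):

```
import numpy as np

def hstack_default(arrays):
    # np.hstack is concatenation along axis 1, except for 1-D inputs,
    # where it concatenates along axis 0.
    arrays = [np.atleast_1d(a) for a in arrays]
    axis = 0 if arrays[0].ndim == 1 else 1
    return np.concatenate(arrays, axis=axis)

def vstack_default(arrays):
    # np.vstack promotes everything to at least 2-D, then concatenates
    # along axis 0.
    return np.concatenate([np.atleast_2d(a) for a in arrays], axis=0)
```

A backend that only implements `concatenate` (plus the reshaping primitives)
then gets all the stacking functions for free.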
>
> -n
>

From ralf.gommers at gmail.com  Sun Sep  8 11:39:38 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sun, 8 Sep 2019 08:39:38 -0700
Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?=
	=?utf-8?q?global_overrides_of_the_NumPy_API?=
In-Reply-To: 
References: 
Message-ID: 

On Sun, Sep 8, 2019 at 12:54 AM Nathaniel Smith wrote:
> On Fri, Sep 6, 2019 at 11:53 AM Ralf Gommers wrote:
> > On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith wrote:
> >> On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi wrote:
> >> > The fact that we're having to design more and more protocols for a lot
> >> > of very similar things is, to me, an indicator that we do have holistic
> >> > problems that ought to be solved by a single protocol.
> >>
> >> But the reason we've had trouble designing these protocols is that
> >> they're each different :-). If it was just a matter of copying
> >> __array_ufunc__ we'd have been done in a few minutes...
> >
> > I don't think that argument is correct. That we now have two very similar protocols is simply a matter of history and limited developer time. NEP 18 discusses in several places that __array_ufunc__ should be brought in line with __array_function__, and that we can migrate a function from one protocol to the other. There's no technical reason other than backwards compat and dev time why we couldn't use __array_function__ for ufuncs also.
>
> Huh, that's interesting! Apparently we have a profoundly different
> understanding of what we're doing here.

That is interesting indeed. We should figure this out first - no point discussing a NEP about plugging the gaps in our override system when we don't have a common understanding of why we wanted/needed an override system in the first place.

> To me, __array_ufunc__ and
> __array_function__ are completely different. In fact I'd say
> __array_ufunc__ is a good idea and __array_function__ is a bad idea,

It's early days, but "customer feedback" certainly has been more enthusiastic for __array_function__. Also from what I've seen so far it works well. Example: at the SciPy sprints someone put together Xarray plus pydata/sparse to use distributed sparse arrays for visualizing some large genetic (I think) data sets. That was made to work in a single day, with impressively little code.

> and would definitely not be in favor of combining them together.

I'm not saying we should. But __array_ufunc__ is basically a slight specialization - knowing that the function that was called is a ufunc can be handy but is usually irrelevant.

> The key difference is that __array_ufunc__ allows for *generic*
> implementations.

Implementations of what?

> Most duck array libraries can write a single
> implementation of __array_ufunc__ that works for *all* ufuncs, even
> new third-party ufuncs that the duck array library has never heard of,

I see where you're going with this. You are thinking of reusing the ufunc implementation to do a computation. That's a minor use case (imho), and I can't remember seeing it used. The original use case was scipy.sparse matrices. The executive summary of NEP 13 talks about this. It's about calling `np.some_ufunc(other_ndarray_like)` and "handing over control" to that object rather than the numpy function starting to execute. Also note that NEP 13 states in the summary "This covers some of the same ground as Travis Oliphant's proposal to retro-fit NumPy with multi-methods" (reminds one of uarray....). For scipy.sparse, the layout of the data doesn't make sense to numpy. All that was desired was that the sparse matrix needs to know what function was called, so it can call its own implementation of that function instead.

> because ufuncs all share the same structure of a loop wrapped around a
> core operation, and they can treat the core operation as a black box.
> For example:
>
> - Dask can split up the operation across its tiled sub-arrays, and
> then for each tile it invokes the core operation.

Works for __array_function__ too. Note, *not* by explicitly reusing the numpy function. Dask anyway has its own functions that mirror the numpy API. Dask's __array_function__ just does the forwarding to its own functions. Also, a Dask array could be a collection of CuPy arrays, and CuPy implements __array_ufunc__. So explicitly reusing the NumPy ufunc implementation on whatever comes in would be, well, not so nice.

> - xarray can do its label-based axis matching, and then invoke the
> core operation.

Could do this with __array_function__ too.

> - bcolz can loop over the array uncompressing one block at a time,
> invoking the core operation on each.

Not sure about this one.

> - sparse arrays can check the ufunc .identity attribute

This is a case where knowing if something is a ufunc helps use a property of it. So there the more specialized nature of __array_ufunc__ helps. Seems niche though, and could probably also be done by checking if a function is an instance of np.ufunc via __array_function__.

> to find out
> whether 0 is an identity, and if so invoke the operation directly on
> the non-zero entries; otherwise, it can loop over the array and
> densify it in blocks and invoke the core operation on each. (It would
> be useful to have a bit more metadata on the ufunc, so e.g.
> np.subtract could declare that zero is a right-identity but not a
> left-identity, but that's a simple enough extension to make at some
> point.)
>
> Result: __array_ufunc__ makes it totally possible to take a ufunc from
> scipy.special or a random new one created with numba, and have it
> immediately work on an xarray wrapped around dask wrapped around
> bcolz, out-of-the-box. That's a clean, generic interface. [1]
>

This last point, using third-party ufuncs, is the interesting one to me. They have to be generated with the NumPy ufunc machinery, so the dispatch mechanism is attached to them. We could do third party functions for __array_function__ too, but that would require making @array_function_dispatch public, which we haven't done (yet?).

> OTOH, __array_function__ doesn't allow this kind of simplification: if
> we were using __array_function__ for ufuncs, every library would have
> to special-case every individual ufunc, which leads to dramatically
> more work and more potential for bugs.
>

This all assumes that "reusing the ufunc's implementation" is the one thing that matters. To me that's a small side benefit, which we haven't seen a whole lot of use of in the 2+ years that __array_ufunc__ was available. I think that what (for example) CuPy does - use __array_ufunc__ to simply take over control, is both the major use case and the original motivation for introducing the protocol.
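For contrast with the generic tiled sketch earlier in the thread, here is a
minimal sketch of that "take over control" pattern (a toy class, not CuPy's
or scipy.sparse's actual code): each handled ufunc is mapped to the object's
own implementation, and everything else fails loudly:

```
import numpy as np

class MyArray:
    """Toy array whose data layout NumPy's inner loops can't touch."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def _add(self, other):
        # "Our own implementation": a real library would run its
        # specialized kernel here instead of NumPy's loop.
        return MyArray(self.data + other.data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        handlers = {np.add: MyArray._add}
        if method != '__call__' or kwargs or ufunc not in handlers:
            return NotImplemented  # unknown ufunc: explicit failure
        return handlers[ufunc](*inputs)

x, y = MyArray([1, 2]), MyArray([3, 4])
z = np.add(x, y)  # control is handed to MyArray._add
```

Note the contrast with the generic approach: here every supported ufunc needs
its own entry in the handler table, which is exactly the per-function
special-casing being debated.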
> To me, the whole point of interfaces is to reduce coupling. When you
> have N interacting modules, it's unmaintainable if every change
> requires considering every N! combination. From this perspective,
> __array_function__ isn't good, but it is still somewhat constrained:
> the result of each operation is still determined by the objects
> involved, nothing else. In this regard, uarray is even more extreme than
> __array_function__, because arbitrary operations can be arbitrarily
> changed by arbitrarily distant code. It sort of feels like the
> argument for uarray is: well, designing maintainable interfaces is a
> lot of work, so forget it, let's just make it easy to monkeypatch
> everything and call it a day.
>
> That said, in my replies in this thread I've been trying to stay
> productive and focus on narrower concrete issues. I'm pretty sure that
> __array_function__ and uarray will turn out to be bad ideas and will
> fail, but that's not a proven fact, it's just an informed guess. And
> the road that I favor also has lots of risks and uncertainty.

But what is that road, and what do you think the goal is? To me it's: separate our API from our implementation. Yours seems to be "reuse our implementations" for __array_ufunc__, but I can't see how that generalizes beyond ufuncs.

> So I don't have a problem with trying both as experiments and learning
> more! But hopefully that explains why it's not at all obvious that
> uarray solves the protocol design problems we've been talking about.
>
> -n
>
> [1] There are also some cases that __array_ufunc__ doesn't handle as
> nicely. One obvious one is that GPU/TPU libraries still need to
> special-case individual ufuncs. But that's not a limitation of
> __array_ufunc__, it's a limitation of GPUs

I think this is an important point. GPUs are massively popular, and will very likely just continue to grow in importance. So anything we do in this space that says "well it works, just not for GPUs" is probably not going to solve our most pressing problems.

> -- they can't run CPU code,
> so they can't use the CPU implementation of the core operations.
> Another limitation is that __array_ufunc__ is weak at handling
> operations that involve mixed libraries (e.g. np.add(bcolz_array,
> sparse_array)) -- to work well, this might require that bcolz have
> special-case handling for sparse arrays, or vice-versa, so you still
> potentially have some N**2 special cases, though at least here N is
> the number of duck array libraries, not the number of ufuncs. I think
> this is an interesting target for future work. But in general,
> __array_ufunc__ goes a long way to taming the complexity of
> interacting libraries and ufuncs.
>

With *only* ufuncs you can't create that many interesting applications, you need the other functions too...

Cheers,
Ralf

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From warren.weckesser at gmail.com Sun Sep 8 11:54:43 2019 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Sun, 8 Sep 2019 11:54:43 -0400 Subject: [Numpy-discussion] NEP 32: Remove the financial functions from NumPy In-Reply-To: References: <9067a8f06bc885307d1ec726a55bc5fd906c3c62.camel@sipsolutions.net> Message-ID: On 9/4/19, Matthew Brett wrote: > Hi, > > Maybe worth asking over at the Pandas list? I bet there are more > Python / finance people over there. OK, I sent a message to the PyData mailing list. Warren > > Cheers, > > Matthew > > On Wed, Sep 4, 2019 at 7:11 PM Ilhan Polat wrote: >> >> +1 on removing them from NumPy. I think there are plenty of alternatives >> already so many that we might even consider deprecating them just like >> SciPy misc module by pointing to alternatives. >> >> On Tue, Sep 3, 2019 at 6:38 PM Sebastian Berg >> wrote: >>> >>> On Tue, 2019-09-03 at 08:56 -0400, Warren Weckesser wrote: >>> > Github issue 2880 ("Get financial functions out of main namespace", >>> >>> Very briefly, I am absolutely in favor of this. >>> >>> Keeping the functions in numpy seems more of a liability than help >>> anyone. And this push is more likely to help users by spurring >>> development on a good replacement, than a practically unmaintained >>> corner of NumPy that may seem like it solves a problem, but probably >>> does so very poorly. >>> >>> Moving them into a separate pip installable package seems like the best >>> way forward until a better replacement, to which we can point users, >>> comes up. >>> >>> - Sebastian >>> >>> >>> > https://github.com/numpy/numpy/issues/2880) has been open since 2013. >>> > In a recent community meeting, it was suggested that we create a NEP >>> > to propose the removal of the financial functions from NumPy. I have >>> > submitted "NEP 32: Remove the financial functions from NumPy" in a >>> > pull request at https://github.com/numpy/numpy/pull/14399. A copy of >>> > the latest version of the NEP is below. >>> > >>> > According to the NEP process document, "Once the PR is in place, the >>> > NEP should be announced on the mailing list for discussion (comments >>> > on the PR itself should be restricted to minor editorial and >>> > technical fixes)." This email is the announcement for NEP 32. >>> > >>> > The NEP includes a brief summary of the history of the financial >>> > functions, and has links to several relevant mailing list threads, >>> > dating back to when the functions were added to NumPy in 2008. I >>> > recommend reviewing those threads before commenting here. >>> > >>> > Warren >>> > >>> > ----- >>> > >>> > ================================================== >>> > NEP 32 ? Remove the financial functions from NumPy >>> > ================================================== >>> > >>> > :Author: Warren Weckesser >>> > :Status: Draft >>> > :Type: Standards Track >>> > :Created: 2019-08-30 >>> > >>> > >>> > Abstract >>> > -------- >>> > >>> > We propose deprecating and ultimately removing the financial >>> > functions [1]_ >>> > from NumPy. The functions will be moved to an independent >>> > repository, >>> > and provided to the community as a separate package with the name >>> > ``numpy_financial``. >>> > >>> > >>> > Motivation and scope >>> > -------------------- >>> > >>> > The NumPy financial functions [1]_ are the 10 functions ``fv``, >>> > ``ipmt``, >>> > ``irr``, ``mirr``, ``nper``, ``npv``, ``pmt``, ``ppmt``, ``pv`` and >>> > ``rate``. 
>>> > The functions provide elementary financial calculations such as >>> > future value, >>> > net present value, etc. These functions were added to NumPy in 2008 >>> > [2]_. >>> > >>> > In May, 2009, a request by Joe Harrington to add a function called >>> > ``xirr`` to >>> > the financial functions triggered a long thread about these functions >>> > [3]_. >>> > One important point that came up in that thread is that a "real" >>> > financial >>> > library must be able to handle real dates. The NumPy financial >>> > functions do >>> > not work with actual dates or calendars. The preference for a more >>> > capable >>> > library independent of NumPy was expressed several times in that >>> > thread. >>> > >>> > In June, 2009, D. L. Goldsmith expressed concerns about the >>> > correctness of the >>> > implementations of some of the financial functions [4]_. It was >>> > suggested then >>> > to move the financial functions out of NumPy to an independent >>> > package. >>> > >>> > In a GitHub issue in 2013 [5]_, Nathaniel Smith suggested moving the >>> > financial >>> > functions from the top-level namespace to ``numpy.financial``. He >>> > also >>> > suggested giving the functions better names. Responses at that time >>> > included >>> > the suggestion to deprecate them and move them from NumPy to a >>> > separate >>> > package. This issue is still open. >>> > >>> > Later in 2013 [6]_, it was suggested on the mailing list that these >>> > functions >>> > be removed from NumPy. >>> > >>> > The arguments for the removal of these functions from NumPy: >>> > >>> > * They are too specialized for NumPy. >>> > * They are not actually useful for "real world" financial >>> > calculations, because >>> > they do not handle real dates and calendars. >>> > * The definition of "correctness" for some of these functions seems >>> > to be a >>> > matter of convention, and the current NumPy developers do not have >>> > the >>> > background to judge their correctness. >>> > * There has been little interest among past and present NumPy >>> > developers >>> > in maintaining these functions. >>> > >>> > The main arguments for keeping the functions in NumPy are: >>> > >>> > * Removing these functions will be disruptive for some users. >>> > Current users >>> > will have to add the new ``numpy_financial`` package to their >>> > dependencies, >>> > and then modify their code to use the new package. >>> > * The functions provided, while not "industrial strength", are >>> > apparently >>> > similar to functions provided by spreadsheets and some >>> > calculators. Having >>> > them available in NumPy makes it easier for some developers to >>> > migrate their >>> > software to Python and NumPy. >>> > >>> > It is clear from comments in the mailing list discussions and in the >>> > GitHub >>> > issues that many current NumPy developers believe the benefits of >>> > removing >>> > the functions outweigh the costs. For example, from [5]_:: >>> > >>> > The financial functions should probably be part of a separate >>> > package >>> > -- Charles Harris >>> > >>> > If there's a better package we can point people to we could just >>> > deprecate >>> > them and then remove them entirely... I'd be fine with that >>> > too... >>> > -- Nathaniel Smith >>> > >>> > +1 to deprecate them. If no other package exists, it can be >>> > created if >>> > someone feels the need for that. >>> > -- Ralf Gommers >>> > >>> > I feel pretty strongly that we should deprecate these. 
If nobody >>> > on numpy?s >>> > core team is interested in maintaining them, then it is purely a >>> > drag on >>> > development for NumPy. >>> > -- Stephan Hoyer >>> > >>> > And from the 2013 mailing list discussion, about removing the >>> > functions from >>> > NumPy:: >>> > >>> > I am +1 as well, I don't think they should have been included in >>> > the first >>> > place. >>> > -- David Cournapeau >>> > >>> > But not everyone was in favor of removal:: >>> > >>> > The fin routines are tiny and don't require much maintenance once >>> > written. If we made an effort (putting up pages with examples of >>> > common >>> > financial calculations and collecting those under a topical web >>> > page, >>> > then linking to that page from various places and talking it up), >>> > I >>> > would think they could attract users looking for a free way to >>> > play with >>> > financial scenarios. [...] >>> > So, I would say we keep them. If ours are not the best, we >>> > should bring >>> > them up to snuff. >>> > -- Joe Harrington >>> > >>> > For an idea of the maintenance burden of the financial functions, one >>> > can >>> > look for all the GitHub issues [7]_ and pull requests [8]_ that have >>> > the tag >>> > ``component: numpy.lib.financial``. >>> > >>> > One method for measuring the effect of removing these functions is to >>> > find >>> > all the packages on GitHub that use them. Such a search can be >>> > performed >>> > with the ``python-api-inspect`` service [9]_. A search for all uses >>> > of the >>> > NumPy financial functions finds just eight repositories. (See the >>> > comments >>> > in [5]_ for the actual SQL query.) >>> > >>> > >>> > Implementation >>> > -------------- >>> > >>> > * Create a new Python package, ``numpy_financial``, to be maintained >>> > in the >>> > top-level NumPy github organization. This repository will contain >>> > the >>> > definitions and unit tests for the financial functions. The >>> > package will >>> > be added to PyPI so it can be installed with ``pip``. >>> > * Deprecate the financial functions in the ``numpy`` namespace, >>> > beginning in >>> > NumPy version 1.18. Remove the financial functions from NumPy >>> > version 1.20. >>> > >>> > >>> > Backward compatibility >>> > ---------------------- >>> > >>> > The removal of these functions breaks backward compatibility, as >>> > explained >>> > earlier. The effects are mitigated by providing the >>> > ``numpy_financial`` >>> > library. >>> > >>> > >>> > Alternatives >>> > ------------ >>> > >>> > The following alternatives were mentioned in [5]_: >>> > >>> > * *Maintain the functions as they are (i.e. do nothing).* >>> > A review of the history makes clear that this is not the preference >>> > of many >>> > NumPy developers. A recurring comment is that the functions simply >>> > do not >>> > belong in NumPy. When that sentiment is combined with the history >>> > of bug >>> > reports and the ongoing questions about the correctness of the >>> > functions, the >>> > conclusion is that the cleanest solution is deprecation and >>> > removal. >>> > * *Move the functions from the ``numpy`` namespace to >>> > ``numpy.financial``.* >>> > This was the initial suggestion in [5]_. Such a change does not >>> > address the >>> > maintenance issues, and doesn't change the misfit that many >>> > developers see >>> > between these functions and NumPy. It causes disruption for the >>> > current >>> > users of these functions without addressing what many developers >>> > see as the >>> > fundamental problem. 
>>> > >>> > >>> > Discussion >>> > ---------- >>> > >>> > Links to past mailing list discussions, and to relevant GitHub issues >>> > and pull >>> > requests, have already been given. >>> > >>> > >>> > References and footnotes >>> > ------------------------ >>> > >>> > .. [1] Financial functions, >>> > https://numpy.org/doc/1.17/reference/routines.financial.html >>> > >>> > .. [2] Numpy-discussion mailing list, "Simple financial functions for >>> > NumPy", >>> > >>> > https://mail.python.org/pipermail/numpy-discussion/2008-April/032353.html >>> > >>> > .. [3] Numpy-discussion mailing list, "add xirr to numpy financial >>> > functions?", >>> > >>> > https://mail.python.org/pipermail/numpy-discussion/2009-May/042645.html >>> > >>> > .. [4] Numpy-discussion mailing list, "Definitions of pv, fv, nper, >>> > pmt, and rate", >>> > >>> > https://mail.python.org/pipermail/numpy-discussion/2009-June/043188.html >>> > >>> > .. [5] Get financial functions out of main namespace, >>> > https://github.com/numpy/numpy/issues/2880 >>> > >>> > .. [6] Numpy-discussion mailing list, "Deprecation of financial >>> > routines", >>> > >>> > https://mail.python.org/pipermail/numpy-discussion/2013-August/067409.html >>> > >>> > .. [7] ``component: numpy.lib.financial`` issues, >>> > >>> > https://github.com/numpy/numpy/issues?utf8=%E2%9C%93&q=is%3Aissue+label%3A%22component%3A+numpy.lib.financial%22+ >>> > >>> > .. [8] ``component: numpy.lib.financial`` pull request, >>> > >>> > https://github.com/numpy/numpy/pulls?utf8=%E2%9C%93&q=is%3Apr+label%3A%22component%3A+numpy.lib.financial%22+ >>> > >>> > .. [9] Quansight-Labs/python-api-inspect, >>> > https://github.com/Quansight-Labs/python-api-inspect/ >>> > >>> > >>> > Copyright >>> > --------- >>> > >>> > This document has been placed in the public domain. >>> > >>> > _______________________________________________ >>> > NumPy-Discussion mailing list >>> > NumPy-Discussion at python.org >>> > https://mail.python.org/mailman/listinfo/numpy-discussion >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From njs at pobox.com Sun Sep 8 21:26:43 2019 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 8 Sep 2019 18:26:43 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?= =?utf-8?q?global_overrides_of_the_NumPy_API?= In-Reply-To: References: Message-ID: On Sun, Sep 8, 2019 at 8:40 AM Ralf Gommers wrote: > > > > On Sun, Sep 8, 2019 at 12:54 AM Nathaniel Smith wrote: >> >> On Fri, Sep 6, 2019 at 11:53 AM Ralf Gommers wrote: >> > On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith wrote: >> >> On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi wrote: >> >> > The fact that we're having to design more and more protocols for a lot >> >> > of very similar things is, to me, an indicator that we do have holistic >> >> > problems that ought to be solved by a single protocol. >> >> >> >> But the reason we've had trouble designing these protocols is that >> >> they're each different :-). If it was just a matter of copying >> >> __array_ufunc__ we'd have been done in a few minutes... 
>> > >> > I don't think that argument is correct. That we now have two very similar protocols is simply a matter of history and limited developer time. NEP 18 discusses in several places that __array_ufunc__ should be brought in line with __array_ufunc__, and that we can migrate a function from one protocol to the other. There's no technical reason other than backwards compat and dev time why we couldn't use __array_function__ for ufuncs also. >> >> Huh, that's interesting! Apparently we have a profoundly different >> understanding of what we're doing here. > > > That is interesting indeed. We should figure this out first - no point discussing a NEP about plugging the gaps in our override system when we don't have a common understanding of why we wanted/needed an override system in the first place. > >> To me, __array_ufunc__ and >> __array_function__ are completely different. In fact I'd say >> __array_ufunc__ is a good idea and __array_function__ is a bad idea, > > > It's early days, but "customer feedback" certainly has been more enthusiastic for __array_function__. Also from what I've seen so far it works well. Example: at the SciPy sprints someone put together Xarray plus pydata/sparse to use distributed sparse arrays for visualizing some large genetic (I think) data sets. That was made to work in a single day, with impressively little code. Yeah, it's true, and __array_function__ made a bunch of stuff that used to be impossible become possible, I'm not saying it didn't. My prediction is that the longer we live with it, the more limits we'll hit and the more problems we'll have with long-term maintainability. I don't think initial enthusiasm is a good predictor of that either way. >> The key difference is that __array_ufunc__ allows for *generic* >> implementations. > > Implementations of what? Generic in the sense that you can write __array_ufunc__ once and have it work for all ufuncs. >> Most duck array libraries can write a single >> implementation of __array_ufunc__ that works for *all* ufuncs, even >> new third-party ufuncs that the duck array library has never heard of, > > > I see where you're going with this. You are thinking of reusing the ufunc implementation to do a computation. That's a minor use case (imho), and I can't remember seeing it used. I mean, I just looked at dask and xarray, and they're both doing exactly what I said, right now in shipping code. What use cases are you targeting here if you consider dask and xarray out-of-scope? :-) > this is case where knowing if something is a ufunc helps use a property of it. so there the more specialized nature of __array_ufunc__ helps. Seems niche though, and could probably also be done by checking if a function is an instance of np.ufunc via __array_function__ Sparse arrays aren't very niche... and the isinstance trick is possible in some cases, but (a) it's relying on an undocumented implementation detail of __array_function__; according to __array_function__'s API contract, you could just as easily get passed the ufunc's __call__ method instead of the object itself, and (b) it doesn't work at all for ufunc methods like reduce, outer, accumulate. These are both show-stoppers IMO. > This last point, using third-party ufuncs, is the interesting one to me. They have to be generated with the NumPy ufunc machinery, so the dispatch mechanism is attached to them. We could do third party functions for __array_function__ too, but that would require making @array_function_dispatch public, which we haven't done (yet?). 
With __array_function__ it's theoretically possible to do the dispatch on third-party functions, but when someone defines a new function they always have to go update all the duck array libraries to hard-code in some special knowledge of their new function. So in my example, even if we made @array_function_dispatch public, you still couldn't use your nice new numba-created gufunc unless you first convinced dask, xarray, and bcolz to all accept patches to support your new gufunc. With __array_ufunc__, it works out-of-the-box. > But what is that road, and what do you think the goal is? To me it's: separate our API from our implementation. Yours seems to be "reuse our implementations" for __array_ufunc__, but I can't see how that generalizes beyond ufuncs. The road is to define *abstractions* for the operations we expose through our API, so that duck array implementors can work against a contract with well-defined preconditions and postconditions, so they can write code the works reliably even when the surrounding environment changes. That's the only way to keep things maintainable AFAICT. If the API contract is just a vague handwave at the numpy API, then no-one knows which details actually matter, it's impossible to test, implementations will inevitably end up with subtle long-standing bugs, and literally any change in numpy could potentially break duck array users, we don't know. So my motivation is that I like testing, I don't like bugs, and I like being able to maintain things with confidence :-). The principles are much more general than ufuncs; that's just a pertinent example. > I think this is an important point. GPUs are massively popular, and when very likely just continue to grow in importance. So anything we do in this space that says "well it works, just not for GPUs" is probably not going to solve our most pressing problems. I'm not saying "__array_ufunc__ doesn't work for GPUs". I'm saying that when it comes to GPUs, there's an upper bound for how good you can hope to do, and __array_ufunc__ achieves that upper bound. So does __array_function__. So if we only care about GPUs, they're about equally good. But if we also care about dask and xarray and compressed storage and sparse storage and ... then __array_ufunc__ is strictly superior in those cases. So replacing __array_ufunc__ with __array_function__ would be a major backwards step. -n -- Nathaniel J. Smith -- https://vorpus.org From nathan.goldbaum at gmail.com Sun Sep 8 22:29:18 2019 From: nathan.goldbaum at gmail.com (Nathan) Date: Sun, 8 Sep 2019 20:29:18 -0600 Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?= =?utf-8?q?global_overrides_of_the_NumPy_API?= In-Reply-To: References: Message-ID: On Sun, Sep 8, 2019 at 7:27 PM Nathaniel Smith wrote: > On Sun, Sep 8, 2019 at 8:40 AM Ralf Gommers > wrote: > > > > > > > > On Sun, Sep 8, 2019 at 12:54 AM Nathaniel Smith wrote: > >> > >> On Fri, Sep 6, 2019 at 11:53 AM Ralf Gommers > wrote: > >> > On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith > wrote: > >> >> On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi < > einstein.edison at gmail.com> wrote: > >> >> > The fact that we're having to design more and more protocols for a > lot > >> >> > of very similar things is, to me, an indicator that we do have > holistic > >> >> > problems that ought to be solved by a single protocol. > >> >> > >> >> But the reason we've had trouble designing these protocols is that > >> >> they're each different :-). 
If it was just a matter of copying > >> >> __array_ufunc__ we'd have been done in a few minutes... > >> > > >> > I don't think that argument is correct. That we now have two very > similar protocols is simply a matter of history and limited developer time. > NEP 18 discusses in several places that __array_ufunc__ should be brought > in line with __array_ufunc__, and that we can migrate a function from one > protocol to the other. There's no technical reason other than backwards > compat and dev time why we couldn't use __array_function__ for ufuncs also. > >> > >> Huh, that's interesting! Apparently we have a profoundly different > >> understanding of what we're doing here. > > > > > > That is interesting indeed. We should figure this out first - no point > discussing a NEP about plugging the gaps in our override system when we > don't have a common understanding of why we wanted/needed an override > system in the first place. > > > >> To me, __array_ufunc__ and > >> __array_function__ are completely different. In fact I'd say > >> __array_ufunc__ is a good idea and __array_function__ is a bad idea, > > > > > > It's early days, but "customer feedback" certainly has been more > enthusiastic for __array_function__. Also from what I've seen so far it > works well. Example: at the SciPy sprints someone put together Xarray plus > pydata/sparse to use distributed sparse arrays for visualizing some large > genetic (I think) data sets. That was made to work in a single day, with > impressively little code. > > Yeah, it's true, and __array_function__ made a bunch of stuff that > used to be impossible become possible, I'm not saying it didn't. My > prediction is that the longer we live with it, the more limits we'll > hit and the more problems we'll have with long-term maintainability. I > don't think initial enthusiasm is a good predictor of that either way. > > >> The key difference is that __array_ufunc__ allows for *generic* > >> implementations. > > > > Implementations of what? > > Generic in the sense that you can write __array_ufunc__ once and have > it work for all ufuncs. > > >> Most duck array libraries can write a single > >> implementation of __array_ufunc__ that works for *all* ufuncs, even > >> new third-party ufuncs that the duck array library has never heard of, > > > > > > I see where you're going with this. You are thinking of reusing the > ufunc implementation to do a computation. That's a minor use case (imho), > and I can't remember seeing it used. > > I mean, I just looked at dask and xarray, and they're both doing > exactly what I said, right now in shipping code. What use cases are > you targeting here if you consider dask and xarray out-of-scope? :-) > > > this is case where knowing if something is a ufunc helps use a property > of it. so there the more specialized nature of __array_ufunc__ helps. Seems > niche though, and could probably also be done by checking if a function is > an instance of np.ufunc via __array_function__ > > Sparse arrays aren't very niche... and the isinstance trick is > possible in some cases, but (a) it's relying on an undocumented > implementation detail of __array_function__; according to > __array_function__'s API contract, you could just as easily get passed > the ufunc's __call__ method instead of the object itself, and (b) it > doesn't work at all for ufunc methods like reduce, outer, accumulate. > These are both show-stoppers IMO. > > > This last point, using third-party ufuncs, is the interesting one to me. 
> They have to be generated with the NumPy ufunc machinery, so the dispatch > mechanism is attached to them. We could do third party functions for > __array_function__ too, but that would require making > @array_function_dispatch public, which we haven't done (yet?). > > With __array_function__ it's theoretically possible to do the dispatch > on third-party functions, but when someone defines a new function they > always have to go update all the duck array libraries to hard-code in > some special knowledge of their new function. So in my example, even > if we made @array_function_dispatch public, you still couldn't use > your nice new numba-created gufunc unless you first convinced dask, > xarray, and bcolz to all accept patches to support your new gufunc. > With __array_ufunc__, it works out-of-the-box. > > > But what is that road, and what do you think the goal is? To me it's: > separate our API from our implementation. Yours seems to be "reuse our > implementations" for __array_ufunc__, but I can't see how that generalizes > beyond ufuncs. > > The road is to define *abstractions* for the operations we expose > through our API, so that duck array implementors can work against a > contract with well-defined preconditions and postconditions, so they > can write code the works reliably even when the surrounding > environment changes. That's the only way to keep things maintainable > AFAICT. If the API contract is just a vague handwave at the numpy API, > then no-one knows which details actually matter, it's impossible to > test, implementations will inevitably end up with subtle long-standing > bugs, and literally any change in numpy could potentially break duck > array users, we don't know. So my motivation is that I like testing, I > don't like bugs, and I like being able to maintain things with > confidence :-). The principles are much more general than ufuncs; > that's just a pertinent example. > > > I think this is an important point. GPUs are massively popular, and when > very likely just continue to grow in importance. So anything we do in this > space that says "well it works, just not for GPUs" is probably not going to > solve our most pressing problems. > > I'm not saying "__array_ufunc__ doesn't work for GPUs". I'm saying > that when it comes to GPUs, there's an upper bound for how good you > can hope to do, and __array_ufunc__ achieves that upper bound. So does > __array_function__. So if we only care about GPUs, they're about > equally good. But if we also care about dask and xarray and compressed > storage and sparse storage and ... then __array_ufunc__ is strictly > superior in those cases. So replacing __array_ufunc__ with > __array_function__ would be a major backwards step. One case that hasn't been brought up in this thread is unit-handling. For example, unyt's array_ufunc implementation explicitly handles ufuncs and will bail if someone tries to use a ufunc that unyt doesn't know about. I tried to implement a completely generic solution but ended up concluding I couldn't do that without silently generating answers with incorrect units. I definitely agree with your analysis that this sort of implementation is error-prone; in fact we just had to do a bugfix release to fix clip suddenly not working now that it's a ufunc in numpy 1.17.
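A minimal sketch of that bail-on-unknown-ufuncs pattern -- an __array_ufunc__ that consults an explicit table of known ufuncs and refuses everything else. `Quantity` and the toy unit rules here are invented for illustration and are not unyt's actual implementation; the sketch only handles binary calls on two Quantity operands:

    import numpy as np

    # Explicit table of understood ufuncs, each mapped to a rule for
    # deriving the result's unit; anything absent triggers a bail.
    UNIT_RULES = {
        np.add:      lambda u, v: u if u == v else None,
        np.subtract: lambda u, v: u if u == v else None,
        np.multiply: lambda u, v: "(%s)*(%s)" % (u, v),
    }

    class Quantity:
        """Toy unit-carrying array."""
        def __init__(self, value, unit):
            self.value, self.unit = np.asarray(value), unit

        def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
            rule = UNIT_RULES.get(ufunc)
            if rule is None or method != "__call__":
                return NotImplemented      # unknown ufunc: refuse to guess units
            x, y = inputs                  # sketch: two Quantity operands only
            unit = rule(x.unit, y.unit)
            if unit is None:
                raise TypeError("incompatible units: %s, %s" % (x.unit, y.unit))
            return Quantity(ufunc(x.value, y.value, **kwargs), unit)

    a = Quantity([1.0, 2.0], "m")
    b = Quantity([3.0, 4.0], "m")
    np.add(a, b).unit   # 'm'
    np.cos(a)           # TypeError: Quantity won't silently drop units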
> -n > > -- > Nathaniel J. Smith -- https://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: 

From matti.picus at gmail.com Mon Sep 9 04:41:36 2019 From: matti.picus at gmail.com (Matti Picus) Date: Mon, 9 Sep 2019 11:41:36 +0300 Subject: [Numpy-discussion] Using hypothesis in testing Message-ID: An HTML attachment was scrubbed... URL: 

From daniel.knuettel at daknuett.eu Mon Sep 9 05:43:33 2019 From: daniel.knuettel at daknuett.eu (Daniel =?ISO-8859-1?Q?Kn=FCttel?=) Date: Mon, 09 Sep 2019 11:43:33 +0200 Subject: [Numpy-discussion] numpy C-API :: use numpy's random number generator in a ufunc Message-ID: <9370f92917dc8b54c3273eacf2fbabe9b28fc090.camel@daknuett.eu> Hi folks, I currently have a project that requires randomness in a ufunc. In order to keep the ufuncs as reproducible as possible I would like to use numpy's random number generator for that; basically because setting the seed will be more intuitive this way. However I cannot find the documentation of the numpy.random C-API (does it have one?). How would one do that? Cheers, -- Daniel Knüttel 

From dsm054 at gmail.com Mon Sep 9 11:07:59 2019 From: dsm054 at gmail.com (D.S. McNeil) Date: Mon, 9 Sep 2019 08:07:59 -0700 (MST) Subject: [Numpy-discussion] NEP 32: Remove the financial functions from NumPy In-Reply-To: References: <9067a8f06bc885307d1ec726a55bc5fd906c3c62.camel@sipsolutions.net> Message-ID: <1568041679303-0.post@n7.nabble.com> [coming over from the pydata post] I just checked about ~150KLOC of our Python code in a financial context, written by about twenty developers over about four years. Almost every function uses numpy, sometimes directly and sometimes via pandas. It seems like these functions were never used anywhere, and the lead dev on one of the projects responded "never used them; didn't even know they exist". I knew they existed, but even on the rare occasion I need the functionality I need better control over the dates, which means for practical purposes I need something which supports Series natively anyhow. As it is, they also clutter up the namespace in unfriendly ways: if there's going to be a top-level function called np.rate, I don't think this is the one it should be. Admittedly that's more an argument against their current location. Although it wouldn't be useful for us, I could imagine someone finding a package which provides numpy-compatible versions of the many OpenFormula (or whatever the spec is called) functions helpful. Having numpy carry a tiny subset of them doesn't feel productive. +1 for removing them. Doug -- Sent from: http://numpy-discussion.10968.n7.nabble.com/ 

From chunwei.yuan at gmail.com Mon Sep 9 17:27:06 2019 From: chunwei.yuan at gmail.com (Chun-Wei Yuan) Date: Mon, 9 Sep 2019 14:27:06 -0700 Subject: [Numpy-discussion] [JOB] Principal Software Engineer position at IHME Message-ID: *The Institute for Health Metrics and Evaluation (IHME) *has an outstanding opportunity for a full-time *Principal Software Engineer *on our Forecasting/Future Health Scenarios (FHS) team*.* The development arm of the team is responsible for the design and implementation of software to support this effort, and the Principal Software Engineer will lead the development work and supervise engineers on that team. 
IHME?s aim within the FHS portfolio is to create an analytic engine that can model the impact of a wide array of determinants on the trajectory of health outcomes and risks in different countries, projected 25 years into the future, that will allow decision-makers to assess the impact of their potential actions analytically. A recent publication can be found here: https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(18)31694-5/fulltext If you join IHME, you?ll be joining a team of mission-oriented people who are committed to creating a welcoming and diverse workforce that respects and appreciates differences, and embraces collaboration. *Further Information: *See IHME?s website: www.healthdata.org *To Apply and see the whole job description: *Please apply at uw.edu/jobs and search for req 171527 Please direct your questions to Megan at mkmason at uw.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Sep 9 18:19:25 2019 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 9 Sep 2019 15:19:25 -0700 Subject: [Numpy-discussion] [JOB] Principal Software Engineer position at IHME In-Reply-To: References: Message-ID: On Mon, Sep 9, 2019 at 2:27 PM Chun-Wei Yuan wrote: > *The Institute for Health Metrics and Evaluation (IHME) *has an > outstanding opportunity for a full-time *Principal Software Engineer *on > our Forecasting/Future Health Scenarios (FHS) team*.* The development arm > of the team is responsible for the design and implementation of software to > support this effort, and the Principal Software Engineer will lead the > development work and supervise engineers on that team. IHME?s aim within > the FHS portfolio is to create an analytic engine that can model the impact > of a wide array of determinants on the trajectory of health outcomes and > risks in different countries, projected 25 years into the future, that will > allow decision-makers to assess the impact of their potential actions > analytically. A recent publication can be found here: > https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(18)31694-5/fulltext > > > > If you join IHME, you?ll be joining a team of mission-oriented people who > are committed to creating a welcoming and diverse workforce that respects > and appreciates differences, and embraces collaboration. > > > > *Further Information: *See IHME?s website: www.healthdata.org > > *To Apply and see the whole job description: *Please apply at uw.edu/jobs > > and search for req 171527 > > > Please direct your questions to Megan at mkmason at uw.edu > Hi Chun-Wei, while this seems like an interesting job, it's not clear that it provides an opportunity to contribute back to NumPy or other community projects (that'd be awesome though, and I would encourage you to make that part of this job). For general software job ads (even if they use NumPy), we'd prefer to keep those off this list. Thank you, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From chunwei.yuan at gmail.com Mon Sep 9 19:14:24 2019 From: chunwei.yuan at gmail.com (Chun-Wei Yuan) Date: Mon, 9 Sep 2019 16:14:24 -0700 Subject: [Numpy-discussion] [JOB] Principal Software Engineer position at IHME In-Reply-To: References: Message-ID: I see. Sorry. I think I misinterpreted "It is okay to post job ads for work involving NumPy/SciPy and related packages if you put [JOB] in the subject". Thanks for the clarification. 
On Mon, Sep 9, 2019 at 3:19 PM Ralf Gommers wrote: > > > On Mon, Sep 9, 2019 at 2:27 PM Chun-Wei Yuan > wrote: > >> *The Institute for Health Metrics and Evaluation (IHME) *has an >> outstanding opportunity for a full-time *Principal Software Engineer *on >> our Forecasting/Future Health Scenarios (FHS) team*.* The development >> arm of the team is responsible for the design and implementation of >> software to support this effort, and the Principal Software Engineer will >> lead the development work and supervise engineers on that team. IHME?s aim >> within the FHS portfolio is to create an analytic engine that can model the >> impact of a wide array of determinants on the trajectory of health outcomes >> and risks in different countries, projected 25 years into the future, that >> will allow decision-makers to assess the impact of their potential actions >> analytically. A recent publication can be found here: >> https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(18)31694-5/fulltext >> >> >> >> If you join IHME, you?ll be joining a team of mission-oriented people who >> are committed to creating a welcoming and diverse workforce that respects >> and appreciates differences, and embraces collaboration. >> >> >> >> *Further Information: *See IHME?s website: www.healthdata.org >> >> *To Apply and see the whole job description: *Please apply at uw.edu/jobs >> >> and search for req 171527 >> >> >> Please direct your questions to Megan at mkmason at uw.edu >> > > Hi Chun-Wei, while this seems like an interesting job, it's not clear that > it provides an opportunity to contribute back to NumPy or other community > projects (that'd be awesome though, and I would encourage you to make that > part of this job). For general software job ads (even if they use NumPy), > we'd prefer to keep those off this list. > > Thank you, > Ralf > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Sep 9 19:26:44 2019 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 9 Sep 2019 16:26:44 -0700 Subject: [Numpy-discussion] [JOB] Principal Software Engineer position at IHME In-Reply-To: References: Message-ID: On Mon, Sep 9, 2019 at 4:14 PM Chun-Wei Yuan wrote: > I see. Sorry. I think I misinterpreted "It is okay to post job ads for > work involving NumPy/SciPy and related packages if you put [JOB] in the > subject". Thanks for the clarification. > That might be our fault for not updating that page, thanks for pointing that out. That bit of text stems from a time when it was still quite unusual to be able to use NumPy et al. in a job. Luckily these days that's different;) Cheers, Ralf > > On Mon, Sep 9, 2019 at 3:19 PM Ralf Gommers > wrote: > >> >> >> On Mon, Sep 9, 2019 at 2:27 PM Chun-Wei Yuan >> wrote: >> >>> *The Institute for Health Metrics and Evaluation (IHME) *has an >>> outstanding opportunity for a full-time *Principal Software Engineer *on >>> our Forecasting/Future Health Scenarios (FHS) team*.* The development >>> arm of the team is responsible for the design and implementation of >>> software to support this effort, and the Principal Software Engineer will >>> lead the development work and supervise engineers on that team. 
IHME?s aim >>> within the FHS portfolio is to create an analytic engine that can model the >>> impact of a wide array of determinants on the trajectory of health outcomes >>> and risks in different countries, projected 25 years into the future, that >>> will allow decision-makers to assess the impact of their potential actions >>> analytically. A recent publication can be found here: >>> https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(18)31694-5/fulltext >>> >>> >>> >>> If you join IHME, you?ll be joining a team of mission-oriented people >>> who are committed to creating a welcoming and diverse workforce that >>> respects and appreciates differences, and embraces collaboration. >>> >>> >>> >>> *Further Information: *See IHME?s website: www.healthdata.org >>> >>> *To Apply and see the whole job description: *Please apply at >>> uw.edu/jobs >>> >>> and search for req 171527 >>> >>> >>> Please direct your questions to Megan at mkmason at uw.edu >>> >> >> Hi Chun-Wei, while this seems like an interesting job, it's not clear >> that it provides an opportunity to contribute back to NumPy or other >> community projects (that'd be awesome though, and I would encourage you to >> make that part of this job). For general software job ads (even if they use >> NumPy), we'd prefer to keep those off this list. >> >> Thank you, >> Ralf >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Sep 9 21:27:34 2019 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 9 Sep 2019 18:27:34 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?= =?utf-8?q?global_overrides_of_the_NumPy_API?= In-Reply-To: References: Message-ID: On Sun, Sep 8, 2019 at 6:27 PM Nathaniel Smith wrote: > On Sun, Sep 8, 2019 at 8:40 AM Ralf Gommers > wrote: > > > > > > > > On Sun, Sep 8, 2019 at 12:54 AM Nathaniel Smith wrote: > >> > >> On Fri, Sep 6, 2019 at 11:53 AM Ralf Gommers > wrote: > >> > On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith > wrote: > >> >> On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi < > einstein.edison at gmail.com> wrote: > >> >> > The fact that we're having to design more and more protocols for a > lot > >> >> > of very similar things is, to me, an indicator that we do have > holistic > >> >> > problems that ought to be solved by a single protocol. > >> >> > >> >> But the reason we've had trouble designing these protocols is that > >> >> they're each different :-). If it was just a matter of copying > >> >> __array_ufunc__ we'd have been done in a few minutes... > >> > > >> > I don't think that argument is correct. That we now have two very > similar protocols is simply a matter of history and limited developer time. > NEP 18 discusses in several places that __array_ufunc__ should be brought > in line with __array_ufunc__, and that we can migrate a function from one > protocol to the other. There's no technical reason other than backwards > compat and dev time why we couldn't use __array_function__ for ufuncs also. > >> > >> Huh, that's interesting! Apparently we have a profoundly different > >> understanding of what we're doing here. 
> > > > > > That is interesting indeed. We should figure this out first - no point > discussing a NEP about plugging the gaps in our override system when we > don't have a common understanding of why we wanted/needed an override > system in the first place. > > > >> To me, __array_ufunc__ and > >> __array_function__ are completely different. In fact I'd say > >> __array_ufunc__ is a good idea and __array_function__ is a bad idea, > > > > > > It's early days, but "customer feedback" certainly has been more > enthusiastic for __array_function__. Also from what I've seen so far it > works well. Example: at the SciPy sprints someone put together Xarray plus > pydata/sparse to use distributed sparse arrays for visualizing some large > genetic (I think) data sets. That was made to work in a single day, with > impressively little code. > > Yeah, it's true, and __array_function__ made a bunch of stuff that > used to be impossible become possible, I'm not saying it didn't. My > prediction is that the longer we live with it, the more limits we'll > hit and the more problems we'll have with long-term maintainability. I > don't think initial enthusiasm is a good predictor of that either way. > > >> The key difference is that __array_ufunc__ allows for *generic* > >> implementations. > > > > Implementations of what? > > Generic in the sense that you can write __array_ufunc__ once and have > it work for all ufuncs. > > >> Most duck array libraries can write a single > >> implementation of __array_ufunc__ that works for *all* ufuncs, even > >> new third-party ufuncs that the duck array library has never heard of, > > > > > > I see where you're going with this. You are thinking of reusing the > ufunc implementation to do a computation. That's a minor use case (imho), > and I can't remember seeing it used. > > I mean, I just looked at dask and xarray, and they're both doing > exactly what I said, right now in shipping code. What use cases are > you targeting here if you consider dask and xarray out-of-scope? :-) > I don't think that's the interesting part, or even right. When you call `np.cos(dask_array_of_cupy_arrays)`, it certainly will not reuse the NumPy ufunc np.cos. It will call da.cos, and that will in turn call cupy.cos. Yes it will call np.cos if you feed it a dask array that contains a NumPy ndarray under the hood. But that's equally true of np.mean, which is not a ufunc. The story here is ~95% parallel for __array_ufunc__ and __array_function__. When I said not seeing used, I meant in ways that are fundamentally different between those two protocols. > > this is case where knowing if something is a ufunc helps use a property > of it. so there the more specialized nature of __array_ufunc__ helps. Seems > niche though, and could probably also be done by checking if a function is > an instance of np.ufunc via __array_function__ > > Sparse arrays aren't very niche... and the isinstance trick is > possible in some cases, but (a) it's relying on an undocumented > implementation detail of __array_function__; according to > __array_function__'s API contract, you could just as easily get passed > the ufunc's __call__ method instead of the object itself, That seems to be a matter of making it documented? Currently the dispatcher is only attached to functions, not methods. and (b) it > doesn't work at all for ufunc methods like reduce, outer, accumulate. 
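A quick illustration of how those ufunc methods reach __array_ufunc__ as a `method` string, which is the mechanism at issue here; `LoggingArray` is a toy class invented for the demonstration:

    import numpy as np

    class LoggingArray:
        """Toy duck array that just reports how ufunc calls arrive."""
        def __init__(self, data):
            self.data = np.asarray(data)

        def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
            print(ufunc.__name__, "method=%r" % method)
            args = [x.data if isinstance(x, LoggingArray) else x for x in inputs]
            return getattr(ufunc, method)(*args, **kwargs)

    x = LoggingArray([1, 2, 3])
    np.add(x, 1)        # prints: add method='__call__'
    np.add.reduce(x)    # prints: add method='reduce'
    np.add.outer(x, x)  # prints: add method='outer'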
No idea without looking in more detail if this can be made to work, but a quick count in the SciPy code base says ~10 uses of .reduce, 2 of .outer and 0 of .accumulate. So hardly showstoppers I'd say. These are both show-stoppers IMO. > > > This last point, using third-party ufuncs, is the interesting one to me. > They have to be generated with the NumPy ufunc machinery, so the dispatch > mechanism is attached to them. We could do third party functions for > __array_function__ too, but that would require making > @array_function_dispatch public, which we haven't done (yet?). > > With __array_function__ it's theoretically possible to do the dispatch > on third-party functions, but when someone defines a new function they > always have to go update all the duck array libraries to hard-code in > some special knowledge of their new function. So in my example, even > if we made @array_function_dispatch public, you still couldn't use > your nice new numba-created gufunc unless you first convinced dask, > xarray, and bcolz to all accept patches to support your new gufunc. > With __array_ufunc__, it works out-of-the-box. > Yep that's true. May still be better than not doing anything though, in some cases. You'll get a TypeError with a clear message for functions that aren't implemented, for something that otherwise likely doesn't work either. > > But what is that road, and what do you think the goal is? To me it's: > separate our API from our implementation. Yours seems to be "reuse our > implementations" for __array_ufunc__, but I can't see how that generalizes > beyond ufuncs. > > The road is to define *abstractions* for the operations we expose > through our API, so that duck array implementors can work against a > contract with well-defined preconditions and postconditions, so they > can write code the works reliably even when the surrounding > environment changes. That's the only way to keep things maintainable > AFAICT. If the API contract is just a vague handwave at the numpy API, > then no-one knows which details actually matter, it's impossible to > test, implementations will inevitably end up with subtle long-standing > bugs, and literally any change in numpy could potentially break duck > array users, we don't know. So my motivation is that I like testing, I > don't like bugs, and I like being able to maintain things with > confidence :-). The principles are much more general than ufuncs; > that's just a pertinent example. > Well, it's hard to argue with that in the abstract. I like all those things too :) The question is, what does that mean concretely? Most of the NumPy API, (g)ufuncs excepted, doesn't have well-defined abstractions, and it's hard to imagine we'll get those even if we could be more liberal with backwards compat. Most functions are just, well, functions. You can dispatch on them, or not. Your preference seems to be the latter, but I have a hard time figuring out how that translates into anything but "do nothing". Do you have a concrete alternative? I think we've chosen to try the former - dispatch on functions so we can reuse the NumPy API. It could work out well, it could give some long-term maintenance issues, time will tell. The question is now if and how to plug the gap that __array_function__ left. Its main limitation is "doesn't work for functions that don't have an array-like input" - that left out ~10-20% of functions. So now we have a proposal for a structural solution to that last 10-20%.
It seems logical to want that gap plugged, rather than go back and say "we shouldn't have gone for the first 80%, so let's go no further". > > I think this is an important point. GPUs are massively popular, and when > very likely just continue to grow in importance. So anything we do in this > space that says "well it works, just not for GPUs" is probably not going to > solve our most pressing problems. > > I'm not saying "__array_ufunc__ doesn't work for GPUs". I'm saying > that when it comes to GPUs, there's an upper bound for how good you > can hope to do, and __array_ufunc__ achieves that upper bound. So does > __array_function__. So if we only care about GPUs, they're about > equally good. Indeed. But if we also care about dask and xarray and compressed > storage and sparse storage and ... then __array_ufunc__ is strictly > superior in those cases. That it's superior is not really interesting though, is it? Their main characteristic (the actual override) is identical, and then ufuncs go a bit further. I think to convince me you're going to have to come up with an actual alternative plan to `__array_ufunc__ + __array_function__ + unumpy-or-alternative-to-it`. And re maintenance worries: I think cleaning up our API surface and namespaces will go *much* further than yes/no on overrides. > So replacing __array_ufunc__ with > __array_function__ would be a major backwards step. > To be 100% clear, no one is actually proposing this. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: 

From shoyer at gmail.com Mon Sep 9 23:32:48 2019 From: shoyer at gmail.com (Stephan Hoyer) Date: Mon, 9 Sep 2019 20:32:48 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?= =?utf-8?q?global_overrides_of_the_NumPy_API?= In-Reply-To: References: Message-ID: On Mon, Sep 9, 2019 at 6:27 PM Ralf Gommers wrote: > I think we've chosen to try the former - dispatch on functions so we can > reuse the NumPy API. It could work out well, it could give some long-term > maintenance issues, time will tell. The question is now if and how to plug > the gap that __array_function__ left. It's main limitation is "doesn't work > for functions that don't have an array-like input" - that left out ~10-20% > of functions. So now we have a proposal for a structural solution to that > last 10-20%. It seems logical to want that gap plugged, rather than go back > and say "we shouldn't have gone for the first 80%, so let's go no further". > I'm excited about solving the remaining 10-20% of use cases for flexible array dispatching, but the unumpy interface suggested here (numpy.overridable) feels like a redundant redo of __array_function__ and __array_ufunc__. I would much rather continue to develop specialized protocols for the remaining use cases. Summarizing those I've seen in this thread, these include: 1. Overrides for customizing array creation and coercion. 2. Overrides to implement operations for new dtypes. 3. Overriding implementations of NumPy functions, e.g., FFT and ufuncs with MKL. (1) could mostly be solved by adding np.duckarray() and another function for duck array coercion. There is still the matter of overriding np.zeros and the like, which perhaps justifies another new protocol, but in my experience the use cases for truly creating an array from scratch are quite rare. (2) should be tackled as part of overhauling NumPy's dtype system to better support user defined dtypes. 
But it should definitely be in the form of specialized protocols, e.g., which pass preallocated arrays into ufuncs for a new dtype. By design, new dtypes should not be able to customize the semantics of array *structure*. (3) could potentially motivate a new solution, but it should exist *inside* of select existing NumPy implementations, after checking for overrides with __array_function__. If the only option NumPy provides for overriding np.fft is to implement np.overridable.fft, I doubt that would suffice to keep MKL developers from monkey patching it -- they already decided that a separate namespace is not good enough for them. I also share Nathaniel's concern that the overrides in unumpy are too powerful, by allowing for control from arbitrary function arguments and even *non-local* control (i.e., global variables) from context managers. This level of flexibility can make code very hard to debug, especially in larger codebases. Best, Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: 

From sebastian at sipsolutions.net Tue Sep 10 00:17:41 2019 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 09 Sep 2019 21:17:41 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?= =?utf-8?q?global_overrides_of_the_NumPy_API?= In-Reply-To: References: Message-ID: On Mon, 2019-09-09 at 20:32 -0700, Stephan Hoyer wrote: > On Mon, Sep 9, 2019 at 6:27 PM Ralf Gommers > wrote: > > I think we've chosen to try the former - dispatch on functions so > > we can reuse the NumPy API. It could work out well, it could give > > some long-term maintenance issues, time will tell. The question is > > now if and how to plug the gap that __array_function__ left. It's > > main limitation is "doesn't work for functions that don't have an > > array-like input" - that left out ~10-20% of functions. So now we > > have a proposal for a structural solution to that last 10-20%. It > > seems logical to want that gap plugged, rather than go back and say > > "we shouldn't have gone for the first 80%, so let's go no further". > > > > I'm excited about solving the remaining 10-20% of use cases for > flexible array dispatching, but the unumpy interface suggested here > (numpy.overridable) feels like a redundant redo of __array_function__ > and __array_ufunc__. > > I would much rather continue to develop specialized protocols for the > remaining usecases. Summarizing those I've seen in this thread, these > include: > 1. Overrides for customizing array creation and coercion. > 2. Overrides to implement operations for new dtypes. > 3. Overriding implementations of NumPy functions, e.g., FFT and > ufuncs with MKL. > > (1) could mostly be solved by adding np.duckarray() and another > function for duck array coercion. There is still the matter of > overriding np.zeros and the like, which perhaps justifies another new > protocol, but in my experience the use-cases for truly an array from > scratch are quite rare. > There is an issue open about adding more functions for that. Made me wonder if giving a method of choosing the duck-array whose `__array_function__` is used could not solve it reasonably. Similar to explicitly choosing a specific template version to call in templated code. In other words `np.arange(100)` (but with a completely different syntax, probably hidden away only for libraries to use). Maybe it is indeed time to write up a list of options to plug that hole, and then see where it brings us. 
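A rough sketch of the kind of explicit, template-style dispatch being described, calling NEP 18's protocol signature __array_function__(func, types, args, kwargs) directly; `arange_like` is a hypothetical helper name, and it assumes the template object implements the protocol (plain ndarrays gained a default implementation with NEP 18):

    import numpy as np

    def arange_like(ref, *args, **kwargs):
        # Route np.arange through the __array_function__ of a chosen
        # "template" object, so array creation lands in ref's library
        # rather than in NumPy itself.
        return ref.__array_function__(np.arange, (type(ref),), args, kwargs)

    # With a plain ndarray as the template this is just np.arange(10);
    # a duck array implementing the protocol would return its own kind.
    arange_like(np.empty(0), 10)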
Best, Sebastian > (2) should be tackled as part of overhauling NumPy's dtype system to > better support user defined dtypes. But it should definitely be in > the form of specialized protocols, e.g., which pass in preallocated > arrays to into ufuncs for a new dtype. By design, new dtypes should > not be able to customize the semantics of array *structure*. > > (3) could potentially motivate a new solution, but it should exist > *inside* of select existing NumPy implementations, after checking for > overrides with __array_function__. If the only option NumPy provides > for overriding np.fft is to implement np.overrideable.fft, I doubt > that would suffice to convince MKL developers from monkey patching it > -- they already decided that a separate namespace is not good enough > for them. > > I also share Nathaniel's concern that the overrides in unumpy are too > powerful, by allowing for control from arbitrary function arguments > and even *non-local* control (i.e., global variables) from context > managers. This level of flexibility can make code very hard to debug, > especially in larger codebases. > > Best, > Stephan > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From wieser.eric+numpy at gmail.com Tue Sep 10 01:26:14 2019 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Mon, 9 Sep 2019 22:26:14 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?= =?utf-8?q?global_overrides_of_the_NumPy_API?= In-Reply-To: References: Message-ID: > In other words `np.arange(100)` (but with a completely different syntax, probably hidden away only for libraries to use). It sounds an bit like you're describing factory classmethods there. Is the solution to this problem to move (leaving behind aliases) `np.arange` to `ndarray.arange`, `np.zeros` to `ndarray.zeros`, etc - callers then would use `type(duckarray).zeros` if they're trying to generalize. Eric On Mon, Sep 9, 2019, 21:18 Sebastian Berg wrote: > On Mon, 2019-09-09 at 20:32 -0700, Stephan Hoyer wrote: > > On Mon, Sep 9, 2019 at 6:27 PM Ralf Gommers > > wrote: > > > I think we've chosen to try the former - dispatch on functions so > > > we can reuse the NumPy API. It could work out well, it could give > > > some long-term maintenance issues, time will tell. The question is > > > now if and how to plug the gap that __array_function__ left. It's > > > main limitation is "doesn't work for functions that don't have an > > > array-like input" - that left out ~10-20% of functions. So now we > > > have a proposal for a structural solution to that last 10-20%. It > > > seems logical to want that gap plugged, rather than go back and say > > > "we shouldn't have gone for the first 80%, so let's go no further". > > > > > > > I'm excited about solving the remaining 10-20% of use cases for > > flexible array dispatching, but the unumpy interface suggested here > > (numpy.overridable) feels like a redundant redo of __array_function__ > > and __array_ufunc__. > > > > I would much rather continue to develop specialized protocols for the > > remaining usecases. Summarizing those I've seen in this thread, these > > include: > > 1. Overrides for customizing array creation and coercion. > > 2. 
Overrides to implement operations for new dtypes. > > 3. Overriding implementations of NumPy functions, e.g., FFT and > > ufuncs with MKL. > > > > (1) could mostly be solved by adding np.duckarray() and another > > function for duck array coercion. There is still the matter of > > overriding np.zeros and the like, which perhaps justifies another new > > protocol, but in my experience the use-cases for truly an array from > > scratch are quite rare. > > > > There is an issue open about adding more functions for that. Made me > wonder if giving a method of choosing the duck-array whose > `__array_function__` is used, could not solve it reasonably. > Similar to explicitly choosing a specific template version to call in > templated code. In other words `np.arange(100)` (but > with a completely different syntax, probably hidden away only for > libraries to use). > > > Maybe it is indeed time to write up a list of options to plug that > hole, and then see where it brings us. > > Best, > > Sebastian > > > > (2) should be tackled as part of overhauling NumPy's dtype system to > > better support user defined dtypes. But it should definitely be in > > the form of specialized protocols, e.g., which pass in preallocated > > arrays to into ufuncs for a new dtype. By design, new dtypes should > > not be able to customize the semantics of array *structure*. > > > > (3) could potentially motivate a new solution, but it should exist > > *inside* of select existing NumPy implementations, after checking for > > overrides with __array_function__. If the only option NumPy provides > > for overriding np.fft is to implement np.overrideable.fft, I doubt > > that would suffice to convince MKL developers from monkey patching it > > -- they already decided that a separate namespace is not good enough > > for them. > > > > I also share Nathaniel's concern that the overrides in unumpy are too > > powerful, by allowing for control from arbitrary function arguments > > and even *non-local* control (i.e., global variables) from context > > managers. This level of flexibility can make code very hard to debug, > > especially in larger codebases. > > > > Best, > > Stephan > > > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pankaj.jangid at gmail.com Tue Sep 10 02:04:35 2019 From: pankaj.jangid at gmail.com (Pankaj Jangid) Date: Tue, 10 Sep 2019 11:34:35 +0530 Subject: [Numpy-discussion] [JOB] Principal Software Engineer position at IHME In-Reply-To: (Ralf Gommers's message of "Mon, 9 Sep 2019 16:26:44 -0700") References: Message-ID: Ralf Gommers writes: > On Mon, Sep 9, 2019 at 4:14 PM Chun-Wei Yuan wrote: >> I see. Sorry. I think I misinterpreted "It is okay to post job ads for >> work involving NumPy/SciPy and related packages if you put [JOB] in the >> subject". Thanks for the clarification. > That might be our fault for not updating that page, thanks for pointing > that out. That bit of text stems from a time when it was still quite > unusual to be able to use NumPy et al. in a job. Luckily these days that's > different;) Times are changing ? 
-- Pankaj Jangid From sebastian at sipsolutions.net Tue Sep 10 02:11:58 2019 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 09 Sep 2019 23:11:58 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?= =?utf-8?q?global_overrides_of_the_NumPy_API?= In-Reply-To: References: Message-ID: On Mon, 2019-09-09 at 22:26 -0700, Eric Wieser wrote: > > In other words `np.arange(100)` (but > with a completely different syntax, probably hidden away only for > libraries to use). > > It sounds a bit like you're describing factory classmethods there. > Is the solution to this problem to move (leaving behind aliases) > `np.arange` to `ndarray.arange`, `np.zeros` to `ndarray.zeros`, etc - > callers then would use `type(duckarray).zeros` if they're trying to > generalize. > Yeah, factory classmethod is probably the better way to describe it. The question is where you hide them away conveniently (and how to access them). And of course if/what completely different alternatives exist. In a sense, `__array_function__` is a bit like a collection of operator dunder methods, I guess. So, we need another collection for classmethods. And that was the quick, possibly silly, idea to also use `__array_function__`. So yeah, there is not much of a point in not simply creating another place for them, or even using individual dunder classmethods. But we still need an "operator"/function to access them nicely, unless we want to force `type(duckarray).?` on library authors. I guess the important thing is mostly what would be convenient to downstream implementers. - Sebastian > Eric > > On Mon, Sep 9, 2019, 21:18 Sebastian Berg > wrote: > > On Mon, 2019-09-09 at 20:32 -0700, Stephan Hoyer wrote: > > > On Mon, Sep 9, 2019 at 6:27 PM Ralf Gommers < > > ralf.gommers at gmail.com> > > > wrote: > > > > I think we've chosen to try the former - dispatch on functions > > so > > > > we can reuse the NumPy API. It could work out well, it could > > give > > > > some long-term maintenance issues, time will tell. The question > > is > > > > now if and how to plug the gap that __array_function__ left. > > It's > > > > main limitation is "doesn't work for functions that don't have > > an > > > > array-like input" - that left out ~10-20% of functions. So now > > we > > > > have a proposal for a structural solution to that last 10-20%. > > It > > > > seems logical to want that gap plugged, rather than go back and > > say > > > > "we shouldn't have gone for the first 80%, so let's go no > > further". > > > > > > > > > > I'm excited about solving the remaining 10-20% of use cases for > > > flexible array dispatching, but the unumpy interface suggested > > here > > > (numpy.overridable) feels like a redundant redo of > > __array_function__ > > > and __array_ufunc__. > > > > > > I would much rather continue to develop specialized protocols for > > the > > > remaining usecases. Summarizing those I've seen in this thread, > > these > > > include: > > > 1. Overrides for customizing array creation and coercion. > > > 2. Overrides to implement operations for new dtypes. > > > 3. Overriding implementations of NumPy functions, e.g., FFT and > > > ufuncs with MKL. > > > > > > (1) could mostly be solved by adding np.duckarray() and another > > > function for duck array coercion. There is still the matter of > > > overriding np.zeros and the like, which perhaps justifies another > > new > > > protocol, but in my experience the use-cases for truly an array > > from > > > scratch are quite rare. 
> > > > > > > There is an issue open about adding more functions for that. Made > > me > > wonder if giving a method of choosing the duck-array whose > > `__array_function__` is used, could not solve it reasonably. > > Similar to explicitly choosing a specific template version to call > > in > > templated code. In other words `np.arange(100)` > > (but > > with a completely different syntax, probably hidden away only for > > libraries to use). > > > > > > Maybe it is indeed time to write up a list of options to plug that > > hole, and then see where it brings us. > > > > Best, > > > > Sebastian > > > > > > > (2) should be tackled as part of overhauling NumPy's dtype system > > to > > > better support user defined dtypes. But it should definitely be > > in > > > the form of specialized protocols, e.g., which pass in > > preallocated > > > arrays to into ufuncs for a new dtype. By design, new dtypes > > should > > > not be able to customize the semantics of array *structure*. > > > > > > (3) could potentially motivate a new solution, but it should > > exist > > > *inside* of select existing NumPy implementations, after checking > > for > > > overrides with __array_function__. If the only option NumPy > > provides > > > for overriding np.fft is to implement np.overrideable.fft, I > > doubt > > > that would suffice to convince MKL developers from monkey > > patching it > > > -- they already decided that a separate namespace is not good > > enough > > > for them. > > > > > > I also share Nathaniel's concern that the overrides in unumpy are > > too > > > powerful, by allowing for control from arbitrary function > > arguments > > > and even *non-local* control (i.e., global variables) from > > context > > > managers. This level of flexibility can make code very hard to > > debug, > > > especially in larger codebases. > > > > > > Best, > > > Stephan > > > > > > > > > > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From einstein.edison at gmail.com Tue Sep 10 06:37:30 2019 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Tue, 10 Sep 2019 12:37:30 +0200 Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?= =?utf-8?q?global_overrides_of_the_NumPy_API?= In-Reply-To: References: Message-ID: <56c4a7ef-2d91-5cd9-cde1-382ade6b2d0e@gmail.com> On 08.09.19 10:56, Nathaniel Smith wrote: > On Sun, Sep 8, 2019 at 1:04 AM Hameer Abbasi wrote: >> On 08.09.19 09:53, Nathaniel Smith wrote: >>> OTOH, __array_function__ doesn't allow this kind of simplification: if >>> we were using __array_function__ for ufuncs, every library would have >>> to special-case every individual ufunc, which leads to dramatically >>> more work and more potential for bugs. >> But uarray does allow this kind of simplification. 
You would do the following inside a uarray backend: >> >> def __ua_function__(func, args, kwargs): >> with ua.skip_backend(self_backend): >> # Do code here, dispatches to everything but self_backend > You can dispatch to the underlying operation, sure, but you can't > implement a generic ufunc loop because you don't know that 'func' is > actually a bound ufunc method, or have any way to access the > underlying ufunc object. (E.g. consider the case where 'func' is > 'np.add.reduce'.) The critical part of my example was that it's a new > ufunc that none of these libraries have ever heard of before. You don't get np.add.reduce, you get np.ufunc.reduce with self=np.add. So you can access the underlying ufunc and the method, nothing limiting about that. > Ufuncs have a lot of consistent structure beyond what generic Python > callables have, and the whole point of __array_ufunc__ is that > implementors can rely on that structure. You get to work at a higher > level of abstraction. > > A similar but simpler example would be the protocol we've sketched out > for concatenation: the idea would be to capture the core similarity > between np.concatenate/np.hstack/np.vstack/np.dstack/np.column_stack/np.row_stack/any > other variants, so that implementors only have to worry about the > higher-level concept of "concatenation" rather than the raw APIs of > all those individual functions. > > -n > > -n > From einstein.edison at gmail.com Tue Sep 10 06:48:24 2019 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Tue, 10 Sep 2019 12:48:24 +0200 Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?= =?utf-8?q?global_overrides_of_the_NumPy_API?= In-Reply-To: References: Message-ID: On 09.09.19 03:26, Nathaniel Smith wrote: > [snip] > Generic in the sense that you can write __array_ufunc__ once and have > it work for all ufuncs. You can do that too with __ua_function__, you get np.ufunc.__call__, with self=<the ufunc>. The same holds for say, RandomState objects, once implemented. > >>> Most duck array libraries can write a single >>> implementation of __array_ufunc__ that works for *all* ufuncs, even >>> new third-party ufuncs that the duck array library has never heard of, >> >> I see where you're going with this. You are thinking of reusing the ufunc implementation to do a computation. That's a minor use case (imho), and I can't remember seeing it used. > I mean, I just looked at dask and xarray, and they're both doing > exactly what I said, right now in shipping code. What use cases are > you targeting here if you consider dask and xarray out-of-scope? :-) > >> this is a case where knowing if something is a ufunc helps use a property of it. so there the more specialized nature of __array_ufunc__ helps. Seems niche though, and could probably also be done by checking if a function is an instance of np.ufunc via __array_function__ > Sparse arrays aren't very niche... and the isinstance trick is > possible in some cases, but (a) it's relying on an undocumented > implementation detail of __array_function__; according to > __array_function__'s API contract, you could just as easily get passed > the ufunc's __call__ method instead of the object itself, and (b) it > doesn't work at all for ufunc methods like reduce, outer, accumulate. > These are both show-stoppers IMO. It does work for all ufunc methods. You just get passed in the appropriate method (ufunc.reduce, ufunc.accumulate, ...), with self=<the ufunc>. > [snip]
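To make the fallback pattern sketched in this exchange concrete, here is a minimal, self-contained
illustration. The LoggingBackend class and its printed message are invented for illustration; only
__ua_domain__, __ua_function__, ua.set_backend and ua.skip_backend are taken from the uarray API
discussed in this thread, and it assumes unumpy's default NumPy backend stays registered so the
re-dispatch has somewhere to go:

    # A minimal sketch, under the assumptions stated above.
    import uarray as ua
    import unumpy as unp

    class LoggingBackend:
        __ua_domain__ = "numpy"  # the domain unumpy multimethods dispatch on

        def __ua_function__(self, func, args, kwargs):
            # func is the multimethod being called (e.g. unp.arange)
            print("intercepted:", getattr(func, "__name__", func))
            # Re-dispatch to the remaining backends, skipping this one so
            # the call does not recurse back into __ua_function__.
            with ua.skip_backend(self):
                return func(*args, **kwargs)

    with ua.set_backend(LoggingBackend()):
        unp.arange(5)  # prints "intercepted: arange", then computes normally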
From einstein.edison at gmail.com Tue Sep 10 09:05:55 2019 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Tue, 10 Sep 2019 15:05:55 +0200 Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?= =?utf-8?q?global_overrides_of_the_NumPy_API?= In-Reply-To: References: Message-ID: <93a3cb0b-2669-23da-e273-091128948cf6@gmail.com> On 10.09.19 05:32, Stephan Hoyer wrote: > On Mon, Sep 9, 2019 at 6:27 PM Ralf Gommers > wrote: > > I think we've chosen to try the former - dispatch on functions so > we can reuse the NumPy API. It could work out well, it could give > some long-term maintenance issues, time will tell. The question is > now if and how to plug the gap that __array_function__ left. It's > main limitation is "doesn't work for functions that don't have an > array-like input" - that left out ~10-20% of functions. So now we > have a proposal for a structural solution to that last 10-20%. It > seems logical to want that gap plugged, rather than go back and > say "we shouldn't have gone for the first 80%, so let's go no > further". > > > I'm excited about solving the remaining 10-20% of use cases for > flexible array dispatching, but the unumpy interface suggested here > (numpy.overridable) feels like a redundant redo of __array_function__ > and __array_ufunc__. > > I would much rather continue to develop specialized protocols for the > remaining usecases. Summarizing those I've seen in this thread, these > include: > 1. Overrides for customizing array creation and coercion. > 2. Overrides to implement operations for new dtypes. > 3. Overriding implementations of NumPy functions, e.g., FFT and ufuncs > with MKL. > > (1) could mostly be solved by adding np.duckarray() and another > function for duck array coercion. There is still the matter of > overriding np.zeros and the like, which perhaps justifies another new > protocol, but in my experience the use-cases for truly an array from > scratch are quite rare. While they're rare for libraries like XArray; CuPy, Dask and PyData/Sparse need these. > > (2) should be tackled as part of overhauling NumPy's dtype system to > better support user defined dtypes. But it should definitely be in the > form of specialized protocols, e.g., which pass in preallocated arrays > to into ufuncs for a new dtype. By design, new dtypes should not be > able to customize the semantics of array *structure*. We already have a split in the type system with e.g. Cython's buffers, Numba's parallel type system. This is a different issue altogether, e.g. allowing a unyt dtype to spawn a unyt array, rather than forcing a re-write of unyt to cooperate with NumPy's new dtype system. > > (3) could potentially motivate a new solution, but it should exist > *inside* of select existing NumPy implementations, after checking for > overrides with __array_function__. If the only option NumPy provides > for overriding np.fft is to implement np.overrideable.fft, I doubt > that would suffice to convince MKL developers from monkey patching it > -- they already decided that a separate namespace is not good enough > for them. That has already been addressed by Ralf in another email. We're proposing to merge that into NumPy proper. Also, you're missing a few: 4. Having default implementations that allow overrides of a large part of the API while defining only a small part. This holds for e.g. transpose/concatenate (see the sketch just after this list). 5. Generation of Random numbers (overriding RandomState). CuPy has its own implementation which would be nice to override. 
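As a sketch of point 4: with a small set of primitives, a default implementation can cover a larger
function. The helper name default_vstack below is made up for illustration; atleast_2d and
concatenate are the only operations an array type would need to override via __array_function__ to
get vstack behaviour for free:

    import numpy as np

    def default_vstack(arrays):
        # Coerce everything to at least 2-D, then stack along the first axis.
        arrays = [np.atleast_2d(a) for a in arrays]
        return np.concatenate(arrays, axis=0)

    default_vstack([np.arange(3), np.arange(3)])  # -> array of shape (2, 3)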
> > I also share Nathaniel's concern that the overrides in unumpy are too > powerful, by allowing for control from arbitrary function arguments > and even *non-local* control (i.e., global variables) from context > managers. This level of flexibility can make code very hard to debug, > especially in larger codebases. Backend switching needs global context, in any case. There isn't a good way around that other than the class dundermethods outlined in another thread, which would require rewrites of large amounts of code. > > Best, > Stephan > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From einstein.edison at gmail.com Tue Sep 10 11:28:34 2019 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Tue, 10 Sep 2019 17:28:34 +0200 Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?= =?utf-8?q?global_overrides_of_the_NumPy_API?= In-Reply-To: References: <67b837e7-5d3d-337e-49f7-cac078ec4d8f@gmail.com> Message-ID: <768bf864-3df9-e4d2-a430-06316f374094@gmail.com> On 07.09.19 22:06, Sebastian Berg wrote: > On Fri, 2019-09-06 at 14:45 -0700, Ralf Gommers wrote: > > > > Let me try to move the discussion from the github issue here (this may > not be the best place). (https://github.com/numpy/numpy/issues/14441 > which asked for easier creation functions together with `__array_function__`). > > I think an important note mentioned here is how users interact with > unumpy, vs. __array_function__. The former is an explicit opt-in, while > the latter is implicit choice based on an `array-like` abstract base > class and functional type based dispatching. > > To quote NEP 18 on this: "The downsides are that this would require an > explicit opt-in from all existing code, e.g., import numpy.api as np, > and in the long term would result in the maintenance of two separate > NumPy APIs. Also, many functions from numpy itself are already > overloaded (but inadequately), so confusion about high vs. low level > APIs in NumPy would still persist." > (I do think this is a point we should not just ignore, `uarray` is a > thin layer, but it has a big surface area) > > Now there are things where explicit opt-in is obvious. And the FFT > example is one of those, there is no way to implicitly choose another > backend (except by just replacing it, i.e. monkeypatching) [1]. And > right now I think these are _very_ different. > > > Now for the end-users choosing one array-like over another, seems nicer > as an implicit mechanism (why should I not mix sparse, dask and numpy > arrays!?). This is the promise `__array_function__` tries to make. > Unless convinced otherwise, my guess is that most library authors would > strive for implicit support (i.e. sklearn, skimage, scipy). You can, once you register the backend it becomes implicit, so all backends are tried until one succeeds. Unless you explicitly say "I do not want another backend" (only/coerce=True). > > Circling back to creation and coercion. In a purely Object type system, > these would be classmethods, I guess, but in NumPy and the libraries > above, we are lost. > > Solution 1: Create explicit opt-in, e.g. through uarray. (NEP-31) > * Required end-user opt-in. > * Seems cleaner in many ways > * Requires a full copy of the API. 
> > Solution 2: Add some coercion "protocol" (NEP-30) and expose a way to > create new arrays more conveniently. This would practically mean adding > an `array_type=np.ndarray` argument. > * _Not_ used by end-users! End users should use dask.linspace! > * Adds "strange" API somewhere in numpy, and possible a new > "protocol" (additionally to coercion).[2] > > I still feel these solve different issues. The second one is intended > to make array likes work implicitly in libraries (without end users > having to do anything). While the first seems to force the end user to > opt in, sometimes unnecessarily: > > def my_library_func(array_like): > exp = np.exp(array_like) > idx = np.arange(len(exp)) > return idx, exp > > Would have all the information for implicit opt-in/Array-like support, > but cannot do it right now. This is what I have been wondering, if > uarray/unumpy, can in some way help me make this work (even _without_ > the end user opting in). The reason is that simply, right now I am very > clear on the need for this use case, but not sure about the need for > end user opt in, since end users can just use dask.arange(). Sure, the end user can, but library authors cannot. And end users may > want to easily port code to GPU or between back-ends, just as library > authors might. > > Cheers, > > Sebastian > > > [1] To be honest, I do think a lot of the "issues" around > monkeypatching exists just as much with backend choosing, the main > difference seems to me that a lot of that: > 1. monkeypatching was not done explicit > (import mkl_fft; mkl_fft.monkeypatch_numpy())? > 2. A backend system allows libraries to prefer one locally? > (which I think is a big advantage) > > [2] There are the options of adding `linspace_like` functions somewhere > in a numpy submodule, or adding `linspace(..., array_type=np.ndarray)`, > or simply inventing a new "protocol" (which is not really a protocol?), > and make it `ndarray.__numpy_like_creation_functions__.arange()`. Handling things like RandomState can get complicated here. From sebastian at sipsolutions.net Tue Sep 10 13:51:04 2019 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 10 Sep 2019 10:51:04 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?= =?utf-8?q?global_overrides_of_the_NumPy_API?= In-Reply-To: <768bf864-3df9-e4d2-a430-06316f374094@gmail.com> References: <67b837e7-5d3d-337e-49f7-cac078ec4d8f@gmail.com> <768bf864-3df9-e4d2-a430-06316f374094@gmail.com> Message-ID: On Tue, 2019-09-10 at 17:28 +0200, Hameer Abbasi wrote: > On 07.09.19 22:06, Sebastian Berg wrote: > > On Fri, 2019-09-06 at 14:45 -0700, Ralf Gommers wrote: > > > > > > > > Let me try to move the discussion from the github issue here (this > > may > > not be the best place). ( > > https://github.com/numpy/numpy/issues/14441 > > which asked for easier creation functions together with > > `__array_function__`). > > > > I think an important note mentioned here is how users interact with > > unumpy, vs. __array_function__. The former is an explicit opt-in, > > while > > the latter is implicit choice based on an `array-like` abstract > > base > > class and functional type based dispatching. > > > > To quote NEP 18 on this: "The downsides are that this would require > > an > > explicit opt-in from all existing code, e.g., import numpy.api as > > np, > > and in the long term would result in the maintenance of two > > separate > > NumPy APIs. 
Also, many functions from numpy itself are already > > overloaded (but inadequately), so confusion about high vs. low > > level > > APIs in NumPy would still persist." > > (I do think this is a point we should not just ignore, `uarray` is > > a > > thin layer, but it has a big surface area) > > > > Now there are things where explicit opt-in is obvious. And the FFT > > example is one of those, there is no way to implicitly choose > > another > > backend (except by just replacing it, i.e. monkeypatching) [1]. And > > right now I think these are _very_ different. > > > > > > Now for the end-users choosing one array-like over another, seems > > nicer > > as an implicit mechanism (why should I not mix sparse, dask and > > numpy > > arrays!?). This is the promise `__array_function__` tries to make. > > Unless convinced otherwise, my guess is that most library authors > > would > > strive for implicit support (i.e. sklearn, skimage, scipy). > You can, once you register the backend it becomes implicit, so all > backends are tried until one succeeds. Unless you explicitly say "I > do > not want another backend" (only/coerce=True). The thing here being "once you register the backend". Thus requiring at least in some form an explicit opt-in by the end user. Also, unless you use the with statement (with all the scoping rules attached), you cannot plug the coercion/creation hole left by `__array_function__`. > > Circling back to creation and coercion. In a purely Object type > > system, > > these would be classmethods, I guess, but in NumPy and the > > def my_library_func(array_like): > > exp = np.exp(array_like) > > idx = np.arange(len(exp)) > > return idx, exp > > > > Would have all the information for implicit opt-in/Array-like > > support, > > but cannot do it right now. This is what I have been wondering, if > > uarray/unumpy, can in some way help me make this work (even > > _without_ > > the end user opting in). The reason is that simply, right now I am > > very > > clear on the need for this use case, but not sure about the need > > for > > end user opt in, since end users can just use dask.arange(). > > Sure, the end user can, but library authors cannot. And end users > may > want to easily port code to GPU or between back-ends, just as > library > authors might. Yes, but library authors want to solve the particular thing above right now, and I am still not sure how uarray helps there. If it does, then only with added complexity _and_ (at least currently) explicit end-user opt-in. Now, I am not a particularly good judge for these things, but I have been trying to figure out how things can improve with it and still I am tempted to say that uarray is a giant step in no particular direction at all. Of course it _can_ solve everything, but right now it seems like it might require a py2 -> py3 like transition. And even then it is so powerful, that it probably comes with its own bunch of issues (such as far away side effects due to scoping of with statements). Best, Sebastian > > Cheers, > > > > Sebastian > > > > > > [1] To be honest, I do think a lot of the "issues" around > > monkeypatching exists just as much with backend choosing, the main > > difference seems to me that a lot of that: > > 1. monkeypatching was not done explicit > > (import mkl_fft; mkl_fft.monkeypatch_numpy())? > > 2. A backend system allows libraries to prefer one locally? 
> > (which I think is a big advantage) > > > > [2] There are the options of adding `linspace_like` functions > > somewhere > > in a numpy submodule, or adding `linspace(..., > > array_type=np.ndarray)`, > > or simply inventing a new "protocol" (which is not really a > > protocol?), > > and make it `ndarray.__numpy_like_creation_functions__.arange()`. > > Handling things like RandomState can get complicated here. > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From shoyer at gmail.com Tue Sep 10 13:58:39 2019 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 10 Sep 2019 10:58:39 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?= =?utf-8?q?global_overrides_of_the_NumPy_API?= In-Reply-To: <93a3cb0b-2669-23da-e273-091128948cf6@gmail.com> References: <93a3cb0b-2669-23da-e273-091128948cf6@gmail.com> Message-ID: On Tue, Sep 10, 2019 at 6:06 AM Hameer Abbasi wrote: > On 10.09.19 05:32, Stephan Hoyer wrote: > > On Mon, Sep 9, 2019 at 6:27 PM Ralf Gommers > wrote: > >> I think we've chosen to try the former - dispatch on functions so we can >> reuse the NumPy API. It could work out well, it could give some long-term >> maintenance issues, time will tell. The question is >> now if and how to plug the gap that __array_function__ left. It's main limitation is "doesn't work >> for functions that don't have an array-like input" - that left out ~10-20% >> of functions. So now we have a proposal for a structural solution to that >> last 10-20%. It seems logical to want that gap plugged, rather than go back >> and say "we shouldn't have gone for the first 80%, so let's go no further". >> > > I'm excited about solving the remaining 10-20% of use cases for flexible > array dispatching, but the unumpy interface suggested here > (numpy.overridable) feels like a redundant redo of __array_function__ and > __array_ufunc__. > > I would much rather continue to develop specialized protocols for the > remaining usecases. Summarizing those I've seen in this thread, these > include: > 1. Overrides for customizing array creation and coercion. > 2. Overrides to implement operations for new dtypes. > 3. Overriding implementations of NumPy functions, e.g., FFT and ufuncs > with MKL. > > (1) could mostly be solved by adding np.duckarray() and another > function for duck array coercion. There is still the matter of > overriding np.zeros and the like, which perhaps justifies another new > protocol, but in my experience the use-cases for truly an array from > scratch are quite rare. > > While they're rare for libraries like XArray; CuPy, Dask and PyData/Sparse > need these. > > > (2) should be tackled as part of overhauling NumPy's dtype system to > better support user defined dtypes. But it should definitely be in the form > of specialized protocols, e.g., which pass in preallocated arrays to into > ufuncs for a new dtype. By design, new dtypes should not be able to > customize the semantics of array *structure*. > > We already have a split in the type system with e.g. Cython's buffers, > Numba's parallel type system. This is a different issue altogether, e.g. 
> allowing a unyt dtype to spawn a unyt array, rather than forcing a re-write > of unyt to cooperate with NumPy's new dtype system. > I guess you're proposing that operations like np.sum(numpy_array, dtype=other_dtype) could rely on other_dtype for the implementation and potentially return a non-NumPy array? I'm not sure this is well motivated -- it would be helpful to discuss actual use-cases. The most commonly used NumPy functionality related to dtypes can be found only in methods on np.ndarray, e.g., astype() and view(). But I don't think there's any proposal to change that. > 4. Having default implementations that allow overrides of a large part of > the API while defining only a small part. This holds for e.g. > transpose/concatenate. > I'm not sure how unumpy solves the problems we encountered when trying to do this with __array_function__ -- namely the way that it exposes all of NumPy's internals, or requires rewriting a lot of internal NumPy code to ensure it always casts inputs with asarray(). I think it would be useful to expose default implementations of NumPy operations somewhere to make it easier to implement __array_function__, but it doesn't make much sense to couple this to user facing overrides. These can be exposed as a separate package or numpy module (e.g., numpy.default_implementations) that uses np.duckarray(), which library authors can make use of by calling inside their __array_function__ methods. > 5. Generation of Random numbers (overriding RandomState). CuPy has its > own implementation which would be nice to override. > I'm not sure that NumPy's random state objects make sense for duck arrays. Because these are stateful objects, they are pretty coupled to NumPy's implementation -- you cannot store any additional state on RandomState objects that might be needed for a new implementation. At a bare minimum, you will lose the reproducibility of random seeds, though this may be less of a concern with the new random API. > I also share Nathaniel's concern that the overrides in unumpy are too > powerful, by allowing for control from arbitrary function arguments and > even *non-local* control (i.e., global variables) from context managers. > This level of flexibility can make code very hard to debug, especially in > larger codebases. > > Backend switching needs global context, in any case. There isn't a good > way around that other than the class dundermethods outlined in another > thread, which would require rewrites of large amounts of code. > Do we really need to support robust backend switching in NumPy? I'm not strongly opposed, but what use cases does it actually solve to be able to override np.fft.fft rather than using a new function? At some point, if you want maximum performance you won't be writing the code using NumPy proper anyways. At best you'll be using a system with duck-array support like CuPy. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Tue Sep 10 15:11:16 2019 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 10 Sep 2019 12:11:16 -0700 Subject: [Numpy-discussion] NumPy Community Meeting Wednesday, Sep. 11 Message-ID: <889157a2789d64e87d514675fa64e3f72b7b05be.camel@sipsolutions.net> Hi all, There will be a NumPy Community meeting Wednesday September 11 at 11 am Pacific Time. 
Everyone is invited to join in and edit the work-in-progress meeting topics and notes: https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg?both Best wishes Sebastian -------------- next part -------------- A non-text attachment was scrubbed... Name: NumPy_Community_Call.ics Type: text/calendar Size: 3264 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From warren.weckesser at gmail.com Wed Sep 11 10:29:39 2019 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Wed, 11 Sep 2019 10:29:39 -0400 Subject: [Numpy-discussion] NEP 32: Remove the financial functions from NumPy In-Reply-To: <1568041679303-0.post@n7.nabble.com> References: <9067a8f06bc885307d1ec726a55bc5fd906c3c62.camel@sipsolutions.net> <1568041679303-0.post@n7.nabble.com> Message-ID: On 9/9/19, D.S. McNeil wrote: > [coming over from the pydata post] > > I just checked about ~150KLOC of our Python code in a financial context, > written by about twenty developers over about four years. Almost every > function uses numpy, sometimes directly and sometimes via pandas. > > It seems like these functions were never used anywhere, and the lead dev on > one of the projects responded "never used them; didn't even know they > exist". I knew they existed, but even on the rare occasion I need the > functionality I need better control over the dates, which means for > practical purposes I need something which supports Series natively anyhow. > > As it is, they also clutter up the namespace in unfriendly ways: if there's > going to be a top-level function called np.rate I don't think this is the > one it should be. Admittedly that's more an argument against their current > location. > > Although it wouldn't be useful for us, I could imagine someone finding a > package which provides numpy-compatible versions of the many OpenFormula > (or > whatever the spec is called) functions helpful. Having numpy carry a tiny > subset of them doesn't feel productive. > > +1 for removing them. > > > Doug Thanks Doug, that's useful feedback. Warren > > > > -- > Sent from: http://numpy-discussion.10968.n7.nabble.com/ > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From warren.weckesser at gmail.com Wed Sep 11 10:30:55 2019 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Wed, 11 Sep 2019 10:30:55 -0400 Subject: [Numpy-discussion] NEP 32: Remove the financial functions from NumPy In-Reply-To: References: Message-ID: On 9/3/19, Warren Weckesser wrote: > Github issue 2880 ("Get financial functions out of main namespace", > https://github.com/numpy/numpy/issues/2880) has been open since 2013. In a > recent community meeting, it was suggested that we create a NEP to propose > the removal of the financial functions from NumPy. I have submitted "NEP > 32: Remove the financial functions from NumPy" in a pull request at > https://github.com/numpy/numpy/pull/14399. A copy of the latest version of > the NEP is below. FYI, the NEP is now also available at https://numpy.org/neps/nep-0032-remove-financial-functions.html. Warren > > According to the NEP process document, "Once the PR is in place, the NEP > should be announced on the mailing list for discussion (comments on the PR > itself should be restricted to minor editorial and technical fixes)." 
This > email is the announcement for NEP 32. > > The NEP includes a brief summary of the history of the financial functions, > and has links to several relevant mailing list threads, dating back to when > the functions were added to NumPy in 2008. I recommend reviewing those > threads before commenting here. > > Warren > > ----- > > ================================================== > NEP 32 — Remove the financial functions from NumPy > ================================================== > > :Author: Warren Weckesser > :Status: Draft > :Type: Standards Track > :Created: 2019-08-30 > > > Abstract > -------- > > We propose deprecating and ultimately removing the financial functions [1]_ > from NumPy. The functions will be moved to an independent repository, > and provided to the community as a separate package with the name > ``numpy_financial``. > > > Motivation and scope > -------------------- > > The NumPy financial functions [1]_ are the 10 functions ``fv``, ``ipmt``, > ``irr``, ``mirr``, ``nper``, ``npv``, ``pmt``, ``ppmt``, ``pv`` and > ``rate``. > The functions provide elementary financial calculations such as future > value, > net present value, etc. These functions were added to NumPy in 2008 [2]_. > > In May, 2009, a request by Joe Harrington to add a function called ``xirr`` > to > the financial functions triggered a long thread about these functions [3]_. > One important point that came up in that thread is that a "real" financial > library must be able to handle real dates. The NumPy financial functions > do > not work with actual dates or calendars. The preference for a more capable > library independent of NumPy was expressed several times in that thread. > > In June, 2009, D. L. Goldsmith expressed concerns about the correctness of > the > implementations of some of the financial functions [4]_. It was suggested > then > to move the financial functions out of NumPy to an independent package. > > In a GitHub issue in 2013 [5]_, Nathaniel Smith suggested moving the > financial > functions from the top-level namespace to ``numpy.financial``. He also > suggested giving the functions better names. Responses at that time > included > the suggestion to deprecate them and move them from NumPy to a separate > package. This issue is still open. > > Later in 2013 [6]_, it was suggested on the mailing list that these > functions > be removed from NumPy. > > The arguments for the removal of these functions from NumPy: > > * They are too specialized for NumPy. > * They are not actually useful for "real world" financial calculations, > because > they do not handle real dates and calendars. > * The definition of "correctness" for some of these functions seems to be a > matter of convention, and the current NumPy developers do not have the > background to judge their correctness. > * There has been little interest among past and present NumPy developers > in maintaining these functions. > > The main arguments for keeping the functions in NumPy are: > > * Removing these functions will be disruptive for some users. Current > users > will have to add the new ``numpy_financial`` package to their > dependencies, > and then modify their code to use the new package. > * The functions provided, while not "industrial strength", are apparently > similar to functions provided by spreadsheets and some calculators. > Having > them available in NumPy makes it easier for some developers to migrate > their > software to Python and NumPy. 
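A minimal sketch of the kind of spreadsheet-style calculation at issue (the numbers are
illustrative, and the numpy_financial call shows the replacement proposed in this NEP):

    # Future value of saving 100/month for 10 years at 5%/year,
    # compounded monthly, with np.fv as it exists today.
    import numpy as np

    fv = np.fv(rate=0.05 / 12, nper=10 * 12, pmt=-100, pv=0)
    print(round(fv, 2))  # approximately 15528.23

    # After the proposed deprecation, the equivalent call would be:
    #   import numpy_financial as npf
    #   npf.fv(0.05 / 12, 10 * 12, -100, 0)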
> > It is clear from comments in the mailing list discussions and in the GitHub > issues that many current NumPy developers believe the benefits of removing > the functions outweigh the costs. For example, from [5]_:: > > The financial functions should probably be part of a separate package > -- Charles Harris > > If there's a better package we can point people to we could just > deprecate > them and then remove them entirely... I'd be fine with that too... > -- Nathaniel Smith > > +1 to deprecate them. If no other package exists, it can be created if > someone feels the need for that. > -- Ralf Gommers > > I feel pretty strongly that we should deprecate these. If nobody on > numpy's > core team is interested in maintaining them, then it is purely a drag > on > development for NumPy. > -- Stephan Hoyer > > And from the 2013 mailing list discussion, about removing the functions > from > NumPy:: > > I am +1 as well, I don't think they should have been included in the > first > place. > -- David Cournapeau > > But not everyone was in favor of removal:: > > The fin routines are tiny and don't require much maintenance once > written. If we made an effort (putting up pages with examples of > common > financial calculations and collecting those under a topical web page, > then linking to that page from various places and talking it up), I > would think they could attract users looking for a free way to play > with > financial scenarios. [...] > So, I would say we keep them. If ours are not the best, we should > bring > them up to snuff. > -- Joe Harrington > > For an idea of the maintenance burden of the financial functions, one can > look for all the GitHub issues [7]_ and pull requests [8]_ that have the > tag > ``component: numpy.lib.financial``. > > One method for measuring the effect of removing these functions is to find > all the packages on GitHub that use them. Such a search can be performed > with the ``python-api-inspect`` service [9]_. A search for all uses of the > NumPy financial functions finds just eight repositories. (See the comments > in [5]_ for the actual SQL query.) > > > Implementation > -------------- > > * Create a new Python package, ``numpy_financial``, to be maintained in the > top-level NumPy github organization. This repository will contain the > definitions and unit tests for the financial functions. The package will > be added to PyPI so it can be installed with ``pip``. > * Deprecate the financial functions in the ``numpy`` namespace, beginning > in > NumPy version 1.18. Remove the financial functions from NumPy version > 1.20. > > > Backward compatibility > ---------------------- > > The removal of these functions breaks backward compatibility, as explained > earlier. The effects are mitigated by providing the ``numpy_financial`` > library. > > > Alternatives > ------------ > > The following alternatives were mentioned in [5]_: > > * *Maintain the functions as they are (i.e. do nothing).* > A review of the history makes clear that this is not the preference of > many > NumPy developers. A recurring comment is that the functions simply do > not > belong in NumPy. When that sentiment is combined with the history of bug > reports and the ongoing questions about the correctness of the functions, > the > conclusion is that the cleanest solution is deprecation and removal. > * *Move the functions from the ``numpy`` namespace to ``numpy.financial``.* > This was the initial suggestion in [5]_. 
Such a change does not address > the > maintenance issues, and doesn't change the misfit that many developers > see > between these functions and NumPy. It causes disruption for the current > users of these functions without addressing what many developers see as > the > fundamental problem. > > > Discussion > ---------- > > Links to past mailing list discussions, and to relevant GitHub issues and > pull > requests, have already been given. > > > References and footnotes > ------------------------ > > .. [1] Financial functions, > https://numpy.org/doc/1.17/reference/routines.financial.html > > .. [2] Numpy-discussion mailing list, "Simple financial functions for > NumPy", > > https://mail.python.org/pipermail/numpy-discussion/2008-April/032353.html > > .. [3] Numpy-discussion mailing list, "add xirr to numpy financial > functions?", > https://mail.python.org/pipermail/numpy-discussion/2009-May/042645.html > > .. [4] Numpy-discussion mailing list, "Definitions of pv, fv, nper, pmt, > and rate", > https://mail.python.org/pipermail/numpy-discussion/2009-June/043188.html > > .. [5] Get financial functions out of main namespace, > https://github.com/numpy/numpy/issues/2880 > > .. [6] Numpy-discussion mailing list, "Deprecation of financial routines", > > https://mail.python.org/pipermail/numpy-discussion/2013-August/067409.html > > .. [7] ``component: numpy.lib.financial`` issues, > > https://github.com/numpy/numpy/issues?utf8=%E2%9C%93&q=is%3Aissue+label%3A%22component%3A+numpy.lib.financial%22+ > > .. [8] ``component: numpy.lib.financial`` pull requests, > > https://github.com/numpy/numpy/pulls?utf8=%E2%9C%93&q=is%3Apr+label%3A%22component%3A+numpy.lib.financial%22+ > > .. [9] Quansight-Labs/python-api-inspect, > https://github.com/Quansight-Labs/python-api-inspect/ > > > Copyright > --------- > > This document has been placed in the public domain. > From tyler.je.reddy at gmail.com Wed Sep 11 15:43:54 2019 From: tyler.je.reddy at gmail.com (Tyler Reddy) Date: Wed, 11 Sep 2019 13:43:54 -0600 Subject: [Numpy-discussion] Using hypothesis in testing In-Reply-To: References: Message-ID: I think the pros outweigh the cons -- I'll comment briefly on the PR. On Mon, 9 Sep 2019 at 02:41, Matti Picus wrote: > We have discussed using the hypothesis package to generate test cases at a > few meetings informally. At the EuroSciPy sprint, kitchoi took up the > challenge and issued a pull request > https://github.com/numpy/numpy/pull/14440 that actually goes ahead and > does it. While not finding any new failures, the round-trip testing of s = > np.array2string(np.array(s)) shows what hypothesis can do. The new test > runs for about 1/2 a second. In my mind the next step would be to use this > style of testing to expose problems in the np.chararray routines. > > > What do you think? Is the cost of adding a new dependency worth the more > thorough testing? > > Matti > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: 
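For readers unfamiliar with this style of testing, here is a minimal, self-contained sketch of a
hypothesis property test. It illustrates the round-trip pattern with a simpler property than the
array2string test in PR 14440:

    from hypothesis import given, strategies as st
    import numpy as np

    @given(st.lists(st.integers(min_value=-2**31, max_value=2**31 - 1)))
    def test_tolist_roundtrip(xs):
        # Converting to an array and back should preserve the values.
        assert np.array(xs, dtype=np.int64).tolist() == xs

    test_tolist_roundtrip()  # hypothesis generates and checks many cases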
From ralf.gommers at gmail.com Wed Sep 11 18:53:11 2019 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 11 Sep 2019 15:53:11 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?= =?utf-8?q?global_overrides_of_the_NumPy_API?= In-Reply-To: References: <67b837e7-5d3d-337e-49f7-cac078ec4d8f@gmail.com> <768bf864-3df9-e4d2-a430-06316f374094@gmail.com> Message-ID: On Tue, Sep 10, 2019 at 10:53 AM Sebastian Berg wrote: > On Tue, 2019-09-10 at 17:28 +0200, Hameer Abbasi wrote: > > On 07.09.19 22:06, Sebastian Berg wrote: > > > > > > Now for the end-users choosing one array-like over another, seems > > > nicer > > > as an implicit mechanism (why should I not mix sparse, dask and > > > numpy > > > arrays!?). This is the promise `__array_function__` tries to make. > > > Unless convinced otherwise, my guess is that most library authors > > > would > > > strive for implicit support (i.e. sklearn, skimage, scipy). > > You can, once you register the backend it becomes implicit, so all > > backends are tried until one succeeds. Unless you explicitly say "I > > do > > not want another backend" (only/coerce=True). > > The thing here being "once you register the backend". Thus requiring at > least in some form an explicit opt-in by the end user. Also, unless you > use the with statement (with all the scoping rules attached), you > cannot plug the coercion/creation hole left by `__array_function__`. > The need for this is clear I think. We're discussing on the unumpy repo whether this can be done with a minor change to how unumpy works, or by having backends auto-register somehow on import. It should be possible without mandating that the end user has to explicitly do something, but needs some thought. Stay tuned. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Wed Sep 11 19:17:32 2019 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 11 Sep 2019 16:17:32 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?= =?utf-8?q?global_overrides_of_the_NumPy_API?= In-Reply-To: References: <93a3cb0b-2669-23da-e273-091128948cf6@gmail.com> Message-ID: On Tue, Sep 10, 2019 at 10:59 AM Stephan Hoyer wrote: > On Tue, Sep 10, 2019 at 6:06 AM Hameer Abbasi > wrote: > >> On 10.09.19 05:32, Stephan Hoyer wrote: >> >> On Mon, Sep 9, 2019 at 6:27 PM Ralf Gommers >> wrote: >> >>> I think we've chosen to try the former - dispatch on functions so we can >>> reuse the NumPy API. It could work out well, it could give some long-term >>> maintenance issues, time will tell. The question is now if and how to plug >>> the gap that __array_function__ left. It's main limitation is "doesn't work >>> for functions that don't have an array-like input" - that left out ~10-20% >>> of functions. So now we have a proposal for a structural solution to that >>> last 10-20%. It seems logical to want that gap plugged, rather than go back >>> and say "we shouldn't have gone for the first 80%, so let's go no further". >>> >> >> I'm excited about solving the remaining 10-20% of use cases for flexible >> array dispatching, >> >> Great! I think most (but not all) of us are on the same page here. > Actually now that Peter came up with the `like=` keyword idea for array > creation functions I'm very interested in seeing that worked out, feels > like that could be a nice solution for part of that 10-20% that did look > pretty bad before. 
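A rough sketch of how such a `like=` dispatch could work, using only the `__array_function__`
protocol from NEP 18 (the helper name `creation_dispatch` is invented for illustration; it was not
an existing NumPy API at the time of this thread):

    import numpy as np

    def creation_dispatch(numpy_func, *args, like=None, **kwargs):
        if like is not None and not isinstance(like, np.ndarray):
            # Defer to the reference object's __array_function__
            # (signature per NEP 18: func, types, args, kwargs).
            return like.__array_function__(numpy_func, (type(like),), args, kwargs)
        return numpy_func(*args, **kwargs)

    # creation_dispatch(np.arange, 100, like=some_dask_array) would return
    # a dask array (some_dask_array is a stand-in, not defined here);
    # with a plain ndarray, or no like=, it falls through to NumPy.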
> but the unumpy interface suggested here (numpy.overridable) feels like a >> redundant redo of __array_function__ and __array_ufunc__. >> >> A bit of context: a big part of the reason I advocated for numpy.overridable is that library authors can use it *only* for the parts not already covered by the protocols we already have. If there's overlap there's several ways to deal with that, including only including part of the unumpy API surface. It does plug all the holes in one go (although you can then indeed argue it does too much), and there is no other coherent proposal/vision yet that does this. What you wrote below comes closest, and I'd love to see that worked out (e.g. the like= argument for array creation). What I don't like is an ad-hoc plugging of one hole at a time without visibility on how many more protocols and new workaround functions in the API we would need. So hopefully we can come to an apples-to-apples comparison of two design alternatives. Also, we just discussed this whole thread in the community call, and it's clear that it's a complex matter with many different angles. It's very hard to get a full overview. Our conclusion in the call was that this will benefit from an in-person discussion. The sprint in November may be a really good opportunity for that. In the meantime we can of course keep working out ideas/docs. For now I think it's clear that we (the NEP authors) have some homework to do - that may take some time. >> I would much rather continue to develop specialized protocols for the >> remaining usecases. Summarizing those I've seen in this thread, these >> include: >> 1. Overrides for customizing array creation and coercion. >> 2. Overrides to implement operations for new dtypes. >> 3. Overriding implementations of NumPy functions, e.g., FFT and ufuncs >> with MKL. >> >> (1) could mostly be solved by adding np.duckarray() and another function >> for duck array coercion. There is still the matter of overriding np.zeros >> and the like, which perhaps justifies another new protocol, but in my >> experience the use-cases for truly an array from scratch are quite rare. >> >> While they're rare for libraries like XArray; CuPy, Dask and >> PyData/Sparse need these. >> >> >> (2) should be tackled as part of overhauling NumPy's dtype system to >> better support user defined dtypes. But it should definitely be in the form >> of specialized protocols, e.g., which pass in preallocated arrays to into >> ufuncs for a new dtype. By design, new dtypes should not be able to >> customize the semantics of array *structure*. >> >> We already have a split in the type system with e.g. Cython's buffers, >> Numba's parallel type system. This is a different issue altogether, e.g. >> allowing a unyt dtype to spawn a unyt array, rather than forcing a re-write >> of unyt to cooperate with NumPy's new dtype system. >> > > I guess you're proposing that operations like np.sum(numpy_array, > dtype=other_dtype) could rely on other_dtype for the implementation and > potentially return a non-NumPy array? I'm not sure this is well motivated > -- it would be helpful to discuss actual use-cases. > > The most commonly used NumPy functionality related to dtypes can be found > only in methods on np.ndarray, e.g., astype() and view(). But I don't think > there's any proposal to change that. > >> 4. Having default implementations that allow overrides of a large part of >> the API while defining only a small part. This holds for e.g. >> transpose/concatenate. 
>> > I'm not sure how unumpy solve the problems we encountered when trying to > do this with __array_function__ -- namely the way that it exposes all of > NumPy's internals, or requires rewriting a lot of internal NumPy code to > ensure it always casts inputs with asarray(). > > I think it would be useful to expose default implementations of NumPy > operations somewhere to make it easier to implement __array_function__, but > it doesn't make much sense to couple this to user facing overrides. These > can be exposed as a separate package or numpy module (e.g., > numpy.default_implementations) that uses np.duckarray(), which library > authors can make use of by calling inside their __aray_function__ methods. > >> 5. Generation of Random numbers (overriding RandomState). CuPy has its >> own implementation which would be nice to override. >> > I'm not sure that NumPy's random state objects make sense for duck arrays. > Because these are stateful objects, they are pretty coupled to NumPy's > implementation -- you cannot store any additional state on RandomState > objects that might be needed for a new implementation. At a bare minimum, > you will loss the reproducibility of random seeds, though this may be less > of a concern with the new random API. > >> I also share Nathaniel's concern that the overrides in unumpy are too >> powerful, by allowing for control from arbitrary function arguments and >> even *non-local* control (i.e., global variables) from context managers. >> This level of flexibility can make code very hard to debug, especially in >> larger codebases. >> >> Backend switching needs global context, in any case. There isn't a good >> way around that other than the class dundermethods outlined in another >> thread, which would require rewrites of large amounts of code. >> > > Do we really need to support robust backend switching in NumPy? I'm not > strongly opposed, but what use cases does it actually solve to be able to > override np.fft.fft rather than using a new function? > I don't know, but that feels like an odd question. We wanted an FFT backend system. Now applying __array_function__ to numpy.fft happened without a real discussion, but as a backend system I don't think it would have met the criteria. Something that works for CuPy, Dask and Xarray, but not for Pyfftw or mkl_fft is only half a solution. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Wed Sep 11 22:03:20 2019 From: shoyer at gmail.com (Stephan Hoyer) Date: Wed, 11 Sep 2019 19:03:20 -0700 Subject: [Numpy-discussion] =?utf-8?q?NEP_31_=E2=80=94_Context-local_and_?= =?utf-8?q?global_overrides_of_the_NumPy_API?= In-Reply-To: References: <93a3cb0b-2669-23da-e273-091128948cf6@gmail.com> Message-ID: On Wed, Sep 11, 2019 at 4:18 PM Ralf Gommers wrote: > > > On Tue, Sep 10, 2019 at 10:59 AM Stephan Hoyer wrote: > >> On Tue, Sep 10, 2019 at 6:06 AM Hameer Abbasi >> wrote: >> >>> On 10.09.19 05:32, Stephan Hoyer wrote: >>> >>> On Mon, Sep 9, 2019 at 6:27 PM Ralf Gommers >>> wrote: >>> >>>> I think we've chosen to try the former - dispatch on functions so we >>>> can reuse the NumPy API. It could work out well, it could give some >>>> long-term maintenance issues, time will tell. The question is now if and >>>> how to plug the gap that __array_function__ left. It's main limitation is >>>> "doesn't work for functions that don't have an array-like input" - that >>>> left out ~10-20% of functions. 
So now we have a proposal for a structural >>>> solution to that last 10-20%. It seems logical to want that gap plugged, >>>> rather than go back and say "we shouldn't have gone for the first 80%, so >>>> let's go no further". >>>> >>> >>> I'm excited about solving the remaining 10-20% of use cases for flexible >>> array dispatching, >>> >>> Great! I think most (but not all) of us are on the same page here. > Actually now that Peter came up with the `like=` keyword idea for array > creation functions I'm very interested in seeing that worked out, feels > like that could be a nice solution for part of that 10-20% that did look > pretty bad before. > >> but the unumpy interface suggested here (numpy.overridable) feels like a >>> redundant redo of __array_function__ and __array_ufunc__. >>> >>> > A bit of context: a big part of the reason I advocated for > numpy.overridable is that library authors can use it *only* for the parts > not already covered by the protocols we already have. If there's overlap > there's several ways to deal with that, including only including part of > the unumpy API surface. It does plug all the holes in one go (although you > can then indeed argue it does too much), and there is no other coherent > proposal/vision yet that does this. What you wrote below comes closest, and > I'd love to see that worked out (e.g. the like= argument for array > creation). What I don't like is an ad-hoc plugging of one hole at a time > without visibility on how many more protocols and new workaround functions > in the API we would need. So hopefully we can come to an apples-to-apples > comparison of two design alternatives. > > Also, we just discussed this whole thread in the community call, and it's > clear that it's a complex matter with many different angles. It's very hard > to get a full overview. Our conclusion in the call was that this will > benefit from an in-person discussion. The sprint in November may be a > really good opportunity for that. > Sounds good, I'm looking forward to the discussion at the November sprint! > In the meantime we can of course keep working out ideas/docs. For now I > think it's clear that we (the NEP authors) have some homework to do - that > may take some time. > > >>> I would much rather continue to develop specialized protocols for the >>> remaining usecases. Summarizing those I've seen in this thread, these >>> include: >>> 1. Overrides for customizing array creation and coercion. >>> 2. Overrides to implement operations for new dtypes. >>> 3. Overriding implementations of NumPy functions, e.g., FFT and ufuncs >>> with MKL. >>> >>> (1) could mostly be solved by adding np.duckarray() and another function >>> for duck array coercion. There is still the matter of overriding np.zeros >>> and the like, which perhaps justifies another new protocol, but in my >>> experience the use-cases for truly an array from scratch are quite rare. >>> >>> While they're rare for libraries like XArray; CuPy, Dask and >>> PyData/Sparse need these. >>> >>> >>> (2) should be tackled as part of overhauling NumPy's dtype system to >>> better support user defined dtypes. But it should definitely be in the form >>> of specialized protocols, e.g., which pass in preallocated arrays to into >>> ufuncs for a new dtype. By design, new dtypes should not be able to >>> customize the semantics of array *structure*. >>> >>> We already have a split in the type system with e.g. Cython's buffers, >>> Numba's parallel type system. This is a different issue altogether, e.g. 
>>> allowing a unyt dtype to spawn a unyt array, rather than forcing a re-write >>> of unyt to cooperate with NumPy's new dtype system. >>> >> >> I guess you're proposing that operations like np.sum(numpy_array, >> dtype=other_dtype) could rely on other_dtype for the implementation and >> potentially return a non-NumPy array? I'm not sure this is well motivated >> -- it would be helpful to discuss actual use-cases. >> >> The most commonly used NumPy functionality related to dtypes can be found >> only in methods on np.ndarray, e.g., astype() and view(). But I don't think >> there's any proposal to change that. >> >>> 4. Having default implementations that allow overrides of a large part >>> of the API while defining only a small part. This holds for e.g. >>> transpose/concatenate. >>> >> I'm not sure how unumpy solve the problems we encountered when trying to >> do this with __array_function__ -- namely the way that it exposes all of >> NumPy's internals, or requires rewriting a lot of internal NumPy code to >> ensure it always casts inputs with asarray(). >> >> I think it would be useful to expose default implementations of NumPy >> operations somewhere to make it easier to implement __array_function__, but >> it doesn't make much sense to couple this to user facing overrides. These >> can be exposed as a separate package or numpy module (e.g., >> numpy.default_implementations) that uses np.duckarray(), which library >> authors can make use of by calling inside their __aray_function__ methods. >> >>> 5. Generation of Random numbers (overriding RandomState). CuPy has its >>> own implementation which would be nice to override. >>> >> I'm not sure that NumPy's random state objects make sense for duck >> arrays. Because these are stateful objects, they are pretty coupled to >> NumPy's implementation -- you cannot store any additional state on >> RandomState objects that might be needed for a new implementation. At a >> bare minimum, you will loss the reproducibility of random seeds, though >> this may be less of a concern with the new random API. >> >>> I also share Nathaniel's concern that the overrides in unumpy are too >>> powerful, by allowing for control from arbitrary function arguments and >>> even *non-local* control (i.e., global variables) from context managers. >>> This level of flexibility can make code very hard to debug, especially in >>> larger codebases. >>> >>> Backend switching needs global context, in any case. There isn't a good >>> way around that other than the class dundermethods outlined in another >>> thread, which would require rewrites of large amounts of code. >>> >> >> Do we really need to support robust backend switching in NumPy? I'm not >> strongly opposed, but what use cases does it actually solve to be able to >> override np.fft.fft rather than using a new function? >> > > I don't know, but that feels like an odd question. We wanted an FFT > backend system. Now applying __array_function__ to numpy.fft happened > without a real discussion, but as a backend system I don't think it would > have met the criteria. Something that works for CuPy, Dask and Xarray, but > not for Pyfftw or mkl_fft is only half a solution. > I agree, __array_function__ is not a backend system. > Cheers, > Ralf > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 
From sebastian at sipsolutions.net  Thu Sep 12 12:10:27 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Thu, 12 Sep 2019 09:10:27 -0700
Subject: [Numpy-discussion] NumPy Community Survey
In-Reply-To: 
References: 
Message-ID: <5c334c899d08db072be60d35a2dfedb5f4000f20.camel@sipsolutions.net>

Hi all,

On Thu, 2019-08-29 at 00:41 -0400, Inessa Pawson wrote:
> You know that NumPy is essential to the Python community. The NumPy
> team wants you to know that YOU, our user and developer community,
> are essential to us. That's why we are putting together a team to
> create the inaugural NumPy Community Survey.
> We hope feedback will provide insights that will help us to guide
> better decision-making about the development of NumPy as software and
> community.
> For more information about the proposed survey please refer to
> github.com/numpy/numpy-surveys .
>
> Call for Contributions
> We are looking for volunteers experienced in survey design and
> translating English into Spanish, Portuguese, Russian, Hindi, Chinese
> and other languages.
>
> If you'd like to learn more about these volunteer opportunities, or
> additional ways to support NumPy, feel free to reach out to our
> community coordinators at numpy-team at googlegroups.com or join us on
> Slack numpy-team.slack.com (email to numpy-team at googlegroups.com for
> an invite first).
>

Just a reminder for everyone that the survey planning is ongoing and
happening at https://github.com/numpy/numpy-surveys as well as the Slack
channel above (and our weekly community calls). So if you are interested,
or have always wanted to ask users specific questions, now (or soon) is a
good time to contribute. It is a rare opportunity for us to do such a
survey!

Best,

Sebastian

> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: 
From matti.picus at gmail.com  Fri Sep 13 02:25:11 2019
From: matti.picus at gmail.com (mattip)
Date: Thu, 12 Sep 2019 23:25:11 -0700 (MST)
Subject: [Numpy-discussion] Code review for adding axis argument to
 permutation and shuffle function
In-Reply-To: 
References: 
Message-ID: <1568355911628-0.post@n7.nabble.com>

This proposal to add an axis argument to permutation and shuffle seems to
have garnered no reply. Are people OK with it (for the new random.Generator
only) ?



--
Sent from: http://numpy-discussion.10968.n7.nabble.com/
From jni at fastmail.com  Fri Sep 13 02:47:08 2019
From: jni at fastmail.com (Juan Nunez-Iglesias)
Date: Fri, 13 Sep 2019 16:47:08 +1000
Subject: [Numpy-discussion] Code review for adding axis argument to
 permutation and shuffle function
In-Reply-To: <1568355911628-0.post@n7.nabble.com>
References: <1568355911628-0.post@n7.nabble.com>
Message-ID: 

I don't understand why the proposal would be controversial in any way. It's very natural to have `axis=` keyword arguments in NumPy, and it's the lack of them that is surprising. My only additional suggestion would be to allow tuples of axes, but that can come later.

Juan.

> On 13 Sep 2019, at 4:25 pm, mattip wrote:
>
> This proposal to add an axis argument to permutation and shuffle seems to
> have garnered no reply. Are people OK with it (for the new random.Generator
> only) ?
>
>
>
> --
> Sent from: http://numpy-discussion.10968.n7.nabble.com/
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
From matthew.brett at gmail.com  Fri Sep 13 04:37:35 2019
From: matthew.brett at gmail.com (Matthew Brett)
Date: Fri, 13 Sep 2019 09:37:35 +0100
Subject: [Numpy-discussion] Code review for adding axis argument to
 permutation and shuffle function
In-Reply-To: 
References: <1568355911628-0.post@n7.nabble.com>
Message-ID: 

Hi,

Thanks - yes - I agree, an axis argument seems like a very sensible idea.

Cheers,

Matthew

On Fri, Sep 13, 2019 at 7:48 AM Juan Nunez-Iglesias wrote:
>
> I don't understand why the proposal would be controversial in any way. It's very natural to have `axis=` keyword arguments in NumPy, and it's the lack of them that is surprising. My only additional suggestion would be to allow tuples of axes, but that can come later.
>
> Juan.
>
> > On 13 Sep 2019, at 4:25 pm, mattip wrote:
> >
> > This proposal to add an axis argument to permutation and shuffle seems to
> > have garnered no reply. Are people OK with it (for the new random.Generator
> > only) ?
> >
> >
> >
> > --
> > Sent from: http://numpy-discussion.10968.n7.nabble.com/
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
From irvin.probst at ensta-bretagne.fr  Fri Sep 13 06:48:55 2019
From: irvin.probst at ensta-bretagne.fr (Irvin Probst)
Date: Fri, 13 Sep 2019 12:48:55 +0200
Subject: [Numpy-discussion] round / set_printoptions discrepancy
Message-ID: <09ba3019-b523-e123-3874-c0195aa0d503@ensta-bretagne.fr>

Hi,
Is it expected/documented that np.round and np.set_printoptions do not
output the same result on screen ?
I stumbled into this running this code:

import numpy as np
mes = np.array([
    [16.06, 16.13, 16.06, 16.00, 16.06, 16.00, 16.13, 16.00]
])

avg = np.mean(mes, axis=1)
print(np.round(avg, 2))
np.set_printoptions(precision=2)
print(avg)


Which outputs:

[16.06]
[16.05]

Is that worth a bug report or did I miss something ? I've been able to
reproduce this on many windows/linux PCs with python/numpy releases from
2017 up to last week.

Thanks.
From deak.andris at gmail.com  Fri Sep 13 07:23:57 2019
From: deak.andris at gmail.com (Andras Deak)
Date: Fri, 13 Sep 2019 13:23:57 +0200
Subject: [Numpy-discussion] round / set_printoptions discrepancy
In-Reply-To: <09ba3019-b523-e123-3874-c0195aa0d503@ensta-bretagne.fr>
References: <09ba3019-b523-e123-3874-c0195aa0d503@ensta-bretagne.fr>
Message-ID: 

On Fri, Sep 13, 2019 at 12:58 PM Irvin Probst wrote:
>
> Hi,
> Is it expected/documented that np.round and np.set_printoptions do not
> output the same result on screen ?
> I stumbled into this running this code:
>
> import numpy as np
> mes = np.array([
>      [16.06, 16.13, 16.06, 16.00, 16.06, 16.00, 16.13, 16.00]
> ])
>
> avg = np.mean(mes, axis=1)
> print(np.round(avg, 2))
> np.set_printoptions(precision=2)
> print(avg)
>
>
> Which outputs:
>
> [16.06]
> [16.05]
>
> Is that worth a bug report or did I miss something ? I've been able to
> reproduce this on many windows/linux PCs with python/numpy releases from
> 2017 up to last week.
>
> Thanks.
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

Hi,

I just want to add that you can use literal 16.055 to reproduce this:
>>> import numpy as np
>>> np.set_printoptions(precision=2)
>>> np.array([16.055]).round(2)
array([16.06])
>>> np.array([16.055])
array([16.05])

I would think it has to do with "round to nearest even":
>>> np.array(16.055)
array(16.05)
>>> np.array(16.065)
array(16.07)
>>> np.array(16.065).round(2)
16.07

But it's as if `round` rounded decimal digits upwards (16.055 ->
16.06, 16.065 -> 16.07), whereas the `repr` rounded to the nearest
odd(!) digit (16.055 -> 16.05, 16.065 -> 16.07). Does this make any
sense? I'm on numpy 1.17.2.
(Scalars or 1-length 1d arrays don't seem to make a difference).
Regards,

Andrés
From hodge at stsci.edu  Fri Sep 13 08:05:27 2019
From: hodge at stsci.edu (Philip Hodge)
Date: Fri, 13 Sep 2019 08:05:27 -0400
Subject: [Numpy-discussion] round / set_printoptions discrepancy
In-Reply-To: 
References: <09ba3019-b523-e123-3874-c0195aa0d503@ensta-bretagne.fr>
Message-ID: <3d56f064-3fb3-69a7-4a07-093eb276e21c@stsci.edu>

On 9/13/19 7:23 AM, Andras Deak wrote:
> I just want to add that you can use literal 16.055 to reproduce this:
>>>> import numpy as np
>>>> np.set_printoptions(precision=2)
>>>> np.array([16.055]).round(2)
> array([16.06])
>>>> np.array([16.055])
> array([16.05])
>
> I would think it has to do with "round to nearest even":
>>>> np.array(16.055)
> array(16.05)
>>>> np.array(16.065)
> array(16.07)
>>>> np.array(16.065).round(2)
> 16.07
>
> But it's as if `round` rounded decimal digits upwards (16.055 ->
> 16.06, 16.065 -> 16.07), whereas the `repr` rounded to the nearest
> odd(!) digit (16.055 -> 16.05, 16.065 -> 16.07). Does this make any
> sense? I'm on numpy 1.17.2.
> (Scalars or 1-length 1d arrays don't seem to make a difference).
> Regards,
>
> Andrés

Isn't that just for consistency with Python 3 round()? I agree that
the discrepancy with np.set_printoptions is not necessarily expected,
except possibly for backwards compatibility.

Phil
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From irvin.probst at ensta-bretagne.fr  Fri Sep 13 08:45:23 2019
From: irvin.probst at ensta-bretagne.fr (Irvin Probst)
Date: Fri, 13 Sep 2019 14:45:23 +0200
Subject: [Numpy-discussion] round / set_printoptions discrepancy
In-Reply-To: <3d56f064-3fb3-69a7-4a07-093eb276e21c@stsci.edu>
References: <09ba3019-b523-e123-3874-c0195aa0d503@ensta-bretagne.fr>
 <3d56f064-3fb3-69a7-4a07-093eb276e21c@stsci.edu>
Message-ID: <9185ae23-8ebd-6e62-5c5d-4288123b1ea6@ensta-bretagne.fr>

On 13/09/2019 14:05, Philip Hodge wrote:
>
> Isn't that just for consistency with Python 3 round()? I agree that
> the discrepancy with np.set_printoptions is not necessarily expected,
> except possibly for backwards compatibility.
>
>

I've just checked and np.set_printoptions behaves as python's round:

>>> round(16.055,2)
16.05
>>> np.round(16.055,2)
16.06

I don't know why round and np.round do not behave the same, actually I
would even dare to say that I don't care :-)
However np.round and np.set_printoptions should provide the same
output, shouldn't they ? This discrepancy is really disturbing whereas
consistency with python's round looks like the icing on the cake but
in no way a required feature.

--
Irvin
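(As a side note on the mechanism, a minimal sketch of where the two
results can come from, assuming the scale-round-rescale approach that
np.round's documentation describes; Python's round() instead works from
the decimal value of the closest double:)

>>> import numpy as np
>>> format(16.055, '.20f')  # the closest double is just below 16.055
'16.05499999999999971578'
>>> 16.055 * 100            # scaling by 10**2 rounds up to an exact .5 tie
1605.5
>>> np.rint(1605.5)         # ties go to the nearest even integer
1606.0
>>> np.rint(1605.5) / 100   # hence 16.06 from np.round(16.055, 2)
16.06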
From hodge at stsci.edu  Fri Sep 13 08:59:17 2019
From: hodge at stsci.edu (Philip Hodge)
Date: Fri, 13 Sep 2019 08:59:17 -0400
Subject: [Numpy-discussion] round / set_printoptions discrepancy
In-Reply-To: <9185ae23-8ebd-6e62-5c5d-4288123b1ea6@ensta-bretagne.fr>
References: <09ba3019-b523-e123-3874-c0195aa0d503@ensta-bretagne.fr>
 <3d56f064-3fb3-69a7-4a07-093eb276e21c@stsci.edu>
 <9185ae23-8ebd-6e62-5c5d-4288123b1ea6@ensta-bretagne.fr>
Message-ID: <53f3db55-1c47-cb8e-1805-718bf2f3a5f2@stsci.edu>

On 9/13/19 8:45 AM, Irvin Probst wrote:
> On 13/09/2019 14:05, Philip Hodge wrote:
>>
>> Isn't that just for consistency with Python 3 round()? I agree that
>> the discrepancy with np.set_printoptions is not necessarily expected,
>> except possibly for backwards compatibility.
>>
>>
>
> I've just checked and np.set_printoptions behaves as python's round:
>
> >>> round(16.055,2)
> 16.05
> >>> np.round(16.055,2)
> 16.06
>
> I don't know why round and np.round do not behave the same, actually I
> would even dare to say that I don't care :-)
> However np.round and np.set_printoptions should provide the same
> output, shouldn't they ? This discrepancy is really disturbing whereas
> consistency with python's round looks like the icing on the cake but
> in no way a required feature.
>

Python round() is supposed to round to the nearest even value, if the
two closest values are equally close. So round(16.055, 2) returning
16.05 was a surprise to me. I checked the documentation and found a
note that explained that this was because "most decimal fractions can't
be represented exactly as a float." round(16.55) returns 16.6.

Phil

From deak.andris at gmail.com  Fri Sep 13 09:19:06 2019
From: deak.andris at gmail.com (Andras Deak)
Date: Fri, 13 Sep 2019 15:19:06 +0200
Subject: [Numpy-discussion] round / set_printoptions discrepancy
In-Reply-To: <53f3db55-1c47-cb8e-1805-718bf2f3a5f2@stsci.edu>
References: <09ba3019-b523-e123-3874-c0195aa0d503@ensta-bretagne.fr>
 <3d56f064-3fb3-69a7-4a07-093eb276e21c@stsci.edu>
 <9185ae23-8ebd-6e62-5c5d-4288123b1ea6@ensta-bretagne.fr>
 <53f3db55-1c47-cb8e-1805-718bf2f3a5f2@stsci.edu>
Message-ID: 

On Fri, Sep 13, 2019 at 2:59 PM Philip Hodge wrote:
>
> On 9/13/19 8:45 AM, Irvin Probst wrote:
> > On 13/09/2019 14:05, Philip Hodge wrote:
> >>
> >> Isn't that just for consistency with Python 3 round()? I agree that
> >> the discrepancy with np.set_printoptions is not necessarily expected,
> >> except possibly for backwards compatibility.
> >>
> >>
> >
> > I've just checked and np.set_printoptions behaves as python's round:
> >
> > >>> round(16.055,2)
> > 16.05
> > >>> np.round(16.055,2)
> > 16.06
> >
> > I don't know why round and np.round do not behave the same, actually I
> > would even dare to say that I don't care :-)
> > However np.round and np.set_printoptions should provide the same
> > output, shouldn't they ? This discrepancy is really disturbing whereas
> > consistency with python's round looks like the icing on the cake but
> > in no way a required feature.
> >
>
> Python round() is supposed to round to the nearest even value, if the
> two closest values are equally close. So round(16.055, 2) returning
> 16.05 was a surprise to me. I checked the documentation and found a
> note that explained that this was because "most decimal fractions can't
> be represented exactly as a float." round(16.55) returns 16.6.
> > Phil
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion

Ah, of course, endless double-precision shenanigans...
>>> format(16.055, '.30f')
'16.054999999999999715782905695960'

>>> format(16.55, '.30f')
'16.550000000000000710542735760100'

Andrés
From ewm at redtetrahedron.org  Fri Sep 13 09:26:21 2019
From: ewm at redtetrahedron.org (Eric Moore)
Date: Fri, 13 Sep 2019 09:26:21 -0400
Subject: [Numpy-discussion] round / set_printoptions discrepancy
In-Reply-To: 
References: <09ba3019-b523-e123-3874-c0195aa0d503@ensta-bretagne.fr>
 <3d56f064-3fb3-69a7-4a07-093eb276e21c@stsci.edu>
 <9185ae23-8ebd-6e62-5c5d-4288123b1ea6@ensta-bretagne.fr>
 <53f3db55-1c47-cb8e-1805-718bf2f3a5f2@stsci.edu>
Message-ID: 

See the notes section here.
https://numpy.org/devdocs/reference/generated/numpy.around.html.

This note was recently added in https://github.com/numpy/numpy/pull/14392

Eric

On Fri, Sep 13, 2019 at 9:20 AM Andras Deak wrote:

> On Fri, Sep 13, 2019 at 2:59 PM Philip Hodge wrote:
> >
> > On 9/13/19 8:45 AM, Irvin Probst wrote:
> > > On 13/09/2019 14:05, Philip Hodge wrote:
> > >>
> > >> Isn't that just for consistency with Python 3 round()? I agree that
> > >> the discrepancy with np.set_printoptions is not necessarily expected,
> > >> except possibly for backwards compatibility.
> > >>
> > >>
> > >
> > > I've just checked and np.set_printoptions behaves as python's round:
> > >
> > > >>> round(16.055,2)
> > > 16.05
> > > >>> np.round(16.055,2)
> > > 16.06
> > >
> > > I don't know why round and np.round do not behave the same, actually I
> > > would even dare to say that I don't care :-)
> > > However np.round and np.set_printoptions should provide the same
> > > output, shouldn't they ? This discrepancy is really disturbing whereas
> > > consistency with python's round looks like the icing on the cake but
> > > in no way a required feature.
> > >
> >
> > Python round() is supposed to round to the nearest even value, if the
> > two closest values are equally close. So round(16.055, 2) returning
> > 16.05 was a surprise to me. I checked the documentation and found a
> > note that explained that this was because "most decimal fractions can't
> > be represented exactly as a float." round(16.55) returns 16.6.
> >
> > Phil
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
>
> Ah, of course, endless double-precision shenanigans...
> >>> format(16.055, '.30f')
> '16.054999999999999715782905695960'
>
> >>> format(16.55, '.30f')
> '16.550000000000000710542735760100'
>
> Andrés
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From irvin.probst at ensta-bretagne.fr Fri Sep 13 09:34:41 2019 From: irvin.probst at ensta-bretagne.fr (Irvin Probst) Date: Fri, 13 Sep 2019 15:34:41 +0200 Subject: [Numpy-discussion] round / set_printoptions discrepancy In-Reply-To: References: <09ba3019-b523-e123-3874-c0195aa0d503@ensta-bretagne.fr> <3d56f064-3fb3-69a7-4a07-093eb276e21c@stsci.edu> <9185ae23-8ebd-6e62-5c5d-4288123b1ea6@ensta-bretagne.fr> <53f3db55-1c47-cb8e-1805-718bf2f3a5f2@stsci.edu> Message-ID: <46cc64b8-9276-9072-9c1c-d7a3d331824a@ensta-bretagne.fr> On 13/09/2019 15:26, Eric Moore wrote: > See the notes section here. > https://numpy.org/devdocs/reference/generated/numpy.around.html. > > This note was recently added in https://github.com/numpy/numpy/pull/14392 > > Thanks, it indeed explains the discrepancy. From stefano.miccoli at polimi.it Fri Sep 13 11:59:43 2019 From: stefano.miccoli at polimi.it (Stefano Miccoli) Date: Fri, 13 Sep 2019 15:59:43 +0000 Subject: [Numpy-discussion] round / set_printoptions discrepancy Message-ID: In my opinion the problem is that numpy floats break the Liskov substitution principle, >>> pyfloat = 16.055 >>> npfloat = np.float64(pyfloat) >>> isinstance(npfloat, float) True >>> round(pyfloat, 2) 16.05 >>> round(npfloat, 2) 16.06 Since numpy.float64 is a subclass of builtins.float I would expect that >>> round(x, j) == round(np.float64(x), j) is an invariant, but unfortunately this is not the case. Moreover the python3 semantics of the round function require that when the number of digits is None, the return value should be of integral type: >>> round(pyfloat) 16 >>> round(pyfloat, None) 16 >>> round(pyfloat, 0) 16.0 >>> round(npfloat) 16.0 >>> round(npfloat, None) 16.0 >>> round(npfloat, 0) 16.0 see also https://github.com/numpy/numpy/issues/11810 Stefano From warren.weckesser at gmail.com Fri Sep 13 16:18:06 2019 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Fri, 13 Sep 2019 16:18:06 -0400 Subject: [Numpy-discussion] Code review for adding axis argument to permutation and shuffle function In-Reply-To: References: Message-ID: On 7/4/19, Kexuan Sun wrote: > Hi, > > I would like to request a code review. The random.permutation and > random.shuffle functions now can only shuffle along the first axis of a > multi-dimensional array. I propose to add an axis argument for the > functions and allow them to shuffle along a given axis. Here is the link > to the PR (https://github.com/numpy/numpy/pull/13829). Given the current semantics of 'shuffle', the proposed change makes sense. However, I would like to call attention to https://github.com/numpy/numpy/issues/5173 and to the mailing list thread from 2014 that I started here: https://mail.python.org/pipermail/numpy-discussion/2014-October/071340.html The topic of those discussions was that the current behavior of 'shuffle' is often *not* what users want or expect. What is often desired is to shuffle each row (or column, or whatever dimension is specified) *independently* of the others. So if a = np.array([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11, 12, 13, 14]]), then randomly shuffling 'a' along axis=1 should shuffle each row independently of the others, to create something like a = np.array([[2, 4, 0, 3, 1], [8, 6, 9, 7, 5], [11, 12, 10, 14, 13]]) An API for this was discussed (and of course that ran into the second of the two hard problems in computer science, naming things). 
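(As a rough sketch of that per-row behavior, assuming the NumPy >= 1.17
Generator API and the illustrative names `rng` and `idx` -- draw
independent random keys and argsort them along the target axis:)

>>> import numpy as np
>>> rng = np.random.default_rng()
>>> a = np.array([[0, 1, 2, 3, 4],
...               [5, 6, 7, 8, 9],
...               [10, 11, 12, 13, 14]])
>>> idx = rng.random(a.shape).argsort(axis=1)  # one permutation per row
>>> shuffled = a[np.arange(a.shape[0])[:, None], idx]  # rows shuffled independently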
Take a look at those discussions, and check that
https://github.com/numpy/numpy/pull/13829 fits in with the possible
changes mentioned in those discussions.

If we don't use the name 'shuffle' for the new random permutation
function(s), then the change in PR 13829 is a good one. However, if we
want to try to reuse the name 'shuffle' to also allow independent
shuffling along an axis, then we have to be careful with how we
interpret the 'axis' argument.

Warren

>
> Thanks!
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
From allanhaldane at gmail.com  Fri Sep 13 18:27:51 2019
From: allanhaldane at gmail.com (Allan Haldane)
Date: Fri, 13 Sep 2019 18:27:51 -0400
Subject: [Numpy-discussion] round / set_printoptions discrepancy
In-Reply-To: 
References: <09ba3019-b523-e123-3874-c0195aa0d503@ensta-bretagne.fr>
 <3d56f064-3fb3-69a7-4a07-093eb276e21c@stsci.edu>
 <9185ae23-8ebd-6e62-5c5d-4288123b1ea6@ensta-bretagne.fr>
 <53f3db55-1c47-cb8e-1805-718bf2f3a5f2@stsci.edu>
Message-ID: <58b61af1-3f50-1250-71c6-7ad830c7f19e@gmail.com>

On 9/13/19 9:26 AM, Eric Moore wrote:
> See the notes section here.
> https://numpy.org/devdocs/reference/generated/numpy.around.html.
>
> This note was recently added in https://github.com/numpy/numpy/pull/14392
>
> Eric

Hmm, but this example with 16.055 shows the note still isn't quite
right. The doc suggests that the floating point error only matters for
large values or large `decimals`, but this shows it also happens for
small values. Makes sense now that I see the example. We should tweak
the docstring.

Also, I did make some notes in https://github.com/numpy/numpy/issues/14391
for how we could "fix" this problem efficiently. Unfortunately it's far
from trivial to write a correct rounding algorithm, and I'm not sure it's
worth the effort: The round error is comparable to normal floating-point
error, and I don't think round is heavily used.

Best,
Allan

> On Fri, Sep 13, 2019 at 9:20 AM Andras Deak wrote:
>
> On Fri, Sep 13, 2019 at 2:59 PM Philip Hodge wrote:
> >
> > On 9/13/19 8:45 AM, Irvin Probst wrote:
> > > On 13/09/2019 14:05, Philip Hodge wrote:
> > >>
> > >> Isn't that just for consistency with Python 3 round()? I agree that
> > >> the discrepancy with np.set_printoptions is not necessarily expected,
> > >> except possibly for backwards compatibility.
> > >>
> > >>
> > >
> > > I've just checked and np.set_printoptions behaves as python's round:
> > >
> > > >>> round(16.055,2)
> > > 16.05
> > > >>> np.round(16.055,2)
> > > 16.06
> > >
> > > I don't know why round and np.round do not behave the same, actually I
> > > would even dare to say that I don't care :-)
> > > However np.round and np.set_printoptions should provide the same
> > > output, shouldn't they ? This discrepancy is really disturbing whereas
> > > consistency with python's round looks like the icing on the cake but
> > > in no way a required feature.
> > >
> >
> > Python round() is supposed to round to the nearest even value, if the
> > two closest values are equally close. So round(16.055, 2) returning
> > 16.05 was a surprise to me. I checked the documentation and found a
> > note that explained that this was because "most decimal fractions can't
> > be represented exactly as a float." round(16.55) returns 16.6.
> > > > Phil > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > Ah, of course, endless double-precision shenanigans... > >>> format(16.055, '.30f') > '16.054999999999999715782905695960' > > >>> format(16.55, '.30f') > '16.550000000000000710542735760100' > > Andr?s > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From ralf.gommers at gmail.com Sun Sep 15 20:24:57 2019 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 15 Sep 2019 17:24:57 -0700 Subject: [Numpy-discussion] keeping all meeting notes and docs in new repo? Message-ID: Hi all, We have had community calls for quite a while, the minutes of which are kept in https://github.com/BIDS-numpy/docs. That's quite hard to discover, it would be better if those lived under the NumPy GitHub org. Also, we have minutes from Season of Docs and website redesign calls, plus occasionally some other docs (e.g. the roadmap drafts were on hackmd.io). Would it make sense to add a new repo to contain all such meeting minutes and docs? Presentations and proposals may make sense to add as well - several people have given presentations or submitted proposals on behalf of the project. Inessa also suggested to enable HackMD Hub (see https://hackmd.io/c/tutorials/%2Fs%2Flink-with-github) so we get automatic versioning for some HackMD documents. I haven't used it before, but it looks good. Thoughts? Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Sep 16 15:09:19 2019 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 16 Sep 2019 12:09:19 -0700 Subject: [Numpy-discussion] How to Capitalize numpy? In-Reply-To: References: Message-ID: Trivial note: On the subject of naming things (spelling things??) -- should it be: numpy or Numpy or NumPy ? All three are in the draft NEP 30 ( mostly "NumPy", I noticed this when reading/copy editing the NEP) . Is there an "official" capitalization? My preference, would be to use "numpy", and where practicable, use a "computer" font -- i.e. ``numpy`` in RST. But if there is consensus already for anything else, that's fine, I'd just like to know what it is. -CHB On Mon, Aug 12, 2019 at 4:02 AM Peter Andreas Entschev wrote: > Apologies for the late reply. I've opened a new PR > https://github.com/numpy/numpy/pull/14257 with the changes requested > on clarifying the text. After reading the detailed description, I've > decided to add a subsection "Scope" to clarify the scope where NEP-30 > would be useful. I think the inclusion of this new subsection > complements the "Detail description" forming a complete text w.r.t. > motivation of the NEP, but feel free to point out disagreements with > my suggestion. I've also added a new section "Usage" pointing out how > one would use duck array in replacement to np.asarray where relevant. > > Regarding the naming discussion, I must say I like the idea of keeping > the __array_ prefix, but it seems like that is going to be difficult > given that none of the existing ideas so far play very nicely with > that. 
So if the general consensus is to go with __numpy_like__, I > would also update the NEP to reflect that changes. FWIW, I > particularly neither like nor dislike __numpy_like__, but I don't have > any better suggestions than that or keeping the current naming. > > Best, > Peter > > On Thu, Aug 8, 2019 at 3:40 AM Stephan Hoyer wrote: > > > > > > > > On Wed, Aug 7, 2019 at 6:18 PM Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> > >> > >> > >> On Wed, Aug 7, 2019 at 7:10 PM Stephan Hoyer wrote: > >>> > >>> On Wed, Aug 7, 2019 at 5:11 PM Ralf Gommers > wrote: > >>>> > >>>> > >>>> On Mon, Aug 5, 2019 at 6:18 PM Stephan Hoyer > wrote: > >>>>> > >>>>> On Mon, Aug 5, 2019 at 2:48 PM Ralf Gommers > wrote: > >>>>> > >>>>>> > >>>>>> The NEP currently does not say who this is meant for. Would you > expect libraries like SciPy to adopt it for example? > >>>>>> > >>>>>> The NEP also (understandably) punts on the question of when > something is a valid duck array. If you want this to be widely used, that > will need an answer or at least some rough guidance though. For example, we > would expect a duck array to have a mean() method, but probably not a ptp() > method. A library author who wants to use np.duckarray() needs to know, > because she can't test with all existing and future duck array > implementations. > >>>>> > >>>>> > >>>>> I think this is covered in NEP-22 already. > >>>> > >>>> > >>>> It's not really. We discussed this briefly in the community call > today, Peter said he will try to add some text. > >>>> > >>>> We should not add new functions to NumPy without indicating who is > supposed to use this, and what need it fills / problem it solves. It seems > pretty clear to me that it's mostly aimed at library authors rather than > end users. And also that mature libraries like SciPy may not immediately > adopt it, because it's too fuzzy - so it's new libraries first, mature > libraries after the dust has settled a bit (I think). > >>> > >>> > >>> I totally agree -- we definitely should clarify this in the docstring > and elsewhere in the docs. An example in the new doc page on "Writing > custom array containers" ( > https://numpy.org/devdocs/user/basics.dispatch.html) would also probably > be appropriate. > >>> > >>>>> > >>>>> As discussed there, I don't think NumPy is in a good position to > pronounce decisive APIs at this time. I would welcome efforts to try, but I > don't think that's essential for now. > >>>> > >>>> > >>>> There's no need to pronounce a decisive API that fully covers duck > array. Note that RNumPy is an attempt in that direction (not a full one, > but way better than nothing). In the NEP/docs, at least saying something > along the lines of "if you implement this, we recommend the following > strategy: check if a function is present in Dask, CuPy and Sparse. If so, > it's reasonable to expect any duck array to work here. If not, we suggest > you indicate in your docstring what kinds of duck arrays are accepted, or > what properties they need to have". That's a spec by implementation, which > is less than ideal but better than saying nothing. > >>> > >>> > >>> OK, I agree here as well -- some guidance is better than nothing. > >>> > >>> Two other minor notes on this NEP, concerning naming: > >>> 1. We should have a brief note on why we settled on the name "duck > array". Namely, as discussed in NEP-22, we don't love the "duck" jargon, > but we couldn't come up with anything better since NumPy already uses > "array like" and "any array" for different purposes. 
> >>> 2. The protocol should use *something* more clearly namespaced as
> NumPy specific than __duckarray__. All the other special protocols NumPy
> defines start with "__array_". That suggests either __array_duckarray__
> (sounds a little redundant) or __numpy_duckarray__ (which I like the look
> of, but is a different from the existing protocols).
> >>>
> >>
> >> `__numpy_like__` ?
> >
> >
> >
> > This could work, but I think we would also want to rename the NumPy
> function itself to either np.like or np.numpy_like. The later is a little
> redundant but definitely more self-descriptive than "duck array".
> >
> >>
> >> Chuck
> >> _______________________________________________
> >> NumPy-Discussion mailing list
> >> NumPy-Discussion at python.org
> >> https://mail.python.org/mailman/listinfo/numpy-discussion
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From chris.barker at noaa.gov  Mon Sep 16 15:25:35 2019
From: chris.barker at noaa.gov (Chris Barker)
Date: Mon, 16 Sep 2019 12:25:35 -0700
Subject: [Numpy-discussion] NEP 30 - Duck Typing for NumPy Arrays -
 Implementation
In-Reply-To: 
References: 
Message-ID: 

On Mon, Aug 12, 2019 at 4:02 AM Peter Andreas Entschev wrote:

> Apologies for the late reply. I've opened a new PR
> https://github.com/numpy/numpy/pull/14257 with the changes requested

thanks!

I've written a small PR on your PR:

https://github.com/pentschev/numpy/pull/1

Essentially, other than typos and copy editing, I'm suggesting that a
duck-array could choose to implement __array__ if it so chooses -- it
should, of course, return an actual numpy array.

I think this could be useful, as much code does require an actual numpy
array, and only that class itself knows how best to convert to one.

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From chris.barker at noaa.gov  Mon Sep 16 15:36:00 2019
From: chris.barker at noaa.gov (Chris Barker)
Date: Mon, 16 Sep 2019 12:36:00 -0700
Subject: [Numpy-discussion] Add a total_seconds() method to timedelta64?
Message-ID: 

I just noticed that there is no obvious way to convert a timedelta64 to
seconds (or some other easy unit) as a number. The stdlib
datetime.timedelta has a .total_seconds() method for doing that. I think
it's a handy thing to have.

Looking at StackOverflow (and others), I see people suggesting things like:

a_timedelta.astype(np.float) / 1e6

This seems a really bad idea, as it's assuming the timedelta is storing
microseconds.

The "proper" way to do it also suggested:

a_timedelta / np.timedelta64(1, 's')

This is, in fact, a much better way to do it, and allows you to specify
other units if you like: "ms", "us", etc.
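(For instance, a quick numerical sketch of the division approach, with an
arbitrary value:)

>>> import numpy as np
>>> td = np.timedelta64(5025, 's')
>>> td / np.timedelta64(1, 's')   # total seconds
5025.0
>>> td / np.timedelta64(1, 'm')   # total minutes
83.75
>>> td / np.timedelta64(1, 'h')   # total hours
1.3958333333333333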
There was semi-recently a discussion thread on python-ideas about adding
other methods to timedelta (e.g. .total_hours, .total_minutes). That was
pretty much rejected (or petered out anyway), and some argued that
dividing by a timedelta of the unit you want is the "right" way to do it
anyway (some argued that .total_seconds() never should have been added).

Personally I understand the "correctness" of using division by a unit
timedelta, but "practicality beats purity", and the discoverability of a
method or two really makes it easier on folks.

That being said, if folks don't want to add .total_seconds and the like,
we should add a bit to the docs about this, suggesting using the division
approach.

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From peter at entschev.com  Mon Sep 16 16:35:33 2019
From: peter at entschev.com (Peter Andreas Entschev)
Date: Mon, 16 Sep 2019 22:35:33 +0200
Subject: [Numpy-discussion] How to Capitalize numpy?
In-Reply-To: 
References: 
Message-ID: 

My answer to that: "NumPy". Reference: logo at the top of
https://numpy.org/neps/index.html .

In NEP-30 [1], I've used "NumPy" everywhere, except for references to
code, repos, etc., where "numpy" is used. I see there's one occurrence
of "Numpy", which was definitely a typo and I had not noticed it until
now, but I will address this in a future update, thanks for pointing
that out.

[1] https://numpy.org/neps/nep-0030-duck-array-protocol.html

On Mon, Sep 16, 2019 at 9:09 PM Chris Barker wrote:
>
> Trivial note:
>
> On the subject of naming things (spelling things??) -- should it be:
>
> numpy
> or
> Numpy
> or
> NumPy
> ?
>
> All three are in the draft NEP 30 ( mostly "NumPy", I noticed this when reading/copy editing the NEP) . Is there an "official" capitalization?
>
> My preference, would be to use "numpy", and where practicable, use a "computer" font -- i.e. ``numpy`` in RST.
>
> But if there is consensus already for anything else, that's fine, I'd just like to know what it is.
>
> -CHB
>
>
>
> On Mon, Aug 12, 2019 at 4:02 AM Peter Andreas Entschev wrote:
>>
>> Apologies for the late reply. I've opened a new PR
>> https://github.com/numpy/numpy/pull/14257 with the changes requested
>> on clarifying the text. After reading the detailed description, I've
>> decided to add a subsection "Scope" to clarify the scope where NEP-30
>> would be useful. I think the inclusion of this new subsection
>> complements the "Detail description" forming a complete text w.r.t.
>> motivation of the NEP, but feel free to point out disagreements with
>> my suggestion. I've also added a new section "Usage" pointing out how
>> one would use duck array in replacement to np.asarray where relevant.
>>
>> Regarding the naming discussion, I must say I like the idea of keeping
>> the __array_ prefix, but it seems like that is going to be difficult
>> given that none of the existing ideas so far play very nicely with
>> that. So if the general consensus is to go with __numpy_like__, I
>> would also update the NEP to reflect that changes. FWIW, I
>> particularly neither like nor dislike __numpy_like__, but I don't have
>> any better suggestions than that or keeping the current naming.
>> >> Best, >> Peter >> >> On Thu, Aug 8, 2019 at 3:40 AM Stephan Hoyer wrote: >> > >> > >> > >> > On Wed, Aug 7, 2019 at 6:18 PM Charles R Harris wrote: >> >> >> >> >> >> >> >> On Wed, Aug 7, 2019 at 7:10 PM Stephan Hoyer wrote: >> >>> >> >>> On Wed, Aug 7, 2019 at 5:11 PM Ralf Gommers wrote: >> >>>> >> >>>> >> >>>> On Mon, Aug 5, 2019 at 6:18 PM Stephan Hoyer wrote: >> >>>>> >> >>>>> On Mon, Aug 5, 2019 at 2:48 PM Ralf Gommers wrote: >> >>>>> >> >>>>>> >> >>>>>> The NEP currently does not say who this is meant for. Would you expect libraries like SciPy to adopt it for example? >> >>>>>> >> >>>>>> The NEP also (understandably) punts on the question of when something is a valid duck array. If you want this to be widely used, that will need an answer or at least some rough guidance though. For example, we would expect a duck array to have a mean() method, but probably not a ptp() method. A library author who wants to use np.duckarray() needs to know, because she can't test with all existing and future duck array implementations. >> >>>>> >> >>>>> >> >>>>> I think this is covered in NEP-22 already. >> >>>> >> >>>> >> >>>> It's not really. We discussed this briefly in the community call today, Peter said he will try to add some text. >> >>>> >> >>>> We should not add new functions to NumPy without indicating who is supposed to use this, and what need it fills / problem it solves. It seems pretty clear to me that it's mostly aimed at library authors rather than end users. And also that mature libraries like SciPy may not immediately adopt it, because it's too fuzzy - so it's new libraries first, mature libraries after the dust has settled a bit (I think). >> >>> >> >>> >> >>> I totally agree -- we definitely should clarify this in the docstring and elsewhere in the docs. An example in the new doc page on "Writing custom array containers" (https://numpy.org/devdocs/user/basics.dispatch.html) would also probably be appropriate. >> >>> >> >>>>> >> >>>>> As discussed there, I don't think NumPy is in a good position to pronounce decisive APIs at this time. I would welcome efforts to try, but I don't think that's essential for now. >> >>>> >> >>>> >> >>>> There's no need to pronounce a decisive API that fully covers duck array. Note that RNumPy is an attempt in that direction (not a full one, but way better than nothing). In the NEP/docs, at least saying something along the lines of "if you implement this, we recommend the following strategy: check if a function is present in Dask, CuPy and Sparse. If so, it's reasonable to expect any duck array to work here. If not, we suggest you indicate in your docstring what kinds of duck arrays are accepted, or what properties they need to have". That's a spec by implementation, which is less than ideal but better than saying nothing. >> >>> >> >>> >> >>> OK, I agree here as well -- some guidance is better than nothing. >> >>> >> >>> Two other minor notes on this NEP, concerning naming: >> >>> 1. We should have a brief note on why we settled on the name "duck array". Namely, as discussed in NEP-22, we don't love the "duck" jargon, but we couldn't come up with anything better since NumPy already uses "array like" and "any array" for different purposes. >> >>> 2. The protocol should use *something* more clearly namespaced as NumPy specific than __duckarray__. All the other special protocols NumPy defines start with "__array_". 
That suggests either __array_duckarray__ (sounds a little redundant) or __numpy_duckarray__ (which I like the look of, but is a different from the existing protocols). >> >>> >> >> >> >> `__numpy_like__` ? >> > >> > >> > >> > This could work, but I think we would also want to rename the NumPy function itself to either np.like or np.numpy_like. The later is a little redundant but definitely more self-descriptive than "duck array". >> > >> >> >> >> Chuck >> >> _______________________________________________ >> >> NumPy-Discussion mailing list >> >> NumPy-Discussion at python.org >> >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at python.org >> > https://mail.python.org/mailman/listinfo/numpy-discussion >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion > > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion From peter at entschev.com Mon Sep 16 16:44:44 2019 From: peter at entschev.com (Peter Andreas Entschev) Date: Mon, 16 Sep 2019 22:44:44 +0200 Subject: [Numpy-discussion] NEP 30 - Duck Typing for NumPy Arrays - Implementation In-Reply-To: References: Message-ID: What would be the use case for a duck-array to implement __array__ and return a NumPy array? Unless I'm missing something, this seems redundant and one should just use array/asarray functions then. This would also prevent error-handling, what if the developer intentionally wants a NumPy-like array (e.g., the original array passed to the duckarray function) or an exception (instead of coercing to a NumPy array)? On Mon, Sep 16, 2019 at 9:25 PM Chris Barker wrote: > > > > On Mon, Aug 12, 2019 at 4:02 AM Peter Andreas Entschev wrote: >> >> Apologies for the late reply. I've opened a new PR >> https://github.com/numpy/numpy/pull/14257 with the changes requested > > > thanks! > > I've written a small PR on your PR: > > https://github.com/pentschev/numpy/pull/1 > > Essentially, other than typos and copy editing, I'm suggesting that a duck-array could choose to implement __array__ if it so chooses -- it should, of course, return an actual numpy array. > > I think this could be useful, as much code does require an actual numpy array, and only that class itself knows how best to convert to one. > > -CHB > > -- > > Christopher Barker, Ph.D. 
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R (206) 526-6959 voice
> 7600 Sand Point Way NE (206) 526-6329 fax
> Seattle, WA 98115 (206) 526-6317 main reception
>
> Chris.Barker at noaa.gov
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
From shoyer at gmail.com  Mon Sep 16 17:25:53 2019
From: shoyer at gmail.com (Stephan Hoyer)
Date: Mon, 16 Sep 2019 14:25:53 -0700
Subject: [Numpy-discussion] NEP 30 - Duck Typing for NumPy Arrays -
 Implementation
In-Reply-To: 
References: 
Message-ID: 

On Mon, Sep 16, 2019 at 1:45 PM Peter Andreas Entschev wrote:

> What would be the use case for a duck-array to implement __array__ and
> return a NumPy array? Unless I'm missing something, this seems
> redundant and one should just use array/asarray functions then. This
> would also prevent error-handling, what if the developer intentionally
> wants a NumPy-like array (e.g., the original array passed to the
> duckarray function) or an exception (instead of coercing to a NumPy
> array)?
>

Dask arrays are a good example. They will want to implement
__duck_array__ (or whatever we call it) because they support duck typed
versions of NumPy operations. They also (already) implement __array__, so
they can be converted into NumPy arrays as a fallback. This is convenient
for moderately sized dask arrays, e.g., so you can pass one into a
matplotlib function.

>
> On Mon, Sep 16, 2019 at 9:25 PM Chris Barker wrote:
> >
> >
> >
> > On Mon, Aug 12, 2019 at 4:02 AM Peter Andreas Entschev <
peter at entschev.com> wrote:
> >>
> >> Apologies for the late reply. I've opened a new PR
> >> https://github.com/numpy/numpy/pull/14257 with the changes requested
> >
> >
> > thanks!
> >
> > I've written a small PR on your PR:
> >
> > https://github.com/pentschev/numpy/pull/1
> >
> > Essentially, other than typos and copy editing, I'm suggesting that a
duck-array could choose to implement __array__ if it so chooses -- it
should, of course, return an actual numpy array.
> >
> > I think this could be useful, as much code does require an actual numpy
array, and only that class itself knows how best to convert to one.
> >
> > -CHB
> >
> > --
> >
> > Christopher Barker, Ph.D.
> > Oceanographer
> >
> > Emergency Response Division
> > NOAA/NOS/OR&R (206) 526-6959 voice
> > 7600 Sand Point Way NE (206) 526-6329 fax
> > Seattle, WA 98115 (206) 526-6317 main reception
> >
> > Chris.Barker at noaa.gov
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
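(A minimal sketch of that fallback, using a hypothetical container class:
any object defining __array__ can be coerced by np.asarray, which is what
array-consuming code such as matplotlib ultimately relies on:)

>>> import numpy as np
>>> class MyDuck:
...     def __init__(self, data):
...         self._data = list(data)  # stand-in for some non-NumPy payload
...     def __array__(self):
...         # fallback coercion: hand back a real ndarray
...         return np.asarray(self._data)
...
>>> np.asarray(MyDuck([1, 2, 3]))
array([1, 2, 3])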
From jh at physics.ucf.edu  Mon Sep 16 17:32:33 2019
From: jh at physics.ucf.edu (Joe Harrington)
Date: Mon, 16 Sep 2019 23:32:33 +0200
Subject: [Numpy-discussion] How to Capitalize numpy?
In-Reply-To: 
References: 
Message-ID: <8ce61707-b8be-253e-02d2-ff8ebeee2971@physics.ucf.edu>

Here are my thoughts on textual capitalization (at first, I thought you
wanted to raise money!):

We all agree that in code, it is "numpy". If you don't use that, it
throws an error.

If, in text, we keep "numpy" with a forced lower-case letter at the
start, it is just one more oddball to remember. It is even weirder in
titles and the beginnings of sentences. I'd strongly like not to be
weird that way. A few packages are, it's annoying, and it doesn't much
earn them any goodwill. The default among people who are not "in the
know" will be to do what they're used to. Let's give them what they're
used to, a proper noun with initial (at least) capital.

Likewise, I object to preferring a particular font. What fonts to use
for the names of things like software packages is a decision for
publications to make. A journal or manual might make fine distinctions
and demand several different, specific fonts, while a popular
publication might prefer not to do that. Leave the typesetting to the
editors of the publications. We can certainly adopt a standard for our
publications (docs, web pages, etc.), but we should state explicitly
that others can do as they like.

It's not an acronym, so that leaves the options of "Numpy" and "NumPy".
It would be great, easy to remember, consistent for others, etc., if
NumPy and SciPy were capitalized the same way and were pronounced the
same (I still occasionally hear "numpee"). So, I would favor "NumPy" to
go along with "SciPy", and let the context choose the font.

--jh--

On 9/16/19 9:09 PM, Chris Barker wrote:

Trivial note:

On the subject of naming things (spelling things??) -- should it be:

numpy
or
Numpy
or
NumPy
?

All three are in the draft NEP 30 ( mostly "NumPy", I noticed this when
reading/copy editing the NEP) . Is there an "official" capitalization?

My preference, would be to use "numpy", and where practicable, use a
"computer" font -- i.e. ``numpy`` in RST.

But if there is consensus already for anything else, that's fine, I'd
just like to know what it is.

-CHB

On Mon, Aug 12, 2019 at 4:02 AM Peter Andreas Entschev wrote:

Apologies for the late reply. I've opened a new PR
https://github.com/numpy/numpy/pull/14257 with the changes requested
on clarifying the text. After reading the detailed description, I've
decided to add a subsection "Scope" to clarify the scope where NEP-30
would be useful. I think the inclusion of this new subsection
complements the "Detail description" forming a complete text w.r.t.
motivation of the NEP, but feel free to point out disagreements with
my suggestion. I've also added a new section "Usage" pointing out how
one would use duck array in replacement to np.asarray where relevant.

Regarding the naming discussion, I must say I like the idea of keeping
the __array_ prefix, but it seems like that is going to be difficult
given that none of the existing ideas so far play very nicely with
that. So if the general consensus is to go with __numpy_like__, I
would also update the NEP to reflect that changes. FWIW, I
particularly neither like nor dislike __numpy_like__, but I don't have
any better suggestions than that or keeping the current naming.

Best,
Peter

On Thu, Aug 8, 2019 at 3:40 AM Stephan Hoyer wrote:
>
>
>
> On Wed, Aug 7, 2019 at 6:18 PM Charles R Harris wrote:
>>
>>
>>
>> On Wed, Aug 7, 2019 at 7:10 PM Stephan Hoyer wrote:
>>>
>>> On Wed, Aug 7, 2019 at 5:11 PM Ralf Gommers wrote:
>>>>
>>>>
>>>> On Mon, Aug 5, 2019 at 6:18 PM Stephan Hoyer wrote:
>>>>>
>>>>> On Mon, Aug 5, 2019 at 2:48 PM Ralf Gommers wrote:
>>>>>
>>>>>>
>>>>>> The NEP currently does not say who this is meant for. Would you expect libraries like SciPy to adopt it for example?
>>>>>> >>>>>> The NEP also (understandably) punts on the question of when something is a valid duck array. If you want this to be widely used, that will need an answer or at least some rough guidance though. For example, we would expect a duck array to have a mean() method, but probably not a ptp() method. A library author who wants to use np.duckarray() needs to know, because she can't test with all existing and future duck array implementations. >>>>> >>>>> >>>>> I think this is covered in NEP-22 already. >>>> >>>> >>>> It's not really. We discussed this briefly in the community call today, Peter said he will try to add some text. >>>> >>>> We should not add new functions to NumPy without indicating who is supposed to use this, and what need it fills / problem it solves. It seems pretty clear to me that it's mostly aimed at library authors rather than end users. And also that mature libraries like SciPy may not immediately adopt it, because it's too fuzzy - so it's new libraries first, mature libraries after the dust has settled a bit (I think). >>> >>> >>> I totally agree -- we definitely should clarify this in the docstring and elsewhere in the docs. An example in the new doc page on "Writing custom array containers" (https://numpy.org/devdocs/user/basics.dispatch.html) would also probably be appropriate. >>> >>>>> >>>>> As discussed there, I don't think NumPy is in a good position to pronounce decisive APIs at this time. I would welcome efforts to try, but I don't think that's essential for now. >>>> >>>> >>>> There's no need to pronounce a decisive API that fully covers duck array. Note that RNumPy is an attempt in that direction (not a full one, but way better than nothing). In the NEP/docs, at least saying something along the lines of "if you implement this, we recommend the following strategy: check if a function is present in Dask, CuPy and Sparse. If so, it's reasonable to expect any duck array to work here. If not, we suggest you indicate in your docstring what kinds of duck arrays are accepted, or what properties they need to have". That's a spec by implementation, which is less than ideal but better than saying nothing. >>> >>> >>> OK, I agree here as well -- some guidance is better than nothing. >>> >>> Two other minor notes on this NEP, concerning naming: >>> 1. We should have a brief note on why we settled on the name "duck array". Namely, as discussed in NEP-22, we don't love the "duck" jargon, but we couldn't come up with anything better since NumPy already uses "array like" and "any array" for different purposes. >>> 2. The protocol should use *something* more clearly namespaced as NumPy specific than __duckarray__. All the other special protocols NumPy defines start with "__array_". That suggests either __array_duckarray__ (sounds a little redundant) or __numpy_duckarray__ (which I like the look of, but is a different from the existing protocols). >>> >> >> `__numpy_like__` ? > > > > This could work, but I think we would also want to rename the NumPy function itself to either np.like or np.numpy_like. The later is a little redundant but definitely more self-descriptive than "duck array". 
>> Chuck

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From ralf.gommers at gmail.com Mon Sep 16 17:40:10 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Mon, 16 Sep 2019 14:40:10 -0700
Subject: [Numpy-discussion] How to Capitalize numpy?
In-Reply-To:
References:
Message-ID:

On Mon, Sep 16, 2019 at 1:42 PM Peter Andreas Entschev wrote:

> My answer to that: "NumPy". Reference: logo at the top of
> https://numpy.org/neps/index.html .

Yes, NumPy is the right capitalization.

> In NEP-30 [1], I've used "NumPy" everywhere, except for references to
> code, repos, etc., where "numpy" is used. I see there's one occurrence
> of "Numpy", which was definitely a typo and I had not noticed it until
> now, but I will address this in a future update, thanks for pointing
> that out.
>
> [1] https://numpy.org/neps/nep-0030-duck-array-protocol.html

From chris.barker at noaa.gov Mon Sep 16 17:44:46 2019
From: chris.barker at noaa.gov (Chris Barker)
Date: Mon, 16 Sep 2019 14:44:46 -0700
Subject: [Numpy-discussion] How to Capitalize numpy?
In-Reply-To:
References:
Message-ID:

got it, thanks. I've fixed that typo in a PR I'm working on, too.

-CHB

On Mon, Sep 16, 2019 at 2:41 PM Ralf Gommers wrote:

> On Mon, Sep 16, 2019 at 1:42 PM Peter Andreas Entschev wrote:
>> My answer to that: "NumPy". Reference: logo at the top of
>> https://numpy.org/neps/index.html .
>
> Yes, NumPy is the right capitalization.

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From chris.barker at noaa.gov Mon Sep 16 17:46:44 2019
From: chris.barker at noaa.gov (Chris Barker)
Date: Mon, 16 Sep 2019 14:46:44 -0700
Subject: [Numpy-discussion] How to Capitalize numpy?
In-Reply-To: <8ce61707-b8be-253e-02d2-ff8ebeee2971@physics.ucf.edu>
References: <8ce61707-b8be-253e-02d2-ff8ebeee2971@physics.ucf.edu>
Message-ID:

Thanks Joe, looks like everyone agrees: In text, NumPy it is.

-CHB

On Mon, Sep 16, 2019 at 2:41 PM Joe Harrington wrote:

> Here are my thoughts on textual capitalization (at first, I thought you
> wanted to raise money!):
>
> We all agree that in code, it is "numpy". If you don't use that, it
> throws an error.
>
> It's not an acronym, so that leaves the options of "Numpy" and "NumPy".
> It would be great, easy to remember, consistent for others, etc., if NumPy
> and SciPy were capitalized the same way and were pronounced the same (I
> still occasionally hear "numpee"). So, I would favor "NumPy" to go along
> with "SciPy", and let the context choose the font.
>
> --jh--

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From chris.barker at noaa.gov Mon Sep 16 17:55:20 2019
From: chris.barker at noaa.gov (Chris Barker)
Date: Mon, 16 Sep 2019 14:55:20 -0700
Subject: [Numpy-discussion] NEP 30 - Duck Typing for NumPy Arrays - Implementation
In-Reply-To:
References:
Message-ID:

On Mon, Sep 16, 2019 at 1:46 PM Peter Andreas Entschev wrote:

> What would be the use case for a duck-array to implement __array__ and
> return a NumPy array?

Some users need a genuine, actual numpy array (for passing to Cython code, for example). If __array__ is not implemented, how can they get that from an array-like object? Only the author of the array-like object knows how best to make a numpy array out of it.

> Unless I'm missing something, this seems
> redundant and one should just use array/asarray functions then.

But if the object does not implement __array__, then users can't use the array/asarray functions!
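To make that concrete, here is a minimal sketch (the Wrapper and NoArray classes are hypothetical, made up for illustration):

```python
import numpy as np

class Wrapper:
    """Hypothetical array-like holding a plain list."""
    def __init__(self, data):
        self._data = data

    def __array__(self, dtype=None):
        # Only the author knows how best to turn this into an ndarray.
        return np.asarray(self._data, dtype=dtype)

class NoArray:
    """Hypothetical object without __array__."""

print(np.asarray(Wrapper([1.0, 2.0])))  # [1. 2.] -- built via __array__
print(np.asarray(NoArray()).dtype)      # object -- a useless 0-d object array
```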
> This
> would also prevent error-handling, what if the developer intentionally
> wants a NumPy-like array (e.g., the original array passed to the
> duckarray function) or an exception (instead of coercing to a NumPy
> array)?

I'm really confused now -- if an end-user wants a duck array, they should call duckarray() -- if they want an actual numpy array, they should call np.asarray(). Why would anyone want an Exception? If you don't want an array, then don't call asarray().

If you call duckarray(), and the object has not implemented __duckarray__, then you will get an exception -- which you should. If you call __array__, and __array__ has not been implemented, then you will get an exception.

What is the potential problem here?

Which makes me think -- why should duck arrays ever implement an __array__ method that raises an Exception? Why not just not implement it? (unless you want to add some helpful error message -- which I did for the example in my PR (PR to the numpy repo in progress))

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From chris.barker at noaa.gov Mon Sep 16 18:11:34 2019
From: chris.barker at noaa.gov (Chris Barker)
Date: Mon, 16 Sep 2019 15:11:34 -0700
Subject: [Numpy-discussion] NEP 30 - Duck Typing for NumPy Arrays - Implementation
In-Reply-To:
References:
Message-ID:

On Mon, Sep 16, 2019 at 2:27 PM Stephan Hoyer wrote:

> On Mon, Sep 16, 2019 at 1:45 PM Peter Andreas Entschev wrote:
>
>> What would be the use case for a duck-array to implement __array__ and
>> return a NumPy array?
>
> Dask arrays are a good example. They will want to implement __duck_array__ (or whatever we call it) because they support duck typed versions of NumPy operations. They also (already) implement __array__, so they can be converted into NumPy arrays as a fallback. This is convenient for moderately sized dask arrays, e.g., so you can pass one into a matplotlib function.

Exactly.

And I have implemented __array__ in classes that are NOT duck arrays at all (an image class, for instance). But I also can see wanting to support both:

use me as a duck array
and
convert me into a proper numpy array.

OK -- looking again at the NEP, I see this suggested implementation:

    def duckarray(array_like):
        if hasattr(array_like, '__duckarray__'):
            return array_like.__duckarray__()
        return np.asarray(array_like)

So I see the point now: if a user wants a duck array, they may not want to accidentally coerce this object to a real array (potentially expensive).

But in this case, asarray() will only get called (and thus __array__ will only get called) if __duckarray__ is not implemented. So the only reason to implement __array__ and raise an Exception is so that users will get that exception if they specifically call asarray() -- why should they get that?
I'm working on a PR with a suggestion for this.

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From chris.barker at noaa.gov Mon Sep 16 18:23:11 2019
From: chris.barker at noaa.gov (Chris Barker)
Date: Mon, 16 Sep 2019 15:23:11 -0700
Subject: [Numpy-discussion] NEP 30 - Duck Typing for NumPy Arrays - Implementation
In-Reply-To:
References:
Message-ID:

OK -- I *finally* got it:

when you pass an arbitrary object into np.asarray(), it will create an array object scalar with the object in it.

So yes, I can see that you may want to raise a TypeError instead, so that users don't get an object array scalar when they were expecting to get an array-like object.

So it's probably a good idea to recommend that when a class implements __duckarray__ that it also implements __array__, which can either raise an exception or return an ndarray.

-CHB

On Mon, Sep 16, 2019 at 3:11 PM Chris Barker wrote:

> OK -- looking again at the NEP, I see this suggested implementation:
>
>     def duckarray(array_like):
>         if hasattr(array_like, '__duckarray__'):
>             return array_like.__duckarray__()
>         return np.asarray(array_like)
>
> I'm working on a PR with a suggestion for this.

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From chris.barker at noaa.gov Mon Sep 16 18:38:53 2019
From: chris.barker at noaa.gov (Chris Barker)
Date: Mon, 16 Sep 2019 15:38:53 -0700
Subject: [Numpy-discussion] NEP 30 - Duck Typing for NumPy Arrays - Implementation
In-Reply-To:
References:
Message-ID:

Here's a PR with a different discussion of __array__:

https://github.com/numpy/numpy/pull/14529

-CHB

On Mon, Sep 16, 2019 at 3:23 PM Chris Barker wrote:

> OK -- I *finally* got it:
>
> when you pass an arbitrary object into np.asarray(), it will create an
> array object scalar with the object in it.
> So yes, I can see that you may want to raise a TypeError instead, so that
> users don't get an object array scalar when they were expecting to get an
> array-like object.
>
> So it's probably a good idea to recommend that when a class implements
> __duckarray__ that it also implements __array__, which can either raise an
> exception or return an ndarray.
>
> -CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From pankaj.jangid at gmail.com Mon Sep 16 20:29:30 2019
From: pankaj.jangid at gmail.com (Pankaj Jangid)
Date: Tue, 17 Sep 2019 05:59:30 +0530
Subject: [Numpy-discussion] How to Capitalize numpy?
In-Reply-To: <8ce61707-b8be-253e-02d2-ff8ebeee2971@physics.ucf.edu> (Joe Harrington's message of "Mon, 16 Sep 2019 23:32:33 +0200")
References: <8ce61707-b8be-253e-02d2-ff8ebeee2971@physics.ucf.edu>
Message-ID:

Joe Harrington writes:

> It's not an acronym, so that leaves the options of "Numpy" and
> "NumPy".
> It would be great, easy to remember, consistent for others,
> etc., if NumPy and SciPy were capitalized the same way and were
> pronounced the same (I still occasionally hear "numpee"). So, I would
> favor "NumPy" to go along with "SciPy", and let the context choose the
> font.

"NumPy" is perfect capitalization. It looks beautiful in pure text.

For programming, "numpy" is good. Most of the time I import it as "np".

--
Regards,
Pankaj Jangid

From poh.zijie at gmail.com Tue Sep 17 00:14:53 2019
From: poh.zijie at gmail.com (Zijie Poh)
Date: Mon, 16 Sep 2019 21:14:53 -0700
Subject: [Numpy-discussion] keeping all meeting notes and docs in new repo?
In-Reply-To:
References:
Message-ID:

Hi all,

I like the idea of having a new repo containing meeting minutes and docs.

Regards,
ZJ

On Sun, Sep 15, 2019 at 5:26 PM Ralf Gommers wrote:

> Hi all,
>
> We have had community calls for quite a while, the minutes of which are
> kept in https://github.com/BIDS-numpy/docs. That's quite hard to
> discover, it would be better if those lived under the NumPy GitHub org.
> Also, we have minutes from Season of Docs and website redesign calls, plus
> occasionally some other docs (e.g. the roadmap drafts were on hackmd.io).
>
> Would it make sense to add a new repo to contain all such meeting minutes
> and docs? Presentations and proposals may make sense to add as well -
> several people have given presentations or submitted proposals on behalf of
> the project.
>
> Inessa also suggested to enable HackMD Hub (see
> https://hackmd.io/c/tutorials/%2Fs%2Flink-with-github) so we get
> automatic versioning for some HackMD documents. I haven't used it before,
> but it looks good.
>
> Thoughts?
>
> Cheers,
> Ralf

From peter at entschev.com Tue Sep 17 09:54:21 2019
From: peter at entschev.com (Peter Andreas Entschev)
Date: Tue, 17 Sep 2019 15:54:21 +0200
Subject: [Numpy-discussion] NEP 30 - Duck Typing for NumPy Arrays - Implementation
In-Reply-To:
References:
Message-ID:

I see what you mean now. It was my misunderstanding, I thought you wanted to return a call to __array__ when you call np.duckarray. I agree with your point and understand how the current text may be misleading, so we shall make it clearer in the NEP (as done in https://github.com/numpy/numpy/pull/14529) that both are valid ways:

* Have a genuine implementation of __array__ (like Dask, as pointed out by Stephan); or
* Raise an exception (as CuPy does).

Thanks for opening the PR, I will comment there as well.

From sebastian at sipsolutions.net Tue Sep 17 15:04:51 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 17 Sep 2019 12:04:51 -0700
Subject: [Numpy-discussion] NumPy Community Meeting Wednesday, Sep. 18
Message-ID: <6fe2a07608d82ed48434668fc54677ef84d05bba.camel@sipsolutions.net>

Hi all,

There will be a NumPy Community meeting Wednesday September 18 at 11 am Pacific Time. Everyone is invited to join in and edit the work-in-progress meeting topics and notes:

https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg?both

Best wishes

Sebastian
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From chris.barker at noaa.gov Tue Sep 17 18:04:01 2019 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 17 Sep 2019 15:04:01 -0700 Subject: [Numpy-discussion] NEP 30 - Duck Typing for NumPy Arrays - Implementation In-Reply-To: References: Message-ID: On Tue, Sep 17, 2019 at 6:56 AM Peter Andreas Entschev wrote: > I agree with your point and understand how the current text may be > misleading, so we shall make it clearer in the NEP (as done in > https://github.com/numpy/numpy/pull/14529) that both are valid ways: > > * Have a genuine implementation of __array__ (like Dask, as pointed > out by Stephan); or > * Raise an exception (as CuPy does). > great -- sounds like we're all (well three of us anyway) are on teh same page. Just need to sort out the text. -CHB > > Thanks for opening the PR, I will comment there as well. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanv at berkeley.edu Tue Sep 17 18:11:37 2019 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Tue, 17 Sep 2019 15:11:37 -0700 Subject: [Numpy-discussion] keeping all meeting notes and docs in new repo? In-Reply-To: References: Message-ID: <16d4147f1a8.27ae.acf34a9c767d7bb498a799333be0433e@fastmail.com> Yes, that would make sense. The notes are there because of historic reasons, the meetings having originated as BIDS updates to the community, but by now it is much more open and community driven (thanks all!) so +1. St?fan On September 16, 2019 21:16:18 Zijie Poh wrote: > Hi all, > > > I like the idea of having a new repo containing meeting minutes and docs. > > > Regards, > ZJ > > > On Sun, Sep 15, 2019 at 5:26 PM Ralf Gommers wrote: > > Hi all, > > > We have had community calls for quite a while, the minutes of which are > kept in https://github.com/BIDS-numpy/docs. That's quite hard to discover, > it would be better if those lived under the NumPy GitHub org. Also, we have > minutes from Season of Docs and website redesign calls, plus occasionally > some other docs (e.g. the roadmap drafts were on hackmd.io). > > > Would it make sense to add a new repo to contain all such meeting minutes > and docs? Presentations and proposals may make sense to add as well - > several people have given presentations or submitted proposals on behalf of > the project. > > > Inessa also suggested to enable HackMD Hub (see > https://hackmd.io/c/tutorials/%2Fs%2Flink-with-github) so we get automatic > versioning for some HackMD documents. I haven't used it before, but it > looks good. > > > > Thoughts? > > > > Cheers, > > Ralf > > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... 
From albuscode at gmail.com Wed Sep 18 12:51:45 2019
From: albuscode at gmail.com (Inessa Pawson)
Date: Wed, 18 Sep 2019 12:51:45 -0400
Subject: [Numpy-discussion] User Stories for https://numpy.org
Message-ID:

The NumPy web team has begun redesigning https://numpy.org, determined to transform the website into a welcoming and useful digital hub of all things NumPy. We are inviting all members of our large and diverse community to submit their user stories to help us fulfill our mission.

*What are we looking for?*
In simple, concise terms, a user story describes what a user needs to accomplish while visiting a website. Anyone who reads the user story must be able to understand why the user needs the functionality, and what is required to implement the story. User stories must have acceptance criteria. The shorter the story the better.

*Examples of good user stories*
1. Lotte is a library author who depends on NumPy. She is looking for information about major changes and a release date of the next version of NumPy. She would like to easily find it on the website instead of contacting the core team.
2. Yu Yan was introduced to NumPy in her first week of the Foundations of Data Science class. She is looking for a NumPy tutorial for absolute beginners in Mandarin.
3. Tiago is a software developer. By day, he builds enterprise applications for a Fortune 100 company. By night, he cultivates his academic interests in statistics and computer science using various Python libraries. Tiago has an idea for a new NumPy feature and would like to implement it. He is looking for information on how to contact the person(s) in charge of such decisions.

*Please note* that at this stage of the numpy.org redesign our focus is not on expanding or improving the documentation but, rather, developing high-level content to provide information about the project to a multitude of stakeholders.

--
Every good wish,
*Inessa Pawson*
NumPy Web Team

From sebastian at sipsolutions.net Wed Sep 18 19:34:49 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 18 Sep 2019 16:34:49 -0700
Subject: [Numpy-discussion] DType Roadmap/NEP Discussion
Message-ID: <5f00290851aa9418215c4bd0fea3378bf94dcc79.camel@sipsolutions.net>

Hi all,

to try and make some progress towards a decision since the broad design is pretty much settling from my side, I am thinking about making a meeting, and suggest Monday at 11am Pacific Time (I am open to other times though).

My hope is to get everyone interested on board, so that we can make an informed decision about the general direction very soon. So just reach out, or discuss on the mailing list as well.

The current draft for an NEP is here:
https://hackmd.io/kxuh15QGSjueEKft5SaMug?both

There are some design goals that I would like to clear up. I would prefer to avoid deep discussions of some specific issues, since I think the important decision right now is whether my general start is in the right direction. It is not an easy topic, so my plan would be to try and briefly summarize that, then hopefully clarify any questions, and then we can discuss why alternatives are rejected. The most important thing is maybe gathering concerns which need to be clarified before we can go towards accepting the general design ideas.

The main point of the NEP draft is actually captured by the picture in the linked document: DTypes are classes (such as Float64) and what is attached to the array is an instance of that class, "float64". Additionally, we would have AbstractDType classes which cannot be instantiated but define a type hierarchy.

To list the main points:

* DTypes are classes (corresponding to the current type number)
* `arr.dtype` is an instance of its class, allowing it to store additional information such as a physical unit or the string length.
* Most things are defined in special dtype slots similar to Python's type and number slots. They will be hidden and can be set through an init function similar to `PyType_FromSpec` [1].
* Promotion is defined primarily on the DType classes
* Casting from one DType to another DType is defined by a new CastingImpl object (should become a special ufunc) - e.g. for strings, the CastingImpl is in charge of finding the correct string length
* The AbstractDType hierarchy will be used to decide the signature when calling UFuncs.
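To make the class-vs-instance points above concrete, a rough Python sketch (all names here are illustrative only, nothing in it is proposed API):

```python
class DType:
    """Stand-in for the new dtype base: what is a type number today."""

class Float64(DType):
    itemsize = 8

class String(DType):
    def __init__(self, length):
        # Parametric instance state, like today's "S5":
        self.length = length

f64 = Float64()   # arr.dtype would be an *instance* of its DType class
s5 = String(5)    # the class is just String; the length lives on the instance

assert isinstance(s5, String) and issubclass(String, DType)
```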
The main iffier points I can think of are:

* NumPy currently uses value based promotion in some cases, which requires special AbstractDTypes to describe (and some legacy paths). (They are used more like instances than typical classes)
* Casting between flexible dtypes (such as strings) is a multi-step process to figure out the actual output dtype.
  - An example is: `np.can_cast("float64", "S3")` first finding that `Float64->String` is possible in principle and then asking the CastingImpl to find that `float64->S3` is not.
* We have to break ABI compatibility in a very minor, back-portable way. More smaller incompatibilities are likely [2].
* Since it is a major redesign, a lot of code has to be added/touched, although it is possible to channel much of it back into the old machinery.
* A largish amount of new API around new DType type objects and also DTypeMeta type objects, which users can (although usually do not have to) subclass.

However, most other designs will have similar issues. Basically, I currently really think this is "right", even if some details may end up tricky.

Best,

Sebastian

PS: The one thing outside the more general list above that I may want to discuss is how acceptable a global dict/mapping for dtype discovery during `np.array` coercion is (mapping python type -> dtype)...

[1] https://docs.python.org/3/c-api/type.html#c.PyType_FromSpec

[2] One possible issue may be "S0" which is normally used to denote what in the new API would be the `String` DType class.
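For reference, the two-step casting example above corresponds to behavior today's NumPy already exhibits; a small demo, assuming the current default 'safe' casting rules:

```python
import numpy as np

# A float64 -> string cast is possible in principle, but only for string
# instances long enough to hold the value (a float64 repr needs 32 chars):
print(np.can_cast(np.float64, "S3"))   # False: "S3" is too short
print(np.can_cast(np.float64, "S32"))  # True: long enough, so the cast is allowed
```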
From ralf.gommers at gmail.com Thu Sep 19 00:33:56 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Wed, 18 Sep 2019 21:33:56 -0700
Subject: [Numpy-discussion] DType Roadmap/NEP Discussion
In-Reply-To: <5f00290851aa9418215c4bd0fea3378bf94dcc79.camel@sipsolutions.net>
References: <5f00290851aa9418215c4bd0fea3378bf94dcc79.camel@sipsolutions.net>
Message-ID:

Hi Sebastian,

On Wed, Sep 18, 2019 at 4:35 PM Sebastian Berg wrote:

> Hi all,
>
> to try and make some progress towards a decision since the broad design
> is pretty much settling from my side, I am thinking about making a
> meeting, and suggest Monday at 11am Pacific Time (I am open to other
> times though).
>
> The current draft for an NEP is here:
> https://hackmd.io/kxuh15QGSjueEKft5SaMug?both
>
> There are some design goals that I would like to clear up.

The design itself seems very sensible to me insofar as I understand it. After having read your document again, I think you're still missing the actual goals though. "Structure of class layout" and "type hierarchy" are important, but they're not the goals. You're touching on the real goals in places, but it may be valuable to be much more explicit there. Here are some example goals:

1. Make creating new dtypes via the NumPy C API take >4x fewer lines of code on average (in practice: for rational/quaternion, hard to measure otherwise).
2. Make it possible to create new dtypes with full functionality via the NumPy Python API. Performance may be up to 1-2 orders of magnitude worse than when creating the same dtype via the C API; the main purpose is to allow easier prototyping of new dtypes.
3. Make the NumPy codebase more maintainable by removing special-casing of datetime dtypes in many places.
4. Enable creation of a units library whose arrays *are* numpy arrays rather than a subclass or duck array. This will make such a library work much better with SciPy and other existing libraries that use np.asarray extensively.
5. Hide currently exposed implementation details in the C API so long-term .... (you have this one, but it would be nice to work it out a little more - after all we recently considered reverting the deprecation for direct field access, so how important is this?)
6. Improve casting behavior for external dtypes
7. Make np.char behavior better (you mention fixed length strings work poorly now, but not what would change)

Listing non-goals would also be useful:

1. Performance: no significant performance improvements are expected. We aim for no performance regressions.
2. Introducing new dtypes into NumPy itself
3. Pandas ExtensionArrays? You mention them, but does this dtype redesign help Pandas in any way or not?
4. Changes to NumPy's current casting rules
5. Allow creation of dtypes that don't fit the current NumPy model of what a dtype is (e.g. ref [1]), such as a variable-length string dtype.

Many of those (and there can be more, this is just what came to mind now) can/should be a paragraph or section. In my experience describing these goals and requirements well takes about 15-30% of the length of the design description. Think of, for example, a Pandas or units library maintainer reading this: they should be able to stop reading at where you now have "Overview Graphic" and have a pretty clear high-level understanding of what this whole redesign will mean for them. Same for a NumPy maintainer who wants to get a sense of what the benefits and impacts will be: reading only (the expanded version of) your Abstract, Motivation and Scope, and Backwards Compatibility sections should be enough.

Here's a concrete question, that's the type of thing I'd like to understand without having to understand the whole design in detail:

```
>>> import datetime
>>> import numpy as np
>>> import pandas as pd
>>> dti = pd.to_datetime(['1/1/2018', np.datetime64('2018-01-01'),
...                       datetime.datetime(2018, 1, 1)])
>>> dti.values
array(['2018-01-01T00:00:00.000000000', '2018-01-01T00:00:00.000000000',
       '2018-01-01T00:00:00.000000000'], dtype='datetime64[ns]')
>>> dti.values.dtype
dtype('<M8[ns]')
>>> isinstance(dti.values.dtype, np.dtype)
True
>>> dti.dtype == dti.values.dtype  # okay, that's nice
True
>>> start = pd.to_datetime('2015-02-24')
>>> rng = pd.date_range(start, periods=3)
>>> t = pd.Series(rng)
>>> t_withzone = t.dt.tz_localize('UTC').dt.tz_convert('Asia/Kolkata')
>>> t_withzone
0   2015-02-24 05:30:00+05:30
1   2015-02-25 05:30:00+05:30
2   2015-02-26 05:30:00+05:30
dtype: datetime64[ns, Asia/Kolkata]
>>> t_withzone.dtype
datetime64[ns, Asia/Kolkata]
>>> t_withzone.values.dtype
dtype('<M8[ns]')
>>> t_withzone.dtype == t_withzone.values.dtype  # could this be True in the future?
False
```

So can Pandas create timezone-aware numpy dtypes in the future if they want to, or would they still be better off rolling their own?

Also one question/comment about the design content. When looking at the current external dtypes (e.g. [2]), a large part of the work of implementing a new dtype now deals with ufunc behavior. It's not clear from your document how that changes with the new design, can you add something about that?

Cheers,
Ralf

[1] http://scipy-lectures.org/advanced/advanced_numpy/index.html#the-descriptor
[2] https://github.com/moble/quaternion/blob/master/numpy_quaternion.c
From ralf.gommers at gmail.com Thu Sep 19 05:22:36 2019
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Thu, 19 Sep 2019 11:22:36 +0200
Subject: [Numpy-discussion] Low-level API for Random
In-Reply-To:
References:
Message-ID:

On Thu, Sep 19, 2019 at 10:28 AM Kevin Sheppard wrote:

> There are some users of the NumPy C code in randomkit. This was never
> officially supported. There has been a long-open issue to provide this
> officially.
>
> When I wrote randomgen I supplied .pxd files that make it simpler to write
> Cython code that uses the components. The lower-level API has not had much
> scrutiny and is in need of a clean-up. I thought this would also
> encourage users to extend the random machinery themselves as part of their
> project or code so as to minimize the requests for new (exotic)
> distributions to be included in Generator.
>
> Most of the generator functions follow the pattern random_DISTRIBUTION.
> Some have a bit more name mangling which can easily be cleaned up (like
> random_gauss_zig, which should become PREFIX_standard_normal).
>
> Ralf Gommers suggested unprefixed names.

I suggested that the names should match the Python API, which I think
isn't quite the same. The Python API doesn't contain things like "gamma",
"t" or "f".

> I tried this in a local branch and it was a bit ugly since some of the
> distributions have common math names (e.g., gamma) and others are very
> short (e.g., t or f). I think a prefix is needed, and after looking
> through the C API docs npy_random_ seemed like a reasonable choice (since
> these live in numpy.random).
>
> Any thoughts on the following questions are welcome (others too):
>
> 1. Should there be a prefix on the C functions?
> 2. If so, what should the prefix be?

Before worrying about naming details, can we start with "what should be
in the C/Cython API"? If I look through the current pxd files, there's a
lot there that looks like it should be private, and what we expose as
Python API is not all present as far as I can tell (which may be fine, if
the only goal is to let people write new generators rather than use the
existing ones from Cython without the Python overhead).

In the end we want to get to a doc section similar to
http://scipy.github.io/devdocs/special.cython_special.html I'd think.

> 3. Should the legacy C functions be part of the API -- these are mostly the
> ones that produce or depend on polar transform normals (Box-Muller). I have
> a feeling no, but there may be reasons to prefer BM since they do not
> depend on rejection sampling.

Even if there would be a couple of users interested, it would be odd
starting to do this after deeming the code "legacy". So I agree with your
"no".

> 4. Should the low-level API be consumable like any other NumPy C API, by
> including the usual header locations and library locations? Right now, the
> pxd simplifies writing Cython, but users have to specify the location of
> the headers and source manually. An alternative would be to provide a
> function like np.get_include() -> np.random.get_include() that would
> specialize in random.

Good question. I'm not sure this is "like any other NumPy C API".
We don't provide a C API for fft, linalg or other functionality further
from core either. It's possible of course, but does it really help
library authors or end users?

Cheers,
Ralf

From kevin.k.sheppard at gmail.com Thu Sep 19 06:40:38 2019
From: kevin.k.sheppard at gmail.com (Kevin Sheppard)
Date: Thu, 19 Sep 2019 11:40:38 +0100
Subject: [Numpy-discussion] Low-level API for Random
In-Reply-To:
References:
Message-ID:

On Thu, Sep 19, 2019 at 10:23 AM Ralf Gommers wrote:

> On Thu, Sep 19, 2019 at 10:28 AM Kevin Sheppard <
> kevin.k.sheppard at gmail.com> wrote:
>
>> There are some users of the NumPy C code in randomkit. This was never
>> officially supported. There has been a long-open issue to provide this
>> officially.
>>
>> When I wrote randomgen I supplied .pxd files that make it simpler to
>> write Cython code that uses the components. The lower-level API has not
>> had much scrutiny and is in need of a clean-up. I thought this would also
>> encourage users to extend the random machinery themselves as part of their
>> project or code so as to minimize the requests for new (exotic)
>> distributions to be included in Generator.
>>
>> Most of the generator functions follow the pattern random_DISTRIBUTION.
>> Some have a bit more name mangling which can easily be cleaned up (like
>> random_gauss_zig, which should become PREFIX_standard_normal).
>>
>> Ralf Gommers suggested unprefixed names.
>
> I suggested that the names should match the Python API, which I think
> isn't quite the same. The Python API doesn't contain things like "gamma",
> "t" or "f".

By gamma and f (I misspoke about t) I mean the names that appear as
Generator methods:

https://docs.scipy.org/doc/numpy/reference/random/generator.html#numpy.random.Generator

If I understand your point (and with reference to the page linked below),
then there would be something like numpy.random.cython_random.gamma
(which is currently called numpy.random.distributions.random_gamma).
Maybe I'm not understanding your point about the Python API though.

>> I tried this in a local branch and it was a bit ugly since some of the
>> distributions have common math names (e.g., gamma) and others are very
>> short (e.g., t or f). I think a prefix is needed, and after looking
>> through the C API docs npy_random_ seemed like a reasonable choice (since
>> these live in numpy.random).
>>
>> Any thoughts on the following questions are welcome (others too):
>>
>> 1. Should there be a prefix on the C functions?
>> 2. If so, what should the prefix be?
>
> Before worrying about naming details, can we start with "what should be
> in the C/Cython API"? If I look through the current pxd files, there's a
> lot there that looks like it should be private, and what we expose as
> Python API is not all present as far as I can tell (which may be fine, if
> the only goal is to let people write new generators rather than use the
> existing ones from Cython without the Python overhead).

From the ground up, for someone who wants to write a new distribution:

1. The bit generators. These currently have no pxd files. These are
always going to be Python objects, and so it isn't absolutely essential
to expose them with a low-level API. All that is needed is the capsule,
which holds the bitgen struct - and that struct is what is really needed.
2. bitgen_t, which is in common.pxd. This is essential since it enables
access to the callables that produce basic pseudo-random values.
3. The distributions, which are in distributions.pxd. The integer
generators are in bounded_integers.pxd.in, which would need to be
processed and then included after processing (same for
bounded_integers.pyx.in).
   a. The legacy distributions, in legacy_distributions.pxd. If the
legacy is included, then aug_bitgen_t needs to also be included, which
is also in legacy_distributions.pxd.
4. The "helpers" which are defined in common.pxd. These simplify
implementing complete distributions which support automatic broadcasting
when needed. They are only provided to match the signatures for the
functions in distributions.pxd. The highest-level ones are cont() and
disc(). Some of the lower-level ones could easily be marked as private.

1, 2 and 3 are pretty important. 4 could be in or out. It could help if
someone wanted to write a fully featured distribution w/ broadcasting,
but I think this use case is less likely than someone, say, wanting to
implement a custom rejection sampler.

For someone who wants to write a new BitGenerator:

1. BitGenerator and SeedSequence in bit_generator.pxd are required. As
is bitgen_t, which is in common.pxd; bitgen_t should probably move to
bit_generators.
2. aligned_malloc: This has been requested on multiple occasions and is
practically important when interfacing with SSE or AVX code. It is
potentially more general than the random module. This lives in
common.pxd.

> In the end we want to get to a doc section similar to
> http://scipy.github.io/devdocs/special.cython_special.html I'd think.
>
>> 3. Should the legacy C functions be part of the API -- these are mostly
>> the ones that produce or depend on polar transform normals (Box-Muller). I
>> have a feeling no, but there may be reasons to prefer BM since they do not
>> depend on rejection sampling.
>
> Even if there would be a couple of users interested, it would be odd
> starting to do this after deeming the code "legacy". So I agree with your
> "no".
>
>> 4. Should the low-level API be consumable like any other NumPy C API, by
>> including the usual header locations and library locations? Right now, the
>> pxd simplifies writing Cython, but users have to specify the location of
>> the headers and source manually. An alternative would be to provide a
>> function like np.get_include() -> np.random.get_include() that would
>> specialize in random.
>
> Good question. I'm not sure this is "like any other NumPy C API". We don't
> provide a C API for fft, linalg or other functionality further from core
> either. It's possible of course, but does it really help library authors
> or end users?

SciPy provides a very useful Cython API to low-level linalg. But there is
little reason to provide C APIs to fft or linalg since they are all
directly available. The code in numpy.random is, AFAICT, one of the more
complete C implementations of functions needed to produce variates from
many distributions (mostly due to its ancestor randomkit, which AFAICT
isn't maintained).

An ideal API would allow projects like
https://github.com/deepmind/torch-randomkit/tree/master/randomkit or
numba to consume the code in NumPy without vendoring it.

Best wishes,
Kevin

> Cheers,
> Ralf
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
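(To make points 1 and 2 above concrete: the bit generators also expose
their bitgen_t callables at the Python level through a ctypes interface,
which gives a rough feel for what Cython consumers would cimport. A
sketch - attribute names follow the NumPy 1.17 random docs, so treat it
as illustrative rather than authoritative:)

```python
# Sketch: touching the bitgen_t callables via the ctypes interface that
# NumPy's bit generators expose. Cython users would instead cimport
# bitgen_t and unpack the "BitGenerator" PyCapsule; this Python version
# only illustrates what the struct provides.
import numpy as np

bg = np.random.PCG64(12345)
iface = bg.ctypes   # fields: state_address, state, next_uint64,
                    #         next_uint32, next_double, bit_generator

raw = iface.next_uint64(iface.state)   # one raw 64-bit unsigned draw
u = iface.next_double(iface.state)     # one double on [0, 1)
print(hex(raw), u)
```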
From evgeny.burovskiy at gmail.com Thu Sep 19 07:10:44 2019
From: evgeny.burovskiy at gmail.com (Evgeni Burovski)
Date: Thu, 19 Sep 2019 14:10:44 +0300
Subject: [Numpy-discussion] Low-level API for Random
In-Reply-To:
References:
Message-ID:

>>> 1. Should there be a prefix on the C functions?
>>> 2. If so, what should the prefix be?

Preferably, yes. I don't have an opinion on the exact prefix, as long as
it allows me to e.g. swap a normal distribution generator in my
Cython/C++ user code without too much mess.

> if the only goal is to let people write new generators rather than use the
> existing ones from Cython without the Python overhead).

Is it the only goal? If possible, it'd be worth IMO supporting something
more like cython_lapack, so that one can use the existing machinery from
a Cython application. Use case: an MC application where drawing random
variates is in a hot loop. Then I can start from a Python prototype and
cythonize it gradually. Sure, I can reach into non-public parts, but I'd
rather not have to.

> In the end we want to get to a doc section similar to
> http://scipy.github.io/devdocs/special.cython_special.html I'd think.
>
>> 3. Should the legacy C functions be part of the API -- these are mostly
>> the ones that produce or depend on polar transform normals (Box-Muller). I
>> have a feeling no, but there may be reasons to prefer BM since they do not
>> depend on rejection sampling.
>
> Even if there would be a couple of users interested, it would be odd
> starting to do this after deeming the code "legacy". So I agree with your
> "no".

Unless it's a big maintenance burden, is there an issue with exposing
both ziggurat_normal and bm_normal? Sure, I can cook up a BM transform
myself (yet again), but I'd rather not.

>> 4. Should the low-level API be consumable like any other NumPy C API, by
>> including the usual header locations and library locations? Right now, the
>> pxd simplifies writing Cython, but users have to specify the location of
>> the headers and source manually. An alternative would be to provide a
>> function like np.get_include() -> np.random.get_include() that would
>> specialize in random.
>
> Good question. I'm not sure this is "like any other NumPy C API". We don't
> provide a C API for fft, linalg or other functionality further from core
> either. It's possible of course, but does it really help library authors
> or end users?

While I gave only anecdotal evidence, not hard data, I suspect that both
Cython and C APIs would be useful. E.g. there are C++ applications which
use boost::random; it would be nice to be able to swap it for
numpy.random. Also reproducibility: it's *much* easier to debug the
compiled app vs. its Python prototype if the random streams are
identical.

Like I said, take all I'm saying with enough salt, as I'm wearing my
user hat here.

Cheers,

Evgeni
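(Evgeni's "prototype in Python, cythonize the hot loop later" workflow
might look like the toy Monte Carlo sketch below - the function is
invented for illustration; the point is that the draws inside the loop
are exactly what one would later pull from the Cython/C interface:)

```python
# Toy Monte Carlo prototype with random draws in a hot loop: estimate
# pi by sampling points in the unit square. In a cythonized version,
# the per-iteration draws would come from the low-level API instead.
import numpy as np

def estimate_pi(n_samples, seed=0):
    rng = np.random.default_rng(seed)
    inside = 0
    for _ in range(n_samples):          # the hot loop to cythonize
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples

print(estimate_pi(100_000))   # ~3.14
```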
From robert.kern at gmail.com Thu Sep 19 10:52:16 2019
From: robert.kern at gmail.com (Robert Kern)
Date: Thu, 19 Sep 2019 10:52:16 -0400
Subject: [Numpy-discussion] Low-level API for Random
In-Reply-To:
References:
Message-ID:

On Thu, Sep 19, 2019 at 5:24 AM Ralf Gommers wrote:

> On Thu, Sep 19, 2019 at 10:28 AM Kevin Sheppard <
> kevin.k.sheppard at gmail.com> wrote:
>
>> There are some users of the NumPy C code in randomkit. This was never
>> officially supported. There has been a long-open issue to provide this
>> officially.
>>
>> When I wrote randomgen I supplied .pxd files that make it simpler to
>> write Cython code that uses the components. The lower-level API has not
>> had much scrutiny and is in need of a clean-up. I thought this would also
>> encourage users to extend the random machinery themselves as part of their
>> project or code so as to minimize the requests for new (exotic)
>> distributions to be included in Generator.
>>
>> Most of the generator functions follow the pattern random_DISTRIBUTION.
>> Some have a bit more name mangling which can easily be cleaned up (like
>> random_gauss_zig, which should become PREFIX_standard_normal).
>>
>> Ralf Gommers suggested unprefixed names.
>
> I suggested that the names should match the Python API, which I think
> isn't quite the same. The Python API doesn't contain things like "gamma",
> "t" or "f".

As the implementations evolve, they aren't going to match one-to-one
100%. The implementations are shared by the legacy RandomState. When we
update an algorithm, we'll need to make a new function with the better
algorithm for Generator to use; then we'll have two C functions roughly
corresponding to the same method name (albeit on different classes). C
doesn't give us as many namespace options as Python. We could rely on
conventional prefixes to distinguish between the two classes of function
(e.g. legacy_normal vs random_normal). There are times when it would be
nice to be more descriptive about the algorithm difference (e.g.
random_normal_polar vs random_normal_ziggurat), but most of our
algorithm updates will be minor tweaks rather than changing to a new
named algorithm.

>> I tried this in a local branch and it was a bit ugly since some of the
>> distributions have common math names (e.g., gamma) and others are very
>> short (e.g., t or f). I think a prefix is needed, and after looking
>> through the C API docs npy_random_ seemed like a reasonable choice (since
>> these live in numpy.random).
>>
>> Any thoughts on the following questions are welcome (others too):
>>
>> 1. Should there be a prefix on the C functions?
>> 2. If so, what should the prefix be?
>
> Before worrying about naming details, can we start with "what should be
> in the C/Cython API"? If I look through the current pxd files, there's a
> lot there that looks like it should be private, and what we expose as
> Python API is not all present as far as I can tell (which may be fine, if
> the only goal is to let people write new generators rather than use the
> existing ones from Cython without the Python overhead)

Using the existing distributions from Cython was a requested feature and
an explicit goal, yes. There are users waiting for this.

--
Robert Kern

From warren.weckesser at gmail.com Thu Sep 19 11:10:28 2019
From: warren.weckesser at gmail.com (Warren Weckesser)
Date: Thu, 19 Sep 2019 11:10:28 -0400
Subject: [Numpy-discussion] Proposal to accept NEP 32: Remove the
 financial functions from NumPy
Message-ID:

NEP 32 is available at
https://numpy.org/neps/nep-0032-remove-financial-functions.html

Recent timeline:

- 30-Aug-2019 - A pull request with NEP 32 submitted.
- 03-Sep-2019 - Announcement of the NEP 32 pull request on the
NumPy-Discussion mailing list, with the text of the NEP included in the
email.
- 08-Sep-2019 - NEP 32 announced on the PyData mailing list (not standard
procedure, but suggested in a response to the email in NumPy-Discussion).
- 09-Sep-2019 - NEP 32 pull request merged.
- 11-Sep-2019 - Emails sent to the NumPy-Discussion and PyData mailing lists with links to the online version of the NEP. Only one user (speaking for a group of 12 or so) expressed a preference for keeping the functions in NumPy, and that user acknowledged "Probably not a huge inconvenience if we would have to use another library". (The NEP includes a plan to provide an alternative package for the functions.) Several other users were in favor of removing them. Among the current NumPy developers who have expressed an opinion, all are in favor of removing the functions. There have been no additional email responses since the reminder was sent on September 11. In accordance with NEP 0, I propose that the status of NEP 32 be changed to *Accepted*. If there are no substantive objections within 7 days from this email, then the NEP will be accepted; see NEP 0 for more details ( https://numpy.org/neps/nep-0000.html#how-a-nep-becomes-accepted). Warren -------------- next part -------------- An HTML attachment was scrubbed... URL: From sseibert at anaconda.com Thu Sep 19 11:10:36 2019 From: sseibert at anaconda.com (Stanley Seibert) Date: Thu, 19 Sep 2019 10:10:36 -0500 Subject: [Numpy-discussion] Low-level API for Random In-Reply-To: References: Message-ID: Just to chime in: Numba would definitely appreciate C functions to access the random distribution implementations, and have a side-project (numba-scipy) that is making the Cython wrapped functions in SciPy visible to Numba. On Thu, Sep 19, 2019 at 5:41 AM Kevin Sheppard wrote: > > > On Thu, Sep 19, 2019 at 10:23 AM Ralf Gommers > wrote: > >> >> >> On Thu, Sep 19, 2019 at 10:28 AM Kevin Sheppard < >> kevin.k.sheppard at gmail.com> wrote: >> >>> There are some users of the NumPy C code in randomkit. This was never >>> officially supported. There has been a long open issue to provide this >>> officially. >>> >>> When I wrote randomgen I supplied .pdx files that make it simpler to >>> write Cython code that uses the components. The lower-level API has not >>> had much scrutiny and is in need of a clean-up. I thought this would also >>> encourage users to extend the random machinery themselves as part of their >>> project or code so as to minimize the requests for new (exotic) >>> distributions to be included in Generator. >>> >>> Most of the generator functions follow a pattern random_DISTRIBUTION. >>> Some have a bit more name mangling which can easily be cleaned up (like >>> ranomd_gauss_zig, which should become PREFIX_standard_normal). >>> >>> Ralf Gommers suggested unprefixed names. >>> >> >> I suggested that the names should match the Python API, which I think >> isn't quite the same. The Python API doesn't contain things like "gamma", >> "t" or "f". >> > > My gamma and f (I misspoke about t) I mean the names that appear as > Generator methods: > > > https://docs.scipy.org/doc/numpy/reference/random/generator.html#numpy.random.Generator > > > If I understand your point (and with reference with page linked below), > then there would be something like numpy.random.cython_random.gamma (which > is currently called numpy.random.distributions.random_gamma). Maybe I'm not > understanding your point about the Python API though. > > >> >> I tried this in a local branch and it was a bit ugly since some of the >>> distributions have common math names (e.g., gamma) and others are very >>> short (e.g., t or f). 
I think a prefix is needed, and after looking >>> through the C API docs npy_random_ seemed like a reasonable choice (since >>> these live in numpy.random). >>> >>> Any thoughts on the following questions are welcome (others too): >>> >>> 1. Should there be a prefix on the C functions? >>> 2. If so, what should the prefix be? >>> >> >> Before worrying about naming details, can we start with "what should be >> in the C/Cython API"? If I look through the current pxd files, there's a >> lot there that looks like it should be private, and what we expose as >> Python API is not all present as far as I can tell (which may be fine, if >> the only goal is to let people write new generators rather than use the >> existing ones from Cython without the Python overhead). >> > > From the ground up, for someone who want to write a new distribution: > 1. The bit generators. These currently have no pxd files. These are > always going to be Python obects and so it isn't absolutely essential to > expose them with a low-level API. All that is needed is the capsule which > has the bitgen struct, which is what is really needed > 2. bitgen_t which is in common.pxd. This is essential since it enables > access to the callables to produce basic psueod random values. > 3. The distributions, which are in distributions.pdx. The integer > generators are in bounded_integers.pxd.in, which would need to be > processed and then included after processing (same for > bounded_integers.pxd.in). > a. The legacy in legacy_distributions.pxd. If the legacy is > included, then aug_bitgen_t needs to also be included which is also in > legacy_distributions.pxd > 4. The "helpers" which are defined in common.pxd. These simplify > implementing complete distributions which support automatix broadcasting > when needed. They are only provided to match the signatures for the > functions in distributions.pxd. The highest level ones are cont() and > disc(). Some of the lower-level ones could easily be marked as private. > > 1,2 and 3 are pretty important. 4 could be in or out. It could help if > someone wanted to write a fully featured distribution w/ broadcasting, but > I think this use case is less likely than someone say wanting to implement > a custom rejection sampler. > > > For someone who wants to write a new BitGenerator > > 1. BitGenerator and SeedSequence in bit_generato.pxd are required. As is > bitgen_t which is in common. bitgen_t should probably move to > bit_generators. > 2. aligned_malloc: This has been requested on multiple occasions and is > practically important when interfacing with SSE or AVX code. It is > potentially more general than the random module. This lives in common.pxd. > > > >> >> In the end we want to get to a doc section similar to >> http://scipy.github.io/devdocs/special.cython_special.html I'd think. >> >> 3. Should the legacy C functions be part of the API -- these are mostly >>> the ones that produce or depend on polar transform normals (Box-Muller). I >>> have a feeling no, but there may be reasons to prefer BM since they do not >>> depend on rejection sampling. >>> >> >> Even if there would be a couple of users interested, it would be odd >> starting to do this after deeming the code "legacy". So I agree with your >> "no". >> >> >>> 4. Should low-level API be consumable like any other numpy C API by >>> including the usual header locations and library locations? 
Right now, the >>> pxd simplifies writing Cython but users have sp specify the location of the >>> headers and source manually An alternative would be to provide a function >>> like np.get_include() -> np.random.get_include() that would specialize in >>> random. >>> >> >> Good question. I'm not sure this is "like any other NumPy C API". We >> don't provide a C API for fft, linalg or other functionality further from >> core either. It's possible of course, but does it really help library >> authors or end users? >> > > SciPy provides a very useful Cython API to low-level linalg. But there is > little reason to provide C APIs to fft or linalg since they are all > directly available. The code is random is AFAICT, one of the more complete > C implementations of functions needed to produce variates from many > distributions (mostly due to its ancestor randomkit, which AFAICT isn't > maintained). > > An ideal API would allow projects like > https://github.com/deepmind/torch-randomkit/tree/master/randomkit or > numba to consume the code in NumPy without vendoring it. > > Best wishes, > Kevin > > >> Cheers, >> Ralf >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Thu Sep 19 13:51:15 2019 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 19 Sep 2019 10:51:15 -0700 Subject: [Numpy-discussion] DType Roadmap/NEP Discussion In-Reply-To: References: <5f00290851aa9418215c4bd0fea3378bf94dcc79.camel@sipsolutions.net> Message-ID: <445fc648fcc51c2893b8f8037bed05651a594ac2.camel@sipsolutions.net> On Wed, 2019-09-18 at 21:33 -0700, Ralf Gommers wrote: > Hi Sebastian, > > > On Wed, Sep 18, 2019 at 4:35 PM Sebastian Berg < > sebastian at sipsolutions.net> wrote: > > Hi all, > > > > to try and make some progress towards a decision since the broad > > design > > is pretty much settling from my side. I am thinking about making a > > meeting, and suggest Monday at 11am Pacific Time (I am open to > > other > > times though). > > > > My hope is to get everyone interested on board, so that we can make > > an > > informed decision about the general direction very soon. So just > > reach > > out, or discuss on the mailing list as well. > > > > The current draft for an NEP is here: > > https://hackmd.io/kxuh15QGSjueEKft5SaMug?both > > > > There are some design goals that I would like to clear up. > > The design itself seems very sensible to me insofar as I understand > it. After having read your document again, I think you're still > missing the actual goals though. "structure of class layout" and > "type hierarchy" are important, but they're not the goals. You're > touching on the real goals in places, but it may be valuable to be > much more explicit there. > Good points, I will try and incorporate some. Had answers to a few, but I do not think it is too helpful here and now; this got a bit longer than expected, but more general... There is a bit of clash of long term vs. mid term goals. My goal is to enable pretty much any conceivable long term goal, but in the mid/short term, that means: 1. 
Convince you (and me) that the proposed API can handle everything we can
think of now and can grow easily (e.g. optimization, new features).
2. Convince everyone that the current state is bad enough that any added
maintenance burden (during the transition phase) is acceptable. I
personally think the maintenance will definitely get better quickly,
even if we reuse a lot of old code. The main issue is the initial
massive set of changes.
3. Convince everyone that any necessary ABI/API breakage that may happen
is acceptable. The DType breakage itself is very limited. Specific
UFuncs may break more, but only in hidden features for which I know only
of astropy as a user (and they are OK with us breaking them); numba
might also be affected, but I think less so.

The main point right now is organizing everything from monolithic ->
operator based, improving long-term maintainability and extensibility.
Dogfooding ourselves for the same reason.

E.g. the AbstractDType hierarchy... it is something we could discuss. I
think it is right, since it replaces `dtype.kind` and makes for powerful
organization of dispatching in ufuncs. But we could limit it initially!
To give one example: Say ora creates many DTypes with different datetime
representations. ora could create an AbstractOraDType, so that you can
do easy isinstance checks. Especially during ufunc dispatch, ora can use
it to write a single function for figuring out promotion:
`OraDType1 + OraDType2 -> OraDType1 + OraDType2.astype(OraDType1)`.

I agree that this is probably missing: UFunc dispatch is a major reason
for the split of "common DType" (class) and "common dtype instance" (of
strings with different length) functionality. I think it is a reasonable
split in any case, but for dispatching the first is sufficient, while
the second is more naturally found after dispatching (only after you
know you have Unit * Unit can you reasonably figure out the actual
output `Unit("m*m")`).

Best,

Sebastian

PS: The only real limitation that I see right now is allowing promotion
to inspect array values (this example is probably not very good). For
example, `int_arr.astype(Categorical)` wants to find
`Categorical(np.unique(int_arr))`. I think not providing that is
acceptable, because Categorical can provide its own function to find the
actual categorical instance. Or implement a Categorical and
FrozenCategorical, so the dtype instance is mutable in that it can add
new categories. (For array coercion from a list of items, the issue is
different, and allowing such things can be provided or added later.)
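(A purely illustrative Python sketch of the AbstractOraDType idea in the
example above - all class names come from Sebastian's hypothetical ora
scenario, and none of this is proposed NumPy API:)

```python
# Sketch of using an abstract parent DType for isinstance checks plus a
# single promotion rule, mimicking the AbstractOraDType example with
# plain Python classes. Entirely hypothetical, not NumPy API.

class AbstractOraDType:
    """Abstract parent: never instantiated, used only for dispatch."""

class OraDType1(AbstractOraDType):
    pass

class OraDType2(AbstractOraDType):
    pass

def common_dtype(a, b):
    # One rule covers every pairing within the ora hierarchy, instead
    # of one rule per concrete (DType, DType) combination. The "first
    # argument wins" rule mirrors the OraDType1 + OraDType2 ->
    # OraDType1 example in the email.
    if isinstance(a, AbstractOraDType) and isinstance(b, AbstractOraDType):
        return type(a)
    raise TypeError("no common DType")

print(common_dtype(OraDType1(), OraDType2()).__name__)  # OraDType1
```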
> Here are some example goals:
>
> 1. Make creating new dtypes via the NumPy C API take at least 4x fewer
> lines of code on average (in practice: measured on rational/quaternion,
> since it is hard to measure otherwise).
>
> 2. Make it possible to create new dtypes with full functionality via
> the NumPy Python API. Performance may be up to 1-2 orders of magnitude
> worse than when creating the same dtype via the C API; the main purpose
> is to allow easier prototyping of new dtypes.
>
> 3. Make the NumPy codebase more maintainable by removing special-casing
> of datetime dtypes in many places.
>
> 4. Enable creation of a units library whose arrays *are* numpy arrays
> rather than a subclass or duck array. This will make such a library
> work much better with SciPy and other existing libraries that use
> np.asarray extensively.
>
> 5. Hide currently exposed implementation details in the C API so
> long-term .... (you have this one, but it would be nice to work it
> out a little more - after all, we recently considered reverting the
> deprecation for direct field access, so how important is this?)
>
> 6. Improve casting behavior for external dtypes.
>
> 7. Make np.char behavior better (you mention that fixed-length strings
> work poorly now, but not what would change).
>
> Listing non-goals would also be useful:
>
> 1. Performance: no significant performance improvements are expected.
> We aim for no performance regressions.
>
> 2. Introducing new dtypes into NumPy itself.
>
> 3. Pandas ExtensionArrays? You mention them, but does this dtype
> redesign help Pandas in any way or not?
>
> 4. Changes to NumPy's current casting rules.
>
> 5. Allowing creation of dtypes that don't fit the current NumPy model
> of what a dtype is (e.g. ref [1]), such as a variable-length string
> dtype.
>
> Many of those (and there can be more, this is just what came to mind
> now) can/should be a paragraph or section. In my experience, describing
> these goals and requirements well takes about 15-30% of the length of
> the design description. Think of, for example, a Pandas or units
> library maintainer reading this: they should be able to stop reading
> where you now have "Overview Graphic" and have a pretty clear
> high-level understanding of what this whole redesign will mean for
> them. Same for a NumPy maintainer who wants to get a sense of what the
> benefits and impacts will be: reading only (the expanded version of)
> your Abstract, Motivation and Scope, and Backwards Compatibility
> sections should be enough.
>
> Here's a concrete question; it's the type of thing I'd like to
> understand without having to understand the whole design in detail:
> ```
> >>> import datetime
> >>> import numpy as np
> >>> import pandas as pd
> >>> dti = pd.to_datetime(['1/1/2018', np.datetime64('2018-01-01'),
> ...                       datetime.datetime(2018, 1, 1)])
> >>>
> >>> dti.values
> array(['2018-01-01T00:00:00.000000000', '2018-01-01T00:00:00.000000000',
>        '2018-01-01T00:00:00.000000000'], dtype='datetime64[ns]')
> >>> dti.values.dtype
> dtype('<M8[ns]')
> >>> isinstance(dti.values.dtype, np.dtype)
> True
> >>> dti.dtype == dti.values.dtype  # okay, that's nice
> True
>
> >>> start = pd.to_datetime('2015-02-24')
> >>> rng = pd.date_range(start, periods=3)
> >>> t = pd.Series(rng)
> >>> t_withzone = t.dt.tz_localize('UTC').dt.tz_convert('Asia/Kolkata')
> >>> t_withzone
> 0   2015-02-24 05:30:00+05:30
> 1   2015-02-25 05:30:00+05:30
> 2   2015-02-26 05:30:00+05:30
> dtype: datetime64[ns, Asia/Kolkata]
> >>> t_withzone.dtype
> datetime64[ns, Asia/Kolkata]
> >>> t_withzone.values.dtype
> dtype('<M8[ns]')
> >>> t_withzone.dtype == t_withzone.values.dtype  # could this be True in the future?
> False
> ```
> So can Pandas create timezone-aware numpy dtypes in the future if
> they want to, or would they still be better off rolling their own?
>
> Also one question/comment about the design content. When looking at
> the current external dtypes (e.g. [2]), a large part of the work of
> implementing a new dtype now deals with ufunc behavior. It's not
> clear from your document how that changes with the new design - can
> you add something about that?
> > Cheers, > Ralf > > [1] > http://scipy-lectures.org/advanced/advanced_numpy/index.html#the-descriptor > [2] > https://github.com/moble/quaternion/blob/master/numpy_quaternion.c > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From matti.picus at gmail.com Thu Sep 19 14:35:50 2019 From: matti.picus at gmail.com (Matti Picus) Date: Thu, 19 Sep 2019 21:35:50 +0300 Subject: [Numpy-discussion] DType Roadmap/NEP Discussion In-Reply-To: <5f00290851aa9418215c4bd0fea3378bf94dcc79.camel@sipsolutions.net> References: <5f00290851aa9418215c4bd0fea3378bf94dcc79.camel@sipsolutions.net> Message-ID: <235dc0a0-0ef2-8b06-3a5d-05c2746ae750@gmail.com> On 19/9/19 2:34 am, Sebastian Berg wrote: > Hi all, > > to try and make some progress towards a decision since the broad design > is pretty much settling from my side. I am thinking about making a > meeting, and suggest Monday at 11am Pacific Time (I am open to other > times though). > > My hope is to get everyone interested on board, so that we can make an > informed decision about the general direction very soon. So just reach > out, or discuss on the mailing list as well. > > The current draft for an NEP is here: > https://hackmd.io/kxuh15QGSjueEKft5SaMug?both Mon Sept 23 sounds good. Please reach out to the possible consumers of the API to get wider input. - Pandas - Astropy - Numba - ??? It may be a bit too short notice, but it seems like there is enough to talk about even if only the NumPy community show up. Where/how will the meeting take place? Matti From warren.weckesser at gmail.com Thu Sep 19 19:09:25 2019 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Thu, 19 Sep 2019 19:09:25 -0400 Subject: [Numpy-discussion] DType Roadmap/NEP Discussion In-Reply-To: <5f00290851aa9418215c4bd0fea3378bf94dcc79.camel@sipsolutions.net> References: <5f00290851aa9418215c4bd0fea3378bf94dcc79.camel@sipsolutions.net> Message-ID: On 9/18/19, Sebastian Berg wrote: > Hi all, > > to try and make some progress towards a decision since the broad design > is pretty much settling from my side. I am thinking about making a > meeting, and suggest Monday at 11am Pacific Time (I am open to other > times though). That works for me. Warren > > My hope is to get everyone interested on board, so that we can make an > informed decision about the general direction very soon. So just reach > out, or discuss on the mailing list as well. > > The current draft for an NEP is here: > https://hackmd.io/kxuh15QGSjueEKft5SaMug?both > > There are some design goals that I would like to clear up. I would > prefer to avoid deep discussions of some specific issues, since I think > the important decision right now is that my general start is in the > right direction. > > It is not an easy topic, so my plan would be try and briefly summarize > that and then hopefully clarify any questions and then we can discuss > why alternatives are rejected. The most important thing is maybe > gathering concerns which need to be clarified before we can go towards > accepting the general design ideas. 
> > The main point of the NEP draft is actually captured by the picture in > the linked document: DTypes are classes (such as Float64) and what is > attached to the array is an instance of that class " ">float64". Additionally, we would have AbstractDType classes which > cannot be instantiated but define a type hierarchy. > > To list the main points: > > * DTypes are classes (corresponding to the current type number) > > * `arr.dtype` is an instances of its class, allowing to store > additional information such as a physical unit, the string length. > > * Most things are defined in special dtype slots similar to Pythons > type and number slots. They will be hidden and can be set through > an init function similar to `PyType_FromSpec` [1]. > > * Promotion is defined primarily on the DType classes > > * Casting from one DType to another DType is defined by a new > CastingImpl object (should become a special ufunc) > - e.g. for strings, the CastingImpl is in charge of finding the > correct string length > > * The AbstractDType hierarchy will be used to decide the signature when > calling UFuncs. > > > The main iffier points I can think of are: > > * NumPy currently uses value based promotion in some cases, which > requires special AbstractDTypes to describe (and some legacy > paths). (They are used use more like instances than typical classes) > > * Casting between flexible dtypes (such as strings) is a multi-step > process to figure out the actual output dtype. > - An example is: `np.can_cast("float64", "S3")` first finding > that `Float64->String` is possible in principle and then > asking the CastingImpl to find that `float64->S3` is not. > > * We have to break ABI compatibility in very minor, back-portable > way. More smaller incompatibilities are likely [2]. > > * Since it is a major redesign, a lot of code has to be added/touched, > although it is possible to channel much of it back into the old > machinery. > > * A largish amount of new API around new DType type objects and also > DTypeMeta type objects, which users can (although usually do not have > to) subclass. > > However, most other designs will have similar issues. Basically, I > currently really think this is "right", even if some details may end up > a tricky. > > Best, > > Sebastian > > > PS: The one thing outside the more general list above that I may want > to discuss is how acceptable a global dict/mapping for dtype discovery > during `np.array` coercion is (mapping python type -> dtype)... > > > [1] https://docs.python.org/3/c-api/type.html#c.PyType_FromSpec > [2] One possible issue may be "S0" which is normally used to denote > what in the new API would be the `String` DType class. > From ralf.gommers at gmail.com Thu Sep 19 23:02:46 2019 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 20 Sep 2019 05:02:46 +0200 Subject: [Numpy-discussion] Low-level API for Random In-Reply-To: References: Message-ID: On Thu, Sep 19, 2019 at 4:53 PM Robert Kern wrote: > On Thu, Sep 19, 2019 at 5:24 AM Ralf Gommers > wrote: > >> >> On Thu, Sep 19, 2019 at 10:28 AM Kevin Sheppard < >> kevin.k.sheppard at gmail.com> wrote: >> >>> There are some users of the NumPy C code in randomkit. This was never >>> officially supported. There has been a long open issue to provide this >>> officially. >>> >>> When I wrote randomgen I supplied .pdx files that make it simpler to >>> write Cython code that uses the components. The lower-level API has not >>> had much scrutiny and is in need of a clean-up. 
I thought this would also >>> encourage users to extend the random machinery themselves as part of their >>> project or code so as to minimize the requests for new (exotic) >>> distributions to be included in Generator. >>> >>> Most of the generator functions follow a pattern random_DISTRIBUTION. >>> Some have a bit more name mangling which can easily be cleaned up (like >>> ranomd_gauss_zig, which should become PREFIX_standard_normal). >>> >>> Ralf Gommers suggested unprefixed names. >>> >> >> I suggested that the names should match the Python API, which I think >> isn't quite the same. The Python API doesn't contain things like "gamma", >> "t" or "f". >> > > As the implementations evolve, they aren't going to match one-to-one 100%. > The implementations are shared by the legacy RandomState. When we update an > algorithm, we'll need to make a new function with the better algorithm for > Generator to use, then we'll have two C functions roughly corresponding to > the same method name (albeit on different classes). C doesn't give us as > many namespace options as Python. We could rely on conventional prefixes to > distinguish between the two classes of function (e.g. legacy_normal vs > random_normal). > That seems simple and clear There are times when it would be nice to be more descriptive about the > algorithm difference (e.g. random_normal_polar vs random_normal_ziggurat), > We decided against versioning algorithms in NEP 19, so an update to an algorithm would mean we'd want to get rid of the older version (unless it's still in use by legacy). So AFAICT we'd never have both random_normal_polar and random_normal_ziggurat present at the same time? I may be missing your point here, but if we have in Python `Generator.normal` and can switch its implementation from polar to ziggurat or vice versa without any deprecation, then why would we want to switch names in the C API? most of our algorithm updates will be minor tweaks rather than changing to > a new named algorithm. > > >> I tried this in a local branch and it was a bit ugly since some of the >>> distributions have common math names (e.g., gamma) and others are very >>> short (e.g., t or f). I think a prefix is needed, and after looking >>> through the C API docs npy_random_ seemed like a reasonable choice (since >>> these live in numpy.random). >>> >>> Any thoughts on the following questions are welcome (others too): >>> >>> 1. Should there be a prefix on the C functions? >>> 2. If so, what should the prefix be? >>> >> >> Before worrying about naming details, can we start with "what should be >> in the C/Cython API"? If I look through the current pxd files, there's a >> lot there that looks like it should be private, and what we expose as >> Python API is not all present as far as I can tell (which may be fine, if >> the only goal is to let people write new generators rather than use the >> existing ones from Cython without the Python overhead) >> > > Using the existing distributions from Cython was a requested feature and > an explicit goal, yes. There are users waiting for this. > Thanks, clear (also from other responses on this thread). Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From robert.kern at gmail.com Thu Sep 19 23:25:21 2019 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 19 Sep 2019 23:25:21 -0400 Subject: [Numpy-discussion] Low-level API for Random In-Reply-To: References: Message-ID: On Thu, Sep 19, 2019 at 11:04 PM Ralf Gommers wrote: > > > On Thu, Sep 19, 2019 at 4:53 PM Robert Kern wrote: > >> On Thu, Sep 19, 2019 at 5:24 AM Ralf Gommers >> wrote: >> >>> >>> On Thu, Sep 19, 2019 at 10:28 AM Kevin Sheppard < >>> kevin.k.sheppard at gmail.com> wrote: >>> >>>> There are some users of the NumPy C code in randomkit. This was never >>>> officially supported. There has been a long open issue to provide this >>>> officially. >>>> >>>> When I wrote randomgen I supplied .pdx files that make it simpler to >>>> write Cython code that uses the components. The lower-level API has not >>>> had much scrutiny and is in need of a clean-up. I thought this would also >>>> encourage users to extend the random machinery themselves as part of their >>>> project or code so as to minimize the requests for new (exotic) >>>> distributions to be included in Generator. >>>> >>>> Most of the generator functions follow a pattern random_DISTRIBUTION. >>>> Some have a bit more name mangling which can easily be cleaned up (like >>>> ranomd_gauss_zig, which should become PREFIX_standard_normal). >>>> >>>> Ralf Gommers suggested unprefixed names. >>>> >>> >>> I suggested that the names should match the Python API, which I think >>> isn't quite the same. The Python API doesn't contain things like "gamma", >>> "t" or "f". >>> >> >> As the implementations evolve, they aren't going to match one-to-one >> 100%. The implementations are shared by the legacy RandomState. When we >> update an algorithm, we'll need to make a new function with the better >> algorithm for Generator to use, then we'll have two C functions roughly >> corresponding to the same method name (albeit on different classes). C >> doesn't give us as many namespace options as Python. We could rely on >> conventional prefixes to distinguish between the two classes of function >> (e.g. legacy_normal vs random_normal). >> > > That seems simple and clear > > There are times when it would be nice to be more descriptive about the >> algorithm difference (e.g. random_normal_polar vs random_normal_ziggurat), >> > > We decided against versioning algorithms in NEP 19, so an update to an > algorithm would mean we'd want to get rid of the older version (unless it's > still in use by legacy). So AFAICT we'd never have both random_normal_polar > and random_normal_ziggurat present at the same time? > Well, we must because one's used by the legacy RandomState and one's used by Generator. :-) > I may be missing your point here, but if we have in Python > `Generator.normal` and can switch its implementation from polar to ziggurat > or vice versa without any deprecation, then why would we want to switch > names in the C API? > I didn't mean to suggest that we'd have an unbounded number of functions as we improve the algorithms, just that we might have 2 once we decide to change something about the algorithm. We need 2 to support both the improved algorithm in Generator and the legacy algorithm in RandomState. The current implementation of the C function would be copied to a new name (`legacy_foo` or whatever), then we'd make RandomState use that frozen copy, then we make the desired modifications to the main function that Generator is referencing (`random_foo`). 
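(For concreteness about what gets frozen here: the "polar" legacy normal
generator referred to throughout this thread is the Marsaglia polar
variant of Box-Muller, which Generator replaced with a ziggurat method.
A rough Python transcription - illustrative only; the real
implementation is C code driven by the bitgen_t callables:)

```python
# Marsaglia polar method: the algorithm behind the legacy normal
# generator. `next_double` stands in for a bitgen_t callable returning
# a uniform double on [0, 1). Illustrative transcription, not the
# actual C source.
import math
import random

def legacy_gauss(next_double):
    while True:                      # rejection loop over the unit disc
        x1 = 2.0 * next_double() - 1.0
        x2 = 2.0 * next_double() - 1.0
        r2 = x1 * x1 + x2 * x2
        if 0.0 < r2 < 1.0:
            f = math.sqrt(-2.0 * math.log(r2) / r2)
            return f * x1            # f * x2 is a second valid variate

print(legacy_gauss(random.random))
```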
Or we could just make those legacy copies now so that people get to use them explicitly under the legacy names, whatever they are, and we can feel more free to modify the main implementations. I suggested this earlier, but convinced myself that it wasn't strictly necessary. But then I admit I was more focused on the Python API stability than any promises about the C/Cython API. We might end up with more than 2 implementations if we need to change something about the function signature, for whatever reason, and we want to retain C/Cython API compatibility with older code. The C functions aren't necessarily going to be one-to-one to the Generator methods. They're just part of the implementation. So for example, if we wanted to, say, precompute some intermediate values from the given scalar parameters so we don't have to recompute them for each element of the `size`-large requested output, we might do that in one C function and pass those intermediate values as arguments to the C function that does the actual sampling. So we'd have two C functions for that one Generator method, and the sampling C function will not have the same signature as it did before the modification that refactored the work into two functions. In that case, I would not be so strict as to require that `Generator.foo` is one to one with `random_foo`. To your point, though, we don't have to use gratuitously different names when there _is_ a one-to-one relationship. `random_gauss_zig` should be `random_normal`. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Fri Sep 20 01:36:50 2019 From: matti.picus at gmail.com (Matti Picus) Date: Fri, 20 Sep 2019 08:36:50 +0300 Subject: [Numpy-discussion] Low-level API for Random In-Reply-To: References: Message-ID: <045a5563-bc29-c917-d1c8-7da082624cc5@gmail.com> On 20/9/19 6:25 am, Robert Kern wrote: > > Well, we must because one's used by the legacy RandomState and one's > used by Generator. :-) > I would prefer not to create a legacy C-API at all. Are we required to from the NEP? Matti From ralf.gommers at gmail.com Fri Sep 20 06:07:33 2019 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 20 Sep 2019 12:07:33 +0200 Subject: [Numpy-discussion] Low-level API for Random In-Reply-To: References: Message-ID: On Fri, Sep 20, 2019 at 5:29 AM Robert Kern wrote: > On Thu, Sep 19, 2019 at 11:04 PM Ralf Gommers > wrote: > >> >> >> On Thu, Sep 19, 2019 at 4:53 PM Robert Kern >> wrote: >> >>> On Thu, Sep 19, 2019 at 5:24 AM Ralf Gommers >>> wrote: >>> >>>> >>>> On Thu, Sep 19, 2019 at 10:28 AM Kevin Sheppard < >>>> kevin.k.sheppard at gmail.com> wrote: >>>> >>>>> There are some users of the NumPy C code in randomkit. This was never >>>>> officially supported. There has been a long open issue to provide this >>>>> officially. >>>>> >>>>> When I wrote randomgen I supplied .pdx files that make it simpler to >>>>> write Cython code that uses the components. The lower-level API has not >>>>> had much scrutiny and is in need of a clean-up. I thought this would also >>>>> encourage users to extend the random machinery themselves as part of their >>>>> project or code so as to minimize the requests for new (exotic) >>>>> distributions to be included in Generator. >>>>> >>>>> Most of the generator functions follow a pattern random_DISTRIBUTION. >>>>> Some have a bit more name mangling which can easily be cleaned up (like >>>>> ranomd_gauss_zig, which should become PREFIX_standard_normal). 
>>>>> >>>>> Ralf Gommers suggested unprefixed names. >>>>> >>>> >>>> I suggested that the names should match the Python API, which I think >>>> isn't quite the same. The Python API doesn't contain things like "gamma", >>>> "t" or "f". >>>> >>> >>> As the implementations evolve, they aren't going to match one-to-one >>> 100%. The implementations are shared by the legacy RandomState. When we >>> update an algorithm, we'll need to make a new function with the better >>> algorithm for Generator to use, then we'll have two C functions roughly >>> corresponding to the same method name (albeit on different classes). C >>> doesn't give us as many namespace options as Python. We could rely on >>> conventional prefixes to distinguish between the two classes of function >>> (e.g. legacy_normal vs random_normal). >>> >> >> That seems simple and clear >> >> There are times when it would be nice to be more descriptive about the >>> algorithm difference (e.g. random_normal_polar vs random_normal_ziggurat), >>> >> >> We decided against versioning algorithms in NEP 19, so an update to an >> algorithm would mean we'd want to get rid of the older version (unless it's >> still in use by legacy). So AFAICT we'd never have both random_normal_polar >> and random_normal_ziggurat present at the same time? >> > > Well, we must because one's used by the legacy RandomState and one's used > by Generator. :-) > > >> I may be missing your point here, but if we have in Python >> `Generator.normal` and can switch its implementation from polar to ziggurat >> or vice versa without any deprecation, then why would we want to switch >> names in the C API? >> > > I didn't mean to suggest that we'd have an unbounded number of functions > as we improve the algorithms, just that we might have 2 once we decide to > change something about the algorithm. We need 2 to support both the > improved algorithm in Generator and the legacy algorithm in RandomState. > The current implementation of the C function would be copied to a new name > (`legacy_foo` or whatever), then we'd make RandomState use that frozen > copy, then we make the desired modifications to the main function that > Generator is referencing (`random_foo`). > > Or we could just make those legacy copies now so that people get to use > them explicitly under the legacy names, whatever they are, and we can feel > more free to modify the main implementations. I suggested this earlier, but > convinced myself that it wasn't strictly necessary. But then I admit I was > more focused on the Python API stability than any promises about the > C/Cython API. > > We might end up with more than 2 implementations if we need to change > something about the function signature, for whatever reason, and we want to > retain C/Cython API compatibility with older code. The C functions aren't > necessarily going to be one-to-one to the Generator methods. They're just > part of the implementation. So for example, if we wanted to, say, > precompute some intermediate values from the given scalar parameters so we > don't have to recompute them for each element of the `size`-large requested > output, we might do that in one C function and pass those intermediate > values as arguments to the C function that does the actual sampling. So > we'd have two C functions for that one Generator method, and the sampling C > function will not have the same signature as it did before the modification > that refactored the work into two functions. 
>> In that case, I would not be so strict as to require that
>> `Generator.foo` is one-to-one with `random_foo`.

You're saying "be so strict" as if it were a bad thing, or a major
effort. I understand that in some cases a C API cannot be evolved in the
same way as a Python API, but in the example you're giving here I'd say
you want one function to be public, and one private. Making both public
just exposes more implementation details for no good reason, and will
give us more maintenance issues long-term.

Anyway, this is not an issue today. If we try to keep Python and C APIs
matching, we can deal with possible difficulties with that if and when
they arise - should be infrequent.

Cheers,
Ralf

>> To your point, though, we don't have to use gratuitously different
>> names when there _is_ a one-to-one relationship. `random_gauss_zig`
>> should be `random_normal`.
>>
>> --
>> Robert Kern
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

From ndbecker2 at gmail.com Fri Sep 20 07:18:38 2019
From: ndbecker2 at gmail.com (Neal Becker)
Date: Fri, 20 Sep 2019 07:18:38 -0400
Subject: [Numpy-discussion] Low-level API for Random
In-Reply-To:
References:
Message-ID:

I have used the C API in the past, and would like to see a convenient
and stable way to do this. Currently I'm using randomgen, but calling
(from C++) into the Python API. The inefficiency is amortized by
generating and caching batches of results.

I thought randomgen was supposed to be the future of numpy random, so
I've based my code on that.

On Fri, Sep 20, 2019 at 6:08 AM Ralf Gommers wrote:
>
> On Fri, Sep 20, 2019 at 5:29 AM Robert Kern wrote:
>>
>> On Thu, Sep 19, 2019 at 11:04 PM Ralf Gommers wrote:
>>>
>>> On Thu, Sep 19, 2019 at 4:53 PM Robert Kern wrote:
>>>>
>>>> On Thu, Sep 19, 2019 at 5:24 AM Ralf Gommers wrote:
>>>>>
>>>>> On Thu, Sep 19, 2019 at 10:28 AM Kevin Sheppard wrote:
>>>>>>
>>>>>> There are some users of the NumPy C code in randomkit. This was never officially supported. There has been a long-open issue to provide this officially.
>>>>>>
>>>>>> When I wrote randomgen I supplied .pxd files that make it simpler to write Cython code that uses the components. The lower-level API has not had much scrutiny and is in need of a clean-up. I thought this would also encourage users to extend the random machinery themselves as part of their project or code so as to minimize the requests for new (exotic) distributions to be included in Generator.
>>>>>>
>>>>>> Most of the generator functions follow the pattern random_DISTRIBUTION. Some have a bit more name mangling which can easily be cleaned up (like random_gauss_zig, which should become PREFIX_standard_normal).
>>>>>>
>>>>>> Ralf Gommers suggested unprefixed names.
>>>>>
>>>>> I suggested that the names should match the Python API, which I think isn't quite the same. The Python API doesn't contain things like "gamma", "t" or "f".
>>>>
>>>> As the implementations evolve, they aren't going to match one-to-one 100%. The implementations are shared by the legacy RandomState. When we update an algorithm, we'll need to make a new function with the better algorithm for Generator to use; then we'll have two C functions roughly corresponding to the same method name (albeit on different classes). C doesn't give us as many namespace options as Python.
We could rely on conventional prefixes to distinguish between the two classes of function (e.g. legacy_normal vs random_normal). >>> >>> >>> That seems simple and clear >>> >>>> There are times when it would be nice to be more descriptive about the algorithm difference (e.g. random_normal_polar vs random_normal_ziggurat), >>> >>> >>> We decided against versioning algorithms in NEP 19, so an update to an algorithm would mean we'd want to get rid of the older version (unless it's still in use by legacy). So AFAICT we'd never have both random_normal_polar and random_normal_ziggurat present at the same time? >> >> >> Well, we must because one's used by the legacy RandomState and one's used by Generator. :-) >> >>> >>> I may be missing your point here, but if we have in Python `Generator.normal` and can switch its implementation from polar to ziggurat or vice versa without any deprecation, then why would we want to switch names in the C API? >> >> >> I didn't mean to suggest that we'd have an unbounded number of functions as we improve the algorithms, just that we might have 2 once we decide to change something about the algorithm. We need 2 to support both the improved algorithm in Generator and the legacy algorithm in RandomState. The current implementation of the C function would be copied to a new name (`legacy_foo` or whatever), then we'd make RandomState use that frozen copy, then we make the desired modifications to the main function that Generator is referencing (`random_foo`). >> >> Or we could just make those legacy copies now so that people get to use them explicitly under the legacy names, whatever they are, and we can feel more free to modify the main implementations. I suggested this earlier, but convinced myself that it wasn't strictly necessary. But then I admit I was more focused on the Python API stability than any promises about the C/Cython API. >> >> We might end up with more than 2 implementations if we need to change something about the function signature, for whatever reason, and we want to retain C/Cython API compatibility with older code. The C functions aren't necessarily going to be one-to-one to the Generator methods. They're just part of the implementation. So for example, if we wanted to, say, precompute some intermediate values from the given scalar parameters so we don't have to recompute them for each element of the `size`-large requested output, we might do that in one C function and pass those intermediate values as arguments to the C function that does the actual sampling. So we'd have two C functions for that one Generator method, and the sampling C function will not have the same signature as it did before the modification that refactored the work into two functions. In that case, I would not be so strict as to require that `Generator.foo` is one to one with `random_foo`. > > > You're saying "be so strict" as if it were a bad thing, or a major effort. I understand that in some cases a C API can not be evolved in the same way as a Python API, but in the example you're giving here I'd say you want one function to be public, and one private. Making both public just exposes more implementation details for no good reason, and will give us more maintenance issues long-term. > > Anyway, this is not an issue today. If we try to keep Python and C APIs matching, we can deal with possible difficulties with that if and when they arise - should be infrequent. 
> > Cheers, > Ralf > >> >> To your point, though, we don't have to use gratuitously different names when there _is_ a one-to-one relationship. `random_gauss_zig` should be `random_normal`. >> >> -- >> Robert Kern >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -- Those who don't understand recursion are doomed to repeat it From matti.picus at gmail.com Fri Sep 20 09:02:58 2019 From: matti.picus at gmail.com (Matti Picus) Date: Fri, 20 Sep 2019 16:02:58 +0300 Subject: [Numpy-discussion] Low-level API for Random In-Reply-To: References: Message-ID: <492c1843-0713-bcf9-a2c4-919f47bc32f4@gmail.com> On 20/9/19 2:18 pm, Neal Becker wrote: > I have used C-api in the past, and would like to see a convenient and > stable way to do this. Currently I'm using randomgen, but calling > (from c++) > to the python api. The inefficiency is amortized by generating and > caching batches of results. > > I thought randomgen was supposed to be the future of numpy random, so > I've based on that. > It would be good to have actual users tell us what APIs they need. Are you using the BitGenerators or only the higher level Generator functions? From ndbecker2 at gmail.com Fri Sep 20 09:12:15 2019 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 20 Sep 2019 09:12:15 -0400 Subject: [Numpy-discussion] Low-level API for Random In-Reply-To: <492c1843-0713-bcf9-a2c4-919f47bc32f4@gmail.com> References: <492c1843-0713-bcf9-a2c4-919f47bc32f4@gmail.com> Message-ID: I'm using the low-level generator. In this example I need to generate small random integers of defined bit widths (e.g., 2 bit). So I get 64-bit uniform random uintegers, and cache the values, returning them n-bits (e.g. 2 bits) at a time to the caller. On Fri, Sep 20, 2019 at 9:03 AM Matti Picus wrote: > > On 20/9/19 2:18 pm, Neal Becker wrote: > > I have used C-api in the past, and would like to see a convenient and > > stable way to do this. Currently I'm using randomgen, but calling > > (from c++) > > to the python api. The inefficiency is amortized by generating and > > caching batches of results. > > > > I thought randomgen was supposed to be the future of numpy random, so > > I've based on that. > > > > It would be good to have actual users tell us what APIs they need. > > Are you using the BitGenerators or only the higher level Generator > functions? > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -- Those who don't understand recursion are doomed to repeat it From robert.kern at gmail.com Fri Sep 20 10:08:19 2019 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 20 Sep 2019 10:08:19 -0400 Subject: [Numpy-discussion] Low-level API for Random In-Reply-To: References: Message-ID: On Fri, Sep 20, 2019 at 6:09 AM Ralf Gommers wrote: > > > On Fri, Sep 20, 2019 at 5:29 AM Robert Kern wrote: > >> >> We might end up with more than 2 implementations if we need to change >> something about the function signature, for whatever reason, and we want to >> retain C/Cython API compatibility with older code. The C functions aren't >> necessarily going to be one-to-one to the Generator methods. 
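As an aside, a minimal Python sketch of the caching scheme Neal describes above: draw 64-bit words, then peel off n bits at a time. The class and method names are hypothetical; the real version sits in C++ against the bit generator interface.

```
import numpy as np

class BitCache:
    # Illustrative only: the real version would call the bit
    # generator from C++ and amortize the Python-call overhead.
    def __init__(self, seed=None):
        self._rng = np.random.default_rng(seed)
        self._word = 0
        self._bits_left = 0

    def next_bits(self, n):
        if self._bits_left < n:
            # Refill the cache with one 64-bit uniform draw.
            self._word = int(self._rng.integers(0, 2**64, dtype=np.uint64))
            self._bits_left = 64
        value = self._word & ((1 << n) - 1)
        self._word >>= n
        self._bits_left -= n
        return value

cache = BitCache(seed=12345)
two_bit_samples = [cache.next_bits(2) for _ in range(10)]
```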
They're just >> part of the implementation. So for example, if we wanted to, say, >> precompute some intermediate values from the given scalar parameters so we >> don't have to recompute them for each element of the `size`-large requested >> output, we might do that in one C function and pass those intermediate >> values as arguments to the C function that does the actual sampling. So >> we'd have two C functions for that one Generator method, and the sampling C >> function will not have the same signature as it did before the modification >> that refactored the work into two functions. In that case, I would not be >> so strict as to require that `Generator.foo` is one to one with >> `random_foo`. >> > > You're saying "be so strict" as if it were a bad thing, or a major effort. > I am. It's an unnecessary limitation on the C API without a corresponding benefit. Your original complaint is much more directly addressed by a "don't gratuitously name related C functions differently than the Python methods they implement" rule (e.g. "gauss" instead of "normal"). > I understand that in some cases a C API can not be evolved in the same way > as a Python API, but in the example you're giving here I'd say you want one > function to be public, and one private. Making both public just exposes > more implementation details for no good reason, and will give us more > maintenance issues long-term. > Not at all. In this example, neither one of those functions is useful without the other. If one is public, both must be. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Fri Sep 20 16:07:31 2019 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 20 Sep 2019 13:07:31 -0700 Subject: [Numpy-discussion] DType Roadmap/NEP Discussion In-Reply-To: <235dc0a0-0ef2-8b06-3a5d-05c2746ae750@gmail.com> References: <5f00290851aa9418215c4bd0fea3378bf94dcc79.camel@sipsolutions.net> <235dc0a0-0ef2-8b06-3a5d-05c2746ae750@gmail.com> Message-ID: On Thu, 2019-09-19 at 21:35 +0300, Matti Picus wrote: > On 19/9/19 2:34 am, Sebastian Berg wrote: > > Hi all, > > > > to try and make some progress towards a decision since the broad > > design > > is pretty much settling from my side. I am thinking about making a > > meeting, and suggest Monday at 11am Pacific Time (I am open to > > other > > times though). > > > > My hope is to get everyone interested on board, so that we can make > > an > > informed decision about the general direction very soon. So just > > reach > > out, or discuss on the mailing list as well. > > > > The current draft for an NEP is here: > > https://hackmd.io/kxuh15QGSjueEKft5SaMug?both > > Mon Sept 23 sounds good. Please reach out to the possible consumers > of > the API to get wider input. > > - Pandas > > - Astropy > > - Numba > > - ??? > > > It may be a bit too short notice, but it seems like there is enough > to > talk about even if only the NumPy community show up. > > > > Where/how will the meeting take place? > Lets go for the typical zoom link. I will add a few points later probably, but to be able to update things easily, see: https://hackmd.io/5S3ADAdOSIeaUwFxlvajMA Best, Sebastian > Matti > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From ralf.gommers at gmail.com Fri Sep 20 23:32:04 2019 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 20 Sep 2019 20:32:04 -0700 Subject: [Numpy-discussion] Low-level API for Random In-Reply-To: References: Message-ID: On Fri, Sep 20, 2019 at 7:09 AM Robert Kern wrote: > > > On Fri, Sep 20, 2019 at 6:09 AM Ralf Gommers > wrote: > >> >> >> On Fri, Sep 20, 2019 at 5:29 AM Robert Kern >> wrote: >> >>> >>> We might end up with more than 2 implementations if we need to change >>> something about the function signature, for whatever reason, and we want to >>> retain C/Cython API compatibility with older code. The C functions aren't >>> necessarily going to be one-to-one to the Generator methods. They're just >>> part of the implementation. So for example, if we wanted to, say, >>> precompute some intermediate values from the given scalar parameters so we >>> don't have to recompute them for each element of the `size`-large requested >>> output, we might do that in one C function and pass those intermediate >>> values as arguments to the C function that does the actual sampling. So >>> we'd have two C functions for that one Generator method, and the sampling C >>> function will not have the same signature as it did before the modification >>> that refactored the work into two functions. In that case, I would not be >>> so strict as to require that `Generator.foo` is one to one with >>> `random_foo`. >>> >> >> You're saying "be so strict" as if it were a bad thing, or a major effort. >> > > I am. It's an unnecessary limitation on the C API without a corresponding > benefit. Your original complaint > It's not a "complaint". We're having this discussion because we shipped a partial API in 1.17.0 that we will now have to go back and either take out or clean up in 1.17.3. The PR for the new numpy.random grew so large that we didn't notice or discuss that (such things happen, no big deal - we have limited reviewer bandwidth). So now that we do, it makes sense to actually think about what needs to be in the API. For now I think that's only the parts that are matching the Python API plus what is needed to use them from C/Cython. Future additions require similar review and criteria as adding to the Python API and the existing NumPy C API. To me, your example seems to (a) not deal with API stability, and (b) expose too much implementation detail. To be clear about the actual status, we: - shipped one header file (bitgen.h) - shipped two pxd files (common.pxd, bit_generator.pxd) - removed a header file we used to ship (randomkit.h) - did not ship distributions.pxd, bounded_integers.pxd, legacy_distributions.pxd or related header files bit_generator.pxd looks fine, common.pxd contains parts that shouldn't be there. I think the intent was to ship at least distributions.pxd/h, and perhaps all of those pxd files. is much more directly addressed by a "don't gratuitously name related C > functions differently than the Python methods they implement" rule (e.g. > "gauss" instead of "normal"). > > >> I understand that in some cases a C API can not be evolved in the same >> way as a Python API, but in the example you're giving here I'd say you want >> one function to be public, and one private. Making both public just exposes >> more implementation details for no good reason, and will give us more >> maintenance issues long-term. >> > > Not at all. 
In this example, neither one of those functions is useful > without the other. If one is public, both must be. > If neither one is useful without the other, it sounds like both should be private and the third one that puts them together - the one that didn't change signature and implements `Generator.foo` - is the public one. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sat Sep 21 00:30:41 2019 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 21 Sep 2019 00:30:41 -0400 Subject: [Numpy-discussion] Low-level API for Random In-Reply-To: References: Message-ID: On Fri, Sep 20, 2019 at 11:33 PM Ralf Gommers wrote: > > > On Fri, Sep 20, 2019 at 7:09 AM Robert Kern wrote: > >> >> >> On Fri, Sep 20, 2019 at 6:09 AM Ralf Gommers >> wrote: >> >>> >>> >>> On Fri, Sep 20, 2019 at 5:29 AM Robert Kern >>> wrote: >>> >>>> >>>> We might end up with more than 2 implementations if we need to change >>>> something about the function signature, for whatever reason, and we want to >>>> retain C/Cython API compatibility with older code. The C functions aren't >>>> necessarily going to be one-to-one to the Generator methods. They're just >>>> part of the implementation. So for example, if we wanted to, say, >>>> precompute some intermediate values from the given scalar parameters so we >>>> don't have to recompute them for each element of the `size`-large requested >>>> output, we might do that in one C function and pass those intermediate >>>> values as arguments to the C function that does the actual sampling. So >>>> we'd have two C functions for that one Generator method, and the sampling C >>>> function will not have the same signature as it did before the modification >>>> that refactored the work into two functions. In that case, I would not be >>>> so strict as to require that `Generator.foo` is one to one with >>>> `random_foo`. >>>> >>> >>> You're saying "be so strict" as if it were a bad thing, or a major >>> effort. >>> >> >> I am. It's an unnecessary limitation on the C API without a corresponding >> benefit. Your original complaint >> > > It's not a "complaint". > Please forgive me. That word choice was not intended to be dismissive. I don't view "complaints" as minor annoyances that the "complainer" should just shut up and deal with, or that the "complainer" is just being annoying, but I can see how it came across that I might. Please continue as if I said "The problem you originally noted...". It's a real problem that needs to be addressed. We just have different thoughts on exactly what is needed to address it. > We're having this discussion because we shipped a partial API in 1.17.0 > that we will now have to go back and either take out or clean up in 1.17.3. > The PR for the new numpy.random grew so large that we didn't notice or > discuss that (such things happen, no big deal - we have limited reviewer > bandwidth). So now that we do, it makes sense to actually think about what > needs to be in the API. For now I think that's only the parts that are > matching the Python API plus what is needed to use them from C/Cython. > Future additions require similar review and criteria as adding to the > Python API and the existing NumPy C API. To me, your example seems to (a) > not deal with API stability, and (b) expose too much implementation detail. 
> > To be clear about the actual status, we: > - shipped one header file (bitgen.h) > - shipped two pxd files (common.pxd, bit_generator.pxd) > - removed a header file we used to ship (randomkit.h) > - did not ship distributions.pxd, bounded_integers.pxd, > legacy_distributions.pxd or related header files > > bit_generator.pxd looks fine, common.pxd contains parts that shouldn't be > there. I think the intent was to ship at least distributions.pxd/h, and > perhaps all of those pxd files. > > is much more directly addressed by a "don't gratuitously name related C >> functions differently than the Python methods they implement" rule (e.g. >> "gauss" instead of "normal"). >> >> >>> I understand that in some cases a C API can not be evolved in the same >>> way as a Python API, but in the example you're giving here I'd say you want >>> one function to be public, and one private. Making both public just exposes >>> more implementation details for no good reason, and will give us more >>> maintenance issues long-term. >>> >> >> Not at all. In this example, neither one of those functions is useful >> without the other. If one is public, both must be. >> > > If neither one is useful without the other, it sounds like both should be > private and the third one that puts them together - the one that didn't > change signature and implements `Generator.foo` - is the public one. > That defeats the point of using the C API in this instance, though. The reason it got split into two (in this plausible hypothetical; I'm thinking of the binomial implementation here, which caches these intermediates in a passed-in struct) is because in C you want to call them in different ways (in this case, the prep function once and the sampling function many times). It's not that you always call them in lockstep pairs: `prep(); sample(); prep(); sample();`. A C function that combines them defeats the efficiency that one wanted to gain by using the C API. The C API has different needs than the Python API, because the Python API has a lot more support from the Python language and numpy data structures to be able to jam a lot of functionality into a single function signature that C just doesn't give us. The purpose of the C API is not just to avoid Python function call overhead. If there's a reason that the Generator method needs the implementation split up into multiple C functions, that's a really strong signal that *other* C code using the C API will need that same split. It's not just an implementation detail; it's a documented use case. Given the prevalence of Cython, it's actually really easy to use the Python API pretty easily in "C", so it's actually a huge waste if the C API matches the Python API too closely. The power and utility of the C API will be in how it *differs* from the Python API. For the distribution methods, this is largely in how it lets you sample one number at a time without bothering with the numpy and broadcasting overhead. That's the driving motivation for having a C API for the distributions, and the algorithms that we choose have consequences for the C API that will best satisfy that motivation. The issue that this raises with API stability is that if you require a one-to-one match between the C API function and the Generator method, we can never change the function signature of the C function. That's going to forbid us from moving from an algorithm that doesn't need any precomputation to one that does. 
That precomputation either requires a 2-function dance or a new argument to keep the cached values (c.f. `random_binomial()`), so it's always going to affect the API. To use such a new algorithm, we'll have to add a new function or two to the C API and document the deprecation of the older API function. We can't just swap it in under the same name, even if the new function is standalone. That's a significant constraint on future development when the main issue that led to the suggestion is that the names were sometimes gratuitously different between the C API and the Python API, which hindered discoverability. We can fix *that* problem easily without constraining the universe of algorithms that we might consider using in the future. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Sep 21 00:47:49 2019 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 20 Sep 2019 21:47:49 -0700 Subject: [Numpy-discussion] Low-level API for Random In-Reply-To: References: Message-ID: On Fri, Sep 20, 2019 at 9:31 PM Robert Kern wrote: > On Fri, Sep 20, 2019 at 11:33 PM Ralf Gommers > wrote: > >> >> >> On Fri, Sep 20, 2019 at 7:09 AM Robert Kern >> wrote: >> >>> >>> >>> On Fri, Sep 20, 2019 at 6:09 AM Ralf Gommers >>> wrote: >>> >>>> >>>> >>>> On Fri, Sep 20, 2019 at 5:29 AM Robert Kern >>>> wrote: >>>> >>>>> >>>>> We might end up with more than 2 implementations if we need to change >>>>> something about the function signature, for whatever reason, and we want to >>>>> retain C/Cython API compatibility with older code. The C functions aren't >>>>> necessarily going to be one-to-one to the Generator methods. They're just >>>>> part of the implementation. So for example, if we wanted to, say, >>>>> precompute some intermediate values from the given scalar parameters so we >>>>> don't have to recompute them for each element of the `size`-large requested >>>>> output, we might do that in one C function and pass those intermediate >>>>> values as arguments to the C function that does the actual sampling. So >>>>> we'd have two C functions for that one Generator method, and the sampling C >>>>> function will not have the same signature as it did before the modification >>>>> that refactored the work into two functions. In that case, I would not be >>>>> so strict as to require that `Generator.foo` is one to one with >>>>> `random_foo`. >>>>> >>>> >>>> You're saying "be so strict" as if it were a bad thing, or a major >>>> effort. >>>> >>> >>> I am. It's an unnecessary limitation on the C API without a >>> corresponding benefit. Your original complaint >>> >> >> It's not a "complaint". >> > > Please forgive me. That word choice was not intended to be dismissive. I > don't view "complaints" as minor annoyances that the "complainer" should > just shut up and deal with, or that the "complainer" is just being > annoying, but I can see how it came across that I might. Please continue as > if I said "The problem you originally noted...". It's a real problem that > needs to be addressed. We just have different thoughts on exactly what is > needed to address it. > Okay, thank you:) > >> We're having this discussion because we shipped a partial API in 1.17.0 >> that we will now have to go back and either take out or clean up in 1.17.3. >> The PR for the new numpy.random grew so large that we didn't notice or >> discuss that (such things happen, no big deal - we have limited reviewer >> bandwidth). 
So now that we do, it makes sense to actually think about what >> needs to be in the API. For now I think that's only the parts that are >> matching the Python API plus what is needed to use them from C/Cython. >> Future additions require similar review and criteria as adding to the >> Python API and the existing NumPy C API. To me, your example seems to (a) >> not deal with API stability, and (b) expose too much implementation detail. >> >> To be clear about the actual status, we: >> - shipped one header file (bitgen.h) >> - shipped two pxd files (common.pxd, bit_generator.pxd) >> - removed a header file we used to ship (randomkit.h) >> - did not ship distributions.pxd, bounded_integers.pxd, >> legacy_distributions.pxd or related header files >> >> bit_generator.pxd looks fine, common.pxd contains parts that shouldn't be >> there. I think the intent was to ship at least distributions.pxd/h, and >> perhaps all of those pxd files. >> >> is much more directly addressed by a "don't gratuitously name related C >>> functions differently than the Python methods they implement" rule (e.g. >>> "gauss" instead of "normal"). >>> >>> >>>> I understand that in some cases a C API can not be evolved in the same >>>> way as a Python API, but in the example you're giving here I'd say you want >>>> one function to be public, and one private. Making both public just exposes >>>> more implementation details for no good reason, and will give us more >>>> maintenance issues long-term. >>>> >>> >>> Not at all. In this example, neither one of those functions is useful >>> without the other. If one is public, both must be. >>> >> >> If neither one is useful without the other, it sounds like both should be >> private and the third one that puts them together - the one that didn't >> change signature and implements `Generator.foo` - is the public one. >> > > That defeats the point of using the C API in this instance, though. The > reason it got split into two (in this plausible hypothetical; I'm thinking > of the binomial implementation here, which caches these intermediates in a > passed-in struct) is because in C you want to call them in different ways > (in this case, the prep function once and the sampling function many > times). It's not that you always call them in lockstep pairs: `prep(); > sample(); prep(); sample();`. A C function that combines them defeats the > efficiency that one wanted to gain by using the C API. The C API has > different needs than the Python API, because the Python API has a lot more > support from the Python language and numpy data structures to be able to > jam a lot of functionality into a single function signature that C just > doesn't give us. The purpose of the C API is not just to avoid Python > function call overhead. If there's a reason that the Generator method needs > the implementation split up into multiple C functions, that's a really > strong signal that *other* C code using the C API will need that same > split. It's not just an implementation detail; it's a documented use case. > > Given the prevalence of Cython, it's actually really easy to use the > Python API pretty easily in "C", so it's actually a huge waste if the C API > matches the Python API too closely. The power and utility of the C API will > be in how it *differs* from the Python API. For the distribution methods, > this is largely in how it lets you sample one number at a time without > bothering with the numpy and broadcasting overhead. 
That's the driving > motivation for having a C API for the distributions, and the algorithms > that we choose have consequences for the C API that will best satisfy that > motivation. > > The issue that this raises with API stability is that if you require a > one-to-one match between the C API function and the Generator method, we > can never change the function signature of the C function. That's going to > forbid us from moving from an algorithm that doesn't need any > precomputation to one that does. That precomputation either requires a > 2-function dance or a new argument to keep the cached values (c.f. > `random_binomial()`), so it's always going to affect the API. To use such a > new algorithm, we'll have to add a new function or two to the C API and > document the deprecation of the older API function. We can't just swap it > in under the same name, even if the new function is standalone. That's a > significant constraint on future development when the main issue that led > to the suggestion is that the names were sometimes gratuitously different > between the C API and the Python API, which hindered discoverability. We > can fix *that* problem easily without constraining the universe of > algorithms that we might consider using in the future. > Fair enough; now the use case is clear to me. In summary: there may be real reasons to deviate and add more functions; let's do so if and when it makes sense to. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Sep 21 01:29:28 2019 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 20 Sep 2019 22:29:28 -0700 Subject: [Numpy-discussion] User Stories for https://numpy.org In-Reply-To: References: Message-ID: On Wed, Sep 18, 2019 at 9:52 AM Inessa Pawson wrote: > The NumPy web team has begun redesigning https://numpy.org determined to > transform the website into a welcoming and useful digital hub of all things > NumPy. We are inviting all members of our large and diverse community to > submit their user stories to help us fulfill our mission. > Thanks Inessa. I hope to see some user stories in particular from stakeholder groups that we may not be thinking about yet. Our first focus is probably something like: beginning user, advanced user, contributor. Beyond that there are groups like educators and packagers that I hope we can include specific content for soon. I'm sure we're still missing some groups, would love to hear specific needs or previous unsuccessful/unsatisfactory attempts at engaging with NumPy. Note that we're keeping track of these user stories in https://github.com/numpy/numpy.org/issues/42 Cheers, Ralf > *What are we looking for?* > > In simple, concise terms, a user story describes what a user needs to > accomplish while visiting a website. Anyone who reads the user story must > be able to understand why the user needs the functionality, and what is > required to implement the story. User stories must have acceptance > criteria. The shorter the story the better. > > > *Examples of good user stories* > > 1. Lotte is a library author that depends on NumPy. She is looking for > information about major changes and a release date of the next version of > NumPy. She would like to easily find it on the website instead of > contacting the core team. > > > 2. Yu Yan was introduced to NumPy in her first week of the Foundations of > Data Science class. She is looking for a NumPy tutorial for absolute > beginners in Mandarin. > > > 3.
Tiago is a software developer. By day, he builds enterprise > applications for a Fortune 100 company. By night, he cultivates his > academic interests in statistics and computer science using various Python > libraries. Tiago has an idea for a new NumPy feature and would like to > implement it. He is looking for information on how to contact the person(s) > in charge of such decisions. > > > *Please note* that at this stage of the numpy.org redesign our focus is > not on expanding or improving the documentation but, rather, developing > high-level content to provide information about the project to a multitude > of stakeholders. > > -- > Every good wish, > *Inessa Pawson* > NumPy Web Team > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Sun Sep 22 23:44:45 2019 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Sun, 22 Sep 2019 23:44:45 -0400 Subject: [Numpy-discussion] [pydata] NumPy proposal to remove the financial functions. In-Reply-To: <8983e694-7067-44cb-a35f-5c173d44c160@googlegroups.com> References: <8458a06f-ead7-4f7e-b288-4a15a6002482@googlegroups.com> <8983e694-7067-44cb-a35f-5c173d44c160@googlegroups.com> Message-ID: On 9/21/19, Brendan Barnwell wrote: > Hi Warren, > > I'm somewhat late to this discussion but I too have used the financial > functions. I looked at the discussion and the NEP and one thing I don't > understand is how the maintenance burden is alleviated if the functions are > > moved to a separate library. Is the intent of the Numpy devs to just > "dump" these functions into numpy_financial and then not maintain them? If > > not, what is achieved by moving them out of numpy? Brendan, There have been some more recent comments on the github issue that are relevant; take a look: https://github.com/numpy/numpy/issues/2880 It is true that when the functions are moved to numpy_financial, they will receive less attention from the core NumPy developers. Indeed, that is the point of the move. As you can see from the comments in the github issue and those quoted in the NEP, there is no interest among the current developers in maintaining these functions in NumPy. By having a smaller and more focused library that is explicitly for financial functions, it is possible that new developers with greater interest and expertise in that domain will be motivated to contribute. See, for example, Graham Duncan's recent comments in the github issue. It remains to be seen whether we'll end up with a significantly *better* library for financial calculations once the transition is complete. For the most visibility among the NumPy developers, it would be best to continue the conversation in a NumPy venue, either the github issue or the NumPy mailing list. I've cc'ed this email to the NumPy mailing list. Warren > > On Thursday, September 19, 2019 at 8:25:52 AM UTC-7, Warren Weckesser > wrote: >> >> On 9/8/19, Warren Weckesser > wrote: >> > NumPy is considering a NEP (NumPy Enhancement Proposal) that proposes >> the >> > deprecation and ultimate removal of the financial functions from NumPy. >> > >> > The functions would be moved to an independent library. 
The mailing >> list >> > discussion of this proposal is at >> > >> > >> > >> http://numpy-discussion.10968.n7.nabble.com/NEP-32-Remove-the-financial-functions-from-NumPy-tt47456.html >> >> > >> > or >> > >> > >> > >> https://mail.python.org/pipermail/numpy-discussion/2019-September/079965.html >> >> > >> > The first message in that thread includes the proposed NEP. >> > >> > There have been a couple suggestions to ask about this on the Pandas >> > mailing list. Contributions to the thread in the numpy-discussion >> mailing >> > list would be appreciated! >> >> >> FYI: The proposal to accept the NEP to remove the financial functions >> has been made on the NumPy-Discussion mailing list: >> >> https://mail.python.org/pipermail/numpy-discussion/2019-September/080074.html >> >> >> Warren >> >> > >> > Thanks, >> > >> > Warren >> > >> > -- >> > You received this message because you are subscribed to the Google >> Groups >> > "PyData" group. >> > To unsubscribe from this group and stop receiving emails from it, send >> an >> > email to pyd... at googlegroups.com . >> > To view this discussion on the web visit >> > >> https://groups.google.com/d/msgid/pydata/8458a06f-ead7-4f7e-b288-4a15a6002482%40googlegroups.com. >> >> >> > >> > > -- > You received this message because you are subscribed to the Google Groups > "PyData" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to pydata+unsubscribe at googlegroups.com. > To view this discussion on the web visit > https://groups.google.com/d/msgid/pydata/8983e694-7067-44cb-a35f-5c173d44c160%40googlegroups.com. > From anntzer.lee at gmail.com Mon Sep 23 05:56:25 2019 From: anntzer.lee at gmail.com (Antony Lee) Date: Mon, 23 Sep 2019 11:56:25 +0200 Subject: [Numpy-discussion] ANN: mplcairo 0.2 release Message-ID: Dear all, I am pleased to announce the release of mplcairo 0.2. mplcairo is a Matplotlib backend based on the well-known cairo library, supporting output to both raster (including interactively) and vector formats. In other words, it provides the functionality of Matplotlib's {,qt5,gtk3,wx,tk,macos}{agg,cairo}, pdf, ps, and svg backends. Per Matplotlib's standard API, the backend can be selected by calling matplotlib.use("module://mplcairo.qt") or setting your MPLBACKEND environment variable to `module://mplcairo.qt` for Qt5, and similarly for other toolkits. mplcairo 0.2 adds support for cairo 1.17.2's high-precision floating point surfaces, simplifies the use of custom compositing operators (see `examples/operators.py`), a few other features listed in the changelog, as well as the usual bugfixes over 0.1. Enjoy, Antony Lee -------------- next part -------------- An HTML attachment was scrubbed... URL: From tom.w.augspurger at gmail.com Mon Sep 23 07:39:51 2019 From: tom.w.augspurger at gmail.com (Tom Augspurger) Date: Mon, 23 Sep 2019 06:39:51 -0500 Subject: [Numpy-discussion] DType Roadmap/NEP Discussion In-Reply-To: References: <5f00290851aa9418215c4bd0fea3378bf94dcc79.camel@sipsolutions.net> <235dc0a0-0ef2-8b06-3a5d-05c2746ae750@gmail.com> Message-ID: On Fri, Sep 20, 2019 at 3:10 PM Sebastian Berg wrote: > On Thu, 2019-09-19 at 21:35 +0300, Matti Picus wrote: > > On 19/9/19 2:34 am, Sebastian Berg wrote: > > > Hi all, > > > > > > to try and make some progress towards a decision since the broad > > > design > > > is pretty much settling from my side. I am thinking about making a > > > meeting, and suggest Monday at 11am Pacific Time (I am open to > > > other > > > times though). 
> > > > > > My hope is to get everyone interested on board, so that we can make > > > an > > > informed decision about the general direction very soon. So just > > > reach > > > out, or discuss on the mailing list as well. > > > > > > The current draft for an NEP is here: > > > https://hackmd.io/kxuh15QGSjueEKft5SaMug?both > > > > Mon Sept 23 sounds good. Please reach out to the possible consumers > > of > > the API to get wider input. > > > > - Pandas > > > > - Astropy > > > > - Numba > > > > - ??? > > > > > > It may be a bit too short notice, but it seems like there is enough > > to > > talk about even if only the NumPy community show up. > > > > > > > > Where/how will the meeting take place? > > > > Lets go for the typical zoom link. I will add a few points later > probably, but to be able to update things easily, see: > > https://hackmd.io/5S3ADAdOSIeaUwFxlvajMA Is there a time set for this meeting? I'll try to attend from the pandas side of things. > > Best, > > Sebastian > > > > Matti > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Sep 23 07:44:42 2019 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 23 Sep 2019 13:44:42 +0200 Subject: [Numpy-discussion] DType Roadmap/NEP Discussion In-Reply-To: References: <5f00290851aa9418215c4bd0fea3378bf94dcc79.camel@sipsolutions.net> <235dc0a0-0ef2-8b06-3a5d-05c2746ae750@gmail.com> Message-ID: On Mon, Sep 23, 2019 at 1:40 PM Tom Augspurger wrote: > > > On Fri, Sep 20, 2019 at 3:10 PM Sebastian Berg > wrote: > >> On Thu, 2019-09-19 at 21:35 +0300, Matti Picus wrote: >> > On 19/9/19 2:34 am, Sebastian Berg wrote: >> > > Hi all, >> > > >> > > to try and make some progress towards a decision since the broad >> > > design >> > > is pretty much settling from my side. I am thinking about making a >> > > meeting, and suggest Monday at 11am Pacific Time (I am open to >> > > other >> > > times though). >> > > >> > > My hope is to get everyone interested on board, so that we can make >> > > an >> > > informed decision about the general direction very soon. So just >> > > reach >> > > out, or discuss on the mailing list as well. >> > > >> > > The current draft for an NEP is here: >> > > https://hackmd.io/kxuh15QGSjueEKft5SaMug?both >> > >> > Mon Sept 23 sounds good. Please reach out to the possible consumers >> > of >> > the API to get wider input. >> > >> > - Pandas >> > >> > - Astropy >> > >> > - Numba >> > >> > - ??? >> > >> > >> > It may be a bit too short notice, but it seems like there is enough >> > to >> > talk about even if only the NumPy community show up. >> > >> > >> > >> > Where/how will the meeting take place? >> > >> >> Lets go for the typical zoom link. I will add a few points later >> probably, but to be able to update things easily, see: >> >> https://hackmd.io/5S3ADAdOSIeaUwFxlvajMA > > > Is there a time set for this meeting? I'll try to attend from the pandas > side of things. > The HackMD link above says 11am PST (so ~6 hours from now), and also contains a Zoom link to join the call. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sebastian at sipsolutions.net Mon Sep 23 13:43:40 2019 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 23 Sep 2019 10:43:40 -0700 Subject: [Numpy-discussion] DType Roadmap/NEP Discussion In-Reply-To: References: <5f00290851aa9418215c4bd0fea3378bf94dcc79.camel@sipsolutions.net> <235dc0a0-0ef2-8b06-3a5d-05c2746ae750@gmail.com> Message-ID: <64783a66a2fccedeaff7e8f8a645bcbc1caeee5b.camel@sipsolutions.net> On Mon, 2019-09-23 at 13:44 +0200, Ralf Gommers wrote: > > > On Mon, Sep 23, 2019 at 1:40 PM Tom Augspurger < > tom.w.augspurger at gmail.com> wrote: > > > > On Fri, Sep 20, 2019 at 3:10 PM Sebastian Berg < > > sebastian at sipsolutions.net> wrote: > > > On Thu, 2019-09-19 at 21:35 +0300, Matti Picus wrote: > > > > On 19/9/19 2:34 am, Sebastian Berg wrote: > > > > > Hi all, > > > > > > > > > > to try and make some progress towards a decision since the > > > broad > > > > > design > > > > > is pretty much settling from my side. I am thinking about > > > making a > > > > > meeting, and suggest Monday at 11am Pacific Time (I am open > > > to > > > > > other > > > > > times though). > > > > > > > > > > My hope is to get everyone interested on board, so that we > > > can make > > > > > an > > > > > informed decision about the general direction very soon. So > > > just > > > > > reach > > > > > out, or discuss on the mailing list as well. > > > > > > > > > > The current draft for an NEP is here: > > > > > https://hackmd.io/kxuh15QGSjueEKft5SaMug?both > > > > > > > > Mon Sept 23 sounds good. Please reach out to the possible > > > consumers > > > > of > > > > the API to get wider input. > > > > > > > > - Pandas > > > > > > > > - Astropy > > > > > > > > - Numba > > > > > > > > - ??? > > > > > > > > > > > > It may be a bit too short notice, but it seems like there is > > > enough > > > > to > > > > talk about even if only the NumPy community show up. > > > > > > > > > > > > > > > > Where/how will the meeting take place? > > > > > > > > > > Lets go for the typical zoom link. I will add a few points later > > > probably, but to be able to update things easily, see: > > > > > > https://hackmd.io/5S3ADAdOSIeaUwFxlvajMA > > > > Is there a time set for this meeting? I'll try to attend from the > > pandas side of things. > > > > The HackMD link above says 11am PST (so ~6 hours from now), and also > contains a Zoom link to join the call. > Just to let you know, unfortunately our room is in use, so we will have to use a different zoom link: https://zoom.us/j/6398421986 (the HackMD is updated) Cheers, Sebastian > Cheers, > Ralf > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From sebastian at sipsolutions.net Mon Sep 23 16:04:52 2019 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 23 Sep 2019 13:04:52 -0700 Subject: [Numpy-discussion] DType Roadmap/NEP Discussion In-Reply-To: <64783a66a2fccedeaff7e8f8a645bcbc1caeee5b.camel@sipsolutions.net> References: <5f00290851aa9418215c4bd0fea3378bf94dcc79.camel@sipsolutions.net> <235dc0a0-0ef2-8b06-3a5d-05c2746ae750@gmail.com> <64783a66a2fccedeaff7e8f8a645bcbc1caeee5b.camel@sipsolutions.net> Message-ID: <2d82f2474da837ec438e91c79a3f7b9b1bb2e4e6.camel@sipsolutions.net> Since it probably got lost. 
I am currently developing things at: https://github.com/seberg/numpy/tree/dtypemeta Please do not expect the tidiest code at the moment. The public API is not yet available, and currently mainly at a proof-of-concept stage, that things like: * Promotion * Casting * Array creation ? coercion `np.array(...)` (fairly far along) * AbstractDTypes for value based casting work. Also of course generally having a DTypeMeta class and " On Mon, 2019-09-23 at 13:44 +0200, Ralf Gommers wrote: > > > > On Mon, Sep 23, 2019 at 1:40 PM Tom Augspurger < > > tom.w.augspurger at gmail.com> wrote: > > > On Fri, Sep 20, 2019 at 3:10 PM Sebastian Berg < > > > sebastian at sipsolutions.net> wrote: > > > > On Thu, 2019-09-19 at 21:35 +0300, Matti Picus wrote: > > > > > On 19/9/19 2:34 am, Sebastian Berg wrote: > > > > > > Hi all, > > > > > > > > > > > > to try and make some progress towards a decision since the > > > > broad > > > > > > design > > > > > > is pretty much settling from my side. I am thinking about > > > > making a > > > > > > meeting, and suggest Monday at 11am Pacific Time (I am open > > > > to > > > > > > other > > > > > > times though). > > > > > > > > > > > > My hope is to get everyone interested on board, so that we > > > > can make > > > > > > an > > > > > > informed decision about the general direction very soon. So > > > > just > > > > > > reach > > > > > > out, or discuss on the mailing list as well. > > > > > > > > > > > > The current draft for an NEP is here: > > > > > > https://hackmd.io/kxuh15QGSjueEKft5SaMug?both > > > > > > > > > > Mon Sept 23 sounds good. Please reach out to the possible > > > > consumers > > > > > of > > > > > the API to get wider input. > > > > > > > > > > - Pandas > > > > > > > > > > - Astropy > > > > > > > > > > - Numba > > > > > > > > > > - ??? > > > > > > > > > > > > > > > It may be a bit too short notice, but it seems like there is > > > > enough > > > > > to > > > > > talk about even if only the NumPy community show up. > > > > > > > > > > > > > > > > > > > > Where/how will the meeting take place? > > > > > > > > > > > > > Lets go for the typical zoom link. I will add a few points > > > > later > > > > probably, but to be able to update things easily, see: > > > > > > > > https://hackmd.io/5S3ADAdOSIeaUwFxlvajMA > > > > > > Is there a time set for this meeting? I'll try to attend from the > > > pandas side of things. > > > > > > > The HackMD link above says 11am PST (so ~6 hours from now), and > > also > > contains a Zoom link to join the call. > > > > Just to let you know, unfortunately our room is in use, so we will > have > to use a different zoom link: https://zoom.us/j/6398421986 > (the HackMD is updated) > > Cheers, > > Sebastian > > > > Cheers, > > Ralf > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... 
From eric at depagne.org Tue Sep 24 10:19:08 2019 From: eric at depagne.org (=?ISO-8859-1?Q?=C9ric?= Depagne) Date: Tue, 24 Sep 2019 16:19:08 +0200 Subject: [Numpy-discussion] Data filtering with np.genfromtxt Message-ID: <2797982.PVNr7dFt1b@portable> Hi all, I am reading a large csv file that has 8.5 million lines and 216 columns, using genfromtxt. I'm not interested in all of the 216 columns, so I filter them out using the "usecols" and "converters" parameters. That works very well, but in my original large file, not all of the columns I extract are filled with values. As expected in these cases, genfromtxt replaces them by nan, and thus, in the final array, there are rows that contain these nans. I'd like to know if there is a way to filter out, at the genfromtxt level, the lines that contain these nans, so that they do not appear in my final array. I'd like to have something like: genfromtxt extracts the line using the parameters I need. If the extracted line contains a NaN, do nothing and process the next line. If it has no NaNs, add it to the output array as usual. I could of course remove from the array created by genfromtxt() all the rows that contain nans (and x[~np.isnan(x).any(axis=1)] does it nicely), but I'd like to be able to fix the size of the output array in advance. The idea is that I can get, for instance, the first 10000 (or any number) lines of the input file that contain all the columns I need, not just the first 10000 lines. I've found a few examples on SO that do some filtering, but the ones I've found do not process the extracted lines. Any help appreciated. Éric. -- An azerty keyboard is worth two ---------------------------------------------------------- Éric Depagne -------------- next part -------------- An HTML attachment was scrubbed... URL: From cimrman3 at ntc.zcu.cz Tue Sep 24 10:23:07 2019 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Tue, 24 Sep 2019 16:23:07 +0200 Subject: [Numpy-discussion] ANN: SfePy 2019.3 Message-ID: I am pleased to announce release 2019.3 of SfePy. Description ----------- SfePy (simple finite elements in Python) is software for solving systems of coupled partial differential equations by the finite element method or by isogeometric analysis (limited support). It is distributed under the new BSD license. Home page: http://sfepy.org Mailing list: https://mail.python.org/mm3/mailman3/lists/sfepy.python.org/ Git (source) repository, issue tracker: https://github.com/sfepy/sfepy Highlights of this release -------------------------- - interface to eigenvalue problem solvers in SLEPc - new Python 3 enabled Timer class and other Python 3 compatibility fixes For full release notes see [1]. Cheers, Robert Cimrman [1] http://docs.sfepy.org/doc/release_notes.html#id1 --- Contributors to this release in alphabetical order: Robert Cimrman Vladimir Lukes From sebastian at sipsolutions.net Tue Sep 24 14:48:09 2019 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 24 Sep 2019 11:48:09 -0700 Subject: [Numpy-discussion] NumPy Community Meeting Wednesday, Sep. 25 Message-ID: Hi all, There will be a NumPy Community meeting Wednesday September 25 at 11 am Pacific Time. Everyone is invited to join in and edit the work-in-progress meeting topics and notes: https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg?both Best wishes Sebastian -------------- next part -------------- A non-text attachment was scrubbed... Name: NumPy_Community_Call.ics Type: text/calendar Size: 3264 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL:
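Returning to the genfromtxt question above: one way to get the first N complete rows without loading the whole file is to feed genfromtxt successive chunks of lines and keep only the rows without NaNs. A sketch, assuming a headerless csv and that parameters such as usecols and converters are passed through kwargs:

```
import numpy as np
from itertools import islice

def first_n_complete(fname, n, chunk_lines=100000, **kwargs):
    # Read the file chunk by chunk, drop rows containing NaN, and
    # stop once n complete rows have been collected.
    chunks = []
    count = 0
    with open(fname) as f:
        while count < n:
            lines = list(islice(f, chunk_lines))
            if not lines:
                break
            x = np.atleast_2d(np.genfromtxt(lines, **kwargs))
            x = x[~np.isnan(x).any(axis=1)]
            chunks.append(x)
            count += len(x)
    return np.concatenate(chunks)[:n] if chunks else np.empty((0, 0))

# e.g. rows = first_n_complete("data.csv", 10000, delimiter=",",
#                              usecols=(0, 3, 7))
```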
From stefanv at berkeley.edu Wed Sep 25 13:53:38 2019 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Wed, 25 Sep 2019 10:53:38 -0700 Subject: [Numpy-discussion] Low-level API for Random In-Reply-To: References: Message-ID: <03ba8d91-52d4-43d8-b9bb-5fd9973a96ee@www.fastmail.com> On Fri, Sep 20, 2019, at 21:30, Robert Kern wrote: > Given the prevalence of Cython, it's actually really easy to use the Python API pretty easily in "C", so it's actually a huge waste if the C API matches the Python API too closely. The power and utility of the C API will be in how it *differs* from the Python API. For the distribution methods, this is largely in how it lets you sample one number at a time without bothering with the numpy and broadcasting overhead. That's the driving motivation for having a C API for the distributions, and the algorithms that we choose have consequences for the C API that will best satisfy that motivation. I'd like to clarify what exactly we mean by exposing a C API. Do we have in mind that our random number generators can be used from standalone C code, or via Cython `cimport` like with the current numpy.pxd? It sounds like we want to expose the highest level generators; do we also want to provide access to the bit streams? Best regards, Stéfan -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Sep 25 14:05:54 2019 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 25 Sep 2019 13:05:54 -0500 Subject: [Numpy-discussion] Low-level API for Random In-Reply-To: <03ba8d91-52d4-43d8-b9bb-5fd9973a96ee@www.fastmail.com> References: <03ba8d91-52d4-43d8-b9bb-5fd9973a96ee@www.fastmail.com> Message-ID: On Wed, Sep 25, 2019, 12:56 PM Stefan van der Walt wrote: > On Fri, Sep 20, 2019, at 21:30, Robert Kern wrote: > Given the prevalence of Cython, it's actually really easy to use the > Python API pretty easily in "C", so it's actually a huge waste if the C API > matches the Python API too closely. The power and utility of the C API will > be in how it *differs* from the Python API. For the distribution methods, > this is largely in how it lets you sample one number at a time without > bothering with the numpy and broadcasting overhead. That's the driving > motivation for having a C API for the distributions, and the algorithms > that we choose have consequences for the C API that will best satisfy that > motivation. > > > I'd like to clarify what exactly we mean by exposing a C API. Do we have > in mind that our random number generators can be used from standalone C > code, or via Cython `cimport` like with the current numpy.pxd? > Cython is the priority. Numba and cffi/ctypes are also desired and relatively easy to do with capsules. Pure C (via #include) is desired, but can be added later because doing that is more annoying. It sounds like we want to expose the highest level generators; do we also > want to provide access to the bit streams? > 100% -------------- next part -------------- An HTML attachment was scrubbed...
URL: From kevin.k.sheppard at gmail.com Wed Sep 25 16:36:44 2019 From: kevin.k.sheppard at gmail.com (Kevin Sheppard) Date: Wed, 25 Sep 2019 21:36:44 +0100 Subject: [Numpy-discussion] Low-level API for Random In-Reply-To: <03ba8d91-52d4-43d8-b9bb-5fd9973a96ee@www.fastmail.com> References: <03ba8d91-52d4-43d8-b9bb-5fd9973a96ee@www.fastmail.com> Message-ID: > > I'd like to clarify what exactly we mean by exposing a C API. Do we have > in mind that our random number generators can be used from standalone C > code, or via Cython `cimport` like with the current numpy.pxd? > > It sounds like we want to expose the highest level generators; do we also > want to provide access to the bit streams? > > What do you mean by standalone C? A Python extension written in C (but not Cython)? Or a C application that doesn't include Python.h? The former is pretty easy since you can use a few PyObjects to simplify initializing the bit generator, and the rest of the code can be directly used in C without any more Python objects. The latter is also doable although the low-level functions needed to initialize the bit generators (which are just C structs) have no standardization. I think the only component in a standalone C application that would need some non-trivial work is SeedSequence (i.e., more than changing function names or reorganizing files). Like Robert, I suspect that Cython users would be the largest immediate beneficiaries of a lower-level API. numba end-users can already consume the bit generators through the exposed CFFI/ctypes interface they provide. These can then be used with the higher-level generators, although end users have to build a shared lib/DLL first. Getting the C API in shape to be used directly by numba is probably a bigger task. Kevin -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Fri Sep 27 12:11:38 2019 From: alan.isaac at gmail.com (Alan Isaac) Date: Fri, 27 Sep 2019 12:11:38 -0400 Subject: [Numpy-discussion] error during pip install Message-ID: <2ba1fd4e-b5e0-fc8a-68f5-069ae09729c6@gmail.com> Upgrading numpy with pip on Python 3.8b4 on Win 10 produced: ERROR: Could not install packages due to an EnvironmentError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: '"C:' However, the install appears to have been successful. fwiw, Alan Isaac From warren.weckesser at gmail.com Fri Sep 27 13:54:40 2019 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Fri, 27 Sep 2019 13:54:40 -0400 Subject: [Numpy-discussion] NEP 32 is accepted. Now the work begins... Message-ID: NumPy devs, NEP 32 to remove the financial functions (https://numpy.org/neps/nep-0032-remove-financial-functions.html) has been accepted. The next step is to create the numpy-financial package that will replace them. The repository for the new package is https://github.com/numpy/numpy-financial. I have a work-in-progress pull request there to get the initial structure set up. Reviews of the PR would be helpful, as would contributions to set up Sphinx-based documentation, continuous integration, PyPI packaging, and anything else that goes into setting up a "proper" package. Any help would be greatly appreciated! Warren From sebastian at sipsolutions.net Fri Sep 27 14:50:30 2019 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 27 Sep 2019 11:50:30 -0700 Subject: [Numpy-discussion] UFunc out argument not forcing high precision loop?
From alan.isaac at gmail.com  Fri Sep 27 12:11:38 2019
From: alan.isaac at gmail.com (Alan Isaac)
Date: Fri, 27 Sep 2019 12:11:38 -0400
Subject: [Numpy-discussion] error during pip install
Message-ID: <2ba1fd4e-b5e0-fc8a-68f5-069ae09729c6@gmail.com>

Upgrading numpy with pip on Python 3.8b4 on Win 10 produced:

    ERROR: Could not install packages due to an EnvironmentError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: '"C:'

However, the install appears to have been successful.

fwiw,
Alan Isaac

From warren.weckesser at gmail.com  Fri Sep 27 13:54:40 2019
From: warren.weckesser at gmail.com (Warren Weckesser)
Date: Fri, 27 Sep 2019 13:54:40 -0400
Subject: [Numpy-discussion] NEP 32 is accepted. Now the work begins...
Message-ID:

NumPy devs,

NEP 32 to remove the financial functions (https://numpy.org/neps/nep-0032-remove-financial-functions.html) has been accepted. The next step is to create the numpy-financial package that will replace them. The repository for the new package is https://github.com/numpy/numpy-financial.

I have a work-in-progress pull request there to get the initial structure set up. Reviews of the PR would be helpful, as would contributions to set up Sphinx-based documentation, continuous integration, PyPI packaging, and anything else that goes into setting up a "proper" package. Any help would be greatly appreciated!

Warren

From sebastian at sipsolutions.net  Fri Sep 27 14:50:30 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Fri, 27 Sep 2019 11:50:30 -0700
Subject: [Numpy-discussion] UFunc out argument not forcing high precision loop?
Message-ID:

Hi all,

Looking at the ufunc dispatching rules with an `out` argument, I was a bit surprised to realize this little gem is how things work:

```
arr = np.arange(10, dtype=np.uint16) + 2**15
print(arr)
# array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18], dtype=uint16)

out = np.zeros(10)

np.add(arr, arr, out=out)
print(repr(out))
# array([ 0.,  2.,  4.,  6.,  8., 10., 12., 14., 16., 18.])
```

This is strictly speaking correct/consistent. What the ufunc tries to ensure is that whatever the loop produces fits into `out`. However, I still find it unexpected that it does not pick the full precision loop.

There is currently only one way to achieve that, and this is by using `dtype=out.dtype` (or similar incarnations) which specify the exact dtype [0].

Of course this is also because I would like to simplify things for a new dispatching system, but I would like to propose to disable the above behaviour. This would mean:

```
# make the call:
np.add(arr, arr, out=out)

# Equivalent to the current [1]:
np.add(arr, arr, out=out, dtype=(None, None, out.dtype))

# Getting the old behaviour requires (assuming inputs have same dtype):
np.add(arr, arr, out=out, dtype=arr.dtype)
```

and thus force the high precision loop. In very rare cases, this could lead to no loop being found.

The main incompatibility is if someone actually makes use of the above (integer over/underflow) behaviour, but wants to store it in a higher precision array.

I personally currently think we should change it, but am curious if we think that we may be able to get away with an accelerated process and not a year-long FutureWarning.

Cheers,

Sebastian

[0] You can also use `casting="no"`, but in all relevant cases that should find no loop, since we typically only have homogeneous loop definitions.

[1] Which is normally the same as the shorter spelling `dtype=out.dtype` of course.
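To see the difference Sebastian is pointing at end to end, here is a short sketch of the behaviour described above (assuming NumPy as it behaved at the time of this thread):

```
import numpy as np

arr = np.arange(10, dtype=np.uint16) + 2**15   # 32768, 32769, ..., 32777
out = np.zeros(10)                             # float64 output

# Default: the uint16 loop is picked, so 32768 + 32768 wraps to 0
# *before* the result is cast into the float64 `out`.
np.add(arr, arr, out=out)
print(out[:3])                                 # [0. 2. 4.]

# The one existing escape hatch: force the loop dtype explicitly.
np.add(arr, arr, out=out, dtype=out.dtype)
print(out[:3])                                 # [65536. 65538. 65540.]
```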
From sebastian at sipsolutions.net  Fri Sep 27 18:02:01 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Fri, 27 Sep 2019 15:02:01 -0700
Subject: [Numpy-discussion] UFunc out argument not forcing high precision loop?
In-Reply-To:
References:
Message-ID: <4d51e2edad53c2602ba15a9e0f51bed366caa3e8.camel@sipsolutions.net>

On Fri, 2019-09-27 at 11:50 -0700, Sebastian Berg wrote:
> Hi all,
>
> Looking at the ufunc dispatching rules with an `out` argument, I was a bit surprised to realize this little gem is how things work:
>
> ```
> arr = np.arange(10, dtype=np.uint16) + 2**15
> print(arr)
> # array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18], dtype=uint16)

Whoops, copied that print wrong of course.

Just to be clear, I personally will consider this an accuracy/precision bug and assume that we can just switch the behaviour fairly unceremoniously at some point (and if someone feels that should be a major release, I do not mind). It seems like one of those things that will definitely fix some bugs but could break the odd system/assumption somewhere. Similar to fixing the memory overlap issues.

- Sebastian

From njs at pobox.com  Fri Sep 27 18:50:38 2019
From: njs at pobox.com (Nathaniel Smith)
Date: Fri, 27 Sep 2019 15:50:38 -0700
Subject: [Numpy-discussion] UFunc out argument not forcing high precision loop?
In-Reply-To: <4d51e2edad53c2602ba15a9e0f51bed366caa3e8.camel@sipsolutions.net>
References: <4d51e2edad53c2602ba15a9e0f51bed366caa3e8.camel@sipsolutions.net>
Message-ID:

It is pretty weird that these two statements don't necessarily produce the same result:

    someufunc(*inputs, out=out_arr)
    out_arr[...] = someufunc(*inputs)

From sebastian at sipsolutions.net  Fri Sep 27 19:11:07 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Fri, 27 Sep 2019 16:11:07 -0700
Subject: [Numpy-discussion] UFunc out argument not forcing high precision loop?
In-Reply-To:
References: <4d51e2edad53c2602ba15a9e0f51bed366caa3e8.camel@sipsolutions.net>
Message-ID:

On Fri, 2019-09-27 at 15:50 -0700, Nathaniel Smith wrote:
> It is pretty weird that these two statements don't necessarily produce the same result:
>
>     someufunc(*inputs, out=out_arr)
>     out_arr[...] = someufunc(*inputs)

Ooopst, fair point. I am not sure I agree, since currently the (mental) model is typically:

    loop_dtype = np.result_type(*arguments)

and the question now is whether it is arguments or outputs. However, the oops is that I did not realize that right now we do, effectively, ignore the output argument completely for the type resolution. (I.e. I could probably work with that assumption, without actually breaking anything.)

- Sebastian
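Sebastian's "mental model" line can be checked directly; a small sketch, again assuming the behaviour described in this thread:

```
import numpy as np

arr = np.arange(10, dtype=np.uint16) + 2**15
out = np.zeros(10)   # float64

# The loop dtype follows the inputs only; `out` just receives a cast:
print(np.result_type(arr, arr))        # uint16 -- the loop that is picked
print(np.result_type(arr, arr, out))   # float64 -- what an `out`-aware
                                       # resolution would pick instead
```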
From charlesr.harris at gmail.com  Fri Sep 27 19:41:00 2019
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 27 Sep 2019 17:41:00 -0600
Subject: [Numpy-discussion] error during pip install
In-Reply-To: <2ba1fd4e-b5e0-fc8a-68f5-069ae09729c6@gmail.com>
References: <2ba1fd4e-b5e0-fc8a-68f5-069ae09729c6@gmail.com>
Message-ID:

Is that the pip that comes with Python 3.8b4?

From charlesr.harris at gmail.com  Fri Sep 27 19:43:40 2019
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 27 Sep 2019 17:43:40 -0600
Subject: [Numpy-discussion] error during pip install
In-Reply-To:
References: <2ba1fd4e-b5e0-fc8a-68f5-069ae09729c6@gmail.com>
Message-ID:

And where did you get NumPy? We don't have any compatible wheels. Was this from source?

Chuck

From alan.isaac at gmail.com  Sat Sep 28 11:44:25 2019
From: alan.isaac at gmail.com (Alan Isaac)
Date: Sat, 28 Sep 2019 11:44:25 -0400
Subject: [Numpy-discussion] error during pip install
In-Reply-To:
References: <2ba1fd4e-b5e0-fc8a-68f5-069ae09729c6@gmail.com>
Message-ID: <497cf6e2-e6de-c639-704e-6e353829bc6d@gmail.com>

On Fri, Sep 27, 2019 at 5:41 PM Charles R Harris wrote:
> Is that the pip that comes with Python 3.8b4?

Yes.

On 9/27/2019 7:43 PM, Charles R Harris wrote:
> And where did you get NumPy? We don't have any compatible wheels. Was this from source?

Umm, ... does `pip` automatically compile from source in this case? (I just used `python38 -m pip install numpy`; I'm afraid I did not specify a log file.)

But I'll take the core message to be: wait for the wheels.

Cheers,
Alan

From charlesr.harris at gmail.com  Sat Sep 28 12:12:56 2019
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 28 Sep 2019 10:12:56 -0600
Subject: [Numpy-discussion] error during pip install
In-Reply-To: <497cf6e2-e6de-c639-704e-6e353829bc6d@gmail.com>
References: <2ba1fd4e-b5e0-fc8a-68f5-069ae09729c6@gmail.com> <497cf6e2-e6de-c639-704e-6e353829bc6d@gmail.com>
Message-ID:

On Sat, Sep 28, 2019 at 9:45 AM Alan Isaac wrote:
> Umm, ... does `pip` automatically compile from source in this case? (I just used `python38 -m pip install numpy`; I'm afraid I did not specify a log file.)

Yes. I'm actually pleased that the install succeeded on Windows, although you won't have good BLAS/LAPACK, just the numpy C versions of lapack_lite. The warning/error is a bit concerning though; it would be nice to know if it is from Python 3.8's pip or from numpy.

> But I'll take the core message to be: wait for the wheels.

We will need to work on generating 3.8 wheels as soon as Python 3.8 is released. I'd like to try before then, but the simplest attempt failed and I didn't pursue it.

Chuck

From warren.weckesser at gmail.com  Sat Sep 28 13:15:49 2019
From: warren.weckesser at gmail.com (Warren Weckesser)
Date: Sat, 28 Sep 2019 13:15:49 -0400
Subject: [Numpy-discussion] NEP 32 is accepted. Now the work begins...
In-Reply-To:
References:
Message-ID:

On 9/27/19, Warren Weckesser wrote:
> NumPy devs,
>
> NEP 32 to remove the financial functions (https://numpy.org/neps/nep-0032-remove-financial-functions.html) has been accepted.

CI gurus: the web page containing the rendered NEPs, https://numpy.org/neps/, has not updated since the pull request that changed the status of NEP 32 to Accepted was merged (https://github.com/numpy/numpy/pull/14600). Does something else need to be done to get that page to regenerate?

Warren
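For readers following along: the functions moving out are the spreadsheet-style ones (pv, fv, npv, and friends). A sketch of the intended replacement usage, assuming the new package ends up mirroring the old numpy functions under its own namespace — at this point in the thread only the repository skeleton existed, so treat the names as assumptions:

```
# Hypothetical usage of the replacement package described above.
import numpy_financial as npf

# Present value of ten 100-unit payments at 5% per period;
# previously spelled np.pv(0.05, 10, -100).
print(npf.pv(rate=0.05, nper=10, pmt=-100))
```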
From alan.isaac at gmail.com  Sat Sep 28 13:22:47 2019
From: alan.isaac at gmail.com (Alan Isaac)
Date: Sat, 28 Sep 2019 13:22:47 -0400
Subject: [Numpy-discussion] error during pip install
In-Reply-To:
References: <2ba1fd4e-b5e0-fc8a-68f5-069ae09729c6@gmail.com> <497cf6e2-e6de-c639-704e-6e353829bc6d@gmail.com>
Message-ID: <98c93ae3-fa34-c919-ae86-764faa5e9f0a@gmail.com>

On 9/28/2019 12:12 PM, Charles R Harris wrote:
> The warning/error is a bit concerning though; it would be nice to know if it is from Python 3.8's pip or from numpy.

Possibly relevant: https://github.com/numpy/numpy/issues/11451

Alan

From charlesr.harris at gmail.com  Sat Sep 28 13:27:51 2019
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 28 Sep 2019 11:27:51 -0600
Subject: [Numpy-discussion] error during pip install
In-Reply-To: <98c93ae3-fa34-c919-ae86-764faa5e9f0a@gmail.com>
References: <2ba1fd4e-b5e0-fc8a-68f5-069ae09729c6@gmail.com> <497cf6e2-e6de-c639-704e-6e353829bc6d@gmail.com> <98c93ae3-fa34-c919-ae86-764faa5e9f0a@gmail.com>
Message-ID:

On Sat, Sep 28, 2019 at 11:23 AM Alan Isaac wrote:
> Possibly relevant: https://github.com/numpy/numpy/issues/11451

Yes, thanks, that looks to be the problem.

Chuck

From warren.weckesser at gmail.com  Sat Sep 28 20:47:22 2019
From: warren.weckesser at gmail.com (Warren Weckesser)
Date: Sat, 28 Sep 2019 20:47:22 -0400
Subject: [Numpy-discussion] Forcing gufunc to error with size zero input
Message-ID:

I'm experimenting with gufuncs, and I just created a simple one with signature '(i)->()'. Is there a way to configure the gufunc itself so that an empty array results in an error? Or would I have to create a Python wrapper around the gufunc that does the error checking?

Currently, when passed an empty array, the ufunc loop is called with the core dimension associated with i set to 0. It would be nice if the code didn't get that far, and the ufunc machinery "knew" that this gufunc didn't accept a core dimension that is 0. I'd like to automatically get an error, something like the error produced by `np.max([])`.

Warren

From wieser.eric+numpy at gmail.com  Sat Sep 28 21:03:50 2019
From: wieser.eric+numpy at gmail.com (Eric Wieser)
Date: Sat, 28 Sep 2019 18:03:50 -0700
Subject: [Numpy-discussion] Forcing gufunc to error with size zero input
In-Reply-To:
References:
Message-ID:

Can you just raise an exception in the gufunc's inner loop? Or is there no mechanism to do that today?

I don't think you were proposing that core dimensions should _never_ be allowed to be 0, but if you were I disagree. I spent a fair amount of work enabling that for linalg because it provided some convenient base cases.

We could go down the route of augmenting the gufunc signature syntax to support requiring non-empty dimensions, like we did for optional ones - although IMO we should consider switching from a string minilanguage to a structured object specification if we plan to go too much further with extending it.
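Warren's fallback option, a Python wrapper that does the error checking, is straightforward to sketch. `peaktopeak` here stands in for the '(i)->()' gufunc from his repository, so the import path is an assumption:

```
import numpy as np
from npuff import peaktopeak   # hypothetical import; see the thread

def peaktopeak_checked(x):
    # Reject an empty core dimension before the gufunc loop ever runs.
    x = np.asarray(x)
    if x.shape[-1] == 0:
        raise ValueError("zero-size array passed to peaktopeak")
    return peaktopeak(x)
```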
From sebastian at sipsolutions.net  Sat Sep 28 21:22:00 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Sat, 28 Sep 2019 18:22:00 -0700
Subject: [Numpy-discussion] NEP 32 is accepted. Now the work begins...
In-Reply-To:
References:
Message-ID: <8825c1fd342b3fe4e342a2b5e55f2bd3a055d361.camel@sipsolutions.net>

On Sat, 2019-09-28 at 13:15 -0400, Warren Weckesser wrote:
> CI gurus: the web page containing the rendered NEPs, https://numpy.org/neps/, has not updated since the pull request that changed the status of NEP 32 to Accepted was merged (https://github.com/numpy/numpy/pull/14600). Does something else need to be done to get that page to regenerate?

I pushed an empty commit to trigger deployment. That should happen automatically (as it does for the devdocs). I do not know why it does not work, and GitHub did not yet answer my service request on it.

- Sebastian
From warren.weckesser at gmail.com  Sat Sep 28 23:03:49 2019
From: warren.weckesser at gmail.com (Warren Weckesser)
Date: Sat, 28 Sep 2019 23:03:49 -0400
Subject: [Numpy-discussion] NEP 32 is accepted. Now the work begins...
In-Reply-To: <8825c1fd342b3fe4e342a2b5e55f2bd3a055d361.camel@sipsolutions.net>
References: <8825c1fd342b3fe4e342a2b5e55f2bd3a055d361.camel@sipsolutions.net>
Message-ID:

On 9/28/19, Sebastian Berg wrote:
> I pushed an empty commit to trigger deployment. That should happen automatically (as it does for the devdocs). I do not know why it does not work, and GitHub did not yet answer my service request on it.

Thanks Sebastian. The NEPs web page is updated now.

Warren

From warren.weckesser at gmail.com  Sun Sep 29 00:20:03 2019
From: warren.weckesser at gmail.com (Warren Weckesser)
Date: Sun, 29 Sep 2019 00:20:03 -0400
Subject: [Numpy-discussion] Forcing gufunc to error with size zero input
In-Reply-To:
References:
Message-ID:

On 9/28/19, Eric Wieser wrote:
> Can you just raise an exception in the gufunc's inner loop? Or is there no mechanism to do that today?

Maybe? I don't know what is the idiomatic way to handle errors detected in an inner loop. And pushing this particular error detection into the inner loop doesn't feel right.

> I don't think you were proposing that core dimensions should _never_ be allowed to be 0,

No, I'm not suggesting that. There are many cases where a length 0 core dimension is fine.

I'm interested in the case where there is not a meaningful definition of the operation on the empty set. The mean is an example. Currently `np.mean([])` generates two warnings (one useful, the other cryptic and apparently incidental), and returns nan. Returning nan is one way to handle such a case; another is to raise an error like `np.amax([])` does. I'd like to raise an error in the example that I'm working on ('peaktopeak' at https://github.com/WarrenWeckesser/npuff). The function is a gufunc, not a reduction of a binary operation, so the 'identity' argument of PyUFunc_FromFuncAndDataAndSignature has no effect.

> but if you were I disagree. I spent a fair amount of work enabling that for linalg because it provided some convenient base cases.
>
> We could go down the route of augmenting the gufunc signature syntax to support requiring non-empty dimensions, like we did for optional ones - although IMO we should consider switching from a string minilanguage to a structured object specification if we plan to go too much further with extending it.

After only a quick glance at that code: one option is to add a '+' after the input names in the signature that must have a length of at least 1. So the signature for functions like `mean` (if you were to reimplement it as a gufunc, and wanted an error instead of nan), `amax`, `ptp`, etc., would be '(i+)->()'.

However, the only meaningful use-cases of this enhancement that I've come up with are these simple reductions. So I don't know if making such a change to the signature is worthwhile. On the other hand, there are many examples of useful 1-d reductions that aren't the reduction of an associative binary operation. It might be worthwhile to have a new convenience function just for the case '(i)->()', maybe something like PyUFunc_OneDReduction_FromFuncAndData (ugh, that's ugly, but I think you get the idea), and that function can have an argument to specify that the length must be at least 1.

I'll see if that is feasible, but I won't be surprised to learn that there are good reasons for *not* doing that.

Warren
From warren.weckesser at gmail.com  Sun Sep 29 00:40:50 2019
From: warren.weckesser at gmail.com (Warren Weckesser)
Date: Sun, 29 Sep 2019 00:40:50 -0400
Subject: [Numpy-discussion] Forcing gufunc to error with size zero input
In-Reply-To:
References:
Message-ID:

On 9/29/19, Warren Weckesser wrote:
> However, the only meaningful use-cases of this enhancement that I've come up with are these simple reductions.

Of course, just minutes after sending the email, I realized I *do* know of other signatures that could benefit from a check on the core dimension size. An implementation of Pearson's correlation coefficient as a gufunc would have signature (i),(i)->(), and the core dimension i must be at least *2* for the calculation to be well defined. Other correlations would also likely require a nonzero core dimension.

Warren
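To see why i must be at least 2: with a single sample, both centered vectors are identically zero and r degenerates to 0/0. A plain-NumPy sketch of the computation such a gufunc would perform:

```
import numpy as np

def pearson_r(x, y):
    # r = cov(x, y) / (std(x) * std(y)); degenerates to 0/0 when i < 2
    x = x - x.mean()
    y = y - y.mean()
    return (x @ y) / np.sqrt((x @ x) * (y @ y))

print(pearson_r(np.array([1., 2., 3.]), np.array([2., 4., 7.])))  # ~0.99
```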
From sebastian at sipsolutions.net  Sun Sep 29 00:43:36 2019
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Sat, 28 Sep 2019 21:43:36 -0700
Subject: [Numpy-discussion] Forcing gufunc to error with size zero input
In-Reply-To:
References:
Message-ID:

On Sun, 2019-09-29 at 00:20 -0400, Warren Weckesser wrote:
> Maybe? I don't know what is the idiomatic way to handle errors detected in an inner loop. And pushing this particular error detection into the inner loop doesn't feel right.

Basically, since you want to release the GIL, you can grab it and set an error right now. That will work, although grabbing the GIL from the inner loop is not ideal, at least in the sense that it does not work with subinterpreters (but numpy does not currently work with those in any case). We do use this internally, I believe.

Well, even without dtypes, I think we probably want a few extra APIs around UFuncs, and that is setup/teardown (not necessarily as such functions), as well as a return value for the inner loop to signal iteration stop. There was a long discussion about that, for example here: https://github.com/numpy/numpy/issues/12518

There is another use-case: we probably want to allow optimized loop selection (necessary/used in casting).

Note that I believe all of this type of logic should be moved into a UFuncImpl [0] object, so that it can be DType (and especially user DType) specific without bloating up the current UFunc object too much. Although that puts a lot of power out there, so it may be good to limit it a lot initially.

Best,

Sebastian

[0] It was Eric's suggestion/name; I do not know if it came up earlier.

From warren.weckesser at gmail.com  Sun Sep 29 11:02:33 2019
From: warren.weckesser at gmail.com (Warren Weckesser)
Date: Sun, 29 Sep 2019 11:02:33 -0400
Subject: [Numpy-discussion] Error handling in a ufunc inner loop.
Message-ID:

This is a new thread to address the question of error handling in a ufunc loop that was brought up in the thread on handling core dimensions of length zero. I'm attempting to answer my own question about the idiomatic way to handle an error in an inner loop.

The use of the GIL with a ufunc loop is documented at

https://numpy.org/devdocs/reference/internals.code-explanations.html#function-call

So an inner loop is running without the GIL if the macro NPY_ALLOW_THREADS is defined and the loop is not an object-type loop.

If the inner loop is running without the GIL, it must acquire the GIL before calling, say, PyErr_SetString to set an exception. The NumPy macros for acquiring the GIL are documented at

https://docs.scipy.org/doc/numpy/reference/c-api.array.html#group-2

These macros are defined in numpy/core/include/numpy/ndarraytypes.h. If NPY_ALLOW_THREADS is defined, these macros wrap calls to PyGILState_Ensure() and PyGILState_Release() (https://docs.python.org/3/c-api/init.html#non-python-created-threads):

```
#define NPY_ALLOW_C_API_DEF  PyGILState_STATE __save__;
#define NPY_ALLOW_C_API      do {__save__ = PyGILState_Ensure();} while (0);
#define NPY_DISABLE_C_API    do {PyGILState_Release(__save__);} while (0);
```

If NPY_ALLOW_THREADS is not defined, those macros are defined with empty values.

Now suppose I want to change the following inner loop to set an exception instead of returning nan when the input is negative:

```
static void
logfactorial_loop(char **args, npy_intp *dimensions,
                  npy_intp* steps, void* data)
{
    char *in = args[0];
    char *out = args[1];
    npy_intp in_step = steps[0];
    npy_intp out_step = steps[1];

    for (npy_intp i = 0; i < dimensions[0]; ++i, in += in_step, out += out_step) {
        int64_t x = *(int64_t *)in;
        if (x < 0) {
            *((double *)out) = NAN;
        }
        else {
            *((double *)out) = logfactorial(x);
        }
    }
}
```

Based on the documentation linked above, the changed inner loop is simply:

```
static void
logfactorial_loop(char **args, npy_intp *dimensions,
                  npy_intp* steps, void* data)
{
    char *in = args[0];
    char *out = args[1];
    npy_intp in_step = steps[0];
    npy_intp out_step = steps[1];

    for (npy_intp i = 0; i < dimensions[0]; ++i, in += in_step, out += out_step) {
        int64_t x = *(int64_t *)in;
        if (x < 0) {
            NPY_ALLOW_C_API_DEF
            NPY_ALLOW_C_API
            PyErr_SetString(PyExc_ValueError,
                            "math domain error in logfactorial: x < 0");
            NPY_DISABLE_C_API
            return;
        }
        else {
            *((double *)out) = logfactorial(x);
        }
    }
}
```

That worked as expected, but I haven't tried it yet with a NumPy installation where NPY_ALLOW_THREADS is not defined.

Is that change correct? Would that be considered the (or an) idiomatic way to handle errors in an inner loop? Are there any potential problems that I'm missing?

Warren
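From the Python side, the modified loop behaves like this; the extension module name is hypothetical (following the thread's logfactorial example), and the printed values are just log(3!) and log(5!):

```
import numpy as np
from logfactorial import logfactorial   # hypothetical extension module

print(logfactorial(np.array([3, 5], dtype=np.int64)))
# [1.79175947 4.78749174]

logfactorial(np.array([-1], dtype=np.int64))
# ValueError: math domain error in logfactorial: x < 0
```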
I spent a fair amount of work >> enabling that for linalg because it provided some convenient base cases. >> >> We could go down the route of augmenting the gufuncs signature syntax to >> support requiring non-empty dimensions, like we did for optional ones - >> although IMO we should consider switching from a string minilanguage to a >> structured object specification if we plan to go too much further with >> extending it. > > After only a quick glance at that code: one option is to add a '+' > after the input names in the signature that must have a length that is > at least 1. So the signature for functions like `mean` (if you were > to reimplement it as a gufunc, and wanted an error instead of nan), > `amax`, `ptp`, etc, would be '(i+)->()'. > > However, the only meaningful uses-cases of this enhancement that I've > come up with are these simple reductions. So I don't know if making > such a change to the signature is worthwhile. On the other hand, > there are many examples of useful 1-d reductions that aren't the > reduction of an associative binary operation. It might be worthwhile > to have a new convenience function just for the case '(i)->()', maybe > something like PyUFunc_OneDReduction_FromFuncAndData (ugh, that's > ugly, but I think you get the idea), and that function can have an > argument to specify that the length must be at least 1. > > I'll see if that is feasible, but I won't be surprised to learn that > there are good reasons for *not* doing that. > > Warren > > > >> >> On Sat, Sep 28, 2019, 17:47 Warren Weckesser >> wrote: >> >>> I'm experimenting with gufuncs, and I just created a simple one with >>> signature '(i)->()'. Is there a way to configure the gufunc itself so >>> that an empty array results in an error? Or would I have to create a >>> Python wrapper around the gufunc that does the error checking? >>> Currently, when passed an empty array, the ufunc loop is called with >>> the core dimension associated with i set to 0. It would be nice if >>> the code didn't get that far, and the ufunc machinery "knew" that this >>> gufunc didn't accept a core dimension that is 0. I'd like to >>> automatically get an error, something like the error produced by >>> `np.max([])`. >>> >>> Warren >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> > From warren.weckesser at gmail.com Sun Sep 29 14:17:25 2019 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Sun, 29 Sep 2019 14:17:25 -0400 Subject: [Numpy-discussion] Error handling in a ufunc inner loop. In-Reply-To: References: Message-ID: On 9/29/19, Warren Weckesser wrote: > This is a new thread to address the question of error handling in a ufunc > loop that was brought up in the thread on handling core dimensions of > length zero. I'm attempting to answer my own question about the idiomatic > way to handle an error in an inner loop. > > The use of the GIL with a ufunc loop is documented at > > > https://numpy.org/devdocs/reference/internals.code-explanations.html#function-call > > So an inner loop is running without the GIL if the macro NPY_ALLOW_THREADS > is defined and the loop is not an object-type loop. > > If the inner loop is running without the GIL, it must acquire the GIL > before calling, say, PyErr_SetString to set an exception. 
The NumPy macros > for acquiring the GIL are documented at > > https://docs.scipy.org/doc/numpy/reference/c-api.array.html#group-2 > > These macros are defined in numpy/core/include/numpy/ndarraytypes.h. If > NPY_ALLOW_THREADS is defined, these macros wrap calls to > PyGILState_Ensure() and PyGILState_Release() ( > https://docs.python.org/3/c-api/init.html#non-python-created-threads): > > ``` > #define NPY_ALLOW_C_API_DEF PyGILState_STATE __save__; > #define NPY_ALLOW_C_API do {__save__ = PyGILState_Ensure();} while > (0); > #define NPY_DISABLE_C_API do {PyGILState_Release(__save__);} while (0); > ``` > > If NPY_ALLOW_THREADS is not defined, those macros are defined with empty > values. > > Now suppose I want to change the following inner loop to set an exception > instead of returning nan when the input is negative: > > ``` > static void > logfactorial_loop(char **args, npy_intp *dimensions, > npy_intp* steps, void* data) > { > char *in = args[0]; > char *out = args[1]; > npy_intp in_step = steps[0]; > npy_intp out_step = steps[1]; > > for (npy_intp i = 0; i < dimensions[0]; ++i, in += in_step, out += > out_step) { > int64_t x = *(int64_t *)in; > if (x < 0) { > *((double *)out) = NAN; > } > else { > *((double *)out) = logfactorial(x); > } > } > } > ``` > > Based on the documentation linked above, the changed inner loop is simply: > > ``` > static void > logfactorial_loop(char **args, npy_intp *dimensions, > npy_intp* steps, void* data) > { > char *in = args[0]; > char *out = args[1]; > npy_intp in_step = steps[0]; > npy_intp out_step = steps[1]; > > for (npy_intp i = 0; i < dimensions[0]; ++i, in += in_step, out += > out_step) { > int64_t x = *(int64_t *)in; > if (x < 0) { > NPY_ALLOW_C_API_DEF > NPY_ALLOW_C_API > PyErr_SetString(PyExc_ValueError, "math domain error in > logfactorial: x < 0"); > NPY_DISABLE_C_API > return; > } > else { > *((double *)out) = logfactorial(x); > } > } > } > ``` > > That worked as expected, but I haven't tried it yet with a NumPy > installation where NPY_ALLOW_THREADS is not defined. > > Is that change correct? Would that be considered the (or an) idiomatic way > to handle errors in an inner loop? Are there any potential problems that > I'm missing? Sebastian Berg pointed out to me that exactly this pattern is used in NumPy, for example, https://github.com/numpy/numpy/blob/68bd6e359a6b0863acf39cad637e1444d78eabd0/numpy/core/src/umath/loops.c.src#L913 So I'll take that as a yes, that's the way (or at least a way) to do it. Warren > > Warren >