From matti.picus at gmail.com  Tue May  1 01:21:13 2018
From: matti.picus at gmail.com (Matti Picus)
Date: Tue, 1 May 2018 08:21:13 +0300
Subject: [Numpy-discussion] Extending ufunc signature syntax for matmul, frozen dimensions
In-Reply-To: 
References: 
Message-ID: <260099c3-500d-b7f3-a75f-c3cf8a12e8f4@gmail.com>

On 01/05/18 01:45, Allan Haldane wrote:
> On 04/29/2018 05:46 AM, Matti Picus wrote:
>> In looking to solve issue #9028 "no way to override matmul/@ if
>> __array_ufunc__ is set", it seems there is consensus around the idea
>> of making matmul a true gufunc, but matmul can behave differently for
>> different combinations of array and vector:
>>
>> (n,k),(k,m)->(n,m)
>> (n,k),(k)->(n)
>> (k),(k,m)->(m)
>>
>> Currently there is no way to express that in the ufunc signature. The
>> proposed solution to issue #9029 is to extend the meaning of a
>> signature so "syntax like (n?,k),(k,m?)->(n?,m?) could mean that n
>> and m are optional dimensions; if missing in the input, they're
>> treated as 1, and then dropped from the output". Additionally, there
>> is an open pull request #5015 "Add frozen dimensions to gufunc
>> signatures" to allow signatures like '(3),(3)->(3)'.
>
> How much harder would it be to implement multiple-dispatch for gufunc
> signatures, instead of modifying the signature to include `?`?
>
> There was some discussion of this last year:
>
> http://numpy-discussion.10968.n7.nabble.com/Changes-to-generalized-ufunc-core-dimension-checking-tp42618p42638.html
>
> That sounded like a clean solution to me, although I'm a bit ignorant
> of the gufunc internals and the compatibility constraints.
>
> I assume gufuncs already have code to match the signature to the array
> dims, so it sounds fairly straightforward (I say without looking at
> any code) to do this in a loop over alternate signatures until one
> works.
>
> Allan

I will take a look at multiple-dispatch for gufuncs. The discussion also
suggests using an axis kwarg when calling a gufunc, for which there is
PR #11018 (https://github.com/numpy/numpy/pull/11018).

Matti

From einstein.edison at gmail.com  Tue May  1 03:06:19 2018
From: einstein.edison at gmail.com (Hameer Abbasi)
Date: Tue, 1 May 2018 03:06:19 -0400
Subject: [Numpy-discussion] Extending ufunc signature syntax for matmul, frozen dimensions
In-Reply-To: 
References: 
Message-ID: 

I agree with Eric here. As one of the users of __array_ufunc__, I'd much
rather have three separate gufuncs or a single one with axis insertion
and removal.

On 30/04/2018 at 23:38, Eric wrote:

I think I'm -1 on this - this just makes things harder on the
implementers of __array_ufunc__ who now might have to work out which
signature matches. I'd prefer the solution where np.matmul is a wrapper
around one of three gufuncs (or maybe just around one with axis
insertion) - this is similar to how np.linalg already works.

Eric

On Mon, 30 Apr 2018 at 14:34 Stephan Hoyer wrote:

On Sun, Apr 29, 2018 at 2:48 AM Matti Picus wrote:

The proposed solution to issue #9029 is to extend the meaning of a
signature so "syntax like (n?,k),(k,m?)->(n?,m?) could mean that n and m
are optional dimensions; if missing in the input, they're treated as 1,
and then dropped from the output"

I agree that this is an elegant fix for matmul, but are there other
use-cases for "optional dimensions" in gufuncs?
It feels a little wrong to add gufunc features if we can only think of
one function that can use them.

From matti.picus at gmail.com  Tue May  1 03:20:57 2018
From: matti.picus at gmail.com (Matti Picus)
Date: Tue, 1 May 2018 10:20:57 +0300
Subject: [Numpy-discussion] Extending ufunc signature syntax for matmul, frozen dimensions
In-Reply-To: 
References: 
Message-ID: 

On 01/05/18 00:38, Eric Wieser wrote:
>
> I think I'm -1 on this - this just makes things harder on the
> implementers of |__array_ufunc__| who now might have to work out which
> signature matches. I'd prefer the solution where |np.matmul| is a
> wrapper around one of three gufuncs (or maybe just around one with
> axis insertion) - this is similar to how np.linalg already works.
>
> Eric
>
> On Mon, 30 Apr 2018 at 14:34 Stephan Hoyer wrote:
>
>     On Sun, Apr 29, 2018 at 2:48 AM Matti Picus wrote:
>
>         The proposed solution to issue #9029 is to extend the meaning
>         of a signature so "syntax like (n?,k),(k,m?)->(n?,m?) could
>         mean that n and m are optional dimensions; if missing in the
>         input, they're treated as 1, and then dropped from the output"
>
>     I agree that this is an elegant fix for matmul, but are there
>     other use-cases for "optional dimensions" in gufuncs?
>
>     It feels a little wrong to add gufunc features if we can only
>     think of one function that can use them.

I will try to prototype this solution and put it up for comment,
alongside the multi-signature one.

Matti

From nelle.varoquaux at gmail.com  Tue May  1 12:59:41 2018
From: nelle.varoquaux at gmail.com (Nelle Varoquaux)
Date: Tue, 1 May 2018 09:59:41 -0700
Subject: [Numpy-discussion] summary of "office Hours" open discussion April 25
In-Reply-To: <039ab238-be1f-c83f-8a2e-8925033b102a@gmail.com>
References: <039ab238-be1f-c83f-8a2e-8925033b102a@gmail.com>
Message-ID: 

> Further resources to consider:
> - How did Jupyter organize their roadmap (ask Brian Granger)?
> - How did Pandas run the project with a full time maintainer (Jeff Reback)?
> - Can we copy other projects' management guidelines?

scikit-learn also has a number of full time developers. Might be worth
checking out what they did.

Cheers,
N

> We did not set a time for another online discussion, since it was felt
> that maybe near/during the sprint in May would be appropriate.
>
> I apologize for any misrepresentation.
>
> Matti Picus

From shoyer at gmail.com  Tue May  1 13:21:04 2018
From: shoyer at gmail.com (Stephan Hoyer)
Date: Tue, 01 May 2018 17:21:04 +0000
Subject: [Numpy-discussion] summary of "office Hours" open discussion April 25
In-Reply-To: 
References: <039ab238-be1f-c83f-8a2e-8925033b102a@gmail.com>
Message-ID: 

I'm happy to chat about how pandas has done things.
It's worth noting that although it may *look* like Jeff Reback is a
full-time maintainer (he does a lot of work!), he has actually been
maintaining pandas as a side-project. Mostly the project bumbles along
without a clear direction, somewhat similar to the case for NumPy for
the past few years, with new contributions coming from either interested
users or core developers when they have time and interest.

On Tue, May 1, 2018 at 10:00 AM Nelle Varoquaux wrote:

>> Further resources to consider:
>> - How did Jupyter organize their roadmap (ask Brian Granger)?
>> - How did Pandas run the project with a full time maintainer (Jeff
>> Reback)?
>> - Can we copy other projects' management guidelines?
>
> scikit-learn also has a number of full time developers. Might be worth
> checking out what they did.
>
> Cheers,
> N
>
>> We did not set a time for another online discussion, since it was
>> felt that maybe near/during the sprint in May would be appropriate.
>>
>> I apologize for any misrepresentation.
>>
>> Matti Picus

From m.h.vankerkwijk at gmail.com  Tue May  1 14:08:30 2018
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Tue, 1 May 2018 14:08:30 -0400
Subject: [Numpy-discussion] Extending ufunc signature syntax for matmul, frozen dimensions
In-Reply-To: 
References: 
Message-ID: 

Just for completeness: there are *four* gufuncs (matmat, matvec, vecmat,
and vecvec).

I remain torn about the best way forward. The main argument against
using them inside matmul is that in order to decide which of the four to
use, matmul has to have access to the `shape` of the arguments. This
means that `__array_ufunc__` cannot be used to override `matmul` (or
`@`) for any object which does not have a shape.
From that perspective, multiple signatures is definitely a more elegant
solution.

An advantage of the separate solution is that they are useful
independently of whether they are used internally in `matmul`; though,
then again, with a multi-signature matmul, these would be trivially
created as convenience functions.

-- Marten

From robbmcleod at gmail.com  Tue May  1 16:31:41 2018
From: robbmcleod at gmail.com (Robert McLeod)
Date: Tue, 1 May 2018 13:31:41 -0700
Subject: [Numpy-discussion] ANN: NumExpr 2.6.5
Message-ID: 

==========================
 Announcing Numexpr 2.6.5
==========================

Hi everyone,

This is primarily an incremental performance improvement release,
especially with regards to improving import times of downstream packages
(e.g. `pandas`, `tables`, `sympy`). Import times have been reduced from
~300 ms to ~100 ms through removing a `pkg_resources` import and making
the `cpuinfo` import lazy.

The maximum number of threads is now set at import-time, similar to
`numba`, by setting an environment variable 'NUMEXPR_MAX_THREADS'. The
runtime number of threads can still be reduced by calling
`numexpr.set_num_threads(N)`.

DEPRECATION WARNING: The variable `numexpr.is_cpu_amd_intel` has been
set to a dummy value of `False`. This variable may be removed in the
future.
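For readers wanting to try the new threading controls, here is a minimal
usage sketch (assuming NumExpr 2.6.5; the arrays and expression are
arbitrary examples):

```python
import os
# Must be set before numexpr is first imported.
os.environ['NUMEXPR_MAX_THREADS'] = '8'

import numpy as np
import numexpr as ne

# The runtime thread count can still be lowered after import.
ne.set_num_threads(4)

a = np.random.rand(1000000)
b = np.random.rand(1000000)
result = ne.evaluate('2*a + 3*b')
```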
Project documentation is available at: http://numexpr.readthedocs.io/

Changes from 2.6.4 to 2.6.5
---------------------------

- The maximum thread count can now be set at import-time by setting the
  environment variable 'NUMEXPR_MAX_THREADS'. The default number of max
  threads was lowered from 4096 (which was deemed excessive) to 64.
- A number of imports were removed (pkg_resources) or made lazy
  (cpuinfo) in order to speed load-times for downstream packages (such
  as `pandas`, `sympy`, and `tables`). Import time has dropped from
  about 330 ms to 90 ms. Thanks to Jason Sachs for pointing out the
  source of the slow-down.
- Thanks to Alvaro Lopez Ortega for updates to benchmarks to be
  compatible with Python 3.
- Travis and AppVeyor now fail if the test module fails or errors.
- Thanks to Mahdi Ben Jelloul for a patch that removed a bug where
  constants in `where` calls would raise a ValueError.
- Fixed a bug whereby all-constant power operations would lead to
  infinite recursion.

-- 
Robert McLeod, Ph.D.
robbmcleod at gmail.com
robbmcleod at protonmail.com
robert.mcleod at hitachi-hhtc.ca
www.entropyreduction.al

From einstein.edison at gmail.com  Wed May  2 06:24:22 2018
From: einstein.edison at gmail.com (Hameer Abbasi)
Date: Wed, 2 May 2018 06:24:22 -0400
Subject: [Numpy-discussion] Extending ufunc signature syntax for matmul, frozen dimensions
In-Reply-To: 
References: 
Message-ID: 

There is always the option of any downstream object overriding matmul,
and I fail to see which objects won't have a shape.

- Hameer

On 01/05/2018 at 21:08, Marten wrote:

Just for completeness: there are *four* gufuncs (matmat, matvec, vecmat,
and vecvec).

I remain torn about the best way forward. The main argument against
using them inside matmul is that in order to decide which of the four to
use, matmul has to have access to the `shape` of the arguments. This
means that `__array_ufunc__` cannot be used to override `matmul` (or
`@`) for any object which does not have a shape.
From that perspective, multiple signatures is definitely a more elegant
solution.

An advantage of the separate solution is that they are useful
independently of whether they are used internally in `matmul`; though,
then again, with a multi-signature matmul, these would be trivially
created as convenience functions.

-- Marten

From m.h.vankerkwijk at gmail.com  Wed May  2 11:39:41 2018
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Wed, 2 May 2018 11:39:41 -0400
Subject: [Numpy-discussion] Extending ufunc signature syntax for matmul, frozen dimensions
In-Reply-To: 
References: 
Message-ID: 

On Wed, May 2, 2018 at 6:24 AM, Hameer Abbasi wrote:
> There is always the option of any downstream object overriding matmul,
> and I fail to see which objects won't have a shape. - Hameer

I think we should not decide too readily on what is "reasonable" to
expect for a ufunc input. For instance, I'm currently writing a
chained-ufunc class which uses __array_ufunc__ to help make a chain
(something like `chained_ufunc = np.sin(np.multiply(Input(), Input()))`).
Here, my `Input` class defines `__array_ufunc__` but definitely does not
have a shape, and I would like to be able to override `np.matmul` just
like every other ufunc.

-- Marten

From shoyer at gmail.com  Wed May  2 11:47:37 2018
From: shoyer at gmail.com (Stephan Hoyer)
Date: Wed, 02 May 2018 15:47:37 +0000
Subject: [Numpy-discussion] Extending ufunc signature syntax for matmul, frozen dimensions
In-Reply-To: 
References: 
Message-ID: 

On Wed, May 2, 2018 at 8:39 AM Marten van Kerkwijk
<m.h.vankerkwijk at gmail.com> wrote:

> I think we should not decide too readily on what is "reasonable" to
> expect for a ufunc input.

I agree strongly with this. I can think of a couple of other use-cases
off hand:

- xarray.Dataset is a dict-like container of multiple arrays.
  Matrix-multiplication with a numpy array could make sense (just map
  over all the contained arrays), but xarray.Dataset itself is not an
  array and thus does not define shape.
- tensorflow.Tensor can have a dynamic shape that is only known when
  computation is explicitly run, not when computation is defined in
  Python.

The problem is even bigger for np.matmul because NumPy also wants to use
the same logic for overriding @, and Python's built-in operators
definitely should not have such restrictions.

From matti.picus at gmail.com  Wed May  2 12:01:45 2018
From: matti.picus at gmail.com (Matti Picus)
Date: Wed, 2 May 2018 19:01:45 +0300
Subject: [Numpy-discussion] Extending ufunc signature syntax for matmul, frozen dimensions
In-Reply-To: 
References: 
Message-ID: <25e4dbdc-b4a8-51e9-cecd-0e586bb9d805@gmail.com>

On 01/05/18 21:08, Marten van Kerkwijk wrote:
> Just for completeness: there are *four* gufuncs (matmat, matvec,
> vecmat, and vecvec).
>
> I remain torn about the best way forward. The main argument against
> using them inside matmul is that in order to decide which of the four
> to use, matmul has to have access to the `shape` of the arguments.
> This means that `__array_ufunc__` cannot be used to override `matmul`
> (or `@`) for any object which does not have a shape.
> From that perspective, multiple signatures is definitely a more
> elegant solution.
>
> An advantage of the separate solution is that they are useful
> independently of whether they are used internally in `matmul`; though,
> then again, with a multi-signature matmul, these would be trivially
> created as convenience functions.
>
> -- Marten

My goal is to solve issue #9028, "no way to override matmul/@ if
__array_ufunc__ is set on other". Maybe I am too focused on that; it
seems shape does not come into play here. Given a call to
matmul(self, other), it appears to me that the decision to commit to
self.matmul or to call other.__array_ufunc__("__call__", self.matmul,
...) is independent of the shapes and needs only nin and nout. In other
words, the implementation of matmul becomes (simplified):

(matmul(self, other) called) ->
    (use __array_ufunc__ and nin and nout to decide whether to defer to
    other's __array_ufunc__ via PyUFunc_CheckOverride, which implements
    NEP 13) ->
        (yes: call other.__array_ufunc__ as for any other ufunc)
        (no: call matmul like we currently do, no more __array_ufunc__
        testing needed)
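In rough Python, that simplified flow could be sketched as follows (the
names `matmul_gufunc` and `_raw_matmul` are illustrative placeholders
standing in for NumPy internals, not actual NumPy code):

```python
import numpy as np

# Placeholders for internals; illustrative only.
matmul_gufunc = np.matmul   # the would-be gufunc object (nin=2, nout=1)
_raw_matmul = np.matmul     # the non-deferring core implementation

def matmul_with_override(self, other):
    # Sketch of the PyUFunc_CheckOverride / NEP 13 decision: whether to
    # defer depends only on __array_ufunc__ and nin/nout, never on shapes.
    override = getattr(type(other), '__array_ufunc__', None)
    if override is not None and override is not np.ndarray.__array_ufunc__:
        result = other.__array_ufunc__(matmul_gufunc, '__call__', self, other)
        if result is not NotImplemented:
            return result
    return _raw_matmul(self, other)

print(matmul_with_override(np.eye(2), np.arange(2.0)))  # -> [0. 1.]
```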
So the two avenues I am trying are:

1) make matmul a gufunc, and then it will automatically use the
__array_ufunc__ machinery without any added changes, but this requires
expanding the meaning of a signature to allow dispatch;

2) generalize the __array_ufunc__ machinery to handle some kind of
wrapped function, where the wrapper knows about nin and nout and calls
PyUFunc_CheckOverride, which would allow matmul to work unchanged and
might support other functions as well.

The issue of whether matmat, vecmat, matvec, and vecvec are functions,
gufuncs accessible from userland, or not defined at all is secondary to
the current issue of overriding matmul; we can decide that in the
future. If we do create ufuncs for these variants, calling
a.vecmat(other), for instance, will still resolve to other's
__array_ufunc__ without needing to explore other's shape.

I probably misunderstood what you were driving at because I am so
focused on this particular issue.

Matti

From m.h.vankerkwijk at gmail.com  Wed May  2 12:43:44 2018
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Wed, 2 May 2018 12:43:44 -0400
Subject: [Numpy-discussion] Extending ufunc signature syntax for matmul, frozen dimensions
In-Reply-To: <25e4dbdc-b4a8-51e9-cecd-0e586bb9d805@gmail.com>
References: <25e4dbdc-b4a8-51e9-cecd-0e586bb9d805@gmail.com>
Message-ID: 

Hi Matti,

In the original implementation of what was then __numpy_ufunc__, we had
overrides for both `np.dot` and `np.matmul` that worked exactly as your
option (2), but we decided in the end that those really are not true
ufuncs, and we should not include ufunc mimics in the mix, as someone
using `__array_ufunc__` should be able to count on being passed a ufunc,
including all its properties. Perhaps this needs revisiting, and we
should have some UFuncABC...

But my own feeling remains that matmul is close enough to a (set of)
gufunc that making it fit the gufunc mold is the way to go...

All the best,

Marten

From pav at iki.fi  Sat May  5 14:36:03 2018
From: pav at iki.fi (Pauli Virtanen)
Date: Sat, 05 May 2018 20:36:03 +0200
Subject: [Numpy-discussion] ANN: SciPy 1.1.0 released
Message-ID: <236a10a327111484ffc775d388704ebcca6500a7.camel@iki.fi>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi all,

On behalf of the SciPy development team I'm pleased to announce the
SciPy 1.1.0 release. Sources and binary wheels can be found at
https://pypi.python.org/pypi/scipy and at
https://github.com/scipy/scipy/releases/tag/v1.1.0. To install with pip:

    pip install scipy==1.1.0

Thanks to everyone who contributed to this release!

=========================
SciPy 1.1.0 Release Notes
=========================

SciPy 1.1.0 is the culmination of 7 months of hard work. It contains
many new features, numerous bug-fixes, improved test coverage and better
documentation. There have been a number of deprecations and API changes
in this release, which are documented below. All users are encouraged to
upgrade to this release, as there are a large number of bug-fixes and
optimizations. Before upgrading, we recommend that users check that
their own code does not use deprecated SciPy functionality (to do so,
run your code with ``python -Wd`` and check for ``DeprecationWarning``
s). Our development attention will now shift to bug-fix releases on the
1.1.x branch, and on adding new features on the master branch.

This release requires Python 2.7 or 3.4+ and NumPy 1.8.2 or greater.
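As a brief aside, the same deprecation check can also be done from
inside Python with only the standard library; a minimal sketch:

```python
import warnings

# Equivalent in spirit to running your code under ``python -Wd``:
# make DeprecationWarnings visible while exercising SciPy-using code.
with warnings.catch_warnings():
    warnings.simplefilter('always', DeprecationWarning)
    import scipy  # replace with importing/running your own code
    print(scipy.__version__)
```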
This release has improved but not necessarily 100% compatibility with the `PyPy `__ Python implementation. For running on PyPy, PyPy 6.0+ and Numpy 1.15.0+ are required. New features ============ `scipy.integrate` improvements - ------------------------------ The argument ``tfirst`` has been added to the function `scipy.integrate.odeint`. This allows odeint to use the same user functions as `scipy.integrate.solve_ivp` and `scipy.integrate.ode` without the need for wrapping them in a function that swaps the first two arguments. Error messages from ``quad()`` are now clearer. `scipy.linalg` improvements - --------------------------- The function `scipy.linalg.ldl` has been added for factorization of indefinite symmetric/hermitian matrices into triangular and block diagonal matrices. Python wrappers for LAPACK ``sygst``, ``hegst`` added in `scipy.linalg.lapack`. Added `scipy.linalg.null_space`, `scipy.linalg.cdf2rdf`, `scipy.linalg.rsf2csf`. `scipy.misc` improvements - ------------------------- An electrocardiogram has been added as an example dataset for a one-dimensional signal. It can be accessed through `scipy.misc.electrocardiogram`. `scipy.ndimage` improvements - ---------------------------- The routines `scipy.ndimage.binary_opening`, and `scipy.ndimage.binary_closing` now support masks and different border values. `scipy.optimize` improvements - ----------------------------- The method ``trust-constr`` has been added to `scipy.optimize.minimize`. The method switches between two implementations depending on the problem definition. For equality constrained problems it is an implementation of a trust-region sequential quadratic programming solver and, when inequality constraints are imposed, it switches to a trust-region interior point method. Both methods are appropriate for large scale problems. Quasi-Newton options BFGS and SR1 were implemented and can be used to approximate second order derivatives for this new method. Also, finite-differences can be used to approximate either first-order or second-order derivatives. Random-to-Best/1/bin and Random-to-Best/1/exp mutation strategies were added to `scipy.optimize.differential_evolution` as ``randtobest1bin`` and ``randtobest1exp``, respectively. Note: These names were already in use but implemented a different mutation strategy. See `Backwards incompatible changes <#backwards-incompatible-changes>`__, below. The ``init`` keyword for the `scipy.optimize.differential_evolution` function can now accept an array. This array allows the user to specify the entire population. Add an ``adaptive`` option to Nelder-Mead to use step parameters adapted to the dimensionality of the problem. Minor improvements in `scipy.optimize.basinhopping`. `scipy.signal` improvements - --------------------------- Three new functions for peak finding in one-dimensional arrays were added. `scipy.signal.find_peaks` searches for peaks (local maxima) based on simple value comparison of neighbouring samples and returns those peaks whose properties match optionally specified conditions for their height, prominence, width, threshold and distance to each other. `scipy.signal.peak_prominences` and `scipy.signal.peak_widths` can directly calculate the prominences or widths of known peaks. Added ZPK versions of frequency transformations: `scipy.signal.bilinear_zpk`, `scipy.signal.lp2bp_zpk`, `scipy.signal.lp2bs_zpk`, `scipy.signal.lp2hp_zpk`, `scipy.signal.lp2lp_zpk`. 
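To make the new peak-finding functions described above concrete, here is
a minimal sketch (assuming SciPy 1.1; the synthetic signal `x` is an
arbitrary example):

```python
import numpy as np
from scipy.signal import find_peaks, peak_prominences, peak_widths

x = np.sin(np.linspace(0, 6 * np.pi, 300)) + 0.3 * np.random.rand(300)

# Indices of peaks at least 0.5 high and 20 samples apart.
peaks, properties = find_peaks(x, height=0.5, distance=20)

prominences = peak_prominences(x, peaks)[0]        # prominence per peak
widths = peak_widths(x, peaks, rel_height=0.5)[0]  # width at half prominence
print(peaks, prominences, widths)
```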
Added `scipy.signal.windows.dpss`, `scipy.signal.windows.general_cosine`
and `scipy.signal.windows.general_hamming`.

`scipy.sparse` improvements
- ---------------------------

Previously, the ``reshape`` method only worked on
`scipy.sparse.lil_matrix`, and in-place reshaping did not work on any
matrices. Both operations are now implemented for all matrices.

Handling of shapes has been made consistent with ``numpy.matrix``
throughout the `scipy.sparse` module (shape can be a tuple or splatted,
negative number acts as placeholder, padding and unpadding dimensions of
size 1 to ensure length-2 shape).

`scipy.special` improvements
- ----------------------------

Added Owen's T function as `scipy.special.owens_t`.

Accuracy improvements in ``chndtr``, ``digamma``, ``gammaincinv``,
``lambertw``, ``zetac``.

`scipy.stats` improvements
- --------------------------

The Moyal distribution has been added as `scipy.stats.moyal`.

Added the normal inverse Gaussian distribution as
`scipy.stats.norminvgauss`.

Deprecated features
===================

The iterative linear equation solvers in `scipy.sparse.linalg` handled
absolute tolerance in a sub-optimal way. The default behavior will be
changed in a future Scipy release to a more standard and less surprising
one. To silence deprecation warnings, set the ``atol=`` parameter
explicitly.

`scipy.signal.windows.slepian` is deprecated, replaced by
`scipy.signal.windows.dpss`.

The window functions in `scipy.signal` are now available in
`scipy.signal.windows`. They will also remain available in the old
location in the `scipy.signal` namespace in future Scipy versions.
However, importing them from `scipy.signal.windows` is preferred, and
new window functions will be added only there.

Indexing sparse matrices with floating-point numbers instead of integers
is deprecated.

The function `scipy.stats.itemfreq` is deprecated.

Backwards incompatible changes
==============================

Previously, `scipy.linalg.orth` used a singular value cutoff value
appropriate for double precision numbers also for single-precision
input. The cutoff value is now tunable, and the default has been changed
to depend on the input data precision.

In previous versions of Scipy, the ``randtobest1bin`` and
``randtobest1exp`` mutation strategies in
`scipy.optimize.differential_evolution` were actually implemented using
the Current-to-Best/1/bin and Current-to-Best/1/exp strategies,
respectively. These strategies were renamed to ``currenttobest1bin`` and
``currenttobest1exp`` and the implementations of ``randtobest1bin`` and
``randtobest1exp`` strategies were corrected.

Functions in the ndimage module now always return their output array.
Before this most functions only returned the output array if it had been
allocated by the function, and would return ``None`` if it had been
provided by the user.

Distance metrics in `scipy.spatial.distance` now require non-negative
weights.

`scipy.special.loggamma` now returns a real-valued result when the input
is real-valued.

Other changes
=============

When building on Linux with GNU compilers, the ``.so`` Python extension
files now hide all symbols except those required by Python, which can
avoid problems when embedding the Python interpreter.

Authors
=======

* Saurabh Agarwal + * Diogo Aguiam + * Joseph Albert +
* Gerrit Ansmann + * Jean-François B + * Vahan Babayan +
* Alessandro Pietro Bardelli * Christoph Baumgarten +
* Felix Berkenkamp * Lilian Besson + * Aditya Bharti + * Matthew Brett
* Evgeni Burovski * CJ Carey * Martin ?.
Christensen + * Robert Cimrman * Vicky Close + * Peter Cock +
* Philip DeBoer * Jaime Fernandez del Rio * Dieter Werthmüller +
* Tom Donoghue + * Matt Dzugan + * Lars G + * Jacques Gaudin +
* Andriy Gelman + * Sean Gillies + * Dezmond Goff * Christoph Gohlke
* Ralf Gommers * Uri Goren + * Deepak Kumar Gouda +
* Douglas Lessa Graciosa + * Matt Haberland * David Hagen
* Charles Harris * Jordan Heemskerk + * Danny Hermes +
* Stephan Hoyer + * Theodore Hu + * Jean-François B. + * Mads Jensen +
* Jon Haitz Legarreta Gorroño + * Ben Jude + * Noel Kippers +
* Julius Bier Kirkegaard + * Maria Knorps + * Mikkel Kristensen +
* Eric Larson * Kasper Primdal Lauritzen + * Denis Laxalde
* KangWon Lee + * Jan Lehky + * Jackie Leng + * P.L. Lim +
* Nikolay Mayorov * Mihai Capotă + * Max Mikhaylov + * Mark Mikofski +
* Jarrod Millman * Raden Muhammad + * Paul Nation * Andrew Nelson
* Nico Schlömer * Joel Nothman * Kyle Oman + * Egor Panfilov +
* Nick Papior * Anubhav Patel + * Oleksandr Pavlyk * Ilhan Polat
* Robert Pollak + * Anant Prakash + * Aman Pratik * Sean Quinn +
* Giftlin Rajaiah + * Tyler Reddy * Joscha Reimer
* Antonio H Ribeiro + * Antonio Horta Ribeiro * Benjamin Rose +
* Fabian Rost * Divakar Roy + * Scott Sievert * Leo Singer
* Sourav Singh * Martino Sorbaro + * Eric Stansifer + * Martin Thoma
* Phil Tooley + * Piotr Uchwat + * Paul van Mulbregt * Pauli Virtanen
* Stefan van der Walt * Warren Weckesser * Florian Weimer +
* Eric Wieser * Josh Wilson * Ted Ying + * Evgeny Zhurko
* Zé Vinícius * @Astrofysicus + * @awakenting + * @endolith
* @FormerPhysicist + * @gaulinmp + * @hugovk * @ksemb +
* @kshitij12345 + * @luzpaz + * @NKrvavica + * @rafalalgo +
* @samyak0210 + * @soluwalana + * @sudheerachary + * @Tokixix +
* @tttthomasssss + * @vkk800 + * @xoviat * @ziejcow +

A total of 122 people contributed to this release. People with a "+" by
their names contributed a patch for the first time. This list of names
is automatically generated, and may not be fully complete.

Issues closed for 1.1.0
- -----------------------

* `#979 `__: Allow Hermitian matrices in lobpcg (Trac #452)
* `#2694 `__: Solution of iterative solvers can be less accurate than tolerance...
* `#3164 `__: RectBivariateSpline usage inconsistent with other interpolation...
* `#4161 `__: Missing ITMAX optional argument in scipy.optimize.nnls
* `#4354 `__: signal.slepian should use definition of digital window
* `#4866 `__: Shouldn't scipy.linalg.sqrtm raise an error if matrix is singular?
* `#4953 `__: The dirichlet distribution unnecessarily requires strictly positive...
* `#5336 `__: sqrtm on a diagonal matrix can warn "Matrix is singular and may...
* `#5922 `__: Suboptimal convergence of Halley's method?
* `#6036 `__: Incorrect edge case in scipy.stats.triang.pdf
* `#6202 `__: Enhancement: Add LDLt factorization to scipy
* `#6589 `__: sparse.random with custom rvs callable does pass on arg to subclass
* `#6654 `__: Spearman's rank correlation coefficient slow with nan values...
* `#6794 `__: Remove NumarrayType struct with numarray type names from ndimage
* `#7136 `__: The dirichlet distribution unnecessarily rejects probabilities...
* `#7169 `__: Will it be possible to add LDL' factorization for Hermitian indefinite...
* `#7291 `__: fsolve docs should say it doesn't handle over- or under-determined...
* `#7453 `__: binary_opening/binary_closing missing arguments * `#7500 `__: linalg.solve test failure on OS X with Accelerate * `#7555 `__: Integratig a function with singularities using the quad routine * `#7624 `__: allow setting both absolute and relative tolerance of sparse... * `#7724 `__: odeint documentation refers to t0 instead of t * `#7746 `__: False CDF values for skew normal distribution * `#7750 `__: mstats.winsorize documentation needs clarification * `#7787 `__: Documentation error in spherical Bessel, Neumann, modified spherical... * `#7836 `__: Scipy mmwrite incorrectly writes the zeros for skew-symmetric,... * `#7839 `__: sqrtm is unable to compute square root of zero matrix * `#7847 `__: solve is very slow since #6775 * `#7888 `__: Scipy 1.0.0b1 prints spurious DVODE/ZVODE/lsoda messages * `#7909 `__: bessel kv function in 0 is nan * `#7915 `__: LinearOperator's __init__ runs two times when instantiating the... * `#7958 `__: integrate.quad could use better error messages when given bad... * `#7968 `__: integrate.quad handles decreasing limits (b`__: ENH: matching return dtype for loggamma/gammaln * `#7991 `__: `lfilter` segfaults for integer inputs * `#8076 `__: "make dist" for the docs doesn't complete cleanly * `#8080 `__: Use JSON in `special/_generate_pyx.py`? * `#8127 `__: scipy.special.psi(x) very slow for some values of x * `#8145 `__: BUG: ndimage geometric_transform and zoom using deprecated NumPy... * `#8158 `__: BUG: romb print output requires correction * `#8181 `__: loadmat() raises TypeError instead of FileNotFound when reading... * `#8228 `__: bug for log1p on csr_matrix * `#8235 `__: scipy.stats multinomial pmf return nan * `#8271 `__: scipy.io.mmwrite raises type error for uint16 * `#8288 `__: Should tests be written for scipy.sparse.linalg.isolve.minres... * `#8298 `__: Broken links on scipy API web page * `#8329 `__: `_gels` fails for fat A matrix * `#8346 `__: Avoidable overflow in scipy.special.binom(n, k) * `#8371 `__: BUG: special: zetac(x) returns 0 for x < -30.8148 * `#8382 `__: collections.OrderedDict in test_mio.py * `#8492 `__: Missing documentation for `brute_force` parameter in scipy.ndimage.morphology * `#8532 `__: leastsq needlessly appends extra dimension for scalar problems * `#8544 `__: [feature request] Convert complex diagonal form to real block... * `#8561 `__: [Bug?] Example of Bland's Rule for optimize.linprog (simplex)... * `#8562 `__: CI: Appveyor builds fail because it can't import ConvexHull from... * `#8576 `__: BUG: optimize: `show_options(solver='minimize', method='Newton-CG')`... * `#8603 `__: test_roots_gegenbauer/chebyt/chebyc failures on manylinux * `#8604 `__: Test failures in scipy.sparse test_inplace_dense * `#8616 `__: special: ellpj.c code can be cleaned up a bit * `#8625 `__: scipy 1.0.1 no longer allows overwriting variables in netcdf... * `#8629 `__: gcrotmk.test_atol failure with MKL * `#8632 `__: Sigma clipping on data with the same value * `#8646 `__: scipy.special.sinpi test failures in test_zero_sign on old MSVC * `#8663 `__: linprog with method=interior-point produced incorrect answer... * `#8694 `__: linalg:TestSolve.test_all_type_size_routine_combinations fails... * `#8703 `__: Q: Does runtests.py --refguide-check need env (or other) variables... 
Pull requests for 1.1.0 - ----------------------- * `#6590 `__: BUG: sparse: fix custom rvs callable argument in sparse.random * `#7004 `__: ENH: scipy.linalg.eigsh cannot get all eigenvalues * `#7120 `__: ENH: implemented Owen's T function * `#7483 `__: ENH: Addition/multiplication operators for StateSpace systems * `#7566 `__: Informative exception when passing a sparse matrix * `#7592 `__: Adaptive Nelder-Mead * `#7729 `__: WIP: ENH: optimize: large-scale constrained optimization algorithms... * `#7802 `__: MRG: Add dpss window function * `#7803 `__: DOC: Add examples to spatial.distance * `#7821 `__: Add Returns section to the docstring * `#7833 `__: ENH: Performance improvements in scipy.linalg.special_matrices * `#7864 `__: MAINT: sparse: Simplify sputils.isintlike * `#7865 `__: ENH: Improved speed of copy into L, U matrices * `#7871 `__: ENH: sparse: Add 64-bit integer to sparsetools * `#7879 `__: ENH: re-enabled old sv lapack routine as defaults * `#7889 `__: DOC: Show probability density functions as math * `#7900 `__: API: Soft deprecate signal.* windows * `#7910 `__: ENH: allow `sqrtm` to compute the root of some singular matrices * `#7911 `__: MAINT: Avoid unnecessary array copies in xdist * `#7913 `__: DOC: Clarifies the meaning of `initial` of scipy.integrate.cumtrapz() * `#7916 `__: BUG: sparse.linalg: fix wrong use of __new__ in LinearOperator * `#7921 `__: BENCH: split spatial benchmark imports * `#7927 `__: ENH: added sygst/hegst routines to lapack * `#7934 `__: MAINT: add `io/_test_fortranmodule` to `.gitignore` * `#7936 `__: DOC: Fixed typo in scipy.special.roots_jacobi documentation * `#7937 `__: MAINT: special: Mark a test that fails on i686 as a known failure. * `#7941 `__: ENH: LDLt decomposition for indefinite symmetric/hermitian matrices * `#7945 `__: ENH: Implement reshape method on sparse matrices * `#7947 `__: DOC: update docs on releasing and installing/upgrading * `#7954 `__: Basin-hopping changes * `#7964 `__: BUG: test_falker not robust against numerical fuss in eigenvalues * `#7967 `__: QUADPACK Errors - human friendly errors to replace 'Invalid Input' * `#7975 `__: Make sure integrate.quad doesn't double-count singular points * `#7978 `__: TST: ensure negative weights are not allowed in distance metrics * `#7980 `__: MAINT: Truncate the warning msg about ill-conditioning * `#7981 `__: BUG: special: fix hyp2f1 behavior in certain circumstances * `#7983 `__: ENH: special: Add a real dispatch to `loggamma` * `#7989 `__: BUG: special: make `kv` return `inf` at a zero real argument * `#7990 `__: TST: special: test ufuncs in special at `nan` inputs * `#7994 `__: DOC: special: fix typo in spherical Bessel function documentation * `#7995 `__: ENH: linalg: add null_space for computing null spaces via svd * `#7999 `__: BUG: optimize: Protect _minpack calls with a lock. 
* `#8003 `__: MAINT: consolidate c99 compatibility * `#8004 `__: TST: special: get all `cython_special` tests running again * `#8006 `__: MAINT: Consolidate an additional _c99compat.h * `#8011 `__: Add new example of integrate.quad * `#8015 `__: DOC: special: remove `jn` from the refguide (again) * `#8018 `__: BUG - Issue with uint datatypes for array in get_index_dtype * `#8021 `__: DOC: spatial: Simplify Delaunay plotting * `#8024 `__: Documentation fix * `#8027 `__: BUG: io.matlab: fix saving unicode matrix names on py2 * `#8028 `__: BUG: special: some fixes for `lambertw` * `#8030 `__: MAINT: Bump Cython version * `#8034 `__: BUG: sparse.linalg: fix corner-case bug in expm * `#8035 `__: MAINT: special: remove complex division hack * `#8038 `__: ENH: Cythonize pyx files if pxd dependencies change * `#8042 `__: TST: stats: reduce required precision in test_fligner * `#8043 `__: TST: Use diff. values for decimal keyword for single and doubles * `#8044 `__: TST: accuracy of tests made different for singles and doubles * `#8049 `__: Unhelpful error message when calling scipy.sparse.save_npz on... * `#8052 `__: TST: spatial: add a regression test for gh-8051 * `#8059 `__: BUG: special: fix ufunc results for `nan` arguments * `#8066 `__: MAINT: special: reimplement inverses of incomplete gamma functions * `#8072 `__: Example for scipy.fftpack.ifft, https://github.com/scipy/scipy/issues/7168 * `#8073 `__: Example for ifftn, https://github.com/scipy/scipy/issues/7168 * `#8078 `__: Link to CoC in contributing.rst doc * `#8085 `__: BLD: Fix npy_isnan of integer variables in cephes * `#8088 `__: DOC: note version for which new attributes have been added to... * `#8090 `__: BUG: special: add nan check to `_legacy_cast_check` functions * `#8091 `__: Doxy Typos + trivial comment typos (2nd attempt) * `#8096 `__: TST: special: simplify `Arg` * `#8101 `__: MAINT: special: run `_generate_pyx.py` when `add_newdocs.py`... * `#8104 `__: Input checking for scipy.sparse.linalg.inverse() * `#8105 `__: DOC: special: Update the 'euler' docstring. * `#8109 `__: MAINT: fixing code comments and hyp2f1 docstring: see issues... * `#8112 `__: More trivial typos * `#8113 `__: MAINT: special: generate test data npz files in setup.py and... * `#8116 `__: DOC: add build instructions * `#8120 `__: DOC: Clean up README * `#8121 `__: DOC: Add missing colons in docstrings * `#8123 `__: BLD: update Bento build config files for recent C99 changes. * `#8124 `__: Change to avoid use of `fmod` in scipy.signal.chebwin * `#8126 `__: Added examples for mode arg in geometric_transform * `#8128 `__: relax relative tolerance parameter in TestMinumumPhase.test_hilbert * `#8129 `__: ENH: special: use rational approximation for `digamma` on `[1,... * `#8137 `__: DOC Correct matrix width * `#8141 `__: MAINT: optimize: remove unused `__main__` code in L-BSGS-B * `#8147 `__: BLD: update Bento build for removal of .npz scipy.special test... * `#8148 `__: Alias hanning as an explanatory function of hann * `#8149 `__: MAINT: special: small fixes for `digamma` * `#8159 `__: Update version classifiers * `#8164 `__: BUG: riccati solvers don't catch ill-conditioned problems sufficiently... 
* `#8168 `__: DOC: release note for sparse resize methods * `#8170 `__: BUG: correctly pad netCDF files with null bytes * `#8171 `__: ENH added normal inverse gaussian distribution to scipy.stats * `#8175 `__: DOC: Add example to scipy.ndimage.zoom * `#8177 `__: MAINT: diffev small speedup in ensure constraint * `#8178 `__: FIX: linalg._qz String formatter syntax error * `#8179 `__: TST: Added pdist to asv spatial benchmark suite * `#8180 `__: TST: ensure constraint test improved * `#8183 `__: 0d conj correlate * `#8186 `__: BUG: special: fix derivative of `spherical_jn(1, 0)` * `#8194 `__: Fix warning message * `#8196 `__: BUG: correctly handle inputs with nan's and ties in spearmanr * `#8198 `__: MAINT: stats.triang edge case fixes #6036 * `#8200 `__: DOC: Completed "Examples" sections of all linalg funcs * `#8201 `__: MAINT: stats.trapz edge cases * `#8204 `__: ENH: sparse.linalg/lobpcg: change .T to .T.conj() to support... * `#8206 `__: MAINT: missed triang edge case. * `#8214 `__: BUG: Fix memory corruption in linalg._decomp_update C extension * `#8222 `__: DOC: recommend scipy.integrate.solve_ivp * `#8223 `__: ENH: added Moyal distribution to scipy.stats * `#8232 `__: BUG: sparse: Use deduped data for numpy ufuncs * `#8236 `__: Fix #8235 * `#8253 `__: BUG: optimize: fix bug related with function call calculation... * `#8264 `__: ENH: Extend peak finding capabilities in scipy.signal * `#8273 `__: BUG fixed printing of convergence message in minimize_scalar... * `#8276 `__: DOC: Add notes to explain constrains on overwrite_<> * `#8279 `__: CI: fixing doctests * `#8282 `__: MAINT: weightedtau, change search for nan * `#8287 `__: Improving documentation of solve_ivp and the underlying solvers * `#8291 `__: DOC: fix non-ascii characters in docstrings which broke the doc... * `#8292 `__: CI: use numpy 1.13 for refguide check build * `#8296 `__: Fixed bug reported in issue #8181 * `#8297 `__: DOC: Examples for linalg/decomp eigvals function * `#8300 `__: MAINT: Housekeeping for minimizing the linalg compiler warnings * `#8301 `__: DOC: make public API documentation cross-link to refguide. * `#8302 `__: make sure _onenorm_matrix_power_nnm actually returns a float * `#8313 `__: Change copyright to outdated 2008-2016 to 2008-year * `#8315 `__: TST: Add tests for `scipy.sparse.linalg.isolve.minres` * `#8318 `__: ENH: odeint: Add the argument 'tfirst' to odeint. * `#8328 `__: ENH: optimize: ``trust-constr`` optimization algorithms [GSoC... * `#8330 `__: ENH: add a maxiter argument to NNLS * `#8331 `__: DOC: tweak the Moyal distribution docstring * `#8333 `__: FIX: Rewrapped ?gels and ?gels_lwork routines * `#8336 `__: MAINT: integrate: handle b < a in quad * `#8337 `__: BUG: special: Ensure zetac(1) returns inf. * `#8347 `__: BUG: Fix overflow in special.binom. Issue #8346 * `#8356 `__: DOC: Corrected Documentation Issue #7750 winsorize function * `#8358 `__: ENH: stats: Use explicit MLE formulas in lognorm.fit and expon.fit * `#8374 `__: BUG: gh7854, maxiter for l-bfgs-b closes #7854 * `#8379 `__: CI: enable gcov coverage on travis * `#8383 `__: Removed collections.OrderedDict import ignore. * `#8384 `__: TravisCI: tool pep8 is now pycodestyle * `#8387 `__: MAINT: special: remove unused specfun code for Struve functions * `#8393 `__: DOC: Replace old type names in ndimage tutorial. * `#8400 `__: Fix tolerance specification in sparse.linalg iterative solvers * `#8402 `__: MAINT: Some small cleanups in ndimage. 
* `#8403 `__: FIX: Make scipy.optimize.zeros run under PyPy * `#8407 `__: BUG: sparse.linalg: fix termination bugs for cg, cgs * `#8409 `__: MAINT: special: add a `.pxd` file for Cephes functions * `#8412 `__: MAINT: special: remove `cephes/protos.h` * `#8421 `__: Setting "unknown" message in OptimizeResult when calling MINPACK. * `#8423 `__: FIX: Handle unsigned integers in mmio * `#8426 `__: DOC: correct FAQ entry on Apache license compatibility. Closes... * `#8433 `__: MAINT: add `.pytest_cache` to the `.gitignore` * `#8436 `__: MAINT: scipy.sparse: less copies at transpose method * `#8437 `__: BUG: correct behavior for skew-symmetric matrices in io.mmwrite * `#8440 `__: DOC:Add examples to integrate.quadpack docstrings * `#8441 `__: BUG: sparse.linalg/gmres: deal with exact breakdown in gmres * `#8442 `__: MAINT: special: clean up Cephes header files * `#8448 `__: TST: Generalize doctest stopwords .axis( .plot( * `#8457 `__: MAINT: special: use JSON for function signatures in `_generate_pyx.py` * `#8461 `__: MAINT: Simplify return value of ndimage functions. * `#8464 `__: MAINT: Trivial typos * `#8474 `__: BUG: spatial: make qhull.pyx more pypy-friendly * `#8476 `__: TST: _lib: disable refcounting tests on PyPy * `#8479 `__: BUG: io/matlab: fix issues in matlab i/o on pypy * `#8481 `__: DOC: Example for signal.cmplx_sort * `#8482 `__: TST: integrate: use integers instead of PyCapsules to store pointers * `#8483 `__: ENH: io/netcdf: make mmap=False the default on PyPy * `#8484 `__: BUG: io/matlab: work around issue in to_writeable on PyPy * `#8488 `__: MAINT: special: add const/static specifiers where possible * `#8489 `__: BUG: ENH: use common halley's method instead of parabolic variant * `#8491 `__: DOC: fix typos * `#8496 `__: ENH: special: make Chebyshev nodes symmetric * `#8501 `__: BUG: stats: Split the integral used to compute skewnorm.cdf. * `#8502 `__: WIP: Port CircleCI to v2 * `#8507 `__: DOC: Add missing description to `brute_force` parameter. * `#8509 `__: BENCH: forgot to add nelder-mead to list of methods * `#8512 `__: MAINT: Move spline interpolation code to spline.c * `#8513 `__: TST: special: mark a slow test as xslow * `#8514 `__: CircleCI: Share data between jobs * `#8515 `__: ENH: special: improve accuracy of `zetac` for negative arguments * `#8520 `__: TST: Decrease the array sizes for two linalg tests * `#8522 `__: TST: special: restrict range of `test_besselk`/`test_besselk_int` * `#8527 `__: Documentation - example added for voronoi_plot_2d * `#8528 `__: DOC: Better, shared docstrings in ndimage * `#8533 `__: BUG: Fix PEP8 errors introduced in #8528. * `#8534 `__: ENH: Expose additional window functions * `#8538 `__: MAINT: Fix a couple mistakes in .pyf files. * `#8540 `__: ENH: interpolate: allow string aliases in make_interp_spline... * `#8541 `__: ENH: Cythonize peak_prominences * `#8542 `__: Remove numerical arguments from convolve2d / correlate2d * `#8546 `__: ENH: New arguments, documentation, and tests for ndimage.binary_opening * `#8547 `__: Giving both size and input now raises UserWarning (#7334) * `#8549 `__: DOC: stats: invweibull is also known as Frechet or type II extreme... * `#8550 `__: add cdf2rdf function * `#8551 `__: ENH: Port of most of the dd_real part of the qd high-precision... * `#8553 `__: Note in docs to address issue #3164. 
* `#8554 `__: ENH: stats: Use explicit MLE formulas in uniform.fit() * `#8555 `__: MAINT: adjust benchmark config * `#8557 `__: [DOC]: fix Nakagami density docstring * `#8559 `__: DOC: Fix docstring of diric(x, n) * `#8563 `__: [DOC]: fix gamma density docstring * `#8564 `__: BLD: change default Python version for doc build from 2.7 to... * `#8568 `__: BUG: Fixes Bland's Rule for pivot row/leaving variable, closes... * `#8572 `__: ENH: Add previous/next to interp1d * `#8578 `__: Example for linalg.eig() * `#8580 `__: DOC: update link to asv docs * `#8584 `__: filter_design: switch to explicit arguments, keeping None as... * `#8586 `__: DOC: stats: Add parentheses that were missing in the exponnorm... * `#8587 `__: TST: add benchmark for newton, secant, halley * `#8588 `__: DOC: special: Remove heaviside from "functions not in special"... * `#8591 `__: DOC: cdf2rdf Added version info and "See also" * `#8594 `__: ENH: Cythonize peak_widths * `#8595 `__: MAINT/ENH/BUG/TST: cdf2rdf: Address review comments made after... * `#8597 `__: DOC: add versionadded 1.1.0 for new keywords in ndimage.morphology * `#8605 `__: MAINT: special: improve implementations of `sinpi` and `cospi` * `#8607 `__: MAINT: add 2D benchmarks for convolve * `#8608 `__: FIX: Fix int check * `#8613 `__: fix typo in doc of signal.peak_widths * `#8615 `__: TST: fix failing linalg.qz float32 test by decreasing precision. * `#8617 `__: MAINT: clean up code in ellpj.c * `#8618 `__: add fsolve docs it doesn't handle over- or under-determined problems * `#8620 `__: DOC: add note on dtype attribute of aslinearoperator() argument * `#8627 `__: ENH: Add example 1D signal (ECG) to scipy.misc * `#8630 `__: ENH: Remove unnecessary copying in stats.percentileofscore * `#8631 `__: BLD: fix pdf doc build. closes gh-8076 * `#8633 `__: BUG: fix regression in `io.netcdf_file` with append mode. * `#8635 `__: MAINT: remove spurious warning from (z)vode and lsoda. Closes... * `#8636 `__: BUG: sparse.linalg/gcrotmk: avoid rounding error in termination... * `#8637 `__: For pdf build * `#8639 `__: CI: build pdf documentation on circleci * `#8640 `__: TST: fix special test that was importing `np.testing.utils` (deprecated) * `#8641 `__: BUG: optimize: fixed sparse redundancy removal bug * `#8645 `__: BUG: modified sigmaclip to avoid clipping of constant input in... * `#8647 `__: TST: sparse: skip test_inplace_dense for numpy<1.13 * `#8657 `__: Latex reduce left margins * `#8659 `__: TST: special: skip sign-of-zero test on 32-bit win32 with old... * `#8661 `__: Fix dblquad and tplquad not accepting float boundaries * `#8666 `__: DOC: fixes #8532 * `#8667 `__: BUG: optimize: fixed issue #8663 * `#8668 `__: Fix example in docstring of netcdf_file * `#8671 `__: DOC: Replace deprecated matplotlib kwarg * `#8673 `__: BUG: special: Use a stricter tolerance for the chndtr calculation. * `#8674 `__: ENH: In the Dirichlet distribution allow x_i to be 0 if alpha_i... * `#8676 `__: BUG: optimize: partial fix to linprog fails to detect infeasibility... * `#8685 `__: DOC: Add interp1d-next/previous example to tutorial * `#8687 `__: TST: netcdf: explicit mmap=True in test * `#8688 `__: BUG: signal, stats: use Python sum() instead of np.sum for summing... * `#8689 `__: TST: bump tolerances in tests * `#8690 `__: DEP: deprecate stats.itemfreq * `#8691 `__: BLD: special: fix build vs. dd_real.h package * `#8695 `__: DOC: Improve examples in signal.find_peaks with ECG signal * `#8697 `__: BUG: Fix `setup.py build install egg_info`, which did not previously... 
* `#8704 `__: TST: linalg: drop large size from solve() test * `#8705 `__: DOC: Describe signal.find_peaks and related functions behavior... * `#8706 `__: DOC: Specify encoding of rst file, remove an ambiguity in an... * `#8710 `__: MAINT: fix an import cycle sparse -> special -> integrate ->... * `#8711 `__: ENH: remove an avoidable overflow in scipy.stats.norminvgauss.pdf() * `#8716 `__: BUG: interpolate: allow list inputs for make_interp_spline(...,... * `#8720 `__: np.testing import that is compatible with numpy 1.15 * `#8724 `__: CI: don't use pyproject.toml in the CI builds Checksums ========= MD5 ~~~ 5f5dac4aeb117e977eba6b57231f467a scipy-1.1.0-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl 8f47f7779047f19ab0b54821bfbf699e scipy-1.1.0-cp27-cp27m-manylinux1_i686.whl 4cd3a7840cccb7bd8001cca94cd5264d scipy-1.1.0-cp27-cp27m-manylinux1_x86_64.whl 5a6981b403117237066a289df8a3e41e scipy-1.1.0-cp27-cp27mu-manylinux1_i686.whl 1370771ae0d6032c415cd1ff74be0308 scipy-1.1.0-cp27-cp27mu-manylinux1_x86_64.whl 6fd9d352028851983efadcd7cc93486a scipy-1.1.0-cp27-none-win32.whl af1a53b10b754bb73afbce2d4333e25a scipy-1.1.0-cp27-none-win_amd64.whl 28a40c0cc6516faa85c2d6a4b759d49b scipy-1.1.0-cp34-cp34m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl fe52c136fd85780e1b8eae61f547573f scipy-1.1.0-cp34-cp34m-manylinux1_i686.whl 0beed5f35d90e47ca0a19df1b4d6705b scipy-1.1.0-cp34-cp34m-manylinux1_x86_64.whl 149d429369a8d4c65340da5f4d7f3c5c scipy-1.1.0-cp34-none-win32.whl f53881e219fe7b1e7baafc16076b8037 scipy-1.1.0-cp34-none-win_amd64.whl 02c5f91e021aff8e0616ddb3e13b2939 scipy-1.1.0-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl 85abc7a569fb1a0eb83e889c2dbf9415 scipy-1.1.0-cp35-cp35m-manylinux1_i686.whl 959d873bda4753d33b4f2f4e882027d7 scipy-1.1.0-cp35-cp35m-manylinux1_x86_64.whl 53aa31c367ae288d9268ab80fcabae98 scipy-1.1.0-cp35-none-win32.whl 5231103cf8f60d377395992c64dca3e8 scipy-1.1.0-cp35-none-win_amd64.whl 911883861aaa4030f49290134057cd14 scipy-1.1.0-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl c20dd7bd0ee52dda5d9f364ed30868e0 scipy-1.1.0-cp36-cp36m-manylinux1_i686.whl 6ceb8c9e15464bc097d6fb033df36436 scipy-1.1.0-cp36-cp36m-manylinux1_x86_64.whl a47007af1f8fa31abffddca45aa8dba6 scipy-1.1.0-cp36-none-win32.whl 748e458b4a488894007afdc5740dca12 scipy-1.1.0-cp36-none-win_amd64.whl aa6bcc85276b6f25e17bcfc4dede8718 scipy-1.1.0.tar.gz 7ff4ecfd9f0e953c2ec36c934ee65a97 scipy-1.1.0.tar.xz 6b56add6c5994ebf6d0c2861c538a86c scipy-1.1.0.zip SHA256 ~~~~~~ 340ef70f5b0f4e2b4b43c8c8061165911bc6b2ad16f8de85d9774545e2c47463 scipy-1.1.0-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl c22b27371b3866c92796e5d7907e914f0e58a36d3222c5d436ddd3f0e354227a scipy-1.1.0-cp27-cp27m-manylinux1_i686.whl d8491d4784aceb1f100ddb8e31239c54e4afab8d607928a9f7ef2469ec35ae01 scipy-1.1.0-cp27-cp27m-manylinux1_x86_64.whl 8190770146a4c8ed5d330d5b5ad1c76251c63349d25c96b3094875b930c44692 scipy-1.1.0-cp27-cp27mu-manylinux1_i686.whl 08237eda23fd8e4e54838258b124f1cd141379a5f281b0a234ca99b38918c07a scipy-1.1.0-cp27-cp27mu-manylinux1_x86_64.whl dfc5080c38dde3f43d8fbb9c0539a7839683475226cf83e4b24363b227dfe552 scipy-1.1.0-cp27-none-win32.whl e7a01e53163818d56eabddcafdc2090e9daba178aad05516b20c6591c4811020 scipy-1.1.0-cp27-none-win_amd64.whl 
0e645dbfc03f279e1946cf07c9c754c2a1859cb4a41c5f70b25f6b3a586b6dbd scipy-1.1.0-cp34-cp34m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl f0521af1b722265d824d6ad055acfe9bd3341765735c44b5a4d0069e189a0f40 scipy-1.1.0-cp34-cp34m-manylinux1_i686.whl 3b243c77a822cd034dad53058d7c2abf80062aa6f4a32e9799c95d6391558631 scipy-1.1.0-cp34-cp34m-manylinux1_x86_64.whl 8f841bbc21d3dad2111a94c490fb0a591b8612ffea86b8e5571746ae76a3deac scipy-1.1.0-cp34-none-win32.whl ee677635393414930541a096fc8e61634304bb0153e4e02b75685b11eba14cae scipy-1.1.0-cp34-none-win_amd64.whl 423b3ff76957d29d1cce1bc0d62ebaf9a3fdfaf62344e3fdec14619bb7b5ad3a scipy-1.1.0-cp35-cp35m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl 0611ee97296265af4a21164a5323f8c1b4e8e15c582d3dfa7610825900136bb7 scipy-1.1.0-cp35-cp35m-manylinux1_i686.whl 108c16640849e5827e7d51023efb3bd79244098c3f21e4897a1007720cb7ce37 scipy-1.1.0-cp35-cp35m-manylinux1_x86_64.whl 3ad73dfc6f82e494195144bd3a129c7241e761179b7cb5c07b9a0ede99c686f3 scipy-1.1.0-cp35-none-win32.whl d0cdd5658b49a722783b8b4f61a6f1f9c75042d0e29a30ccb6cacc9b25f6d9e2 scipy-1.1.0-cp35-none-win_amd64.whl e24e22c8d98d3c704bb3410bce9b69e122a8de487ad3dbfe9985d154e5c03a40 scipy-1.1.0-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl 404a00314e85eca9d46b80929571b938e97a143b4f2ddc2b2b3c91a4c4ead9c5 scipy-1.1.0-cp36-cp36m-manylinux1_i686.whl 729f8f8363d32cebcb946de278324ab43d28096f36593be6281ca1ee86ce6559 scipy-1.1.0-cp36-cp36m-manylinux1_x86_64.whl 0e9bb7efe5f051ea7212555b290e784b82f21ffd0f655405ac4f87e288b730b3 scipy-1.1.0-cp36-none-win32.whl 698c6409da58686f2df3d6f815491fd5b4c2de6817a45379517c92366eea208f scipy-1.1.0-cp36-none-win_amd64.whl 878352408424dffaa695ffedf2f9f92844e116686923ed9aa8626fc30d32cfd1 scipy-1.1.0.tar.gz a18cc84d1c784c78b06f0f2adc400b29647728712f3843fc890c02aad1ac5424 scipy-1.1.0.tar.xz 391af739bf65c3915f229647d858c15fca0b96dac0497901a4fe2bd3fa69f40b scipy-1.1.0.zip -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEiNDtwDFNJaodBiJZHOiglY3YpVcFAlrt+ZMACgkQHOiglY3Y pVcnNBAAk+tuy6qyPZJomc7urrz804VZdDH81hLfEHww7g5DpIYGk54rkyKgeDWT /ZveYcSzUc+zm81qmDZPb6ctfB7MH2B2mXUfNvxZkGYDseYIJyLtO10/hrqxyHsL YXGp4nD4QEeJ696OYyuaYxnjVe64dz2Lc25l6MBPdcevH9bY2MG2q95ajZKX3k2y QUM5OsBZzKfGWCaXCYFt6E+FcEQ9cLfTBUQNxktCrDosuPBv+jy8ukwDj5NSqBwg zM4lZ3C/lXF4v68cye17voJqr5fIEA+Y61jK200TwJO+jH9ifUs9hR5ED1PLzvMK SRIxX+5digAe0iAVHkOeGS9fTEFTSoT62fqYMIVdSRv6zId9FcGiP54h94676wB8 OYApvcxLQqaGnsMW462EQsKg/oXd+MXcwX4Gpn1WQT8XDmlygsLyIPduymNKeh6/ L53UJsjLgC4ize7oM7ppfAPo+OmxAqzuUxd2K57Jr+hWUThmWmE+3BkAipH/XP+4 w3cApggI1KCooi9Kt/IkYxUPjwdAwuQjuKB/PpPpfRdnCwveXasJS/hcld3f4y5j GBQ8QT2wT4P9wS20b5yfuy5FpL7Ss9eTbKihXzjT7ymxz2IY6gl47m+Aug8laexn kE/l/XmWUU2D+/VdZAgHFenigI/z6iv/EhiPeZR+GEkYIl/a+QM= =OnQY -----END PGP SIGNATURE----- From Nicolas.Rougier at inria.fr Mon May 7 01:51:06 2018 From: Nicolas.Rougier at inria.fr (Nicolas Rougier) Date: Mon, 7 May 2018 07:51:06 +0200 Subject: [Numpy-discussion] Book on scientific visualization using Python & OpenGL Message-ID: <34FFDB4C-C545-42F5-97CF-9B7098B10228@inria.fr> [Sorry for cross-posting] Hi all, I just released an (unfinished) book on "Python & OpenGL for Scientific Visualization?. 
It's available at http://www.labri.fr/perso/nrougier/python-opengl/
(sources are available at https://github.com/rougier/python-opengl)

Nicolas

From pwoodford at keywcorp.com Tue May 8 14:03:34 2018
From: pwoodford at keywcorp.com (Paul Woodford)
Date: Tue, 8 May 2018 18:03:34 +0000
Subject: [Numpy-discussion] sinc always returns double precision
Message-ID: <5F8AD39A-B6AE-4AE2-87DC-D6FC2BD6A700@keywcorp.com>

Sinc seems to always return double precision, even when the input is
single precision:

In [125]: np.sinc(np.float32(0)).dtype
Out[125]: dtype('float64')

Is this desired behavior? I'm guessing the promotion occurs in the
multiplication by pi:

    y = pi * where(x == 0, 1.0e-20, x)

Paul
_________
Paul Woodford, Ph.D.
Principal Research Engineer
KeyW Corporation
7763 Old Telegraph Road | Severn, MD 21144
Direct: 443.274.1466 | Main: 443.733.1500

From m.h.vankerkwijk at gmail.com Tue May 8 14:34:11 2018
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Tue, 8 May 2018 14:34:11 -0400
Subject: [Numpy-discussion] sinc always returns double precision
In-Reply-To: <5F8AD39A-B6AE-4AE2-87DC-D6FC2BD6A700@keywcorp.com>
References: <5F8AD39A-B6AE-4AE2-87DC-D6FC2BD6A700@keywcorp.com>

It is actually a bit more subtle (annoyingly so): the reason you get a
float64 is that you pass in a scalar, and for scalars the dtype of `pi`
indeed "wins", as there is little reason to possibly lose precision. If
you pass in an array instead, then you do get `float32`:

```
np.sinc(np.array([1.], dtype=np.float32)).dtype
dtype('float32')
```

The rationale here is that for an array you generally do not want to just
blow up the memory usage, so its dtype takes precedence (as long as it is
float). So there is a reason, but it certainly leads to a lot of
confusion (e.g., https://github.com/numpy/numpy/issues/10322).

All that said, the implementation of `np.sinc` is not super - it really
could do with a few more in-place operations!

-- Marten
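To make that last remark concrete, here is a minimal sketch of a
dtype-preserving, more in-place `sinc` - illustrative only, assuming a
floating-point input; `sinc_sketch` is an invented name and this is not
the actual NumPy source:

```
import numpy as np

def sinc_sketch(x):
    # Sketch of a normalized sinc that keeps the input's floating dtype
    # and saves a temporary by operating in place.
    x = np.asanyarray(x)
    y = np.where(x == 0, x.dtype.type(1.0e-20), x)
    y *= np.pi               # in-place: a float32 input stays float32
    out = np.sin(y)
    out /= y                 # in-place divide instead of sin(y) / y
    return out
```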
From matti.picus at gmail.com Wed May 9 16:33:17 2018
From: matti.picus at gmail.com (Matti Picus)
Date: Wed, 9 May 2018 23:33:17 +0300
Subject: [Numpy-discussion] NumPy sprint May 24-25 at BIDS
Message-ID: <35623d67-7255-3cf3-31db-91839c63a7b8@gmail.com>

A reminder - we will take advantage of a few NumPy developers being at
Berkeley to hold a two day sprint May 24-25
https://scisprints.github.io/#may-numpy-developer-sprint.
We invite any core contributors who would like to attend, and can help if
needed with travel and accommodations.

Stefan and Matti

From pierre.debuyl at kuleuven.be Wed May 9 17:51:09 2018
From: pierre.debuyl at kuleuven.be (Pierre de Buyl)
Date: Wed, 9 May 2018 23:51:09 +0200
Subject: [Numpy-discussion] EuroSciPy 2018 - call for abstracts
Message-ID: <20180509215109.GH2378@pi-x230>

EuroSciPy 2018 https://www.euroscipy.org/2018/ has opened its call for
abstracts for talks, posters, and tutorials! It was announced via Twitter
and our "announcement" mailing list, but I had not reached out to the
NumPy/SciPy lists yet, so I'll just mention:

1. The conference is in Trento (IT), 28 Aug. - 1 Sep.
2. The deadline for the call is the 13 May.
3. Tutorials are included in the call; if you wish to have an idea of
   what we are looking for, see
   https://scipy2017.scipy.org/ehome/220975/493418/ (SciPy US 2017) or
   https://www.euroscipy.org/2017/program.html (EuroSciPy 2017).

Regards, Pierre de Buyl

From charlesr.harris at gmail.com Wed May 9 21:44:14 2018
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Wed, 9 May 2018 19:44:14 -0600
Subject: [Numpy-discussion] NumPy sprint May 24-25 at BIDS

Hi Matti,

I need to know some details:

1. Where is the meeting
2. When is the meeting
3. Where are good places to stay
4. Is there a recommended airport

I expect the university has a page somewhere with useful information for
visitors; a link would be helpful. I've been to SV several times, but the
last time I was in Berkeley was in 1969 :)

Chuck

From pwoodford at keywcorp.com Thu May 10 14:27:00 2018
From: pwoodford at keywcorp.com (Paul Woodford)
Date: Thu, 10 May 2018 18:27:00 +0000
Subject: [Numpy-discussion] sinc always returns double precision

Ah, I see. That rationale makes sense.

Paul
From einstein.edison at gmail.com Thu May 10 22:48:55 2018
From: einstein.edison at gmail.com (Hameer Abbasi)
Date: Thu, 10 May 2018 22:48:55 -0400
Subject: [Numpy-discussion] Casting scalars

Hello, everyone!

I might be missing something and this might be a very stupid and redundant
question, but is there a way to cast a scalar to a given dtype?

Hameer

From stuart at stuartreynolds.net Thu May 10 22:50:50 2018
From: stuart at stuartreynolds.net (Stuart Reynolds)
Date: Thu, 10 May 2018 19:50:50 -0700
Subject: [Numpy-discussion] Casting scalars

np.float(scalar)
From nathan12343 at gmail.com Thu May 10 22:51:14 2018
From: nathan12343 at gmail.com (Nathan Goldbaum)
Date: Thu, 10 May 2018 21:51:14 -0500
Subject: [Numpy-discussion] Casting scalars

In [1]: import numpy as np

In [2]: np.float64(12)
Out[2]: 12.0

In [3]: np.float64(12).dtype
Out[3]: dtype('float64')

From einstein.edison at gmail.com Thu May 10 22:53:33 2018
From: einstein.edison at gmail.com (Hameer Abbasi)
Date: Thu, 10 May 2018 22:53:33 -0400
Subject: [Numpy-discussion] Casting scalars

Yes, that I know. I meant given a dtype string such as 'uint8' or a dtype
object. I know I can possibly do np.array(scalar, dtype=dtype)[()] but I
was looking for a less hacky method.

From nathan12343 at gmail.com Thu May 10 22:54:46 2018
From: nathan12343 at gmail.com (Nathan Goldbaum)
Date: Thu, 10 May 2018 21:54:46 -0500
Subject: [Numpy-discussion] Casting scalars

On Thu, May 10, 2018 at 9:51 PM Stuart Reynolds wrote:
> np.float(scalar)

This actually isn't right. It's a common misconception, but np.float is an
alias to the built-in float type. You probably want np.float_(scalar):

In [5]: np.float_(12).dtype
Out[5]: dtype('float64')

In [6]: np.float is float
Out[6]: True

From warren.weckesser at gmail.com Thu May 10 23:20:32 2018
From: warren.weckesser at gmail.com (Warren Weckesser)
Date: Thu, 10 May 2018 23:20:32 -0400
Subject: [Numpy-discussion] Casting scalars

On Thu, May 10, 2018 at 10:53 PM, Hameer Abbasi wrote:
> Yes, that I know. I meant given a dtype string such as 'uint8' or a
> dtype object. I know I can possibly do np.array(scalar,
> dtype=dtype)[()] but I was looking for a less hacky method.
Apparently the `dtype` object has the attribute `type` that creates
objects of that dtype. For example,

In [30]: a
Out[30]: array([ 1.,  2.,  3.])

In [31]: dt = a.dtype

In [32]: dt
Out[32]: dtype('float64')

In [33]: x = dt.type(8675309)  # Convert the scalar to a's dtype.

In [34]: x
Out[34]: 8675309.0

In [35]: type(x)
Out[35]: numpy.float64

Warren

From einstein.edison at gmail.com Fri May 11 00:22:05 2018
From: einstein.edison at gmail.com (Hameer Abbasi)
Date: Fri, 11 May 2018 00:22:05 -0400
Subject: [Numpy-discussion] Casting scalars

This is exactly what I needed! Thanks!
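For the dtype-string form of the original question, the same attribute is
reachable through the `np.dtype` constructor; a small illustration:

```
import numpy as np

# np.dtype(...) normalizes a dtype name (or dtype object), and its
# .type attribute is the scalar constructor for that dtype.
x = np.dtype('uint8').type(7)
print(type(x))   # <class 'numpy.uint8'>
print(x.dtype)   # uint8
```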
Is this the correct approach? >> >> Corin Hoad >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri May 11 18:50:27 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 11 May 2018 16:50:27 -0600 Subject: [Numpy-discussion] NumPy 1.15 branch Message-ID: Hi All, I'm thinking of branching 1.15 in the next week or so, not least because we need to get a fix out for the instruction reordering problem , maybe as a backport also. As it common with NumPy development, our attention gets dedicated to the top of the PR stack, so I'd like folks to take a look at PRs that have been languishing with an eye to getting them in. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Sat May 12 17:14:36 2018 From: matti.picus at gmail.com (Matti Picus) Date: Sun, 13 May 2018 00:14:36 +0300 Subject: [Numpy-discussion] NumPy sprint May 24-25 at BIDS In-Reply-To: References: <35623d67-7255-3cf3-31db-91839c63a7b8@gmail.com> Message-ID: On 10/05/18 04:44, Charles R Harris wrote: > On Wed, May 9, 2018 at 2:33 PM, Matti Picus > wrote: > > A reminder - we will take advantage of a few NumPy developers > being at Berkeley to hold a two day sprint May 24-25 > https://scisprints.github.io/#may-numpy-developer-sprint > > >. > We invite any core contributors who would like to attend and can > help if needed with travel and accomodations. > > Stefan and Matti > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > Hi Matti, > > I need to know some details: > > 1. Where is the meeting > 2. When is the meeting > 3. Where are good places to stay > 4. Is there a recommended airport > > I expect the university has a page somewhere with useful information > for visitors, a link would be helpful. I've been to SV several times, > but the last time I was in Berkeley was in 1969 :) > > Chuck > > We will meet at the BIDS inside the Berkeley campus, more info on travel, location and accommodation is here https://bids.berkeley.edu/about/directions-and-travel Matti From corinhoad at gmail.com Sat May 12 17:19:06 2018 From: corinhoad at gmail.com (Corin Hoad) Date: Sat, 12 May 2018 22:19:06 +0100 Subject: [Numpy-discussion] Adding fweights and aweights to numpy.corrcoef In-Reply-To: References: <2897fa67e5db6b0e6c8b00b6c09f6b12b234a62b.camel@sipsolutions.net> Message-ID: The discussed changes are implemented in PR #11078 Corin On Fri, 11 May 2018 at 15:07 wrote: > On Fri, May 11, 2018 at 7:43 AM, Corin Hoad wrote: > >> Are there any further thoughts on this? If it's simply allowing corrcoef >> to hand off the keyword arguments to cov I can make a simple PR with the >> change. >> > > No further thoughts from my side. I don't see a problem. > > Aside: And the degrees of freedom correction, which was one of the > ambiguous issues in the cov case, will not matter in the corrcoef case > because it cancels in the latter. > > Josef > > > >> >> >> Corin Hoad >> >> On Fri, 27 Apr 2018 at 10:44 Corin Hoad wrote: >> >>> I seem to recall that there was a discussion on this and it was a lot >>>>> trickier then expected. 
From charlesr.harris at gmail.com Fri May 11 18:50:27 2018
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 11 May 2018 16:50:27 -0600
Subject: [Numpy-discussion] NumPy 1.15 branch

Hi All,

I'm thinking of branching 1.15 in the next week or so, not least because
we need to get a fix out for the instruction reordering problem, maybe as
a backport also. As is common with NumPy development, our attention gets
dedicated to the top of the PR stack, so I'd like folks to take a look at
PRs that have been languishing with an eye to getting them in.

Chuck

From matti.picus at gmail.com Sat May 12 17:14:36 2018
From: matti.picus at gmail.com (Matti Picus)
Date: Sun, 13 May 2018 00:14:36 +0300
Subject: [Numpy-discussion] NumPy sprint May 24-25 at BIDS

On 10/05/18 04:44, Charles R Harris wrote:
> Hi Matti,
>
> I need to know some details:
>
> 1. Where is the meeting
> 2. When is the meeting
> 3. Where are good places to stay
> 4. Is there a recommended airport

We will meet at the BIDS inside the Berkeley campus; more info on travel,
location and accommodation is here
https://bids.berkeley.edu/about/directions-and-travel

Matti

From corinhoad at gmail.com Sat May 12 17:19:06 2018
From: corinhoad at gmail.com (Corin Hoad)
Date: Sat, 12 May 2018 22:19:06 +0100
Subject: [Numpy-discussion] Adding fweights and aweights to numpy.corrcoef

The discussed changes are implemented in PR #11078

Corin

From millman at berkeley.edu Sat May 12 18:27:42 2018
From: millman at berkeley.edu (Jarrod Millman)
Date: Sat, 12 May 2018 15:27:42 -0700
Subject: [Numpy-discussion] NumPy sprint May 24-25 at BIDS

Hi Chuck,

A few more bits of advice from a local ...

Oakland airport is smaller and closer, so I try to use it when I can.
But SFO probably has more options and isn't too far away.

I prefer Hotel Shattuck Plaza to Hotel Durant. Shattuck is close to
BART. So you can get on the BART at either SFO or OAK and get off at
the Downtown Berkeley station and then walk a short block to your
hotel. Also, Durant is closer to the frat houses, so it can get noisy
at certain times (although classes should be out, so that shouldn't be
a problem now).

Some people prefer to use Airbnb. If you go that route, I would try
to get something near (or on) Euclid Ave. between Hearst and Marin.
There is a bus line that runs up and down Euclid every 30 minutes. It
is a pretty and very quiet area (as soon as you get a couple blocks
away from campus). It takes about 30 minutes to walk from the corner
of Euclid and Marin to BIDS. It is all downhill to campus. (I live
near Euclid & Marin and often walk even though I have a bus pass.)
Once you get to Euclid and Hearst (i.e., the Northgate to campus), it
is a short walk (continuing in the same direction) to BIDS.

I am not sure about the hours of the sprint, but I suspect they will
be 9ish to 5ish.

I hope to see you and others soon!

Best regards,
Jarrod
From charlesr.harris at gmail.com Sun May 13 13:59:13 2018
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sun, 13 May 2018 11:59:13 -0600
Subject: [Numpy-discussion] NumPy sprint May 24-25 at BIDS

OK, I've reserved at the Shattuck Plaza. There is a bike rental around the
corner, would you recommend renting a bike ($35/day)?

Chuck

From millman at berkeley.edu Sun May 13 14:50:04 2018
From: millman at berkeley.edu (Jarrod Millman)
Date: Sun, 13 May 2018 11:50:04 -0700
Subject: [Numpy-discussion] NumPy sprint May 24-25 at BIDS

Hi Chuck,

I don't think a bike is necessary. It is less than a mile from the
Shattuck hotel to BIDS and takes about 10-15 minutes to walk. It is
also more or less completely flat, and most of the walk is on campus.
There are tons of restaurants and stores immediately around the
Shattuck hotel, so the farthest you probably need to travel is from
the hotel to BIDS and back.

Best regards,
Jarrod
So you can get on the BART at either SFO or OAK and get off at >> the Downtown Berkeley station and then walk a short block to your >> hotel. Also, Durant is closer to the frat houses, so it can get noisy >> at certain times (although, classes should be out so that shouldn't be >> a problem now). >> >> Some people prefer to use Airbnb. If you go that route, I would try >> to get something near (or on) Euclid Ave. between Hearst and Marin. >> There is a bus line that runs up and down Euclid every 30 minutes. It >> is a pretty and very quiet area (as soon as you get a couple blocks >> away from campus). It takes about 30 minutes to walk from the corner >> of Euclid and Marin to BIDS. It is all downhill to campus. (I live >> near Euclid & Marin and often walk even though I have a bus pass.) >> Once you get to Euclid and Hearst (i.e., the Northgate to campus), it >> is a short walk (continuing in the same direction) to BIDS. >> >> I am not sure about the hours of the sprint, but I suspect they will >> be 9ish to 5ish. >> >> I hope to see you and others soon! > > > OK, I've reserved at the Shattuck Plaza. There is a bike rental around the > corner, would you recommend renting a bike ($35/day)? > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From pierre.debuyl at kuleuven.be Thu May 17 11:30:21 2018 From: pierre.debuyl at kuleuven.be (Pierre de Buyl) Date: Thu, 17 May 2018 17:30:21 +0200 Subject: [Numpy-discussion] EuroSciPy 2018 - extended deadline - registration open Message-ID: <20180517153021.GI11858@pi-x230> Dear SciPy community, The EuroSciPy 2018 conference, held in Trento from August 28 to September 1 has opened its registration at https://www.euroscipy.org/2018/ ! Also, the deadline for participations (talks, posters, tutorials) has been extended to May 31, don't miss it! All the info is on our web page, feel free to contact us if needed and follow us on Twitter :-) https://twitter.com/EuroSciPy Regards, The EuroSciPy staff From matti.picus at gmail.com Thu May 17 19:11:02 2018 From: matti.picus at gmail.com (Matti Picus) Date: Thu, 17 May 2018 16:11:02 -0700 Subject: [Numpy-discussion] NumPy sprint May 24-25 at BIDS In-Reply-To: <35623d67-7255-3cf3-31db-91839c63a7b8@gmail.com> References: <35623d67-7255-3cf3-31db-91839c63a7b8@gmail.com> Message-ID: <42ca0e26-6f6a-cb4f-d471-4bdec92ef834@gmail.com> On 09/05/18 13:33, Matti Picus wrote: > A reminder - we will take advantage of a few NumPy developers being at > Berkeley to hold a two day sprint May 24-25 > https://scisprints.github.io/#may-numpy-developer-sprint > . > We invite any core contributors who would like to attend and can help > if needed with travel and accomodations. > > Stefan and Matti So far I know about Stefan, Nathaniel, Chuck and me. Things will work better if we can get organized ahead of time. Anyone else planning on attending for both days or part of the sprint, please drop me a line. 
If there are any issues, pull requests, NEPs, or ideas you would like us
to work on, please let me know or add them to the Trello card
https://trello.com/c/fvSYkm2w

Matti

From mrocklin at gmail.com Thu May 17 19:21:10 2018
From: mrocklin at gmail.com (Matthew Rocklin)
Date: Thu, 17 May 2018 19:21:10 -0400
Subject: [Numpy-discussion] NumPy sprint May 24-25 at BIDS

FWIW I'll be at BIDS on the 25th, though I'll mostly be working on
non-Numpy things.

From jaime.frio at gmail.com Thu May 17 19:36:34 2018
From: jaime.frio at gmail.com (Jaime Fernández del Río)
Date: Thu, 17 May 2018 16:36:34 -0700
Subject: [Numpy-discussion] NumPy sprint May 24-25 at BIDS

Hi Matti,

I will be joining you on Thursday, sometime around noon, and all day
Friday.

Jaime

--
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
From jaime.frio at gmail.com Thu May 17 19:37:42 2018
From: jaime.frio at gmail.com (Jaime Fernández del Río)
Date: Thu, 17 May 2018 16:37:42 -0700
Subject: [Numpy-discussion] NumPy sprint May 24-25 at BIDS

OK, make that all day Friday only, if it's Friday and Saturday.

Jaime
From jaime.frio at gmail.com Thu May 17 19:38:53 2018
From: jaime.frio at gmail.com (Jaime Fernández del Río)
Date: Thu, 17 May 2018 16:38:53 -0700
Subject: [Numpy-discussion] NumPy sprint May 24-25 at BIDS

$#!#, was looking at the wrong calendar month: Thursday half day, Friday
all day.

Jaime

From shoyer at gmail.com Fri May 18 03:30:38 2018
From: shoyer at gmail.com (Stephan Hoyer)
Date: Fri, 18 May 2018 00:30:38 -0700
Subject: [Numpy-discussion] NumPy sprint May 24-25 at BIDS

I will also be attending, on at least Thursday (and hopefully Friday,
too).

Best,
Stephan
From mrocklin at gmail.com Fri May 18 08:28:02 2018
From: mrocklin at gmail.com (Matthew Rocklin)
Date: Fri, 18 May 2018 08:28:02 -0400
Subject: [Numpy-discussion] Turn numpy.ones_like into a ufunc

Hi All,

I would like to see the numpy.ones_like function operate as a ufunc.
This is currently done in np.core.umath._ones_like. This was recently
raised and discussed in https://github.com/numpy/numpy/issues/11074 . It
was suggested that I raise the topic here instead.

My understanding is that this was considered some time ago, but that the
current numpy.ones_like function was implemented instead. No one on that
issue seems to fully remember why. Perhaps someone here has a longer
memory?

My objective for defaulting to the ufunc implementation is that it makes
it compatible with other projects that implement numpy-like interfaces
(dask.array, sparse, cupy), so that downstream projects can use a subset
of numpy code that is valid across a few projects. More broadly, I would
like to see ufuncs and other protocol-enabled functions start to become
more common within numpy, ones_like being one specific case.

Best,
-matt

From m.h.vankerkwijk at gmail.com Fri May 18 09:51:15 2018
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Fri, 18 May 2018 09:51:15 -0400
Subject: [Numpy-discussion] Turn numpy.ones_like into a ufunc

I'm greatly in favour, especially if the same can be done for
`zeros_like` and `empty_like`, but note that a tricky part is that ufuncs
do not deal very graciously with structured (void) and string dtypes.

-- Marten

From nathan12343 at gmail.com Fri May 18 09:57:38 2018
From: nathan12343 at gmail.com (Nathan Goldbaum)
Date: Fri, 18 May 2018 09:57:38 -0400
Subject: [Numpy-discussion] Turn numpy.ones_like into a ufunc

I don't particularly need this, although it would be nice to make this
behavior explicit, instead of happening more or less by accident:

In [1]: from yt.units import km

In [2]: import numpy as np

In [3]: data = [1, 2, 3]*km

In [4]: np.ones_like(data)
Out[4]: YTArray([1., 1., 1.]) km

From einstein.edison at gmail.com Fri May 18 10:04:03 2018
From: einstein.edison at gmail.com (Hameer Abbasi)
Date: Fri, 18 May 2018 07:04:03 -0700
Subject: [Numpy-discussion] Turn numpy.ones_like into a ufunc

You can preserve this with (for example) __array_ufunc__.
From njs at pobox.com Fri May 18 11:41:11 2018
From: njs at pobox.com (Nathaniel Smith)
Date: Fri, 18 May 2018 08:41:11 -0700
Subject: [Numpy-discussion] Turn numpy.ones_like into a ufunc

I would like to see a plan for how we're going to handle zeros_like,
empty_like, ones_like, and full_like before we start making changes to
any of them.

From m.h.vankerkwijk at gmail.com Sat May 19 21:12:49 2018
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Sat, 19 May 2018 21:12:49 -0400
Subject: [Numpy-discussion] Turn numpy.ones_like into a ufunc

Just for completeness: this is *not* an issue for ndarray subclasses, but
only for people attempting to write duck arrays. One might want to start
by mimicking `empty_like` - not too different from
`np.positive(a, where=False)`. Will note that that is 50 times slower for
small arrays, since it actually does the copying - it just doesn't store
the results. It is comparable in time to np.zeros_like and np.ones_like,
suggesting that a ufunc implementation is not necessarily a bad thing.

As I noted above, the main problem I see is that the ufunc mechanism
doesn't easily work with strings and voids/structured dtypes.

-- Marten
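To make the subclass side of this concrete, here is a minimal sketch of
the kind of `__array_ufunc__` override Hameer mentions - the class name
and the `unit` attribute are invented for illustration and are not any
real package's API:

```
import numpy as np

class UnitArray(np.ndarray):
    # Toy subclass that carries a `unit` label through ufunc calls.

    def __new__(cls, data, unit=''):
        obj = np.asarray(data).view(cls)
        obj.unit = unit
        return obj

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap to plain ndarrays, run the ufunc, then re-wrap the
        # result and reattach the unit.
        raw = [np.asarray(i) if isinstance(i, UnitArray) else i
               for i in inputs]
        result = getattr(ufunc, method)(*raw, **kwargs)
        if isinstance(result, np.ndarray):
            result = result.view(UnitArray)
            result.unit = self.unit
        return result

a = UnitArray([1.0, 2.0, 3.0], unit='km')
print(np.multiply(a, 2).unit)   # km
```

If `ones_like` were a (g)ufunc, an override like this would intercept it
too, which is exactly the behavior being discussed.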
From mrocklin at gmail.com Mon May 21 12:01:04 2018
From: mrocklin at gmail.com (Matthew Rocklin)
Date: Mon, 21 May 2018 12:01:04 -0400
Subject: [Numpy-discussion] Turn numpy.ones_like into a ufunc

I've also posted a second issue on doing this at the module level (beyond
just ones_like) here: https://github.com/numpy/numpy/issues/11129

From matti.picus at gmail.com Mon May 21 20:42:33 2018
From: matti.picus at gmail.com (Matti Picus)
Date: Mon, 21 May 2018 17:42:33 -0700
Subject: [Numpy-discussion] matmul as a ufunc

I have made progress with resolving the issue that matmul, the operation
which implements `a @ b`, is not a ufunc [2] - which prevents the
__array_ufunc__ mechanism from overriding matmul on subclasses of
ndarray. Discussion on the issue yielded two approaches:

- create a wrapper that can convince the ufunc mechanism to call
  __array_ufunc__ even on functions that are not true ufuncs
- expand the semantics of core signatures so that a single matmul ufunc
  can implement matrix-matrix, vector-matrix, matrix-vector, and
  vector-vector multiplication.

I have put up prototypes of both approaches as PR 11061 [0] and PR 11133
[1]; they are WIP to prove the concept and are a bit rough. Either
approach can be made to work. Which is preferable? What are the criteria
we should use to judge the relative merits (backward compatibility,
efficiency, code clarity, enabling further enhancements, ...) of the
approaches?

Matti

[0] https://github.com/numpy/numpy/pull/11061
[1] https://github.com/numpy/numpy/pull/11133
[2] https://github.com/numpy/numpy/issues/9028
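The starting point is easy to see in an interpreter - this reflects the
behavior of NumPy releases current at the time of this thread, and under
either prototype both lines would eventually print True:

```
import numpy as np

# matmul is a plain builtin here, not a ufunc, so it exposes none of
# the machinery that __array_ufunc__ dispatch relies on.
print(isinstance(np.add, np.ufunc))      # True
print(isinstance(np.matmul, np.ufunc))   # False at the time of writing
```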
From nelle.varoquaux at gmail.com Tue May 22 19:01:53 2018
From: nelle.varoquaux at gmail.com (Nelle Varoquaux)
Date: Tue, 22 May 2018 16:01:53 -0700
Subject: [Numpy-discussion] Submit a BoF at SciPy 2018, before June 27!

Dear all,

(apologies for the cross-posting)

The SciPy conference would like to invite you to submit proposals for
Birds of a Feather (BOF) sessions at this year's SciPy! BOFs usually
include short presentations by a panel and a moderator, with the bulk of
the time spent opening up the discussion to everyone in attendance. BoF
topics can be of general interest, such as state-of-the-project BoFs, or
based on the themes of the conference and the mini-symposia topics.

Please submit your proposals by June 27 here:
https://scipy2018.scipy.org/ehome/299527/648142/

Past SciPy conferences have had a large variety of BOF sessions,
including topics on Reproducibility, Jupyter Notebooks, Distributed
Computing, Geospatial Packages in Python, Teaching Scientific Computing
with Python, Python and Finance, NumFOCUS, Python in Astronomy,
Collaborating and Contributing in Open Science, Education, and a
Matplotlib Enhancement Proposal Discussion. Generally, if there is a
topic where you think a number of people at SciPy will be interested, you
should propose it!

Thanks,
Jess & Nelle

From ben.v.root at gmail.com Tue May 22 21:14:06 2018
From: ben.v.root at gmail.com (Benjamin Root)
Date: Tue, 22 May 2018 21:14:06 -0400
Subject: [Numpy-discussion] Submit a BoF at SciPy 2018, before June 27!

Question: I submitted a BoF (and code sprint), but I didn't get any email
acknowledgement. Were we supposed to? How can we know that the submission
was successful?

From ypeng.bj at gmail.com Wed May 23 00:30:44 2018
From: ypeng.bj at gmail.com (Yu Peng)
Date: Wed, 23 May 2018 12:30:44 +0800
Subject: [Numpy-discussion] Matrix operation

Hi, I want to make an operation like this. I have a matrix:

a =
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15]],

       [[16, 17, 18, 19],
        [20, 21, 22, 23],
        [24, 25, 26, 27],
        [28, 29, 30, 31]],

       [[32, 33, 34, 35],
        [36, 37, 38, 39],
        [40, 41, 42, 43],
        [44, 45, 46, 47]],

       [[48, 49, 50, 51],
        [52, 53, 54, 55],
        [56, 57, 58, 59],
        [60, 61, 62, 63]]])

The shape of a is (4, 4, 4). I want to transform this tensor to shape
(8, 8), and the final result is like this:

 0 16  1 17  2 18  3 19
32 48 33 49 34 50 35 51
 4 20  5 21  6 22  7 23
36 52 37 53 38 54 39 55
 8 24  9 25 10 26 11 27
40 56 41 57 42 58 43 59
12 28 13 29 14 30 15 31
44 60 45 61 46 62 47 63

If you know how to deal with this matrix, please give me some
suggestions. Thanks.
From wieser.eric+numpy at gmail.com Wed May 23 00:41:38 2018
From: wieser.eric+numpy at gmail.com (Eric Wieser)
Date: Tue, 22 May 2018 21:41:38 -0700
Subject: [Numpy-discussion] Matrix operation

I'd recommend asking this kind of question on stackoverflow in future,
but you can do that with:

b = (a
     .reshape((2, 2, 4, 4))    # split up the (4,) axis into (2, 2)
     .transpose((2, 0, 3, 1))  # reorder to (4, 2, 4, 2)
     .reshape((8, 8))          # collapse adjacent dimensions
     )
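For reference, the array in the question is just
`np.arange(64).reshape(4, 4, 4)`, which makes the recipe easy to verify:

```
import numpy as np

a = np.arange(64).reshape(4, 4, 4)   # same values as in the question
b = a.reshape(2, 2, 4, 4).transpose(2, 0, 3, 1).reshape(8, 8)
print(b[0])   # [ 0 16  1 17  2 18  3 19] -- the requested first row
```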
Matti and Stefan From ralf.gommers at gmail.com Wed May 23 15:29:32 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 23 May 2018 12:29:32 -0700 Subject: [Numpy-discussion] Splitting MaskedArray into a separate package In-Reply-To: <45fa0f5f-26c6-ee9d-7239-8a77d95a6ba4@gmail.com> References: <45fa0f5f-26c6-ee9d-7239-8a77d95a6ba4@gmail.com> Message-ID: On Wed, May 23, 2018 at 12:06 PM, Matti Picus wrote: > MaskedArray is a strange but useful creature. This NEP proposes to > distribute it as a separate package under the NumPy brand. > > As I understand the process, a proposed NEP should be first discussed here > to gauge general acceptance, then after that the details should be > discussed on the pull request itself https://github.com/numpy/numpy > /pull/11146. > > Here is the motivation section from the NEP: > > MaskedArrays are a sub-class of the NumPy ``ndarray`` that adds >> masking capabilities, i.e. the ability to ignore or hide certain array >> values during computation. >> >> While historically convenient to distribute this class inside of NumPy, >> improved packaging has made it possible to distribute it separately >> without difficulty. >> >> Motivations for this move include: >> >> * Focus: the NumPy package should strive to only include the >> `ndarray` object, and the essential utilities needed to manipulate >> such arrays. >> * Complexity: the MaskedArray implementation is non-trivial, and imposes >> a significant maintenance burden. >> * Compatibility: MaskedArray objects, being subclasses of `ndarrays`, >> often cause complications when being used with other packages. >> Fixing these issues is outside the scope of NumPy development. >> > Hmm, I wouldn't say it's out of scope at all. Currently it's simply part of numpy. > >> This NEP proposes a deprecation pathway through which MaskedArrays >> would still be accessible to users, but no longer as part of the core >> package. >> > > Any thoughts? > You're missing an important step I think. You're proposing to deprecate MaskedArray completely (or not?). IIRC this has not been decided or seriously discussed before. The complexity is not going away if you intend to keep MaskedArray alive long-term, only in a separate package. It gets worse actually, because now we would need to cross-package regression testing. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanv at berkeley.edu Wed May 23 16:03:12 2018 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Wed, 23 May 2018 13:03:12 -0700 Subject: [Numpy-discussion] Splitting MaskedArray into a separate package In-Reply-To: References: <45fa0f5f-26c6-ee9d-7239-8a77d95a6ba4@gmail.com> Message-ID: <20180523200312.q7nf4rnrobvbezaz@carbo> On Wed, 23 May 2018 12:29:32 -0700, Ralf Gommers wrote: > >> * Compatibility: MaskedArray objects, being subclasses of `ndarrays`, > >> often cause complications when being used with other packages. > >> Fixing these issues is outside the scope of NumPy development. > > > Hmm, I wouldn't say it's out of scope at all. Currently it's simply part of > numpy. That is currently the situation, yes. I think this was meant more as "we'd preferably not like to think about MaskedArrays any differently than we do about other external packages, such as dask". I.e., not support specific hacks to make it work. > You're missing an important step I think. You're proposing to deprecate > MaskedArray completely (or not?). IIRC this has not been decided or > seriously discussed before. 
Good point, which certainly needs to be discussed. My thought was to move it out into a separate package that could be maintained more in the spirit of a scikit by people who care deeply about its functionality. Best regards, St?fan From efiring at hawaii.edu Wed May 23 16:02:22 2018 From: efiring at hawaii.edu (Eric Firing) Date: Wed, 23 May 2018 10:02:22 -1000 Subject: [Numpy-discussion] Splitting MaskedArray into a separate package In-Reply-To: <45fa0f5f-26c6-ee9d-7239-8a77d95a6ba4@gmail.com> References: <45fa0f5f-26c6-ee9d-7239-8a77d95a6ba4@gmail.com> Message-ID: On 2018/05/23 9:06 AM, Matti Picus wrote: > MaskedArray is a strange but useful creature. This NEP proposes to > distribute it as a separate package under the NumPy brand. > > As I understand the process, a proposed NEP should be first discussed > here to gauge general acceptance, then after that the details should be > discussed on the pull request itself > https://github.com/numpy/numpy/pull/11146. > > Here is the motivation section from the NEP: > >> MaskedArrays are a sub-class of the NumPy ``ndarray`` that adds >> masking capabilities, i.e. the ability to ignore or hide certain array >> values during computation. >> >> While historically convenient to distribute this class inside of NumPy, >> improved packaging has made it possible to distribute it separately >> without difficulty. >> >> Motivations for this move include: >> >> ?* Focus: the NumPy package should strive to only include the >> ?? `ndarray` object, and the essential utilities needed to manipulate >> ?? such arrays. >> ?* Complexity: the MaskedArray implementation is non-trivial, and imposes >> ?? a significant maintenance burden. >> ?* Compatibility: MaskedArray objects, being subclasses of `ndarrays`, >> ?? often cause complications when being used with other packages. >> ?? Fixing these issues is outside the scope of NumPy development. >> >> This NEP proposes a deprecation pathway through which MaskedArrays >> would still be accessible to users, but no longer as part of the core >> package. > > Any thoughts? > > Matti and Stefan I understand at least some of the motivation and potential advantages, but as it stands, I find this NEP highly alarming. Masked arrays are critical to my numpy usage, and I suspect they are critical for many other use cases as well. In fact, I would prefer that a high priority for major numpy development be the more complete integration of masked array capabilities into numpy, not their removal to a separate package. I was unhappy to see the effort in that direction a few years ago being killed. I didn't agree with every design decision, but overall I thought it was going in the right direction. Bad or missing values (and situations where one wants to use a mask to operate on a subset of an array) are found in many domains of real life; do you really want python users in those domains to have to fall back on Matlab-style reliance on nans and/or manual mask manipulations, as the new maskedarray package is sidelined? Or is there any realistic prospect for maintenance and improvement of the package after it is separated out? Or of mask/missing value handling being integrated into numpy? Is the latter option on the table in any form, or is it DOA? Side question: does your proposed purification of numpy include elimination of linalg and random? Based on the criteria in the NEP, I would expect it does; so maybe you should have a more ambitious NEP, and do the purification all in one step as a numpy version 2.0. 
(Surely if masked arrays are purged, the matrix class should be booted out at the same time.)

Eric

From ralf.gommers at gmail.com  Wed May 23 16:30:49 2018
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Wed, 23 May 2018 13:30:49 -0700
Subject: [Numpy-discussion] Splitting MaskedArray into a separate package
In-Reply-To: <20180523200312.q7nf4rnrobvbezaz@carbo>
References: <45fa0f5f-26c6-ee9d-7239-8a77d95a6ba4@gmail.com>
 <20180523200312.q7nf4rnrobvbezaz@carbo>
Message-ID:

On Wed, May 23, 2018 at 1:03 PM, Stefan van der Walt wrote:

> On Wed, 23 May 2018 12:29:32 -0700, Ralf Gommers wrote:
> > >> * Compatibility: MaskedArray objects, being subclasses of `ndarrays`,
> > >> often cause complications when being used with other packages.
> > >> Fixing these issues is outside the scope of NumPy development.
> >
> > Hmm, I wouldn't say it's out of scope at all. Currently it's simply part
> of
> > numpy.
>
> That is currently the situation, yes. I think this was meant more as
> "we'd preferably not like to think about MaskedArrays any differently
> than we do about other external packages, such as dask". I.e., not
> support specific hacks to make it work.
>
> > You're missing an important step I think. You're proposing to deprecate
> > MaskedArray completely (or not?). IIRC this has not been decided or
> > seriously discussed before.
>
> Good point, which certainly needs to be discussed. My thought was to
> move it out into a separate package that could be maintained more in the
> spirit of a scikit by people who care deeply about its functionality.
>

That would be good in principle, but it's only possible that way once the specific hacks you refer to above are removed. As long as MaskedArray depends on implementation details of ndarray, evolving them in lock-step will be necessary. And that is much easier when they're in the same package.

Regarding whether a split-off package will actually be developed, I think that depends on having at least one champion for it stepping up. If we just move it over into github.com/numpy/maskedarray, I think it will get less rather than more attention.

Cheers,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stefanv at berkeley.edu  Wed May 23 16:51:16 2018
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Wed, 23 May 2018 13:51:16 -0700
Subject: [Numpy-discussion] Splitting MaskedArray into a separate package
In-Reply-To:
References: <45fa0f5f-26c6-ee9d-7239-8a77d95a6ba4@gmail.com>
Message-ID: <1638ec529a0.27ae.acf34a9c767d7bb498a799333be0433e@fastmail.com>

Hi Eric,

On May 23, 2018 13:25:44 Eric Firing wrote:
> On 2018/05/23 9:06 AM, Matti Picus wrote:
> I understand at least some of the motivation and potential advantages,
> but as it stands, I find this NEP highly alarming.

I am not at my computer right now, so I will respond in more detail later. But I wanted to address your statement above:

I see a NEP as an opportunity to discuss and flesh out an idea, and I certainly hope to show you that there's no reason for alarm.

I do not expect to know whether this is a good idea before discussions conclude, so I appreciate your feedback. If we cannot find good support for the idea, with very specific benefits, it should simply be dropped.

But, I think there's a lot to learn from the conversation in the meantime w.r.t. exactly how streamlined people want NumPy to be, how core functionality can perhaps be strengthened by becoming a customer of our own API, how to optimally maintain sub-components, etc.

Best regards,
Stéfan

From ilhanpolat at gmail.com  Wed May 23 16:57:35 2018
From: ilhanpolat at gmail.com (Ilhan Polat)
Date: Wed, 23 May 2018 22:57:35 +0200
Subject: [Numpy-discussion] Splitting MaskedArray into a separate package
In-Reply-To: <1638ec529a0.27ae.acf34a9c767d7bb498a799333be0433e@fastmail.com>
References: <45fa0f5f-26c6-ee9d-7239-8a77d95a6ba4@gmail.com>
 <1638ec529a0.27ae.acf34a9c767d7bb498a799333be0433e@fastmail.com>
Message-ID:

As far as I understand from the discussion above, I think the opposite would be a better strategy for the sanity of our scarce but brave maintainers. I would argue that if there is a maintenance burden, then the ballasts seem to be linalg and random indeed. Similar pain points exist in SciPy too. There are a lot of issues that have already been thought of, years ago, but never materialized (be it backwards compatibility, lack of champions and so on) because they are not the priority of the maintaining team. It is very common that a discussion ends with "yes, we should probably make it a ufunc" and then fades away. I feel that if there were fewer things to worry about, more people would step up and "do it".

I would also argue that the highest expectation from NumPy would be having a really sound data structure basis with more ufuncs, more array manipulation tricks and so on. Masked arrays, imho, fall into that category. Hence, if the codebase gets more refined in that respect, with less stuff to maintain and fewer moving parts, I think there would be a more coherent overall picture and a more focused action plan. Now the attention of maintainers seems to be divided into a lot of orthogonal issues, which is not a bad thing per se but tedious at times. Currently NumPy has a lot of code that it really doesn't need to carry and can delegate to higher level packages like SciPy or any other subpackage. It sounds like NumPy 2.0 but actually more of a gradual thinning out.

On Wed, May 23, 2018 at 10:51 PM, Stefan van der Walt wrote:

> Hi Eric,
>
> On May 23, 2018 13:25:44 Eric Firing wrote:
>
> On 2018/05/23 9:06 AM, Matti Picus wrote:
>> I understand at least some of the motivation and potential advantages,
>> but as it stands, I find this NEP highly alarming.
>>
>
> I am not at my computer right now, so I will respond in more detail later.
> But I wanted to address your statement above:
>
> I see a NEP as an opportunity to discuss and flesh out an idea, and I
> certainly hope to show you that there's no reason for alarm.
>
> I do not expect to know whether this is a good idea before discussions
> conclude, so I appreciate your feedback. If we cannot find good support for
> the idea, with very specific benefits, it should simply be dropped.
>
> But, I think there's a lot to learn from the conversation in the meantime
> w.r.t. exactly how streamlined people want NumPy to be, how core
> functionality can perhaps be strengthened by becoming a customer of our own
> API, how to optimally maintain sub-components, etc.
>
> Best regards,
> Stéfan
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From matthew.brett at gmail.com  Wed May 23 17:26:39 2018
From: matthew.brett at gmail.com (Matthew Brett)
Date: Wed, 23 May 2018 22:26:39 +0100
Subject: [Numpy-discussion] Splitting MaskedArray into a separate package
In-Reply-To: <1638ec529a0.27ae.acf34a9c767d7bb498a799333be0433e@fastmail.com>
References: <45fa0f5f-26c6-ee9d-7239-8a77d95a6ba4@gmail.com>
 <1638ec529a0.27ae.acf34a9c767d7bb498a799333be0433e@fastmail.com>
Message-ID:

Hi,

On Wed, May 23, 2018 at 9:51 PM, Stefan van der Walt wrote:
> Hi Eric,
>
> On May 23, 2018 13:25:44 Eric Firing wrote:
>
>> On 2018/05/23 9:06 AM, Matti Picus wrote:
>> I understand at least some of the motivation and potential advantages,
>> but as it stands, I find this NEP highly alarming.
>
>
> I am not at my computer right now, so I will respond in more detail later.
> But I wanted to address your statement above:
>
> I see a NEP as an opportunity to discuss and flesh out an idea, and I
> certainly hope to show you that there's no reason for alarm.
>
> I do not expect to know whether this is a good idea before discussions
> conclude, so I appreciate your feedback. If we cannot find good support for
> the idea, with very specific benefits, it should simply be dropped.
>
> But, I think there's a lot to learn from the conversation in the meantime
> w.r.t. exactly how streamlined people want NumPy to be, how core
> functionality can perhaps be strengthened by becoming a customer of our own
> API, how to optimally maintain sub-components, etc.

Can I ask what the plans are for supporting missing values, inside or outside numpy? Is there a successor to MaskedArray - and is this part of the succession plan?

Cheers,

Matthew

From allanhaldane at gmail.com  Wed May 23 17:33:44 2018
From: allanhaldane at gmail.com (Allan Haldane)
Date: Wed, 23 May 2018 17:33:44 -0400
Subject: [Numpy-discussion] Splitting MaskedArray into a separate package
In-Reply-To:
References: <45fa0f5f-26c6-ee9d-7239-8a77d95a6ba4@gmail.com>
Message-ID: <6395ad57-345c-0b82-7fe3-6c90240fc7ef@gmail.com>

On 05/23/2018 04:02 PM, Eric Firing wrote:
> Bad or missing values (and situations where one wants to use a mask to
> operate on a subset of an array) are found in many domains of real life;
> do you really want python users in those domains to have to fall back on
> Matlab-style reliance on nans and/or manual mask manipulations, as the
> new maskedarray package is sidelined?

I also think that missing value support is important to include inside numpy, just as it is included in other numerical packages like R and Julia.

The time is ripe to write a new and better MaskedArray, because __array_ufunc__ exists now. With some other numpy devs a few months ago we also played with rewriting MA using __array_ufunc__ and fixing up all the bugs and inconsistencies we have discovered over time (eg, getting rid of the Masked constant). Both Eric and I started working on some code changes, but never submitted PRs. See a little bit of discussion here (there was some more elsewhere I can't find now):

https://github.com/numpy/numpy/pull/9792#issuecomment-333346420

As I say there, numpy's current MA support is pretty poor compared to R - Wes McKinney partly justified his desire to move pandas away from numpy because of it. We have a lot to gain by implementing it nicely.

We already have an NEP discussing possible ways forward:
https://docs.scipy.org/doc/numpy-1.14.0/neps/missing-data.html

I was pretty excited by the discussion above, and still am.
I want to get back to it after I finish more immediate priorities - finishing printing/loading/saving fixes and structured array fixes. But Masked-Array-2 is on my list of desired long-term enhancements for numpy. Allan From stefanv at berkeley.edu Wed May 23 17:42:30 2018 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Wed, 23 May 2018 14:42:30 -0700 Subject: [Numpy-discussion] Splitting MaskedArray into a separate package In-Reply-To: References: <45fa0f5f-26c6-ee9d-7239-8a77d95a6ba4@gmail.com> <1638ec529a0.27ae.acf34a9c767d7bb498a799333be0433e@fastmail.com> Message-ID: <1638ef41170.27ae.acf34a9c767d7bb498a799333be0433e@fastmail.com> On May 23, 2018 14:28:05 Matthew Brett wrote: > > Can I ask what the plans are for supporting missing values, inside or > outside numpy? Is there are successor to MaskedArray - and is this > part of the succession plan? I am not aware of any concrete plans, maybe others can chime in? It's a bit strange, the words that are used in this thread: "succession", "purification", "elimination", and "purge". I don't have my knife out for MaskedArrays; I merged a lot of Pierre's work myself. I simply suspect there may be a better and more supporting home/project configuration for it, perhaps still under the NumPy umbrella. Best regards, St?fan From sebastian at sipsolutions.net Wed May 23 17:48:07 2018 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 23 May 2018 23:48:07 +0200 Subject: [Numpy-discussion] Splitting MaskedArray into a separate package In-Reply-To: <6395ad57-345c-0b82-7fe3-6c90240fc7ef@gmail.com> References: <45fa0f5f-26c6-ee9d-7239-8a77d95a6ba4@gmail.com> <6395ad57-345c-0b82-7fe3-6c90240fc7ef@gmail.com> Message-ID: <7bbac23bcedd1171ca8c0b15547dde2a550d0102.camel@sipsolutions.net> On Wed, 2018-05-23 at 17:33 -0400, Allan Haldane wrote: > On 05/23/2018 04:02 PM, Eric Firing wrote: > > Bad or missing values (and situations where one wants to use a mask > > to > > operate on a subset of an array) are found in many domains of real > > life; > > do you really want python users in those domains to have to fall > > back on > > Matlab-style reliance on nans and/or manual mask manipulations, as > > the > > new maskedarray package is sidelined? > > I also think that missing value support is important to include > inside > numpy, just as it is included in other numerical packages like R and > Julia. > > The time is ripe to write a new and better MaskedArray, because > __array_ufunc__ exists now. With some other numpy devs a few months > ago > we also played with rewriting MA using __array_ufunc__ and fixing up > all > the bugs and inconsistencies we have discovered over time (eg, > getting > rid of the Masked constant). Both Eric and I started working on some > code changes, but never submitted PRs. See a little bit of discussion > here (there was some more elsewhere I can't find now): > > https://github.com/numpy/numpy/pull/9792#issuecomment-333346420 > > As I say there, numpy's current MA support is pretty poor compared to > R > - Wes McKinney partly justified his desire to move pandas away from > numpy because of it. We have a lot to gain by implementing it nicely. > > We already have an NEP discussing possible ways forward: > https://docs.scipy.org/doc/numpy-1.14.0/neps/missing-data.html > > I was pretty excited by discussion above, and still am. I want to get > back to it after I finish more immediate priorities - finishing > printing/loading/saving fixes and structured array fixes. 
> > But Masked-Array-2 is on my list of desired long-term enhancements > for > numpy. Well, if we plan to replace it within numpy, I think we should wait until then for any move on deprecation (after which it seems like the obviously right choice)? If we do not plan to replace it within numpy, we need to discuss a bit how it might affect infrastructure (multiple implementations....). There is the other discussion about how to replace it. By opening up/creating new masked dtypes or similar (cool but unclear how complex/long term) or `__array_ufunc__` based (relatively simple, will get rid of the nastier hacks that are currently needed). Or even both, just on different time scales? My first gut feeling about the proposal is: I love the idea to get rid of it... but lets not do it, it does feel like it makes too much infrastructure unclear. - Sebastian > > Allan > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From matthew.brett at gmail.com Wed May 23 18:08:07 2018 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 23 May 2018 23:08:07 +0100 Subject: [Numpy-discussion] Splitting MaskedArray into a separate package In-Reply-To: <1638ef41170.27ae.acf34a9c767d7bb498a799333be0433e@fastmail.com> References: <45fa0f5f-26c6-ee9d-7239-8a77d95a6ba4@gmail.com> <1638ec529a0.27ae.acf34a9c767d7bb498a799333be0433e@fastmail.com> <1638ef41170.27ae.acf34a9c767d7bb498a799333be0433e@fastmail.com> Message-ID: Hi, On Wed, May 23, 2018 at 10:42 PM, Stefan van der Walt wrote: > On May 23, 2018 14:28:05 Matthew Brett wrote: >> >> >> Can I ask what the plans are for supporting missing values, inside or >> outside numpy? Is there are successor to MaskedArray - and is this >> part of the succession plan? > > > I am not aware of any concrete plans, maybe others can chime in? > > It's a bit strange, the words that are used in this thread: "succession", > "purification", "elimination", and "purge". I don't have my knife out for > MaskedArrays; I merged a lot of Pierre's work myself. I simply suspect there > may be a better and more supporting home/project configuration for it, > perhaps still under the NumPy umbrella. The NEP notes that MaskedArray imposes a significant maintenance burden, as a motivation for removing it. I'm sure you'd predict that the Numpy developers are likely to spend less time on it, if it moves to its own package. I guess the hope would be that others would take over, but is that likely? What if they don't? Would it be reasonable to develop an alternative plan for missing arrays in concert with this NEP, maybe along the lines that Allan mentioned, above? Cheers, Matthew From stefanv at berkeley.edu Wed May 23 19:01:06 2018 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Wed, 23 May 2018 16:01:06 -0700 Subject: [Numpy-discussion] Splitting MaskedArray into a separate package In-Reply-To: References: <45fa0f5f-26c6-ee9d-7239-8a77d95a6ba4@gmail.com> <20180523200312.q7nf4rnrobvbezaz@carbo> Message-ID: <20180523230106.pigl3sklcfo6xrns@carbo> On Wed, 23 May 2018 13:30:49 -0700, Ralf Gommers wrote: > > Good point, which certainly needs to be discussed. 
My thought was to > > move it out into a separate package that could be maintained more in the > > spirit of a scikit by people who care deeply about its functionality. > > > That would be good in principle, but it's only possible that way once the > specific hacks you refer to above are removed. As long as MaskedArray > depends on implementation details of ndarray, evolving them in lock-step > will be necessary. And that is much easier when they're in the same > package. Yes, I agree: no special hacks should exist inside of NumPy for MaskedArrays. We should, in this instance, become consumers of our public facing API, and refactor that API as necessary to support it. > Regarding whether a split-off package will actually be developed, I think > that depends on having at least one champion for it stepping up. If we just > move it over into github.com/numpy/maskedarray, I think it will get less > rather than more attention. Wouldn't this be a good test of whether MaskedArrays are as valuable as is being argued? If so, a community will form around it, and if not it may fade into obscurity. Perhaps there is a fear that, in the transition period (i.e., before potential contributors realize that the NumPy core team is no longer doing active maintenance) the project may flounder. But I suspect that is unlikely to happen, as long as we keep an eye on its test suite from the NumPy side (perhaps execute its test suite as part of NumPy CI). Why is the scikit model successful, considering packages could just as well be part of SciPy? I would guess: a strong sense of ownership, the ability to rapidly evolve, better focus, and a lower barrier to entry (fewer moving pieces) may all play a role. When you own a small package, you know no-one else will take care of problems, so you pay careful attention. Best regards, St?fan From mrocklin at gmail.com Wed May 23 19:08:30 2018 From: mrocklin at gmail.com (Matthew Rocklin) Date: Wed, 23 May 2018 19:08:30 -0400 Subject: [Numpy-discussion] Splitting MaskedArray into a separate package In-Reply-To: References: <45fa0f5f-26c6-ee9d-7239-8a77d95a6ba4@gmail.com> <1638ec529a0.27ae.acf34a9c767d7bb498a799333be0433e@fastmail.com> <1638ef41170.27ae.acf34a9c767d7bb498a799333be0433e@fastmail.com> Message-ID: Hi All, *Disclaimer: I don't spend any hours actually maintaining Numpy, so please don't take my comments here with much weight.* My gut reaction here is that if removing masked array allows Numpy to evolve more quickly then this excites me. It could be that a plan goes something like the following: 1. Remove masked array to a separate package, pin it to current versions of Numpy. 2. Evolve Numpy to the point where making new array types becomes attractive 3. Make a new masked array with that new functionality that doesn't have the problems of the current implementation Of course this is a simplistic view of the world, and it could also be that this triggers a forking event. However, hopefully it gets a general theme across though that there is value to allowing Numpy to move quickly, and that it might make sense for some feature-sets to miss out on that evolution for a time for the greater good of the ecosystem's evolution. -matt On Wed, May 23, 2018 at 6:08 PM, Matthew Brett wrote: > Hi, > > On Wed, May 23, 2018 at 10:42 PM, Stefan van der Walt > wrote: > > On May 23, 2018 14:28:05 Matthew Brett wrote: > >> > >> > >> Can I ask what the plans are for supporting missing values, inside or > >> outside numpy? 
Is there a successor to MaskedArray - and is this
> >> part of the succession plan?
> >
> >
> > I am not aware of any concrete plans, maybe others can chime in?
> >
> > It's a bit strange, the words that are used in this thread: "succession",
> > "purification", "elimination", and "purge". I don't have my knife out for
> > MaskedArrays; I merged a lot of Pierre's work myself. I simply suspect
> there
> > may be a better and more supporting home/project configuration for it,
> > perhaps still under the NumPy umbrella.
>
> The NEP notes that MaskedArray imposes a significant maintenance
> burden, as a motivation for removing it. I'm sure you'd predict that
> the Numpy developers are likely to spend less time on it, if it moves
> to its own package. I guess the hope would be that others would take
> over, but is that likely? What if they don't?
>
> Would it be reasonable to develop an alternative plan for missing
> arrays in concert with this NEP, maybe along the lines that Allan
> mentioned, above?
>
> Cheers,
>
> Matthew
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From samuel.lotz at salotz.info  Wed May 23 19:11:33 2018
From: samuel.lotz at salotz.info (Samuel Lotz)
Date: Wed, 23 May 2018 19:11:33 -0400
Subject: [Numpy-discussion] NumPy-Discussion Digest, Vol 140, Issue 25
In-Reply-To:
References:
Message-ID: <592cfb63-d0aa-5a4e-255c-7e112fd3d6d5@salotz.info>

If someone implements a separate library for masked arrays without changing anything in numpy, and it's better and people use it, then maybe deprecating it in numpy would be wise. But for me it seems like a large disruption to force such a transition. Much in the way that the numeric standard library is de facto not the numerical library for python.

~Sam

From stefanv at berkeley.edu  Wed May 23 19:38:55 2018
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Wed, 23 May 2018 16:38:55 -0700
Subject: [Numpy-discussion] Splitting MaskedArray into a separate package
In-Reply-To:
References: <45fa0f5f-26c6-ee9d-7239-8a77d95a6ba4@gmail.com>
Message-ID: <20180523233855.c623fjmzzvx3rfio@carbo>

Hi Eric,

On Wed, 23 May 2018 10:02:22 -1000, Eric Firing wrote:
> Masked arrays are critical to my numpy usage, and I suspect they are
> critical for many other use cases as well.

That's good to know; and the goal of this NEP should be to improve your situation, not make it worse.

> In fact, I would prefer that a high priority for major numpy
> development be the more complete integration of masked array capabilities
> into numpy, not their removal to a separate package.
>
> I was unhappy to see
> the effort in that direction a few years ago being killed. I didn't agree
> with every design decision, but overall I thought it was going in the right
> direction.

I see this and the NEP as orthogonal issues. MaskedArrays, one particular version of the masked value solution, have never truly been a first class citizen.

If we could instead implement masked arrays such that it simply sits on top of existing NumPy functionality (using, e.g., special dtypes or bitmasks), re-using all the standard machinery, that would be a natural fit in the core of NumPy, and would negate the need for MaskedArrays. But we haven't reached that point yet, and I am not aware of any current proposal to do so.
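To sketch what I mean (only an illustration under assumed semantics; masked_mean is a hypothetical helper, not an existing NumPy API): a plain boolean array carried alongside the data can already express masked operations using the ordinary ufunc machinery, without subclassing ndarray at all.

import numpy as np

def masked_mean(data, mask):
    # mask=True marks invalid entries; everything here is plain-ndarray code
    valid = ~mask
    n = np.count_nonzero(valid)
    if n == 0:
        return np.nan
    # zero out the masked entries so an ordinary sum ignores them
    return np.where(valid, data, 0).sum() / n

data = np.array([1.0, 2.0, 3.0, 4.0])
mask = np.array([False, True, False, True])
print(masked_mean(data, mask))  # -> 2.0

The interesting design question is where such masks should live (a separate array, a bit pattern, or a special dtype), not whether the arithmetic itself is hard.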
> Bad or missing values (and situations where one wants to use a mask to > operate on a subset of an array) are found in many domains of real life; do > you really want python users in those domains to have to fall back on > Matlab-style reliance on nans and/or manual mask manipulations, as the new > maskedarray package is sidelined? This is not too far from the current status quo, I would argue. The functionality exists, but it is "bolted on" rather than "built in". And my guess is that the component will benefit from some extra attention that it is not getting as part of the current package. > Or is there any realistic prospect for maintenance and improvement of the > package after it is separated out? In order to prevent the package from being "sidelined", we would have to strengthen this part of the story. > Side question: does your proposed purification of numpy include elimination > of linalg and random? Based on the criteria in the NEP, I would expect it > does; so maybe you should have a more ambitious NEP, and do the purification > all in one step as a numpy version 2.0. (Surely if masked arrays are > purged, the matrix class should be booted out at the same time.) That's an interesting question, and one I have wondered about. Would it make sense to ship just the core ndarray object? I don't know. It probably depends a lot on whether we can define clear API boundaries, whether this kind of split is desired from the average user's perspective, and whether it could benefit the development of the subcomponents. W.r.t. matrices, I think you're setting a trap for me here, but I'm going to step into it anyway ;) https://mail.python.org/pipermail/numpy-discussion/2013-July/067254.html It is, then, not the first time I argued in favor of moving certain components out of NumPy onto their own packages. I would probably have written that NEP this time around, had it not been for the many strings attached via SciPy sparse (and therefore sklearn etc.). Before matrix deprecation can be discussed further, therefore, we need to implement sparse *arrays* for SciPy (and some efforts are slowly underway). See also: https://mail.python.org/pipermail/numpy-discussion/2017-January/076290.html http://numpy-discussion.10968.n7.nabble.com/Deprecate-matrices-in-1-15-and-remove-in-1-17-tp44968.html Best regards, St?fan From ben.v.root at gmail.com Wed May 23 22:52:53 2018 From: ben.v.root at gmail.com (Benjamin Root) Date: Wed, 23 May 2018 22:52:53 -0400 Subject: [Numpy-discussion] Splitting MaskedArray into a separate package In-Reply-To: <20180523233855.c623fjmzzvx3rfio@carbo> References: <45fa0f5f-26c6-ee9d-7239-8a77d95a6ba4@gmail.com> <20180523233855.c623fjmzzvx3rfio@carbo> Message-ID: users of a package does not equate to maintainers of a package. Scikits are successful because scientists that have specialty in a field can contribute code and support the packages using their domain knowledge. How many people here are specialists in masked/missing value computation? Would I like to see better missing value support in numpy? Sure, but until then, MaskedArrays are what we have and it is still better than just using NaNs all over the place. Cheers! Ben Root On Wed, May 23, 2018 at 7:38 PM, Stefan van der Walt wrote: > Hi Eric, > > On Wed, 23 May 2018 10:02:22 -1000, Eric Firing wrote: > > Masked arrays are critical to my numpy usage, and I suspect they are > > critical for many other use cases as well. 
> > That's good to know; and the goal of this NEP should be to improve your > siatuion, not make it worse. > > > In fact, I would prefer that a high priority for major numpy > > development be the more complete integration of masked array capabilities > > into numpy, not their removal to a separate package. > > > > I was unhappy to see > > the effort in that direction a few years ago being killed. I didn't > agree > > with every design decision, but overall I thought it was going in the > right > > direction. > > I see this and the NEP as orthogonal issues. MaskedArrays, one > particular version of the masked value solution, has never truly been a > first class citizen. > > If we could instead implement masked arrays such that it simply sits on > top of existing NumPy functionality (using, e.g., special dtypes or > bitmasks), re-using all the standard machinery, that would be a natural > fit in the core of NumPy, and would negate the need for MaskedArrays. > But we haven't reached that point yet, and I am not aware of any current > proposal to do so. > > > Bad or missing values (and situations where one wants to use a mask to > > operate on a subset of an array) are found in many domains of real life; > do > > you really want python users in those domains to have to fall back on > > Matlab-style reliance on nans and/or manual mask manipulations, as the > new > > maskedarray package is sidelined? > > This is not too far from the current status quo, I would argue. The > functionality exists, but it is "bolted on" rather than "built in". And > my guess is that the component will benefit from some extra attention > that it is not getting as part of the current package. > > > Or is there any realistic prospect for maintenance and improvement of the > > package after it is separated out? > > In order to prevent the package from being "sidelined", we would have to > strengthen this part of the story. > > > Side question: does your proposed purification of numpy include > elimination > > of linalg and random? Based on the criteria in the NEP, I would expect > it > > does; so maybe you should have a more ambitious NEP, and do the > purification > > all in one step as a numpy version 2.0. (Surely if masked arrays are > > purged, the matrix class should be booted out at the same time.) > > That's an interesting question, and one I have wondered about. Would it > make sense to ship just the core ndarray object? I don't know. It > probably depends a lot on whether we can define clear API boundaries, > whether this kind of split is desired from the average user's > perspective, and whether it could benefit the development of the > subcomponents. > > W.r.t. matrices, I think you're setting a trap for me here, but I'm > going to step into it anyway ;) > > https://mail.python.org/pipermail/numpy-discussion/2013-July/067254.html > > It is, then, not the first time I argued in favor of moving certain > components out of NumPy onto their own packages. I would probably have > written that NEP this time around, had it not been for the many strings > attached via SciPy sparse (and therefore sklearn etc.). Before matrix > deprecation can be discussed further, therefore, we need to implement > sparse *arrays* for SciPy (and some efforts are slowly underway). 
> > See also: > > https://mail.python.org/pipermail/numpy-discussion/ > 2017-January/076290.html > http://numpy-discussion.10968.n7.nabble.com/Deprecate- > matrices-in-1-15-and-remove-in-1-17-tp44968.html > > Best regards, > St?fan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.v.root at gmail.com Wed May 23 22:56:51 2018 From: ben.v.root at gmail.com (Benjamin Root) Date: Wed, 23 May 2018 22:56:51 -0400 Subject: [Numpy-discussion] Splitting MaskedArray into a separate package In-Reply-To: References: <45fa0f5f-26c6-ee9d-7239-8a77d95a6ba4@gmail.com> <20180523233855.c623fjmzzvx3rfio@carbo> Message-ID: As further evidence of a widely used package that is often considered "critical" to an ecosystem that gets negligible support, look no further than Basemap. It went almost two years without any commits before I took it up (and then only because my employer needed a couple of fixes). I worry that a masked array package would turn into Basemap. Ben Root On Wed, May 23, 2018 at 10:52 PM, Benjamin Root wrote: > users of a package does not equate to maintainers of a package. Scikits > are successful because scientists that have specialty in a field can > contribute code and support the packages using their domain knowledge. How > many people here are specialists in masked/missing value computation? > > Would I like to see better missing value support in numpy? Sure, but until > then, MaskedArrays are what we have and it is still better than just using > NaNs all over the place. > > Cheers! > Ben Root > > On Wed, May 23, 2018 at 7:38 PM, Stefan van der Walt > wrote: > >> Hi Eric, >> >> On Wed, 23 May 2018 10:02:22 -1000, Eric Firing wrote: >> > Masked arrays are critical to my numpy usage, and I suspect they are >> > critical for many other use cases as well. >> >> That's good to know; and the goal of this NEP should be to improve your >> siatuion, not make it worse. >> >> > In fact, I would prefer that a high priority for major numpy >> > development be the more complete integration of masked array >> capabilities >> > into numpy, not their removal to a separate package. >> > >> > I was unhappy to see >> > the effort in that direction a few years ago being killed. I didn't >> agree >> > with every design decision, but overall I thought it was going in the >> right >> > direction. >> >> I see this and the NEP as orthogonal issues. MaskedArrays, one >> particular version of the masked value solution, has never truly been a >> first class citizen. >> >> If we could instead implement masked arrays such that it simply sits on >> top of existing NumPy functionality (using, e.g., special dtypes or >> bitmasks), re-using all the standard machinery, that would be a natural >> fit in the core of NumPy, and would negate the need for MaskedArrays. >> But we haven't reached that point yet, and I am not aware of any current >> proposal to do so. >> >> > Bad or missing values (and situations where one wants to use a mask to >> > operate on a subset of an array) are found in many domains of real >> life; do >> > you really want python users in those domains to have to fall back on >> > Matlab-style reliance on nans and/or manual mask manipulations, as the >> new >> > maskedarray package is sidelined? >> >> This is not too far from the current status quo, I would argue. 
The >> functionality exists, but it is "bolted on" rather than "built in". And >> my guess is that the component will benefit from some extra attention >> that it is not getting as part of the current package. >> >> > Or is there any realistic prospect for maintenance and improvement of >> the >> > package after it is separated out? >> >> In order to prevent the package from being "sidelined", we would have to >> strengthen this part of the story. >> >> > Side question: does your proposed purification of numpy include >> elimination >> > of linalg and random? Based on the criteria in the NEP, I would expect >> it >> > does; so maybe you should have a more ambitious NEP, and do the >> purification >> > all in one step as a numpy version 2.0. (Surely if masked arrays are >> > purged, the matrix class should be booted out at the same time.) >> >> That's an interesting question, and one I have wondered about. Would it >> make sense to ship just the core ndarray object? I don't know. It >> probably depends a lot on whether we can define clear API boundaries, >> whether this kind of split is desired from the average user's >> perspective, and whether it could benefit the development of the >> subcomponents. >> >> W.r.t. matrices, I think you're setting a trap for me here, but I'm >> going to step into it anyway ;) >> >> https://mail.python.org/pipermail/numpy-discussion/2013-July/067254.html >> >> It is, then, not the first time I argued in favor of moving certain >> components out of NumPy onto their own packages. I would probably have >> written that NEP this time around, had it not been for the many strings >> attached via SciPy sparse (and therefore sklearn etc.). Before matrix >> deprecation can be discussed further, therefore, we need to implement >> sparse *arrays* for SciPy (and some efforts are slowly underway). >> >> See also: >> >> https://mail.python.org/pipermail/numpy-discussion/2017- >> January/076290.html >> http://numpy-discussion.10968.n7.nabble.com/Deprecate-matric >> es-in-1-15-and-remove-in-1-17-tp44968.html >> >> Best regards, >> St?fan >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Thu May 24 11:31:10 2018 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 24 May 2018 17:31:10 +0200 Subject: [Numpy-discussion] Splitting MaskedArray into a separate package In-Reply-To: <7bbac23bcedd1171ca8c0b15547dde2a550d0102.camel@sipsolutions.net> References: <45fa0f5f-26c6-ee9d-7239-8a77d95a6ba4@gmail.com> <6395ad57-345c-0b82-7fe3-6c90240fc7ef@gmail.com> <7bbac23bcedd1171ca8c0b15547dde2a550d0102.camel@sipsolutions.net> Message-ID: <9db3d59f9af8769e38512aac4a9dd6766c97ded2.camel@sipsolutions.net> On Wed, 2018-05-23 at 23:48 +0200, Sebastian Berg wrote: > On Wed, 2018-05-23 at 17:33 -0400, Allan Haldane wrote: > > If we do not plan to replace it within numpy, we need to discuss a > bit > how it might affect infrastructure (multiple implementations....). > > There is the other discussion about how to replace it. By opening > up/creating new masked dtypes or similar (cool but unclear how > complex/long term) or `__array_ufunc__` based (relatively simple, > will > get rid of the nastier hacks that are currently needed). > > Or even both, just on different time scales? 
> I also somewhat like the idea of taking it out (once we have a first replacement) in the case that we have a plan to do a better/lower level replacement at a later point within numpy. Removal generally has its merits, but if a (mid term) replacement will come in any case, it would be nice to get those started first if possible. Otherwise downstream might end up having to fix up things twice. - Sebastian > My first gut feeling about the proposal is: I love the idea to get > rid > of it... but lets not do it, it does feel like it makes too much > infrastructure unclear. > > - Sebastian > > > > > > Allan > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From einstein.edison at gmail.com Thu May 24 11:40:00 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Thu, 24 May 2018 11:40:00 -0400 Subject: [Numpy-discussion] Splitting MaskedArray into a separate package In-Reply-To: <9db3d59f9af8769e38512aac4a9dd6766c97ded2.camel@sipsolutions.net> References: <45fa0f5f-26c6-ee9d-7239-8a77d95a6ba4@gmail.com> <6395ad57-345c-0b82-7fe3-6c90240fc7ef@gmail.com> <7bbac23bcedd1171ca8c0b15547dde2a550d0102.camel@sipsolutions.net> <9db3d59f9af8769e38512aac4a9dd6766c97ded2.camel@sipsolutions.net> Message-ID: I also somewhat like the idea of taking it out (once we have a first replacement) in the case that we have a plan to do a better/lower level replacement at a later point within numpy. Removal generally has its merits, but if a (mid term) replacement will come in any case, it would be nice to get those started first if possible. Otherwise downstream might end up having to fix up things twice. - Sebastian I also like the idea of designing a replacement first (using modern array protocols, perhaps in a separate repository) and then deprecating MaskedArray second. Deprecating an entire class in NumPy seems counterproductive, although I will admit I?ve never found use from it. From this thread, it?s clear that others have, though. Sent from Astro for Mac -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Thu May 24 13:01:40 2018 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Thu, 24 May 2018 10:01:40 -0700 Subject: [Numpy-discussion] NumPy sprint May 24-25 at BIDS In-Reply-To: References: <35623d67-7255-3cf3-31db-91839c63a7b8@gmail.com> <42ca0e26-6f6a-cb4f-d471-4bdec92ef834@gmail.com> Message-ID: I have had a last minute meeting scheduled for this afternoon, so I'm afraid I won't be able to make it today... I will be there tomorrow for sure, though. What time do you plan on starting? Jaime On Fri, May 18, 2018 at 12:31 AM Stephan Hoyer wrote: > I will also be attending, on at least Thursday (and hopefully Friday, too). > > Best, > Stephan > > On Thu, May 17, 2018 at 1:40 PM Jaime Fern?ndez del R?o < > jaime.frio at gmail.com> wrote: > >> $#!#, was looking at the wrong calendar month: Thursday half day, Friday >> all day. 
>> >> Jaime >> >> On Thu, May 17, 2018 at 4:37 PM Jaime Fern?ndez del R?o < >> jaime.frio at gmail.com> wrote: >> >>> OK, make that all day Friday only, if it's Friday and Saturday. >>> >>> Jaime >>> >>> On Thu, May 17, 2018 at 4:36 PM Jaime Fern?ndez del R?o < >>> jaime.frio at gmail.com> wrote: >>> >>>> Hi Matti, >>>> >>>> I will be joining you on Thursday, sometime around noon, and all day >>>> Friday. >>>> >>>> Jaime >>>> >>>> On Thu, May 17, 2018 at 4:11 PM Matti Picus >>>> wrote: >>>> >>>>> On 09/05/18 13:33, Matti Picus wrote: >>>>> > A reminder - we will take advantage of a few NumPy developers being >>>>> at >>>>> > Berkeley to hold a two day sprint May 24-25 >>>>> > https://scisprints.github.io/#may-numpy-developer-sprint >>>>> > . >>>>> > We invite any core contributors who would like to attend and can >>>>> help >>>>> > if needed with travel and accomodations. >>>>> > >>>>> > Stefan and Matti >>>>> So far I know about Stefan, Nathaniel, Chuck and me. Things will work >>>>> better if we can get organized ahead of time. Anyone else planning on >>>>> attending for both days or part of the sprint, please drop me a line. >>>>> If >>>>> there are any issues, pull requests, NEPs, or ideas you would like us >>>>> to >>>>> work on please let me know, or add it to the Trello card >>>>> https://trello.com/c/fvSYkm2w >>>>> >>>>> Matti >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at python.org >>>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>>> >>>> >>>> >>>> -- >>>> (\__/) >>>> ( O.o) >>>> ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus >>>> planes de dominaci?n mundial. >>>> >>> >>> >>> -- >>> (\__/) >>> ( O.o) >>> ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus >>> planes de dominaci?n mundial. >>> >> >> >> -- >> (\__/) >> ( O.o) >> ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes >> de dominaci?n mundial. >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From nathan12343 at gmail.com Thu May 24 14:13:16 2018 From: nathan12343 at gmail.com (Nathan Goldbaum) Date: Thu, 24 May 2018 13:13:16 -0500 Subject: [Numpy-discussion] Citation for ndarray Message-ID: Hi all, I see listed on the scipy.org site that the preferred citation for NumPy is the "Guide to NumPy": https://www.scipy.org/citing.html This could work for what I'm writing, but I'd prefer to find a citation specifically for NumPy's ndarray data structure. Does such a citation exist? Thanks! -Nathan -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From stefanv at berkeley.edu  Thu May 24 14:31:16 2018
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Thu, 24 May 2018 11:31:16 -0700
Subject: [Numpy-discussion] Citation for ndarray
In-Reply-To:
References:
Message-ID: <20180524183116.yqpzdf32rrqcqtdm@carbo>

Hi Nathan,

On Thu, 24 May 2018 13:13:16 -0500, Nathan Goldbaum wrote:
> I see listed on the scipy.org site that the preferred citation for NumPy is
> the "Guide to NumPy":
>
> https://www.scipy.org/citing.html
>
> This could work for what I'm writing, but I'd prefer to find a citation
> specifically for NumPy's ndarray data structure. Does such a citation
> exist?

The citation used to point to "The NumPy array: a structure for efficient numerical computation" (https://arxiv.org/abs/1102.1523), but I asked that it be changed to give credit to Travis. I am not aware of publications for the original Numeric and NumArray; the first reference for NumPy itself is Travis's book.

Best regards,
Stéfan

From stefanv at berkeley.edu  Thu May 24 14:23:05 2018
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Thu, 24 May 2018 11:23:05 -0700
Subject: [Numpy-discussion] NumPy sprint May 24-25 at BIDS
In-Reply-To:
References: <35623d67-7255-3cf3-31db-91839c63a7b8@gmail.com>
 <42ca0e26-6f6a-cb4f-d471-4bdec92ef834@gmail.com>
Message-ID: <20180524182305.ppoqewqhmswnrgyh@carbo>

Hi Jaime,

On Thu, 24 May 2018 10:01:40 -0700, Jaime Fernández del Río wrote:
> I will be there tomorrow for sure, though. What time do you plan on
> starting?

Thanks for the heads-up. We'll probably start around 9:30am. We're at 190 Doe Library on Berkeley campus.

We started today by discussing how a draft roadmap could look; I'll post a more detailed update after the sprint.

Best regards,
Stéfan

From allanhaldane at gmail.com  Thu May 24 16:23:07 2018
From: allanhaldane at gmail.com (Allan Haldane)
Date: Thu, 24 May 2018 16:23:07 -0400
Subject: [Numpy-discussion] Splitting MaskedArray into a separate package
In-Reply-To: <9db3d59f9af8769e38512aac4a9dd6766c97ded2.camel@sipsolutions.net>
References: <45fa0f5f-26c6-ee9d-7239-8a77d95a6ba4@gmail.com>
 <6395ad57-345c-0b82-7fe3-6c90240fc7ef@gmail.com>
 <7bbac23bcedd1171ca8c0b15547dde2a550d0102.camel@sipsolutions.net>
 <9db3d59f9af8769e38512aac4a9dd6766c97ded2.camel@sipsolutions.net>
Message-ID: <3bee4df3-a314-7b6e-06dc-7f8f09077ede@gmail.com>

On 05/24/2018 11:31 AM, Sebastian Berg wrote:
> I also somewhat like the idea of taking it out (once we have a first
> replacement) in the case that we have a plan to do a better/lower level
> replacement at a later point within numpy.
> Removal generally has its merits, but if a (mid term) replacement will
> come in any case, it would be nice to get those started first if
> possible.
> Otherwise downstream might end up having to fix up things twice.
>
> - Sebastian

Yes, I think the way forward is to start working on a new masked array while keeping the old one in place. Once it has progressed a little and we can step back and look at it, we can consider how to switch over. I imagine we would have both present in numpy under different names for a while.

Also, I think it would be nice to work on it soon because it is a chance for us to eat our own dogfood in the __array_ufunc__ interface, which is not yet set in stone so we can fix any problems we discover with it.
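To make the "dogfooding" idea concrete, here is a deliberately minimal sketch of what an __array_ufunc__-based masked type could look like (the name MaskedArray2 and all of its semantics are hypothetical illustrations, not an agreed design):

import numpy as np

class MaskedArray2:
    def __init__(self, data, mask=None):
        self.data = np.asarray(data)
        self.mask = (np.zeros(self.data.shape, dtype=bool)
                     if mask is None else np.asarray(mask, dtype=bool))

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # only plain element-wise calls in this sketch; a real version
        # would also handle method='reduce', out=, fill values, etc.
        if method != '__call__' or kwargs.get('out') is not None:
            return NotImplemented
        datas = [x.data if isinstance(x, MaskedArray2) else x
                 for x in inputs]
        masks = [x.mask for x in inputs if isinstance(x, MaskedArray2)]
        # the result is masked wherever any masked input was masked
        return MaskedArray2(ufunc(*datas, **kwargs),
                            np.logical_or.reduce(masks))

a = MaskedArray2([1.0, 2.0, 3.0], mask=[False, True, False])
b = np.add(a, 1)       # numpy defers to a.__array_ufunc__
print(b.data, b.mask)  # [2. 3. 4.] [False  True False]

Most of the current module's complexity lives exactly in the cases this sketch punts on, which is why exercising __array_ufunc__ on a rewrite should be so informative.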
Allan

From m.h.vankerkwijk at gmail.com  Fri May 25 10:27:35 2018
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Fri, 25 May 2018 10:27:35 -0400
Subject: [Numpy-discussion] Splitting MaskedArray into a separate package
In-Reply-To: <3bee4df3-a314-7b6e-06dc-7f8f09077ede@gmail.com>
References: <45fa0f5f-26c6-ee9d-7239-8a77d95a6ba4@gmail.com>
 <6395ad57-345c-0b82-7fe3-6c90240fc7ef@gmail.com>
 <7bbac23bcedd1171ca8c0b15547dde2a550d0102.camel@sipsolutions.net>
 <9db3d59f9af8769e38512aac4a9dd6766c97ded2.camel@sipsolutions.net>
 <3bee4df3-a314-7b6e-06dc-7f8f09077ede@gmail.com>
Message-ID:

Hi All,

I agree with comments above that deprecating/removing MaskedArray is premature; we certainly depend on it in astropy (which is indeed what got me started to contribute to numpy -- it was quite buggy!). I also think that, unlike Matrix, it is far from a neglected part of numpy. Eric Wieser in particular has been cleaning it up quite a bit. Also like Allan, I'm excited about making a new version based on `__array_ufunc__`.

Beyond this, I think it is actually very useful for numpy to contain at least one ndarray subclass, so that we have an internal check that any changes we make to the base ndarray actually work.

So, my own sense would be that we should instead write a NEP with a roadmap of what we want MaskedArray 2.0 to be like (e.g., no more `nomask`...)

All the best,

Marten

p.s. And, of course, deprecation of Matrix is actually starting to happen: with my PRs, it is now in a state where one could remove `matrixlib` and all tests would still pass, and there is a pending PR to start giving out PendingDeprecationWarning.

From bruno.piguet at gmail.com  Sat May 26 03:54:15 2018
From: bruno.piguet at gmail.com (bruno Piguet)
Date: Sat, 26 May 2018 09:54:15 +0200
Subject: [Numpy-discussion] Splitting MaskedArray into a separate package
In-Reply-To:
References: <45fa0f5f-26c6-ee9d-7239-8a77d95a6ba4@gmail.com>
 <6395ad57-345c-0b82-7fe3-6c90240fc7ef@gmail.com>
 <7bbac23bcedd1171ca8c0b15547dde2a550d0102.camel@sipsolutions.net>
 <9db3d59f9af8769e38512aac4a9dd6766c97ded2.camel@sipsolutions.net>
Message-ID:

Disclaimer: this is a user's point of view. I never committed a line in numpy.

In my usage, missing values happen, or the need arises for some kind of mask, such as sea/land. I've been told, here, that using MA is superior to using NaNs, and indeed, I found a couple of cases where other libraries (matplotlib, ...) behaved better with MA than with NaNs in simple ndarrays.

Thus, I fear moving masked arrays to a separate package would give them a second-class status, make them look optional, and lower their support by third-party libraries. And I view as a bad idea any suggestion to deprecate MaskedArray before any replacement is designed and implemented.

Bruno.

2018-05-24 17:40 GMT+02:00 Hameer Abbasi :

> I also somewhat like the idea of taking it out (once we have a first
> replacement) in the case that we have a plan to do a better/lower level
> replacement at a later point within numpy.
> Removal generally has its merits, but if a (mid term) replacement will
> come in any case, it would be nice to get those started first if
> possible.
> Otherwise downstream might end up having to fix up things twice.
>
> - Sebastian
>
>
> I also like the idea of designing a replacement first (using modern array
> protocols, perhaps in a separate repository) and then deprecating
> MaskedArray second. Deprecating an entire class in NumPy seems
> counterproductive, although I will admit I've never found use from it. From
From > this thread, it's clear that others have, though. > > Sent from Astro for Mac > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre.augier at univ-grenoble-alpes.fr Sun May 27 11:27:12 2018 From: pierre.augier at univ-grenoble-alpes.fr (PIERRE AUGIER) Date: Sun, 27 May 2018 17:27:12 +0200 (CEST) Subject: [Numpy-discussion] Efficiency of Numpy wheels and simple way to benchmark Numpy installation? Message-ID: <366729222.532889.1527434832558.JavaMail.zimbra@univ-grenoble-alpes.fr> Hello, I don't know if it is a good place to ask such questions. As advised here https://www.scipy.org/scipylib/mailing-lists.html#stackoverflow, I first posted a question on stackoverflow: https://stackoverflow.com/questions/50475989/efficiency-of-numpy-wheels-and-simple-benchmark-for-numpy-installations Since I got no feedback, I am trying here. My questions are: - When we care about performance, is it a good practice to rely on wheels (especially for Numpy)? Will it be slower than using (for example) a conda-built Numpy? - Are there simple commands to benchmark Numpy installations and get a good idea of their overall performance? I explain a little bit more in the stackoverflow question... Pierre Augier From njs at pobox.com Sun May 27 16:12:00 2018 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 27 May 2018 13:12:00 -0700 Subject: [Numpy-discussion] Efficiency of Numpy wheels and simple way to benchmark Numpy installation? In-Reply-To: <366729222.532889.1527434832558.JavaMail.zimbra@univ-grenoble-alpes.fr> References: <366729222.532889.1527434832558.JavaMail.zimbra@univ-grenoble-alpes.fr> Message-ID: Performance is an incredibly multi-dimensional thing. Modern computers are incredibly complex, with layers of interacting caches, different microarchitectural features (do you have AVX2? does your cpu's branch predictor interact in a funny way with your workload?), compiler optimizations that vary from version to version, ... and different parts of numpy are affected differently by all these things. So, the only really reliable answer to a question like this is, always, that you need to benchmark the application you actually care about in the contexts where it will actually run (or as close as you can get to that). That said, as a general rule of thumb, the main difference between different numpy builds is which BLAS library they use, which primarily affects the speed of numpy's linear algebra routines. The wheels on pypi use either OpenBLAS (on Windows and Linux), or Accelerate (on MacOS). The conda packages provided as part of the Anaconda distribution normally use Intel's MKL. All three of these libraries are generally pretty good. They're all serious attempts to make a blazing fast linear algebra library, and much much faster than naive implementations. Generally MKL has a reputation for being somewhat faster than the others, when there's a difference. But again, whether this happens, or is significant, for *your* app is impossible to say without trying it. -n
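For a quick-and-dirty look at the linear algebra side specifically, something along these lines at least shows which BLAS a given numpy is linked against and how fast a large matmul runs (a rough sketch, not a rigorous benchmark; timings vary between runs and machines):

import time
import numpy as np

np.__config__.show()              # which BLAS/LAPACK this build links to

rng = np.random.RandomState(0)
a = rng.rand(2000, 2000)
b = rng.rand(2000, 2000)

start = time.time()
for _ in range(3):
    a @ b                         # dominated by the BLAS *gemm call
print("matmul: %.3f s per call" % ((time.time() - start) / 3))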
On Sun, May 27, 2018, 08:32 PIERRE AUGIER < pierre.augier at univ-grenoble-alpes.fr> wrote: > Hello, > > I don't know if it is a good place to ask such questions. As advised here > https://www.scipy.org/scipylib/mailing-lists.html#stackoverflow, I first > posted a question on stackoverflow: > > > https://stackoverflow.com/questions/50475989/efficiency-of-numpy-wheels-and-simple-benchmark-for-numpy-installations > > Since I got no feedback, I am trying here. My questions are: > > - When we care about performance, is it a good practice to rely on wheels > (especially for Numpy)? Will it be slower than using (for example) a conda-built > Numpy? > > - Are there simple commands to benchmark Numpy installations and get a > good idea of their overall performance? > > I explain a little bit more in the stackoverflow question... > > Pierre Augier > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Sun May 27 17:20:57 2018 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 27 May 2018 22:20:57 +0100 Subject: [Numpy-discussion] Efficiency of Numpy wheels and simple way to benchmark Numpy installation? In-Reply-To: References: <366729222.532889.1527434832558.JavaMail.zimbra@univ-grenoble-alpes.fr> Message-ID: Hi, On Sun, May 27, 2018 at 9:12 PM, Nathaniel Smith wrote: > Performance is an incredibly multi-dimensional thing. Modern computers are > incredibly complex, with layers of interacting caches, different > microarchitectural features (do you have AVX2? does your cpu's branch > predictor interact in a funny way with your workload?), compiler > optimizations that vary from version to version, ... and different parts of > numpy are affected differently by all these things. > > So, the only really reliable answer to a question like this is, always, that > you need to benchmark the application you actually care about in the > contexts where it will actually run (or as close as you can get to that). > > That said, as a general rule of thumb, the main difference between different > numpy builds is which BLAS library they use, which primarily affects the > speed of numpy's linear algebra routines. The wheels on pypi use either > OpenBLAS (on Windows and Linux), or Accelerate (on MacOS). The conda packages > provided as part of the Anaconda distribution normally use Intel's MKL. > > All three of these libraries are generally pretty good. They're all serious > attempts to make a blazing fast linear algebra library, and much much faster > than naive implementations. Generally MKL has a reputation for being > somewhat faster than the others, when there's a difference. But again, > whether this happens, or is significant, for *your* app is impossible to say > without trying it. Yes - I'd be surprised if you find a significant difference in performance for real usage between pip / OpenBLAS and conda / MKL - but if you do, please let us know, and we'll investigate. Cheers, Matthew From er.gauravsinha at gmail.com Mon May 28 00:21:36 2018 From: er.gauravsinha at gmail.com (gaurav sinha) Date: Mon, 28 May 2018 09:51:36 +0530 Subject: [Numpy-discussion] Guidance required for automation of Excel work and Data Sciences Message-ID: Dear Experts Greetings!! *About me- I am a Telecom professional with ~10 years of experience in 2G/3G/4G mobile technologies. * *I have selected Python as my Programming Language. It's been some time since I started to learn and work in Python. 
* *Subject*- Guidance required for automation of Excel work and suggestions for Data Science. *Goal*- We have lots of parameters, spread across many Excel files (raw Excel files holding different parameters on a date/time basis). I have to make a customized Excel sheet from these raw Excel sheets, holding data (parameters vs. date), which can be used as a one-button solution. I mean to say that we just need to fetch the raw Excel reports, put them in a folder, and press one button/script file to generate our final report. Please suggest how to proceed. Also please suggest how to master Python in the field of Data Science. Thanks a lot!! * BR//GAURAV SINHA* -------------- next part -------------- An HTML attachment was scrubbed... URL: From dipo.elegbede at gmail.com Mon May 28 07:20:05 2018 From: dipo.elegbede at gmail.com (DIPO ELEGBEDE) Date: Mon, 28 May 2018 06:20:05 -0500 Subject: [Numpy-discussion] Guidance required for automation of Excel work and Data Sciences In-Reply-To: References: Message-ID: Hi Gaurav, You may want to look at: www.python-excel.org This site lists out possible libraries you can use. Once you try out any of these and run into a snag, you can hit the group up again. HTH. Regards, Muhammed. On Sun, May 27, 2018, 23:22 gaurav sinha wrote: > Dear Experts > Greetings!! > > *About me- I am a Telecom professional with ~10 years of experience in > 2G/3G/4G mobile technologies. * > > *I have selected Python as my Programming Language. It's been some time since I > started to learn and work in Python. * > > *Subject*- Guidance required for automation of Excel work and suggestions > for Data Science. > > *Goal*- We have lots of parameters, spread across many Excel files (raw Excel > files holding different parameters on a date/time basis). > I have to make a customized Excel sheet from these raw Excel sheets, holding > data (parameters vs. date), which can be used as a one-button solution. I mean > to say that we just need to fetch the raw Excel reports, put them in a > folder, and press one button/script file to generate our final report. > > Please suggest how to proceed. > > Also please suggest how to master Python in the field of Data > Science. > > Thanks a lot!! > > > > > * BR//GAURAV SINHA* > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL:
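As a rough sketch of the kind of one-button merge script being asked for, using openpyxl (the folder name, file layout and column handling here are illustrative assumptions, not a recommended design):

import glob
from openpyxl import Workbook, load_workbook

merged = Workbook()
out = merged.active
out.append(["source file", "values"])                 # hypothetical header row

for path in sorted(glob.glob("raw_reports/*.xlsx")):  # folder of raw reports
    sheet = load_workbook(path, read_only=True).active
    for row in sheet.iter_rows(min_row=2):            # skip each file's header
        out.append([path] + [cell.value for cell in row])

merged.save("final_report.xlsx")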
From matthew.brett at gmail.com Mon May 28 07:40:28 2018 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 28 May 2018 12:40:28 +0100 Subject: [Numpy-discussion] [SciPy-User] Guidance required for automation of Excel work and Data Sciences In-Reply-To: References: Message-ID: Hi, On Mon, May 28, 2018 at 5:21 AM, gaurav sinha wrote: > Dear Experts > Greetings!! > > About me- I am a Telecom professional with ~10 years of experience in > 2G/3G/4G mobile technologies. > > I have selected Python as my Programming Language. It's been some time since I > started to learn and work in Python. > > Subject- Guidance required for automation of Excel work and suggestions for > Data Science. > > Goal- We have lots of parameters, spread across many Excel files (raw Excel > files holding different parameters on a date/time basis). > I have to make a customized Excel sheet from these raw Excel sheets, holding > data (parameters vs. date), which can be used as a one-button solution. I mean > to say that we just need to fetch the raw Excel reports, put them in a > folder, and press one button/script file to generate our final report. > > Please suggest how to proceed. > > Also please suggest how to master Python in the field of Data > Science. You're currently on the Scipy mailing list; Scipy is a library of numerical routines, often used by other libraries that have a stronger link to data science, particularly Pandas and scikit-learn. As a first pass, I suggest you try over at the Pandas mailing list, and you might want to start with this book, to get going: http://wesmckinney.com/pages/book.html Cheers, Matthew From ben.v.root at gmail.com Mon May 28 09:13:02 2018 From: ben.v.root at gmail.com (Benjamin Root) Date: Mon, 28 May 2018 09:13:02 -0400 Subject: [Numpy-discussion] [SciPy-User] Guidance required for automation of Excel work and Data Sciences In-Reply-To: References: Message-ID: the openpyxl package will be your friend. Here is a whole chapter on using it: https://automatetheboringstuff.com/chapter12/ Welcome to Python! Ben Root On Mon, May 28, 2018 at 12:21 AM, gaurav sinha wrote: > Dear Experts > Greetings!! > > *About me- I am a Telecom professional with ~10 years of experience in > 2G/3G/4G mobile technologies. * > > *I have selected Python as my Programming Language. It's been some time since I > started to learn and work in Python. * > > *Subject*- Guidance required for automation of Excel work and suggestions > for Data Science. > > *Goal*- We have lots of parameters, spread across many Excel files (raw Excel > files holding different parameters on a date/time basis). > I have to make a customized Excel sheet from these raw Excel sheets, holding > data (parameters vs. date), which can be used as a one-button solution. I mean > to say that we just need to fetch the raw Excel reports, put them in a > folder, and press one button/script file to generate our final report. > > Please suggest how to proceed. > > Also please suggest how to master Python in the field of Data > Science. > > Thanks a lot!! > > > > > * BR//GAURAV SINHA* > > _______________________________________________ > SciPy-User mailing list > SciPy-User at python.org > https://mail.python.org/mailman/listinfo/scipy-user > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Mon May 28 19:26:33 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Mon, 28 May 2018 16:26:33 -0700 Subject: [Numpy-discussion] matmul as a ufunc In-Reply-To: References: Message-ID: On Mon, May 21, 2018 at 5:42 PM Matti Picus wrote: > - create a wrapper that can convince the ufunc mechanism to call > __array_ufunc__ even on functions that are not true ufuncs > I am somewhat opposed to this approach, because __array_ufunc__ is about overloading ufuncs, and as soon as we relax this guarantee the set of invariants __array_ufunc__ implementors rely on becomes much more limited. We really should have another mechanism for arbitrary function overloading in NumPy (NEP to follow shortly!). > - expand the semantics of core signatures so that a single matmul ufunc > can implement matrix-matrix, vector-matrix, matrix-vector, and > vector-vector multiplication. I was initially concerned that adding optional dimensions for gufuncs would introduce additional complexity for only the benefit of a single function (matmul), but I'm now convinced that it makes sense: 1. All other arithmetic overloads use __array_ufunc__, and it would be nice to keep @/matmul in the same place. 2. 
There's a common family of gufuncs for which optional dimensions like np.matmul make sense: matrix functions where 1D arrays should be treated as 2D row- or column-vectors. One example of this class of behavior would be np.linalg.solve, which could support vectors like Ax=b and matrices like Ax=B with the signature (m,m),(m,n?)->(m,n?). We couldn't immediately make np.linalg.solve a gufunc since it uses a subtly different dispatching rule, but it's the same use-case. Another example would be the "matrix transpose" function that has been occasionally proposed, to swap the last two dimensions of an array. It could have the signature (m?,n)->(n,m?), which ensures that it is still well defined (as the identity) on 1d arrays. -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon May 28 20:11:25 2018 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 28 May 2018 17:11:25 -0700 Subject: [Numpy-discussion] matmul as a ufunc In-Reply-To: References: Message-ID: On Mon, May 28, 2018 at 4:26 PM, Stephan Hoyer wrote: > On Mon, May 21, 2018 at 5:42 PM Matti Picus wrote: >> >> - create a wrapper that can convince the ufunc mechanism to call >> __array_ufunc__ even on functions that are not true ufuncs > > > I am somewhat opposed to this approach, because __array_ufunc__ is about > overloading ufuncs, and as soon as we relax this guarantee the set of > invariants __array_ufunc__ implementors rely on becomes much more limited. > > We really should have another mechanism for arbitrary function overloading > in NumPy (NEP to follow shortly!). > >> >> - expand the semantics of core signatures so that a single matmul ufunc >> can implement matrix-matrix, vector-matrix, matrix-vector, and >> vector-vector multiplication. > > > I was initially concerned that adding optional dimensions for gufuncs would > introduce additional complexity for only the benefit of a single function > (matmul), but I'm now convinced that it makes sense: > 1. All other arithmetic overloads use __array_ufunc__, and it would be nice > to keep @/matmul in the same place. > 2. There's a common family of gufuncs for which optional dimensions like > np.matmul make sense: matrix functions where 1D arrays should be treated as > 2D row- or column-vectors. > > One example of this class of behavior would be np.linalg.solve, which could > support vectors like Ax=b and matrices like Ax=B with the signature > (m,m),(m,n?)->(m,n?). We couldn't immediately make np.linalg.solve a gufunc > since it uses a subtly different dispatching rule, but it's the same > use-case. Specifically, np.linalg.solve uses a unique rule where solve(a, b) assumes that b is a stack of vectors if (a.ndim - 1 == b.ndim), and otherwise assumes that it's a stack of matrices. This is pretty confusing. You'd think that solve(a, b) should be equivalent to (inv(a) @ b), but it isn't. Say a.shape == (10, 3, 3) and b.shape == (3,). Then inv(a) @ b works, and does what you'd expect: for each of the ten 3x3 matrices in a, it computes the inverse and multiplies it by the 1-d vector in b (treated as a column vector). But solve(a, b) is an error, because the dimensions aren't lined up to trigger the special handling for 1-d vectors. Or, say a.shape == (10, 3, 3) and b.shape == (3, 3). Then again inv(a) @ b works, and does what you'd expect: for each of the ten 3x3 matrices in a, it computes the inverse and multiplies it by the 3x3 matrix in b. But again solve(a, b) is an error -- this time because the special handling for 1-d vectors *does* kick in, even though it doesn't make sense: it tries to match up the ten 3x3 matrices in a against the three one-dimensional vectors in b, and 10 != 3 so the broadcasting fails.
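In code, with the shapes from those two examples (a quick illustration; the exact error class and message vary between numpy versions):

import numpy as np

a = np.random.rand(10, 3, 3)

b = np.random.rand(3)
print((np.linalg.inv(a) @ b).shape)   # (10, 3): broadcasts as you'd expect
try:
    np.linalg.solve(a, b)             # the 1-d special case does NOT trigger
except Exception as exc:
    print("solve failed:", exc)

b = np.random.rand(3, 3)
print((np.linalg.inv(a) @ b).shape)   # (10, 3, 3)
try:
    np.linalg.solve(a, b)             # vector rule DOES trigger: 10 vs 3 clash
except Exception as exc:
    print("solve failed:", exc)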
This also points to even more confusing possibilities: if a.shape == (3, 3) or (3, 3, 3, 3) and b.shape == (3, 3), then inv(a) @ b and solve(a, b) both work and do the same thing. But if a.shape == (3, 3, 3), then inv(a) @ b and solve(a, b) both work, and do totally *different* things. I wonder if we should deprecate these corner cases, and eventually migrate to making inv(a) @ b and solve(a, b) the same in all situations. If we did, then solve(a, b) would actually be a gufunc with signature (m,m),(m,n?)->(m,n?). I think the cases that would need to be changed are those where (a.ndim - 1 == b.ndim and b.ndim > 1). My guess is that this happens very rarely in existing code, especially since (IIRC) this behavior was only added a few years ago, when we gufunc-ified numpy.linalg. > Another example would be the "matrix transpose" function that has been > occasionally proposed, to swap the last two dimensions of an array. It could > have the signature (m?,n)->(n,m?), which ensures that it is still well > defined (as the identity) on 1d arrays. Unfortunately I don't think we could make "broadcasting matrix transpose" be literally a gufunc, since it should return a view. But I guess there'd still be some value in having the notation available just when talking about it, so we could say "this operation is *like* a gufunc with signature (m?,n)->(n,m?), except that it returns a view". -n -- Nathaniel J. Smith -- https://vorpus.org From wieser.eric+numpy at gmail.com Mon May 28 22:36:35 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Mon, 28 May 2018 19:36:35 -0700 Subject: [Numpy-discussion] matmul as a ufunc In-Reply-To: References: Message-ID: which ensures that it is still well defined (as the identity) on 1d arrays. This strikes me as a bad idea. There's already enough confusion from beginners that array_1d.T is a no-op. If we introduce a matrix-transpose, it should either error on <1d inputs with a useful message, or insert the extra dimension. I'd favor the former. Eric On Mon, 28 May 2018 at 16:27 Stephan Hoyer shoyer at gmail.com wrote: On Mon, May 21, 2018 at 5:42 PM Matti Picus wrote: > >> - create a wrapper that can convince the ufunc mechanism to call >> __array_ufunc__ even on functions that are not true ufuncs >> > > I am somewhat opposed to this approach, because __array_ufunc__ is about > overloading ufuncs, and as soon as we relax this guarantee the set of > invariants __array_ufunc__ implementors rely on becomes much more limited. > > We really should have another mechanism for arbitrary function overloading > in NumPy (NEP to follow shortly!). > > >> - expand the semantics of core signatures so that a single matmul ufunc >> can implement matrix-matrix, vector-matrix, matrix-vector, and >> vector-vector multiplication. > > > I was initially concerned that adding optional dimensions for gufuncs > would introduce additional complexity for only the benefit of a single > function (matmul), but I'm now convinced that it makes sense: > 1. All other arithmetic overloads use __array_ufunc__, and it would be > nice to keep @/matmul in the same place. > 2. 
There's a common family of gufuncs for which optional dimensions like > np.matmul make sense: matrix functions where 1D arrays should be treated as > 2D row- or column-vectors. > > One example of this class of behavior would be np.linalg.solve, which > could support vectors like Ax=b and matrices like Ax=B with the signature > (m,m),(m,n?)->(m,n?). We couldn't immediately make np.linalg.solve a gufunc > since it uses a subtly different dispatching rule, but it's the same > use-case. > > Another example would be the "matrix transpose" function that has been > occasionally proposed, to swap the last two dimensions of an array. It > could have the signature (m?,n)->(n,m?), which ensures that it is still well > defined (as the identity) on 1d arrays. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Mon May 28 23:40:54 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Mon, 28 May 2018 20:40:54 -0700 Subject: [Numpy-discussion] matmul as a ufunc In-Reply-To: References: Message-ID: On Mon, May 28, 2018 at 7:36 PM Eric Wieser wrote: > which ensures that it is still well defined (as the identity) on 1d arrays. > > This strikes me as a bad idea. There's already enough confusion from > beginners that array_1d.T is a no-op. If we introduce a matrix-transpose, > it should either error on <1d inputs with a useful message, or insert the > extra dimension. I'd favor the former. > To be clear: matrix transpose is an example use-case rather than a serious proposal in this discussion. But given that idiomatic NumPy code uses 1D arrays in favor of explicit row/column vectors with shapes (1,n) and (n,1), I do think it does make sense for matrix transpose on 1D arrays to be the identity, because matrix transpose should convert back and forth between row and column vector representations. Certainly, matrix transpose should error on 0d arrays, because it doesn't make sense to transpose a scalar. 
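For reference, this is today's behaviour of plain .T (nothing hypothetical here):

import numpy as np

v = np.array([1.0, 2.0, 3.0])
print(v.T.shape)                  # (3,): .T is a no-op on 1d arrays
print(v[np.newaxis, :].T.shape)   # (3, 1): explicit row vector -> column vector
print(np.array(2.0).T.shape)      # (): today's .T is also a no-op on 0d arrays

Note that the current .T silently accepts 0d input; the hypothetical matrix transpose discussed above would instead raise there.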
-------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Tue May 29 00:01:35 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Mon, 28 May 2018 21:01:35 -0700 Subject: [Numpy-discussion] Adding take_along_axis and put_along_axis functions In-Reply-To: References: Message-ID: As I'm sure I stated in the GitHub discussion, I strongly support adding these functions to NumPy. This logic is non-trivial to get right and is quite broadly useful. These names also seem natural to me. On Mon, May 28, 2018 at 8:07 PM Eric Wieser wrote: > These functions provide a vectorized way of using one array to look up > items in another. In particular, they extend the 1d case: > > a = np.array([4, 5, 6, 1, 2, 3]) > b = np.array(["four", "five", "six", "one", "two", "three"]) > i = a.argsort() > b_sorted = b[i] > > To work for higher dimensions: > > a = np.array([[4, 1], [5, 2], [6, 3]]) > b = np.array([["four", "one"], ["five", "two"], ["six", "three"]]) > i = a.argsort(axis=1) > b_sorted = np.take_along_axis(b, i, axis=1) > > put_along_axis is the obvious but less useful dual to this operation, > inserting elements rather than extracting them. (Unlike put and take, > which are not obvious duals). > > These have been merged in gh-11105, but as a new addition this > probably should have gone by the mailing list first. > > There was a lack of consensus in gh-8714 about how best to generalize > to differing dimensions, so only the non-controversial case where the > indices and array have the same dimensions was implemented. > > These names were chosen to mirror apply_along_axis, which behaves > similarly. Do they seem reasonable? > > Eric > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From deak.andris at gmail.com Tue May 29 05:22:09 2018 From: deak.andris at gmail.com (Andras Deak) Date: Tue, 29 May 2018 11:22:09 +0200 Subject: [Numpy-discussion] matmul as a ufunc In-Reply-To: References: Message-ID: On Tue, May 29, 2018 at 5:40 AM, Stephan Hoyer wrote: > But given that idiomatic NumPy code uses 1D arrays in favor of explicit > row/column vectors with shapes (1,n) and (n,1), I do think it does make > sense for matrix transpose on 1D arrays to be the identity, because matrix > transpose should convert back and forth between row and column vector > representations. > > Certainly, matrix transpose should error on 0d arrays, because it doesn't > make sense to transpose a scalar. Apologies for the probably academic nitpick, but if idiomatic code uses 1d arrays as vectors then shouldn't scalars be compatible with matrices with dimension (in the mathematical sense) of 1? Since the matrix product of shapes (1,n) and (n,1) is (1,1) but the same for shapes (n,) and (n,) is (), it might make sense after all for the matrix transpose to be identity for scalars. I'm aware that this is tangential to the primary discussion, but I'm also wondering if I'm being confused about the subject (wouldn't be the first time that I got confused about numpy scalars). 
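In concrete terms (current behaviour):

import numpy as np

u = np.ones(3)
print((u @ u).shape)      # (): 1d @ 1d contracts all the way down to a scalar
row = np.ones((1, 3))
col = np.ones((3, 1))
print((row @ col).shape)  # (1, 1): the explicit 2d version keeps a matrix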
András From valerio.maggio at gmail.com Tue May 29 05:58:39 2018 From: valerio.maggio at gmail.com (Valerio Maggio) Date: Tue, 29 May 2018 11:58:39 +0200 Subject: [Numpy-discussion] ANN: EuroScipy 2018 Message-ID: *** Apologies if you receive multiple copies *** Dear Colleagues, We are delighted to invite you to join us for the **11th European Conference on Python in Science**. The EuroSciPy 2018 (https://www.euroscipy.org/2018/) Conference will be organised by Fondazione Bruno Kessler (FBK) and will take place from August 28 to September 1 in **Trento, Italy**. The EuroSciPy meeting is a cross-disciplinary gathering focused on the use and development of the Python language in scientific research. This event strives to bring together both users and developers of scientific tools, as well as academic research and state of the art industry. The conference will be structured as follows:

Aug, 28-29 : Tutorials and Hands-on
Aug, 30-31 : Main Conference
Sep, 1 : Sprint

TOPICS OF INTEREST:
===================
Presentations of scientific tools and libraries using the Python language, including but not limited to:
- Algorithms implemented or exposed in Python
- Astronomy
- Data Visualisation
- Deep Learning & AI
- Earth, Ocean and Geo Science
- General-purpose Python tools that can be of special interest to the scientific community.
- Image Processing
- Materials Science
- Parallel computing
- Political and Social Sciences
- Project Jupyter
- Reports on the use of Python in scientific achievements or ongoing projects.
- Robotics & IoT
- Scientific data flow and persistence
- Scientific visualization
- Simulation
- Statistics
- Vector and array manipulation
- Web applications and portals for science and engineering
- 3D Printing

CALL FOR PROPOSALS:
===================
EuroScipy will accept three different kinds of contributions:
* Regular Talks: standard talks for oral presentations, allocated in time slots of 15, or 30 minutes, depending on your preference and scheduling constraints. Each time slot considers a Q&A session at the end of the talk (at least, 5 mins).
* Hands-on Tutorials: These are beginner or advanced training sessions to dive into the subject with all details. These sessions are 90 minutes long, and the audience will be strongly encouraged to bring a laptop to experiment.
* Poster: EuroScipy will host two poster sessions during the two days of the Main Conference. So attendees and students are highly encouraged to present their work and/or preliminary results as posters.
Proposals should be submitted using the EuroScipy submission system at https://pretalx.com/euroscipy18. **Submission deadline is May, 31st 2018**.

REGISTRATION & FEES:
====================
To register for EuroScipy 2018, please go to http://euroscipy2018.eventbrite.co.uk or to http://www.euroscipy.org/2018/

Fees:
-----
| Tutorials                   | Student* | Academic/Individual | Industry |
|-----------------------------|----------|---------------------|----------|
| Early Bird (till July, 1st) | € 50,00  | € 70,00             | € 125,00 |
| Regular (till Aug, 5th)     | € 100,00 | € 110,00            | € 250,00 |
| Late (till Aug, 22nd)       | € 135,00 | € 135,00            | € 300,00 |

| Main Conference             | Student* | Academic/Individual | Industry |
|-----------------------------|----------|---------------------|----------|
| Early Bird (till July, 1st) | € 50,00  | € 70,00             | € 125,00 |
| Regular (till Aug, 5th)     | € 100,00 | € 110,00            | € 250,00 |
| Late (till Aug, 22nd)       | € 135,00 | € 135,00            | € 300,00 |

* A proof of student status will be required at time of the registration. 
kind regards, Valerio EuroScipy 2018 Organising Committee, Email: info at euroscipy.org Website: http://www.euroscipy.org/2018 twitter: @euroscipy -------------- next part -------------- An HTML attachment was scrubbed... URL: From davidmenhur at gmail.com Tue May 29 06:16:07 2018 From: davidmenhur at gmail.com (Daπid) Date: Tue, 29 May 2018 12:16:07 +0200 Subject: [Numpy-discussion] matmul as a ufunc In-Reply-To: References: Message-ID: On 29 May 2018 at 05:40, Stephan Hoyer wrote: > But given that idiomatic NumPy code uses 1D arrays in favor of explicit > row/column vectors with shapes (1,n) and (n,1), I do think it does make > sense for matrix transpose on 1D arrays to be the identity, because matrix > transpose should convert back and forth between row and column vector > representations. > When doing algebra on paper, I like the braket notation. It makes abundantly clear the shape of the outputs, without having to remember on which side the transpose goes: <u|v> is a scalar, |u><v| is a matrix. > Certainly, matrix transpose should error on 0d arrays, because it doesn't > make sense to transpose a scalar. > Unless the scalar is 8, in which case the transpose is np.inf... Right now, np.int(8).T throws an error, but np.transpose(np.int(8)) gives a 0-d array. On one hand, it is nice to be able to use the same code for scalars as for vectors, but on the other, you may be making a mistake. /David. -------------- next part -------------- An HTML attachment was scrubbed... URL: From deak.andris at gmail.com Tue May 29 06:51:09 2018 From: deak.andris at gmail.com (Andras Deak) Date: Tue, 29 May 2018 12:51:09 +0200 Subject: [Numpy-discussion] matmul as a ufunc In-Reply-To: References: Message-ID: On Tue, May 29, 2018 at 12:16 PM, Daπid wrote: > Right now, np.int(8).T throws an error, but np.transpose(np.int(8)) gives a > 0-d array. On one hand, it is nice to be able to use the same code for `np.int` is just python `int`! What you mean is `np.int64(8).T` which works fine, so does `np.array(8).T`. From ilhanpolat at gmail.com Tue May 29 08:14:56 2018 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Tue, 29 May 2018 14:14:56 +0200 Subject: [Numpy-discussion] matmul as a ufunc In-Reply-To: References: Message-ID: Apart from the math-validity discussion, in my experience errors are used a bit too generously in the not-allowed ops. No ops are fine once you learn more about them, such as transpose on 1D arrays (good or bad is another discussion). But raising errors bloats the computational code too much. "Is it a scalar, oh then do this; is it 1D, oh make this one; is it 2D, then do something else." type of coding is really making life difficult. Most of my time in the numerical code is spent on trying to catch scalars and 1D arrays and writing exceptions because I can't predict what the user would do or what the result should be after certain operations. Quite unwillingly, I've started making everything 2D whether it is required or not, because then I can just avoid the following:

np.eye(4)[:, 1]                      # 1d
np.eye(4)[:, 1:2]                    # 2d
np.eye(4)[:, [1]]                    # 2d
np.eye(4)[:, [1]] @ 5                # Error
np.eye(4)[:, [1]] @ np.array(5)      # Error
np.eye(4)[:, [1]] @ np.array([5])    # Result is 1D
np.eye(4)[:, [1]] @ np.array([[5]])  # Result 2D

So imagine I'm trying to get a simple multiply_these function; I have already quite some cases to consider such that the function is "Pythonic". If the second argument is an int or float, do *-mult; if it is a numpy array but has no dimensions, then again *-mult; but if it is 1d, keep dims, and if it is 2d, do @-mult.
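A sketch of the kind of wrapper being described (multiply_these is Ilhan's hypothetical name; the dispatch rules below are just one possible reading of the above, not a proposal):

import numpy as np

def multiply_these(a, second):
    """Toy dispatcher: *-multiply for scalars and 0d, @-multiply for 1d/2d."""
    a = np.asarray(a)
    second = np.asarray(second)    # a python int/float becomes a 0d array here
    if second.ndim == 0:
        return a * second
    if second.ndim <= 2:
        return a @ second
    raise ValueError("expected at most 2 dimensions, got %d" % second.ndim)

Even this toy version has to special-case scalars just so that `@ 5` does not blow up - which is exactly the bloat being complained about.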
Add broadcasting rules on top of this and it gets a pretty wordy function. Hence, what I would suggest is to also include the use cases while deciding the behavior of a single functionality. So indeed it doesn't make sense to transpose a 0d array, but as an array object it would now start to have a lot of Wat! moments. https://www.destroyallsoftware.com/talks/wat On Tue, May 29, 2018 at 12:51 PM, Andras Deak wrote: > On Tue, May 29, 2018 at 12:16 PM, Daπid wrote: > > Right now, np.int(8).T throws an error, but np.transpose(np.int(8)) > gives a > > 0-d array. On one hand, it is nice to be able to use the same code for > > `np.int` is just python `int`! What you mean is `np.int64(8).T` which > works fine, so does `np.array(8).T`. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jotasi_numpy_scipy at posteo.de Tue May 29 09:14:29 2018 From: jotasi_numpy_scipy at posteo.de (Jonathan Tammo Siebert) Date: Tue, 29 May 2018 15:14:29 +0200 Subject: [Numpy-discussion] Inconsistent results for the covariance matrix between scipy.optimize.curve_fit and numpy.polyfit Message-ID: <1527599669.32567.9.camel@posteo.de> Hi, I hope this is the appropriate place to ask something like this, otherwise please let me know (or feel free to ignore this). Also I hope that I did not misunderstand something or make some silly mistake. If so, please let me know as well! TLDR: When scaling the covariance matrix based on the residuals, scipy.optimize.curve_fit uses a factor of chisq(popt)/(M-N) (with M=number of points, N=number of parameters) and numpy.polyfit uses chisq(popt)/(M-N-2). I am wondering which is correct. I am somewhat confused about different results I am getting for the covariance matrix of a simple linear fit, when comparing `scipy.optimize.curve_fit` and `numpy.polyfit`. I am aware that `curve_fit` solves the more general non-linear problem numerically, while `polyfit` finds an analytical solution to the linear problem. However, both converge to the same solution, so I suspect that this difference is not important here. The difference I am curious about is not in the returned parameters but in the estimate of the corresponding covariance matrix. As I understand, there are two different ways to estimate it, based either on the absolute values of the provided uncertainties or by interpreting those only as weights and then scaling the matrix to produce an appropriate reduced chisq. To that end, curve_fit has the parameter: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html: "absolute_sigma : bool, optional If True, sigma is used in an absolute sense and the estimated parameter covariance pcov reflects these absolute values. If False, only the relative magnitudes of the sigma values matter. The returned parameter covariance matrix pcov is based on scaling sigma by a constant factor. This constant is set by demanding that the reduced chisq for the optimal parameters popt when using the scaled sigma equals unity. In other words, sigma is scaled to match the sample variance of the residuals after the fit. Mathematically, pcov(absolute_sigma=False) = pcov(absolute_sigma=True) * chisq(popt)/(M-N)" https://docs.scipy.org/doc/numpy/reference/generated/numpy.polyfit.html on the other hand, does not say anything about how the covariance matrix is estimated. To my understanding, its default should correspond to `absolute_sigma=False` for `curve_fit`. As `polyfit` has a weight parameter instead of an uncertainty parameter, I guess the difference in default behavior is not that surprising. However, even when specifying `absolute_sigma=False`, `curve_fit` and `polyfit` produce different covariance matrices as the applied scaling factors are chisq(popt)/(M-N-2) for `polyfit` (https://github.com/numpy/numpy/blob/6a58e25703cbecb6786faa09a04ae2ec8221348b/numpy/lib/polynomial.py#L598-L605) and chisq(popt)/(M-N) for `curve_fit` (https://github.com/scipy/scipy/blob/607a21e07dad234f8e63fcf03b7994137a3ccd5b/scipy/optimize/minpack.py#L781-L782). The argument given in a comment to the scaling in `polyfit` is: "Some literature ignores the extra -2.0 factor in the denominator, but it is included here because the covariance of Multivariate Student-T (which is implied by a Bayesian uncertainty analysis) includes it. Plus, it gives a slightly more conservative estimate of uncertainty.", but honestly, in a quick search, I was not able to find any literature not ignoring the extra "factor". But obviously, I could very well be misunderstanding something. Nonetheless, as `curve_fit` ignores it as well, I was wondering whether those two shouldn't give consistent results and if so, which would be the correct solution.
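For concreteness, a quick numerical check of the discrepancy (a sketch; the exact numbers depend on the random data, but the ratio should not):

import numpy as np
from scipy.optimize import curve_fit

rng = np.random.RandomState(0)
x = np.linspace(0.0, 1.0, 10)                  # M = 10 points
y = 1.0 + 2.0 * x + 0.1 * rng.randn(x.size)

_, cov_poly = np.polyfit(x, y, 1, cov=True)    # N = 2 parameters
_, cov_curve = curve_fit(lambda x, a, b: a * x + b, x, y)

# both scale the same unscaled covariance, so the elementwise ratio is
# constant: (M - N) / (M - N - 2) = 8 / 6
print(cov_poly / cov_curve)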
Best, Jonathan From josef.pktd at gmail.com Tue May 29 10:47:22 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 29 May 2018 10:47:22 -0400 Subject: [Numpy-discussion] Inconsistent results for the covariance matrix between scipy.optimize.curve_fit and numpy.polyfit In-Reply-To: <1527599669.32567.9.camel@posteo.de> References: <1527599669.32567.9.camel@posteo.de> Message-ID: On Tue, May 29, 2018 at 9:14 AM, Jonathan Tammo Siebert < jotasi_numpy_scipy at posteo.de> wrote: > Hi, > > I hope this is the appropriate place to ask something like > this, otherwise please let me know (or feel free to ignore > this). Also I hope that I did not misunderstand something or > make some silly mistake. If so, please let me know as well! > > TLDR: > When scaling the covariance matrix based on the residuals, > scipy.optimize.curve_fit uses a factor of chisq(popt)/(M-N) > (with M=number of points, N=number of parameters) and > numpy.polyfit uses chisq(popt)/(M-N-2). I am wondering which > is correct. > > I am somewhat confused about different results I am getting > for the covariance matrix of a simple linear fit, when > comparing `scipy.optimize.curve_fit` and `numpy.polyfit`. I > am aware that `curve_fit` solves the more general non-linear > problem numerically, while `polyfit` finds an analytical > solution to the linear problem. However, both converge to the > same solution, so I suspect that this difference is not > important here. The difference I am curious about is not in > the returned parameters but in the estimate of the > corresponding covariance matrix. As I understand, there are > two different ways to estimate it, based either on the > absolute values of the provided uncertainties or by > interpreting those only as weights and then scaling the > matrix to produce an appropriate reduced chisq. To that end, > curve_fit has the parameter: > https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.cur > ve_fit.html: > "absolute_sigma : bool, optional > If True, sigma is used in an absolute sense and the > estimated parameter covariance pcov reflects these absolute > values. 
> If False, only the relative magnitudes of the sigma values > matter. The returned parameter covariance matrix pcov is > based on scaling sigma by a constant factor. This constant > is set by demanding that the reduced chisq for the optimal > parameters popt when using the scaled sigma equals unity. In > other words, sigma is scaled to match the sample variance of > the residuals after the fit. Mathematically, > pcov(absolute_sigma=False) = pcov(absolute_sigma=True) * > chisq(popt)/(M-N)" > https://docs.scipy.org/doc/numpy/reference/generated/numpy.polyfit.html > on the other hand, does not say anything about how the > covariance matrix is estimated. To my understanding, its > default should correspond to `absolute_sigma=False` for > `curve_fit`. As `polyfit` has a weight parameter instead of > an uncertainty parameter, I guess the difference in default > behavior is not that surprising. > However, even when specifying `absolute_sigma=False`, > `curve_fit` and `polyfit` produce different covariance > matrices as the applied scaling factors are > chisq(popt)/(M-N-2) for `polyfit` > (https://github.com/numpy/numpy/blob/6a58e25703cbecb6786faa09a04ae2ec8221348b/numpy/lib/polynomial.py#L598-L605) > and chisq(popt)/(M-N) for `curve_fit` > (https://github.com/scipy/scipy/blob/607a21e07dad234f8e63fcf03b7994137a3ccd5b/scipy/optimize/minpack.py#L781-L782). > The argument given in a comment to the scaling in `polyfit` is: > "Some literature ignores the extra -2.0 factor in the > denominator, but it is included here because the covariance > of Multivariate Student-T (which is implied by a Bayesian > uncertainty analysis) includes it. Plus, it gives a slightly > more conservative estimate of uncertainty.", > but honestly, in a quick search, I was not able to find any > literature not ignoring the extra "factor". But obviously, I > could very well be misunderstanding something. > Nonetheless, as `curve_fit` ignores it as well, I was > wondering whether those two shouldn't give consistent > results and if so, which would be the correct solution. > I've never seen the -2 in any literature, and there is no reference in the code comment. (I would remove it as a bug-fix. Even if there is some Bayesian interpretation, it is not what users would expect.) There was a similar thread in 2013 https://mail.scipy.org/pipermail/numpy-discussion/2013-February/065664.html Josef > > Best, > > Jonathan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Tue May 29 11:46:18 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 29 May 2018 08:46:18 -0700 Subject: [Numpy-discussion] Where to discuss NEPs (was: Re: new NEP: np.AbstractArray and np.asabstractarray) In-Reply-To: References: Message-ID: Reviving this discussion -- I don't really care what our policy is, but can we make a decision one way or the other about where we discuss NEPs? We've had a revival of NEP writing recently, so this is very timely. Previously, I was in slight favor of doing discussion on GitHub. Now that I've started doing a bit of NEP writing, I've started to swing the other way, since it would be nice to be able to reference draft/rejected NEPs in a consistent way -- and rendered HTML is more readable than raw RST in pull requests. 
On Wed, Mar 14, 2018 at 6:52 PM Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > Apparently, where and how to discuss enhancement proposals was > recently a topic on the python mailing list as well -- see the > write-up at LWN: > https://lwn.net/SubscriberLink/749200/4343911ee71e35cf/ > The conclusion seems to be that one should switch to mailman3... > -- Marten > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue May 29 12:24:08 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 29 May 2018 10:24:08 -0600 Subject: [Numpy-discussion] Where to discuss NEPs (was: Re: new NEP: np.AbstractArray and np.asabstractarray) In-Reply-To: References: Message-ID: On Tue, May 29, 2018 at 9:46 AM, Stephan Hoyer wrote: > Reviving this discussion -- > I don't really care what our policy is, but can we make a decision one way > or the other about where we discuss NEPs? We've had a revival of NEP > writing recently, so this is very timely. > > Previously, I was in slight favor of doing discussion on GitHub. Now that > I've started doing a bit of NEP writing, I've started to swing the other > way, since it would be nice to be able to reference draft/rejected NEPs in > a consistent way -- and rendered HTML is more readable than raw RST in pull > requests. > My understanding of the discussion at the sprint was that we favored quick commits of NEPs with extended discussions of them on the list. Updates and changes would go in through the normal PR process. In practice, I expect there will be some overlap; I think the important thing is the quick commit with the understanding that the NEPs are only proposals until formally adopted. I think the formal adoption process is not well defined... Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue May 29 13:52:26 2018 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 29 May 2018 10:52:26 -0700 Subject: [Numpy-discussion] matmul as a ufunc In-Reply-To: References: Message-ID: On Mon, May 28, 2018, 20:41 Stephan Hoyer wrote: > On Mon, May 28, 2018 at 7:36 PM Eric Wieser > wrote: > >> which ensures that it is still well defined (as the identity) on 1d >> arrays. >> >> This strikes me as a bad idea. There's already enough confusion from >> beginners that array_1d.T is a no-op. If we introduce a >> matrix-transpose, it should either error on <1d inputs with a useful >> message, or insert the extra dimension. I'd favor the former. >> > To be clear: matrix transpose is an example use-case rather than a serious > proposal in this discussion. > > But given that idiomatic NumPy code uses 1D arrays in favor of explicit > row/column vectors with shapes (1,n) and (n,1), I do think it does make > sense for matrix transpose on 1D arrays to be the identity, because matrix > transpose should convert back and forth between row and column vector > representations. > 
But, on the other hand, if you write a @ a.T then you'll be in for a surprise... So maybe it's not a great idea after all. (Note that here I'm using .T as a placeholder for a hypothetical "broadcasting matrix transpose". I don't think anyone proposes that .T itself should be changed to do this; I just needed some notation.) -n > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jotasi_numpy_scipy at posteo.de Tue May 29 14:21:25 2018 From: jotasi_numpy_scipy at posteo.de (Jonathan Tammo Siebert) Date: Tue, 29 May 2018 20:21:25 +0200 Subject: [Numpy-discussion] Inconsistent results for the covariance matrix between scipy.optimize.curve_fit and numpy.polyfit In-Reply-To: References: <1527599669.32567.9.camel@posteo.de> Message-ID: <1527618085.4203.1.camel@posteo.de> On Tue, 2018-05-29 at 10:47 -0400, josef.pktd at gmail.com wrote: > On Tue, May 29, 2018 at 9:14 AM, Jonathan Tammo Siebert < > jotasi_numpy_scipy at posteo.de> wrote: > > > Hi, > > > > I hope this is the appropriate place to ask something like > > this, otherwise please let me know (or feel free to ignore > > this). Also I hope that I do not misunderstood something or > > did some silly mistake. If so, please let me know as well! > > > > TLDR: > > When scaling the covariance matrix based on the residuals, > > scipy.optimize.curve_fit uses a factor of chisq(popt)/(M-N) > > (with M=number of point, N=number of parameters) and > > numpy.polyfit uses chisq(popt)/(M-N-2). I am wondering which > > is correct. > > > > I am somewhat confused about different results I am getting > > for the covariance matrix of a simple linear fit, when > > comparing `scipy.optimize.curve_fit` and `numpy.polyfit`. I > > am aware, that `curve_fit` solves the more general non-linear > > problem numerically, while `polyfit` finds an analytical > > solution to the linear problem. However, both converge to the > > same solution, so I suspect that this difference is not > > important here. The difference, I am curious about is not in > > the returned parameters but in the estimate of the > > corresponding covariance matrix. As I understand, there are > > two different ways to estimate it, based either on the > > absolute values of the provided uncertainties or by > > interpreting those only as weights and then scaling the > > matrix to produce an appropriate reduced chisq. To that end, > > curve_fit has the parameter: > > https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize > > .cur > > ve_fit.html: > > "absolute_sigma : bool, optional > > If True, sigma is used in an absolute sense and the > > estimated parameter covariance pcov reflects these absolute > > values. > > If False, only the relative magnitudes of the sigma values > > matter. The returned parameter covariance matrix pcov is > > based on scaling sigma by a constant factor. This constant > > is set by demanding that the reduced chisq for the optimal > > parameters popt when using the scaled sigma equals unity. In > > other words, sigma is scaled to match the sample variance of > > the residuals after the fit. Mathematically, > > pcov(absolute_sigma=False) = pcov(absolute_sigma=True) * > > chisq(popt)/(M-N)" > > https://docs.scipy.org/doc/numpy/reference/generated/numpy.polyfit. > > html > > on the other hand, does not say anything about how the > > covariance matrix is estimated. To my understanding, its > > default should correspond to `absolute_sigma=False` for > > `curve_fit`. 
As `polyfit` has a weight parameter instead of > > an uncertainty parameter, I guess the difference in default > > behavior is not that surprising. > > However, even when specifying `absolute_sigma=False`, > > `curve_fit` and `polyfit` produce different covariance > > matrices as the applied scaling factors are > > chisq(popt)/(M-N-2) for `polyfit` > > (https://github.com/numpy/numpy/blob/6a58e25703cbecb6786faa09a04ae2 > > ec82 > > 21348b/numpy/lib/polynomial.py#L598-L605) > > and chisq(popt)/(M-N) for `curve_fit` > > (https://github.com/scipy/scipy/blob/607a21e07dad234f8e63fcf03b7994 > > 137a > > 3ccd5b/scipy/optimize/minpack.py#L781-L782). > > The argument given in a comment to the scaling `polyfit` is: > > "Some literature ignores the extra -2.0 factor in the > > denominator, but it is included here because the covariance > > of Multivariate Student-T (which is implied by a Bayesian > > uncertainty analysis) includes it. Plus, it gives a slightly > > more conservative estimate of uncertainty.", > > but honestly, in a quick search, I was not able to find any > > literature not ignoring the extra "factor". But obviously, I > > could very well be misunderstanding something. > > Nonetheless, as `curve_fit` ignores it as well, I was > > wondering whether those two shouldn't give consistent > > results and if so, which would be the correct solution. > > > > > I've never seen the -2 in any literature, and there is no reference > in the > code comment. > (I would remove it as a bug-fix. Even if there is some Bayesian > interpretation, it is not what users would expect.) That would be my preferred fix as well. If there aren't any objections, I'll open a corresponding issue and PR. > There was a similar thread in 2013 > https://mail.scipy.org/pipermail/numpy-discussion/2013- > February/065664.html Thanks for the link. I must've somehow missed that earlier discussion. Would it be appropriate to also add an additional parameter along the lines of curve_fit's `absolute_sigma` with default `False` to keep it consistent? I already felt that something like this was missing for cases where proper standard errors are known for the data points and it was apparently already discussed in 2013. As far as I can see, the main reason against that is the fact that `polyfit` accepts `w` (weights->`1/sigma`) as a parameter and not `sigma`, which would make the documentation somewhat less intuitive than in the case of `curve_fit`. 
Jonathan > > Josef > > > > > > > > Best, > > > > Jonathan > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion From josef.pktd at gmail.com Tue May 29 16:54:52 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 29 May 2018 16:54:52 -0400 Subject: [Numpy-discussion] Inconsistent results for the covariance matrix between scipy.optimize.curve_fit and numpy.polyfit In-Reply-To: <1527618085.4203.1.camel@posteo.de> References: <1527599669.32567.9.camel@posteo.de> <1527618085.4203.1.camel@posteo.de> Message-ID: On Tue, May 29, 2018 at 2:21 PM, Jonathan Tammo Siebert < jotasi_numpy_scipy at posteo.de> wrote: > On Tue, 2018-05-29 at 10:47 -0400, josef.pktd at gmail.com wrote: > > On Tue, May 29, 2018 at 9:14 AM, Jonathan Tammo Siebert < > > jotasi_numpy_scipy at posteo.de> wrote: > > > > > Hi, > > > > > > I hope this is the appropriate place to ask something like > > > this, otherwise please let me know (or feel free to ignore > > > this). Also I hope that I do not misunderstood something or > > > did some silly mistake. If so, please let me know as well! > > > > > > TLDR: > > > When scaling the covariance matrix based on the residuals, > > > scipy.optimize.curve_fit uses a factor of chisq(popt)/(M-N) > > > (with M=number of point, N=number of parameters) and > > > numpy.polyfit uses chisq(popt)/(M-N-2). I am wondering which > > > is correct. > > > > > > I am somewhat confused about different results I am getting > > > for the covariance matrix of a simple linear fit, when > > > comparing `scipy.optimize.curve_fit` and `numpy.polyfit`. I > > > am aware, that `curve_fit` solves the more general non-linear > > > problem numerically, while `polyfit` finds an analytical > > > solution to the linear problem. However, both converge to the > > > same solution, so I suspect that this difference is not > > > important here. The difference, I am curious about is not in > > > the returned parameters but in the estimate of the > > > corresponding covariance matrix. As I understand, there are > > > two different ways to estimate it, based either on the > > > absolute values of the provided uncertainties or by > > > interpreting those only as weights and then scaling the > > > matrix to produce an appropriate reduced chisq. To that end, > > > curve_fit has the parameter: > > > https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize > > > .cur > > > ve_fit.html: > > > "absolute_sigma : bool, optional > > > If True, sigma is used in an absolute sense and the > > > estimated parameter covariance pcov reflects these absolute > > > values. > > > If False, only the relative magnitudes of the sigma values > > > matter. The returned parameter covariance matrix pcov is > > > based on scaling sigma by a constant factor. This constant > > > is set by demanding that the reduced chisq for the optimal > > > parameters popt when using the scaled sigma equals unity. In > > > other words, sigma is scaled to match the sample variance of > > > the residuals after the fit. Mathematically, > > > pcov(absolute_sigma=False) = pcov(absolute_sigma=True) * > > > chisq(popt)/(M-N)" > > > https://docs.scipy.org/doc/numpy/reference/generated/numpy.polyfit. 
> > > html > > > on the other hand, does not say anything about how the > > > covariance matrix is estimated. To my understanding, its > > > default should correspond to `absolute_sigma=False` for > > > `curve_fit`. As `polyfit` has a weight parameter instead of > > > an uncertainty parameter, I guess the difference in default > > > behavior is not that surprising. > > > However, even when specifying `absolute_sigma=False`, > > > `curve_fit` and `polyfit` produce different covariance > > > matrices as the applied scaling factors are > > > chisq(popt)/(M-N-2) for `polyfit` > > > (https://github.com/numpy/numpy/blob/6a58e25703cbecb6786faa09a04ae2 > > > ec82 > > > 21348b/numpy/lib/polynomial.py#L598-L605) > > > and chisq(popt)/(M-N) for `curve_fit` > > > (https://github.com/scipy/scipy/blob/607a21e07dad234f8e63fcf03b7994 > > > 137a > > > 3ccd5b/scipy/optimize/minpack.py#L781-L782). > > > The argument given in a comment to the scaling `polyfit` is: > > > "Some literature ignores the extra -2.0 factor in the > > > denominator, but it is included here because the covariance > > > of Multivariate Student-T (which is implied by a Bayesian > > > uncertainty analysis) includes it. Plus, it gives a slightly > > > more conservative estimate of uncertainty.", > > > but honestly, in a quick search, I was not able to find any > > > literature not ignoring the extra "factor". But obviously, I > > > could very well be misunderstanding something. > > > Nonetheless, as `curve_fit` ignores it as well, I was > > > wondering whether those two shouldn't give consistent > > > results and if so, which would be the correct solution. > > > > > > > > > I've never seen the -2 in any literature, and there is no reference > > in the > > code comment. > > (I would remove it as a bug-fix. Even if there is some Bayesian > > interpretation, it is not what users would expect.) > > That would be my preferred fix as well. If there aren't any objections, > I'll open a corresponding issue and PR. > > > There was a similar thread in 2013 > > https://mail.scipy.org/pipermail/numpy-discussion/2013- > > February/065664.html > > Thanks for the link. I must've somehow missed that earlier discussion. > Would it be appropriate to also add an additional parameter along the > lines of curve_fit's `absolute_sigma` with default `False` to keep it > consistent? I already felt that something like this was missing for > cases where proper standard errors are known for the data points and it > was apparently already discussed in 2013. As far as I can see, the main > reason against that is the fact that `polyfit` accepts `w` > (weights->`1/sigma`) as a parameter and not `sigma`, which would make > the documentation somewhat less intuitive than in the case of > `curve_fit`. > It would work with "absolute_weights". After the long discussions for scaling in curve_fit, I think it's fine to add it. asides: I still don't really understand why users would want it for the covariance of the parameter estimates. However, I also added an option to statsmodels OLS and WLS to keep the scale fixed instead of using the estimated scale. There is a reason that polyfit might have different, larger standard errors than curve_fit, if we assume that curve_fit has a correctly specified mean function and polyfit is just a low order approximation to an arbitrary non-linear function. That is, polyfit combines a functional approximation error with the stochastic error from the random observations. (which is also ignored if scale is fixed.)
(But I never tried to figure out those non- or semi-parametric approximation details.) aside further away: In Poisson regression, we go the opposite way and use an estimated residual scale instead of a fixed scale = 1 so we can correct the standard errors for the parameters when there is over-dispersion. (I just ran a Monte Carlo example where a hypothesis test under the Poisson assumption is very wrong because of the over-dispersion. I.e. if the chi2 is far away from 1, then the standard errors for the parameters are "useless" if scale is assumed to be 1.) Josef > > > Jonathan > > > > > Josef > > > > > > > > > > > > > Best, > > > > > > Jonathan > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Wed May 30 14:14:13 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Wed, 30 May 2018 14:14:13 -0400 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs Message-ID: Hi All, Following on a PR combining the ability to provide fixed and flexible dimensions [1] (useful for, e.g., 3-vector input with a signature like `(3),(3)->(3)`, and for `matmul`, resp.; based on earlier PRs by Jaime [2] and Matt (Picus) [3]), I've now made a PR with a further enhancement, which allows one to indicate that a core dimension can be broadcast [4]. A particular use case is `all_equal`, a new function suggested in a stalled PR by Matt (Harrigan) [5], which compares two arrays axis-by-axis, but short-circuits if a non-equality is found (unlike what is the case if one does `(a==b).all(axis)`). One thing that would be obviously useful for a routine like `all_equal` is to be able to provide an array as one argument and a constant as another, i.e., if the core dimensions can be broadcast if needed, just like they are in `(a==b).all(axis)`. This is currently not possible: with its signature of `(n),(n)->()`, the two arrays have to have the same trailing size. My PR provides the ability to indicate in the signature that a core dimension can be broadcast, by using a suffix of "|1". Thus, the signature of `all_equal` would become:

```
(n|1),(n|1)->()
```

Comments most welcome (yes, even on the notation - though I think it is fairly self-explanatory)! Marten p.s. There are some similarities to the new "flexible" dimensions implemented for `matmul` [1], but also differences. In particular, for a signature of `(n?),(n?)->()`, one could also pass in an array of trailing size n and a constant, but it would not be possible to pass in an array with trailing size 1: the dimensions with the same name have to be either present and the same or absent. In contrast, for broadcasting, dimensions with the same name can have trailing size n, size 1, or be absent (in which case they count as 1). For broadcasting, any output dimensions with the same name are never affected, while for flexible dimensions those are removed.
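To make the flexible-dimensions side concrete: the intended `matmul` behaviour can already be seen in current numpy (shown here purely for illustration; only the gufunc signature notation is new):

```
import numpy as np

A = np.ones((2, 3, 4))
B = np.ones((4, 5))
v = np.ones(4)

# matmul behaves as if its signature were (m?,n),(n,p?)->(m?,p?):
print(np.matmul(A, B).shape)  # (2, 3, 5): m and p both present
print(np.matmul(A, v).shape)  # (2, 3):    p absent, so dropped from the output
print(np.matmul(v, v).shape)  # ():        m and p both absent

# but a core dimension of size 1 is *not* broadcast against n=4:
# np.matmul(A, np.ones(1))  would raise ValueError
```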
[1] https://github.com/numpy/numpy/pull/11175
[2] https://github.com/numpy/numpy/pull/5015
[3] https://github.com/numpy/numpy/pull/11132
[4] https://github.com/numpy/numpy/pull/11179
[5] https://github.com/numpy/numpy/pull/8528

From shoyer at gmail.com Wed May 30 19:22:36 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Wed, 30 May 2018 16:22:36 -0700 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: On Wed, May 30, 2018 at 11:15 AM Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > My PR provides the ability to indicate in the signature that a core > dimension can be broadcast, by using a suffix of "|1". Thus, the > signature of `all_equal` would become: > > ``` > (n|1),(n|1)->() > ``` > I read this as "dimensions may have size n or 1", which would exclude the possibility of scalars. For all_equal, I think you could also use a signature like "(m?),(n?)->()", with a short-cut to automatically return False if m != n. -------------- next part -------------- An HTML attachment was scrubbed... URL: From harrigan.matthew at gmail.com Wed May 30 20:00:22 2018 From: harrigan.matthew at gmail.com (Matthew Harrigan) Date: Wed, 30 May 2018 20:00:22 -0400 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: "short-cut to automatically return False if m != n", that seems like a silent bug. AFAICT there are 3 possibilities:

1) current behavior
2) a scalar or size 1 array may be substituted, ie a constant
3) a scalar or array with shape[-1] == 1 may be substituted and broadcasted

I am fond of using "n^" to signify #3 since I think of broadcasting as increasing the size of the array. Although a stretch, "n#" might work for #2, as it reminds me of #define'ing constants in C. To generalize a bit, most elementwise operations have obvious broadcasting cases and reduce operations have a core dimension. Fusing any two, ie sumproduct, would result in a gufunc which would benefit from this ability. On Wed, May 30, 2018 at 7:22 PM, Stephan Hoyer wrote: > On Wed, May 30, 2018 at 11:15 AM Marten van Kerkwijk < > m.h.vankerkwijk at gmail.com> wrote: > >> My PR provides the ability to indicate in the signature that a core >> dimension can be broadcast, by using a suffix of "|1". Thus, the >> signature of `all_equal` would become: >> >> ``` >> (n|1),(n|1)->() >> ``` >> > > I read this as "dimensions may have size n or 1", which would exclude the > possibility of scalars. > > For all_equal, I think you could also use a signature like > "(m?),(n?)->()", with a short-cut to automatically return False if m != n. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Wed May 30 20:21:50 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Wed, 30 May 2018 20:21:50 -0400 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: Hi Stephan, Matt, My `n|1` was indeed meant to be read as `n or 1`, but with the (implicit) understanding that any array can have as many ones pre-pended as needed.
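To spell out what `(n|1),(n|1)->()` would accept for `all_equal`, here is the behaviour mimicked with plain element-wise broadcasting (the real gufunc would short-circuit instead of building the intermediate boolean array; `all_equal` itself does not exist yet):

```
import numpy as np

a = np.arange(12.).reshape(4, 3)  # core dimension n = 3

# all of these would be valid calls of all_equal with signature (n|1),(n|1)->():
print((a == a).all(axis=-1))            # both operands have core size n
print((a == np.zeros(3)).all(axis=-1))  # same n, via ordinary outer broadcast
print((a == np.zeros(1)).all(axis=-1))  # core size 1, broadcast against n
print((a == 0.0).all(axis=-1))          # scalar: counts as core size 1
```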
The signature `(n?),(n?)->()` is now set aside for flexible dimensions: this would allow the constant, but not the trailing shape of 1 (at least as we currently have implemented it). I do think that is more consistent with the visual suggestion that the thing may be absent. It also has implications for output: `(m?,n),(n,p?)->(m?,p?)` is meant to indicate that if a dimension is absent, it should be absent in the output as well. In contrast, for broadcasting, I'd envisage `(n|1),(n|1)->(n)` to indicate that the output dimension will always be present and be of length n. I'm not sure I'm sold on `n^` - I don't think it gives an immediate hint of what it would do... All the best, Marten From njs at pobox.com Thu May 31 00:10:03 2018 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 30 May 2018 21:10:03 -0700 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: On Wed, May 30, 2018 at 11:14 AM, Marten van Kerkwijk wrote: > Hi All, > > Following on a PR combining the ability to provide fixed and flexible > dimensions [1] (useful for, e.g., 3-vector input with a signature like > `(3),(3)->(3)`, and for `matmul`, resp.; based on earlier PRs by Jaime > [2] and Matt (Picus) [3]), I've now made a PR with a further > enhancement, which allows one can indicate that a core dimension can > be broadcast [4]. > > A particular use case is `all_equal`, a new function suggested in a > stalled PR by Matt (Harrigan) [5], which compares two arrays > axis-by-axis, but short-circuits if a non-equality is found (unlike > what is the case if one does `(a==b).all(axis)`). One thing that would > be obviously useful for a routine like `all_equal` is to be able to > provide an array as one argument and a constant as another, i.e., if > the core dimensions can be broadcast if needed, just like they are in > `(a==b).all(axis)`. This is currently not possible: with its signature > of `(n),(n)->()`, the two arrays have to have the same trailing size. > > My PR provides the ability to indicate in the signature that a core > dimension can be broadcast, by using a suffix of "|1". Thus, the > signature of `all_equal` would become: > > ``` > (n|1),(n|1)->() > ``` > > Comments most welcome (yes, even on the notation - though I think it > is fairly self-explanatory)!

I'm currently -0.5 on both fixed dimensions and this broadcasting dimension idea. My reasoning is:

- The use cases seem fairly esoteric. For fixed dimensions, I guess the motivating example is cross-product (are there any others?). But would it be so bad for a cross-product gufunc to raise an error if it receives the wrong number of dimensions? For this broadcasting case... well, obviously we've survived this long without all_equal :-). And there's something funny about all_equal, since it's really smushing together two conceptually separate gufuncs for efficiency. Should we also have all_less_than, sum_square, ...? If this is a big problem, then wouldn't it be better to solve it in a general way, like dask or Numba or numexpr do? To be clear, I'm not saying these features are necessarily *bad* ideas, in isolation -- just that the benefits aren't very convincing, and there are trade-offs, like:

- When it comes to the core ufunc machinery, we have a limited complexity budget.
I'm nervous that if we add too many bells and whistles, we'll end up writing ourselves into a corner where we have trouble maintaining it, where it becomes difficult to predict how different features interact, it becomes increasingly difficult for third-parties to handle all the different features in their __array_ufunc__ methods...

- And, we have a lot of other demands on the core ufunc machinery, that might be better places to spend our limited complexity budget. For example, can we come up with an extension to make np.sort a gufunc? That seems like a much higher priority than figuring out how to make all_equal a gufunc. What about refactoring the ufunc machinery to support user-defined dtypes? That'll need some serious work, and again, it's probably higher priority than supporting cross-product or all_equal directly (or at least it seems that way to me).

Maybe there are more compelling use cases that I'm missing, but as it is, I feel like trying to add too many features to the current ufunc machinery is pretty risky for future maintainability, and we shouldn't do it without really solid use cases. -n -- Nathaniel J. Smith -- https://vorpus.org From m.h.vankerkwijk at gmail.com Thu May 31 07:20:21 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Thu, 31 May 2018 07:20:21 -0400 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: Hi Nathaniel, I think the case for frozen dimensions is much more solid than just `cross1d` - there are many operations that work on size-3 vectors. Indeed, as I noted in the PR, I have just been wrapping a Standards-of-Astronomy library in gufuncs, and many of its functions require size-3 vectors or 3x3 matrices [1]. Of course, I can put checks on the sizes, and I've now done that in a custom type resolver (which I needed anyway since, as you say, user dtypes is currently not easy), but there is a real problem for functions that take scalars and produce vectors: with a signature like `(),()->(n)`, I am forced to pass in an output with size 3, which is very inconvenient (especially if I then also want to override with `__array_ufunc__` - now my Quantity implementation also has to start changing an output already put in). So, having frozen dimensions is definitely helpful for developers of new gufuncs. Furthermore, with frozen dimensions, the signature is not just immediately clear, `(),()->(3)` for the example above, it is also better in telling users about what a function does. Indeed, I think this addition has much more justification than the `?` which is much more complex than the fixed size, yet neither particularly clear nor useful beyond the single purpose of matmul. (It is just that that single purpose has fairly high weight...) As for broadcasting, well, with the flexible dimensions defined, the *additional* complexity is very small. I have no ready example other than all_equal, though I will say that I currently have code that does `if a[0] == x and a[1] == x and a[2] == x and np.all(a[3:] == x):` just because the short-circuiting is well worth the time (it is unlikely in this context that all of a equals x). So, there is at least one example of an actual need for this function.
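For concreteness, a pure-python sketch of that pattern generalized (the chunking and the name `all_equal_1d` are just for illustration; the whole point of the gufunc would be to do this element by element in the C inner loop):

```
import numpy as np

def all_equal_1d(a, x, chunk=16):
    """Short-circuiting version of (a == x).all() for a 1-d array."""
    a = np.asarray(a).ravel()
    for start in range(0, a.size, chunk):
        # bail out at the first block that contains a mismatch
        if not (a[start:start + chunk] == x).all():
            return False
    return True
```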
All the best, Marten [1] https://github.com/astropy/astropy/pull/7502 From allanhaldane at gmail.com Thu May 31 09:35:09 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Thu, 31 May 2018 09:35:09 -0400 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: On 05/31/2018 12:10 AM, Nathaniel Smith wrote: > On Wed, May 30, 2018 at 11:14 AM, Marten van Kerkwijk > wrote: >> Hi All, >> >> Following on a PR combining the ability to provide fixed and flexible >> dimensions [1] (useful for, e.g., 3-vector input with a signature like >> `(3),(3)->(3)`, and for `matmul`, resp.; based on earlier PRs by Jaime >> [2] and Matt (Picus) [3]), I've now made a PR with a further >> enhancement, which allows one can indicate that a core dimension can >> be broadcast [4]. >> >> A particular use case is `all_equal`, a new function suggested in a >> stalled PR by Matt (Harrigan) [5], which compares two arrays >> axis-by-axis, but short-circuits if a non-equality is found (unlike >> what is the case if one does `(a==b).all(axis)`). One thing that would >> be obviously useful for a routine like `all_equal` is to be able to >> provide an array as one argument and a constant as another, i.e., if >> the core dimensions can be broadcast if needed, just like they are in >> `(a==b).all(axis)`. This is currently not possible: with its signature >> of `(n),(n)->()`, the two arrays have to have the same trailing size. >> >> My PR provides the ability to indicate in the signature that a core >> dimension can be broadcast, by using a suffix of "|1". Thus, the >> signature of `all_equal` would become: >> >> ``` >> (n|1),(n|1)->() >> ``` >> >> Comments most welcome (yes, even on the notation - though I think it >> is fairly self-explanatory)! > > I'm currently -0.5 on both fixed dimensions and this broadcasting > dimension idea. My reasoning is: > > - The use cases seem fairly esoteric. For fixed dimensions, I guess > the motivating example is cross-product (are there any others?). But > would it be so bad for a cross-product gufunc to raise an error if it > receives the wrong number of dimensions? For this broadcasting case... > well, obviously we've survived this long without all_equal :-). And > there's something funny about all_equal, since it's really smushing > together two conceptually separate gufuncs for efficiency. Should we > also have all_less_than, sum_square, ...? If this is a big problem, > then wouldn't it be better to solve it in a general way, like dask or > Numba or numexpr do? To be clear, I'm not saying these features are > necessarily *bad* ideas, in isolation -- just that the benefits aren't > very convincing, and there are trade-offs, like: I have often wished numpy had these short-circuiting gufuncs, for a very long time. I specifically remember my fruitless searches for how to do it back to 2007. While "on average" short-circuiting only gives a speedup of 2x, in many situations you can arrange your algorithm so short circuiting will happen early, eg usually in the first 10 elements of a 10^6 element array, giving enormous speedups. Also, I do not imagine these as free-floating ufuncs, I think we can arrange them in a logical way in a gufunc ecosystem. There would be some "core ufuncs", with "associated gufuncs" accessible as attributes. 
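To sketch what I mean in toy python (the class and attribute layout below are purely hypothetical, and real associated gufuncs would short-circuit in C rather than build the full boolean array):

```
import numpy as np

class CoreUfunc:
    """Toy model of a core ufunc carrying associated gufuncs."""

    def __init__(self, ufunc):
        self._ufunc = ufunc

    def __call__(self, *args, **kwargs):
        return self._ufunc(*args, **kwargs)

    # associated "gufuncs" (naive, non-short-circuiting stand-ins)
    def any(self, a, b, axis=-1):
        return self._ufunc(a, b).any(axis=axis)

    def all(self, a, b, axis=-1):
        return self._ufunc(a, b).all(axis=axis)

less = CoreUfunc(np.less)
less.any(np.arange(3), 2)  # -> True, i.e. "any_less_than"
```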
For instance, any_less_than will be accessible as less.any.

Binary "comparison" ufuncs would have attributes:

    less.any
    less.all
    less.first   # returns first matching index
    less.count   # counts matches without intermediate bool array

This adds on to the existing attributes; for instance, ufuncs already have:

    add.reduce
    add.accumulate
    add.reduceat
    add.outer
    add.at

It is unfortunate that all ufuncs currently have these attributes even if they are unimplemented/inappropriate (eg, np.sin.reduce); I would like to remove the inappropriate ones, so each core ufunc will only have the appropriate "associated gufunc" attributes. Incidentally, once we make reduce/accumulate/... into "associated gufuncs", I propose completely removing the "method" argument of __array_ufunc__, since it is no longer needed and adds a lot of complexity which implementors of an __array_ufunc__ are forced to account for. Cheers, Allan > > - When it comes to the core ufunc machinery, we have a limited > complexity budget. I'm nervous that if we add too many bells and > whistles, we'll end up writing ourselves into a corner where we have > trouble maintaining it, where it becomes difficult to predict how > different features interact, it becomes increasingly difficult for > third-parties to handle all the different features in their > __array_ufunc__ methods... > > - And, we have a lot of other demands on the core ufunc machinery, > that might be better places to spend our limited complexity budget. > For example, can we come up with an extension to make np.sort a > gufunc? That seems like a much higher priority than figuring out how > to make all_equal a gufunc. What about refactoring the ufunc machinery > to support user-defined dtypes? That'll need some serious work, and > again, it's probably higher priority than supporting cross-product or > all_equal directly (or at least it seems that way to me). > > Maybe there are more compelling use cases that I'm missing, but as it > is, I feel like trying to add too many features to the current ufunc > machinery is pretty risky for future maintainability, and we shouldn't > do it without really solid use cases. > > -n > From sebastian at sipsolutions.net Thu May 31 09:53:51 2018 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 31 May 2018 15:53:51 +0200 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: <4c3f6e324d579de7edb3369f79d3c59d317a5642.camel@sipsolutions.net> > > > > I'm currently -0.5 on both fixed dimensions and this broadcasting > > dimension idea. My reasoning is: > > > > - The use cases seem fairly esoteric. For fixed dimensions, I guess > > the motivating example is cross-product (are there any others?). > > But > > would it be so bad for a cross-product gufunc to raise an error if > > it > > receives the wrong number of dimensions? For this broadcasting > > case... > > well, obviously we've survived this long without all_equal :-). And > > there's something funny about all_equal, since it's really smushing > > together two conceptually separate gufuncs for efficiency. Should > > we > > also have all_less_than, sum_square, ...? If this is a big problem, > > then wouldn't it be better to solve it in a general way, like dask > > or > > Numba or numexpr do?
To be clear, I'm not saying these features are > > necessarily *bad* ideas, in isolation -- just that the benefits > > aren't > > very convincing, and there are trade-offs, like: > > I have often wished numpy had these short-circuiting gufuncs, for a > very > long time. I specifically remember my fruitless searches for how to > do > it back to 2007. > > While "on average" short-circuiting only gives a speedup of 2x, in > many > situations you can arrange your algorithm so short circuiting will > happen early, eg usually in the first 10 elements of a 10^6 element > array, giving enormous speedups. > Also, I do not imagine these as free-floating ufuncs, I think we can > arrange them in a logical way in a gufunc ecosystem. There would be > some > "core ufuncs", with "associated gufuncs" accessible as attributes. > For > instance, any_less_than will be accessible as less.any > So then, why is it a gufunc and not an attribute using a ufunc with binary output? I have asked this before, and even got arguments as to why it fits gufuncs better, but frankly I still do not really understand. If it is an associated gufunc, why gufunc at all? We need any() and all() here, so that is not that many methods, right? And when it comes to buffering you have much more flexibility. Say I have the operation: (float_arr > int_arr).all(axis=(1, 2)) With int_arr being shaped (2, 1000, 1000) (i.e. large along the interesting axes). A normal gufunc IIRC will get the whole inner dimension as a float buffer. In other words, you gain practically nothing, because the whole int_arr will be cast to float anyway. If, however, you actually implement np.greater_than.all(float_arr, int_arr, axis=(1, 2)) as a separate ufunc method, you would have the freedom to work in the typical cache friendly buffersize chunk size for each of the outer dimensions one at a time. A gufunc would require to say: please do not buffer for me, or implement all possible type combinations to do this. (of course there are memory layout subtleties, since you would have to optimize always for the "fast exit" case, potentially making the worst case scenario much worse -- unless you do seriously fancy stuff anyway). A more general question is actually whether we should rather focus on solving the same problem more generally. For example if `numexpr` would implement all/any reductions, it may be able to pretty simply get the identical tradeoffs with even more flexibility! (I have to admit, it may get tricky with multiple reduction dimensions, etc.) - Sebastian > binary "comparison" ufuncs would have attributes > > less.any > less.all > less.first # returns first matching index > less.count # counts matches without intermediate bool array > > This adds on to the existing attributes, for instance > ufuncs already have: > > add.reduce > add.accumulate > add.reduceat > add.outer > add.at > > It is unfortunate that all ufuncs currently have these attributes > even > if they are unimplemented/inappropriate (eg, np.sin.reduce), I would > like to remove the inappropriate ones, so each core ufunc will only > have the appropriate attribute "associated gufuncs". > > Incidentally, once we make reduce/accumuate/... into "associated > gufuncs", I propose completely removing the "method" argument of > __array_ufunc__, since it is no longer needed and adds a lot > of complexity which implementors of an __array_ufunc__ are forced to > account for. > > Cheers, > Allan > > > > > > > > > > - When it comes to the core ufunc machinery, we have a limited > > complexity budget. 
I'm nervous that if we add too many bells and > > whistles, we'll end up writing ourselves into a corner where we > > have > > trouble maintaining it, where it becomes difficult to predict how > > different features interact, it becomes increasingly difficult for > > third-parties to handle all the different features in their > > __array_ufunc__ methods... > > > > - And, we have a lot of other demands on the core ufunc machinery, > > that might be better places to spend our limited complexity budget. > > For example, can we come up with an extension to make np.sort a > > gufunc? That seems like a much higher priority than figuring out > > how > > to make all_equal a gufunc. What about refactoring the ufunc > > machinery > > to support user-defined dtypes? That'll need some serious work, and > > again, it's probably higher priority than supporting cross-product > > or > > all_equal directly (or at least it seems that way to me). > > > > Maybe there are more compelling use cases that I'm missing, but as > > it > > is, I feel like trying to add too many features to the current > > ufunc > > machinery is pretty risky for future maintainability, and we > > shouldn't > > do it without really solid use cases. > > > > -n > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From m.h.vankerkwijk at gmail.com Thu May 31 09:58:44 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Thu, 31 May 2018 09:58:44 -0400 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: > Incidentally, once we make reduce/accumuate/... into "associated gufuncs", I > propose completely removing the "method" argument of __array_ufunc__, since > it is no longer needed and adds a lot > of complexity which implementors of an __array_ufunc__ are forced to > account for. For Quantity at least I found it somewhat helpful to have the method separate from the ufunc, as how one deals with it still has similarities. Though it would probably have been similarly easy if `__array_ufunc__` had been passed `np.add.reduce` directly. Aside: am not sure we can so easily remove `method` any more, but I guess we can at least start getting to a state where it always is `__call__`. Anyway, this does argue we should be rather careful with signatures.. In particular, for `__array_function__` we really need to think carefully about `types` - an `__array_function__` specific dict at the end may be more useful. All the best, Marten From einstein.edison at gmail.com Thu May 31 09:58:44 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Thu, 31 May 2018 09:58:44 -0400 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: While "on average" short-circuiting only gives a speedup of 2x, in many situations you can arrange your algorithm so short circuiting will happen early, eg usually in the first 10 elements of a 10^6 element array, giving enormous speedups. Also, I do not imagine these as free-floating ufuncs, I think we can arrange them in a logical way in a gufunc ecosystem. There would be some "core ufuncs", with "associated gufuncs" accessible as attributes. 
For instance, any_less_than will be accessible as less.any.

Binary "comparison" ufuncs would have attributes:

    less.any
    less.all
    less.first   # returns first matching index
    less.count   # counts matches without intermediate bool array

This adds on to the existing attributes; for instance, ufuncs already have:

    add.reduce
    add.accumulate
    add.reduceat
    add.outer
    add.at

It is unfortunate that all ufuncs currently have these attributes even if they are unimplemented/inappropriate (eg, np.sin.reduce); I would like to remove the inappropriate ones, so each core ufunc will only have the appropriate "associated gufunc" attributes. I'm definitely in favour of all this. It'd be great to have this, and it'd be an excellent ecosystem. I'll add that composing ufuncs is something I've wanted, and that has come up from time to time. Incidentally, once we make reduce/accumulate/... into "associated gufuncs", I propose completely removing the "method" argument of __array_ufunc__, since it is no longer needed and adds a lot of complexity which implementors of an __array_ufunc__ are forced to account for. While removing "method" is okay in my book, there should at least be a way to detect if something is, e.g., a reduction, or an element-wise ufunc (this one is obvious, all shapes involved will be ()). We, for example, use this in pydata/sparse. As you can imagine, for sparse arrays, element-wise operations behave a certain way and there turns out to be a general way to do reductions that have certain properties as well. See my paper's draft [1] for details. I don't mind the __array_ufunc__ api changing, but I'd like it if there was a way to still access the information that was previously available. [1] https://github.com/scipy-conference/scipy_proceedings/pull/388 Regards, Hameer Abbasi Sent from Astro for Mac -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Thu May 31 10:06:17 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Thu, 31 May 2018 10:06:17 -0400 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: Hi Sebastian, This is getting a bit far off-topic (which is whether it is a good idea to allow the ability to set frozen dimensions and broadcasting), but on `all_equal`, I definitely see the point that a method might be better, but that needs work: to expand the normal ufunc mechanism to allow the inner loop to carry state (for which there is an issue [1]) and for it to tell the iterator to stop feeding it buffers. It is not quite clear to me that this is better/easier than expanding gufuncs to be able to deal with buffers... All the best, Marten [1] https://github.com/numpy/numpy/issues/8773 From allanhaldane at gmail.com Thu May 31 10:34:14 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Thu, 31 May 2018 10:34:14 -0400 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: <4c3f6e324d579de7edb3369f79d3c59d317a5642.camel@sipsolutions.net> References: <4c3f6e324d579de7edb3369f79d3c59d317a5642.camel@sipsolutions.net> Message-ID: <1c78ccce-7796-f78d-5499-c2abd8ec2d1a@gmail.com> On 05/31/2018 09:53 AM, Sebastian Berg wrote: > > >> Also, I do not imagine these as free-floating ufuncs, I think we can >> arrange them in a logical way in a gufunc ecosystem. There would be >> some >> "core ufuncs", with "associated gufuncs" accessible as attributes.
>> For >> instance, any_less_than will be accessible as less.any >> > > So then, why is it a gufunc and not an attribute using a ufunc with > binary output? I have asked this before, and even got arguments as to > why it fits gufuncs better, but frankly I still do not really > understand. > > If it is an associated gufunc, why gufunc at all? We need any() and > all() here, so that is not that many methods, right? And when it comes > to buffering you have much more flexibility. > > Say I have the operation: > > (float_arr > int_arr).all(axis=(1, 2)) > > With int_arr being shaped (2, 1000, 1000) (i.e. large along the > interesting axes). A normal gufunc IIRC will get the whole inner > dimension as a float buffer. In other words, you gain practically > nothing, because the whole int_arr will be cast to float anyway. > > If, however, you actually implement np.greater_than.all(float_arr, > int_arr, axis=(1, 2)) as a separate ufunc method, you would have the > freedom to work in the typical cache friendly buffersize chunk size for > each of the outer dimensions one at a time. A gufunc would require to > say: please do not buffer for me, or implement all possible type > combinations to do this. > (of course there are memory layout subtleties, since you would have to > optimize always for the "fast exit" case, potentially making the worst > case scenario much worse -- unless you do seriously fancy stuff > anyway). > > A more general question is actually whether we should rather focus on > solving the same problem more generally. > For example if `numexpr` would implement all/any reductions, it may be > able to pretty simply get the identical tradeoffs with even more > flexibility! (I have to admit, it may get tricky with multiple > reduction dimensions, etc.) > > - Sebastian Hmm, I hadn't known/considered the limitations of gufunc buffer sizes. I was just thinking of them as a standardized interface which handles the where/out/broadcasting for you. I'll have to read about it. One thing I don't like about the ufunc-method strategy is that it easily pollutes all the ufuncs' namespaces and their implementations, so many ufuncs have to account for a new "all" method even if inappropriate, for example. Cheers, Allan From allanhaldane at gmail.com Thu May 31 11:23:38 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Thu, 31 May 2018 11:23:38 -0400 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: On 05/31/2018 12:10 AM, Nathaniel Smith wrote: > On Wed, May 30, 2018 at 11:14 AM, Marten van Kerkwijk > - When it comes to the core ufunc machinery, we have a limited > complexity budget. I'm nervous that if we add too many bells and > whistles, we'll end up writing ourselves into a corner where we have > trouble maintaining it, where it becomes difficult to predict how > different features interact, it becomes increasingly difficult for > third-parties to handle all the different features in their > __array_ufunc__ methods... Re: implementation complexity, I just want to bring up multiple-dispatch signatures again, where the new signature syntax would just be to join some signatures together with "|", and try them in order until one works. I'm not convinced it's better myself, I just wanted to make sure we are aware of it. The translation from the current proposed syntax would be:

    Current Syntax              Multiple-dispatch syntax

    (n|1),(n|1)->()          <===>  (n),(n)->() | (n),()->() | (),(n)->()

    (m?,n),(n,p?)->(m?,p?)
                             <===>  (m,n),(n,p)->(m,p) |
                                    (n),(n,p)->(p) |
                                    (m,n),(n)->(m) |
                                    (n),(n)->()

Conceivably, multiple-dispatch could reduce code complexity because we don't need all the special flags like UFUNC_CORE_DIM_CAN_BROADCAST, and instead of handling special syntax for ? and | and any future syntax separately, we just need a split("|") and then loop with the old signature handling code. On the other hand the m-d signatures are much less concise and the intention is perhaps harder to read. Yet they more explicitly state which combinations are allowed, while with '?' syntax you might have to puzzle it out. Cheers, Allan From ralf.gommers at gmail.com Thu May 31 12:02:22 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 31 May 2018 09:02:22 -0700 Subject: [Numpy-discussion] Where to discuss NEPs (was: Re: new NEP: np.AbstractArray and np.asabstractarray) In-Reply-To: References: Message-ID: On Tue, May 29, 2018 at 9:24 AM, Charles R Harris wrote: > > > On Tue, May 29, 2018 at 9:46 AM, Stephan Hoyer wrote: > >> Reviving this discussion -- >> I don't really care what our policy is, but can we make a decision one >> way or the other about where we discuss NEPs? We've had a revival of NEP >> writing recently, so this is very timely. >> >> Previously, I was in slight favor of doing discussion on GitHub. Now that >> I've started doing a bit of NEP writing, I've started to swing the other >> way, since it would be nice to be able to reference draft/rejected NEPs in >> a consistent way -- and rendered HTML is more readable than raw RST in pull >> requests. >> > > My understanding of the discussion at the sprint was that we favored quick > commits of NEPs with extended discussions of them on the list. Updates and > changes would go in through the normal PR process. In practice, I expect > there will be some overlap, I think the important thing is the quick commit > with the understanding that the NEPs are only proposals until formally > adopted. I think the formal adoption process is not well defined... > For the formal adoption part, how about:

1. When discussions/disagreements appear to have been resolved, a NEP author or a core developer may propose that the NEP is formally adopted.

2. The formal decision is made by consensus, according to https://docs.scipy.org/doc/numpy/dev/governance/governance.html#consensus-based-decision-making-by-the-community (which also covers how to handle consensus not being reached).

Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Thu May 31 12:11:58 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Thu, 31 May 2018 12:11:58 -0400 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: Hi Allan, Seeing it written out like that, I quite like the multiple dispatch signature: perhaps verbose, but clear.
It does mean a different way of changing the ufunc structure, but I actually think it has the advantage of being possible without extending the structure (though that might still be needed for frozen dimensions...): currently, the relevant parts of _tagPyUFuncObject are:

```
/* 0 for scalar ufunc; 1 for generalized ufunc */
int core_enabled;
/* number of distinct dimension names in signature */
int core_num_dim_ix;
/* numbers of core dimensions of each argument */
int *core_num_dims;
/* dimension indices in a flatted form */
int *core_dim_ixs;
/* positions of 1st core dimensions of each argument in core_dim_ixs */
int *core_offsets;
```

I think this could be changed without making any serious change if we slightly repurposed `core_enabled` as meaning `number of signatures`. Then, the pointers above could become

```
int core_xxx[num_sigs][max_core_num_dim_ixs];
```

Less easy is `core_num_dim_ix`, but that number is not really needed anyway (nor used much in the code) as it is implicitly encoded in the indices. It could quite logically become the `max_core_num_dim_ixs` above. All the best, Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Thu May 31 12:21:43 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Thu, 31 May 2018 12:21:43 -0400 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: p.s. While my test case of `cube_equal` in the PR is perhaps not super-realistic, I don't know if one really wants to do multiple dispatch on something like "(o|1,n|1,m|1),(o|1,n|1,m|1)->()"... -- Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Thu May 31 12:39:18 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Thu, 31 May 2018 12:39:18 -0400 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: p.p.s. Your multiple dispatch signature for broadcasted dimensions is actually not quite right: should be

(n|1),(n|1)->() ===> (n),(n)->() | (n),(1)->() | (1),(n)->() | (n),()->() | (),(n)->()

This is becoming quite verbose... (and perhaps becoming somewhat slow). Though arguably one could always allow missing dimensions to be 1 for this case, so that one could drop the last two. -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu May 31 13:18:06 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 31 May 2018 11:18:06 -0600 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: On Thu, May 31, 2018 at 10:39 AM, Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > p.p.s. Your multiple dispatch signature for broadcasted dimensions is > actually not quite right: should be > > (n|1),(n|1)->() ===> (n),(n)->() | (n),(1)->() | (1),(n)->() | (n),()->() | (),(n)->() > > This is becoming quite verbose... (and perhaps becoming somewhat slow). > Though arguably one could always allow missing dimensions to be 1 for this > case, so that one could drop the last two. > > At some point there should be a formal syntax description for these things. Chuck -------------- next part -------------- An HTML attachment was scrubbed...
URL: From matti.picus at gmail.com Thu May 31 13:53:06 2018 From: matti.picus at gmail.com (Matti Picus) Date: Thu, 31 May 2018 10:53:06 -0700 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: <2732bdc2-2528-4b5e-5595-994718170796@gmail.com> On 31/05/18 08:23, Allan Haldane wrote: > Re: implementation complexity, I just want to bring up multiple-dispatch > signatures again, where the new signature syntax would just be to join > some signatures together with "|", and try them in order until one works. > > I'm not convinced it's better myself, I just wanted to make sure we > are aware of it. The translation from the current proposed syntax > would be:
>
>     Current Syntax              Multiple-dispatch syntax
>
>     (n|1),(n|1)->()          <===>  (n),(n)->() | (n),()->() | (),(n)->()
>
>     (m?,n),(n,p?)->(m?,p?)   <===>  (m,n),(n,p)->(m,p) |
>                                     (n),(n,p)->(p) |
>                                     (m,n),(n)->(m) |
>                                     (n),(n)->()
>
> ...
> Cheers,
> Allan

I am -1 on multiple signatures. We may revisit this in time, but for now I find the minimal intrusiveness of the current changes appealing, especially as it requires few to no changes whatsoever to the inner loop function. Multiple dispatch could easily break that model by allowing very different signatures to be aggregated into a single ufunc, leading to unhandled edge cases and strange segfaults. It also seems to me that looping over all signatures might slow down ufunc calling, leading to endless variations of strategies of optimizing signature ordering. Matti From shoyer at gmail.com Thu May 31 13:55:53 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Thu, 31 May 2018 10:55:53 -0700 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: On Thu, May 31, 2018 at 4:21 AM Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > I think the case for frozen dimensions is much more solid than just > `cross1d` - there are many operations that work on size-3 vectors. > Indeed, as I noted in the PR, I have just been wrapping a > Standards-of-Astronomy library in gufuncs, and many of its functions > require size-3 vectors or 3x3 matrices [1]. Of course, I can put > checks on the sizes, and I've now done that in a custom type resolver > (which I needed anyway since, as you say, user dtypes is currently not > easy), but there is a real problem for functions that take scalars and > produce vectors: with a signature like `(),()->(n)`, I am forced to > pass in an output with size 3, which is very inconvenient (especially > if I then also want to override with `__array_ufunc__` - now my > Quantity implementation also has to start changing an output already > put in). So, having frozen dimensions is definitely helpful for > developers of new gufuncs. > I agree that the use-cases for frozen dimensions are well motivated. It's not as common as writing code that supports arbitrary dimensions, but given that the real world is three dimensional it comes up with some regularity. Certainly for these use-cases it would add significant value (not requiring pre-allocation of output arrays). Furthermore, with frozen dimensions, the signature is not just > immediately clear, `(),()->(3)` for the example above, it is also > better in telling users about what a function does. > Yes, frozen dimensions really do feel like a natural fit.
There is no ambiguity about what an integer means in a gufunc signature, so the complexity of the gufunc model (for users and __array_ufunc__ implementors) would remain roughly fixed. In contrast, broadcasting would certainly increase the complexity of the model, as evidenced by the new syntax we would need. This may or may not be justified. Currently I am at -0.5 along with Nathaniel here. > Indeed, I think this addition has much more justification than the `?` > which is much more complex than the fixed size, yet neither > particularly clear nor useful beyond the single purpose of matmul. (It > is just that that single purpose has fairly high weight...) Agreed, though at least in principle there is the slightly broader use case of handling arguments that are either matrices or column/row vectors. -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Thu May 31 14:00:06 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Thu, 31 May 2018 11:00:06 -0700 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: On Wed, May 30, 2018 at 5:01 PM Matthew Harrigan wrote: > "short-cut to automatically return False if m != n", that seems like a > silent bug > I guess it depends on the use-cases. This is how np.array_equal() works: https://docs.scipy.org/doc/numpy/reference/generated/numpy.array_equal.html We could even imagine incorporating this hypothetical "equality along some axes with broadcasting" functionality into axis/axes arguments for array_equal() if we choose this behavior. -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Thu May 31 16:14:02 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Thu, 31 May 2018 16:14:02 -0400 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: <2732bdc2-2528-4b5e-5595-994718170796@gmail.com> References: <2732bdc2-2528-4b5e-5595-994718170796@gmail.com> Message-ID: > > I am -1 on multiple signatures. We may revisit this in time, but for now I > find the minimal intrusiveness of the current changes appealing, especially > as it requires few to no changes whatsoever to the inner loop function. > Multiple dispatch could easily break that model by allowing very different > signatures to be aggregated into a single ufunc, leading to unhandled edge > cases and strange segfaults. It also seems to me that looping over all > signatures might slow down ufunc calling, leading to endless variations of > strategies of optimizing signature ordering. I had actually started trying Allan's suggestion [1], and at least parsing is not difficult. But I will stop now, as I think your point about the inner loop really needing a fixed set of sizes and strides is deadly for the whole idea. (Just goes to show I should think before writing code!) As is, all the proposed changes do is fiddle with size 1 axes (well, and defining a fixed size rather than letting the operand do it), which of course doesn't matter for the inner loop. -- Marten [1] https://github.com/numpy/numpy/compare/master...mhvk:gufunc-multiple-dispatch?expand=1 -------------- next part -------------- An HTML attachment was scrubbed...
URL: From allanhaldane at gmail.com Thu May 31 16:17:54 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Thu, 31 May 2018 16:17:54 -0400 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: <2732bdc2-2528-4b5e-5595-994718170796@gmail.com> Message-ID: On 05/31/2018 04:14 PM, Marten van Kerkwijk wrote: > I am -1 on multiple signatures. We may revisit this in time, but for > now I find the minimal intrusiveness of the current changes > appealing, especially as it requires few to no changes whatsoever to > the inner loop function. Multiple dispatch could easily break that > model by allowing very different signatures to be aggregated into a > single ufunc, leading to unhandled edge cases and strange segfaults. > It also seems to me that looping over all signatures might slow down > ufunc calling, leading to endless variations of strategies of > optimizing signature ordering. > > > I had actually started trying Allan's suggestion [1], and at least > parsing is not difficult. But I will stop now, as I think your point > about the inner loop really needing a fixed set of sizes and strides is > deadly for the whole idea. (Just goes to show I should think before > writing code!) > > As is, all the proposed changes do is fiddle with size 1 axes (well, and > defining a fixed size rather than letting the operand do it), which of > course doesn't matter for the inner loop. Yes, after seeing how complicated some of the signatures become, I think I'm more convinced the custom syntax is better. Allan From njs at pobox.com Thu May 31 18:02:52 2018 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 31 May 2018 15:02:52 -0700 Subject: [Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs In-Reply-To: References: Message-ID: On Thu, May 31, 2018 at 4:20 AM, Marten van Kerkwijk wrote: > Hi Nathaniel, > > I think the case for frozen dimensions is much more solid than just > `cross1d` - there are many operations that work on size-3 vectors. > Indeed, as I noted in the PR, I have just been wrapping a > Standards-of-Astronomy library in gufuncs, and many of its functions > require size-3 vectors or 3x3 matrices [1]. Of course, I can put > checks on the sizes, and I've now done that in a custom type resolver > (which I needed anyway since, as you say, user dtypes is currently not > easy), but there is a real problem for functions that take scalars and > produce vectors: with a signature like `(),()->(n)`, I am forced to > pass in an output with size 3, which is very inconvenient (especially > if I then also want to override with `__array_ufunc__` - now my > Quantity implementation also has to start changing an output already > put in). So, having frozen dimensions is definitely helpful for > developers of new gufuncs. Ah, this does sound like I'm missing something. I suspect this is a situation where we have two problems:

- For some people the use cases are everyday and obvious; for others they're things we've never heard of (what's a "standard of astronomy"?)

- The discussion is scattered around mailing list posts, old comments on random github issues, etc. This makes it hard for everyone to be on the same page.

But this is exactly the situation where NEPs are useful. Maybe you could write up a short NEP for frozen dimensions? It doesn't need to be fancy or take long, but I think it'd be useful to have a single piece of text we can all look at that describes the use cases and how frozen dimensions help.
BTW, regarding output shape: as you hint, there's a similar problem with parametrized dtypes in general. Consider defining a loop for np.add that lets it concatenate strings. If the inputs are S4 and S5, then the output should be S9 -- but how does the ufunc machinery know that? This suggests that when we do the big refactor to ufuncs to support user-defined and parametrized dtypes in general, one of the things we'll need is a way for an individual loop to select the output dtype. One natural way to do this would be to have two callbacks per loop: one that receives the input dtypes, and returns the output dtypes, and then the other that's like the current loop callback that actually performs the operation. Output shape feels very similar to output dtype to me, so maybe the general way to handle this would be to make the first callback take the input shapes+dtypes and return the desired output shapes+dtypes? Maybe frozen dimensions are a good idea regardless, but just wanted to put that out there since it might be a more general solution. > Furthermore, with frozen dimensions, the signature is not just > immediately clear, `(),()->(3)` for the example above, it is also > better in telling users about what a function does. > > Indeed, I think this addition has much more justification than the `?` > which is much more complex than the fixed size, yet neither > particularly clear nor useful beyond the single purpose of matmul. (It > is just that that single purpose has fairly high weight...) Yeah, that's why I'm not 100% happy with '?' either (even though I proposed it in the first place :-)). But matmul is like, arguably the single most important operation in numpy, so it can justify a lot more... -n -- Nathaniel J. Smith -- https://vorpus.org From matti.picus at gmail.com Thu May 31 19:50:02 2018 From: matti.picus at gmail.com (Matti Picus) Date: Thu, 31 May 2018 16:50:02 -0700 Subject: [Numpy-discussion] A roadmap for NumPy - longer term planning Message-ID: <69cf3275-26f3-deec-d499-b56204d96c60@gmail.com> At the recent NumPy sprint at BIDS (thanks to those who made the trip) we spent some time brainstorming about a roadmap for NumPy, in the spirit of similar work that was done for Jupyter. The idea is that a document with wide community acceptance can guide the work of the full-time developer(s), and be a source of ideas for expanding development efforts. I put the document up at https://github.com/numpy/numpy/wiki/NumPy-Roadmap, and hope to discuss it at a BOF session during SciPy in the middle of July in Austin. Eventually it could become a NEP or formalized in another way. Matti