From miles.cranmer at gmail.com Mon Oct 1 14:36:14 2018 From: miles.cranmer at gmail.com (Miles Cranmer) Date: Mon, 1 Oct 2018 14:36:14 -0400 Subject: [Numpy-discussion] Fwd: Performance feature for np.isin and np.in1d In-Reply-To: References: Message-ID: (Not sure what the right list is for this) Hi, I have started a PR for a "fast_integers" flag for np.isin and np.in1d which greatly increases performance when both arrays are integral. It works by creating a boolean array with elements set to 1 at the values present in the parent array (ar2) and 0 otherwise. This array is then indexed by the child array (ar1) to create the output. https://github.com/numpy/numpy/pull/12065 Thoughts on this? Please let me know if you have any questions about my addition. Thank you. Best regards, Miles -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Oct 4 17:30:52 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 4 Oct 2018 15:30:52 -0600 Subject: [Numpy-discussion] Deactivated appveyor Message-ID: Hi All, This is just to notify everyone making PRs on github that I have deactivated the appveyor webhook now that azure testing seems to be working for the windows tests. Azure is much faster, and I expect that travis or one of the other platforms will become the testing bottleneck. I think the new tests can be activated by closing/opening PRs, and maybe it will happen automatically on updates; we will see. Finding details of failing tests is a bit of a hassle, but there is an obscure button at the bottom of the default details page that you can click for actual details, although you will still need to hunt around to find the pipeline. This is still somewhat experimental, so post your feedback and complaints here. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
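The lookup-table trick from the np.isin/np.in1d thread above can be sketched in a few lines. This is a hypothetical simplification, not the code in the PR; the function name and the handling of negative and out-of-range values are assumptions for illustration:

```python
import numpy as np

def isin_int_sketch(ar1, ar2):
    # Sketch of the boolean lookup-table idea: mark every value present
    # in the parent array ar2, then index the table by the child array ar1.
    ar1 = np.asarray(ar1)
    ar2 = np.asarray(ar2)
    if ar2.size == 0:
        return np.zeros(ar1.shape, dtype=bool)
    lo, hi = ar2.min(), ar2.max()
    table = np.zeros(hi - lo + 1, dtype=bool)
    table[ar2 - lo] = True
    # Values outside [lo, hi] cannot possibly be in ar2.
    in_range = (ar1 >= lo) & (ar1 <= hi)
    result = np.zeros(ar1.shape, dtype=bool)
    result[in_range] = table[ar1[in_range] - lo]
    return result

print(isin_int_sketch([1, 5, 7, 9], [5, 9]))  # [False  True False  True]
```

The speedup comes from replacing the sort-based set logic of `in1d` with O(1) table lookups, at the cost of allocating a table proportional to the value range of ar2.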
URL: From matti.picus at gmail.com Fri Oct 5 04:31:20 2018 From: matti.picus at gmail.com (Matti Picus) Date: Fri, 5 Oct 2018 11:31:20 +0300 Subject: [Numpy-discussion] Adding a hex version like PY_VERSION_HEX Message-ID: In PR 12074 https://github.com/numpy/numpy/pull/12074 I propose adding a function `version.get_numpy_version_as_hex()` which returns a hex value to represent the current NumPy version MAJOR.MINOR.MICRO where v = hex(MAJOR << 24 | MINOR << 16 | MICRO) so the current 1.15.0 would become '0x10f0000'. I also made this available via C through `get_hex_version`. The hex version is based on the PY_VERSION_HEX macro from CPython. Currently we have an ABI version and an API version for the numpy C-API. We only increment those for updated or breaking changes in the NumPy C-API, but not for - changes in behavior, especially in python code - changes in sizes of outward-facing structures like PyArray_Descr Occasionally it is desirable to determine backward compatibility from the runtime version, rather than from the ABI or API versions, and having it as a single value makes the comparison in C easy. For instance this may be convenient when there is suspicion that older header files may have been used to create or manipulate an object directly in C (or via a cython optimization), and we want to verify the version used to create the object, or when we may want to verify de-serialized objects. The `numpy.lib._version.NumpyVersion` class enables version comparison in python, but I would prefer a single value that can be stored in a C struct as an integer type. Since this is an enhancement proposal, I am bringing the idea to the mailing list for reactions. 
Matti From Jerome.Kieffer at esrf.fr Fri Oct 5 04:46:02 2018 From: Jerome.Kieffer at esrf.fr (Jerome Kieffer) Date: Fri, 5 Oct 2018 10:46:02 +0200 Subject: [Numpy-discussion] Adding a hex version like PY_VERSION_HEX In-Reply-To: References: Message-ID: <20181005104602.5c2c971c@lintaillefer.esrf.fr> On Fri, 5 Oct 2018 11:31:20 +0300 Matti Picus wrote: > In PR 12074 https://github.com/numpy/numpy/pull/12074 I propose adding a > function `version.get_numpy_version_as_hex()` which returns a hex value > to represent the current NumPy version MAJOR.MINOR.MICRO where > > v = hex(MAJOR << 24 | MINOR << 16 | MICRO) +1 We use it in our code and it is a good practice, much better than 0.9.0 > 0.10.0! We added some support for dev, alpha, beta, RC and final versions in https://github.com/silx-kit/silx/blob/master/version.py Cheers, -- Jérôme Kieffer From matti.picus at gmail.com Sun Oct 7 02:24:22 2018 From: matti.picus at gmail.com (Matti Picus) Date: Sun, 7 Oct 2018 09:24:22 +0300 Subject: [Numpy-discussion] Adding a hex version like PY_VERSION_HEX In-Reply-To: <20181005104602.5c2c971c@lintaillefer.esrf.fr> References: <20181005104602.5c2c971c@lintaillefer.esrf.fr> Message-ID: On 05/10/18 11:46, Jerome Kieffer wrote: > On Fri, 5 Oct 2018 11:31:20 +0300 > Matti Picus wrote: > >> In PR 12074 https://github.com/numpy/numpy/pull/12074 I propose adding a >> function `version.get_numpy_version_as_hex()` which returns a hex value >> to represent the current NumPy version MAJOR.MINOR.MICRO where >> >> v = hex(MAJOR << 24 | MINOR << 16 | MICRO) > +1 > > We use it in our code and it is a good practice, much better than 0.9.0 > 0.10.0! > > We added some support for dev, alpha, beta, RC and final versions in > https://github.com/silx-kit/silx/blob/master/version.py > > Cheers, Thanks. I think at this point I will change the proposal to v = hex(MAJOR << 24 | MINOR << 16 | MICRO << 8) which leaves room for future enhancement with "release level" and "serial" as the lower bits. 
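The revised packing scheme from this thread can be sanity-checked with a short sketch. The function name here is illustrative, not the API proposed in the PR:

```python
# Pack MAJOR.MINOR.MICRO into a single integer, leaving the low 8 bits
# free for a future "release level"/"serial", per the revised proposal.
def version_hex(major, minor, micro):
    return major << 24 | minor << 16 | micro << 8

# 1.15.0 still packs to 0x10f0000, matching the value in the original message
assert hex(version_hex(1, 15, 0)) == '0x10f0000'

# Integer comparison orders versions correctly...
assert version_hex(0, 10, 0) > version_hex(0, 9, 0)
# ...whereas naive string comparison gets it backwards
assert '0.10.0' < '0.9.0'
```

This is exactly the property Jerome points to: a single integer compares correctly in both C and Python, where version strings do not.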
Matti From mark.harfouche at gmail.com Sun Oct 7 10:32:11 2018 From: mark.harfouche at gmail.com (Mark Harfouche) Date: Sun, 7 Oct 2018 10:32:11 -0400 Subject: [Numpy-discussion] ndrange, like range but multidimensiontal Message-ID: Hi All, I've been using numpy array objects to store collections of 2D (and soon ND) variables. When iterating through these collections, I often found it useful to use `ndindex`, which in `for` loops behaves much like `range` with only a `stop` parameter. That said, a few features that are now present in `range` are missing from `ndindex`, most notably the ability to iterate over a subset of the indices. I found myself often writing `itertools.product(range(1, data.shape[0]), range(3, data.shape[2]))` for custom iterations. While it does flatten out the for loop, it is arguably less readable than having 1 or 2 levels of nested for loops. It is quite possible that `nditer` would solve my problems, but unfortunately I am still not able to make sense of the numerous options it has. I propose an `ndrange` class that can be used to iterate over nd-collections mimicking the API of `range` as much as possible and adapting it to the ND case (i.e. returning tuples instead of singletons). Since this is an enhancement proposal, I am bringing the idea to the mailing list for reactions. The implementation in this PR https://github.com/numpy/numpy/pull/12094 is based on keeping track of a tuple of python `range` objects. The `__iter__` method returns the result of `itertools.product(*self._ranges)` By leveraging python's `range` implementation, operations like containment, `index`, `reversed`, equality and most importantly slicing of the ndrange object are possible to offer to the general numpy audience. For example, iterating through a 2D collection but avoiding indexing the first and last column used to look like this: ``` c = np.empty((4, 4), dtype=object) # ... 
compute on c for j in range(c.shape[0]): for i in range(1, c.shape[1]-1): c[j, i] # = compute on c[j, i] that depends on the index i, j ``` With `np.ndrange` it can look something like this: ``` c = np.empty((4, 4), dtype=object) # ... compute on c for i in np.ndrange(c.shape)[:, 1:-1]: c[i] # = some operation on c[i] that depends on the index i ``` very pythonic, very familiar to numpy users Thank you for the feedback, Mark References: An issue requesting expansion to the ndindex API on github: https://github.com/numpy/numpy/issues/6393 -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Sun Oct 7 17:20:11 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Sun, 7 Oct 2018 17:20:11 -0400 Subject: [Numpy-discussion] ndrange, like range but multidimensiontal In-Reply-To: References: Message-ID: On 10/07/2018 10:32 AM, Mark Harfouche wrote: > With `np.ndrange` it can look something like this: > > ``` > c = np.empty((4, 4), dtype=object) > # ... compute on c > for i in np.ndrange(c.shape)[:, 1:-1]: >     c[i] # = some operation on c[i] that depends on the index i > ``` > > very pythonic, very familiar to numpy users So if I understand, this does the same as `np.ndindex` but allows numpy-like slicing of the returned generator object, as requested in #6393. I don't like the duplication in functionality between ndindex and ndrange here. Better rather to add the slicing functionality to ndindex, than create a whole new nearly-identical function. np.ndindex is already a somewhat obscure and discouraged method since it is usually better to find a vectorized numpy operation instead of a for loop, and I don't like adding more obscure functions. But as an improvement to np.ndindex, I think adding this functionality seems good if it can be nicely implemented. Maybe there is a way to use the same optimization tricks as in the current implementation of ndindex but allow different stop/step? 
A simple wrapper of ndindex? Cheers, Allan From mark.harfouche at gmail.com Mon Oct 8 12:21:40 2018 From: mark.harfouche at gmail.com (Mark Harfouche) Date: Mon, 8 Oct 2018 12:21:40 -0400 Subject: [Numpy-discussion] ndrange, like range but multidimensiontal Message-ID: Allan, Sorry for the delay. I had my mailing list preferences set to digest. I changed them for now. (I hope this message continues that thread). Thank you for your feedback. You are correct in identifying that the real feature is expanding the `ndindex` API to support slicing. See comments about the separate points you raised below ## Expanding the API of ndindex > Better rather to add the slicing functionality to ndindex, than create a whole new nearly-identical function. This is a very important point. I should have included a note about it. My [first attempt]( https://github.com/hmaarrfk/numpy/pull/1/files#diff-1bd953557a98073031ce66d05dbde3c8R663) did try that approach. I ran into 2 issues: 1. Getting around the catch-all positional argument is annoying, and logic to do that will likely be error prone. Peculiarities about how we implement it might cause some very strange behaviour for `tuple-like` inputs that we don't expect. 2. `ndindex` is an iterator itself. As proposed, `ndrange`, like `range`, is not an iterator. Changing this behaviour would likely lead to breaking code that uses that assumption. For example anybody using introspection or code like: ``` indx = np.ndindex(5, 5) next(indx) # Don't look at the (0, 0) coordinate for i in indx: print(i) ``` would break if `ndindex` becomes "not an iterator" For these two reasons, I thought it was easier to simply have a new class that seems like a close sibling to `ndindex`. I personally don't care about point 1 so much. In my mind, start, stop and step are confusing in ND, but maybe some might find them useful? Point 1 also makes it harder to make `ndrange` more familiar to `range` users. 
> I don't like adding more obscure functions Hopefully the name `ndrange` makes it easier to find? ## Writing vectorized code > np.ndindex is already a somewhat obscure and discouraged method since it is usually better to find a vectorized numpy operation instead of a for loop I understand that this kind of function is not focused on `numerical` operations on the elements of the matrix itself. It really is there to help fill the void of any useful multi-dimensional python container. I think `ndrange`/`ndindex` is there to be used like `np.vectorize`. I've tried to use `np.vectorize` in my own code, but quickly found that making logic fit into vectorize's requirements was often more complicated than writing my own multi-nested loops. In my opinion, nested `range` loops or `ndrange`/`ndindex` are a much more natural way to loop over collections compared to `np.vectorize`. I'm glad to add warnings to the docs. ## Implementation detail: itertools.product + range vs nditer > Maybe there is a way to use the same optimization tricks as in the current implementation of ndindex but allow different stop/step? My primary goal here is to make `ndrange` behave much like `range`. By implementing it on top of `range`, it makes it obvious to me how to enforce that behaviour as the API of range gets expanded (though it seems to have settled since Python 3.3). Whatever we decide to call `ndrange`/`ndindex`, the tests I wrote can help ensure we have good range-API coverage (for now). itertools.product + range seems to be much faster than the current implementation of ndindex (python 3.6) ``` %%timeit for i in np.ndindex(100, 100): pass 3.94 ms ± 19.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) %%timeit import itertools for i in itertools.product(range(100), range(100)): pass 231 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) ``` -------------- next part -------------- An HTML attachment was scrubbed... 
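For readers who want to see the shape of the idea, a minimal sketch along the lines Mark describes follows. It is hypothetical and far simpler than the actual PR (the class name is made up, and it assumes a full-length tuple of slices when indexing):

```python
import itertools

class NDRangeSketch:
    """A tuple of `range` objects supporting iteration and per-axis slicing."""

    def __init__(self, shape):
        self._ranges = tuple(range(n) for n in shape)

    def __iter__(self):
        # Iterate in C order, yielding index tuples
        return itertools.product(*self._ranges)

    def __getitem__(self, key):
        # Slice each axis with the corresponding slice; `key` is assumed
        # to contain one slice per axis (no integer indexing in this sketch).
        if not isinstance(key, tuple):
            key = (key,)
        sliced = NDRangeSketch(())
        sliced._ranges = tuple(r[k] for r, k in zip(self._ranges, key))
        return sliced

# Skip the first and last column, as in Mark's example
print(list(NDRangeSketch((3, 4))[:, 1:-1]))
# [(0, 1), (0, 2), (1, 1), (1, 2), (2, 1), (2, 2)]
```

Because slicing a `range` returns another `range`, the sliced object stays lazy, and containment, equality and `reversed` come along for free from the underlying `range` objects.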
URL: From allanhaldane at gmail.com Mon Oct 8 15:33:18 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Mon, 8 Oct 2018 15:33:18 -0400 Subject: [Numpy-discussion] ndrange, like range but multidimensiontal In-Reply-To: References: Message-ID: <039bfa4b-5f14-0241-6fd6-a52b123ac176@gmail.com> On 10/8/18 12:21 PM, Mark Harfouche wrote: > 2. `ndindex` is an iterator itself. As proposed, `ndrange`, like > `range`, is not an iterator. Changing this behaviour would likely lead > to breaking code that uses that assumption. For example anybody using > introspection or code like: > > ``` > indx = np.ndindex(5, 5) > next(indx)  # Don't look at the (0, 0) coordinate > for i in indx: >     print(i) > ``` > would break if `ndindex` becomes "not an iterator" OK, I see now. Just like python3 has separate range and range_iterator types, where range is sliceable, we would have separate ndrange and ndindex types, where ndrange is sliceable. You're just copying the python3 api. That justifies it pretty well for me. I still think we shouldn't have two functions which do nearly the same thing. We should only have one, and get rid of the other. I see two ways forward: * replace ndindex by your ndrange code, so it is no longer an iter. This would require some deprecation cycles for the cases that break. * deprecate ndindex in favor of a new function ndrange. We would keep ndindex around for back-compatibility, with a dep warning to use ndrange instead. Doing a code search on github, I can see that a lot of people's code would break if ndindex no longer was an iter. I also like the name ndrange for its allusion to python3's range behavior. That makes me lean towards the second option of a separate ndrange, with possible deprecation of ndindex. > itertools.product + range seems to be much faster than the current > implementation of ndindex > > (python 3.6) > ``` > %%timeit > > for i in np.ndindex(100, 100): >     pass > 3.94 ms ± 19.4 µs per loop (mean ± std. dev. 
of 7 runs, 100 loops each) > > %%timeit > import itertools > for i in itertools.product(range(100), range(100)): >     pass > 231 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) > ``` If the new code ends up faster than the old code, that's great, and further justification for using ndrange instead of ndindex. I had thought using nditer in the old code was fastest. So as far as I am concerned, I say go ahead with the PR the way you are doing it. Allan From shoyer at gmail.com Mon Oct 8 16:25:14 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Mon, 8 Oct 2018 13:25:14 -0700 Subject: [Numpy-discussion] ndrange, like range but multidimensiontal In-Reply-To: <039bfa4b-5f14-0241-6fd6-a52b123ac176@gmail.com> References: <039bfa4b-5f14-0241-6fd6-a52b123ac176@gmail.com> Message-ID: I'm open to adding ndrange, and "soft-deprecating" ndindex (i.e., discouraging its use in our docs, but not actually deprecating it). Certainly ndrange seems like a small but meaningful improvement in the interface. That said, I'm not convinced this is really worth the trouble. I think the nested loop is still pretty readable/clear, and there are few times when I've actually found ndindex() to be useful. On Mon, Oct 8, 2018 at 12:35 PM Allan Haldane wrote: > On 10/8/18 12:21 PM, Mark Harfouche wrote: > > 2. `ndindex` is an iterator itself. As proposed, `ndrange`, like > > `range`, is not an iterator. Changing this behaviour would likely lead > > to breaking code that uses that assumption. For example anybody using > > introspection or code like: > > > > ``` > > indx = np.ndindex(5, 5) > > next(indx) # Don't look at the (0, 0) coordinate > > for i in indx: > > print(i) > > ``` > > would break if `ndindex` becomes "not an iterator" > > OK, I see now. Just like python3 has separate range and range_iterator > types, where range is sliceable, we would have separate ndrange and > ndindex types, where ndrange is sliceable. You're just copying the > python3 api. 
That justifies it pretty well for me. > > I still think we shouldn't have two functions which do nearly the same > thing. We should only have one, and get rid of the other. I see two ways > forward: > > * replace ndindex by your ndrange code, so it is no longer an iter. > This would require some deprecation cycles for the cases that break. > * deprecate ndindex in favor of a new function ndrange. We would keep > ndindex around for back-compatibility, with a dep warning to use > ndrange instead. > > Doing a code search on github, I can see that a lot of people's code > would break if ndindex no longer was an iter. I also like the name > ndrange for its allusion to python3's range behavior. That makes me lean > towards the second option of a separate ndrange, with possible > deprecation of ndindex. > > > itertools.product + range seems to be much faster than the current > > implementation of ndindex > > > > (python 3.6) > > ``` > > %%timeit > > > > for i in np.ndindex(100, 100): > > pass > > 3.94 ms ± 19.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) > > > > %%timeit > > import itertools > > for i in itertools.product(range(100), range(100)): > > pass > > 231 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) > > ``` > > If the new code ends up faster than the old code, that's great, and > further justification for using ndrange instead of ndindex. I had > thought using nditer in the old code was fastest. > > So as far as I am concerned, I say go ahead with the PR the way you are > doing it. > > Allan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmcgibbo at gmail.com Mon Oct 8 16:31:47 2018 From: rmcgibbo at gmail.com (Robert T. 
McGibbon) Date: Mon, 8 Oct 2018 16:31:47 -0400 Subject: [Numpy-discussion] Determining NPY_ABI_VERSION statically in compiled extensions Message-ID: Is anyone aware of any tricks that can be played with tools like `readelf`, `nm` or `dlopen` / `dlsym` in order to statically determine what version of numpy a fully-compiled C extension (for example, found inside a wheel) was compiled against? Even if it only worked with relatively new versions of numpy, that would be fine. I'm interested in creating something similar to https://github.com/pypa/auditwheel that could statically check for compatibility between wheel files and python installations, in situations where the metadata about how they were compiled is missing. -- -Robert -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Mon Oct 8 17:26:18 2018 From: matti.picus at gmail.com (Matti Picus) Date: Tue, 9 Oct 2018 00:26:18 +0300 Subject: [Numpy-discussion] Determining NPY_ABI_VERSION statically in compiled extensions In-Reply-To: References: Message-ID: <64c3f2fb-c0b3-c013-5402-7be5296e4c59@gmail.com> On 08/10/18 23:31, Robert T. McGibbon wrote: > Is anyone aware of any tricks that can be played with tools like > `readelf`, `nm` or `dlopen` / `dlsym` in order to statically determine > what version of numpy a fully-compiled C extension (for example, found > inside a wheel) was compiled against? Even if it only worked with > relatively new versions of numpy, that would be fine. > > I'm interested in creating something similar to > https://github.com/pypa/auditwheel that could statically check for > compatibility between wheel files and python installations, in > situations where the metadata about how they were compiled is missing. > -- > -Robert > NPY_ABI_VERSION is exposed in C as PyArray_GetNDArrayCVersion and NPY_API_VERSION is exposed in C as PyArray_GetNDArrayCFeatureVersion. 
These are not incremented for every NumPy release, see the documentation in numpy/core/setup_common.py. The numpy.__version__ is determined by a python file numpy/version.py, which is probably what you want to use. There is an open Issue to better reveal compile time info https://github.com/numpy/numpy/issues/10983 Matti From mark.harfouche at gmail.com Mon Oct 8 19:25:30 2018 From: mark.harfouche at gmail.com (Mark Harfouche) Date: Mon, 8 Oct 2018 19:25:30 -0400 Subject: [Numpy-discussion] ndrange, like range but multidimensiontal In-Reply-To: References: <039bfa4b-5f14-0241-6fd6-a52b123ac176@gmail.com> Message-ID: since ndrange is a superset of the features of ndindex, we can implement ndindex with ndrange or keep it as is. ndindex is now a glorified `nditer` object anyway. So it isn't so much of a maintenance burden. As for how ndindex is implemented, I'm a little worried about python 2 performance seeing as range is a list. I would wait on changing the way ndindex is implemented for now. I agree with Stephan that ndindex should be kept in. Many want backward compatible code. It would be hard for me to justify why a dependency should be bumped up to bleeding edge numpy just for a convenience iterator. Honestly, I was really surprised to see such a speed difference, I thought it would have been closer. Allan, I decided to run a few more benchmarks; the nditer just seems slow for single array access for some reason. Maybe a bug? ``` import numpy as np import itertools a = np.ones((1000, 1000)) b = {} for i in np.ndindex(a.shape): b[i] = i %%timeit # op_flag=('readonly',) doesn't change performance for a_value in np.nditer(a): pass 109 ms ± 921 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) %%timeit for i in itertools.product(range(1000), range(1000)): a_value = a[i] 113 ms ± 1.72 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) %%timeit for i in itertools.product(range(1000), range(1000)): c = b[i] 193 ms ± 3.89 ms per loop (mean ± std. dev. 
of 7 runs, 1 loop each) %%timeit for a_value in a.flat: pass 25.3 ms ± 278 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) %%timeit for k, v in b.items(): pass 19.9 ms ± 675 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) %%timeit for i in itertools.product(range(1000), range(1000)): pass 28 ms ± 715 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) ``` On Mon, Oct 8, 2018 at 4:26 PM Stephan Hoyer wrote: > I'm open to adding ndrange, and "soft-deprecating" ndindex (i.e., > discouraging its use in our docs, but not actually deprecating it). > Certainly ndrange seems like a small but meaningful improvement in the > interface. > > That said, I'm not convinced this is really worth the trouble. I think the > nested loop is still pretty readable/clear, and there are few times when > I've actually found ndindex() to be useful. > > On Mon, Oct 8, 2018 at 12:35 PM Allan Haldane > wrote: > >> On 10/8/18 12:21 PM, Mark Harfouche wrote: >> > 2. `ndindex` is an iterator itself. As proposed, `ndrange`, like >> > `range`, is not an iterator. Changing this behaviour would likely lead >> > to breaking code that uses that assumption. For example anybody using >> > introspection or code like: >> > >> > ``` >> > indx = np.ndindex(5, 5) >> > next(indx) # Don't look at the (0, 0) coordinate >> > for i in indx: >> > print(i) >> > ``` >> > would break if `ndindex` becomes "not an iterator" >> >> OK, I see now. Just like python3 has separate range and range_iterator >> types, where range is sliceable, we would have separate ndrange and >> ndindex types, where ndrange is sliceable. You're just copying the >> python3 api. That justifies it pretty well for me. >> >> I still think we shouldn't have two functions which do nearly the same >> thing. We should only have one, and get rid of the other. I see two ways >> forward: >> >> * replace ndindex by your ndrange code, so it is no longer an iter. >> This would require some deprecation cycles for the cases that break. 
>> * deprecate ndindex in favor of a new function ndrange. We would keep >> ndindex around for back-compatibility, with a dep warning to use >> ndrange instead. >> >> Doing a code search on github, I can see that a lot of people's code >> would break if ndindex no longer was an iter. I also like the name >> ndrange for its allusion to python3's range behavior. That makes me lean >> towards the second option of a separate ndrange, with possible >> deprecation of ndindex. >> >> > itertools.product + range seems to be much faster than the current >> > implementation of ndindex >> > >> > (python 3.6) >> > ``` >> > %%timeit >> > >> > for i in np.ndindex(100, 100): >> > pass >> > 3.94 ms ± 19.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) >> > >> > %%timeit >> > import itertools >> > for i in itertools.product(range(100), range(100)): >> > pass >> > 231 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) >> > ``` >> >> If the new code ends up faster than the old code, that's great, and >> further justification for using ndrange instead of ndindex. I had >> thought using nditer in the old code was fastest. >> >> So as far as I am concerned, I say go ahead with the PR the way you are >> doing it. >> >> Allan >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmcgibbo at gmail.com Mon Oct 8 19:37:38 2018 From: rmcgibbo at gmail.com (Robert T. 
McGibbon) Date: Mon, 8 Oct 2018 19:37:38 -0400 Subject: [Numpy-discussion] Determining NPY_ABI_VERSION statically in compiled extensions In-Reply-To: <64c3f2fb-c0b3-c013-5402-7be5296e4c59@gmail.com> References: <64c3f2fb-c0b3-c013-5402-7be5296e4c59@gmail.com> Message-ID: Matti, That doesn't quite cover my use case. I'm interested in querying a .whl file containing .so files that were compiled against numpy (not my currently installed version of numpy) to determine the conditions under which those `.so` files were compiled. -Robert On Mon, Oct 8, 2018 at 5:26 PM Matti Picus wrote: > On 08/10/18 23:31, Robert T. McGibbon wrote: > > Is anyone aware of any tricks that can be played with tools like > > `readelf`, `nm` or `dlopen` / `dlsym` in order to statically determine > > what version of numpy a fully-compiled C extension (for example, found > > inside a wheel) was compiled against? Even if it only worked with > > relatively new versions of numpy, that would be fine. > > > > I'm interested in creating something similar to > > https://github.com/pypa/auditwheel that could statically check for > > compatibility between wheel files and python installations, in > > situations where the metadata about how they were compiled is missing. > > -- > > -Robert > > > NPY_ABI_VERSION is exposed in C as PyArray_GetNDArrayCVersion and > NPY_API_VERSION is exposed in C as PyArray_GetNDArrayCFeatureVersion. > These are not incremented for every NumPy release, see the documentation > in numpy/core/setup_common.py. > > The numpy.__version__ is determined by a python file numpy/version.py, > which is probably what you want to use. 
> > There is an open Issue to better reveal compile time info > https://github.com/numpy/numpy/issues/10983 > > Matti > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -- -Robert -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Oct 9 13:53:21 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 9 Oct 2018 11:53:21 -0600 Subject: [Numpy-discussion] Plans for 1.15.3 release, 1.16.x branch Message-ID: Hi All, I'm planning to do a 1.15.3 release in about two weeks, if there are fixes or regressions that you feel have slipped by without getting marked for backport, please comment. I'm planning on branching 1.16.x in mid November, which should provide enough time for 1-2 release candidates and a release before the end of the year. This is all contingent on having the numpy-wheels repo working again. The latest 0.32 release of the wheel package broke everything, so we will either need to pin the version or wait on potential fixes currently under discussion. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From miles.cranmer at gmail.com Tue Oct 9 14:01:48 2018 From: miles.cranmer at gmail.com (Miles Cranmer) Date: Tue, 9 Oct 2018 14:01:48 -0400 Subject: [Numpy-discussion] Performance feature for np.isin and np.in1d In-Reply-To: References: Message-ID: Hi, I was wondering how I could have this PR merged ( https://github.com/numpy/numpy/pull/12065)? The discussion on the PR seems to have gone well and all tests pass. Cheers, Miles On Mon, Oct 1, 2018 at 2:36 PM Miles Cranmer wrote: > (Not sure what the right list is for this) > > Hi, > > I have started a PR for a "fast_integers" flag for np.isin and np.in1d > which greatly increases performance when both arrays are integral. 
It works > by creating a boolean array with elements set to 1 where the parent array > (ar2) has elements and 0 otherwise. This array is then indexed by the child > array (ar1) to create the output. > > https://github.com/numpy/numpy/pull/12065 > > Thoughts on this? Please let me know if you have any questions about my > addition. > > Thank you. > Best regards, > Miles > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Tue Oct 9 14:07:01 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 9 Oct 2018 11:07:01 -0700 Subject: [Numpy-discussion] Plans for 1.15.3 release, 1.16.x branch In-Reply-To: References: Message-ID: On Tue, Oct 9, 2018 at 10:54 AM Charles R Harris wrote: > I'm planning on branching 1.16.x in mid November, which should provide > enough time for 1-2 release candidates and a release before the end of the > year. > OK, this gives us a good target for finishing up the NEP-18 work! -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Tue Oct 9 16:58:37 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 9 Oct 2018 13:58:37 -0700 Subject: [Numpy-discussion] ndrange, like range but multidimensiontal In-Reply-To: References: <039bfa4b-5f14-0241-6fd6-a52b123ac176@gmail.com> Message-ID: The speed difference is interesting but really a different question than the public API. I'm coming around to ndrange(). I can see how it could be useful for symbolic manipulation of arrays and indexing operations, similar to what we do in dask and xarray. On Mon, Oct 8, 2018 at 4:25 PM Mark Harfouche wrote: > since ndrange is a superset of the features of ndindex, we can implement > ndindex with ndrange or keep it as is. > ndindex is now a glorified `nditer` object anyway. So it isn't so much of > a maintenance burden. > As for how ndindex is implemented, I'm a little worried about python 2 > performance seeing as range is a list. 
> I would wait on changing the way ndindex is implemented for now. > > I agree with Stephan that ndindex should be kept in. Many want backward > compatible code. It would be hard for me to justify why a dependency should > be bumped up to bleeding edge numpy just for a convenience iterator. > > Honestly, I was really surprised to see such a speed difference, I thought > it would have been closer. > > Allan, I decided to run a few more benchmarks, the nditer just seems slow > for single array access some reason. Maybe a bug? > > ``` > import numpy as np > import itertools > a = np.ones((1000, 1000)) > > b = {} > for i in np.ndindex(a.shape): > b[i] = i > > %%timeit > # op_flag=('readonly',) doesn't change performance > for a_value in np.nditer(a): > pass > 109 ms ? 921 ?s per loop (mean ? std. dev. of 7 runs, 10 loops each) > > %%timeit > for i in itertools.product(range(1000), range(1000)): > a_value = a[i] > 113 ms ? 1.72 ms per loop (mean ? std. dev. of 7 runs, 10 loops each) > > %%timeit > for i in itertools.product(range(1000), range(1000)): > c = b[i] > 193 ms ? 3.89 ms per loop (mean ? std. dev. of 7 runs, 1 loop each) > > %%timeit > for a_value in a.flat: > pass > 25.3 ms ? 278 ?s per loop (mean ? std. dev. of 7 runs, 10 loops each) > > %%timeit > for k, v in b.items(): > pass > 19.9 ms ? 675 ?s per loop (mean ? std. dev. of 7 runs, 10 loops each) > > %%timeit > for i in itertools.product(range(1000), range(1000)): > pass > 28 ms ? 715 ?s per loop (mean ? std. dev. of 7 runs, 10 loops each) > ``` > > On Mon, Oct 8, 2018 at 4:26 PM Stephan Hoyer wrote: > >> I'm open to adding ndrange, and "soft-deprecating" ndindex (i.e., >> discouraging its use in our docs, but not actually deprecating it). >> Certainly ndrange seems like a small but meaningful improvement in the >> interface. >> >> That said, I'm not convinced this is really worth the trouble. 
I think >> the nested loop is still pretty readable/clear, and there are few times >> when I've actually found ndindex() be useful. >> >> On Mon, Oct 8, 2018 at 12:35 PM Allan Haldane >> wrote: >> >>> On 10/8/18 12:21 PM, Mark Harfouche wrote: >>> > 2. `ndindex` is an iterator itself. As proposed, `ndrange`, like >>> > `range`, is not an iterator. Changing this behaviour would likely lead >>> > to breaking code that uses that assumption. For example anybody using >>> > introspection or code like: >>> > >>> > ``` >>> > indx = np.ndindex(5, 5) >>> > next(indx) # Don't look at the (0, 0) coordinate >>> > for i in indx: >>> > print(i) >>> > ``` >>> > would break if `ndindex` becomes "not an iterator" >>> >>> OK, I see now. Just like python3 has separate range and range_iterator >>> types, where range is sliceable, we would have separate ndrange and >>> ndindex types, where ndrange is sliceable. You're just copying the >>> python3 api. That justifies it pretty well for me. >>> >>> I still think we shouldn't have two functions which do nearly the same >>> thing. We should only have one, and get rid of the other. I see two ways >>> forward: >>> >>> * replace ndindex by your ndrange code, so it is no longer an iter. >>> This would require some deprecation cycles for the cases that break. >>> * deprecate ndindex in favor of a new function ndrange. We would keep >>> ndindex around for back-compatibility, with a dep warning to use >>> ndrange instead. >>> >>> Doing a code search on github, I can see that a lot of people's code >>> would break if ndindex no longer was an iter. I also like the name >>> ndrange for its allusion to python3's range behavior. That makes me lean >>> towards the second option of a separate ndrange, with possible >>> deprecation of ndindex. 
>>> >>> > itertools.product + range seems to be much faster than the current >>> > implementation of ndindex >>> > >>> > (python 3.6) >>> > ``` >>> > %%timeit >>> > >>> > for i in np.ndindex(100, 100): >>> > pass >>> > 3.94 ms ? 19.4 ?s per loop (mean ? std. dev. of 7 runs, 100 loops each) >>> > >>> > %%timeit >>> > import itertools >>> > for i in itertools.product(range(100), range(100)): >>> > pass >>> > 231 ?s ? 1.09 ?s per loop (mean ? std. dev. of 7 runs, 1000 loops each) >>> > ``` >>> >>> If the new code ends up faster than the old code, that's great, and >>> further justification for using ndrange instead of ndindex. I had >>> thought using nditer in the old code was fastest. >>> >>> So as far as I am concerned, I say go ahead with the PR the way you are >>> doing it. >>> >>> Allan >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
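The equivalence that makes the timings above comparable — np.ndindex and itertools.product over per-axis ranges visit exactly the same C-ordered coordinates — can be checked directly (a quick sanity check, not part of the proposal):

```python
import itertools
import numpy as np

shape = (3, 4)
# Both iterate the last axis fastest, yielding plain tuples of ints.
from_ndindex = list(np.ndindex(*shape))
from_product = list(itertools.product(*(range(n) for n in shape)))
print(from_ndindex[:5])  # [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0)]
```

So any speed difference between the two is pure iteration overhead, not a difference in what they produce.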
URL: From ralf.gommers at gmail.com Tue Oct 9 23:19:21 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 9 Oct 2018 20:19:21 -0700 Subject: [Numpy-discussion] Adding a hex version like PY_VERSION_HEX In-Reply-To: References: <20181005104602.5c2c971c@lintaillefer.esrf.fr> Message-ID: On Sat, Oct 6, 2018 at 11:24 PM Matti Picus wrote: > On 05/10/18 11:46, Jerome Kieffer wrote: > > On Fri, 5 Oct 2018 11:31:20 +0300 > > Matti Picus wrote: > > > >> In PR 12074 https://github.com/numpy/numpy/pull/12074 I propose adding > a > >> function `version.get_numpy_version_as_hex()` which returns a hex value > >> to represent the current NumPy version MAJOR.MINOR.MICRO where > >> > >> v = hex(MAJOR << 24 | MINOR << 16 | MICRO) > > +1 > > > > We use it in our code and it is a good practice, much better then > 0.9.0>0.10.0 ! > > > > We added some support for dev, alpha, beta, RC and final versions in > > https://github.com/silx-kit/silx/blob/master/version.py > > > > Cheers, > Thanks. I think at this point I will change the proposal to > > v = hex(MAJOR << 24 | MINOR << 16 | MICRO << 8) > > which leaves room for future enhancement with "release level" and "serial" > as the lower bits. > Makes sense, but to me adding a tuple (like sys.version_info) would be more logical. Do that as well or instead of? Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Wed Oct 10 00:03:36 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Wed, 10 Oct 2018 05:03:36 +0100 Subject: [Numpy-discussion] Adding a hex version like PY_VERSION_HEX In-Reply-To: References: <20181005104602.5c2c971c@lintaillefer.esrf.fr> Message-ID: +1 on Ralf's suggestion. 
I'm not sure there's any case where the C code should be using a hex version number - either it's using the C api, in which case it should just be looking at the C api version - or it's calling back into the python API, in which case it's probably not unreasonable to ask it to inspect `np.__version__` / a hypothetical `sys.version_info`, since it's already going through awkwardness to invoke pure-python APIs.. Eric On Wed, 10 Oct 2018 at 04:23 Ralf Gommers wrote: > On Sat, Oct 6, 2018 at 11:24 PM Matti Picus wrote: > >> On 05/10/18 11:46, Jerome Kieffer wrote: >> > On Fri, 5 Oct 2018 11:31:20 +0300 >> > Matti Picus wrote: >> > >> >> In PR 12074 https://github.com/numpy/numpy/pull/12074 I propose >> adding a >> >> function `version.get_numpy_version_as_hex()` which returns a hex value >> >> to represent the current NumPy version MAJOR.MINOR.MICRO where >> >> >> >> v = hex(MAJOR << 24 | MINOR << 16 | MICRO) >> > +1 >> > >> > We use it in our code and it is a good practice, much better then >> 0.9.0>0.10.0 ! >> > >> > We added some support for dev, alpha, beta, RC and final versions in >> > https://github.com/silx-kit/silx/blob/master/version.py >> > >> > Cheers, >> Thanks. I think at this point I will change the proposal to >> >> v = hex(MAJOR << 24 | MINOR << 16 | MICRO << 8) >> >> which leaves room for future enhancement with "release level" and >> "serial" as the lower bits. >> > > Makes sense, but to me adding a tuple (like sys.version_info) would be > more logical. Do that as well or instead of? > > Cheers, > Ralf > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
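For concreteness, the shift layout Matti proposes (and the string-comparison pitfall Jerome mentions) can be sketched like so — the function name here is illustrative, only the bit layout comes from the PR discussion:

```python
import numpy as np

def numpy_version_hex(major, minor, micro):
    # MAJOR << 24 | MINOR << 16 | MICRO << 8, leaving the low byte free
    # for a future release-level/serial field, as in PY_VERSION_HEX.
    return major << 24 | minor << 16 | micro << 8

print(hex(numpy_version_hex(1, 15, 0)))  # 0x10f0000

# Integer comparison avoids the '0.9.0' > '0.10.0' string pitfall:
assert numpy_version_hex(0, 10, 0) > numpy_version_hex(0, 9, 0)

# On the pure-Python side, np.lib.NumpyVersion already compares correctly,
# including pre-releases:
assert np.lib.NumpyVersion('1.15.3') < np.lib.NumpyVersion('1.16.0rc1')
```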
URL: From wieser.eric+numpy at gmail.com Wed Oct 10 00:34:29 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Wed, 10 Oct 2018 05:34:29 +0100 Subject: [Numpy-discussion] ndrange, like range but multidimensiontal In-Reply-To: References: <039bfa4b-5f14-0241-6fd6-a52b123ac176@gmail.com> Message-ID: One thing that worries me here - in python, range(...) in essence generates a lazy list - so I'd expect ndrange to generate a lazy ndarray. In practice, that means it would be a duck-type defining an __array__ method to evaluate it, and only implement methods already present in numpy. It's not clear to me what the datatype of such an array-like would be. Candidates I can think of are: 1. [('i0', intp), ('i1', intp), ...], but this makes tuple coercion a little awkward 2. (intp, (N,)) - which collapses into a shape + (3,) array 3. object_. 4. Some new np.tuple_ dtype, a heterogeneous tuple, which is like the structured np.void but without field names. I'm not sure how vectorized element indexing would be spelt though. Eric On Tue, 9 Oct 2018 at 21:59 Stephan Hoyer wrote: > The speed difference is interesting but really a different question than > the public API. > > I'm coming around to ndrange(). I can see how it could be useful for > symbolic manipulation of arrays and indexing operations, similar to what we > do in dask and xarray. > > On Mon, Oct 8, 2018 at 4:25 PM Mark Harfouche > wrote: > >> since ndrange is a superset of the features of ndindex, we can implement >> ndindex with ndrange or keep it as is. >> ndindex is now a glorified `nditer` object anyway. So it isn't so much of >> a maintenance burden. >> As for how ndindex is implemented, I'm a little worried about python 2 >> performance seeing as range is a list. >> I would wait on changing the way ndindex is implemented for now. >> >> I agree with Stephan that ndindex should be kept in. Many want backward >> compatible code.
It would be hard for me to justify why a dependency should >> be bumped up to bleeding edge numpy just for a convenience iterator. >> >> Honestly, I was really surprised to see such a speed difference, I >> thought it would have been closer. >> >> Allan, I decided to run a few more benchmarks, the nditer just seems slow >> for single array access some reason. Maybe a bug? >> >> ``` >> import numpy as np >> import itertools >> a = np.ones((1000, 1000)) >> >> b = {} >> for i in np.ndindex(a.shape): >> b[i] = i >> >> %%timeit >> # op_flag=('readonly',) doesn't change performance >> for a_value in np.nditer(a): >> pass >> 109 ms ? 921 ?s per loop (mean ? std. dev. of 7 runs, 10 loops each) >> >> %%timeit >> for i in itertools.product(range(1000), range(1000)): >> a_value = a[i] >> 113 ms ? 1.72 ms per loop (mean ? std. dev. of 7 runs, 10 loops each) >> >> %%timeit >> for i in itertools.product(range(1000), range(1000)): >> c = b[i] >> 193 ms ? 3.89 ms per loop (mean ? std. dev. of 7 runs, 1 loop each) >> >> %%timeit >> for a_value in a.flat: >> pass >> 25.3 ms ? 278 ?s per loop (mean ? std. dev. of 7 runs, 10 loops each) >> >> %%timeit >> for k, v in b.items(): >> pass >> 19.9 ms ? 675 ?s per loop (mean ? std. dev. of 7 runs, 10 loops each) >> >> %%timeit >> for i in itertools.product(range(1000), range(1000)): >> pass >> 28 ms ? 715 ?s per loop (mean ? std. dev. of 7 runs, 10 loops each) >> ``` >> >> On Mon, Oct 8, 2018 at 4:26 PM Stephan Hoyer wrote: >> >>> I'm open to adding ndrange, and "soft-deprecating" ndindex (i.e., >>> discouraging its use in our docs, but not actually deprecating it). >>> Certainly ndrange seems like a small but meaningful improvement in the >>> interface. >>> >>> That said, I'm not convinced this is really worth the trouble. I think >>> the nested loop is still pretty readable/clear, and there are few times >>> when I've actually found ndindex() be useful. 
>>> >>> On Mon, Oct 8, 2018 at 12:35 PM Allan Haldane >>> wrote: >>> >>>> On 10/8/18 12:21 PM, Mark Harfouche wrote: >>>> > 2. `ndindex` is an iterator itself. As proposed, `ndrange`, like >>>> > `range`, is not an iterator. Changing this behaviour would likely lead >>>> > to breaking code that uses that assumption. For example anybody using >>>> > introspection or code like: >>>> > >>>> > ``` >>>> > indx = np.ndindex(5, 5) >>>> > next(indx) # Don't look at the (0, 0) coordinate >>>> > for i in indx: >>>> > print(i) >>>> > ``` >>>> > would break if `ndindex` becomes "not an iterator" >>>> >>>> OK, I see now. Just like python3 has separate range and range_iterator >>>> types, where range is sliceable, we would have separate ndrange and >>>> ndindex types, where ndrange is sliceable. You're just copying the >>>> python3 api. That justifies it pretty well for me. >>>> >>>> I still think we shouldn't have two functions which do nearly the same >>>> thing. We should only have one, and get rid of the other. I see two ways >>>> forward: >>>> >>>> * replace ndindex by your ndrange code, so it is no longer an iter. >>>> This would require some deprecation cycles for the cases that break. >>>> * deprecate ndindex in favor of a new function ndrange. We would keep >>>> ndindex around for back-compatibility, with a dep warning to use >>>> ndrange instead. >>>> >>>> Doing a code search on github, I can see that a lot of people's code >>>> would break if ndindex no longer was an iter. I also like the name >>>> ndrange for its allusion to python3's range behavior. That makes me lean >>>> towards the second option of a separate ndrange, with possible >>>> deprecation of ndindex. >>>> >>>> > itertools.product + range seems to be much faster than the current >>>> > implementation of ndindex >>>> > >>>> > (python 3.6) >>>> > ``` >>>> > %%timeit >>>> > >>>> > for i in np.ndindex(100, 100): >>>> > pass >>>> > 3.94 ms ? 19.4 ?s per loop (mean ? std. dev. 
of 7 runs, 100 loops >>>> each) >>>> > >>>> > %%timeit >>>> > import itertools >>>> > for i in itertools.product(range(100), range(100)): >>>> > pass >>>> > 231 ?s ? 1.09 ?s per loop (mean ? std. dev. of 7 runs, 1000 loops >>>> each) >>>> > ``` >>>> >>>> If the new code ends up faster than the old code, that's great, and >>>> further justification for using ndrange instead of ndindex. I had >>>> thought using nditer in the old code was fastest. >>>> >>>> So as far as I am concerned, I say go ahead with the PR the way you are >>>> doing it. >>>> >>>> Allan >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at python.org >>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark.harfouche at gmail.com Wed Oct 10 09:56:10 2018 From: mark.harfouche at gmail.com (Mark Harfouche) Date: Wed, 10 Oct 2018 09:56:10 -0400 Subject: [Numpy-discussion] ndrange, like range but multidimensiontal In-Reply-To: References: <039bfa4b-5f14-0241-6fd6-a52b123ac176@gmail.com> Message-ID: Eric, Great point. The multi-dimensional slicing and sequence return type is definitely strange. I was thinking about that last night. I?m a little new to the __array__ methods. Are you saying that the sequence behaviour would stay the same, (ie. 
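Eric's candidate 2 can be seen with today's subarray dtypes — an array built with dtype (intp, (N,)) "collapses" the subarray shape into an extra axis, which is exactly the behavior he flags (a quick illustration, not part of any proposal):

```python
import numpy as np

# A subarray dtype: each element would be a length-2 vector of intp...
dt = np.dtype((np.intp, (2,)))

# ...but on array construction the (2,) is folded into the array's shape,
# so we get an ordinary 2-d integer array, not a 1-d array of pairs.
a = np.zeros(3, dtype=dt)
print(a.shape)              # (3, 2)
print(a.dtype == np.intp)   # True -- the subarray dtype is gone
```

This collapse is why a lazy ndrange built on candidate 2 could not round-trip its element type through __array__.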
__iter__, __revesed__, __contains__), but np.asarray(np.ndrange((3, 3))) would return something like an array of tuples? I?m not sure this is something that anybody can?t already with do meshgrid + stack and only implement methods already present in numpy. I?m not sure what this means. I?ll note that in Python 3 , range is it?s own thing. It is still a sequence type but it doesn?t support addition. I?m kinda ok with ndrange/ndindex being a sequence type, supporting ND slicing, but not being an array ;) I?m kinda warming up to the idea of expanding ndindex. 1. The additional start and step can be omitted from ndindex for a while (indefinitely?). Slicing is way more convenient anyway. 2. Warnings can help people move from nd.index(1, 2, 3) to nd.index((1, 2, 3)) 3. ndindex can return a seperate iterator, but the ndindex object would hold a reference to it. Calls to ndindex.__next__ would simply return next(of_that_object) Note. This would break introspection since the iterator is no longer ndindex type. I?m kinda OK with this though, but breaking code is never nice :( 4. Bench-marking can help motivate the choice of iterator used for step=(1,) * N start=(0,) * N 5. Wait until 2019 because I don?t want to deal with performance regressions of potentially using range in Python2 and I don?t want this to motivate any implementation details. Mark On Wed, Oct 10, 2018 at 12:36 AM Eric Wieser wrote: > One thing that worries me here - in python, range(...) in essence > generates a lazy list - so I?d expect ndrange to generate a lazy ndarray. > In practice, that means it would be a duck-type defining an __array__ > method to evaluate it, and only implement methods already present in numpy. > > It?s not clear to me what the datatype of such an array-like would be. > Candidates I can think of are: > > 1. [('i0', intp), ('i1', intp), ...], but this makes tuple coercion a > little awkward > 2. (intp, (N,)) - which collapses into a shape + (3,) array > 3. object_. > 4. 
Some new np.tuple_ dtype, a heterogenous tuple, which is like the > structured np.void but without field names. I?m not sure how > vectorized element indexing would be spelt though. > > Eric > ? > > On Tue, 9 Oct 2018 at 21:59 Stephan Hoyer wrote: > >> The speed difference is interesting but really a different question than >> the public API. >> >> I'm coming around to ndrange(). I can see how it could be useful for >> symbolic manipulation of arrays and indexing operations, similar to what we >> do in dask and xarray. >> >> On Mon, Oct 8, 2018 at 4:25 PM Mark Harfouche >> wrote: >> >>> since ndrange is a superset of the features of ndindex, we can implement >>> ndindex with ndrange or keep it as is. >>> ndindex is now a glorified `nditer` object anyway. So it isn't so much >>> of a maintenance burden. >>> As for how ndindex is implemented, I'm a little worried about python 2 >>> performance seeing as range is a list. >>> I would wait on changing the way ndindex is implemented for now. >>> >>> I agree with Stephan that ndindex should be kept in. Many want backward >>> compatible code. It would be hard for me to justify why a dependency should >>> be bumped up to bleeding edge numpy just for a convenience iterator. >>> >>> Honestly, I was really surprised to see such a speed difference, I >>> thought it would have been closer. >>> >>> Allan, I decided to run a few more benchmarks, the nditer just seems >>> slow for single array access some reason. Maybe a bug? >>> >>> ``` >>> import numpy as np >>> import itertools >>> a = np.ones((1000, 1000)) >>> >>> b = {} >>> for i in np.ndindex(a.shape): >>> b[i] = i >>> >>> %%timeit >>> # op_flag=('readonly',) doesn't change performance >>> for a_value in np.nditer(a): >>> pass >>> 109 ms ? 921 ?s per loop (mean ? std. dev. of 7 runs, 10 loops each) >>> >>> %%timeit >>> for i in itertools.product(range(1000), range(1000)): >>> a_value = a[i] >>> 113 ms ? 1.72 ms per loop (mean ? std. dev. 
of 7 runs, 10 loops each) >>> >>> %%timeit >>> for i in itertools.product(range(1000), range(1000)): >>> c = b[i] >>> 193 ms ? 3.89 ms per loop (mean ? std. dev. of 7 runs, 1 loop each) >>> >>> %%timeit >>> for a_value in a.flat: >>> pass >>> 25.3 ms ? 278 ?s per loop (mean ? std. dev. of 7 runs, 10 loops each) >>> >>> %%timeit >>> for k, v in b.items(): >>> pass >>> 19.9 ms ? 675 ?s per loop (mean ? std. dev. of 7 runs, 10 loops each) >>> >>> %%timeit >>> for i in itertools.product(range(1000), range(1000)): >>> pass >>> 28 ms ? 715 ?s per loop (mean ? std. dev. of 7 runs, 10 loops each) >>> ``` >>> >>> On Mon, Oct 8, 2018 at 4:26 PM Stephan Hoyer wrote: >>> >>>> I'm open to adding ndrange, and "soft-deprecating" ndindex (i.e., >>>> discouraging its use in our docs, but not actually deprecating it). >>>> Certainly ndrange seems like a small but meaningful improvement in the >>>> interface. >>>> >>>> That said, I'm not convinced this is really worth the trouble. I think >>>> the nested loop is still pretty readable/clear, and there are few times >>>> when I've actually found ndindex() be useful. >>>> >>>> On Mon, Oct 8, 2018 at 12:35 PM Allan Haldane >>>> wrote: >>>> >>>>> On 10/8/18 12:21 PM, Mark Harfouche wrote: >>>>> > 2. `ndindex` is an iterator itself. As proposed, `ndrange`, like >>>>> > `range`, is not an iterator. Changing this behaviour would likely >>>>> lead >>>>> > to breaking code that uses that assumption. For example anybody using >>>>> > introspection or code like: >>>>> > >>>>> > ``` >>>>> > indx = np.ndindex(5, 5) >>>>> > next(indx) # Don't look at the (0, 0) coordinate >>>>> > for i in indx: >>>>> > print(i) >>>>> > ``` >>>>> > would break if `ndindex` becomes "not an iterator" >>>>> >>>>> OK, I see now. Just like python3 has separate range and range_iterator >>>>> types, where range is sliceable, we would have separate ndrange and >>>>> ndindex types, where ndrange is sliceable. You're just copying the >>>>> python3 api. 
That justifies it pretty well for me. >>>>> >>>>> I still think we shouldn't have two functions which do nearly the same >>>>> thing. We should only have one, and get rid of the other. I see two >>>>> ways >>>>> forward: >>>>> >>>>> * replace ndindex by your ndrange code, so it is no longer an iter. >>>>> This would require some deprecation cycles for the cases that break. >>>>> * deprecate ndindex in favor of a new function ndrange. We would keep >>>>> ndindex around for back-compatibility, with a dep warning to use >>>>> ndrange instead. >>>>> >>>>> Doing a code search on github, I can see that a lot of people's code >>>>> would break if ndindex no longer was an iter. I also like the name >>>>> ndrange for its allusion to python3's range behavior. That makes me >>>>> lean >>>>> towards the second option of a separate ndrange, with possible >>>>> deprecation of ndindex. >>>>> >>>>> > itertools.product + range seems to be much faster than the current >>>>> > implementation of ndindex >>>>> > >>>>> > (python 3.6) >>>>> > ``` >>>>> > %%timeit >>>>> > >>>>> > for i in np.ndindex(100, 100): >>>>> > pass >>>>> > 3.94 ms ? 19.4 ?s per loop (mean ? std. dev. of 7 runs, 100 loops >>>>> each) >>>>> > >>>>> > %%timeit >>>>> > import itertools >>>>> > for i in itertools.product(range(100), range(100)): >>>>> > pass >>>>> > 231 ?s ? 1.09 ?s per loop (mean ? std. dev. of 7 runs, 1000 loops >>>>> each) >>>>> > ``` >>>>> >>>>> If the new code ends up faster than the old code, that's great, and >>>>> further justification for using ndrange instead of ndindex. I had >>>>> thought using nditer in the old code was fastest. >>>>> >>>>> So as far as I am concerned, I say go ahead with the PR the way you are >>>>> doing it. 
>>>>> >>>>> Allan >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at python.org >>>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at python.org >>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Wed Oct 10 12:07:50 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Wed, 10 Oct 2018 09:07:50 -0700 Subject: [Numpy-discussion] ndrange, like range but multidimensiontal In-Reply-To: References: <039bfa4b-5f14-0241-6fd6-a52b123ac176@gmail.com> Message-ID: On Tue, Oct 9, 2018 at 9:34 PM Eric Wieser wrote: > One thing that worries me here - in python, range(...) in essence > generates a lazy list - so I?d expect ndrange to generate a lazy ndarray. > In practice, that means it would be a duck-type defining an __array__ > method to evaluate it, and only implement methods already present in numpy. > > It?s not clear to me what the datatype of such an array-like would be. > Candidates I can think of are: > > 1. [('i0', intp), ('i1', intp), ...], but this makes tuple coercion a > little awkward > > I think this would be the appropriate choice. What about it makes tuple coercion awkward? 
If you use this as the dtype, you both set and get element as tuples. In particular, I would say that ndrange() should be a lazy equivalent to the following explicit constructor: def ndrange(shape): dtype = [('i' + str(i), np.intp) for i in range(len(shape))] array = np.empty(shape, dtype) for indices in np.ndindex(*shape): array[indices] = indices return array >>> ndrange((2,)) array([(0,), (1,)], dtype=[('i0', '<i8')]) >>> ndrange((2, 3)) array([[(0, 0), (0, 1), (0, 2)], [(1, 0), (1, 1), (1, 2)]], dtype=[('i0', '<i8'), ('i1', '<i8')]) The one deviation in behavior would be that ndrange() iterates over flattened elements rather than the first axes. It is indeed a little awkward to have field names, but given that NumPy creates those automatically when you supply a dtype like 'i8,i8' this is probably a reasonable choice. > 1. (intp, (N,)) - which collapses into a shape + (3,) array > 2. object_. > 3. Some new np.tuple_ dtype, a heterogeneous tuple, which is like the > structured np.void but without field names. I'm not sure how > vectorized element indexing would be spelt though. > > Eric > -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Wed Oct 10 14:21:00 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Wed, 10 Oct 2018 14:21:00 -0400 Subject: [Numpy-discussion] ndrange, like range but multidimensiontal In-Reply-To: References: <039bfa4b-5f14-0241-6fd6-a52b123ac176@gmail.com> Message-ID: On 10/10/18 12:34 AM, Eric Wieser wrote: > One thing that worries me here - in python, |range(...)| in essence > generates a lazy |list| - so I'd expect |ndrange| to generate a lazy > |ndarray|. In practice, that means it would be a duck-type defining an > |__array__| method to evaluate it, and only implement methods already > present in numpy. Isn't that what arange is for? It seems like there are two uses of python3's range: 1. creating a 1d iterable of indices for use in for-loops, and 2. with list(range) can be used to create a sequence of integers. Numpy can extend this in two directions: * ndrange returns an iterable of nd indices (for for-loops). * arange returns a 1d ndarray of integers instead of a list The application of for-loops, which is more niche, doesn't need ndarray's vectorized properties, so I'm not convinced it should return an ndarray. It certainly seems simpler not to return an ndarray, due to the dtype question. arange on its own seems to cover the need for a vectorized version of range. Allan From mark.harfouche at gmail.com Thu Oct 11 09:41:42 2018 From: mark.harfouche at gmail.com (Mark Harfouche) Date: Thu, 11 Oct 2018 09:41:42 -0400 Subject: [Numpy-discussion] ndrange, like range but multidimensiontal In-Reply-To: References: <039bfa4b-5f14-0241-6fd6-a52b123ac176@gmail.com> Message-ID: I'm really open to these kinds of array extensions but, I (personally) just don't know how to do this efficiently. I feel like ogrid and mgrid are probably enough for people that want this kind of feature. My implementation would just be based on python primitives which would yield performance similar to In [2]: %timeit np.arange(1000) 1.25 µs ± 4.01 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) In [4]: %timeit np.asarray(range(1000)) 99.6 µs ± 1.38 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) Here is how mgrid can be used to return something similar to the indices from ndrange In [10]: np.mgrid[1:10:3, 2:10:3][:, 1, 1] Out[10]: array([4, 5]) In [13]: np.ndrange((10, 10))[1::3, 2::3][1, 1] Out[13]: (4, 5) On Wed, Oct 10, 2018 at 2:22 PM Allan Haldane wrote: > On 10/10/18 12:34 AM, Eric Wieser wrote: > > One thing that worries me here - in python, |range(...)| in essence > > generates a lazy |list| - so I'd expect |ndrange| to generate a lazy > > |ndarray|. In practice, that means it would be a duck-type defining an > > |__array__| method to evaluate it, and only implement methods already > > present in numpy. > > Isn't that what arange is for? > > It seems like there are two uses of python3's range: 1. creating a 1d > iterable of indices for use in for-loops, and 2. with list(range) can be > used to create a sequence of integers. > > Numpy can extend this in two directions: > * ndrange returns an iterable of nd indices (for for-loops). > * arange returns a 1d ndarray of integers instead of a list > > The application of for-loops, which is more niche, doesn't need > ndarray's vectorized properties, so I'm not convinced it should return > an ndarray. It certainly seems simpler not to return an ndarray, due to > the dtype question. > > arange on its own seems to cover the need for a vectorized version of > range. > > Allan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Thu Oct 11 10:14:26 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Thu, 11 Oct 2018 07:14:26 -0700 Subject: [Numpy-discussion] ndrange, like range but multidimensiontal In-Reply-To: References: <039bfa4b-5f14-0241-6fd6-a52b123ac176@gmail.com> Message-ID: If you use this as the dtype, you both set and get element as tuples. Elements are not got as tuples, but they can be explicitly cast What about it makes tuple coercion awkward? This explicit cast >>> dt_ind2d = np.dtype([('i0', np.intp), ('i1', np.intp)]) >>> ind = np.zeros((), dt_ind2d)[0] >>> ind, type(ind) ((0, 0), <class 'numpy.void'>) >>> m[ind] Traceback (most recent call last): File "<stdin>", line 1, in <module> m[inds[0]] IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices >>> m[tuple(ind)] 1.0 On Wed, 10 Oct 2018 at 09:08 Stephan Hoyer shoyer at gmail.com wrote: On Tue, Oct 9, 2018 at 9:34 PM Eric Wieser > wrote: > >> One thing that worries me here - in python, range(...) in essence >> generates a lazy list - so I'd expect ndrange to generate a lazy ndarray.
>> In practice, that means it would be a duck-type defining an __array__ >> method to evaluate it, and only implement methods already present in numpy. >> >> It?s not clear to me what the datatype of such an array-like would be. >> Candidates I can think of are: >> >> 1. [('i0', intp), ('i1', intp), ...], but this makes tuple coercion a >> little awkward >> >> I think this would be the appropriate choice. What about it makes tuple > coercion awkward? If you use this as the dtype, you both set and get > element as tuples. > > In particular, I would say that ndrange() should be a lazy equivalent to > the following explicit constructor: > > def ndrange(shape): > dtype = [('i' + str(i), np.intp) for i in range(len(shape))] > array = np.empty(shape, dtype) > for indices in np.ndindex(*shape): > array[indices] = indices > return array > > >>> ndrange((2,) > array([(0,), (1,)], dtype=[('i0', ' > >>> ndrange((2, 3)) > array([[(0, 0), (0, 1), (0, 2)], [(1, 0), (1, 1), (1, 2)]], dtype=[('i0', > ' > The one deviation in behavior would be that ndrange() iterates over > flattened elements rather than the first axes. > > It is indeed a little awkward to have field names, but given that NumPy > creates those automatically when you supply a dtype like 'i8,i8' this is > probably a reasonable choice. > > >> 1. (intp, (N,)) - which collapses into a shape + (3,) array >> 2. object_. >> 3. Some new np.tuple_ dtype, a heterogenous tuple, which is like the >> structured np.void but without field names. I?m not sure how >> vectorized element indexing would be spelt though. >> >> Eric >> ? >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > ? -------------- next part -------------- An HTML attachment was scrubbed... 
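The awkwardness Eric demonstrates is reproducible with plain structured arrays today — the element is a np.void scalar that prints like a tuple but is rejected as an index until explicitly cast (a small self-contained variant of his transcript; extracting the scalar with [()] is used here in place of his [0]):

```python
import numpy as np

# Candidate 1: a structured "index" dtype.
dt_ind2d = np.dtype([('i0', np.intp), ('i1', np.intp)])
ind = np.zeros((), dt_ind2d)[()]   # [()] extracts the void scalar
m = np.eye(3)

print(tuple(ind))        # (0, 0)
print(m[tuple(ind)])     # 1.0 -- works after the explicit tuple cast
try:
    m[ind]               # the raw void scalar is not a valid index
except IndexError as e:
    print('IndexError:', e)
```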
URL: From wieser.eric+numpy at gmail.com Thu Oct 11 10:19:20 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Thu, 11 Oct 2018 07:19:20 -0700 Subject: [Numpy-discussion] ndrange, like range but multidimensiontal In-Reply-To: References: <039bfa4b-5f14-0241-6fd6-a52b123ac176@gmail.com> Message-ID: Isn't that what arange is for? Imagining ourselves in python2 land for now - I'm proposing arange is to range, as ndrange is to xrange I'm not convinced it should return an ndarray I agree - I think it should return a range-like object that: - Is convertible via __array__ if needed - Looks like an ndarray, with: - a .dtype attribute - a __getitem__(Tuple[int]) which returns numpy scalars - .ravel() and .flat for choosing iteration order. On Wed, 10 Oct 2018 at 11:21 Allan Haldane allanhaldane at gmail.com wrote: On 10/10/18 12:34 AM, Eric Wieser wrote: > > One thing that worries me here - in python, |range(...)| in essence > > generates a lazy |list| - so I'd expect |ndrange| to generate a lazy > > |ndarray|. In practice, that means it would be a duck-type defining an > > |__array__| method to evaluate it, and only implement methods already > > present in numpy. > > Isn't that what arange is for? > > It seems like there are two uses of python3's range: 1. creating a 1d > iterable of indices for use in for-loops, and 2. with list(range) can be > used to create a sequence of integers. > > Numpy can extend this in two directions: > * ndrange returns an iterable of nd indices (for for-loops). > * arange returns an 1d ndarray of integers instead of a list > > The application of for-loops, which is more niche, doesn't need > ndarray's vectorized properties, so I'm not convinced it should return > an ndarray. It certainly seems simpler not to return an ndarray, due to > the dtype question. > > arange on its own seems to cover the need for a vectorized version of > range. 
> > Allan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From harrigan.matthew at gmail.com Thu Oct 11 12:53:51 2018 From: harrigan.matthew at gmail.com (Matthew Harrigan) Date: Thu, 11 Oct 2018 12:53:51 -0400 Subject: [Numpy-discussion] LaTeX version of boolean indexing Message-ID: Hello, I am documenting some code, translating the core of the algorithm to LaTeX. The style I have currently is very similar to the einsum syntax (which is awesome btw). Here is an example of some of the basic operations in NumPy. One part I do not know how to capture well is boolean indexing, ie: mask = np.array([1, 0, 1]) x = np.array([1, 2, 3]) y = x[mask] Any suggestions on how to clearly, formally, and concisely show that operation? Also, are there any guides on translating NumPy to LaTeX? It might be helpful for documenting algorithms and also for people learning NumPy. Thank you, Matt -------------- next part -------------- An HTML attachment was scrubbed... URL: From deak.andris at gmail.com Thu Oct 11 13:29:12 2018 From: deak.andris at gmail.com (Andras Deak) Date: Thu, 11 Oct 2018 19:29:12 +0200 Subject: [Numpy-discussion] LaTeX version of boolean indexing In-Reply-To: References: Message-ID: On Thu, Oct 11, 2018 at 6:54 PM Matthew Harrigan wrote: > > Hello, > > I am documenting some code, translating the core of the algorithm to LaTeX. The style I have currently is very similar to the einsum syntax (which is awesome btw). Here is an example of some of the basic operations in NumPy. One part I do not know how to capture well is boolean indexing, ie: > > mask = np.array([1, 0, 1]) > x = np.array([1, 2, 3]) > y = x[mask] That is fancy indexing with an index array rather than boolean indexing. That's why the result is [2, 1, 2] rather than [1, 3]. 
In case this is really what you need, it's the case of your indices originating from another sequence: `y_i = x_{m_i}` where `m_i` is your indexing sequence. For proper boolean indexing you lose the one-to-one correspondence between input and output (due to the size almost always changing), so you might not be able to formalize it this nicely with an index appearing in both sides. But something with an indicator might work... Andr?s From harrigan.matthew at gmail.com Thu Oct 11 13:43:40 2018 From: harrigan.matthew at gmail.com (Matthew Harrigan) Date: Thu, 11 Oct 2018 13:43:40 -0400 Subject: [Numpy-discussion] LaTeX version of boolean indexing In-Reply-To: References: Message-ID: My apologies, never write code directly in an email... s/b: mask = np.array([1, 0, 1], dtype=bool) What do you mean by indicator? On Thu, Oct 11, 2018 at 1:31 PM Andras Deak wrote: > On Thu, Oct 11, 2018 at 6:54 PM Matthew Harrigan > wrote: > > > > Hello, > > > > I am documenting some code, translating the core of the algorithm to > LaTeX. The style I have currently is very similar to the einsum syntax > (which is awesome btw). Here is an example of some of the basic operations > in NumPy. One part I do not know how to capture well is boolean indexing, > ie: > > > > mask = np.array([1, 0, 1]) > > x = np.array([1, 2, 3]) > > y = x[mask] > > That is fancy indexing with an index array rather than boolean > indexing. That's why the result is [2, 1, 2] rather than [1, 3]. > In case this is really what you need, it's the case of your indices > originating from another sequence: `y_i = x_{m_i}` where `m_i` is your > indexing sequence. > For proper boolean indexing you lose the one-to-one correspondence > between input and output (due to the size almost always changing), so > you might not be able to formalize it this nicely with an index > appearing in both sides. But something with an indicator might work... 
> > Andrés > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From deak.andris at gmail.com Thu Oct 11 14:03:56 2018 From: deak.andris at gmail.com (Andras Deak) Date: Thu, 11 Oct 2018 20:03:56 +0200 Subject: [Numpy-discussion] LaTeX version of boolean indexing In-Reply-To: References: Message-ID: On Thu, Oct 11, 2018 at 7:45 PM Matthew Harrigan wrote: > > What do you mean by indicator? > I mostly meant what wikipedia seems to call "set-builder notation" (https://en.wikipedia.org/wiki/Set-builder_notation#Sets_defined_by_a_predicate). Since your "input" is `{x_i | i in [0,1,2]}` but your output is a `y_j for j in [0,1]`, the straightforward thing I could think of was defining the set of valid `y_j` values (with an implicit assumption of the order being preserved, I guess). This would mean you can say something like `y_i \in {x_j | m_j}` (omitting the \left/\right/\vert fluff for simplicity here) where `m_j` are the elements of the boolean mask (say, `m = [True, False, True]`). In this context I'd understand it that `m_j` is the predicate and `x_j` are the corresponding values, however the notation isn't entirely unambiguous (see also a remark on the above wikipedia page) so you can't really get away with omitting further explanation in order to resolve ambiguity. Though I guess calling `m_j` elements of a mask would do the same thing. The other option that comes to mind is to define the auxiliary indices `n_i` for which `m_j` are True, then you of course denote the result with integer indices: `y_i = x_{n_i}` where `i` goes from 0 to the number of `True`s in `m_j`. But then you have the same difficulty defining `n_i`. All in all I'm not sure there's an elegant and concise notation for boolean masking. 
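In NumPy terms, the auxiliary-index formulation above (`y_i = x_{n_i}` with `n_i` the positions where `m_j` is True) is exactly what `np.flatnonzero` computes; a short illustrative sketch (variable names chosen here for exposition, not taken from the original messages):

```python
import numpy as np

x = np.array([1, 2, 3])
m = np.array([True, False, True])   # boolean mask, elements m_j

# Boolean masking keeps elements where the mask is True
y = x[m]                            # -> array([1, 3])

# Auxiliary integer indices n_i: the positions j where m_j is True
n = np.flatnonzero(m)               # -> array([0, 2])
print(np.array_equal(y, x[n]))      # True: y_i = x_{n_i}

# Contrast with fancy indexing by the raw integer values 1, 0, 1,
# which is what an int-valued "mask" actually does:
print(x[np.array([1, 0, 1])])       # [2 1 2], not [1 3]
```

so one concise LaTeX rendering is `y_i = x_{n_i}` with `n` defined as the ordered sequence of indices `j` satisfying `m_j`.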
Andrés From mark.harfouche at gmail.com Thu Oct 11 21:53:53 2018 From: mark.harfouche at gmail.com (Mark Harfouche) Date: Thu, 11 Oct 2018 21:53:53 -0400 Subject: [Numpy-discussion] ndrange, like range but multidimensiontal In-Reply-To: References: <039bfa4b-5f14-0241-6fd6-a52b123ac176@gmail.com> Message-ID: Eric, interesting ideas. > __getitem__(Tuple[int]) which returns numpy scalars I'm not sure what you mean. Even if you supply a numpy uint8 to range, it still returns a python int class. Would you like ndrange to return a tuple of `uint8` in this case? ``` In [3]: a = iter(range(np.uint8(10))) In [4]: next(a).__class__ Out[4]: int In [5]: np.uint8(10).__class__ Out[5]: numpy.uint8 ``` Ravel seems like a cool way to choose iteration order. In the PR, I mentioned that one reason that I removed `'F'` order from the PR was: 1. My implementation was not competitive with the `C` order implementation in terms of speed (can be fixed) 2. I don't know if it is something that people really need to iterate over collections (annoying to maintain if unused) Instead, I just showed an example of how people could iterate in `F` order should they need to. I'm not sure if we ever want the `ndrange` object to return a full matrix. It seems like we would be creating a custom tuple class just for this which seems pretty niche. On Thu, Oct 11, 2018 at 10:21 AM Eric Wieser wrote: > Isn't that what arange is for? > > Imagining ourselves in python2 land for now - I'm proposing arange is to > range, as ndrange is to xrange > > I'm not convinced it should return an ndarray > > I agree - I think it should return a range-like object that: > > - Is convertible via __array__ if needed > - Looks like an ndarray, with: > - a .dtype attribute > - a __getitem__(Tuple[int]) which returns numpy scalars > - .ravel() and .flat for choosing iteration order. 
> > On Wed, 10 Oct 2018 at 11:21 Allan Haldane allanhaldane at gmail.com > wrote: > > On 10/10/18 12:34 AM, Eric Wieser wrote: >> > One thing that worries me here - in python, |range(...)| in essence >> > generates a lazy |list| - so I?d expect |ndrange| to generate a lazy >> > |ndarray|. In practice, that means it would be a duck-type defining an >> > |__array__| method to evaluate it, and only implement methods already >> > present in numpy. >> >> Isn't that what arange is for? >> >> It seems like there are two uses of python3's range: 1. creating a 1d >> iterable of indices for use in for-loops, and 2. with list(range) can be >> used to create a sequence of integers. >> >> Numpy can extend this in two directions: >> * ndrange returns an iterable of nd indices (for for-loops). >> * arange returns an 1d ndarray of integers instead of a list >> >> The application of for-loops, which is more niche, doesn't need >> ndarray's vectorized properties, so I'm not convinced it should return >> an ndarray. It certainly seems simpler not to return an ndarray, due to >> the dtype question. >> >> arange on its own seems to cover the need for a vectorized version of >> range. >> >> Allan >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > ? > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wieser.eric+numpy at gmail.com Thu Oct 11 22:29:56 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Thu, 11 Oct 2018 19:29:56 -0700 Subject: [Numpy-discussion] ndrange, like range but multidimensiontal In-Reply-To: References: <039bfa4b-5f14-0241-6fd6-a52b123ac176@gmail.com> Message-ID: I'm not sure if we ever want the ndrange object to return a full matrix. np.array(ndrange(...)) should definitely return a full array, because that's what the user asked for. Even if you supply a numpy uint8 to range, it still returns a python int class. If we want to design ndrange with the intent of indexing only, then it should probably always use np.intp, whatever the type of the provided arguments. Would you like ndrange to return a tuple of uint8 in this case? Tuples are just one of the four options I listed in a previous message. The downside of tuples is there's no easy way to say "take just the first axis of this range". Whatever we pick, the return value should be such that np.array(ndrange(...))[idx] == ndrange(...)[idx] On Thu, 11 Oct 2018 at 18:54 Mark Harfouche wrote: > Eric, interesting ideas. > > > __getitem__(Tuple[int]) which returns numpy scalars > > I'm not sure what you mean. Even if you supply a numpy uint8 to range, it > still returns a python int class. > Would you like ndrange to return a tuple of `uint8` in this case? > > ``` > In [3]: a = > iter(range(np.uint8(10))) > > In [4]: > next(a).__class__ > Out[4]: int > > In [5]: > np.uint8(10).__class__ > Out[5]: numpy.uint8 > ``` > > Ravel seems like a cool way to choose iteration order. In the PR, I > mentioned that one reason that I removed `'F'` order from the PR was: > 1. My implementation was not competitive with the `C` order implementation > in terms of speed (can be fixed) > 2. 
I don't know if it something that people really need to iterate over > collections (annoying to maintain if unused) > > Instead, I just showed an example how people could iterate in `F` order > should they need to. > > I'm not sure if we ever want the `ndrange` object to return a full matrix. > It seems like we would be creating a custom tuple class just for this which > seems pretty niche. > > > On Thu, Oct 11, 2018 at 10:21 AM Eric Wieser > wrote: > >> Isn?t that what arange is for? >> >> Imagining ourselves in python2 land for now - I?m proposing arange is to >> range, as ndrange is to xrange >> >> I?m not convinced it should return an ndarray >> >> I agree - I think it should return a range-like object that: >> >> - Is convertible via __array__ if needed >> - Looks like an ndarray, with: >> - a .dtype attribute >> - a __getitem__(Tuple[int]) which returns numpy scalars >> - .ravel() and .flat for choosing iteration order. >> >> On Wed, 10 Oct 2018 at 11:21 Allan Haldane allanhaldane at gmail.com >> wrote: >> >> On 10/10/18 12:34 AM, Eric Wieser wrote: >>> > One thing that worries me here - in python, |range(...)| in essence >>> > generates a lazy |list| - so I?d expect |ndrange| to generate a lazy >>> > |ndarray|. In practice, that means it would be a duck-type defining an >>> > |__array__| method to evaluate it, and only implement methods already >>> > present in numpy. >>> >>> Isn't that what arange is for? >>> >>> It seems like there are two uses of python3's range: 1. creating a 1d >>> iterable of indices for use in for-loops, and 2. with list(range) can be >>> used to create a sequence of integers. >>> >>> Numpy can extend this in two directions: >>> * ndrange returns an iterable of nd indices (for for-loops). >>> * arange returns an 1d ndarray of integers instead of a list >>> >>> The application of for-loops, which is more niche, doesn't need >>> ndarray's vectorized properties, so I'm not convinced it should return >>> an ndarray. 
It certainly seems simpler not to return an ndarray, due to >>> the dtype question. >>> >>> arange on its own seems to cover the need for a vectorized version of >>> range. >>> >>> Allan >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> ? >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark.harfouche at gmail.com Thu Oct 11 23:15:16 2018 From: mark.harfouche at gmail.com (Mark Harfouche) Date: Thu, 11 Oct 2018 23:15:16 -0400 Subject: [Numpy-discussion] ndrange, like range but multidimensiontal In-Reply-To: References: <039bfa4b-5f14-0241-6fd6-a52b123ac176@gmail.com> Message-ID: > If we want to design ndrange with the intent of indexing only This is the only use I had in mind. But I feel like you are able to envision different use cases. > Whatever we pick, the return value should be such that np.array(ndrange(...))[ind] == ndrange(...)[idx] I can see the appeal to this. On Thu, Oct 11, 2018 at 10:31 PM Eric Wieser wrote: > I?m not sure if we ever want the ndrange object to return a full matrix. > > np.array(ndrange(...)) should definitely return a full array, because > that?s what the user asked for. > > Even if you supply a numpy uint8 to range, it still returns a python int > class. > > If we want to design ndrange with the intent of indexing only, then it > should probably always use np.intp, whatever the type of the provided > arguments > > Would you like ndrange to return a tuple of uint8 in this case? 
> > Tuples are just one of the four options I listed in a previous message. > The downside of tuples is there?s no easy way to say ?take just the first > axis of this range?. > Whatever we pick, the return value should be such that np.array(ndrange(...))[ind] > == ndrange(...)[idx] > > On Thu, 11 Oct 2018 at 18:54 Mark Harfouche > wrote: > >> Eric, interesting ideas. >> >> > __getitem__(Tuple[int]) which returns numpy scalars >> >> I'm not sure what you mean. Even if you supply a numpy uint8 to range, it >> still returns a python int class. >> Would you like ndrange to return a tuple of `uint8` in this case? >> >> ``` >> In [3]: a = >> iter(range(np.uint8(10))) >> >> In [4]: >> next(a).__class__ >> Out[4]: int >> >> In [5]: >> np.uint8(10).__class__ >> Out[5]: numpy.uint8 >> ``` >> >> Ravel seems like a cool way to choose iteration order. In the PR, I >> mentionned that one reason that I removed `'F'` order from the PR was: >> 1. My implementation was not competitive with the `C` order >> implementation in terms of speed (can be fixed) >> 2. I don't know if it something that people really need to iterate over >> collections (annoying to maintain if unused) >> >> Instead, I just showed an example how people could iterate in `F` order >> should they need to. >> >> I'm not sure if we ever want the `ndrange` object to return a full >> matrix. It seems like we would be creating a custom tuple class just for >> this which seems pretty niche. >> >> >> On Thu, Oct 11, 2018 at 10:21 AM Eric Wieser >> wrote: >> >>> Isn?t that what arange is for? 
>>> >>> Imagining ourselves in python2 land for now - I?m proposing arange is >>> to range, as ndrange is to xrange >>> >>> I?m not convinced it should return an ndarray >>> >>> I agree - I think it should return a range-like object that: >>> >>> - Is convertible via __array__ if needed >>> - Looks like an ndarray, with: >>> - a .dtype attribute >>> - a __getitem__(Tuple[int]) which returns numpy scalars >>> - .ravel() and .flat for choosing iteration order. >>> >>> On Wed, 10 Oct 2018 at 11:21 Allan Haldane allanhaldane at gmail.com >>> wrote: >>> >>> On 10/10/18 12:34 AM, Eric Wieser wrote: >>>> > One thing that worries me here - in python, |range(...)| in essence >>>> > generates a lazy |list| - so I?d expect |ndrange| to generate a lazy >>>> > |ndarray|. In practice, that means it would be a duck-type defining an >>>> > |__array__| method to evaluate it, and only implement methods already >>>> > present in numpy. >>>> >>>> Isn't that what arange is for? >>>> >>>> It seems like there are two uses of python3's range: 1. creating a 1d >>>> iterable of indices for use in for-loops, and 2. with list(range) can be >>>> used to create a sequence of integers. >>>> >>>> Numpy can extend this in two directions: >>>> * ndrange returns an iterable of nd indices (for for-loops). >>>> * arange returns an 1d ndarray of integers instead of a list >>>> >>>> The application of for-loops, which is more niche, doesn't need >>>> ndarray's vectorized properties, so I'm not convinced it should return >>>> an ndarray. It certainly seems simpler not to return an ndarray, due to >>>> the dtype question. >>>> >>>> arange on its own seems to cover the need for a vectorized version of >>>> range. 
>>>> >>>> Allan >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at python.org >>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanv at berkeley.edu Fri Oct 12 01:43:58 2018 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Thu, 11 Oct 2018 22:43:58 -0700 Subject: [Numpy-discussion] BIDS/NumPy dev meetings, Wednesdays 12pm Pacific Message-ID: <20181012054358.denpcg2dtoree6er@carbo> Hi everyone, The team at BIDS meets once a week to discuss progress, priorities, and roadblocks. While our priorities are broadly determined by the project roadmap [0], we would like to provide an opportunity for the community to give more regular and detailed feedback on our work. We therefore invite you to join us for our weekly calls, each **Wednesday from 12:00 to 13:00 Pacific Time**. Detail of the next meeting (2018-10-17) is given in the agenda [1], which is a growing document; feel free to add topics you wish to discuss. We hope to see you there! I will send another reminder next week. 
Best regards, Stéfan [0] https://www.numpy.org/neps/index.html [1] https://hackmd.io/YZfpGn5BSu6acAFLBaRjtw# From einstein.edison at gmail.com Fri Oct 12 11:34:32 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Fri, 12 Oct 2018 17:34:32 +0200 Subject: [Numpy-discussion] Exact semantics of ufunc.reduce Message-ID: <87e71577-4896-44b0-b898-41568db5eebe@Canary> Hello! I'm trying to investigate the exact way ufunc.reduce works when given a custom dtype. Does it cast before or after the operation, or somewhere in between? How does this differ from ufunc.reduceat, for example? We ran into this issue in pydata/sparse#191 (https://github.com/pydata/sparse/issues/191) when trying to match the two where the only thing differing is the number of zeros for sum, which shouldn't change the result. Best Regards, Hameer Abbasi -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Fri Oct 12 11:46:44 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 12 Oct 2018 08:46:44 -0700 Subject: [Numpy-discussion] BIDS/NumPy dev meetings, Wednesdays 12pm Pacific In-Reply-To: <20181012054358.denpcg2dtoree6er@carbo> References: <20181012054358.denpcg2dtoree6er@carbo> Message-ID: On Thu, Oct 11, 2018 at 10:44 PM Stefan van der Walt wrote: > Hi everyone, > > The team at BIDS meets once a week to discuss progress, priorities, and > roadblocks. While our priorities are broadly determined by the project > roadmap [0], we would like to provide an opportunity for the community > to give more regular and detailed feedback on our work. > > We therefore invite you to join us for our weekly calls, > each **Wednesday from 12:00 to 13:00 Pacific Time**. > > Detail of the next meeting (2018-10-17) is given in the agenda [1], > which is a growing document; feel free to add topics you wish to discuss. > > We hope to see you there! I will send another reminder next week. > Sounds like a good idea, thanks for doing this. 
I'm unlikely to make the first two meetings, but will try to join when I can after that. Cheers, Ralf > Best regards, > Stéfan > > > [0] https://www.numpy.org/neps/index.html > [1] https://hackmd.io/YZfpGn5BSu6acAFLBaRjtw# > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Fri Oct 12 12:11:20 2018 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 12 Oct 2018 18:11:20 +0200 Subject: [Numpy-discussion] Exact semantics of ufunc.reduce In-Reply-To: <87e71577-4896-44b0-b898-41568db5eebe@Canary> References: <87e71577-4896-44b0-b898-41568db5eebe@Canary> Message-ID: On Fri, 2018-10-12 at 17:34 +0200, Hameer Abbasi wrote: > Hello! > > I'm trying to investigate the exact way ufunc.reduce works when given > a custom dtype. Does it cast before or after the operation, or > somewhere in between? How does this differ from ufunc.reduceat, for > example? > I am not 100% sure, but I think giving the dtype definitely casts the output type. And since most ufunc loops are defined as "ff->f", etc., that effectively casts the input as well. It might be that it casts the input specifically, but I doubt it. The cast will occur within the buffering machinery, so the cast is only done in small chunks. But the operation itself should be performed using the given dtype. - Sebastian > We ran into this issue in pydata/sparse#191 when trying to match the > two where the only thing differing is the number of zeros for sum, > which shouldn't change the result. > > Best Regards, > Hameer Abbasi > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From mark.harfouche at gmail.com Mon Oct 15 08:44:01 2018 From: mark.harfouche at gmail.com (Mark Harfouche) Date: Mon, 15 Oct 2018 08:44:01 -0400 Subject: [Numpy-discussion] A zeros_like implementation based on calloc instead of copyto Message-ID: Hello, Currently, `zeros_like` is based on `copyto` as opposed to `calloc`. This causes inconsistencies in the amount of time it takes to create an array with `zeros` + `shape` and `zeros_like` for large arrays. This was first raised in https://github.com/numpy/numpy/issues/9909 It seems to me that a memory copy can be avoided by using `PyArray_NewFromDescr_int` in C. I propose creating a new C_API function `PyArray_NewZerosLikeArray` that behaves much like `PyArray_NewLikeArray` with the exception that it calls `PyArray_NewFromDescr_int` instead of `PyArray_NewFromDescr` to initialize the array to zeros with calloc. An all-C implementation of `zeros_like` is also possible by adapting the `empty_like` function. A draft implementation is viewable at https://github.com/hmaarrfk/numpy/pull/2/files for those looking for more details about my proposed implementation. Thank you for considering. Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanv at berkeley.edu Tue Oct 16 17:26:57 2018 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Tue, 16 Oct 2018 14:26:57 -0700 Subject: [Numpy-discussion] BIDS/NumPy dev meetings, Wednesdays 12pm Pacific In-Reply-To: <20181012054358.denpcg2dtoree6er@carbo> References: <20181012054358.denpcg2dtoree6er@carbo> Message-ID: <20181016212657.sk5ujmlp3p7hxw5b@carbo> Hi everyone, This is a friendly reminder of the BIDS/NumPy dev meetings, kicking off tomorrow at 12pm Pacific time. Please add any topics you wish to discuss to the agenda linked below. 
Best regards, St?fan On Thu, 11 Oct 2018 22:43:58 -0700, Stefan van der Walt wrote: > The team at BIDS meets once a week to discuss progress, priorities, and > roadblocks. While our priorities are broadly determined by the project > roadmap [0], we would like to provide an opportunity for the community > to give more regular and detailed feedback on our work. > > We therefore invite you to join us for our weekly calls, > each **Wednesday from 12:00 to 13:00 Pacific Time**. > > Detail of the next meeting (2018-10-17) is given in the agenda [1], > which is a growing document?feel free to add topics you wish to discuss. > > We hope to see you there! I will send another reminder next week. > > > [0] https://www.numpy.org/neps/index.html > [1] https://hackmd.io/YZfpGn5BSu6acAFLBaRjtw# From allanhaldane at gmail.com Tue Oct 16 18:41:18 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Tue, 16 Oct 2018 18:41:18 -0400 Subject: [Numpy-discussion] BIDS/NumPy dev meetings, Wednesdays 12pm Pacific In-Reply-To: <20181016212657.sk5ujmlp3p7hxw5b@carbo> References: <20181012054358.denpcg2dtoree6er@carbo> <20181016212657.sk5ujmlp3p7hxw5b@carbo> Message-ID: <668fa1d2-bd59-6444-4162-fd42f347939d@gmail.com> I'll try to make it, especially as it looks like you want to discuss two of my PRs! :) I have a different meeting a bit before then which might run over though, so sorry ahead of time if I'm not there. Cheers, Allan On 10/16/18 5:26 PM, Stefan van der Walt wrote: > Hi everyone, > > This is a friendly reminder of the BIDS/NumPy dev meetings, kicking off > tomorrow at 12pm Pacific time. > > Please add any topics you wish to discuss to the agenda linked below. > > Best regards, > St?fan > > > On Thu, 11 Oct 2018 22:43:58 -0700, Stefan van der Walt wrote: >> The team at BIDS meets once a week to discuss progress, priorities, and >> roadblocks. 
While our priorities are broadly determined by the project >> roadmap [0], we would like to provide an opportunity for the community >> to give more regular and detailed feedback on our work. >> >> We therefore invite you to join us for our weekly calls, >> each **Wednesday from 12:00 to 13:00 Pacific Time**. >> >> Detail of the next meeting (2018-10-17) is given in the agenda [1], >> which is a growing document?feel free to add topics you wish to discuss. >> >> We hope to see you there! I will send another reminder next week. >> >> >> [0] https://www.numpy.org/neps/index.html >> [1] https://hackmd.io/YZfpGn5BSu6acAFLBaRjtw# > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From ralf.gommers at gmail.com Tue Oct 16 19:05:43 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 16 Oct 2018 23:05:43 +0000 Subject: [Numpy-discussion] summary of NumFOCUS Summit & roadmap presentations Message-ID: Hi all, At the end of September the NumFOCUS Summit was held; Allan and I both attended on behalf of NumPy. I've written up a summary of the event from a NumPy perspective: https://rgommers.github.io/2018/10/2018-numfocus-summit---a-summary/. I also link in that post to both of the presentations I gave on the NumPy roadmap. I suspect that what I wrote raises more questions than it answers, so questions/ideas/criticism very welcome! Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matthew.brett at gmail.com Wed Oct 17 13:47:13 2018 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 17 Oct 2018 18:47:13 +0100 Subject: [Numpy-discussion] random.choice(replace=False) very slow Message-ID: Hi, I noticed that numpy.random.choice was very slow, with the replace=False option, and then I noticed it can (for most cases) be made many hundreds of times faster in Python code: In [18]: sample = np.random.uniform(size=1000000) In [19]: timeit np.random.choice(sample, 500, replace=False) 42.1 ms ± 214 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) In [22]: def rc(x, size): ...: n = np.prod(size) ...: n_plus = n * 2 ...: inds = np.unique(np.random.randint(0, len(x), size=n_plus))[:n] ...: return x[inds].reshape(size) In [23]: timeit rc(sample, 500) 86.5 µs ± 421 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) Is there a reason why it's so slow in C? Could something more intelligent than the above be used to speed it up? Cheers, Matthew From matti.picus at gmail.com Wed Oct 17 13:58:55 2018 From: matti.picus at gmail.com (Matti Picus) Date: Wed, 17 Oct 2018 20:58:55 +0300 Subject: [Numpy-discussion] Approving NEP 27 - Historical discussion of 0-D arrays Message-ID: <1650bf66-12fd-8bdd-21e3-8d5a0ecb206f@gmail.com> In PR 12166 https://github.com/numpy/numpy/pull/12166 we revived an old wiki document discussing the implementation of 0-dimensional arrays. This became informational NEP-27 http://www.numpy.org/neps/nep-0027-zero-rank-arrarys.html. There was fruitful discussion of the NEP and the need for both 0-D arrays and scalars on the PR comments. The NEP itself is informational and freezes the information to the 2006 discussion, noting that "some of the information here is dated, for instance indexing of 0-D arrays is now implemented and does not error." I would like to submit the NEP for discussion and approval. 
Matti From einstein.edison at gmail.com Wed Oct 17 14:16:22 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Wed, 17 Oct 2018 20:16:22 +0200 Subject: [Numpy-discussion] random.choice(replace=False) very slow In-Reply-To: References: Message-ID: Hi! The standard algorithm for sampling without replacement is ``O(N)`` expected for ``N < 0.5 * M`` where ``M`` is the length of the original set, but ``O(N^2)`` worst-case. When this is not true, a simple Durstenfeld-Fisher-Yates shuffle [1] (``O(M)``) can be used on the original set and then the first ``N`` items selected. Although this is fast, it uses up a large amount of memory (``O(M)`` extra memory rather than ``O(N)``) and I'm not sure where the best trade-off is. It also can't be used with an arbitrary probability distribution. One way to handle this would be to sample a maximum of ``N // 2`` samples and then select the "unselected" samples instead. Although this has a faster expected run-time than the standard algorithm in all cases, it would break backwards-compatibility guarantees. Best Regards, Hameer Abbasi [1] https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle > On Wednesday, Oct 17, 2018 at 7:48 PM, Matthew Brett wrote: > Hi, > > I noticed that numpy.random.choice was very slow, with the > replace=False option, and then I noticed it can (for most cases) be > made many hundreds of times faster in Python code: > > In [18]: sample = np.random.uniform(size=1000000) > In [19]: timeit np.random.choice(sample, 500, replace=False) > 42.1 ms ± 214 µs per loop (mean ± std. dev. of 7 runs, 10 > loops each) > In [22]: def rc(x, size): > ...: n = np.prod(size) > ...: n_plus = n * 2 > ...: inds = np.unique(np.random.randint(0, n_plus+1, size=n_plus))[:n] > ...: return x[inds].reshape(size) > In [23]: timeit rc(sample, 500) > 86.5 µs ± 421 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) > > Is there a reason why it's so slow in C?
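[Editorial note: the partial Durstenfeld-Fisher-Yates shuffle Hameer describes above can be sketched in a few lines of Python. This is an illustrative sketch only, not NumPy's implementation, and the helper name `sample_without_replacement` is invented here; it stops the shuffle after the first ``n`` positions, so it does ``O(n)`` swaps plus an ``O(len(x))`` index allocation.]

```python
import numpy as np

def sample_without_replacement(x, n, rng=np.random):
    """Draw n distinct elements of x via a partial Fisher-Yates shuffle.

    Only the first n steps of the shuffle are performed; every size-n
    subset of x is equally likely.
    """
    m = len(x)
    if n > m:
        raise ValueError("cannot draw more elements than x contains")
    idx = np.arange(m)
    for i in range(n):
        # Swap a random element from the not-yet-fixed tail into position i.
        j = rng.randint(i, m)
        idx[i], idx[j] = idx[j], idx[i]
    return x[idx[:n]]

sample = np.random.uniform(size=1000000)
picked = sample_without_replacement(sample, 500)
```

Unlike the `np.unique` trick in Matthew's post (which can return fewer than `n` indices and only draws from the first `n_plus + 1` positions), this always returns exactly `n` items drawn uniformly, at the cost of the ``O(M)`` index array Hameer mentions.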
Could something more > intelligent than the above be used to speed it up? > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From einstein.edison at gmail.com Wed Oct 17 14:34:04 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Wed, 17 Oct 2018 20:34:04 +0200 Subject: [Numpy-discussion] Approving NEP 27 - Historical discussion of 0-D arrays In-Reply-To: <1650bf66-12fd-8bdd-21e3-8d5a0ecb206f@gmail.com> References: <1650bf66-12fd-8bdd-21e3-8d5a0ecb206f@gmail.com> Message-ID: <953635be-050c-4f6a-88d2-0e3931fe3ca4@Canary> Hi everyone, Ah, I neglected to see that the PR was already merged. In any case, I'll repeat my comment here (referring to the indexing section): I would suggest that this section be removed entirely or updated. For example, if x is either an array scalar or a rank zero array, x[...] is guaranteed to be an array and x[()] is guaranteed to be a scalar. The difference is because x[{anything here}, ...] is guaranteed to be an array. In words, if the last index is an ellipsis, the result of indexing is guaranteed to be an array. I came across this weird behaviour when implementing the equivalent of np.where for PyData/Sparse. Best Regards, Hameer Abbasi > On Wednesday, Oct 17, 2018 at 7:59 PM, Matti Picus wrote: > In PR 12166 https://github.com/numpy/numpy/pull/12166 we revived an old > wiki document discussing the implementation of 0-dimensional arrays. > This became informational NEP-27 > http://www.numpy.org/neps/nep-0027-zero-rank-arrarys.html. There was > fruitful discussion of the NEP and the need for both 0-D arrays and > scalars on the PR comments.
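[Editorial note: the indexing guarantee Hameer describes above is easy to check in a short session. A quick sketch with a recent NumPy; the printed types assume a modern release:]

```python
import numpy as np

x = np.array(5.0)        # a 0-D (zero-rank) array
assert x.ndim == 0 and x.shape == ()

# Indexing with an ellipsis always yields an array...
print(type(x[...]))      # <class 'numpy.ndarray'>

# ...while indexing with an empty tuple yields an array scalar.
print(type(x[()]))       # <class 'numpy.float64'>
```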
The NEP itself is informational and freezes > the information to the 2006 discussion, noting that "some of the > information here is dated, for instance indexing of 0-D arrays is > now implemented and does not error." > > > I would like to submit the NEP for discussion and approval. > > Matti > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark.harfouche at gmail.com Wed Oct 17 14:58:52 2018 From: mark.harfouche at gmail.com (Mark Harfouche) Date: Wed, 17 Oct 2018 14:58:52 -0400 Subject: [Numpy-discussion] BIDS/NumPy dev meetings, Wednesdays 12pm Pacific In-Reply-To: <668fa1d2-bd59-6444-4162-fd42f347939d@gmail.com> References: <20181012054358.denpcg2dtoree6er@carbo> <20181016212657.sk5ujmlp3p7hxw5b@carbo> <668fa1d2-bd59-6444-4162-fd42f347939d@gmail.com> Message-ID: Stefan. I would like to simply listen in. I can't seem to find the meeting ID that we need to call in. On Tue, Oct 16, 2018 at 6:42 PM Allan Haldane wrote: > I'll try to make it, especially as it looks like you want to discuss two > of my PRs! :) > > I have a different meeting a bit before then which might run over > though, so sorry ahead of time if I'm not there. > > Cheers, > Allan > > > On 10/16/18 5:26 PM, Stefan van der Walt wrote: > > Hi everyone, > > > > This is a friendly reminder of the BIDS/NumPy dev meetings, kicking off > > tomorrow at 12pm Pacific time. > > > > Please add any topics you wish to discuss to the agenda linked below. > > > > Best regards, > > Stéfan > > > > > > On Thu, 11 Oct 2018 22:43:58 -0700, Stefan van der Walt wrote: > >> The team at BIDS meets once a week to discuss progress, priorities, and > >> roadblocks.
While our priorities are broadly determined by the project > >> roadmap [0], we would like to provide an opportunity for the community > >> to give more regular and detailed feedback on our work. > >> > >> We therefore invite you to join us for our weekly calls, > >> each **Wednesday from 12:00 to 13:00 Pacific Time**. > >> > >> Detail of the next meeting (2018-10-17) is given in the agenda [1], > >> which is a growing document - feel free to add topics you wish to discuss. > >> > >> We hope to see you there! I will send another reminder next week. > >> > >> > >> [0] https://www.numpy.org/neps/index.html > >> [1] https://hackmd.io/YZfpGn5BSu6acAFLBaRjtw# > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From einstein.edison at gmail.com Wed Oct 17 15:06:56 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Wed, 17 Oct 2018 21:06:56 +0200 Subject: [Numpy-discussion] BIDS/NumPy dev meetings, Wednesdays 12pm Pacific In-Reply-To: References: <20181012054358.denpcg2dtoree6er@carbo> <20181016212657.sk5ujmlp3p7hxw5b@carbo> <668fa1d2-bd59-6444-4162-fd42f347939d@gmail.com> Message-ID: <2c725077-5027-443e-a69b-8e98a1f122eb@Canary> Dial in: https://berkeley.zoom.us/zoomconference?m=ta2dUMqcdK219Ov78Sj7CMIzzoX2CHGZ Join in via PC: https://berkeley.zoom.us/j/400054438 Best Regards, Hameer Abbasi > On Wednesday, Oct 17, 2018 at 8:59 PM, Mark Harfouche wrote: > Stefan. I would like to simply listen in. I can't seem to find the meeting ID that we need to call in.
> > On Tue, Oct 16, 2018 at 6:42 PM Allan Haldane wrote: > > I'll try to make it, especially as it looks like you want to discuss two > > of my PRs! :) > > > > I have a different meeting a bit before then which might run over > > though, so sorry ahead of time if I'm not there. > > > > Cheers, > > Allan > > > > > > On 10/16/18 5:26 PM, Stefan van der Walt wrote: > > > Hi everyone, > > > > > > This is a friendly reminder of the BIDS/NumPy dev meetings, kicking off > > > tomorrow at 12pm Pacific time. > > > > > > Please add any topics you wish to discuss to the agenda linked below. > > > > > > Best regards, > > > Stéfan > > > > > > > > > On Thu, 11 Oct 2018 22:43:58 -0700, Stefan van der Walt wrote: > > >> The team at BIDS meets once a week to discuss progress, priorities, and > > >> roadblocks. While our priorities are broadly determined by the project > > >> roadmap [0], we would like to provide an opportunity for the community > > >> to give more regular and detailed feedback on our work. > > >> > > >> We therefore invite you to join us for our weekly calls, > > >> each **Wednesday from 12:00 to 13:00 Pacific Time**. > > >> > > >> Detail of the next meeting (2018-10-17) is given in the agenda [1], > > >> which is a growing document - feel free to add topics you wish to discuss. > > >> > > >> We hope to see you there! I will send another reminder next week.
> > >> > > >> > > >> [0] https://www.numpy.org/neps/index.html > > >> [1] https://hackmd.io/YZfpGn5BSu6acAFLBaRjtw# > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at python.org (mailto:NumPy-Discussion at python.org) > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org (mailto:NumPy-Discussion at python.org) > > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanv at berkeley.edu Wed Oct 17 19:16:32 2018 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Wed, 17 Oct 2018 16:16:32 -0700 Subject: [Numpy-discussion] Approving NEP 27 - Historical discussion of 0-D arrays In-Reply-To: <1650bf66-12fd-8bdd-21e3-8d5a0ecb206f@gmail.com> References: <1650bf66-12fd-8bdd-21e3-8d5a0ecb206f@gmail.com> Message-ID: <20181017231632.6bqdujmnqeskhehu@carbo> On Wed, 17 Oct 2018 20:58:55 +0300, Matti Picus wrote: > http://www.numpy.org/neps/nep-0027-zero-rank-arrarys.html. There was > fruitful discussion of the NEP and the need for both 0-D arrays and scalars > on the PR comments. Were those comments integrated back into the NEP? If not, can we add a paragraph to summarize the discussion? 
Stéfan From shoyer at gmail.com Wed Oct 17 20:39:12 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Wed, 17 Oct 2018 17:39:12 -0700 Subject: [Numpy-discussion] Approving NEP 27 - Historical discussion of 0-D arrays In-Reply-To: <20181017231632.6bqdujmnqeskhehu@carbo> References: <1650bf66-12fd-8bdd-21e3-8d5a0ecb206f@gmail.com> <20181017231632.6bqdujmnqeskhehu@carbo> Message-ID: On Wed, Oct 17, 2018 at 4:16 PM Stefan van der Walt wrote: > On Wed, 17 Oct 2018 20:58:55 +0300, Matti Picus wrote: > > http://www.numpy.org/neps/nep-0027-zero-rank-arrarys.html. There was > > fruitful discussion of the NEP and the need for both 0-D arrays and > scalars > > on the PR comments. > > Were those comments integrated back into the NEP? If not, can we add a > paragraph to summarize the discussion? > Yes, it's in the second paragraph of the "Abstract" section. -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Wed Oct 17 20:40:50 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Wed, 17 Oct 2018 17:40:50 -0700 Subject: [Numpy-discussion] Approving NEP 27 - Historical discussion of 0-D arrays In-Reply-To: <1650bf66-12fd-8bdd-21e3-8d5a0ecb206f@gmail.com> References: <1650bf66-12fd-8bdd-21e3-8d5a0ecb206f@gmail.com> Message-ID: On Wed, Oct 17, 2018 at 10:59 AM Matti Picus wrote: > In PR 12166 https://github.com/numpy/numpy/pull/12166 we revived an old > wiki document discussing the implementation of 0-dimensional arrays. > This became informational NEP-27 > http://www.numpy.org/neps/nep-0027-zero-rank-arrarys.html. There was > fruitful discussion of the NEP and the need for both 0-D arrays and > scalars on the PR comments. > We might consider adding a link to the PR under a "Discussion" section, like what you can see for NEP 16. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From stefanv at berkeley.edu Thu Oct 18 21:10:45 2018 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Thu, 18 Oct 2018 18:10:45 -0700 Subject: [Numpy-discussion] BIDS/NumPy dev meetings, Wednesdays 12pm Pacific In-Reply-To: <20181016212657.sk5ujmlp3p7hxw5b@carbo> References: <20181012054358.denpcg2dtoree6er@carbo> <20181016212657.sk5ujmlp3p7hxw5b@carbo> Message-ID: <20181019011045.a7ofrndp6p2om43q@carbo> Thank you to everyone who attended the development meeting this week. We've posted the agenda/notes online at: https://github.com/BIDS-numpy/docs/blob/master/status_meetings/status-2018-10-17.md On Tue, 16 Oct 2018 14:26:57 -0700, Stefan van der Walt wrote: > Hi everyone, > > This is a friendly reminder of the BIDS/NumPy dev meetings, kicking off > tomorrow at 12pm Pacific time. > > Please add any topics you wish to discuss to the agenda linked below. > > Best regards, > Stéfan From matti.picus at gmail.com Fri Oct 19 04:02:01 2018 From: matti.picus at gmail.com (Matti Picus) Date: Fri, 19 Oct 2018 11:02:01 +0300 Subject: [Numpy-discussion] Removing priority labels from github Message-ID: We currently have highest, high, normal, low, and lowest priority labels for github issues/PRs. At the recent status meeting, we proposed consolidating these to a single "high" priority label. Anything "low" priority should be merged or closed since it will be quickly forgotten, and no "normal" tag is needed. With that, we (the BIDS team) would like to encourage reviewers to use the "high" priority tag to indicate things we should be working on. Any objections or thoughts? Matti (in the names of Tyler and Stefan) From matti.picus at gmail.com Fri Oct 19 04:28:36 2018 From: matti.picus at gmail.com (Matti Picus) Date: Fri, 19 Oct 2018 11:28:36 +0300 Subject: [Numpy-discussion] asanyarray vs. asarray Message-ID: An HTML attachment was scrubbed...
URL: From einstein.edison at gmail.com Fri Oct 19 04:37:41 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Fri, 19 Oct 2018 10:37:41 +0200 Subject: [Numpy-discussion] asanyarray vs. asarray In-Reply-To: References: Message-ID: <24100c7f-20fd-4eed-99b0-d37660f52223@Canary> Hi all > On Friday, Oct 19, 2018 at 10:28 AM, Matti Picus wrote: > > Was there discussion around which of `asarray` or `asanyarray` to prefer? PR 11162, https://github.com/numpy/numpy/pull/11162, proposes `asanyarray` in place of `asarray` at the entrance to `_quantile_ureduce_func` to preserve ndarray subclasses. Should we be looking into changing all the `asarray` calls into `asanyarray`? > > > I suspect that this will cause a large number of problems around np.matrix, so unless we deprecate that, this might cause a large number of problems. The problem with np.matrix is that it's a subclass, but it's not substitutable for the base class, and so violates SOLID. There are efforts to remove np.matrix, with the largest consumer being scipy.sparse, so unless that's revamped, deprecating np.matrix is kind of hard to do. > > > > > > Matti > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion Best Regards, Hameer Abbasi -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Fri Oct 19 05:21:34 2018 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 19 Oct 2018 11:21:34 +0200 Subject: [Numpy-discussion] Removing priority labels from github In-Reply-To: References: Message-ID: <22c3981e90e72bf4df9d02c02ce2277d1c7d6e41.camel@sipsolutions.net> On Fri, 2018-10-19 at 11:02 +0300, Matti Picus wrote: > We currently have highest, high, normal, low, and lowest priority > labels > for github issues/PRs. At the recent status meeting, we proposed > consolidating these to a single "high" priority label.
Anything > "low" > priority should be merged or closed since it will be quickly > forgotten, > and no "normal" tag is needed. > > > With that, we (the BIDS team) would like to encourage reviewers to > use > the "high" priority tag to indicate things we should be working on. > > Any objections or thoughts? > Sounds like a plan, especially having practically meaningless tags right now is no help. Most of them are historical and personally I have only been using the milestones to tag things as high priority (very occasionally). - Sebastian > > Matti (in the names of Tyler and Stefan) > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From m.h.vankerkwijk at gmail.com Fri Oct 19 11:11:31 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Fri, 19 Oct 2018 11:11:31 -0400 Subject: [Numpy-discussion] asanyarray vs. asarray In-Reply-To: <24100c7f-20fd-4eed-99b0-d37660f52223@Canary> References: <24100c7f-20fd-4eed-99b0-d37660f52223@Canary> Message-ID: There are exceptions for `matrix` in quite a few places, and there is now a warning for `matrix` - it might not be bad to use `asanyarray` and add an exception for `matrix`. Indeed, I quite like the suggestion by Eric Wieser to just add the exception to `asanyarray` itself - that way when matrix is truly deprecated, it will be a very easy change. -- Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Fri Oct 19 12:09:18 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Fri, 19 Oct 2018 09:09:18 -0700 Subject: [Numpy-discussion] asanyarray vs.
asarray In-Reply-To: References: <24100c7f-20fd-4eed-99b0-d37660f52223@Canary> Message-ID: I don't think it makes much sense to change NumPy's existing usage of asarray() to asanyarray() unless we add subok=True arguments (which default to False). But this ends up cluttering NumPy's public API, which is also undesirable. The preferred way to override NumPy functions going forward should be __array_function__. On Fri, Oct 19, 2018 at 8:13 AM Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > There are exceptions for `matrix` in quite a few places, and there is now a > warning for `matrix` - it might not be bad to use `asanyarray` and add an > exception for `matrix`. Indeed, I quite like the suggestion by Eric Wieser > to just add the exception to `asanyarray` itself - that way when matrix is > truly deprecated, it will be a very easy change. > > -- Marten > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From einstein.edison at gmail.com Fri Oct 19 12:15:08 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Fri, 19 Oct 2018 18:15:08 +0200 Subject: [Numpy-discussion] asanyarray vs. asarray In-Reply-To: References: <24100c7f-20fd-4eed-99b0-d37660f52223@Canary> Message-ID: Hi! > On Friday, Oct 19, 2018 at 6:09 PM, Stephan Hoyer wrote: > I don't think it makes much sense to change NumPy's existing usage of asarray() to asanyarray() unless we add subok=True arguments (which default to False). But this ends up cluttering NumPy's public API, which is also undesirable. > Agreed so far. > > The preferred way to override NumPy functions going forward should be __array_function__. > I think we should "soft support" i.e. allow but consider unsupported, the case where one of NumPy's functions is implemented in terms of others and "passing through"
an array results in the correct behaviour for that array. > > On Fri, Oct 19, 2018 at 8:13 AM Marten van Kerkwijk wrote: > > There are exceptions for `matrix` in quite a few places, and there is now a warning for `matrix` - it might not be bad to use `asanyarray` and add an exception for `matrix`. Indeed, I quite like the suggestion by Eric Wieser to just add the exception to `asanyarray` itself - that way when matrix is truly deprecated, it will be a very easy change. > > -- Marten > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org (mailto:NumPy-Discussion at python.org) > > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion Best Regards, Hameer Abbasi -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Fri Oct 19 12:48:57 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Fri, 19 Oct 2018 09:48:57 -0700 Subject: [Numpy-discussion] Removing priority labels from github In-Reply-To: <22c3981e90e72bf4df9d02c02ce2277d1c7d6e41.camel@sipsolutions.net> References: <22c3981e90e72bf4df9d02c02ce2277d1c7d6e41.camel@sipsolutions.net> Message-ID: On Fri, Oct 19, 2018 at 2:22 AM Sebastian Berg wrote: > On Fri, 2018-10-19 at 11:02 +0300, Matti Picus wrote: > > We currently have highest, high, normal, low, and lowest priority > > labels > > for github issues/PRs. At the recent status meeting, we proposed > > consolidating these to a single "high" priority label. Anything > > "low" > > priority should be merged or closed since it will be quickly > > forgotten, > > and no "normal" tag is needed. > > > > > > With that, we (the BIDS team) would like to encourage reviewers to > > use > > the "high" priority tag to indicate things we should be working on.
> > > > Any objections or thoughts? > > > > Sounds like a plan, especially having practically meaningless tags > right now is no help. Most of them are historical and personally I have > only been using the milestones to tag things as high priority (very > occasionally). > > - Sebastian > +1 from me as well. I haven't been using these tags at all. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Fri Oct 19 16:08:27 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 19 Oct 2018 20:08:27 +0000 Subject: [Numpy-discussion] Removing priority labels from github In-Reply-To: References: <22c3981e90e72bf4df9d02c02ce2277d1c7d6e41.camel@sipsolutions.net> Message-ID: On Fri, Oct 19, 2018 at 4:49 PM Stephan Hoyer wrote: > > > On Fri, Oct 19, 2018 at 2:22 AM Sebastian Berg > wrote: > >> On Fri, 2018-10-19 at 11:02 +0300, Matti Picus wrote: >> > We currently have highest, high, normal, low, and lowest priority >> > labels >> > for github issues/PRs. At the recent status meeting, we proposed >> > consolidating these to a single "high" priority label. Anything >> > "low" >> > priority should be merged or closed since it will be quickly >> > forgotten, >> > and no "normal" tag is needed. >> > >> > >> > With that, we (the BIDS team) would like to encourage reviewers to >> > use >> > the "high" priority tag to indicate things we should be working on. >> > >> > Any objections or thoughts? >> > >> >> Sounds like a plan, especially having practically meaningless tags >> right now is no help. Most of them are historical and personally I have >> only been using the milestones to tag things as high priority (very >> occasionally). >> >> - Sebastian >> > > +1 from me as well. I haven't been using these tags at all. > +1 Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at gmail.com Fri Oct 19 18:28:28 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 19 Oct 2018 22:28:28 +0000 Subject: [Numpy-discussion] asanyarray vs. asarray In-Reply-To: References: <24100c7f-20fd-4eed-99b0-d37660f52223@Canary> Message-ID: On Fri, Oct 19, 2018 at 4:15 PM Hameer Abbasi wrote: > Hi! > > On Friday, Oct 19, 2018 at 6:09 PM, Stephan Hoyer > wrote: > I don't think it makes much sense to change NumPy's existing usage of > asarray() to asanyarray() unless we add subok=True arguments (which default > to False). But this ends up cluttering NumPy's public API, which is also > undesirable. > > Agreed so far. > I'm not sure I agree. "subok" is very unpythonic; the average numpy library function should work fine for a well-behaved subclass (i.e. most things out there except np.matrix). > > The preferred way to override NumPy functions going forward should be > __array_function__. > > I think we should "soft support" i.e. allow but consider unsupported, the > case where one of NumPy's functions is implemented in terms of others and > "passing through" an array results in the correct behaviour for that array. > I don't think we have or want such a concept as "soft support". We intend to not break anything that now has asanyarray, i.e. it's supported and ideally we have regression tests for all such functions. For anything we transition over from asarray to asanyarray, PRs should come with new tests. > > On Fri, Oct 19, 2018 at 8:13 AM Marten van Kerkwijk < > m.h.vankerkwijk at gmail.com> wrote: > >> There are exceptions for `matrix` in quite a few places, and there is now >> a warning for `matrix` - it might not be bad to use `asanyarray` and add an >> exception for `matrix`. Indeed, I quite like the suggestion by Eric Wieser >> to just add the exception to `asanyarray` itself - that way when matrix is >> truly deprecated, it will be a very easy change. >>
Adding exceptions is not deprecation - we then may as well just rip np.matrix out straight away. What I suggested in the call about this issue is that it's not very effective to treat functions like percentile/quantile one by one without an overarching strategy. A way forward could be for someone to write an overview of which sets of functions now have asanyarray (and actually work with subclasses), which ones we can and want to change now, and which ones we can and want to change after np.matrix is gone. Also, some guidelines for new functions that we add to numpy would be handy. I suspect we've been adding new functions that use asarray rather than asanyarray, which is probably undesired. Cheers, Ralf > >> -- Marten >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > > Best Regards, > Hameer Abbasi > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Oct 19 18:40:10 2018 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 19 Oct 2018 15:40:10 -0700 Subject: [Numpy-discussion] asanyarray vs. asarray In-Reply-To: References: <24100c7f-20fd-4eed-99b0-d37660f52223@Canary> Message-ID: On Fri, Oct 19, 2018 at 3:28 PM, Ralf Gommers wrote: > > > On Fri, Oct 19, 2018 at 4:15 PM Hameer Abbasi > wrote: > >> Hi! >> >> On Friday, Oct 19, 2018 at 6:09 PM, Stephan Hoyer >> wrote: >> I don't think it makes much sense to change NumPy's existing usage of >> asarray() to asanyarray() unless we add subok=True arguments (which default >> to False). 
But this ends up cluttering NumPy's public API, which is also >> undesirable. >> >> Agreed so far. >> > > I'm not sure I agree. "subok" is very unpythonic; the average numpy > library function should work fine for a well-behaved subclass (i.e. most > things out there except np.matrix). > Masked arrays also tend to break code that's not expecting them (e.g. on a masked array, arr.sum()/arr.size will silently compute some meaningless nonsense instead of the mean, and there are lots of formulas out there that have some similarities with 'mean'). And people do all kinds of weird things in third-party array subclasses. Obviously we can't remove asanyarray or break existing code that assumes particular numpy functions use asanyarray, but fundamentally asanyarray is just not an API that makes sense or can be supported in a general way, and our overall goal is to get people to gradually transition away from using ndarray subclasses in general. That's why we're doing all this work to make duck arrays work. So extending asanyarray support doesn't seem like a good priority to spend our limited resources on, to me. -n -- Nathaniel J. Smith -- https://vorpus.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Fri Oct 19 18:46:52 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Fri, 19 Oct 2018 15:46:52 -0700 Subject: [Numpy-discussion] asanyarray vs. asarray In-Reply-To: References: <24100c7f-20fd-4eed-99b0-d37660f52223@Canary> Message-ID: > > I think we should "soft support" i.e. allow but consider unsupported, the >> case where one of NumPy's functions is implemented in terms of others and >> "passing through" an array results in the correct behaviour for that array. >> > > I don't think we have or want such a concept as "soft support". We intend > to not break anything that now has asanyarray, i.e. it's supported and > ideally we have regression tests for all such functions.
For anything we > transition over from asarray to asanyarray, PRs should come with new tests. > The problem with asanyarray() is that there isn't any well-defined subclass API for NumPy, beyond "mostly works like a NumPy array." If every NumPy subclass strictly obeyed the Liskov Substitution Principle asanyarray() would be fine, but in practice every subclass I've encountered deviates from the behavior of numpy.ndarray in some way. This means the NumPy codebase has ended up littered with hacks/workarounds to support various specific subclasses, and new subclasses still don't work reliably. This makes it challenging to change existing code. For an example of how bad this has gotten, look at all the work-arounds I had to add to support np.testing.assert_array_equal() on ndarray subclasses in this recent PR: https://github.com/numpy/numpy/pull/12119 My hope is that __array_function__ will finally let us put a stop to this by offering a better alternative to subclassing. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Fri Oct 19 19:01:40 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 19 Oct 2018 23:01:40 +0000 Subject: [Numpy-discussion] asanyarray vs. asarray In-Reply-To: References: <24100c7f-20fd-4eed-99b0-d37660f52223@Canary> Message-ID: On Fri, Oct 19, 2018 at 10:28 PM Ralf Gommers wrote: > > > On Fri, Oct 19, 2018 at 4:15 PM Hameer Abbasi > wrote: > >> Hi! >> >> On Friday, Oct 19, 2018 at 6:09 PM, Stephan Hoyer >> wrote: >> I don't think it makes much sense to change NumPy's existing usage of >> asarray() to asanyarray() unless we add subok=True arguments (which default >> to False). But this ends up cluttering NumPy's public API, which is also >> undesirable. >> >> Agreed so far. >>
> >> >> The preferred way to override NumPy functions going forward should be >> __array_function__. >> >> >> I think we should ?soft support? i.e. allow but consider unsupported, the >> case where one of NumPy?s functions is implemented in terms of others and >> ?passing through? an array results in the correct behaviour for that array. >> > > I don't think we have or want such a concept as "soft support". We intend > to not break anything that now has asanyarray, i.e. it's supported and > ideally we have regression tests for all such functions. For anything we > transition over from asarray to asanyarray, PRs should come with new tests. > > >> >> On Fri, Oct 19, 2018 at 8:13 AM Marten van Kerkwijk < >> m.h.vankerkwijk at gmail.com> wrote: >> >>> There are exceptions for `matrix` in quite a few places, and there now >>> is warning for `maxtrix` - it might not be bad to use `asanyarray` and add >>> an exception for `maxtrix`. Indeed, I quite like the suggestion by Eric >>> Wieser to just add the exception to `asanyarray` itself - that way when >>> matrix is truly deprecated, it will be a very easy change. >>> >> I don't quite understand this. Adding exceptions is not deprecation - we > then may as well just rip np.matrix out straight away. > > What I suggested in the call about this issue is that it's not very > effective to treat functions like percentile/quantile one by one without an > overarching strategy. A way forward could be for someone to write an > overview of which sets of functions now have asanyarray (and actually work > with subclasses), which ones we can and want to change now, and which ones > we can and want to change after np.matrix is gone. Also, some guidelines > for new functions that we add to numpy would be handy. I suspect we've been > adding new functions that use asarray rather than asanyarray, which is > probably undesired. > Thanks Nathaniel and Stephan. 
Your comments on my other two points are both clear and correct (and have been made a number of times before). I think the "write an overview so we can stop making ad-hoc decisions and having these discussions" is the most important point I was trying to make though. If we had such a doc and it concluded "hence we don't change anything, __array_function__ is the only way to go" then we can just close PRs like https://github.com/numpy/numpy/pull/11162 straight away. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Fri Oct 19 21:23:21 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Fri, 19 Oct 2018 21:23:21 -0400 Subject: [Numpy-discussion] asanyarray vs. asarray In-Reply-To: References: <24100c7f-20fd-4eed-99b0-d37660f52223@Canary> Message-ID: Hi All, It seems there are two extreme possibilities for general functions: 1. Put `asarray` everywhere. The main benefit that I can see is that even if people put in lists instead of arrays, one is guaranteed to have shape, dtype, etc. But it seems a bit like calling `int` on everything that might get used as an index, instead of letting the actual indexing do the proper thing and call `__index__`. 2. Do not coerce at all, but rather write code assuming something is an array already. This will often, but not always, just work for array mimics, with coercion done only where necessary (e.g., in lower-lying C code such as that of the ufuncs which has a smaller API surface and can be overridden more easily). The current __array_function__ work may well provide us with a way to combine both, if we (over time) move the coercion inside `ndarray.__array_function__` so that the actual implementation *can* assume it deals with pure ndarray - then, when relevant, calling that implementation will be what subclasses/duck arrays can happily do (and it is up to them to ensure this works).
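The practical difference between the two coercion styles shows up with any subclass (a toy sketch; `MyArray` here is purely hypothetical):

```python
import numpy as np

class MyArray(np.ndarray):
    """A hypothetical minimal, well-behaved ndarray subclass."""

x = np.arange(4).view(MyArray)

# Option-1 style: asarray() coerces everything down to a base ndarray,
# stripping the subclass
assert type(np.asarray(x)) is np.ndarray

# asanyarray() lets the subclass pass through untouched
assert type(np.asanyarray(x)) is MyArray

# both still coerce plain Python sequences to ndarray
assert type(np.asanyarray([1, 2, 3])) is np.ndarray
```

So a library function written with asanyarray() preserves subclass identity on output, at the cost of having to trust the subclass to behave.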
Of course, the above does not really answer what to do in the meantime. But perhaps it helps in thinking of what we are actually aiming for. One last thing: could we please stop bashing subclasses? One can subclass essentially everything in python, often to great advantage. Subclasses such as MaskedArray and, yes, Quantity, are widely used, and if they cause problems perhaps that should be seen as a sign that ndarray subclassing should be made easier and clearer. All the best, Marten On Fri, Oct 19, 2018 at 7:02 PM Ralf Gommers wrote: > > > On Fri, Oct 19, 2018 at 10:28 PM Ralf Gommers > wrote: > >> >> >> On Fri, Oct 19, 2018 at 4:15 PM Hameer Abbasi >> wrote: >> >>> Hi! >>> >>> On Friday, Oct 19, 2018 at 6:09 PM, Stephan Hoyer >>> wrote: >>> I don't think it makes much sense to change NumPy's existing usage of >>> asarray() to asanyarray() unless we add subok=True arguments (which default >>> to False). But this ends up cluttering NumPy's public API, which is also >>> undesirable. >>> >>> Agreed so far. >>> >> >> I'm not sure I agree. "subok" is very unpythonic; the average numpy >> library function should work fine for a well-behaved subclass (i.e. most >> things out there except np.matrix). >> >>> >>> The preferred way to override NumPy functions going forward should be >>> __array_function__. >>> >>> >>> I think we should "soft support" i.e. allow but consider unsupported, >>> the case where one of NumPy's functions is implemented in terms of others >>> and "passing through" an array results in the correct behaviour for that >>> array. >>> >> >> I don't think we have or want such a concept as "soft support". We >> intend to not break anything that now has asanyarray, i.e. it's supported >> and ideally we have regression tests for all such functions. For anything >> we transition over from asarray to asanyarray, PRs should come with new >> tests.
>> >> >>> >>> On Fri, Oct 19, 2018 at 8:13 AM Marten van Kerkwijk < >>> m.h.vankerkwijk at gmail.com> wrote: >>> >>>> There are exceptions for `matrix` in quite a few places, and there is now >>>> a warning for `matrix` - it might not be bad to use `asanyarray` and add >>>> an exception for `matrix`. Indeed, I quite like the suggestion by Eric >>>> Wieser to just add the exception to `asanyarray` itself - that way when >>>> matrix is truly deprecated, it will be a very easy change. >>>> >>> I don't quite understand this. Adding exceptions is not deprecation - we >> then may as well just rip np.matrix out straight away. >> >> What I suggested in the call about this issue is that it's not very >> effective to treat functions like percentile/quantile one by one without an >> overarching strategy. A way forward could be for someone to write an >> overview of which sets of functions now have asanyarray (and actually work >> with subclasses), which ones we can and want to change now, and which ones >> we can and want to change after np.matrix is gone. Also, some guidelines >> for new functions that we add to numpy would be handy. I suspect we've been >> adding new functions that use asarray rather than asanyarray, which is >> probably undesired. >> > > Thanks Nathaniel and Stephan. Your comments on my other two points are > both clear and correct (and have been made a number of times before). I > think the "write an overview so we can stop making ad-hoc decisions and > having these discussions" is the most important point I was trying to make > though.
> > Cheers, > Ralf > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Fri Oct 19 21:49:44 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Fri, 19 Oct 2018 18:49:44 -0700 Subject: [Numpy-discussion] asanyarray vs. asarray In-Reply-To: References: <24100c7f-20fd-4eed-99b0-d37660f52223@Canary> Message-ID: Subclasses such as MaskedArray and, yes, Quantity, are widely used, and if they cause problems perhaps that should be seen as a sign that ndarray subclassing should be made easier and clearer. Both maskedarray and quantity seem like something that would make more sense at the dtype level if our dtype system was easier to extend. It might be good to compile a list of subclassing applications, and split them into "this ought to be a dtype" and "this ought to be a different type of container". On Fri, 19 Oct 2018 at 18:24 Marten van Kerkwijk wrote: > Hi All, > > It seems there are two extreme possibilities for general functions: > 1. Put `asarray` everywhere. The main benefit that I can see is that even > if people put in list instead of arrays, one is guaranteed to have shape, > dtype, etc. But it seems a bit like calling `int` on everything that might > get used as an index, instead of letting the actual indexing do the proper > thing and call `__index__`. > 2. Do not coerce at all, but rather write code assuming something is an > array already. This will often, but not always, just work for array mimics, > with coercion done only where necessary (e.g., in lower-lying C code such > as that of the ufuncs which has a smaller API surface and can be overridden > more easily).
> > The current __array_function__ work may well provide us with a way to > combine both, if we (over time) move the coercion inside > `ndarray.__array_function__` so that the actual implementation *can* assume > it deals with pure ndarray - then, when relevant, calling that > implementation will be what subclasses/duck arrays can happily do (and it > is up to them to ensure this works). > > Of course, the above does not really answer what to do in the meantime. > But perhaps it helps in thinking of what we are actually aiming for. > > One last thing: could we please stop bashing subclasses? One can subclass > essentially everything in python, often to great advantage. Subclasses such > as MaskedArray and, yes, Quantity, are widely used, and if they cause > problems perhaps that should be seen as a sign that ndarray subclassing > should be made easier and clearer. > > All the best, > > Marten > > > On Fri, Oct 19, 2018 at 7:02 PM Ralf Gommers > wrote: > >> >> >> On Fri, Oct 19, 2018 at 10:28 PM Ralf Gommers >> wrote: >> >>> >>> >>> On Fri, Oct 19, 2018 at 4:15 PM Hameer Abbasi >>> wrote: >>> >>>> Hi! >>>> >>>> On Friday, Oct 19, 2018 at 6:09 PM, Stephan Hoyer >>>> wrote: >>>> I don't think it makes much sense to change NumPy's existing usage of >>>> asarray() to asanyarray() unless we add subok=True arguments (which default >>>> to False). But this ends up cluttering NumPy's public API, which is also >>>> undesirable. >>>> >>>> Agreed so far. >>>> >>> >>> I'm not sure I agree. "subok" is very unpythonic; the average numpy >>> library function should work fine for a well-behaved subclass (i.e. most >>> things out there except np.matrix). >>> >>>> >>>> The preferred way to override NumPy functions going forward should be >>>> __array_function__. >>>> >>>> >>>> I think we should "soft support" i.e. allow but consider unsupported, >>>> the case where one of NumPy's functions is implemented in terms of others >>>> and "passing through"
an array results in the correct behaviour for that >>>> array. >>>> >>> >>> I don't think we have or want such a concept as "soft support". We >>> intend to not break anything that now has asanyarray, i.e. it's supported >>> and ideally we have regression tests for all such functions. For anything >>> we transition over from asarray to asanyarray, PRs should come with new >>> tests. >>> >>> >>>> >>>> On Fri, Oct 19, 2018 at 8:13 AM Marten van Kerkwijk < >>>> m.h.vankerkwijk at gmail.com> wrote: >>>> >>>>> There are exceptions for `matrix` in quite a few places, and there is now >>>>> a warning for `matrix` - it might not be bad to use `asanyarray` and add >>>>> an exception for `matrix`. Indeed, I quite like the suggestion by Eric >>>>> Wieser to just add the exception to `asanyarray` itself - that way when >>>>> matrix is truly deprecated, it will be a very easy change. >>>>> >>>> I don't quite understand this. Adding exceptions is not deprecation - >>> we then may as well just rip np.matrix out straight away. >>> >>> What I suggested in the call about this issue is that it's not very >>> effective to treat functions like percentile/quantile one by one without an >>> overarching strategy. A way forward could be for someone to write an >>> overview of which sets of functions now have asanyarray (and actually work >>> with subclasses), which ones we can and want to change now, and which ones >>> we can and want to change after np.matrix is gone. Also, some guidelines >>> for new functions that we add to numpy would be handy. I suspect we've been >>> adding new functions that use asarray rather than asanyarray, which is >>> probably undesired. >>> >> >> Thanks Nathaniel and Stephan. Your comments on my other two points are >> both clear and correct (and have been made a number of times before). I >> think the "write an overview so we can stop making ad-hoc decisions and >> having these discussions" is the most important point I was trying to make >> though.
If we had such a doc and it concluded "hence we don't change >> anything, __array_function__ is the only way to go" then we can just close >> PRs like https://github.com/numpy/numpy/pull/11162 straight away. >> >> Cheers, >> Ralf >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Oct 19 22:00:02 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 19 Oct 2018 20:00:02 -0600 Subject: [Numpy-discussion] asanyarray vs. asarray In-Reply-To: References: <24100c7f-20fd-4eed-99b0-d37660f52223@Canary> Message-ID: On Fri, Oct 19, 2018 at 7:50 PM Eric Wieser wrote: > Subclasses such as MaskedArray and, yes, Quantity, are widely used, and if > they cause problems perhaps that should be seen as a sign that ndarray > subclassing should be made easier and clearer. > > Both maskedarray and quantity seem like something that would make more > sense at the dtype level if our dtype system was easier to extend. It might > be good to compile a list of subclassing applications, and split them into > ?this ought to be a dtype? and ?this ought to be a different type of > container?. > Wes Mckinney has been benchmarking masks vs sentinel values for arrow: http://wesmckinney.com/blog/bitmaps-vs-sentinel-values/. The (bit) masks are faster. I'm not convinced dtypes are the way to go. Chuck > On Fri, 19 Oct 2018 at 18:24 Marten van Kerkwijk < > m.h.vankerkwijk at gmail.com> wrote: > >> Hi All, >> >> It seems there are two extreme possibilities for general functions: >> 1. Put `asarray` everywhere. 
The main benefit that I can see is that even >> if people put in list instead of arrays, one is guaranteed to have shape, >> dtype, etc. But it seems a bit like calling `int` on everything that might >> get used as an index, instead of letting the actual indexing do the proper >> thing and call `__index__`. >> 2. Do not coerce at all, but rather write code assuming something is an >> array already. This will often, but not always, just work for array mimics, >> with coercion done only where necessary (e.g., in lower-lying C code such >> as that of the ufuncs which has a smaller API surface and can be overridden >> more easily). >> >> The current __array_function__ work may well provide us with a way to >> combine both, if we (over time) move the coercion inside >> `ndarray.__array_function__` so that the actual implementation *can* assume >> it deals with pure ndarray - then, when relevant, calling that >> implementation will be what subclasses/duck arrays can happily do (and it >> is up to them to ensure this works). >> >> Of course, the above does not really answer what to do in the meantime. >> But perhaps it helps in thinking of what we are actually aiming for. >> >> One last thing: could we please stop bashing subclasses? One can subclass >> essentially everything in python, often to great advantage. Subclasses such >> as MaskedArray and, yes, Quantity, are widely used, and if they cause >> problems perhaps that should be seen as a sign that ndarray subclassing >> should be made easier and clearer. >> >> All the best, >> >> Marten >> >> >> On Fri, Oct 19, 2018 at 7:02 PM Ralf Gommers >> wrote: >> >>> >>> >>> On Fri, Oct 19, 2018 at 10:28 PM Ralf Gommers >>> wrote: >>> >>>> >>>> >>>> On Fri, Oct 19, 2018 at 4:15 PM Hameer Abbasi < >>>> einstein.edison at gmail.com> wrote: >>>> >>>>> Hi! 
>>>>> >>>>> On Friday, Oct 19, 2018 at 6:09 PM, Stephan Hoyer >>>>> wrote: >>>>> I don't think it makes much sense to change NumPy's existing usage of >>>>> asarray() to asanyarray() unless we add subok=True arguments (which default >>>>> to False). But this ends up cluttering NumPy's public API, which is also >>>>> undesirable. >>>>> >>>>> Agreed so far. >>>>> >>>> >>>> I'm not sure I agree. "subok" is very unpythonic; the average numpy >>>> library function should work fine for a well-behaved subclass (i.e. most >>>> things out there except np.matrix). >>>> >>>>> >>>>> The preferred way to override NumPy functions going forward should be >>>>> __array_function__. >>>>> >>>>> >>>>> I think we should "soft support" i.e. allow but consider unsupported, >>>>> the case where one of NumPy's functions is implemented in terms of others >>>>> and "passing through" an array results in the correct behaviour for that >>>>> array. >>>>> >>>> >>>> I don't think we have or want such a concept as "soft support". We >>>> intend to not break anything that now has asanyarray, i.e. it's supported >>>> and ideally we have regression tests for all such functions. For anything >>>> we transition over from asarray to asanyarray, PRs should come with new >>>> tests. >>>> >>>> >>>>> >>>>> On Fri, Oct 19, 2018 at 8:13 AM Marten van Kerkwijk < >>>>> m.h.vankerkwijk at gmail.com> wrote: >>>>> >>>>>> There are exceptions for `matrix` in quite a few places, and there >>>>>> is now a warning for `matrix` - it might not be bad to use `asanyarray` and >>>>>> add an exception for `matrix`. Indeed, I quite like the suggestion by Eric >>>>>> Wieser to just add the exception to `asanyarray` itself - that way when >>>>>> matrix is truly deprecated, it will be a very easy change. >>>>>> >>>>> I don't quite understand this. Adding exceptions is not deprecation - >>>> we then may as well just rip np.matrix out straight away.
>>>> >>>> What I suggested in the call about this issue is that it's not very >>>> effective to treat functions like percentile/quantile one by one without an >>>> overarching strategy. A way forward could be for someone to write an >>>> overview of which sets of functions now have asanyarray (and actually work >>>> with subclasses), which ones we can and want to change now, and which ones >>>> we can and want to change after np.matrix is gone. Also, some guidelines >>>> for new functions that we add to numpy would be handy. I suspect we've been >>>> adding new functions that use asarray rather than asanyarray, which is >>>> probably undesired. >>>> >>> >>> Thanks Nathaniel and Stephan. Your comments on my other two points are >>> both clear and correct (and have been made a number of times before). I >>> think the "write an overview so we can stop making ad-hoc decisions and >>> having these discussions" is the most important point I was trying to make >>> though. If we had such a doc and it concluded "hence we don't change >>> anything, __array_function__ is the only way to go" then we can just close >>> PRs like https://github.com/numpy/numpy/pull/11162 straight away. >>> >>> Cheers, >>> Ralf >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Fri Oct 19 22:08:43 2018 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 19 Oct 2018 19:08:43 -0700 Subject: [Numpy-discussion] asanyarray vs. asarray In-Reply-To: References: <24100c7f-20fd-4eed-99b0-d37660f52223@Canary> Message-ID: On Fri, Oct 19, 2018 at 7:00 PM, Charles R Harris wrote: > > On Fri, Oct 19, 2018 at 7:50 PM Eric Wieser > wrote: >> >> Subclasses such as MaskedArray and, yes, Quantity, are widely used, and if >> they cause problems perhaps that should be seen as a sign that ndarray >> subclassing should be made easier and clearer. >> >> Both maskedarray and quantity seem like something that would make more >> sense at the dtype level if our dtype system was easier to extend. It might >> be good to compile a list of subclassing applications, and split them into >> ?this ought to be a dtype? and ?this ought to be a different type of >> container?. > > Wes Mckinney has been benchmarking masks vs sentinel values for arrow: > http://wesmckinney.com/blog/bitmaps-vs-sentinel-values/. The (bit) masks are > faster. I'm not convinced dtypes are the way to go. We need to add better support for both user-defined dtypes and for user-defined containers in any case. So we're going to support both missing value strategies regardless, and people will be able to choose based on engineering trade-offs. A missing value dtype is going to integrate much more easily into the rest of numpy than a new container where you have to reimplement indexing etc., but maybe custom containers can be faster. Okay, cool, they're both on PyPI, pick your favorite! Trying to wedge masks into *ndarray* seems like a non-starter, though, because it would require auditing and updating basically all code using the numpy C API. -n -- Nathaniel J. Smith -- https://vorpus.org From njs at pobox.com Fri Oct 19 22:50:01 2018 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 19 Oct 2018 19:50:01 -0700 Subject: [Numpy-discussion] asanyarray vs. 
asarray In-Reply-To: References: <24100c7f-20fd-4eed-99b0-d37660f52223@Canary> Message-ID: On Fri, Oct 19, 2018 at 6:23 PM, Marten van Kerkwijk wrote: > Hi All, > > It seems there are two extreme possibilities for general functions: > 1. Put `asarray` everywhere. The main benefit that I can see is that even if > people put in list instead of arrays, one is guaranteed to have shape, > dtype, etc. But it seems a bit like calling `int` on everything that might > get used as an index, instead of letting the actual indexing do the proper > thing and call `__index__`. > 2. Do not coerce at all, but rather write code assuming something is an > array already. This will often, but not always, just work for array mimics, > with coercion done only where necessary (e.g., in lower-lying C code such as > that of the ufuncs which has a smaller API surface and can be overridden > more easily). Between these two options, Numpy's APIs are very firmly on the side of "option 1", and this is common in most public APIs I'm familiar with (e.g. scipy). I guess you could try to reopen the discussion, but you'd be pushing against 15+ years of precedent there... > The current __array_function__ work may well provide us with a way to > combine both, if we (over time) move the coercion inside > `ndarray.__array_function__` so that the actual implementation *can* assume > it deals with pure ndarray - then, when relevant, calling that > implementation will be what subclasses/duck arrays can happily do (and it is > up to them to ensure this works). > > Of course, the above does not really answer what to do in the meantime. But > perhaps it helps in thinking of what we are actually aiming for. We need some kind of asduckarray(), that coerces lists and similar but allows duck-arrays to pass through. > One last thing: could we please stop bashing subclasses? One can subclass > essentially everything in python, often to great advantage. 
Subclasses such > as MaskedArray and, yes, Quantity, are widely used, and if they cause > problems perhaps that should be seen as a sign that ndarray subclassing > should be made easier and clearer. Who's bashing? I've spent years thinking about this and come to the conclusion that there are no viable solutions to the problems with subclassing ndarray, but that's not the same as bashing :-). If you've thought of something we've missed, you should share it... (I also know lots of senior Python devs who believe that using Python's subclassing support is pretty much always a mistake -- this talk is popularly cited: https://www.youtube.com/watch?v=3MNVP9-hglc -- but the issues with ndarray are much more severe than for the average Python class.) -n -- Nathaniel J. Smith -- https://vorpus.org From charlesr.harris at gmail.com Sat Oct 20 13:08:41 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 20 Oct 2018 11:08:41 -0600 Subject: [Numpy-discussion] Removing priority labels from github In-Reply-To: References: <22c3981e90e72bf4df9d02c02ce2277d1c7d6e41.camel@sipsolutions.net> Message-ID: On Fri, Oct 19, 2018 at 2:10 PM Ralf Gommers wrote: > > > On Fri, Oct 19, 2018 at 4:49 PM Stephan Hoyer wrote: > >> >> >> On Fri, Oct 19, 2018 at 2:22 AM Sebastian Berg < >> sebastian at sipsolutions.net> wrote: >> >>> On Fri, 2018-10-19 at 11:02 +0300, Matti Picus wrote: >>> > We currently have highest, high, normal, low, and lowest priority >>> > labels >>> > for github issues/PRs. At the recent status meeting, we proposed >>> > consolidating these to a single "high" priority label. Anything >>> > "low" >>> > priority should be merged or closed since it will be quickly >>> > forgotten, >>> > and no "normal" tag is needed. >>> > >>> > >>> > With that, we (the BIDS team) would like to encourage reviewers to >>> > use >>> > the "high" priority tag to indicate things we should be working on. >>> > >>> > Any objections or thoughts?
>>> > >>> >>> Sounds like a plan, especially having practically meaningless tags >>> right now is no help. Most of them are historical and personally I have >>> only been using the milestones to tag things as high priority (very >>> occasionally). >>> >>> - Sebastian >>> >> >> +1 from me as well. I haven't been using these tags at all. >> > > +1 > > +1 I may have used one of the priority labels once or twice, I don't really remember. When I think something needs to be fixed or merged I generally add a benchmark. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Oct 20 13:09:25 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 20 Oct 2018 11:09:25 -0600 Subject: [Numpy-discussion] Removing priority labels from github In-Reply-To: References: <22c3981e90e72bf4df9d02c02ce2277d1c7d6e41.camel@sipsolutions.net> Message-ID: On Sat, Oct 20, 2018 at 11:08 AM Charles R Harris wrote: > > > On Fri, Oct 19, 2018 at 2:10 PM Ralf Gommers > wrote: > >> >> >> On Fri, Oct 19, 2018 at 4:49 PM Stephan Hoyer wrote: >> >>> >>> >>> On Fri, Oct 19, 2018 at 2:22 AM Sebastian Berg < >>> sebastian at sipsolutions.net> wrote: >>> >>>> On Fri, 2018-10-19 at 11:02 +0300, Matti Picus wrote: >>>> > We currently have highest, high, normal, low, and lowest priority >>>> > labels >>>> > for github issues/PRs. At the recent status meeting, we proposed >>>> > consolidating these to a single "high" priority label. Anything >>>> > "low" >>>> > priority should be merged or closed since it will be quickly >>>> > forgotten, >>>> > and no "normal" tag is needed. >>>> > >>>> > >>>> > With that, we (the BIDS team) would like to encourage reviewers to >>>> > use >>>> > the "high" priority tag to indicate things we should be working on. >>>> > >>>> > Any objections or thoughts? >>>> > >>>> >>>> Sounds like a plan, especially having practically meaningless tags >>>> right now is no help. 
Most of them are historical and personally I have >>>> only been using the milestones to tag things as high priority (very >>>> occasionally). >>>> >>>> - Sebastian >>>> >>> >>> +1 from me as well. I haven't been using these tags at all. >>> >> >> +1 >> >> > +1 I may have used one of the priority labels once or twice, I don't > really remember. When I think something needs to be fixed or merged I > generally add a benchmark. > > benchmark <- milestone. -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Mon Oct 22 02:56:37 2018 From: matti.picus at gmail.com (Matti Picus) Date: Mon, 22 Oct 2018 09:56:37 +0300 Subject: [Numpy-discussion] Reminder: weekly status meeting Message-ID: Hi everyone, The team at BIDS meets once a week to discuss progress, priorities, and roadblocks. While our priorities are broadly determined by the project roadmap [0], we would like to provide an opportunity for the community to give more regular and detailed feedback on our work. We therefore invite you to join us for our weekly calls, each **Wednesday from 12:00 to 13:00 Pacific Time**. Detail of the next meeting (2018-10-24) is given in the agenda [1], which is a living document. Feel free to add topics you wish to discuss. We hope to see you there! Best regards, Stéfan, Tyler, Matti [0] https://www.numpy.org/neps/index.html [1] https://hackmd.io/5WZ6VwQKSbSR_4Ng65pUFw?both From charlesr.harris at gmail.com Mon Oct 22 14:06:53 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 22 Oct 2018 12:06:53 -0600 Subject: [Numpy-discussion] NumPy 1.15.3 release Message-ID: Hi All, On behalf of the NumPy team, I am pleased to announce the release of NumPy 1.15.3. This is a bugfix release for bugs and regressions reported following the 1.15.2 release. The most noticeable fix is probably for the memory leak encountered when slicing classes derived from Numpy. The Python versions supported by this release are 2.7, 3.4-3.7.
Wheels for this release can be downloaded from PyPI; source archives are available from GitHub. Compatibility Note ================== The NumPy 1.15.x OS X wheels released on PyPI no longer contain 32-bit binaries. That will also be the case in future releases. See #11625 for the related discussion. Those needing 32-bit support should look elsewhere or build from source. Contributors ============ A total of 7 people contributed to this release. People with a "+" by their names contributed a patch for the first time. * Allan Haldane * Charles Harris * Jeroen Demeyer * Kevin Sheppard * Matthew Bowden + * Matti Picus * Tyler Reddy Pull requests merged ==================== A total of 12 pull requests were merged for this release. * #12080: MAINT: Blacklist some MSVC complex functions. * #12083: TST: Add azure CI testing to 1.15.x branch. * #12084: BUG: test_path() now uses Path.resolve() * #12085: TST, MAINT: Fix some failing tests on azure-pipelines mac and... * #12187: BUG: Fix memory leak in mapping.c * #12188: BUG: Allow boolean subtract in histogram * #12189: BUG: Fix in-place permutation * #12190: BUG: limit default for get_num_build_jobs() to 8 * #12191: BUG: OBJECT_to_* should check for errors * #12192: DOC: Prepare for NumPy 1.15.3 release. * #12237: BUG: Fix MaskedArray fill_value type conversion. * #12238: TST: Backport azure-pipeline testing fixes for Mac Cheers, Charles Harris -------------- next part -------------- An HTML attachment was scrubbed...
URL: From stefanv at berkeley.edu Wed Oct 24 18:07:50 2018 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Wed, 24 Oct 2018 15:07:50 -0700 Subject: [Numpy-discussion] Reminder: weekly status meeting In-Reply-To: References: Message-ID: <20181024220750.pytz7dav4dabeplx@carbo> Hi all, On Mon, 22 Oct 2018 09:56:37 +0300, Matti Picus wrote: > We therefore invite you to join us for our weekly calls, > each **Wednesday from 12:00 to 13:00 Pacific Time**. > > Detail of the next meeting (2018-10-24) is given in the agenda This week's meeting notes are at: https://github.com/BIDS-numpy/docs/blob/master/status_meetings/status-2018-10-24.md Stéfan From einstein.edison at gmail.com Thu Oct 25 06:40:16 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Thu, 25 Oct 2018 12:40:16 +0200 Subject: [Numpy-discussion] Reminder: weekly status meeting In-Reply-To: <20181024220750.pytz7dav4dabeplx@carbo> References: <20181024220750.pytz7dav4dabeplx@carbo> Message-ID: <1f0af61b-62f6-4f98-a5e7-6241855a7006@Canary> Hi! Sorry to miss this week's meeting. If I may point out an inaccuracy in the notes: in PyData/Sparse most things are implemented from the ground up without relying on scipy.sparse. The only parts that do rely on it are `sparse.matmul`, `sparse.dot` and `sparse.tensordot`, as well as a few conversions to/from SciPy; if these could depend on Cython wrappers instead, that'd be nice. I should probably update the docs on that. If anyone is willing to discuss pydata/sparse with me, I'll be available for a meeting anytime. Best Regards, Hameer Abbasi > On Thursday, Oct 25, 2018 at 12:08 AM, Stefan van der Walt wrote: > Hi all, > > On Mon, 22 Oct 2018 09:56:37 +0300, Matti Picus wrote: > > We therefore invite you to join us for our weekly calls, > > each **Wednesday from 12:00 to 13:00 Pacific Time**.
> > > > Detail of the next meeting (2018-10-24) is given in the agenda > > This week's meeting notes are at: > > https://github.com/BIDS-numpy/docs/blob/master/status_meetings/status-2018-10-24.md > > St?fan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From alex.rogozhnikov at yandex.ru Thu Oct 25 16:16:32 2018 From: alex.rogozhnikov at yandex.ru (Alex Rogozhnikov) Date: Thu, 25 Oct 2018 23:16:32 +0300 Subject: [Numpy-discussion] Depreciating asfortranarray and ascontiguousarray Message-ID: <8741181540498592@myt3-c573aa6fc782.qloud-c.yandex.net> An HTML attachment was scrubbed... URL: From joferkington at gmail.com Thu Oct 25 17:00:26 2018 From: joferkington at gmail.com (Joe Kington) Date: Thu, 25 Oct 2018 16:00:26 -0500 Subject: [Numpy-discussion] Depreciating asfortranarray and ascontiguousarray In-Reply-To: <8741181540498592@myt3-c573aa6fc782.qloud-c.yandex.net> References: <8741181540498592@myt3-c573aa6fc782.qloud-c.yandex.net> Message-ID: For what it's worth, these are fairly widely used functions. From a user standpoint, I'd gently argue against deprecating them. Documenting the inconsistency with scalars seems like a less invasive approach. In particular ascontiguousarray is a very common check to make when working with C libraries or low-level file formats. A significant advantage over asarray(..., order='C') is readability. It makes the intention very clear. Similarly, asfortranarray is quite readable for folks that aren't deeply familiar with numpy. Given that the use-cases they're primarily used for are likely to be read by developers working in other languages (i.e. ascontiguousarray gets used at a lot of "boundaries" with other systems), keeping function names that make intention very clear is important. Just my $0.02, anyway. 
Cheers, -Joe On Thu, Oct 25, 2018 at 3:17 PM Alex Rogozhnikov wrote: > Dear numpy community, > > I'm planning to depreciate np.asfortranarray and np.ascontiguousarray > functions due to their misbehavior on scalar (0-D tensors) with PR #12244. > > Current behavior (converting scalars to 1-d array with single element) > - is unexpected and contradicts to documentation > - probably, can't be changed without breaking external code > - I believe, this was a cause for poor support of 0-d arrays in mxnet. > - both functions are easily replaced with asarray(..., order='...'), which > has expected behavior > > There is no timeline for removal - we just need to discourage from using > this functions in new code. > > Function naming may be related to how numpy treats 0-d tensors specially, > and those probably should not be called arrays. > https://www.numpy.org/neps/nep-0027-zero-rank-arrarys.html > However, as a user I never thought about 0-d arrays being special and > being "not arrays". > > > Please see original discussion at github for more details > https://github.com/numpy/numpy/issues/5300 > > Your comments welcome, > Alex Rogozhnikov > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jfoxrabinovitz at gmail.com Thu Oct 25 17:47:37 2018 From: jfoxrabinovitz at gmail.com (Joseph Fox-Rabinovitz) Date: Thu, 25 Oct 2018 17:47:37 -0400 Subject: [Numpy-discussion] Depreciating asfortranarray and ascontiguousarray In-Reply-To: References: <8741181540498592@myt3-c573aa6fc782.qloud-c.yandex.net> Message-ID: In that vein, would it be advisable to re-implement them as aliases for the correctly behaving functions instead? - Joe On Thu, Oct 25, 2018 at 5:01 PM Joe Kington wrote: > For what it's worth, these are fairly widely used functions. 
From a user > standpoint, I'd gently argue against deprecating them. Documenting the > inconsistency with scalars seems like a less invasive approach. > > In particular ascontiguousarray is a very common check to make when > working with C libraries or low-level file formats. A significant > advantage over asarray(..., order='C') is readability. It makes the > intention very clear. Similarly, asfortranarray is quite readable for > folks that aren't deeply familiar with numpy. > > Given that the use-cases they're primarily used for are likely to be read > by developers working in other languages (i.e. ascontiguousarray gets used > at a lot of "boundaries" with other systems), keeping function names that > make intention very clear is important. > > Just my $0.02, anyway. Cheers, > -Joe > > On Thu, Oct 25, 2018 at 3:17 PM Alex Rogozhnikov < > alex.rogozhnikov at yandex.ru> wrote: > >> Dear numpy community, >> >> I'm planning to depreciate np.asfortranarray and np.ascontiguousarray >> functions due to their misbehavior on scalar (0-D tensors) with PR #12244 >> . >> >> Current behavior (converting scalars to 1-d array with single element) >> - is unexpected and contradicts to documentation >> - probably, can't be changed without breaking external code >> - I believe, this was a cause for poor support of 0-d arrays in mxnet. >> - both functions are easily replaced with asarray(..., order='...'), >> which has expected behavior >> >> There is no timeline for removal - we just need to discourage from using >> this functions in new code. >> >> Function naming may be related to how numpy treats 0-d tensors specially, >> >> and those probably should not be called arrays. >> https://www.numpy.org/neps/nep-0027-zero-rank-arrarys.html >> However, as a user I never thought about 0-d arrays being special and >> being "not arrays". 
>> >> Please see original discussion at github for more details >> https://github.com/numpy/numpy/issues/5300 >> >> Your comments welcome, >> Alex Rogozhnikov >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From deak.andris at gmail.com Thu Oct 25 18:10:05 2018 From: deak.andris at gmail.com (Andras Deak) Date: Fri, 26 Oct 2018 00:10:05 +0200 Subject: [Numpy-discussion] Depreciating asfortranarray and ascontiguousarray In-Reply-To: References: <8741181540498592@myt3-c573aa6fc782.qloud-c.yandex.net> Message-ID: On Thu, Oct 25, 2018 at 11:48 PM Joseph Fox-Rabinovitz wrote: > > In that vein, would it be advisable to re-implement them as aliases for the correctly behaving functions instead? > > - Joe Wouldn't "probably, can't be changed without breaking external code" still apply? As I understand the suggestion for _deprecation_ is only because there's (a lot of) code relying on the current behaviour (or at least there's risk). András From tyler.je.reddy at gmail.com Thu Oct 25 19:18:47 2018 From: tyler.je.reddy at gmail.com (Tyler Reddy) Date: Thu, 25 Oct 2018 16:18:47 -0700 Subject: [Numpy-discussion] Reminder: weekly status meeting In-Reply-To: <1f0af61b-62f6-4f98-a5e7-6241855a7006@Canary> References: <20181024220750.pytz7dav4dabeplx@carbo> <1f0af61b-62f6-4f98-a5e7-6241855a7006@Canary> Message-ID: What exactly would you like Cython wrappers for? Some of the C++ code in scipy/sparse/sparsetools? I see you have COO.from_scipy_sparse(x) in some pydata/sparse code paths, which presumably you'd like to avoid or improve? On Thu, 25 Oct 2018 at 03:41, Hameer Abbasi wrote: > Hi!
> > Sorry to miss this week's meeting. > > If I may point out an inaccuracy in the notes: in PyData/Sparse most > things are implemented from the ground up without relying on scipy.sparse. > The only part that does rely on it is `sparse.matmul`, `sparse.dot` and > `sparse.tensordot`, as well as a few conversions to/from SciPy, if these > could depend on Cython wrappers instead that'd be nice. > > I should probably update the docs on that. If anyone is willing to discuss > pydata/sparse with me, I'll be available for a meeting anytime. > > Best Regards, > Hameer Abbasi > > On Thursday, Oct 25, 2018 at 12:08 AM, Stefan van der Walt < > stefanv at berkeley.edu> wrote: > Hi all, > > On Mon, 22 Oct 2018 09:56:37 +0300, Matti Picus wrote: > > We therefore invite you to join us for our weekly calls, > each **Wednesday from 12:00 to 13:00 Pacific Time**. > > Detail of the next meeting (2018-10-24) is given in the agenda > > > This week's meeting notes are at: > > > https://github.com/BIDS-numpy/docs/blob/master/status_meetings/status-2018-10-24.md > > Stéfan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From shoyer at gmail.com Thu Oct 25 22:02:20 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Thu, 25 Oct 2018 19:02:20 -0700 Subject: [Numpy-discussion] Depreciating asfortranarray and ascontiguousarray In-Reply-To: References: <8741181540498592@myt3-c573aa6fc782.qloud-c.yandex.net> Message-ID: On Thu, Oct 25, 2018 at 3:10 PM Andras Deak wrote: > On Thu, Oct 25, 2018 at 11:48 PM Joseph Fox-Rabinovitz > wrote: > > > > In that vein, would it be advisable to re-implement them as aliases for > the correctly behaving functions instead? > > > > - Joe > > Wouldn't "probably, can't be changed without breaking external code" > still apply? As I understand the suggestion for _deprecation_ is only > because there's (a lot of) code relying on the current behaviour (or > at least there's risk). I would also advocate for fixing these functions if possible (removing ndim=1). ascontiguousarray(...) is certainly more readable than asarray(... order='C'). The conservative way to handle this would be to do a deprecation cycle, specifically by issuing FutureWarning when scalars or 0d arrays are encountered as inputs. Cheers, Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From einstein.edison at gmail.com Fri Oct 26 04:47:09 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Fri, 26 Oct 2018 10:47:09 +0200 Subject: [Numpy-discussion] Reminder: weekly status meeting In-Reply-To: References: <20181024220750.pytz7dav4dabeplx@carbo> <1f0af61b-62f6-4f98-a5e7-6241855a7006@Canary> Message-ID: <164a3678-0838-4bb7-84ed-92e1a249f875@Canary> Hi everyone, Like I said, we just use those to coerce SciPy arrays to native ones for compatibility. You could remove all those and the package would work fine, as long as you were using native PyData/Sparse arrays. The only core functionality dependent on scipy.sparse is matrix multiplication and the like. Everything else is for inter-operability. 
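[Editor's note: the independence from scipy.sparse mentioned above follows from the COO layout itself. The sketch below is hand-rolled for illustration, not pydata/sparse's actual implementation: an N-D COO array is just an integer coordinate array of shape (ndim, nnz) plus a matching data array; scipy.sparse's coo_matrix exposes the 2-D case of the same idea via its .row/.col/.data attributes.]

```python
import numpy as np

def coo_to_dense(coords, data, shape):
    """Toy densification of an N-D COO triple (illustrative only).

    coords has shape (ndim, nnz); duplicates overwrite rather than sum
    in this simplified version.
    """
    out = np.zeros(shape, dtype=data.dtype)
    out[tuple(coords)] = data  # one index row per dimension
    return out

# 2-D example; stacking a coo_matrix's .row and .col would give this layout.
coords = np.array([[0, 1, 2],    # axis-0 indices
                   [2, 0, 1]])   # axis-1 indices
data = np.array([10.0, 20.0, 30.0])
dense = coo_to_dense(coords, data, (3, 3))
print(dense[0, 2], dense[1, 0], dense[2, 1])  # -> 10.0 20.0 30.0
```

Densifying is a single fancy-indexing assignment that generalizes to any number of dimensions without touching scipy.sparse, which is why only the interop paths need SciPy at all.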
Best Regards, Hameer Abbasi > On Friday, Oct 26, 2018 at 1:19 AM, Tyler Reddy wrote: > What exactly would you like Cython wrappers for? Some of the C++ code in scipy/sparse/sparsetools? > > I see you have COO.from_scipy_sparse(x) in some pydata/sparse code paths, which presumably you'd like to avoid or improve? > On Thu, 25 Oct 2018 at 03:41, Hameer Abbasi wrote: > > Hi! > > > > Sorry to miss this week?s meeting. > > > > If I may point out an inaccuracy in the notes: in PyData/Sparse most things are implemented from the ground up without relying on scipy.sparse. The only part that does rely on it is `sparse.matmul`, `sparse.dot` and `sparse.tensordot`, as well as a few conversions to/from SciPy, if these could depend on Cython wrappers instead that?d be nice. > > > > I should probably update the docs on that. If anyone is willing to discuss pydata/sparse with me, I?ll be available for a meeting anytime. > > > > Best Regards, > > Hameer Abbasi > > > > > > > On Thursday, Oct 25, 2018 at 12:08 AM, Stefan van der Walt wrote: > > > Hi all, > > > > > > On Mon, 22 Oct 2018 09:56:37 +0300, Matti Picus wrote: > > > > We therefore invite you to join us for our weekly calls, > > > > each **Wednesday from 12:00 to 13:00 Pacific Time**. 
> > > > > > > > Detail of the next meeting (2018-10-24) is given in the agenda > > > > > > This week's meeting notes are at: > > > > > > https://github.com/BIDS-numpy/docs/blob/master/status_meetings/status-2018-10-24.md > > > > > > St?fan > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at python.org (mailto:NumPy-Discussion at python.org) > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org (mailto:NumPy-Discussion at python.org) > > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanv at berkeley.edu Fri Oct 26 13:03:14 2018 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Fri, 26 Oct 2018 10:03:14 -0700 Subject: [Numpy-discussion] Reminder: weekly status meeting In-Reply-To: <164a3678-0838-4bb7-84ed-92e1a249f875@Canary> References: <20181024220750.pytz7dav4dabeplx@carbo> <1f0af61b-62f6-4f98-a5e7-6241855a7006@Canary> <164a3678-0838-4bb7-84ed-92e1a249f875@Canary> Message-ID: <20181026170314.wqvwwc4ncudz5dzo@carbo> Hi Hameer, On Fri, 26 Oct 2018 10:47:09 +0200, Hameer Abbasi wrote: > The only core functionality dependent on scipy.sparse is matrix > multiplication and the like. Everything else is for inter-operability. Thank you for commenting here. As you know, I am enthusiastic about seeing an `sparray` equivalent to `spmatrix`. When we last spoke, my recollection was that it would be beneficial to `pydata/sparse`. Is this still correct? If not, are we now in a situation where it would be more helpful to build `sparray` based on `pydata/sparse`. 
If we can have a good sparse array API in place in SciPy, it may significantly simplify code in various other libraries (I'm thinking of scikit-learn, e.g.). Best regards, St?fan From einstein.edison at gmail.com Fri Oct 26 13:10:19 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Fri, 26 Oct 2018 19:10:19 +0200 Subject: [Numpy-discussion] Reminder: weekly status meeting In-Reply-To: <20181026170314.wqvwwc4ncudz5dzo@carbo> References: <20181024220750.pytz7dav4dabeplx@carbo> <1f0af61b-62f6-4f98-a5e7-6241855a7006@Canary> <164a3678-0838-4bb7-84ed-92e1a249f875@Canary> <20181026170314.wqvwwc4ncudz5dzo@carbo> Message-ID: Hi Stefan! PyData/Sparse is pretty far along, by January or so we should have a CSR/CSC replacement that is ND. It needs optimisation in a lot of cases but the API is compatible with NumPy and works pretty well already IMO. PyData/Sparse is pretty much independent of any changes to scipy.sparse at this point. We build on top of NumPy, not scipy.sparse. Feel free to use any or all of my code for sparray, although I think Ralf Gommers, Matthew Rocklin and others were of the opinion that the data structure should stay in PyData/Sparse and linear algebra and csgraph etc should go into SciPy. Best Regards, Hameer Abbasi > On Friday, Oct 26, 2018 at 7:03 PM, Stefan van der Walt wrote: > Hi Hameer, > > On Fri, 26 Oct 2018 10:47:09 +0200, Hameer Abbasi wrote: > > The only core functionality dependent on scipy.sparse is matrix > > multiplication and the like. Everything else is for inter-operability. > > Thank you for commenting here. > > As you know, I am enthusiastic about seeing an `sparray` equivalent to > `spmatrix`. When we last spoke, my recollection was that it would be > beneficial to `pydata/sparse`. Is this still correct? > > If not, are we now in a situation where it would be more helpful to > build `sparray` based on `pydata/sparse`. 
> > If we can have a good sparse array API in place in SciPy, it may > significantly simplify code in various other libraries (I'm thinking of > scikit-learn, e.g.). > > Best regards, > St?fan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From alex.rogozhnikov at yandex.ru Fri Oct 26 15:47:15 2018 From: alex.rogozhnikov at yandex.ru (Alex Rogozhnikov) Date: Fri, 26 Oct 2018 22:47:15 +0300 Subject: [Numpy-discussion] Depreciating asfortranarray and ascontiguousarray In-Reply-To: References: <8741181540498592@myt3-c573aa6fc782.qloud-c.yandex.net> Message-ID: <15377471540583235@myt6-2fee75662a4f.qloud-c.yandex.net> An HTML attachment was scrubbed... URL: From stefanv at berkeley.edu Fri Oct 26 16:04:11 2018 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Fri, 26 Oct 2018 13:04:11 -0700 Subject: [Numpy-discussion] Depreciating asfortranarray and ascontiguousarray In-Reply-To: References: <8741181540498592@myt3-c573aa6fc782.qloud-c.yandex.net> Message-ID: <20181026200411.6ovdxezbfhpys5el@carbo> On Thu, 25 Oct 2018 19:02:20 -0700, Stephan Hoyer wrote: > I would also advocate for fixing these functions if possible (removing > ndim=1). ascontiguousarray(...) is certainly more readable than asarray(... > order='C'). I agree; these are widely used, and makes intuitive sense as part of the API. 
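[Editor's note: the deprecation cycle Stephan proposes upthread could look like the hypothetical shim below — illustration only, not NumPy's actual code. The two spellings diverge only for 0-d input, so warning just in that case would keep ordinary calls silent.]

```python
import warnings
import numpy as np

# For ndim >= 1 the two spellings agree; only 0-d input diverges:
# ascontiguousarray promotes to shape (1,), asarray does not.
print(np.ascontiguousarray(5.0).shape, np.asarray(5.0, order='C').shape)  # -> (1,) ()

def ascontiguousarray_future(a, dtype=None):
    """Hypothetical deprecation shim (not NumPy's implementation)."""
    if np.ndim(a) == 0:
        warnings.warn(
            "ascontiguousarray will stop promoting 0-d input to 1-d in a "
            "future release; use np.asarray(a, order='C') to keep 0-d.",
            FutureWarning, stacklevel=2)
    return np.ascontiguousarray(a, dtype=dtype)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ascontiguousarray_future(np.ones((2, 2)))  # ndim >= 1: silent
    ascontiguousarray_future(5.0)              # 0-d: warns
print(len(caught), caught[0].category.__name__)  # -> 1 FutureWarning
```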
Stéfan From shoyer at gmail.com Fri Oct 26 16:25:10 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Fri, 26 Oct 2018 13:25:10 -0700 Subject: [Numpy-discussion] Depreciating asfortranarray and ascontiguousarray In-Reply-To: <15377471540583235@myt6-2fee75662a4f.qloud-c.yandex.net> References: <8741181540498592@myt3-c573aa6fc782.qloud-c.yandex.net> <15377471540583235@myt6-2fee75662a4f.qloud-c.yandex.net> Message-ID: On Fri, Oct 26, 2018 at 12:55 PM Alex Rogozhnikov < alex.rogozhnikov at yandex.ru> wrote: > > The conservative way to handle this would be to do a deprecation cycle, > specifically by issuing FutureWarning when scalars or 0d arrays are > encountered as inputs. > Sounds good to me. Behavior should be scheduled for numpy 1.18? > Yes, that sounds about right to me. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Fri Oct 26 17:27:49 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 27 Oct 2018 10:27:49 +1300 Subject: [Numpy-discussion] Reminder: weekly status meeting In-Reply-To: References: <20181024220750.pytz7dav4dabeplx@carbo> <1f0af61b-62f6-4f98-a5e7-6241855a7006@Canary> <164a3678-0838-4bb7-84ed-92e1a249f875@Canary> <20181026170314.wqvwwc4ncudz5dzo@carbo> Message-ID: On Sat, Oct 27, 2018 at 6:10 AM Hameer Abbasi wrote: > Hi Stefan! > > PyData/Sparse is pretty far along, by January or so we should have a > CSR/CSC replacement that is ND. It needs optimisation in a lot of cases but > the API is compatible with NumPy and works pretty well already IMO. > > PyData/Sparse is pretty much independent of any changes to scipy.sparse at > this point. We build on top of NumPy, not scipy.sparse. > > Feel free to use any or all of my code for sparray, although I think Ralf > Gommers, Matthew Rocklin and others were of the opinion that the data > structure should stay in PyData/Sparse and linear algebra and csgraph etc > should go into SciPy.
> Just to make sure we're talking about the same things here: Stefan, I think with "sparray" you mean "an n-D sparse array implementation that lives in SciPy", nothing more specific? In that case pydata/sparse is the one implementation, and including it in scipy.sparse would make it "sparray". I'm currently indeed leaning towards depending on pydata/sparse rather than including it in scipy. Cheers, Ralf > Best Regards, > Hameer Abbasi > > On Friday, Oct 26, 2018 at 7:03 PM, Stefan van der Walt < > stefanv at berkeley.edu> wrote: > Hi Hameer, > > On Fri, 26 Oct 2018 10:47:09 +0200, Hameer Abbasi wrote: > > The only core functionality dependent on scipy.sparse is matrix > multiplication and the like. Everything else is for inter-operability. > > > Thank you for commenting here. > > As you know, I am enthusiastic about seeing an `sparray` equivalent to > `spmatrix`. When we last spoke, my recollection was that it would be > beneficial to `pydata/sparse`. Is this still correct? > > If not, are we now in a situation where it would be more helpful to > build `sparray` based on `pydata/sparse`. > > If we can have a good sparse array API in place in SciPy, it may > significantly simplify code in various other libraries (I'm thinking of > scikit-learn, e.g.). > > Best regards, > St?fan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stefanv at berkeley.edu Fri Oct 26 18:10:28 2018 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Fri, 26 Oct 2018 15:10:28 -0700 Subject: [Numpy-discussion] Reminder: weekly status meeting In-Reply-To: References: <20181024220750.pytz7dav4dabeplx@carbo> <1f0af61b-62f6-4f98-a5e7-6241855a7006@Canary> <164a3678-0838-4bb7-84ed-92e1a249f875@Canary> <20181026170314.wqvwwc4ncudz5dzo@carbo> Message-ID: <20181026221028.3r62dbsqpzzcvjj6@carbo> On Sat, 27 Oct 2018 10:27:49 +1300, Ralf Gommers wrote: > Just to make sure we're talking about the same things here: Stefan, I think > with "sparray" you mean "an n-D sparse array implementation that lives in > SciPy", nothing more specific? In that case pydata/sparse is the one > implementation, and including it in scipy.sparse would make it "sparray". > I'm currently indeed leaning towards depending on pydata/sparse rather than > including it in scipy. I want to double check: when we last spoke, it seemed as though certain refactorings inside of SciPy (specifically, sparray was mentioned) would simplify the life of pydata/sparse devs. That no longer seems to be the case? If our recommended route is to tell users to use pydata/sparse instead of SciPy (for the sparse array object), we probably want to get rid of our own internal implementation, and deprecate spmatrix (or, build spmatrix on top of pydata/sparse)? Once we can define a clear API for sparse arrays, we can include some algorithms that ingest those objects in SciPy. But, I'm not sure we have an API in place that will allow handover of such objects to the existing C/FORTRAN-level code. 
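[Editor's note: the handover Stefan worries about comes down to the flat buffers the existing compiled routines consume. The minimal pure-Python CSR matrix-vector product below is a sketch for illustration only — the real work happens in scipy's compiled sparsetools — but it shows the three arrays (data, indices, indptr) any sparse-array API would need to expose to C/FORTRAN-level code.]

```python
import numpy as np

def csr_matvec(data, indices, indptr, x):
    """Toy CSR matrix-vector product over the three flat CSR buffers."""
    y = np.empty(len(indptr) - 1, dtype=data.dtype)
    for i in range(len(y)):          # one slice of the buffers per row
        lo, hi = indptr[i], indptr[i + 1]
        y[i] = np.dot(data[lo:hi], x[indices[lo:hi]])
    return y

# 3x3 matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSR form.
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
indices = np.array([0, 2, 1, 0, 2])
indptr = np.array([0, 2, 3, 5])
print(csr_matvec(data, indices, indptr, np.array([1.0, 1.0, 1.0])))  # -> [3. 3. 9.]
```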
Stéfan From sebastian at sipsolutions.net Fri Oct 26 18:10:12 2018 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sat, 27 Oct 2018 00:10:12 +0200 Subject: [Numpy-discussion] Depreciating asfortranarray and ascontiguousarray In-Reply-To: References: <8741181540498592@myt3-c573aa6fc782.qloud-c.yandex.net> <15377471540583235@myt6-2fee75662a4f.qloud-c.yandex.net> Message-ID: On Fri, 2018-10-26 at 13:25 -0700, Stephan Hoyer wrote: > On Fri, Oct 26, 2018 at 12:55 PM Alex Rogozhnikov < > alex.rogozhnikov at yandex.ru> wrote: > > > > The conservative way to handle this would be to do a deprecation > > cycle, specifically by issuing FutureWarning when scalars or 0d > > arrays are encountered as inputs. > > Sounds good to me. Behavior should be scheduled for numpy 1.18? > > > > Yes, that sounds about right to me. > Is there a way to avoid the future warning? An unavoidable warning in a widely used function seems really annoying to me. Unless the 0d thing happens rarely, but then it might be the downstream users that get the warning for no reason. - Sebastian > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From shoyer at gmail.com Fri Oct 26 19:26:09 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Fri, 26 Oct 2018 16:26:09 -0700 Subject: [Numpy-discussion] Depreciating asfortranarray and ascontiguousarray In-Reply-To: References: <8741181540498592@myt3-c573aa6fc782.qloud-c.yandex.net> <15377471540583235@myt6-2fee75662a4f.qloud-c.yandex.net> Message-ID: On Fri, Oct 26, 2018 at 3:48 PM Sebastian Berg wrote: > On Fri, 2018-10-26 at 13:25 -0700, Stephan Hoyer wrote: > > On Fri, Oct 26, 2018 at 12:55 PM Alex Rogozhnikov < > > alex.rogozhnikov at yandex.ru> wrote: > > > > > > The conservative way to handle this would be to do a deprecation > > > cycle, specifically by issuing FutureWarning when scalars or 0d > > > arrays are encountered as inputs. > > > Sounds good to me. Behavior should be scheduled for numpy 1.18? > > > > > > > Yes, that sounds about right to me. > > > > Is there a way to avoid the future warning? An unavoidable warning in a > widely used function seems really annoying to me. Unless, the 0d thing > happens rarely, but then it might be the downstream users that get the > warning for no reason. > > - Sebastian > My suspicion is that 0d arrays are rarely used as arguments to ascontiguousarray / asfortranarray. But it's hard to say for sure... -------------- next part -------------- An HTML attachment was scrubbed... URL: From teoliphant at gmail.com Fri Oct 26 19:26:51 2018 From: teoliphant at gmail.com (Travis Oliphant) Date: Fri, 26 Oct 2018 18:26:51 -0500 Subject: [Numpy-discussion] Depreciating asfortranarray and ascontiguousarray In-Reply-To: <8741181540498592@myt3-c573aa6fc782.qloud-c.yandex.net> References: <8741181540498592@myt3-c573aa6fc782.qloud-c.yandex.net> Message-ID: What is the justification for deprecation exactly? 
These functions have been well documented and have had the intended behavior of producing arrays with dimension at least 1 for some time. Why is it unexpected to produce arrays of at least 1 dimension? For some users this is exactly what is wanted. I don't understand the statement that behavior with 0-d arrays is unexpected. If the desire is to shrink the API of NumPy, I could see that. But, it seems odd to me to remove a much-used function with an established behavior except as part of a wider API-shrinkage effort. 0-d arrays in NumPy are a separate conversation. At this point, I think it was a mistake not to embrace 0-d arrays in NumPy from day one. In some sense 0-d arrays *are* scalars at least conceptually and for JIT-producing systems that exist now and will be growing in the future, they can be equivalent to scalars. The array scalars should become how you define what is *in* a NumPy array making them true Python types, rather than Python 1-style "instances" of a single "Dtype" object. You would then have 0-d arrays and these Python "memory" types describing what is *in* the array. There is a clear way to do this, some of which has been outlined by Nathaniel, and the rest I have an outline for how to implement. I can advise someone on how to do this. -Travis On Thu, Oct 25, 2018 at 3:17 PM Alex Rogozhnikov wrote: > Dear numpy community, > > I'm planning to depreciate np.asfortranarray and np.ascontiguousarray > functions due to their misbehavior on scalar (0-D tensors) with PR #12244. > > Current behavior (converting scalars to 1-d array with single element) > - is unexpected and contradicts to documentation > - probably, can't be changed without breaking external code > - I believe, this was a cause for poor support of 0-d arrays in mxnet. > - both functions are easily replaced with asarray(..., order='...'), which > has expected behavior > > There is no timeline for removal - we just need to discourage from using > this functions in new code. 
> > Function naming may be related to how numpy treats 0-d tensors specially, > and those probably should not be called arrays. > https://www.numpy.org/neps/nep-0027-zero-rank-arrarys.html > However, as a user I never thought about 0-d arrays being special and > being "not arrays". > > > Please see original discussion at github for more details > https://github.com/numpy/numpy/issues/5300 > > Your comments welcome, > Alex Rogozhnikov > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From teoliphant at gmail.com Fri Oct 26 19:47:00 2018 From: teoliphant at gmail.com (Travis Oliphant) Date: Fri, 26 Oct 2018 18:47:00 -0500 Subject: [Numpy-discussion] Depreciating asfortranarray and ascontiguousarray In-Reply-To: References: <8741181540498592@myt3-c573aa6fc782.qloud-c.yandex.net> <15377471540583235@myt6-2fee75662a4f.qloud-c.yandex.net> Message-ID: I see now the original motivation as the unfortunate situation that mxnet authors did not understand that np.ascontiguousarray returned an array of at least one dimension and perhaps used that one API to assume that NumPy did not support 0-d arrays --- which NumPy does indeed support. Certainly that situation would motivate a documentation change to help steer other future users from making the same incorrect assumption, but deprecation is a separate question entirely. I do not agree at all with the trend to remove functions from NumPy API prior to a dedicated NumPy 2.0 effort. This breaks the idea of semantic versioning for NumPy. These functions do, in fact, have a use and were very much intended to produce one-dimensional arrays --- in order to be used prior to calling C or Fortran code that expected at least a 1-d array. A lot of the SciPy wrapping code needed this behavior. 
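[Editor's note: the wrapping pattern described above can be sketched like this — a hypothetical shim for illustration, not actual SciPy code. The C or Fortran routine expects a pointer plus a length, so the Python side first normalizes every input, scalars included, to a contiguous array with ndim >= 1.]

```python
import numpy as np

def call_c_style(values):
    """Hypothetical pre-C shim: normalize to contiguous float64, ndim >= 1.

    In real wrapping code arr.ctypes.data would then be handed to the
    compiled routine along with arr.shape[0].
    """
    arr = np.ascontiguousarray(values, dtype=np.float64)
    return arr.shape[0], float(arr[0])

print(call_c_style([1.0, 2.0, 3.0]))  # -> (3, 1.0)
print(call_c_style(7.5))              # -> (1, 7.5)
```

Had the shim used np.asarray instead, the scalar case would come back 0-d and `arr.shape[0]` would raise IndexError, which is exactly the situation the ndim >= 1 guarantee was designed to avoid.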
It is a misinterpretation to assume this is buggy or unintended. Improving the documentation to warn about the behavior for 0-d arrays could indeed be useful. -Travis On Fri, Oct 26, 2018 at 6:27 PM Stephan Hoyer wrote: > On Fri, Oct 26, 2018 at 3:48 PM Sebastian Berg > wrote: > >> On Fri, 2018-10-26 at 13:25 -0700, Stephan Hoyer wrote: >> > On Fri, Oct 26, 2018 at 12:55 PM Alex Rogozhnikov < >> > alex.rogozhnikov at yandex.ru> wrote: >> > > >> > > The conservative way to handle this would be to do a deprecation >> > > cycle, specifically by issuing FutureWarning when scalars or 0d >> > > arrays are encountered as inputs. >> > > Sounds good to me. Behavior should be scheduled for numpy 1.18? >> > > >> > >> > Yes, that sounds about right to me. >> > >> >> Is there a way to avoid the future warning? An unavoidable warning in a >> widely used function seems really annoying to me. Unless, the 0d thing >> happens rarely, but then it might be the downstream users that get the >> warning for no reason. >> >> - Sebastian >> > > My suspicion is that 0d arrays are rarely used as arguments to > ascontiguousarray / asfortranarray. But it's hard to say for sure... > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alex.rogozhnikov at yandex.ru Fri Oct 26 20:06:43 2018 From: alex.rogozhnikov at yandex.ru (Alex Rogozhnikov) Date: Sat, 27 Oct 2018 03:06:43 +0300 Subject: [Numpy-discussion] Depreciating asfortranarray and ascontiguousarray In-Reply-To: References: <8741181540498592@myt3-c573aa6fc782.qloud-c.yandex.net> Message-ID: <14021211540598803@sas1-890ba5c2334a.qloud-c.yandex.net> An HTML attachment was scrubbed... 
URL: From teoliphant at gmail.com Fri Oct 26 20:34:46 2018 From: teoliphant at gmail.com (Travis Oliphant) Date: Fri, 26 Oct 2018 19:34:46 -0500 Subject: [Numpy-discussion] Depreciating asfortranarray and ascontiguousarray In-Reply-To: <14021211540598803@sas1-890ba5c2334a.qloud-c.yandex.net> References: <8741181540498592@myt3-c573aa6fc782.qloud-c.yandex.net> <14021211540598803@sas1-890ba5c2334a.qloud-c.yandex.net> Message-ID: On Fri, Oct 26, 2018 at 7:14 PM Alex Rogozhnikov wrote: > > If the desire is to shrink the API of NumPy, I could see that. > > Very good desire, but my goal was different. > > > For some users this is exactly what is wanted. > > Maybe so, but I didn't face such an example (and nobody mentioned those so > far in the discussion). > The opposite (according to the issue) happened. Mxnet example is > sufficient in my opinion. > I agree that the old motivation of APIs that would make it easy to create > SciPy is no longer a major motivation for most users and even developers > and so these reasons would not be very present (as well as why it wasn't > even mentioned in the documentation). > > Simple example: > x = np.zeros([]) > assert(x.flags.c_contiguous) > assert(np.ascontiguousarray(x).shape == x.shape) > > Behavior contradicts the documentation (shape is changed) and the name > (flags say it is already c_contiguous) > > If you insist that keeping ndmin=1 is important (I am not yet convinced, > but I am ready to believe your authority), > we can add ndmin=1 to functions' signatures, this way explicitly notifying > users about the expected dimension. > I understand the lack of being convinced. This is ultimately a problem of 0-d arrays not being fully embraced and accepted by the Numeric community originally (which NumPy inherited during the early days). Is there a way to document functions that will be removed on a major version increase which don't print warnings on use? I would support this.
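Alex's ndmin=1 suggestion could look something like the sketch below. This is purely hypothetical — the name `ascontiguousarray_ndmin` and the extra parameter are made up for illustration, not an actual NumPy signature:

```python
import numpy as np

def ascontiguousarray_ndmin(a, dtype=None, ndmin=1):
    """Hypothetical ascontiguousarray with the promotion made explicit.

    ndmin=1 reproduces today's behavior; ndmin=0 would preserve 0-d input.
    """
    out = np.asarray(a, dtype=dtype, order='C')
    if out.ndim < ndmin:
        # prepend length-1 axes until the requested minimum is reached
        out = out.reshape((1,) * (ndmin - out.ndim) + out.shape)
    return out

x = np.zeros(())
print(ascontiguousarray_ndmin(x).shape)           # (1,) -- current behavior
print(ascontiguousarray_ndmin(x, ndmin=0).shape)  # ()   -- 0-d preserved
```

This would keep backward compatibility as the default while making the dimension expectation visible in the signature.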
I'm a big supporter of making a NumPy 2.0 and have been for several years. Now that Python 3 transition has happened, I think we could seriously discuss this. I'm trying to raise funding for maintenance and progress for NumPy and SciPy right now via Quansight Labs http://www.quansight.com/labs and I hope to be able to help find grants to support the wonderful efforts that have been happening for some time. While I'm thrilled and impressed by the number of amazing devs who have kept NumPy and SciPy going in mostly their spare time, it has created challenges that we have not had continuous maintenance funding to allow continuous paid development so that several people who know about the early decisions could not be retained to spend time on helping the transition. Your bringing the problem of mxnet devs is most appreciated. I will make a documentation PR. -Travis > Alex. > > > 27.10.2018, 02:27, "Travis Oliphant" : > > What is the justification for deprecation exactly? These functions have > been well documented and have had the intended behavior of producing arrays > with dimension at least 1 for some time. Why is it unexpected to produce > arrays of at least 1 dimension? For some users this is exactly what is > wanted. I don't understand the statement that behavior with 0-d arrays is > unexpected. > > If the desire is to shrink the API of NumPy, I could see that. But, it > seems odd to me to remove a much-used function with an established behavior > except as part of a wider API-shrinkage effort. > > 0-d arrays in NumPy are a separate conversation. At this point, I think > it was a mistake not to embrace 0-d arrays in NumPy from day one. In some > sense 0-d arrays *are* scalars at least conceptually and for JIT-producing > systems that exist now and will be growing in the future, they can be > equivalent to scalars. 
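The scalar/0-d equivalence described above can be poked at directly in plain NumPy (illustrative only):

```python
import numpy as np

a = np.array(3.0)    # 0-d array
s = np.float64(3.0)  # array scalar

print(a.ndim, a.shape)       # 0 ()
print(a[()])                 # indexing with an empty tuple yields the scalar: 3.0
print(float(a) == float(s))  # both behave like the number 3.0 -> True
```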
> > The array scalars should become how you define what is *in* a NumPy array > making them true Python types, rather than Python 1-style "instances" of a > single "Dtype" object. You would then have 0-d arrays and these Python > "memory" types describing what is *in* the array. > > There is a clear way to do this, some of which has been outlined by > Nathaniel, and the rest I have an outline for how to implement. I can > advise someone on how to do this. > > -Travis > > > > > On Thu, Oct 25, 2018 at 3:17 PM Alex Rogozhnikov < > alex.rogozhnikov at yandex.ru> wrote: > > Dear numpy community, > > I'm planning to deprecate np.asfortranarray and np.ascontiguousarray > functions due to their misbehavior on scalar (0-D tensors) with PR #12244. > > Current behavior (converting scalars to 1-d array with single element) > - is unexpected and contradicts the documentation > - probably, can't be changed without breaking external code > - I believe, this was a cause for poor support of 0-d arrays in mxnet. > - both functions are easily replaced with asarray(..., order='...'), which > has expected behavior > > There is no timeline for removal - we just need to discourage use of > these functions in new code. > > Function naming may be related to how numpy treats 0-d tensors specially, > and those probably should not be called arrays. > https://www.numpy.org/neps/nep-0027-zero-rank-arrarys.html > However, as a user I never thought about 0-d arrays being special and > being "not arrays".
> > > Please see original discussion at github for more details > https://github.com/numpy/numpy/issues/5300 > > Your comments welcome, > Alex Rogozhnikov > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > , > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From teoliphant at gmail.com Fri Oct 26 22:12:00 2018 From: teoliphant at gmail.com (Travis Oliphant) Date: Fri, 26 Oct 2018 21:12:00 -0500 Subject: [Numpy-discussion] asanyarray vs. asarray In-Reply-To: References: <24100c7f-20fd-4eed-99b0-d37660f52223@Canary> Message-ID: On Fri, Oct 19, 2018 at 8:24 PM Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > Hi All, > > It seems there are two extreme possibilities for general functions: > 1. Put `asarray` everywhere. The main benefit that I can see is that even > if people put in list instead of arrays, one is guaranteed to have shape, > dtype, etc. But it seems a bit like calling `int` on everything that might > get used as an index, instead of letting the actual indexing do the proper > thing and call `__index__`. > Yes, actually getting a proper "array protocol" into Python would be a fantastic approach. We have been working with Lenore Mullin who is a researcher on the mathematics of arrays on what it means to be an array and believe we can come up with an actual array protocol that perhaps could be put into Python itself (though that isn't our immediate goal right now). > 2. 
Do not coerce at all, but rather write code assuming something is an > array already. This will often, but not always, just work for array mimics, > with coercion done only where necessary (e.g., in lower-lying C code such > as that of the ufuncs which has a smaller API surface and can be overridden > more easily). > > The current __array_function__ work may well provide us with a way to > combine both, if we (over time) move the coercion inside > `ndarray.__array_function__` so that the actual implementation *can* assume > it deals with pure ndarray - then, when relevant, calling that > implementation will be what subclasses/duck arrays can happily do (and it > is up to them to ensure this works). > Also, we could get rid of asarray entirely by changing expectations. This automatic conversion code throughout NumPy and SciPy is an example of the confusion in both of these libraries between "user-oriented interfaces" and "developer-oriented interfaces". A developer just wants the library to use duck-typing and then raise errors if you don't provide the right type (i.e. a list instead of an array). The user-interface could happen in Jupyter, or be isolated to a high-level library or meta-code approach (of which there are several possibilities for Python). > > Of course, the above does not really answer what to do in the meantime. > But perhaps it helps in thinking of what we are actually aiming for. > > One last thing: could we please stop bashing subclasses? One can subclass > essentially everything in python, often to great advantage. Subclasses such > as MaskedArray and, yes, Quantity, are widely used, and if they cause > problems perhaps that should be seen as a sign that ndarray subclassing > should be made easier and clearer. > > I agree that we can stop bashing subclasses in general. The problem with numpy subclasses is that they were made without adherence to SOLID: https://en.wikipedia.org/wiki/SOLID. 
In particular the Liskov substitution principle: https://en.wikipedia.org/wiki/Liskov_substitution_principle . Much of this is my fault. Being a scientist/engineer more than a computer scientist, I had no idea what these principles were and did not properly apply them in creating np.matrix, which clearly violates the substitution principle. We can clean all this and more up. But, we really need to start talking about NumPy 2.0 to do it. Now that Python 3.x is really here, we can raise the money for it and get it done. We don't have to just rely on volunteer time. The world will thank us for actually pushing NumPy 2.0. I know not everyone agrees, but for whatever it's worth, I feel very, very strongly about this, and despite not being very active on this list for the past years, I do have a lot of understanding about how the current code actually works (and where and why its warts are). -Travis > All the best, > > Marten > > On Fri, Oct 19, 2018 at 7:02 PM Ralf Gommers > wrote: > >> >> >> On Fri, Oct 19, 2018 at 10:28 PM Ralf Gommers >> wrote: >> >>> >>> >>> On Fri, Oct 19, 2018 at 4:15 PM Hameer Abbasi >>> wrote: >>> >>>> Hi! >>>> >>>> On Friday, Oct 19, 2018 at 6:09 PM, Stephan Hoyer >>>> wrote: >>>> I don't think it makes much sense to change NumPy's existing usage of >>>> asarray() to asanyarray() unless we add subok=True arguments (which default >>>> to False). But this ends up cluttering NumPy's public API, which is also >>>> undesirable. >>>> >>>> Agreed so far. >>>> >>> >>> I'm not sure I agree. "subok" is very unpythonic; the average numpy >>> library function should work fine for a well-behaved subclass (i.e. most >>> things out there except np.matrix). >>>> >>>> The preferred way to override NumPy functions going forward should be >>>> __array_function__. >>>> >>>> >>>> I think we should "soft support", i.e. allow but consider unsupported, >>>> the case where one of NumPy's functions is implemented in terms of others >>>> and "passing through"
an array results in the correct behaviour for that >>>> array. >>> >>> I don't think we have or want such a concept as "soft support". We >>> intend to not break anything that now has asanyarray, i.e. it's supported >>> and ideally we have regression tests for all such functions. For anything >>> we transition over from asarray to asanyarray, PRs should come with new >>> tests. >>> >>> >>>> >>>> On Fri, Oct 19, 2018 at 8:13 AM Marten van Kerkwijk < >>>> m.h.vankerkwijk at gmail.com> wrote: >>>> >>>>> There are exceptions for `matrix` in quite a few places, and there now >>>>> is a warning for `matrix` - it might not be bad to use `asanyarray` and add >>>>> an exception for `matrix`. Indeed, I quite like the suggestion by Eric >>>>> Wieser to just add the exception to `asanyarray` itself - that way when >>>>> matrix is truly deprecated, it will be a very easy change. >>>>> >>>> I don't quite understand this. Adding exceptions is not deprecation - >>> we then may as well just rip np.matrix out straight away. >>> >>> What I suggested in the call about this issue is that it's not very >>> effective to treat functions like percentile/quantile one by one without an >>> overarching strategy. A way forward could be for someone to write an >>> overview of which sets of functions now have asanyarray (and actually work >>> with subclasses), which ones we can and want to change now, and which ones >>> we can and want to change after np.matrix is gone. Also, some guidelines >>> for new functions that we add to numpy would be handy. I suspect we've been >>> adding new functions that use asarray rather than asanyarray, which is >>> probably undesired. >>> >> >> Thanks Nathaniel and Stephan. Your comments on my other two points are >> both clear and correct (and have been made a number of times before). I >> think the "write an overview so we can stop making ad-hoc decisions and >> having these discussions" is the most important point I was trying to make >> though.
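For concreteness, the asarray/asanyarray distinction and the np.matrix behavior that motivates the exceptions can be sketched like this (np.matrix is used only because it is the problem case under discussion):

```python
import numpy as np

m = np.matrix([[1, 2], [3, 4]])

# asarray strips the subclass; asanyarray passes it through unchanged.
print(type(np.asarray(m)))     # <class 'numpy.ndarray'>
print(type(np.asanyarray(m)))  # <class 'numpy.matrix'>

# Why matrix is not substitutable for ndarray: indexing a row of a
# matrix stays 2-d, so code written for ndarray semantics misbehaves.
a = np.asarray(m)
print(a[0].shape)  # (2,)
print(m[0].shape)  # (1, 2)
```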
If we had such a doc and it concluded "hence we don't change >> anything, __array_function__ is the only way to go" then we can just close >> PRs like https://github.com/numpy/numpy/pull/11162 straight away. >> >> Cheers, >> Ralf >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Oct 27 00:08:19 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 27 Oct 2018 17:08:19 +1300 Subject: [Numpy-discussion] Reminder: weekly status meeting In-Reply-To: <20181026221028.3r62dbsqpzzcvjj6@carbo> References: <20181024220750.pytz7dav4dabeplx@carbo> <1f0af61b-62f6-4f98-a5e7-6241855a7006@Canary> <164a3678-0838-4bb7-84ed-92e1a249f875@Canary> <20181026170314.wqvwwc4ncudz5dzo@carbo> <20181026221028.3r62dbsqpzzcvjj6@carbo> Message-ID: On Sat, Oct 27, 2018 at 11:10 AM Stefan van der Walt wrote: > On Sat, 27 Oct 2018 10:27:49 +1300, Ralf Gommers wrote: > > Just to make sure we're talking about the same things here: Stefan, I > think > > with "sparray" you mean "an n-D sparse array implementation that lives in > > SciPy", nothing more specific? In that case pydata/sparse is the one > > implementation, and including it in scipy.sparse would make it "sparray". > > I'm currently indeed leaning towards depending on pydata/sparse rather > than > > including it in scipy. > > I want to double check: when we last spoke, it seemed as though certain > refactorings inside of SciPy (specifically, sparray was mentioned) would > simplify the life of pydata/sparse devs. That no longer seems to be the > case? > There's no such thing as `sparray` anywhere in SciPy. 
There's two inactive projects to create an n-D sparse array implementation, one of which is called sparray (https://github.com/perimosocordiae/sparray). And there's one very active project to do that same thing which is https://github.com/pydata/sparse > If our recommended route is to tell users to use pydata/sparse instead > of SciPy (for the sparse array object), we probably want to get rid of > our own internal implementation, and deprecate spmatrix Doc-deprecate I think; the sparse matrix classes in SciPy are very heavily used, so it doesn't make sense to start emitting deprecation warnings for them. But at some point we'll want to point users to pydata/sparse for new code. > (or, build > spmatrix on top of pydata/sparse)? > It's the matrix vs. array semantics that are the issue, so not sure that building one on top of the other would be useful. > Once we can define a clear API for sparse arrays, we can include some > algorithms that ingest those objects in SciPy. But, I'm not sure we > have an API in place that will allow handover of such objects to the > existing C/FORTRAN-level code. > I don't think the constructors for sparse matrix/array care about C/F order. pydata/sparse is pure Python (and uses Numba). For reusing scipy.sparse.linalg and scipy.sparse.csgraph you're right I think that that will need some careful design work. Not sure anyone has thought about that in a lot of detail yet. There are interesting API questions probably, such as how to treat explicit zeros (that debate still isn't settled for the matrix classes IIRC). And there's an interesting transition puzzle to figure out (which also includes np.matrix). At the moment the discussion on that is spread out over many mailing list threads and Github issues, at some point we'll need to summarize that. Probably around the time that the CSR/CSC replacement that Hameer mentioned is finished. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wieser.eric+numpy at gmail.com Sat Oct 27 01:36:43 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Fri, 26 Oct 2018 22:36:43 -0700 Subject: [Numpy-discussion] Depreciating asfortranarray and ascontiguousarray In-Reply-To: References: <8741181540498592@myt3-c573aa6fc782.qloud-c.yandex.net> <14021211540598803@sas1-890ba5c2334a.qloud-c.yandex.net> Message-ID:

> in order to be used prior to calling C or Fortran code that expected at least a 1-d array

I'd argue that the behavior for these functions should have just been to raise an error saying "this function does not support 0d arrays", rather than silently inserting extra dimensions. As a bonus, that would push the function developers to add support for 0d. Obviously we can't make it do that now, but what we can do is have it emit a warning in those cases.

I think our options are:

1. Deprecate the entire function
2. Deprecate and eventually(?) throw an error upon calling the function on 0d arrays, with a message like *"in future using ascontiguousarray to promote 0d arrays to 1d arrays will not be supported. If promotion is intentional, use ascontiguousarray(atleast_1d(x)) to silence this warning and keep the old behavior, and if not use asarray(x, order='C') to preserve 0d arrays"*
3. Deprecate (future-warning) when passed 0d arrays, and eventually skip the upcast to 1d. If the calling code really needed a 1d array, then it will probably fail, which is not really different to 2, but has the advantage that the names are less surprising.
4. Only improve the documentation

My preference would be 3

Eric

On Fri, 26 Oct 2018 at 17:35 Travis Oliphant wrote: On Fri, Oct 26, 2018 at 7:14 PM Alex Rogozhnikov > wrote: > >> > If the desire is to shrink the API of NumPy, I could see that. >> >> Very good desire, but my goal was different. >> > >> > For some users this is exactly what is wanted. >> >> Maybe so, but I didn't face such example (and nobody mentioned those so >> far in the discussion).
>> The opposite (according to the issue) happened. Mxnet example is >> sufficient in my opinion. >> > > I agree that the old motivation of APIs that would make it easy to create > SciPy is no longer a major motivation for most users and even developers > and so these reasons would not be very present (as well as why it wasn't > even mentioned in the documentation). > > >> Simple example: >> x = np.zeros([]) >> assert(x.flags.c_contiguous) >> assert(np.ascontiguousarray(x).shape == x.shape) >> >> Behavior contradicts to documentation (shape is changed) and to name >> (flags are saying - it is already c_contiguous) >> >> If you insist, that keeping ndmin=1 is important (I am not yet convinced, >> but I am ready to believe your autority), >> we can add ndmin=1 to functions' signatures, this way explicitly >> notifying users about expected dimension. >> > > I understand the lack of being convinced. This is ultimately a problem of > 0-d arrays not being fully embraced and accepted by the Numeric community > originally (which NumPy inherited during the early days). Is there a way > to document functions that will be removed on a major version increase > which don't print warnings on use? I would support this. > > I'm a big supporter of making a NumPy 2.0 and have been for several years. > Now that Python 3 transition has happened, I think we could seriously > discuss this. I'm trying to raise funding for maintenance and progress for > NumPy and SciPy right now via Quansight Labs http://www.quansight.com/labs > and I hope to be able to help find grants to support the wonderful efforts > that have been happening for some time. 
> > While I'm thrilled and impressed by the number of amazing devs who have > kept NumPy and SciPy going in mostly their spare time, it has created > challenges that we have not had continuous maintenance funding to allow > continuous paid development so that several people who know about the early > decisions could not be retained to spend time on helping the transition. > > Your bringing the problem of mxnet devs is most appreciated. I will make > a documentation PR. > > -Travis > > > > >> Alex. >> >> >> 27.10.2018, 02:27, "Travis Oliphant" : >> >> What is the justification for deprecation exactly? These functions have >> been well documented and have had the intended behavior of producing arrays >> with dimension at least 1 for some time. Why is it unexpected to produce >> arrays of at least 1 dimension? For some users this is exactly what is >> wanted. I don't understand the statement that behavior with 0-d arrays is >> unexpected. >> >> If the desire is to shrink the API of NumPy, I could see that. But, it >> seems odd to me to remove a much-used function with an established behavior >> except as part of a wider API-shrinkage effort. >> >> 0-d arrays in NumPy are a separate conversation. At this point, I think >> it was a mistake not to embrace 0-d arrays in NumPy from day one. In some >> sense 0-d arrays *are* scalars at least conceptually and for JIT-producing >> systems that exist now and will be growing in the future, they can be >> equivalent to scalars. >> >> The array scalars should become how you define what is *in* a NumPy array >> making them true Python types, rather than Python 1-style "instances" of a >> single "Dtype" object. You would then have 0-d arrays and these Python >> "memory" types describing what is *in* the array. >> >> There is a clear way to do this, some of which has been outlined by >> Nathaniel, and the rest I have an outline for how to implement. I can >> advise someone on how to do this. 
>> >> -Travis >> >> >> >> >> On Thu, Oct 25, 2018 at 3:17 PM Alex Rogozhnikov < >> alex.rogozhnikov at yandex.ru> wrote: >> >> Dear numpy community, >> >> I'm planning to depreciate np.asfortranarray and np.ascontiguousarray >> functions due to their misbehavior on scalar (0-D tensors) with PR #12244 >> . >> >> Current behavior (converting scalars to 1-d array with single element) >> - is unexpected and contradicts to documentation >> - probably, can't be changed without breaking external code >> - I believe, this was a cause for poor support of 0-d arrays in mxnet. >> - both functions are easily replaced with asarray(..., order='...'), >> which has expected behavior >> >> There is no timeline for removal - we just need to discourage from using >> this functions in new code. >> >> Function naming may be related to how numpy treats 0-d tensors specially, >> >> and those probably should not be called arrays. >> https://www.numpy.org/neps/nep-0027-zero-rank-arrarys.html >> However, as a user I never thought about 0-d arrays being special and >> being "not arrays". >> >> >> Please see original discussion at github for more details >> https://github.com/numpy/numpy/issues/5300 >> >> Your comments welcome, >> Alex Rogozhnikov >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> , >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > ? 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Oct 27 02:29:47 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 27 Oct 2018 19:29:47 +1300 Subject: [Numpy-discussion] Depreciating asfortranarray and ascontiguousarray In-Reply-To: References: <8741181540498592@myt3-c573aa6fc782.qloud-c.yandex.net> <14021211540598803@sas1-890ba5c2334a.qloud-c.yandex.net> Message-ID: On Sat, Oct 27, 2018 at 6:37 PM Eric Wieser wrote: > in order to be used prior to calling C or Fortran code that expected at > least a 1-d array > > I?d argue that the behavior for these functions should have just been to > raise an error saying ?this function does not support 0d arrays?, rather > than silently inserting extra dimensions. As a bonus, that would push the > function developers to add support for 0d. Obviously we can?t make it do > that now, but what we can do is have it emit a warning in those cases. > > I think our options are: > > 1. Deprecate the entire function > 2. Deprecate and eventually(?) throw an error upon calling the > function on 0d arrays, with a message like *?in future using > ascontiguousarray to promote 0d arrays to 1d arrays will not be supported. > If promotion is intentional, use ascontiguousarray(atleast1d(x)) to silence > this warning and keep the old behavior, and if not use asarray(x, > order='C') to preserve 0d arrays?* > 3. Deprecate (future-warning) when passed 0d arrays, and eventually > skip the upcast to 1d. > If the calling code really needed a 1d array, then it will probably > fail, which is not really different to 2, but has the advantage that the > names are less surprising. > 4. Only improve the documentation > > My preference would be 3 > I'd go for 4, or alternatively for the warning in 2 (which can be left in place indefinitely). 1 is unwarranted, and 3 will change behavior which is worse than just warning or stopping to support existing behavior (= 2). 
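The two explicit spellings from option 2, side by side (current NumPy; note the actual function is spelled np.atleast_1d):

```python
import numpy as np

x = np.zeros(())  # 0-d input

promoted = np.ascontiguousarray(np.atleast_1d(x))  # intentional 1-d promotion
preserved = np.asarray(x, order='C')               # keeps the 0-d shape

print(promoted.shape)   # (1,)
print(preserved.shape)  # ()
```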
Eric > > On Fri, 26 Oct 2018 at 17:35 Travis Oliphant wrote: > > On Fri, Oct 26, 2018 at 7:14 PM Alex Rogozhnikov < >> alex.rogozhnikov at yandex.ru> wrote: >> >>> > If the desire is to shrink the API of NumPy, I could see that. >>> >>> Very good desire, but my goal was different. >>> >> >>> > For some users this is exactly what is wanted. >>> >>> Maybe so, but I didn't face such example (and nobody mentioned those so >>> far in the discussion). >>> The opposite (according to the issue) happened. Mxnet example is >>> sufficient in my opinion. >>> >> >> I agree that the old motivation of APIs that would make it easy to create >> SciPy is no longer a major motivation for most users and even developers >> and so these reasons would not be very present (as well as why it wasn't >> even mentioned in the documentation). >> >> >>> Simple example: >>> x = np.zeros([]) >>> assert(x.flags.c_contiguous) >>> assert(np.ascontiguousarray(x).shape == x.shape) >>> >>> Behavior contradicts to documentation (shape is changed) and to name >>> (flags are saying - it is already c_contiguous) >>> >>> If you insist, that keeping ndmin=1 is important (I am not yet >>> convinced, but I am ready to believe your autority), >>> we can add ndmin=1 to functions' signatures, this way explicitly >>> notifying users about expected dimension. >>> >> >> I understand the lack of being convinced. This is ultimately a problem >> of 0-d arrays not being fully embraced and accepted by the Numeric >> community originally (which NumPy inherited during the early days). Is >> there a way to document functions that will be removed on a major version >> increase which don't print warnings on use? I would support this. >> > No, there's no such thing at the moment - the closest thing is https://github.com/numpy/numpy/wiki/Backwards-incompatible-ideas-for-a-major-release. I doubt we want such a thing anyway - removing functions without deprecation warnings first doesn't seem quite right. 
> >> I'm a big supporter of making a NumPy 2.0 and have been for several >> years. Now that Python 3 transition has happened, I think we could >> seriously discuss this. >> > I think it's more helpful to discuss goals and concrete plans for those, rather than a "NumPy 2.0" label. The latter never worked in the past, and not just because of lack of time/funding - it just means different things to different people. We now have a good start on what our major goals are ( http://www.numpy.org/neps/#roadmap), let's build on that. I'm trying to raise funding for maintenance and progress for NumPy and >> SciPy right now via Quansight Labs http://www.quansight.com/labs and I >> hope to be able to help find grants to support the wonderful efforts that >> have been happening for some time. >> > The NumPy grant and having Tyler/Matti/Stefan at BIDS is a great start to funded development; more and more diverse funding sources would be awesome. Cheers, Ralf >> While I'm thrilled and impressed by the number of amazing devs who have >> kept NumPy and SciPy going in mostly their spare time, it has created >> challenges that we have not had continuous maintenance funding to allow >> continuous paid development so that several people who know about the early >> decisions could not be retained to spend time on helping the transition. >> >> Your bringing the problem of mxnet devs is most appreciated. I will make >> a documentation PR. >> >> -Travis >> >> >> >> >>> Alex. >>> >>> >>> 27.10.2018, 02:27, "Travis Oliphant" : >>> >>> What is the justification for deprecation exactly? These functions have >>> been well documented and have had the intended behavior of producing arrays >>> with dimension at least 1 for some time. Why is it unexpected to produce >>> arrays of at least 1 dimension? For some users this is exactly what is >>> wanted. I don't understand the statement that behavior with 0-d arrays is >>> unexpected. 
>>> >>> If the desire is to shrink the API of NumPy, I could see that. But, it >>> seems odd to me to remove a much-used function with an established behavior >>> except as part of a wider API-shrinkage effort. >>> >>> 0-d arrays in NumPy are a separate conversation. At this point, I think >>> it was a mistake not to embrace 0-d arrays in NumPy from day one. In some >>> sense 0-d arrays *are* scalars at least conceptually and for JIT-producing >>> systems that exist now and will be growing in the future, they can be >>> equivalent to scalars. >>> >>> The array scalars should become how you define what is *in* a NumPy >>> array making them true Python types, rather than Python 1-style "instances" >>> of a single "Dtype" object. You would then have 0-d arrays and these >>> Python "memory" types describing what is *in* the array. >>> >>> There is a clear way to do this, some of which has been outlined by >>> Nathaniel, and the rest I have an outline for how to implement. I can >>> advise someone on how to do this. >>> >>> -Travis >>> >>> >>> >>> >>> On Thu, Oct 25, 2018 at 3:17 PM Alex Rogozhnikov < >>> alex.rogozhnikov at yandex.ru> wrote: >>> >>> Dear numpy community, >>> >>> I'm planning to depreciate np.asfortranarray and np.ascontiguousarray >>> functions due to their misbehavior on scalar (0-D tensors) with PR >>> #12244. >>> >>> Current behavior (converting scalars to 1-d array with single element) >>> - is unexpected and contradicts to documentation >>> - probably, can't be changed without breaking external code >>> - I believe, this was a cause for poor support of 0-d arrays in mxnet. >>> - both functions are easily replaced with asarray(..., order='...'), >>> which has expected behavior >>> >>> There is no timeline for removal - we just need to discourage from using >>> this functions in new code. >>> >>> Function naming may be related to how numpy treats 0-d tensors >>> specially, >>> and those probably should not be called arrays. 
>>> https://www.numpy.org/neps/nep-0027-zero-rank-arrarys.html >>> However, as a user I never thought about 0-d arrays being special and >>> being "not arrays". >>> >>> >>> Please see original discussion at github for more details >>> https://github.com/numpy/numpy/issues/5300 >>> >>> Your comments welcome, >>> Alex Rogozhnikov >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >>> , >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From einstein.edison at gmail.com Sat Oct 27 04:29:01 2018
From: einstein.edison at gmail.com (Hameer Abbasi)
Date: Sat, 27 Oct 2018 10:29:01 +0200
Subject: [Numpy-discussion] Reminder: weekly status meeting
In-Reply-To: <20181026221028.3r62dbsqpzzcvjj6@carbo>
References: <20181024220750.pytz7dav4dabeplx@carbo> <1f0af61b-62f6-4f98-a5e7-6241855a7006@Canary> <164a3678-0838-4bb7-84ed-92e1a249f875@Canary> <20181026170314.wqvwwc4ncudz5dzo@carbo> <20181026221028.3r62dbsqpzzcvjj6@carbo>
Message-ID: <3df79185-ae85-4973-b0f2-78ae8e76acbb@Canary>

> On Saturday, Oct 27, 2018 at 12:10 AM, Stefan van der Walt wrote:
> On Sat, 27 Oct 2018 10:27:49 +1300, Ralf Gommers wrote:
> > Just to make sure we're talking about the same things here: Stefan, I think
> > with "sparray" you mean "an n-D sparse array implementation that lives in
> > SciPy", nothing more specific? In that case pydata/sparse is the one
> > implementation, and including it in scipy.sparse would make it "sparray".
> > I'm currently indeed leaning towards depending on pydata/sparse rather than
> > including it in scipy.
>
> I want to double check: when we last spoke, it seemed as though certain
> refactorings inside of SciPy (specifically, sparray was mentioned) would
> simplify the life of pydata/sparse devs. That no longer seems to be the
> case?

Hi! I can't recall having said this; perhaps you inferred it from the docs (it's on the front page, so that isn't unreasonable). We should update that sometime.

That said, we use very little of scipy.sparse in PyData/Sparse. When Matt Rocklin was maintaining the project, that was the case, but even in the later days he shifted much of his code to pure NumPy. I followed that path further, not out of unwillingness to depend on it, but out of a desire for generality. In its current state, the only things in PyData/Sparse that depend on scipy.sparse are:

- conversion to/from the scipy.sparse spmatrix classes
- a bit of linear algebra, i.e. dot, tensordot, matmul.
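To make that boundary concrete, the conversion half can be sketched with scipy.sparse alone (a rough illustration; the matrix values are made up, and the PyData/Sparse side mirrors this API with its COO arrays):

```python
import numpy as np
from scipy import sparse

# Conversion to/from the scipy.sparse spmatrix classes.
dense = np.array([[0, 1], [2, 0]])
coo = sparse.coo_matrix(dense)   # dense ndarray -> spmatrix
back = coo.toarray()             # spmatrix -> dense ndarray
assert (back == dense).all()

# "A bit of linear algebra": a sparse-times-dense dot product.
v = np.array([1, 1])
print(coo.dot(v))                # [1 2]
```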
Best Regards,
Hameer Abbasi

>
> If our recommended route is to tell users to use pydata/sparse instead
> of SciPy (for the sparse array object), we probably want to get rid of
> our own internal implementation, and deprecate spmatrix (or, build
> spmatrix on top of pydata/sparse)?
>
> Once we can define a clear API for sparse arrays, we can include some
> algorithms that ingest those objects in SciPy. But, I'm not sure we
> have an API in place that will allow handover of such objects to the
> existing C/FORTRAN-level code.
>
> Stéfan
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From einstein.edison at gmail.com Sat Oct 27 04:34:42 2018
From: einstein.edison at gmail.com (Hameer Abbasi)
Date: Sat, 27 Oct 2018 10:34:42 +0200
Subject: [Numpy-discussion] Reminder: weekly status meeting
In-Reply-To: 
References: <20181024220750.pytz7dav4dabeplx@carbo> <1f0af61b-62f6-4f98-a5e7-6241855a7006@Canary> <164a3678-0838-4bb7-84ed-92e1a249f875@Canary> <20181026170314.wqvwwc4ncudz5dzo@carbo> <20181026221028.3r62dbsqpzzcvjj6@carbo>
Message-ID: 

> On Saturday, Oct 27, 2018 at 6:11 AM, Ralf Gommers wrote:
>
> > On Sat, Oct 27, 2018 at 11:10 AM Stefan van der Walt wrote:
> > On Sat, 27 Oct 2018 10:27:49 +1300, Ralf Gommers wrote:
> > > Just to make sure we're talking about the same things here: Stefan, I think
> > > with "sparray" you mean "an n-D sparse array implementation that lives in
> > > SciPy", nothing more specific? In that case pydata/sparse is the one
> > > implementation, and including it in scipy.sparse would make it "sparray".
> > > I'm currently indeed leaning towards depending on pydata/sparse rather than
> > > including it in scipy.
>
> > I want to double check: when we last spoke, it seemed as though certain
> > refactorings inside of SciPy (specifically, sparray was mentioned) would
> > simplify the life of pydata/sparse devs. That no longer seems to be the
> > case?
>
> There's no such thing as `sparray` anywhere in SciPy. There are two inactive
> projects to create an n-D sparse array implementation, one of which is called
> sparray (https://github.com/perimosocordiae/sparray). And there's one very
> active project to do that same thing, which is https://github.com/pydata/sparse
>
> > If our recommended route is to tell users to use pydata/sparse instead
> > of SciPy (for the sparse array object), we probably want to get rid of
> > our own internal implementation, and deprecate spmatrix
>
> Doc-deprecate I think; the sparse matrix classes in SciPy are very heavily
> used, so it doesn't make sense to start emitting deprecation warnings for
> them. But at some point we'll want to point users to pydata/sparse for new
> code.
>
> > (or, build
> > spmatrix on top of pydata/sparse)?
>
> It's the matrix vs. array semantics that are the issue, so I'm not sure that
> building one on top of the other would be useful.
>
> > Once we can define a clear API for sparse arrays, we can include some
> > algorithms that ingest those objects in SciPy. But, I'm not sure we
> > have an API in place that will allow handover of such objects to the
> > existing C/FORTRAN-level code.
>
> I don't think the constructors for sparse matrix/array care about C/F order.
> pydata/sparse is pure Python (and uses Numba). For reusing scipy.sparse.linalg
> and scipy.sparse.csgraph you're right, I think that will need some careful
> design work. Not sure anyone has thought about that in a lot of detail yet.

They don't yet. That is a planned feature, allowing an arbitrary permutation of input coordinates.
> > There are interesting API questions probably, such as how to treat explicit zeros (that debate still isn't settled for the matrix classes IIRC). > Explicit zeros are easier now, just use a fill_value of NaN and work with zeros as usual. Best Regards, Hameer Abbasi > > And there's an interesting transition puzzle to figure out (which also includes np.matrix). At the moment the discussion on that is spread out over many mailing list threads and Github issues, at some point we'll need to summarize that. Probably around the time that the CSR/CSC replacement that Hameer mentioned is finished. > > Cheers, > Ralf > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From sylvain.corlay at gmail.com Sat Oct 27 11:16:56 2018 From: sylvain.corlay at gmail.com (Sylvain Corlay) Date: Sat, 27 Oct 2018 17:16:56 +0200 Subject: [Numpy-discussion] Depreciating asfortranarray and ascontiguousarray In-Reply-To: References: <8741181540498592@myt3-c573aa6fc782.qloud-c.yandex.net> Message-ID: I would also argue against deprecating these functions that we are using increasingly in several projects that I am involved in. On Sat, Oct 27, 2018, 01:28 Travis Oliphant wrote: > What is the justification for deprecation exactly? These functions have > been well documented and have had the intended behavior of producing arrays > with dimension at least 1 for some time. Why is it unexpected to produce > arrays of at least 1 dimension? For some users this is exactly what is > wanted. I don't understand the statement that behavior with 0-d arrays is > unexpected. > > If the desire is to shrink the API of NumPy, I could see that. But, it > seems odd to me to remove a much-used function with an established behavior > except as part of a wider API-shrinkage effort. 
> > 0-d arrays in NumPy are a separate conversation. At this point, I think > it was a mistake not to embrace 0-d arrays in NumPy from day one. In some > sense 0-d arrays *are* scalars at least conceptually and for JIT-producing > systems that exist now and will be growing in the future, they can be > equivalent to scalars. > > The array scalars should become how you define what is *in* a NumPy array > making them true Python types, rather than Python 1-style "instances" of a > single "Dtype" object. You would then have 0-d arrays and these Python > "memory" types describing what is *in* the array. > > There is a clear way to do this, some of which has been outlined by > Nathaniel, and the rest I have an outline for how to implement. I can > advise someone on how to do this. > > -Travis > > > > > On Thu, Oct 25, 2018 at 3:17 PM Alex Rogozhnikov < > alex.rogozhnikov at yandex.ru> wrote: > >> Dear numpy community, >> >> I'm planning to depreciate np.asfortranarray and np.ascontiguousarray >> functions due to their misbehavior on scalar (0-D tensors) with PR #12244 >> . >> >> Current behavior (converting scalars to 1-d array with single element) >> - is unexpected and contradicts to documentation >> - probably, can't be changed without breaking external code >> - I believe, this was a cause for poor support of 0-d arrays in mxnet. >> - both functions are easily replaced with asarray(..., order='...'), >> which has expected behavior >> >> There is no timeline for removal - we just need to discourage from using >> this functions in new code. >> >> Function naming may be related to how numpy treats 0-d tensors specially, >> >> and those probably should not be called arrays. >> https://www.numpy.org/neps/nep-0027-zero-rank-arrarys.html >> However, as a user I never thought about 0-d arrays being special and >> being "not arrays". 
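Travis's point that 0-d arrays conceptually *are* scalars is easy to see interactively (a quick sketch, not from the original mail):

```python
import numpy as np

s = np.float64(1.5)   # an "array scalar"
z = np.array(1.5)     # a 0-d array

print(s.ndim, z.ndim)       # 0 0 -- both report zero dimensions
print(type(s) is type(z))   # False -- yet they are distinct Python types
print(s + 1 == z + 1)       # True -- arithmetic treats them identically
print(z[()])                # 1.5 -- empty-tuple indexing extracts the scalar
```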
>> >> Please see original discussion at github for more details
>> https://github.com/numpy/numpy/issues/5300
>>
>> Your comments welcome,
>> Alex Rogozhnikov
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From teoliphant at gmail.com Sat Oct 27 12:00:58 2018
From: teoliphant at gmail.com (Travis Oliphant)
Date: Sat, 27 Oct 2018 11:00:58 -0500
Subject: [Numpy-discussion] Deprecating asfortranarray and ascontiguousarray
In-Reply-To: 
References: <8741181540498592@myt3-c573aa6fc782.qloud-c.yandex.net> <14021211540598803@sas1-890ba5c2334a.qloud-c.yandex.net>
Message-ID: 

I agree with Number 2 and 4.

On Sat, Oct 27, 2018 at 12:38 AM Eric Wieser wrote:

> in order to be used prior to calling C or Fortran code that expected at
> least a 1-d array
>
> I'd argue that the behavior for these functions should have just been to
> raise an error saying "this function does not support 0d arrays", rather
> than silently inserting extra dimensions. As a bonus, that would push the
> function developers to add support for 0d. Obviously we can't make it do
> that now, but what we can do is have it emit a warning in those cases.
>
> I think our options are:
>
> 1. Deprecate the entire function
> 2. Deprecate and eventually(?) throw an error upon calling the
> function on 0d arrays, with a message like *"in future, using
> ascontiguousarray to promote 0d arrays to 1d arrays will not be supported.
> If promotion is intentional, use ascontiguousarray(atleast1d(x)) to silence
> this warning and keep the old behavior, and if not, use asarray(x,
> order='C') to preserve 0d arrays"*
> 3.
Deprecate (future-warning) when passed 0d arrays, and eventually
> skip the upcast to 1d.
> If the calling code really needed a 1d array, then it will probably
> fail, which is not really different to 2, but has the advantage that the
> names are less surprising.
> 4. Only improve the documentation
>
> My preference would be 3
>
> Eric
>
> On Fri, 26 Oct 2018 at 17:35 Travis Oliphant wrote:
>
> On Fri, Oct 26, 2018 at 7:14 PM Alex Rogozhnikov <
>> alex.rogozhnikov at yandex.ru> wrote:
>>
>>> > If the desire is to shrink the API of NumPy, I could see that.
>>>
>>> A very good desire, but my goal was different.
>>>
>>> > For some users this is exactly what is wanted.
>>>
>>> Maybe so, but I didn't face such an example (and nobody has mentioned one so
>>> far in the discussion).
>>> The opposite (according to the issue) happened. The mxnet example is
>>> sufficient in my opinion.
>>
>> I agree that the old motivation of APIs that would make it easy to create
>> SciPy is no longer a major motivation for most users and even developers,
>> and so these reasons would not be very present (as well as why it wasn't
>> even mentioned in the documentation).
>>
>>> A simple example:
>>> x = np.zeros([])
>>> assert(x.flags.c_contiguous)
>>> assert(np.ascontiguousarray(x).shape == x.shape)
>>>
>>> The behavior contradicts the documentation (the shape is changed) and the
>>> name (the flags are saying it is already c_contiguous).
>>>
>>> If you insist that keeping ndmin=1 is important (I am not yet
>>> convinced, but I am ready to believe your authority),
>>> we can add ndmin=1 to the functions' signatures, this way explicitly
>>> notifying users about the expected dimension.
>>
>> I understand the lack of being convinced. This is ultimately a problem
>> of 0-d arrays not being fully embraced and accepted by the Numeric
>> community originally (which NumPy inherited during the early days).
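Alex's snippet runs as written; a slightly expanded sketch (behavior as of NumPy 1.15, the current release at the time -- the 0-d handling is exactly what is under discussion and may change in later releases):

```python
import numpy as np

x = np.zeros([])                  # 0-d array, shape ()
assert x.flags.c_contiguous       # already C-contiguous...

y = np.ascontiguousarray(x)
print(y.shape)                    # (1,) on NumPy 1.15: silently promoted to 1-d

z = np.asarray(x, order='C')      # the suggested replacement
print(z.shape)                    # () -- shape preserved
```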
>> Is there a way to document functions that will be removed on a major version
>> increase which don't print warnings on use? I would support this.
>>
>> I'm a big supporter of making a NumPy 2.0 and have been for several
>> years. Now that the Python 3 transition has happened, I think we could
>> seriously discuss this. I'm trying to raise funding for maintenance and
>> progress for NumPy and SciPy right now via Quansight Labs
>> http://www.quansight.com/labs and I hope to be able to help find grants
>> to support the wonderful efforts that have been happening for some time.
>>
>> While I'm thrilled and impressed by the number of amazing devs who have
>> kept NumPy and SciPy going mostly in their spare time, it has created
>> challenges: because we have not had continuous maintenance funding to allow
>> continuous paid development, several people who knew about the early
>> decisions could not be retained to spend time on helping the transition.
>>
>> Your bringing up the problem of the mxnet devs is most appreciated. I will
>> make a documentation PR.
>>
>> -Travis
>>
>>> Alex.
>>>
>>> 27.10.2018, 02:27, "Travis Oliphant" :
>>>
>>> What is the justification for deprecation exactly? These functions have
>>> been well documented and have had the intended behavior of producing arrays
>>> with dimension at least 1 for some time. Why is it unexpected to produce
>>> arrays of at least 1 dimension? For some users this is exactly what is
>>> wanted. I don't understand the statement that behavior with 0-d arrays is
>>> unexpected.
>>>
>>> If the desire is to shrink the API of NumPy, I could see that. But, it
>>> seems odd to me to remove a much-used function with an established behavior
>>> except as part of a wider API-shrinkage effort.
>>>
>>> 0-d arrays in NumPy are a separate conversation. At this point, I think
>>> it was a mistake not to embrace 0-d arrays in NumPy from day one.
In some >>> sense 0-d arrays *are* scalars at least conceptually and for JIT-producing >>> systems that exist now and will be growing in the future, they can be >>> equivalent to scalars. >>> >>> The array scalars should become how you define what is *in* a NumPy >>> array making them true Python types, rather than Python 1-style "instances" >>> of a single "Dtype" object. You would then have 0-d arrays and these >>> Python "memory" types describing what is *in* the array. >>> >>> There is a clear way to do this, some of which has been outlined by >>> Nathaniel, and the rest I have an outline for how to implement. I can >>> advise someone on how to do this. >>> >>> -Travis >>> >>> >>> >>> >>> On Thu, Oct 25, 2018 at 3:17 PM Alex Rogozhnikov < >>> alex.rogozhnikov at yandex.ru> wrote: >>> >>> Dear numpy community, >>> >>> I'm planning to depreciate np.asfortranarray and np.ascontiguousarray >>> functions due to their misbehavior on scalar (0-D tensors) with PR >>> #12244. >>> >>> Current behavior (converting scalars to 1-d array with single element) >>> - is unexpected and contradicts to documentation >>> - probably, can't be changed without breaking external code >>> - I believe, this was a cause for poor support of 0-d arrays in mxnet. >>> - both functions are easily replaced with asarray(..., order='...'), >>> which has expected behavior >>> >>> There is no timeline for removal - we just need to discourage from using >>> this functions in new code. >>> >>> Function naming may be related to how numpy treats 0-d tensors >>> specially, >>> and those probably should not be called arrays. >>> https://www.numpy.org/neps/nep-0027-zero-rank-arrarys.html >>> However, as a user I never thought about 0-d arrays being special and >>> being "not arrays". 
>>> >>> >>> Please see original discussion at github for more details >>> https://github.com/numpy/numpy/issues/5300 >>> >>> Your comments welcome, >>> Alex Rogozhnikov >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >>> , >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > ? > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Oct 29 19:30:44 2018 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 29 Oct 2018 16:30:44 -0700 Subject: [Numpy-discussion] asanyarray vs. asarray In-Reply-To: References: <24100c7f-20fd-4eed-99b0-d37660f52223@Canary> Message-ID: On Fri, Oct 26, 2018 at 7:12 PM, Travis Oliphant wrote: > agree that we can stop bashing subclasses in general. The problem with > numpy subclasses is that they were made without adherence to SOLID: > https://en.wikipedia.org/wiki/SOLID. In particular the Liskov > substitution principle: https://en.wikipedia.org/wiki/ > Liskov_substitution_principle . > ... > did not properly apply them in creating np.matrix which clearly violates > the substitution principle. > So -- could a matrix subclass be made "properly"? 
or is that an example of something that should not have been a subclass? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Oct 29 23:54:17 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 29 Oct 2018 20:54:17 -0700 Subject: [Numpy-discussion] asanyarray vs. asarray In-Reply-To: References: <24100c7f-20fd-4eed-99b0-d37660f52223@Canary> Message-ID: On Mon, Oct 29, 2018 at 4:31 PM Chris Barker wrote: > On Fri, Oct 26, 2018 at 7:12 PM, Travis Oliphant > wrote: > > >> agree that we can stop bashing subclasses in general. The problem with >> numpy subclasses is that they were made without adherence to SOLID: >> https://en.wikipedia.org/wiki/SOLID. In particular the Liskov >> substitution principle: >> https://en.wikipedia.org/wiki/Liskov_substitution_principle . >> > > ... > > >> did not properly apply them in creating np.matrix which clearly violates >> the substitution principle. >> > > So -- could a matrix subclass be made "properly"? or is that an example of > something that should not have been a subclass? > The latter - changing the behavior of multiplication breaks the principle. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Tue Oct 30 00:47:54 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Mon, 29 Oct 2018 21:47:54 -0700 Subject: [Numpy-discussion] asanyarray vs. asarray In-Reply-To: References: <24100c7f-20fd-4eed-99b0-d37660f52223@Canary> Message-ID: The latter - changing the behavior of multiplication breaks the principle. 
But this is not the main reason for deprecating matrix - almost all of the
problems I've seen have been caused by the way that matrices behave when
sliced. The way that m[i][j] and m[i,j] are different is just one example
of this; the fact that they must be 2d is another.

Matrices behaving differently on multiplication isn't super different in
my mind to how string arrays fail to multiply at all.

Eric

On Mon, 29 Oct 2018 at 20:54 Ralf Gommers wrote:

On Mon, Oct 29, 2018 at 4:31 PM Chris Barker wrote:
>
>> On Fri, Oct 26, 2018 at 7:12 PM, Travis Oliphant
>> wrote:
>>
>>> agree that we can stop bashing subclasses in general. The problem
>>> with numpy subclasses is that they were made without adherence to SOLID:
>>> https://en.wikipedia.org/wiki/SOLID. In particular the Liskov
>>> substitution principle:
>>> https://en.wikipedia.org/wiki/Liskov_substitution_principle .
>>
>> ...
>>
>>> did not properly apply them in creating np.matrix which clearly violates
>>> the substitution principle.
>>
>> So -- could a matrix subclass be made "properly"? or is that an example
>> of something that should not have been a subclass?
>
> The latter - changing the behavior of multiplication breaks the principle.
>
> Ralf
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From matti.picus at gmail.com Tue Oct 30 05:04:04 2018
From: matti.picus at gmail.com (Matti Picus)
Date: Tue, 30 Oct 2018 11:04:04 +0200
Subject: [Numpy-discussion] Attribute hiding APIs for PyArrayObject
Message-ID: 

TL;DR - should we revert the attribute-hiding constructs in ndarraytypes.h
and unify PyArrayObject_fields with PyArrayObject?

Background

NumPy 1.8 deprecated direct access to PyArrayObject fields.
It made PyArrayObject "opaque", and hid the fields behind a
PyArrayObject_fields structure
https://github.com/numpy/numpy/blob/v1.15.3/numpy/core/include/numpy/ndarraytypes.h#L659
with a comment about moving this to a private header. In order to access the
fields, users are supposed to use PyArray_<FIELDNAME> functions, like
PyArray_DATA and PyArray_NDIM. It seems there were thoughts at the time that
numpy might move away from a C-struct-based underlying data structure. Other
changes were also made to enum names, but those are relatively painless to
find-and-replace.

NumPy has a mechanism to manage deprecating APIs: C users define
NPY_NO_DEPRECATED_API to a desired level, say NPY_1_8_API_VERSION, and can
then access the API "as if" they were using NumPy 1.8. Users who do not
define NPY_NO_DEPRECATED_API get a warning when compiling, and default to the
pre-1.8 API (aliasing of PyArrayObject to PyArrayObject_fields and direct
access to the C struct fields). This is convenient for downstream users, both
since the new API does not provide much added value, and since it is much
easier to write a->nd than PyArray_NDIM(a). For instance, pandas uses direct
assignment to the data field for fast json parsing
https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/src/ujson/python/JSONtoObj.c#L203
via chunks. Working around the new API in pandas would require more
engineering.
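For reference, the two access styles side by side (an illustrative fragment, not a complete module; it assumes the usual NumPy C-API headers and initialization):

```c
/* Opt in to the post-1.8 "opaque" API before including the headers. */
#define NPY_NO_DEPRECATED_API NPY_1_8_API_VERSION
#include <numpy/arrayobject.h>

static npy_intp
first_dim(PyArrayObject *a)
{
    /* The deprecated style -- a->nd, a->dimensions[0] -- fails to compile
       once the fields are hidden; the accessor style works either way. */
    return PyArray_NDIM(a) > 0 ? PyArray_DIMS(a)[0] : 0;
}
```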
Also, for example, cython has a mechanism to transpile python code into C,
mapping slow python attribute lookup to fast C struct field access
https://cython.readthedocs.io/en/latest/src/userguide/extension_types.html#external-extension-types

In a parallel but not really related universe, cython recently upgraded the
object mapping so that we can quiet the annoying "size changed" runtime
warning https://github.com/numpy/numpy/issues/11788 without requiring warning
filters, but that requires updating the numpy.pxd file provided with cython,
and it was proposed that NumPy actually vendor its own file rather than
depending on the cython one (https://github.com/numpy/numpy/issues/11803).

The problem

We have now made further changes to our API. In NumPy 1.14 we changed
UPDATEIFCOPY to WRITEBACKIFCOPY, and in 1.16 we would like to deprecate
PyArray_SetNumericOps and PyArray_GetNumericOps. The strange warning emitted
when NPY_NO_DEPRECATED_API is not defined is annoying. The new API cannot be
supported by cython without some deep surgery
(https://github.com/cython/cython/pull/2640). When I tried dogfooding an
updated numpy.pxd for the only cython code in NumPy, mtrand.pyx, I came
across some of these issues (https://github.com/numpy/numpy/pull/12284).
Forcing the new API will require downstream users to refactor code or
re-engineer constructs, as in the pandas example above.

The question

Is the attribute-hiding effort worth it? Should we give up, revert the
PyArrayObject/PyArrayObject_fields division and allow direct access from C to
the numpy internals? Is there another path forward that is less painful?

Matti

From alex.rogozhnikov at yandex.ru Tue Oct 30 05:57:34 2018
From: alex.rogozhnikov at yandex.ru (Alex Rogozhnikov)
Date: Tue, 30 Oct 2018 12:57:34 +0300
Subject: [Numpy-discussion] einops 0.1
Message-ID: <859711540893454@iva3-294f9af87d55.qloud-c.yandex.net>

An HTML attachment was scrubbed...
URL: 

From ewm at redtetrahedron.org Tue Oct 30 08:14:38 2018
From: ewm at redtetrahedron.org (Eric Moore)
Date: Tue, 30 Oct 2018 08:14:38 -0400
Subject: [Numpy-discussion] asanyarray vs. asarray
In-Reply-To: 
References: <24100c7f-20fd-4eed-99b0-d37660f52223@Canary>
Message-ID: 

On Tue, Oct 30, 2018 at 12:49 AM Eric Wieser wrote:

> The latter - changing the behavior of multiplication breaks the principle.
>
> But this is not the main reason for deprecating matrix - almost all of the
> problems I've seen have been caused by the way that matrices behave when
> sliced. The way that m[i][j] and m[i,j] are different is just one example
> of this, the fact that they must be 2d is another.
>
> Matrices behaving differently on multiplication isn't super different in
> my mind to how string arrays fail to multiply at all.

The difference is that string arrays are not numeric. This is an issue since
people want to pass a matrix into places that want to multiply element-wise,
but that then breaks that code unless special provisions are taken. Numerical
codes don't work on string arrays anyway.

Eric

> On Mon, 29 Oct 2018 at 20:54 Ralf Gommers wrote:
>
>> On Mon, Oct 29, 2018 at 4:31 PM Chris Barker wrote:
>>
>>> On Fri, Oct 26, 2018 at 7:12 PM, Travis Oliphant wrote:
>>>
>>>> agree that we can stop bashing subclasses in general. The problem
>>>> with numpy subclasses is that they were made without adherence to SOLID:
>>>> https://en.wikipedia.org/wiki/SOLID. In particular the Liskov
>>>> substitution principle:
>>>> https://en.wikipedia.org/wiki/Liskov_substitution_principle .
>>>
>>> ...
>>>
>>>> did not properly apply them in creating np.matrix which clearly
>>>> violates the substitution principle.
>>>
>>> So -- could a matrix subclass be made "properly"? or is that an example
>>> of something that should not have been a subclass?
>>
>> The latter - changing the behavior of multiplication breaks the principle.
>> >> Ralf >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > ? > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Tue Oct 30 10:54:07 2018 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 30 Oct 2018 15:54:07 +0100 Subject: [Numpy-discussion] Depreciating asfortranarray and ascontiguousarray In-Reply-To: References: <8741181540498592@myt3-c573aa6fc782.qloud-c.yandex.net> <14021211540598803@sas1-890ba5c2334a.qloud-c.yandex.net> Message-ID: <4f2c7ca80346d277a3e8ccdb2e47ef967467033d.camel@sipsolutions.net> On Sat, 2018-10-27 at 19:29 +1300, Ralf Gommers wrote: > > > On Sat, Oct 27, 2018 at 6:37 PM Eric Wieser < > wieser.eric+numpy at gmail.com> wrote: > > > in order to be used prior to calling C or Fortran code that > > > expected at least a 1-d array > > > > > > > > > > I'm a big supporter of making a NumPy 2.0 and have been for > > > several years. Now that Python 3 transition has happened, I think > > > we could seriously discuss this. > > I think it's more helpful to discuss goals and concrete plans for > those, rather than a "NumPy 2.0" label. The latter never worked in > the past, and not just because of lack of time/funding - it just > means different things to different people. We now have a good start > on what our major goals are (http://www.numpy.org/neps/#roadmap), > let's build on that. I agree. I do think that we should not be scared of a major release. But, I would rather see it as a step towards, for example, better dtypes. Aiming for a large cleanup seems like it might be a can of worms [0]. About the asfortranarray/ascontiguousarray thing. 
I am not sure I like FutureWarnings in the edge cases; they seem likely to
arise randomly in functions whose devs may not even be aware of them. I do
not like spamming the API, but if we cannot agree on a nice way forward,
maybe this is a point where creating new names is an option:

* ascorderarray/asforderarray
* asccontiguousarray/asfcontiguousarray
* np.asarray(..., order='C') is somewhat the same, I guess

I am not sure I like the names too much, but I think we could find new names
here. And then adding warnings is IMO OK, if there is an easy/nice enough way
to avoid them (sure, we can start in the documentation if it helps). We can
wait a very long time for the actual removal, at least until the next major
release or so; I do not think it matters much. As long as visible deprecation
warnings exist to push downstream into changing habits/code, the maintenance
burden is pretty much zero after all.

Discussing how to approach larger changes is important, but I doubt that
these particular functions are problematic enough!

- Sebastian

[0] Happy to be shown wrong, but I seriously fear that aiming too high will
hinder progress -- unless maybe there is some very good funding and skilled
devs, but even then it might be too ambitious? -- and I am not even sure it
is easier on downstream.

> > > I'm trying to raise funding for maintenance and progress for
> > > NumPy and SciPy right now via Quansight Labs
> > > http://www.quansight.com/labs and I hope to be able to help find
> > > grants to support the wonderful efforts that have been happening
> > > for some time.
>
> The NumPy grant and having Tyler/Matti/Stefan at BIDS is a great
> start to funded development; more and more diverse funding sources
> would be awesome.

Yes, that is very cool news!
- Sebastian > > Cheers, > Ralf > > > > While I'm thrilled and impressed by the number of amazing devs > > > who have kept NumPy and SciPy going in mostly their spare time, > > > it has created challenges that we have not had continuous > > > maintenance funding to allow continuous paid development so that > > > several people who know about the early decisions could not be > > > retained to spend time on helping the transition. > > > > > > Your bringing the problem of mxnet devs is most appreciated. I > > > will make a documentation PR. > > > > > > -Travis > > > > > > > > > > > > > > > > > Alex. > > > > > > > > > > > > 27.10.2018, 02:27, "Travis Oliphant" : > > > > > What is the justification for deprecation exactly? These > > > > > functions have been well documented and have had the intended > > > > > behavior of producing arrays with dimension at least 1 for > > > > > some time. Why is it unexpected to produce arrays of at > > > > > least 1 dimension? For some users this is exactly what is > > > > > wanted. I don't understand the statement that behavior with > > > > > 0-d arrays is unexpected. > > > > > > > > > > If the desire is to shrink the API of NumPy, I could see > > > > > that. But, it seems odd to me to remove a much-used > > > > > function with an established behavior except as part of a > > > > > wider API-shrinkage effort. > > > > > > > > > > 0-d arrays in NumPy are a separate conversation. At this > > > > > point, I think it was a mistake not to embrace 0-d arrays in > > > > > NumPy from day one. In some sense 0-d arrays *are* scalars > > > > > at least conceptually and for JIT-producing systems that > > > > > exist now and will be growing in the future, they can be > > > > > equivalent to scalars. > > > > > > > > > > The array scalars should become how you define what is *in* a > > > > > NumPy array making them true Python types, rather than Python > > > > > 1-style "instances" of a single "Dtype" object. 
You would > > > > > then have 0-d arrays and these Python "memory" types > > > > > describing what is *in* the array. > > > > > > > > > > There is a clear way to do this, some of which has been > > > > > outlined by Nathaniel, and the rest I have an outline for how > > > > > to implement. I can advise someone on how to do this. > > > > > > > > > > -Travis > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Oct 25, 2018 at 3:17 PM Alex Rogozhnikov < > > > > > alex.rogozhnikov at yandex.ru> wrote: > > > > > > Dear numpy community, > > > > > > > > > > > > I'm planning to depreciate np.asfortranarray and > > > > > > np.ascontiguousarray > > > > > > functions due to their misbehavior on scalar (0-D tensors) > > > > > > with PR #12244. > > > > > > > > > > > > Current behavior (converting scalars to 1-d array with > > > > > > single element) > > > > > > - is unexpected and contradicts to documentation > > > > > > - probably, can't be changed without breaking external code > > > > > > - I believe, this was a cause for poor support of 0-d > > > > > > arrays in mxnet. > > > > > > - both functions are easily replaced with asarray(..., > > > > > > order='...'), which has expected behavior > > > > > > > > > > > > There is no timeline for removal - we just need to > > > > > > discourage from using this functions in new code. > > > > > > > > > > > > Function naming may be related to how numpy treats 0-d > > > > > > tensors specially, > > > > > > and those probably should not be called arrays. > > > > > > https://www.numpy.org/neps/nep-0027-zero-rank-arrarys.html > > > > > > However, as a user I never thought about 0-d arrays being > > > > > > special and being "not arrays". 
> > > > > > > > > > > > > > > > > > Please see original discussion at github for more details > > > > > > https://github.com/numpy/numpy/issues/5300 > > > > > > > > > > > > Your comments welcome, > > > > > > Alex Rogozhnikov > > > > > > > > > > > > _______________________________________________ > > > > > > NumPy-Discussion mailing list > > > > > > NumPy-Discussion at python.org > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > > , > > > > > _______________________________________________ > > > > > NumPy-Discussion mailing list > > > > > NumPy-Discussion at python.org > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion at python.org > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From charlesr.harris at gmail.com Tue Oct 30 13:35:04 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 30 Oct 2018 11:35:04 -0600 Subject: [Numpy-discussion] NumPy 1.15.4 release Message-ID: Hi All, Just a heads up that I am planning on making a 1.15.4 release this coming weekend. 
The only fixes planned at this point are - BUG: Fix fill value in masked array '==' and '!=' ops, #12257 - BUG: clear buffer_info_cache on scalar dealloc, #12249 If there are other fixes that you think are needed, please let me know. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Tue Oct 30 15:15:50 2018 From: matti.picus at gmail.com (Matti Picus) Date: Tue, 30 Oct 2018 21:15:50 +0200 Subject: [Numpy-discussion] Reminder: weekly status meeting 31.10 at 12:00 pacific time Message-ID: <1e00f85e-16f5-841a-05de-f1ea0dbd8941@gmail.com> An HTML attachment was scrubbed... URL: From einstein.edison at gmail.com Tue Oct 30 16:24:36 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Tue, 30 Oct 2018 21:24:36 +0100 Subject: [Numpy-discussion] Reminder: weekly status meeting 31.10 at 12:00 pacific time In-Reply-To: <1e00f85e-16f5-841a-05de-f1ea0dbd8941@gmail.com> References: <1e00f85e-16f5-841a-05de-f1ea0dbd8941@gmail.com> Message-ID: <48dced8b-d66b-4b7b-8dc2-0e06916a448c@Canary> Hello! If I may make a suggestion, it might be nice to create a separate calendar and add people to it as needed for better management. Best Regards, Hameer Abbasi > On Tuesday, Oct 30, 2018 at 8:16 PM, Matti Picus wrote: > > The draft agenda is at https://hackmd.io/D3I3CdO2T9ipZ2g5uAChcA?both. > > > Everyone is invited to join. > > > > > > > Matti, Tyler and Stefan > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Tue Oct 30 17:22:04 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 30 Oct 2018 14:22:04 -0700 Subject: [Numpy-discussion] asanyarray vs.
asarray In-Reply-To: References: <24100c7f-20fd-4eed-99b0-d37660f52223@Canary> Message-ID: On Mon, Oct 29, 2018 at 9:49 PM Eric Wieser wrote: > The latter - changing the behavior of multiplication breaks the principle. > > But this is not the main reason for deprecating matrix - almost all of the > problems I've seen have been caused by the way that matrices behave when > sliced. The way that m[i][j] and m[i,j] are different is just one example > of this, the fact that they must be 2d is another. > > Matrices behaving differently on multiplication isn't super different in > my mind to how string arrays fail to multiply at all. > > Eric > It's certainly fine for arithmetic to work differently on an element-wise basis or even to error. But np.matrix changes the shape of results from various ndarray operations (e.g., both multiplication and indexing), which is more than any dtype can do. The Liskov substitution principle (LSP) suggests that the set of reasonable ndarray subclasses are exactly those that could also in principle correspond to a new dtype. Of np.ndarray subclasses in wide-spread use, I think only the various "array with units" types come close to satisfying this criterion. They only fall short insofar as they present a misleading dtype (without unit information). The main problem with subclassing for numpy.ndarray is that it guarantees too much: a large set of operations/methods along with a specific memory layout exposed as part of its public API. Worse, ndarray itself is a little quirky (e.g., with indexing, and its handling of scalars vs. 0d arrays). In practice, it's basically impossible to layer on complex behavior with these exact semantics, so only extremely minimal ndarray subclasses don't violate LSP. Once we have more easily extended dtypes, I suspect most of the good use cases for subclassing will have gone away. -------------- next part -------------- An HTML attachment was scrubbed...
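The slicing difference Eric refers to is easy to demonstrate (np.matrix is still importable, though its use is discouraged):

```python
import numpy as np

m = np.matrix('1 2; 3 4')
a = np.array([[1, 2], [3, 4]])

# Indexing an ndarray with a single index drops a dimension...
print(a[0].shape)   # (2,)
# ...but np.matrix results are forced to stay 2-d:
print(m[0].shape)   # (1, 2)

# Hence m[i][j] and m[i, j] disagree: m[0] is a 1x2 matrix, so
# indexing it again indexes rows, not elements.
print(m[0, 1])      # 2
try:
    m[0][1]
except IndexError:
    print("m[0][1] fails: row index 1 is out of bounds for a 1x2 matrix")
```

This is the shape-changing behavior that, per Stephan's argument, no dtype could ever reproduce.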
URL: From chris.barker at noaa.gov Tue Oct 30 17:44:48 2018 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 30 Oct 2018 14:44:48 -0700 Subject: [Numpy-discussion] asanyarray vs. asarray In-Reply-To: References: <24100c7f-20fd-4eed-99b0-d37660f52223@Canary> Message-ID: On Tue, Oct 30, 2018 at 2:22 PM, Stephan Hoyer wrote: > The Liskov substitution principle (LSP) suggests that the set of > reasonable ndarray subclasses are exactly those that could also in > principle correspond to a new dtype. Of np.ndarray subclasses in > wide-spread use, I think only the various "array with units" types come > close satisfying this criteria. They only fall short insofar as they > present a misleading dtype (without unit information). > How about subclasses that only add functionality? My only use case of subclassing is exactly that: I have a "bounding box" object (probably could have been called a rectangle) that is a subclass of ndarray, is always shape (2,2), and has various methods for merging two such boxes, etc, adding a point, etc. I did it that way, 'cause I had a lot of code already that simply used a (2,2) array to represent a bounding box, and I wanted all that code to still work. I have had zero problems with it. Maybe that's too trivial to be worth talking about, but this kind of use case can be handy. It is a bit awkward to write the code, though -- it would be nice to have a cleaner API for this sort of subclassing (not that I have any idea how to do that) The main problem with subclassing for numpy.ndarray is that it guarantees > too much: a large set of operations/methods along with a specific memory > layout exposed as part of its public API. > This is a big deal -- we really have two concepts here: - a Python class (type) with certain behaviors in Python code - a wrapper around a strided memory block. 
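A minimal sketch of the pattern Chris describes earlier in this message; the class name and methods here are illustrative guesses, not his actual code:

```python
import numpy as np

class BBox(np.ndarray):
    """A (2, 2) array [[xmin, ymin], [xmax, ymax]] with extra methods.

    Existing code that treats a bounding box as a plain (2, 2) array
    keeps working, because a BBox *is* an ndarray.
    """

    def __new__(cls, data):
        arr = np.asarray(data, dtype=float).reshape(2, 2)
        return arr.view(cls)  # re-type the existing buffer as a BBox

    def merge(self, other):
        """Smallest box containing both boxes."""
        other = np.asarray(other).reshape(2, 2)
        return BBox([np.minimum(self[0], other[0]),
                     np.maximum(self[1], other[1])])

box = BBox([[0, 0], [1, 1]])
merged = box.merge([[0.5, -1], [2, 0.5]])
print(merged.tolist())  # [[0.0, -1.0], [2.0, 1.0]]
print(isinstance(merged, np.ndarray))  # True: array-based code still works
```

The awkward part is the `__new__`/`view` dance; nothing here overrides ndarray semantics, which is why this kind of functionality-only subclass causes so little trouble.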
maybe it's possible to be clear about that distinction: "Duck Arrays" are the Python API Maybe a C-API object would be useful, that shares the memory layout, but could have completely different functionality at the Python level. - CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Tue Oct 30 18:16:49 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Tue, 30 Oct 2018 18:16:49 -0400 Subject: [Numpy-discussion] Reminder: weekly status meeting 31.10 at 12:00 pacific time In-Reply-To: <1e00f85e-16f5-841a-05de-f1ea0dbd8941@gmail.com> References: <1e00f85e-16f5-841a-05de-f1ea0dbd8941@gmail.com> Message-ID: I'll try to make it, but can't guarantee. The last time there was discussion of the structured-array PRs which are currently held up, and I sort of promised to have a writeup of the issues. I put up a draft of that here: https://gist.github.com/ahaldane/6cd44886efb449f9c8d5ea012747323b Allan On 10/30/18 3:15 PM, Matti Picus wrote: > The draft agenda is at https://hackmd.io/D3I3CdO2T9ipZ2g5uAChcA?both. > > Everyone is invited to join. 
> > > Matti, Tyler and Stefan > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From stefanv at berkeley.edu Tue Oct 30 18:52:06 2018 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Tue, 30 Oct 2018 15:52:06 -0700 Subject: [Numpy-discussion] Reminder: weekly status meeting 31.10 at 12:00 pacific time In-Reply-To: <48dced8b-d66b-4b7b-8dc2-0e06916a448c@Canary> References: <1e00f85e-16f5-841a-05de-f1ea0dbd8941@gmail.com> <48dced8b-d66b-4b7b-8dc2-0e06916a448c@Canary> Message-ID: <20181030225206.hx7nnpsoril532gp@carbo> Hi Hameer, On Tue, 30 Oct 2018 21:24:36 +0100, Hameer Abbasi wrote: > If I may make a suggestion, it might be nice to create a separate > calendar and add people to it as needed for better management. Can you clarify what you want? Do you mean we should not announce the meeting agenda here, and instead only use a calendar? Or would you like a calendar link to subscribe to, that also contains a link to the meeting notes? Best regards, Stéfan From einstein.edison at gmail.com Tue Oct 30 18:56:17 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Tue, 30 Oct 2018 23:56:17 +0100 Subject: [Numpy-discussion] Reminder: weekly status meeting 31.10 at 12:00 pacific time In-Reply-To: <20181030225206.hx7nnpsoril532gp@carbo> References: <1e00f85e-16f5-841a-05de-f1ea0dbd8941@gmail.com> <48dced8b-d66b-4b7b-8dc2-0e06916a448c@Canary> <20181030225206.hx7nnpsoril532gp@carbo> Message-ID: Hi, I meant we should have a calendar that's possible to subscribe to, and in addition announce the agenda here, and that the calendar could contain a link to the meeting agenda.
Best Regards, Hameer Abbasi > On Tuesday, Oct 30, 2018 at 11:52 PM, Stefan van der Walt wrote: > Hi Hameer, > > On Tue, 30 Oct 2018 21:24:36 +0100, Hameer Abbasi wrote: > > If I may make a suggestion, it might be nice to create a separate > > calendar and add people to it as needed for better management. > > Can you clarify what you want? Do you mean we should not announce the > meeting agenda here, and instead only use a calendar? Or would you like > a calendar link to subscribe to, that also contains a link to the meeting > notes? > > Best regards, > Stéfan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From harrigan.matthew at gmail.com Tue Oct 30 19:35:30 2018 From: harrigan.matthew at gmail.com (Matthew Harrigan) Date: Tue, 30 Oct 2018 19:35:30 -0400 Subject: [Numpy-discussion] asanyarray vs. asarray In-Reply-To: References: <24100c7f-20fd-4eed-99b0-d37660f52223@Canary> Message-ID: Would the extended dtypes also violate the Liskov substitution principle? In-place operations which would mutate the dtype are one potential issue. Would a single dtype for an array be sufficient, i.e. np.polynomial coefficients? Compared to ndarray subclasses, the memory layout issue goes away, but there is still a large set of operations exposed as part of a public API with various quirks. I can imagine a new function "asunitless" scattered around downstream projects. On Tue, Oct 30, 2018 at 5:23 PM Stephan Hoyer wrote: > On Mon, Oct 29, 2018 at 9:49 PM Eric Wieser > wrote: > >> The latter - changing the behavior of multiplication breaks the principle. >> >> But this is not the main reason for deprecating matrix - almost all of >> the problems I've seen have been caused by the way that matrices behave >> when >> sliced.
The way that m[i][j] and m[i,j] are different is just one >> example of this, the fact that they must be 2d is another. >> >> Matrices behaving differently on multiplication isn't super different in >> my mind to how string arrays fail to multiply at all. >> >> Eric >> > It's certainly fine for arithmetic to work differently on an element-wise > basis or even to error. But np.matrix changes the shape of results from > various ndarray operations (e.g., both multiplication and indexing), which > is more than any dtype can do. > > The Liskov substitution principle (LSP) suggests that the set of > reasonable ndarray subclasses are exactly those that could also in > principle correspond to a new dtype. Of np.ndarray subclasses in > wide-spread use, I think only the various "array with units" types come > close to satisfying this criterion. They only fall short insofar as they > present a misleading dtype (without unit information). > > The main problem with subclassing for numpy.ndarray is that it guarantees > too much: a large set of operations/methods along with a specific memory > layout exposed as part of its public API. Worse, ndarray itself is a little > quirky (e.g., with indexing, and its handling of scalars vs. 0d arrays). In > practice, it's basically impossible to layer on complex behavior with these > exact semantics, so only extremely minimal ndarray subclasses don't violate > LSP. > > Once we have more easily extended dtypes, I suspect most of the good use > cases for subclassing will have gone away. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed...
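Both halves of this exchange — the "misleading dtype" of unit-carrying subclasses and the asanyarray-vs-asarray question in the subject line — can be seen with a toy subclass (a hypothetical class, for illustration only):

```python
import numpy as np

class Quantity(np.ndarray):
    """Toy 'array with units' subclass (illustrative only)."""

    def __new__(cls, data, unit):
        obj = np.asarray(data, dtype=float).view(cls)
        obj.unit = unit
        return obj

    def __array_finalize__(self, obj):
        # Propagate the unit through views, slices, and copies.
        self.unit = getattr(obj, 'unit', None)

q = Quantity([1.0, 2.0, 3.0], unit='m')
print(q[1:].unit)  # 'm' -- the unit survives slicing
print(q.dtype)     # float64 -- but nothing in the dtype mentions meters

# asarray strips the subclass (a ready-made "asunitless"),
# while asanyarray preserves it:
print(type(np.asarray(q)).__name__)    # ndarray
print(type(np.asanyarray(q)).__name__) # Quantity
```

Shapes and semantics stay exactly those of ndarray here, which is why this kind of subclass comes close to the LSP criterion Stephan describes, falling short only in what `q.dtype` reports.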
URL: From stefanv at berkeley.edu Tue Oct 30 20:07:09 2018 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Tue, 30 Oct 2018 17:07:09 -0700 Subject: [Numpy-discussion] Reminder: weekly status meeting 31.10 at 12:00 pacific time In-Reply-To: References: <1e00f85e-16f5-841a-05de-f1ea0dbd8941@gmail.com> <48dced8b-d66b-4b7b-8dc2-0e06916a448c@Canary> <20181030225206.hx7nnpsoril532gp@carbo> Message-ID: <20181031000709.k56otchs3bgzlqf4@carbo> On Tue, 30 Oct 2018 23:56:17 +0100, Hameer Abbasi wrote: > I meant we should have a calendar that's possible to subscribe to, and > in addition announce the agenda here, and that the calendar could > contain a link to the meeting agenda. Here you go: https://calendar.google.com/calendar?cid=YmVya2VsZXkuZWR1X2lla2dwaWdtMjMyamJobGRzZmIyYzJqODFjQGdyb3VwLmNhbGVuZGFyLmdvb2dsZS5jb20 We'll also advertise this on the agenda and in future emails. Best regards, Stéfan From wieser.eric+numpy at gmail.com Tue Oct 30 21:41:51 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Tue, 30 Oct 2018 18:41:51 -0700 Subject: [Numpy-discussion] Attribute hiding APIs for PyArrayObject In-Reply-To: References: Message-ID: In NumPy 1.14 we changed UPDATEIFCOPY to WRITEBACKIFCOPY, and in 1.16 we would like to deprecate PyArray_SetNumericOps and PyArray_GetNumericOps. The strange warning when NPY_NO_DEPRICATED_API is annoying I'm not sure I make the connection here between hidden fields and API deprecation. You seem to be asking two vaguely related questions: 1. Should we have deprecated field access in the first place 2. Does our api deprecation mechanism need work I think a more substantial problem statement is needed for 2, so I'm only going to respond to 1 here. Hiding fields seems to me to match the CPython model of things, where your public api is PyArray_SomeGetter(thing).
If you look at the cpython source code, they only expose the underlying struct fields if you don't define Py_LIMITED_API, i.e. if you as a consumer volunteer to be broken by upstream changes in minor versions. People (like us) are willing to produce separate builds for each python version, so often do not define this. We could add a similar PyArray_LIMITED_API that allows field access under a similar guarantee - the question is, are many downstream consumers willing to produce builds against multiple numpy versions? (especially if they also do so against multiple python versions) Also, for example, cython has a mechanism to transpile python code into C, mapping slow python attribute lookup to fast C struct field access How does this work for builtin types? Does cython deliberately not define Py_LIMITED_API? Or are you just forced to use PyTuple_GetItem(t) if you want the fast path. Eric On Tue, 30 Oct 2018 at 02:04 Matti Picus wrote: TL;DR - should we revert the attribute-hiding constructs in > ndarraytypes.h and unify PyArrayObject_fields with PyArrayObject? > > > Background > > > NumPy 1.8 deprecated direct access to PyArrayObject fields. It made > PyArrayObject "opaque", and hid the fields behind a PyArrayObject_fields > structure > > https://github.com/numpy/numpy/blob/v1.15.3/numpy/core/include/numpy/ndarraytypes.h#L659 > with a comment about moving this to a private header. In order to access > the fields, users are supposed to use PyArray_FIELDNAME functions, like > PyArray_DATA and PyArray_NDIM. It seems there were thoughts at the time > that numpy might move away from a C-struct based > > underlying data structure. Other changes were also made to enum names, > but those are relatively painless to find-and-replace. > > > NumPy has a mechanism to manage deprecating APIs, C users define > NPY_NO_DEPRICATED_API to a desired level, say NPY_1_8_API_VERSION, and > can then access the API "as if" they were using NumPy 1.8.
Users who do > not define NPY_NO_DEPRICATED_API get a warning when compiling, and > default to the pre-1.8 API (aliasing of PyArrayObject to > PyArrayObject_fields and direct access to the C struct fields). This is > convenient for downstream users, both since the new API does not provide > much added value, and it is much easier to write a->nd than > PyArray_NDIM(a). For instance, pandas uses direct assignment to the data > field for fast json parsing > > https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/src/ujson/python/JSONtoObj.c#L203 > via chunks. Working around the new API in pandas would require more > engineering. Also, for example, cython has a mechanism to transpile > python code into C, mapping slow python attribute lookup to fast C > struct field access > > https://cython.readthedocs.io/en/latest/src/userguide/extension_types.html#external-extension-types > > > In a parallel but not really related universe, cython recently upgraded > the object mapping so that we can quiet the annoying "size changed" > runtime warning https://github.com/numpy/numpy/issues/11788 without > requiring warning filters, but that requires updating the numpy.pxd file > provided with cython, and it was proposed that NumPy actually vendor its > own file rather than depending on the cython one > (https://github.com/numpy/numpy/issues/11803). > > > The problem > > > We have now made further changes to our API. In NumPy 1.14 we changed > UPDATEIFCOPY to WRITEBACKIFCOPY, and in 1.16 we would like to deprecate > PyArray_SetNumericOps and PyArray_GetNumericOps. The strange warning > when NPY_NO_DEPRICATED_API is annoying. The new API cannot be supported > by cython without some deep surgery > (https://github.com/cython/cython/pull/2640). When I tried dogfooding an > updated numpy.pxd for the only cython code in NumPy, mtrand.pxy, I came > across some of these issues (https://github.com/numpy/numpy/pull/12284). 
> Forcing the new API will require downstream users to refactor code or > re-engineer constructs, as in the pandas example above. > > > The question > > > Is the attribute-hiding effort worth it? Should we give up, revert the > PyArrayObject/PyArrayObject_fields division and allow direct access from > C to the numpy internals? Is there another path forward that is less > painful? > > > Matti > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Oct 30 22:33:37 2018 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 30 Oct 2018 19:33:37 -0700 Subject: [Numpy-discussion] Attribute hiding APIs for PyArrayObject In-Reply-To: References: Message-ID: It's probably helpful to know that Py_LIMITED_API is a kinda-experimental thing that was added in CPython 3.2 (see PEP 384) and remains almost 100% unused. It has never been a popular or influential thing (for better or worse). -n On Tue, Oct 30, 2018 at 6:41 PM, Eric Wieser wrote: > In NumPy 1.14 we changed UPDATEIFCOPY to WRITEBACKIFCOPY, and in 1.16 we > would like to deprecate PyArray_SetNumericOps and PyArray_GetNumericOps. The > strange warning when NPY_NO_DEPRICATED_API is annoying > > I?m not sure I make the connection here between hidden fields and API > deprecation. You seem to be asking two vaguely related questions: > > Should we have deprecated field access in the first place > Does our api deprecation mechanism need work > > I think a more substantial problem statement is needed for 2, so I?m only > going to respond to 1 here. > > Hiding fields seems to me to match the CPython model of things, where your > public api is PyArray_SomeGetter(thing). 
> If you look at the cpython source code, they only expose the underlying > struct fields if you don?t define Py_LIMITED_API, ie if you as a consumer > volunteer to be broken by upstream changes in minor versions. People (like > us) are willing to produce separate builds for each python versions, so > often do not define this. > > We could add a similar PyArray_LIMITED_API that allows field access under a > similar guarantee - the question is, are many downstream consumers willing > to produce builds against multiple numpy versions? (especially if they also > do so against multiple python versions) > > Also, for example, cython has a mechanism to transpile python code into C, > mapping slow python attribute lookup to fast C struct field access > > How does this work for builtin types? Does cython deliberately not define > Py_LIMITED_API? Or are you just forced to use PyTuple_GetItem(t) if you want > the fast path. > > Eric > > On Tue, 30 Oct 2018 at 02:04 Matti Picus wrote: >> >> TL;DR - should we revert the attribute-hiding constructs in >> ndarraytypes.h and unify PyArrayObject_fields with PyArrayObject? >> >> >> Background >> >> >> NumPy 1.8 deprecated direct access to PyArrayObject fields. It made >> PyArrayObject "opaque", and hid the fields behind a PyArrayObject_fields >> structure >> >> https://github.com/numpy/numpy/blob/v1.15.3/numpy/core/include/numpy/ndarraytypes.h#L659 >> with a comment about moving this to a private header. In order to access >> the fields, users are supposed to use PyArray_FIELDNAME functions, like >> PyArray_DATA and PyArray_NDIM. It seems there were thoughts at the time >> that numpy might move away from a C-struct based >> >> underlying data structure. Other changes were also made to enum names, >> but those are relatively painless to find-and-replace. 
>> >> >> NumPy has a mechanism to manage deprecating APIs, C users define >> NPY_NO_DEPRICATED_API to a desired level, say NPY_1_8_API_VERSION, and >> can then access the API "as if" they were using NumPy 1.8. Users who do >> not define NPY_NO_DEPRICATED_API get a warning when compiling, and >> default to the pre-1.8 API (aliasing of PyArrayObject to >> PyArrayObject_fields and direct access to the C struct fields). This is >> convenient for downstream users, both since the new API does not provide >> much added value, and it is much easier to write a->nd than >> PyArray_NDIM(a). For instance, pandas uses direct assignment to the data >> field for fast json parsing >> >> https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/src/ujson/python/JSONtoObj.c#L203 >> via chunks. Working around the new API in pandas would require more >> engineering. Also, for example, cython has a mechanism to transpile >> python code into C, mapping slow python attribute lookup to fast C >> struct field access >> >> https://cython.readthedocs.io/en/latest/src/userguide/extension_types.html#external-extension-types >> >> >> In a parallel but not really related universe, cython recently upgraded >> the object mapping so that we can quiet the annoying "size changed" >> runtime warning https://github.com/numpy/numpy/issues/11788 without >> requiring warning filters, but that requires updating the numpy.pxd file >> provided with cython, and it was proposed that NumPy actually vendor its >> own file rather than depending on the cython one >> (https://github.com/numpy/numpy/issues/11803). >> >> >> The problem >> >> >> We have now made further changes to our API. In NumPy 1.14 we changed >> UPDATEIFCOPY to WRITEBACKIFCOPY, and in 1.16 we would like to deprecate >> PyArray_SetNumericOps and PyArray_GetNumericOps. The strange warning >> when NPY_NO_DEPRICATED_API is annoying. 
The new API cannot be supported >> by cython without some deep surgery >> (https://github.com/cython/cython/pull/2640). When I tried dogfooding an >> updated numpy.pxd for the only cython code in NumPy, mtrand.pxy, I came >> across some of these issues (https://github.com/numpy/numpy/pull/12284). >> Forcing the new API will require downstream users to refactor code or >> re-engineer constructs, as in the pandas example above. >> >> >> The question >> >> >> Is the attribute-hiding effort worth it? Should we give up, revert the >> PyArrayObject/PyArrayObject_fields division and allow direct access from >> C to the numpy internals? Is there another path forward that is less >> painful? >> >> >> Matti >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -- Nathaniel J. Smith -- https://vorpus.org From encukou at gmail.com Wed Oct 31 04:46:03 2018 From: encukou at gmail.com (Petr Viktorin) Date: Wed, 31 Oct 2018 09:46:03 +0100 Subject: [Numpy-discussion] Attribute hiding APIs for PyArrayObject In-Reply-To: References: Message-ID: On 10/31/18 03:33, Nathaniel Smith wrote: > It's probably helpful to know that Py_LIMITED_API is a > kinda-experimental thing that was added in CPython 3.2 (see PEP 384) > and remains almost 100% unused. It has never been a popular or > influential thing (for better or worse). Py_LIMITED_API is not very influential *outside* CPython, but it's not (yet) a failed experiment. (Which is not what you said, but someone might read it that way.) The popularity is a bit of a chicken-and-egg problem. Py_LIMITED_API is not used much because the current implementation is not useful in the real world. 
But as large projects like Cython and PySide are looking at Py_LIMITED_API from their side, problems are getting found and fixed. It's not a fast process, being all volunteer-driven. But the limited API (= stable ABI) does have a major role in thoughts about future CPython API design, and the idea (not current implementation) is worth looking at. What's the idea? In addition to python35/python36/python37, there's a "python3" API that you can target, which is slower at run-time but won't inflate your build/test matrix. It's not either-or. CPython provides both. > -n > > On Tue, Oct 30, 2018 at 6:41 PM, Eric Wieser > wrote: >> In NumPy 1.14 we changed UPDATEIFCOPY to WRITEBACKIFCOPY, and in 1.16 we >> would like to deprecate PyArray_SetNumericOps and PyArray_GetNumericOps. The >> strange warning when NPY_NO_DEPRICATED_API is annoying >> >> I?m not sure I make the connection here between hidden fields and API >> deprecation. You seem to be asking two vaguely related questions: >> >> Should we have deprecated field access in the first place >> Does our api deprecation mechanism need work >> >> I think a more substantial problem statement is needed for 2, so I?m only >> going to respond to 1 here. >> >> Hiding fields seems to me to match the CPython model of things, where your >> public api is PyArray_SomeGetter(thing). >> If you look at the cpython source code, they only expose the underlying >> struct fields if you don?t define Py_LIMITED_API, ie if you as a consumer >> volunteer to be broken by upstream changes in minor versions. People (like >> us) are willing to produce separate builds for each python versions, so >> often do not define this. >> >> We could add a similar PyArray_LIMITED_API that allows field access under a >> similar guarantee - the question is, are many downstream consumers willing >> to produce builds against multiple numpy versions? 
(especially if they also >> do so against multiple python versions) >> >> Also, for example, cython has a mechanism to transpile python code into C, >> mapping slow python attribute lookup to fast C struct field access >> >> How does this work for builtin types? Does cython deliberately not define >> Py_LIMITED_API? Or are you just forced to use PyTuple_GetItem(t) if you want >> the fast path. >> >> Eric >> >> On Tue, 30 Oct 2018 at 02:04 Matti Picus wrote: >>> >>> TL;DR - should we revert the attribute-hiding constructs in >>> ndarraytypes.h and unify PyArrayObject_fields with PyArrayObject? >>> >>> >>> Background >>> >>> >>> NumPy 1.8 deprecated direct access to PyArrayObject fields. It made >>> PyArrayObject "opaque", and hid the fields behind a PyArrayObject_fields >>> structure >>> >>> https://github.com/numpy/numpy/blob/v1.15.3/numpy/core/include/numpy/ndarraytypes.h#L659 >>> with a comment about moving this to a private header. In order to access >>> the fields, users are supposed to use PyArray_FIELDNAME functions, like >>> PyArray_DATA and PyArray_NDIM. It seems there were thoughts at the time >>> that numpy might move away from a C-struct based >>> underlying data structure. Other changes were also made to enum names, >>> but those are relatively painless to find-and-replace. >>> >>> >>> NumPy has a mechanism to manage deprecating APIs: C users define >>> NPY_NO_DEPRECATED_API to a desired level, say NPY_1_8_API_VERSION, and >>> can then access the API "as if" they were using NumPy 1.8. Users who do >>> not define NPY_NO_DEPRECATED_API get a warning when compiling, and >>> default to the pre-1.8 API (aliasing of PyArrayObject to >>> PyArrayObject_fields and direct access to the C struct fields). This is >>> convenient for downstream users, both since the new API does not provide >>> much added value, and it is much easier to write a->nd than >>> PyArray_NDIM(a).
For instance, pandas uses direct assignment to the data >>> field for fast json parsing >>> >>> https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/src/ujson/python/JSONtoObj.c#L203 >>> via chunks. Working around the new API in pandas would require more >>> engineering. Also, for example, cython has a mechanism to transpile >>> python code into C, mapping slow python attribute lookup to fast C >>> struct field access >>> >>> https://cython.readthedocs.io/en/latest/src/userguide/extension_types.html#external-extension-types >>> >>> >>> In a parallel but not really related universe, cython recently upgraded >>> the object mapping so that we can quiet the annoying "size changed" >>> runtime warning https://github.com/numpy/numpy/issues/11788 without >>> requiring warning filters, but that requires updating the numpy.pxd file >>> provided with cython, and it was proposed that NumPy actually vendor its >>> own file rather than depending on the cython one >>> (https://github.com/numpy/numpy/issues/11803). >>> >>> >>> The problem >>> >>> >>> We have now made further changes to our API. In NumPy 1.14 we changed >>> UPDATEIFCOPY to WRITEBACKIFCOPY, and in 1.16 we would like to deprecate >>> PyArray_SetNumericOps and PyArray_GetNumericOps. The strange warning >>> when NPY_NO_DEPRECATED_API is undefined is annoying. The new API cannot be supported >>> by cython without some deep surgery >>> (https://github.com/cython/cython/pull/2640). When I tried dogfooding an >>> updated numpy.pxd for the only cython code in NumPy, mtrand.pyx, I came >>> across some of these issues (https://github.com/numpy/numpy/pull/12284). >>> Forcing the new API will require downstream users to refactor code or >>> re-engineer constructs, as in the pandas example above. >>> >>> >>> The question >>> >>> >>> Is the attribute-hiding effort worth it? Should we give up, revert the >>> PyArrayObject/PyArrayObject_fields division and allow direct access from >>> C to the numpy internals?
Is there another path forward that is less >>> painful? >>> >>> >>> Matti >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > > > From allanhaldane at gmail.com Wed Oct 31 17:59:29 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Wed, 31 Oct 2018 17:59:29 -0400 Subject: [Numpy-discussion] Attribute hiding APIs for PyArrayObject In-Reply-To: References: Message-ID: On 10/30/18 5:04 AM, Matti Picus wrote: > TL;DR - should we revert the attribute-hiding constructs in > ndarraytypes.h and unify PyArrayObject_fields with PyArrayObject? > > > Background > > > NumPy 1.8 deprecated direct access to PyArrayObject fields. It made > PyArrayObject "opaque", and hid the fields behind a PyArrayObject_fields > structure > https://github.com/numpy/numpy/blob/v1.15.3/numpy/core/include/numpy/ndarraytypes.h#L659 > with a comment about moving this to a private header. In order to access > the fields, users are supposed to use PyArray_FIELDNAME functions, like > PyArray_DATA and PyArray_NDIM. It seems there were thoughts at the time > that numpy might move away from a C-struct based > > underlying data structure. Other changes were also made to enum names, > but those are relatively painless to find-and-replace. > > > NumPy has a mechanism to manage deprecating APIs, C users define > NPY_NO_DEPRICATED_API to a desired level, say NPY_1_8_API_VERSION, and > can then access the API "as if" they were using NumPy 1.8. Users who do > not define NPY_NO_DEPRICATED_API get a warning when compiling, and > default to the pre-1.8 API (aliasing of PyArrayObject to > PyArrayObject_fields and direct access to the C struct fields). 
This is > convenient for downstream users, both since the new API does not provide > much added value, and it is much easier to write a->nd than > PyArray_NDIM(a). For instance, pandas uses direct assignment to the data > field for fast json parsing > https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/src/ujson/python/JSONtoObj.c#L203 > via chunks. Working around the new API in pandas would require more > engineering. Also, for example, cython has a mechanism to transpile > python code into C, mapping slow python attribute lookup to fast C > struct field access > https://cython.readthedocs.io/en/latest/src/userguide/extension_types.html#external-extension-types > > > > In a parallel but not really related universe, cython recently upgraded > the object mapping so that we can quiet the annoying "size changed" > runtime warning https://github.com/numpy/numpy/issues/11788 without > requiring warning filters, but that requires updating the numpy.pxd file > provided with cython, and it was proposed that NumPy actually vendor its > own file rather than depending on the cython one > (https://github.com/numpy/numpy/issues/11803). > > > The problem > > > We have now made further changes to our API. In NumPy 1.14 we changed > UPDATEIFCOPY to WRITEBACKIFCOPY, and in 1.16 we would like to deprecate > PyArray_SetNumericOps and PyArray_GetNumericOps. The strange warning > when NPY_NO_DEPRICATED_API is annoying. The new API cannot be supported > by cython without some deep surgery > (https://github.com/cython/cython/pull/2640). When I tried dogfooding an > updated numpy.pxd for the only cython code in NumPy, mtrand.pxy, I came > across some of these issues (https://github.com/numpy/numpy/pull/12284). > Forcing the new API will require downstream users to refactor code or > re-engineer constructs, as in the pandas example above. 
I haven't understood the cython issue, but just want to mention that for optimization purposes it's nice to be able to modify the fields, like in the pandas/json example above. In particular, PyArray_ConcatenateArrays uses some tricks which temporarily clobber the data pointer and shape of an array to concatenate arrays efficiently. It seems fairly safe to me. These tricks would be nice to re-use in a C port of the new block code we merged recently. Those optimizations aren't possible if only using the opaque PyArrayObject. Cheers, Allan > The question > > > Is the attribute-hiding effort worth it? Should we give up, revert the > PyArrayObject/PyArrayObject_fields division and allow direct access from > C to the numpy internals? Is there another path forward that is less > painful? > > > Matti > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Wed Oct 31 19:00:52 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 31 Oct 2018 17:00:52 -0600 Subject: [Numpy-discussion] Attribute hiding APIs for PyArrayObject In-Reply-To: References: Message-ID: On Wed, Oct 31, 2018 at 3:59 PM Allan Haldane wrote: > On 10/30/18 5:04 AM, Matti Picus wrote: > > TL;DR - should we revert the attribute-hiding constructs in > > ndarraytypes.h and unify PyArrayObject_fields with PyArrayObject? > > > > > > Background > > > > > > NumPy 1.8 deprecated direct access to PyArrayObject fields. It made > > PyArrayObject "opaque", and hid the fields behind a PyArrayObject_fields > > structure > > > https://github.com/numpy/numpy/blob/v1.15.3/numpy/core/include/numpy/ndarraytypes.h#L659 > > with a comment about moving this to a private header. In order to access > > the fields, users are supposed to use PyArray_FIELDNAME functions, like > > PyArray_DATA and PyArray_NDIM.
It seems there were thoughts at the time > > that numpy might move away from a C-struct based > > > > underlying data structure. Other changes were also made to enum names, > > but those are relatively painless to find-and-replace. > > > > > > NumPy has a mechanism to manage deprecating APIs, C users define > > NPY_NO_DEPRICATED_API to a desired level, say NPY_1_8_API_VERSION, and > > can then access the API "as if" they were using NumPy 1.8. Users who do > > not define NPY_NO_DEPRICATED_API get a warning when compiling, and > > default to the pre-1.8 API (aliasing of PyArrayObject to > > PyArrayObject_fields and direct access to the C struct fields). This is > > convenient for downstream users, both since the new API does not provide > > much added value, and it is much easier to write a->nd than > > PyArray_NDIM(a). For instance, pandas uses direct assignment to the data > > field for fast json parsing > > > https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/src/ujson/python/JSONtoObj.c#L203 > > via chunks. Working around the new API in pandas would require more > > engineering. Also, for example, cython has a mechanism to transpile > > python code into C, mapping slow python attribute lookup to fast C > > struct field access > > > https://cython.readthedocs.io/en/latest/src/userguide/extension_types.html#external-extension-types > > > > > > > > In a parallel but not really related universe, cython recently upgraded > > the object mapping so that we can quiet the annoying "size changed" > > runtime warning https://github.com/numpy/numpy/issues/11788 without > > requiring warning filters, but that requires updating the numpy.pxd file > > provided with cython, and it was proposed that NumPy actually vendor its > > own file rather than depending on the cython one > > (https://github.com/numpy/numpy/issues/11803). > > > > > > The problem > > > > > > We have now made further changes to our API. 
In NumPy 1.14 we changed > > UPDATEIFCOPY to WRITEBACKIFCOPY, and in 1.16 we would like to deprecate > > PyArray_SetNumericOps and PyArray_GetNumericOps. The strange warning > > when NPY_NO_DEPRICATED_API is annoying. The new API cannot be supported > > by cython without some deep surgery > > (https://github.com/cython/cython/pull/2640). When I tried dogfooding an > > updated numpy.pxd for the only cython code in NumPy, mtrand.pxy, I came > > across some of these issues (https://github.com/numpy/numpy/pull/12284). > > Forcing the new API will require downstream users to refactor code or > > re-engineer constructs, as in the pandas example above. > > I haven't understood the cython issue, but just want to mention that for > optimization purposes it's nice to be able to modify the fields, like in > the pandas/json example above. > > In particular, PyArray_ConcatenateArrays uses some tricks which > temporarily clobber the data pointer and shape of an array to > concatenate arrays efficiently. It seems fairly safe to me. These tricks > would be nice to re-use in a C port of the new block code we merged > recently. > > Those optimizations aren't possible if only using PyArray_Object. > > It's OK for numpy internals to directly access the structures, as presumably they will be updated if anything changes. Maybe it would be useful for Cython to have a flag like Py_LIMITED_API? Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at gmail.com Wed Oct 31 20:04:16 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 31 Oct 2018 17:04:16 -0700 Subject: [Numpy-discussion] Attribute hiding APIs for PyArrayObject In-Reply-To: References: Message-ID: On Wed, Oct 31, 2018 at 4:01 PM Charles R Harris wrote: > > > On Wed, Oct 31, 2018 at 3:59 PM Allan Haldane > wrote: > >> On 10/30/18 5:04 AM, Matti Picus wrote: >> > TL;DR - should we revert the attribute-hiding constructs in >> > ndarraytypes.h and unify PyArrayObject_fields with PyArrayObject? >> > >> > >> > Background >> > >> > >> > NumPy 1.8 deprecated direct access to PyArrayObject fields. It made >> > PyArrayObject "opaque", and hid the fields behind a PyArrayObject_fields >> > structure >> > >> https://github.com/numpy/numpy/blob/v1.15.3/numpy/core/include/numpy/ndarraytypes.h#L659 >> > with a comment about moving this to a private header. In order to access >> > the fields, users are supposed to use PyArray_FIELDNAME functions, like >> > PyArray_DATA and PyArray_NDIM. It seems there were thoughts at the time >> > that numpy might move away from a C-struct based >> > >> > underlying data structure. Other changes were also made to enum names, >> > but those are relatively painless to find-and-replace. >> > >> > >> > NumPy has a mechanism to manage deprecating APIs, C users define >> > NPY_NO_DEPRICATED_API to a desired level, say NPY_1_8_API_VERSION, and >> > can then access the API "as if" they were using NumPy 1.8. Users who do >> > not define NPY_NO_DEPRICATED_API get a warning when compiling, and >> > default to the pre-1.8 API (aliasing of PyArrayObject to >> > PyArrayObject_fields and direct access to the C struct fields). This is >> > convenient for downstream users, both since the new API does not provide >> > much added value, and it is much easier to write a->nd than >> > PyArray_NDIM(a). 
For instance, pandas uses direct assignment to the data >> > field for fast json parsing >> > >> https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/src/ujson/python/JSONtoObj.c#L203 >> > via chunks. Working around the new API in pandas would require more >> > engineering. Also, for example, cython has a mechanism to transpile >> > python code into C, mapping slow python attribute lookup to fast C >> > struct field access >> > >> https://cython.readthedocs.io/en/latest/src/userguide/extension_types.html#external-extension-types >> > >> > >> > >> > In a parallel but not really related universe, cython recently upgraded >> > the object mapping so that we can quiet the annoying "size changed" >> > runtime warning https://github.com/numpy/numpy/issues/11788 without >> > requiring warning filters, but that requires updating the numpy.pxd file >> > provided with cython, and it was proposed that NumPy actually vendor its >> > own file rather than depending on the cython one >> > (https://github.com/numpy/numpy/issues/11803). >> > >> > >> > The problem >> > >> > >> > We have now made further changes to our API. In NumPy 1.14 we changed >> > UPDATEIFCOPY to WRITEBACKIFCOPY, and in 1.16 we would like to deprecate >> > PyArray_SetNumericOps and PyArray_GetNumericOps. The strange warning >> > when NPY_NO_DEPRICATED_API is annoying. The new API cannot be supported >> > by cython without some deep surgery >> > (https://github.com/cython/cython/pull/2640). When I tried dogfooding >> an >> > updated numpy.pxd for the only cython code in NumPy, mtrand.pxy, I came >> > across some of these issues (https://github.com/numpy/numpy/pull/12284 >> ). >> > Forcing the new API will require downstream users to refactor code or >> > re-engineer constructs, as in the pandas example above. >> >> I haven't understood the cython issue, but just want to mention that for >> optimization purposes it's nice to be able to modify the fields, like in >> the pandas/json example above. 
>> >> In particular, PyArray_ConcatenateArrays uses some tricks which >> temporarily clobber the data pointer and shape of an array to >> concatenate arrays efficiently. It seems fairly safe to me. These tricks >> would be nice to re-use in a C port of the new block code we merged >> recently. >> >> Those optimizations aren't possible if only using the opaque PyArrayObject. >> >> > It's OK for numpy internals to directly access the structures, as > presumably they will be updated if anything changes. Maybe it would be > useful for Cython to have a flag like Py_LIMITED_API? > That probably only makes sense if we enable such a flag by default - which is a big backwards compat break that users can then undo by setting Py_LIMITED_API=0. Otherwise the vast majority of users will never use it, and hence we still cannot change the C API without breaking the world. Such breakage would be fine for conda, because it special-cases NumPy in the same way as Python. For wheel/pip users however, it would cause major issues. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Wed Oct 31 20:28:01 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 31 Oct 2018 17:28:01 -0700 Subject: [Numpy-discussion] asanyarray vs. asarray In-Reply-To: References: <24100c7f-20fd-4eed-99b0-d37660f52223@Canary> Message-ID: On Tue, Oct 30, 2018 at 2:22 PM Stephan Hoyer wrote: > On Mon, Oct 29, 2018 at 9:49 PM Eric Wieser > wrote: > >> The latter - changing the behavior of multiplication breaks the principle. >> >> But this is not the main reason for deprecating matrix - almost all of >> the problems I've seen have been caused by the way that matrices behave >> when sliced. The way that m[i][j] and m[i,j] are different is just one >> example of this, the fact that they must be 2d is another. >> >> Matrices behaving differently on multiplication isn't super different in >> my mind to how string arrays fail to multiply at all.
>> >> Eric >> > It's certainly fine for arithmetic to work differently on an element-wise > basis or even to error. But np.matrix changes the shape of results from > various ndarray operations (e.g., both multiplication and indexing), which > is more than any dtype can do. > > The Liskov substitution principle (LSP) suggests that the set of > reasonable ndarray subclasses are exactly those that could also in > principle correspond to a new dtype. > I don't think so. Dtypes have nothing to do with a whole set of use cases that add extra methods or attributes. Random made-up example: a user has a system with 1000 sensor signals, some of which should be treated with robust statistics. So the user writes a subclass robust_ndarray, adds a bunch of methods like median/iqr/mad, and uses isinstance checks in functions that accept both ndarray and robust_ndarray to figure out how to preprocess sensor signals. Of course you can do everything you can do with subclasses in other ways too, but such "let's add some methods or attributes" use cases are much more common (I think, hard to prove) than "let's change how indexing or multiplication works" in end user code. Cheers, Ralf > Of np.ndarray subclasses in wide-spread use, I think only the various > "array with units" types come close to satisfying this criterion. They only > fall short insofar as they present a misleading dtype (without unit > information). > > The main problem with subclassing numpy.ndarray is that it guarantees > too much: a large set of operations/methods along with a specific memory > layout exposed as part of its public API. Worse, ndarray itself is a little > quirky (e.g., with indexing, and its handling of scalars vs. 0d arrays). In > practice, it's basically impossible to layer on complex behavior with these > exact semantics, so only extremely minimal ndarray subclasses don't violate > LSP. > > Once we have more easily extended dtypes, I suspect most of the good use > cases for subclassing will have gone away.
> _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: