From ndbecker2 at gmail.com Thu Sep 1 10:48:20 2016 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 01 Sep 2016 10:48:20 -0400 Subject: [Numpy-discussion] State-of-the-art to use a C/C++ library from Python References: Message-ID: Jason Newton wrote: > I just wanted to follow up on the C++ side of OP email - Cython has quite > a > few difficulties working with C++ code at the moment. It's really more of > a C solution most of the time and you must split things up into a mostly C > call interface (that is the C code Cython can call) and limit > exposure/complications with templates and complex C++11+ constructs. > This may change in the longer term but in the near, that is the state. > > I used to use Boost.Python but I'm getting my feet wet with Pybind (which > is basically the same api but works more as you expect it to with it's > signature/type plumbing (including std::shared_ptr islanding), with some > other C++11 based improvements, and is header only + submodule friendly!). > I also remembered ndarray thanks to Neal's post but I haven't figured out > how to leverage it better than pybind, at the moment. I'd be interested > to see ndarray gain support for pybind interoperability... > > -Jason > > On Wed, Aug 31, 2016 at 1:08 PM, David Morris wrote: > >> On Wed, Aug 31, 2016 at 2:28 PM, Michael Bieri wrote: >> >>> Hi all >>> >>> There are several ways on how to use C/C++ code from Python with NumPy, >>> as given in http://docs.scipy.org/doc/numpy/user/c-info.html . >>> Furthermore, there's at least pybind11. >>> >>> I'm not quite sure which approach is state-of-the-art as of 2016. How >>> would you do it if you had to make a C/C++ library available in Python >>> right now? >>> >>> In my case, I have a C library with some scientific functions on >>> matrices and vectors. You will typically call a few functions to >>> configure the computation, then hand over some pointers to existing >>> buffers containing vector data, then start the computation, and finally >>> read back the data. The library also can use MPI to parallelize. >>> >> >> I have been delighted with Cython for this purpose. Great integration >> with NumPy (you can access numpy arrays directly as C arrays), very >> python like syntax and amazing performance. >> >> Good luck, >> >> David >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> pybind11 looks very nice. My problem is that the numpy API exposed by pybind11 is fairly weak at this point, as far as I can see from the docs. ndarray exposes a lot of functionality through the Array object, including convenient indexing and slicing. AFAICT, the interface in pybind11 is pretty low level - just pointers. There is also some functionality exposed by pybind11 using eigen. Personally, I find eigen rather baroque, and only use it when I see no alternative. From mailinglists at xgm.de Thu Sep 1 10:49:57 2016 From: mailinglists at xgm.de (Florian Lindner) Date: Thu, 1 Sep 2016 16:49:57 +0200 Subject: [Numpy-discussion] Reading in a mesh file In-Reply-To: References: <738e8474-524a-8cd0-9340-4b56d7a909da@xgm.de> Message-ID: <57c79494-33c4-4a8c-ded9-15c6298d65fd@xgm.de> Hello, thanks for your reply which was really helpful! My problem is that I discovered that the data I got is rather unordered. The documentation for reshape says: Read the elements of a using this index order, and place the elements into the reshaped array using this index order. 
'C' means to read / write the elements using C-like index order, with the last
axis index changing fastest, back to the first axis index changing slowest. 'F' means to read / write the elements using
Fortran-like index order, with the first index changing fastest, and the last index changing slowest.

With my data both dimensions change, so there is no specific ordering of the points, just a bunch of arbitrarily mixed
"x y z value" data.

My idea is:

out = np.loadtxt(...)
x = np.unique(out[:,0])
y = np.unique(out[:,1])
xx, yy = np.meshgrid(x, y)

values = lookup(xx, yy, out)

lookup is a ufunc (I hope that term is correct here) that looks up the value of every x and y in out, like
x_filtered = out[ out[:,0] == x, :]
xy_filtered = x_filtered[ x_filtered[:,1] == y, :]
return xy_filtered[0, 2]

(untested, just a sketch)

Would this work? Any better way?

Thanks,
Florian

Am 31.08.2016 um 17:06 schrieb Robert Kern:
> On Wed, Aug 31, 2016 at 4:00 PM, Florian Lindner
> wrote:
>>
>> Hello,
>>
>> I have mesh (more exactly: just a bunch of nodes) description with values associated to the nodes in a file, e.g. for a
>> 3x3 mesh:
>>
>> 0 0 10
>> 0 0.3 11
>> 0 0.6 12
>> 0.3 0 20
>> 0.3 0.3 21
>> 0.3 0.6 22
>> 0.6 0 30
>> 0.6 0.3 31
>> 0.6 0.6 32
>>
>> What is best way to read it in and get data structures like the ones I get from np.meshgrid?
>>
>> Of course, I know about np.loadtxt, but I'm having trouble getting the resulting arrays (x, y, values) in the right form
>> and to retain association to the values.
>
> For this particular case (known shape and ordering), this is what I would do. Maybe throw in a .T or three depending on
> exactly how you want them to be laid out.
>
> [~/scratch]
> |1> !cat mesh.txt
>
> 0 0 10
> 0 0.3 11
> 0 0.6 12
> 0.3 0 20
> 0.3 0.3 21
> 0.3 0.6 22
> 0.6 0 30
> 0.6 0.3 31
> 0.6 0.6 32
>
> [~/scratch]
> |2> nodes = np.loadtxt('mesh.txt')
>
> [~/scratch]
> |3> nodes
> array([[ 0. , 0. , 10. ],
> [ 0. , 0.3, 11. ],
> [ 0. , 0.6, 12. ],
> [ 0.3, 0. , 20. ],
> [ 0.3, 0.3, 21. ],
> [ 0.3, 0.6, 22. ],
> [ 0.6, 0. , 30. ],
> [ 0.6, 0.3, 31. ],
> [ 0.6, 0.6, 32. ]])
>
> [~/scratch]
> |4> reshaped = nodes.reshape((3, 3, -1))
>
> [~/scratch]
> |5> reshaped
> array([[[ 0. , 0. , 10. ],
> [ 0. , 0.3, 11. ],
> [ 0. , 0.6, 12. ]],
>
> [[ 0.3, 0. , 20. ],
> [ 0.3, 0.3, 21. ],
> [ 0.3, 0.6, 22. ]],
>
> [[ 0.6, 0. , 30. ],
> [ 0.6, 0.3, 31. ],
> [ 0.6, 0.6, 32. ]]])
>
> [~/scratch]
> |7> x = reshaped[..., 0]
>
> [~/scratch]
> |8> y = reshaped[..., 1]
>
> [~/scratch]
> |9> values = reshaped[..., 2]
>
> [~/scratch]
> |10> x
> array([[ 0. , 0. , 0. ],
> [ 0.3, 0.3, 0.3],
> [ 0.6, 0.6, 0.6]])
>
> [~/scratch]
> |11> y
> array([[ 0. , 0.3, 0.6],
> [ 0. , 0.3, 0.6],
> [ 0. , 0.3, 0.6]])
>
> [~/scratch]
> |12> values
> array([[ 10., 11., 12.],
> [ 20., 21., 22.],
> [ 30., 31., 32.]])
>
> --
> Robert Kern
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>

From robert.kern at gmail.com Thu Sep 1 11:03:11 2016
From: robert.kern at gmail.com (Robert Kern)
Date: Thu, 1 Sep 2016 16:03:11 +0100
Subject: [Numpy-discussion] Reading in a mesh file
In-Reply-To: <57c79494-33c4-4a8c-ded9-15c6298d65fd@xgm.de>
References: <738e8474-524a-8cd0-9340-4b56d7a909da@xgm.de>
 <57c79494-33c4-4a8c-ded9-15c6298d65fd@xgm.de>
Message-ID: 

On Thu, Sep 1, 2016 at 3:49 PM, Florian Lindner wrote:
>
> Hello,
>
> thanks for your reply which was really helpful!
> > My problem is that I discovered that the data I got is rather unordered. > > The documentation for reshape says: Read the elements of a using this index order, and place the elements into the > reshaped array using this index order. ?C? means to read / write the elements using C-like index order, with the last > axis index changing fastest, back to the first axis index changing slowest. ?F? means to read / write the elements using > Fortran-like index order, with the first index changing fastest, and the last index changing slowest. > > With my data both dimensions change, so there is no specific ordering of the points, just a bunch of arbitrarily mixed > "x y z value" data. > > My idea is: > > out = np.loadtxt(...) > x = np.unique(out[:,0]) > y = np.unique[out]:,1]) > xx, yy = np.meshgrid(x, y) > > values = lookup(xx, yy, out) > > lookup is ufunc (I hope that term is correct here) that looks up the value of every x and y in out, like > x_filtered = out[ out[:,0] == x, :] > y_filtered = out[ out[:,1] == y, :] > return y_filtered[2] > > (untested, just a sketch) > > Would this work? Any better way? If the (x, y) values are actually drawn from a rectilinear grid, then you can use np.lexsort() to sort the rows before reshaping. [~/scratch] |4> !cat random-mesh.txt 0.3 0.3 21 0 0 10 0 0.3 11 0.3 0.6 22 0 0.6 12 0.6 0.3 31 0.3 0 20 0.6 0.6 32 0.6 0 30 [~/scratch] |5> scrambled_nodes = np.loadtxt('random-mesh.txt') # Note! Put the "faster" column before the "slower" column! [~/scratch] |6> i = np.lexsort([scrambled_nodes[:, 1], scrambled_nodes[:, 0]]) [~/scratch] |7> sorted_nodes = scrambled_nodes[i] [~/scratch] |8> sorted_nodes array([[ 0. , 0. , 10. ], [ 0. , 0.3, 11. ], [ 0. , 0.6, 12. ], [ 0.3, 0. , 20. ], [ 0.3, 0.3, 21. ], [ 0.3, 0.6, 22. ], [ 0.6, 0. , 30. ], [ 0.6, 0.3, 31. ], [ 0.6, 0.6, 32. ]]) Then carry on with the reshape()ing as before. If the grid points that "ought to be the same" are not actually identical, then you may end up with some problems, e.g. if you had "0.300000000001 0.0 20.0" as a row, but all of the other "x=0.3" rows had "0.3", then that row would get sorted out of order. You would have to clean up the grid coordinates a bit first. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.e.creasey.00 at googlemail.com Fri Sep 2 04:16:25 2016 From: p.e.creasey.00 at googlemail.com (Peter Creasey) Date: Fri, 2 Sep 2016 09:16:25 +0100 Subject: [Numpy-discussion] State-of-the-art to use a C/C++ library from Python Message-ID: > Date: Wed, 31 Aug 2016 13:28:21 +0200 > From: Michael Bieri > > I'm not quite sure which approach is state-of-the-art as of 2016. How would > you do it if you had to make a C/C++ library available in Python right now? > > In my case, I have a C library with some scientific functions on matrices > and vectors. You will typically call a few functions to configure the > computation, then hand over some pointers to existing buffers containing > vector data, then start the computation, and finally read back the data. > The library also can use MPI to parallelize. > Depending on how minimal and universal you want to keep things, I use the ctypes approach quite often, i.e. treat your numpy inputs an outputs as arrays of doubles etc using the ndpointer(...) syntax. I find it works well if you have a small number of well-defined functions (not too many options) which are numerically very heavy. 
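For example, for a hypothetical C function
void scale_vector(double *x, size_t n, double factor), the binding comes
out roughly like this (library and function names are made up for
illustration):

```
import ctypes
import numpy as np
from numpy.ctypeslib import ndpointer

# Load the compiled shared library (illustrative name)
lib = ctypes.cdll.LoadLibrary("./libmyfuncs.so")

# Declare the signature once, so ctypes validates the numpy buffer for us
lib.scale_vector.restype = None
lib.scale_vector.argtypes = [
    ndpointer(ctypes.c_double, flags="C_CONTIGUOUS"),  # data buffer
    ctypes.c_size_t,                                   # number of elements
    ctypes.c_double,                                   # scale factor
]
```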
With this approach I usually wrap each method in python to check the inputs for contiguity, pass in the sizes etc. and allocate the numpy array for the result. Peter From njs at pobox.com Fri Sep 2 05:16:35 2016 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 2 Sep 2016 02:16:35 -0700 Subject: [Numpy-discussion] State-of-the-art to use a C/C++ library from Python In-Reply-To: References: Message-ID: On Fri, Sep 2, 2016 at 1:16 AM, Peter Creasey wrote: >> Date: Wed, 31 Aug 2016 13:28:21 +0200 >> From: Michael Bieri >> >> I'm not quite sure which approach is state-of-the-art as of 2016. How would >> you do it if you had to make a C/C++ library available in Python right now? >> >> In my case, I have a C library with some scientific functions on matrices >> and vectors. You will typically call a few functions to configure the >> computation, then hand over some pointers to existing buffers containing >> vector data, then start the computation, and finally read back the data. >> The library also can use MPI to parallelize. >> > > Depending on how minimal and universal you want to keep things, I use > the ctypes approach quite often, i.e. treat your numpy inputs an > outputs as arrays of doubles etc using the ndpointer(...) syntax. I > find it works well if you have a small number of well-defined > functions (not too many options) which are numerically very heavy. > With this approach I usually wrap each method in python to check the > inputs for contiguity, pass in the sizes etc. and allocate the numpy > array for the result. FWIW, the broader Python community seems to have largely deprecated ctypes in favor of cffi. Unfortunately I don't know if anyone has written helpers like numpy.ctypeslib for cffi... -n -- Nathaniel J. Smith -- https://vorpus.org From cmkleffner at gmail.com Fri Sep 2 06:33:05 2016 From: cmkleffner at gmail.com (Carl Kleffner) Date: Fri, 2 Sep 2016 12:33:05 +0200 Subject: [Numpy-discussion] State-of-the-art to use a C/C++ library from Python In-Reply-To: References: Message-ID: maybe https://bitbucket.org/memotype/cffiwrap or https://github.com/andrewleech/cfficloak helps? C. 2016-09-02 11:16 GMT+02:00 Nathaniel Smith : > On Fri, Sep 2, 2016 at 1:16 AM, Peter Creasey > wrote: > >> Date: Wed, 31 Aug 2016 13:28:21 +0200 > >> From: Michael Bieri > >> > >> I'm not quite sure which approach is state-of-the-art as of 2016. How > would > >> you do it if you had to make a C/C++ library available in Python right > now? > >> > >> In my case, I have a C library with some scientific functions on > matrices > >> and vectors. You will typically call a few functions to configure the > >> computation, then hand over some pointers to existing buffers containing > >> vector data, then start the computation, and finally read back the data. > >> The library also can use MPI to parallelize. > >> > > > > Depending on how minimal and universal you want to keep things, I use > > the ctypes approach quite often, i.e. treat your numpy inputs an > > outputs as arrays of doubles etc using the ndpointer(...) syntax. I > > find it works well if you have a small number of well-defined > > functions (not too many options) which are numerically very heavy. > > With this approach I usually wrap each method in python to check the > > inputs for contiguity, pass in the sizes etc. and allocate the numpy > > array for the result. > > FWIW, the broader Python community seems to have largely deprecated > ctypes in favor of cffi. 
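The basic numpy pattern carries over, though; a rough cffi equivalent of
the ctypes sketch earlier in the thread (same made-up library, using only
documented cffi calls) would be:

```
import numpy as np
from cffi import FFI

ffi = FFI()
ffi.cdef("void scale_vector(double *x, size_t n, double factor);")
lib = ffi.dlopen("./libmyfuncs.so")  # illustrative library name

def scale_vector(x, factor):
    # enforce a contiguous float64 buffer, then pass a raw pointer
    x = np.ascontiguousarray(x, dtype=np.float64)
    lib.scale_vector(ffi.cast("double *", ffi.from_buffer(x)), x.size, factor)
    return x
```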
Unfortunately I don't know if anyone has > written helpers like numpy.ctypeslib for cffi... > > -n > > -- > Nathaniel J. Smith -- https://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From seb.haase at gmail.com Fri Sep 2 07:46:42 2016 From: seb.haase at gmail.com (Sebastian Haase) Date: Fri, 2 Sep 2016 13:46:42 +0200 Subject: [Numpy-discussion] State-of-the-art to use a C/C++ library from Python In-Reply-To: References: Message-ID: How do these two relate to each other !? - Sebastian On Fri, Sep 2, 2016 at 12:33 PM, Carl Kleffner wrote: > maybe https://bitbucket.org/memotype/cffiwrap or https://github.com/ > andrewleech/cfficloak helps? > > C. > > > 2016-09-02 11:16 GMT+02:00 Nathaniel Smith : > >> On Fri, Sep 2, 2016 at 1:16 AM, Peter Creasey >> wrote: >> >> Date: Wed, 31 Aug 2016 13:28:21 +0200 >> >> From: Michael Bieri >> >> >> >> I'm not quite sure which approach is state-of-the-art as of 2016. How >> would >> >> you do it if you had to make a C/C++ library available in Python right >> now? >> >> >> >> In my case, I have a C library with some scientific functions on >> matrices >> >> and vectors. You will typically call a few functions to configure the >> >> computation, then hand over some pointers to existing buffers >> containing >> >> vector data, then start the computation, and finally read back the >> data. >> >> The library also can use MPI to parallelize. >> >> >> > >> > Depending on how minimal and universal you want to keep things, I use >> > the ctypes approach quite often, i.e. treat your numpy inputs an >> > outputs as arrays of doubles etc using the ndpointer(...) syntax. I >> > find it works well if you have a small number of well-defined >> > functions (not too many options) which are numerically very heavy. >> > With this approach I usually wrap each method in python to check the >> > inputs for contiguity, pass in the sizes etc. and allocate the numpy >> > array for the result. >> >> FWIW, the broader Python community seems to have largely deprecated >> ctypes in favor of cffi. Unfortunately I don't know if anyone has >> written helpers like numpy.ctypeslib for cffi... >> >> -n >> >> -- >> Nathaniel J. Smith -- https://vorpus.org >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cmkleffner at gmail.com Fri Sep 2 07:53:20 2016 From: cmkleffner at gmail.com (Carl Kleffner) Date: Fri, 2 Sep 2016 13:53:20 +0200 Subject: [Numpy-discussion] State-of-the-art to use a C/C++ library from Python In-Reply-To: References: Message-ID: fork / extension of cffiwrap: *"cfficloak - A simple but flexible module for creating object-oriented, pythonic CFFI wrappers.This is an extension of https://bitbucket.org/memotype/cffiwrap "* 2016-09-02 13:46 GMT+02:00 Sebastian Haase : > How do these two relate to each other !? 
> - Sebastian > > > On Fri, Sep 2, 2016 at 12:33 PM, Carl Kleffner > wrote: > >> maybe https://bitbucket.org/memotype/cffiwrap or >> https://github.com/andrewleech/cfficloak helps? >> >> C. >> >> >> 2016-09-02 11:16 GMT+02:00 Nathaniel Smith : >> >>> On Fri, Sep 2, 2016 at 1:16 AM, Peter Creasey >>> wrote: >>> >> Date: Wed, 31 Aug 2016 13:28:21 +0200 >>> >> From: Michael Bieri >>> >> >>> >> I'm not quite sure which approach is state-of-the-art as of 2016. How >>> would >>> >> you do it if you had to make a C/C++ library available in Python >>> right now? >>> >> >>> >> In my case, I have a C library with some scientific functions on >>> matrices >>> >> and vectors. You will typically call a few functions to configure the >>> >> computation, then hand over some pointers to existing buffers >>> containing >>> >> vector data, then start the computation, and finally read back the >>> data. >>> >> The library also can use MPI to parallelize. >>> >> >>> > >>> > Depending on how minimal and universal you want to keep things, I use >>> > the ctypes approach quite often, i.e. treat your numpy inputs an >>> > outputs as arrays of doubles etc using the ndpointer(...) syntax. I >>> > find it works well if you have a small number of well-defined >>> > functions (not too many options) which are numerically very heavy. >>> > With this approach I usually wrap each method in python to check the >>> > inputs for contiguity, pass in the sizes etc. and allocate the numpy >>> > array for the result. >>> >>> FWIW, the broader Python community seems to have largely deprecated >>> ctypes in favor of cffi. Unfortunately I don't know if anyone has >>> written helpers like numpy.ctypeslib for cffi... >>> >>> -n >>> >>> -- >>> Nathaniel J. Smith -- https://vorpus.org >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> https://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From totonixsame at gmail.com Fri Sep 2 15:33:25 2016 From: totonixsame at gmail.com (Thiago Franco Moraes) Date: Fri, 02 Sep 2016 19:33:25 +0000 Subject: [Numpy-discussion] State-of-the-art to use a C/C++ library from Python In-Reply-To: References: Message-ID: I think you can use ffi.from_buffer and ffi.cast from cffi. On Fri, Sep 2, 2016 at 8:53 AM Carl Kleffner wrote: > fork / extension of cffiwrap: > > > *"cfficloak - A simple but flexible module for creating object-oriented, > pythonic CFFI wrappers.This is an extension of > https://bitbucket.org/memotype/cffiwrap > "* > > 2016-09-02 13:46 GMT+02:00 Sebastian Haase : > >> How do these two relate to each other !? >> - Sebastian >> >> >> On Fri, Sep 2, 2016 at 12:33 PM, Carl Kleffner >> wrote: >> >>> maybe https://bitbucket.org/memotype/cffiwrap or >>> https://github.com/andrewleech/cfficloak helps? >>> >>> C. 
>>>
>>>
>>> 2016-09-02 11:16 GMT+02:00 Nathaniel Smith :
>>>
>>>> On Fri, Sep 2, 2016 at 1:16 AM, Peter Creasey
>>>> wrote:
>>>> >> Date: Wed, 31 Aug 2016 13:28:21 +0200
>>>> >> From: Michael Bieri
>>>> >>
>>>> >> I'm not quite sure which approach is state-of-the-art as of 2016.
>>>> How would
>>>> >> you do it if you had to make a C/C++ library available in Python
>>>> right now?
>>>> >>
>>>> >> In my case, I have a C library with some scientific functions on
>>>> matrices
>>>> >> and vectors. You will typically call a few functions to configure the
>>>> >> computation, then hand over some pointers to existing buffers
>>>> containing
>>>> >> vector data, then start the computation, and finally read back the
>>>> data.
>>>> >> The library also can use MPI to parallelize.
>>>> >>
>>>> >
>>>> > Depending on how minimal and universal you want to keep things, I use
>>>> > the ctypes approach quite often, i.e. treat your numpy inputs an
>>>> > outputs as arrays of doubles etc using the ndpointer(...) syntax. I
>>>> > find it works well if you have a small number of well-defined
>>>> > functions (not too many options) which are numerically very heavy.
>>>> > With this approach I usually wrap each method in python to check the
>>>> > inputs for contiguity, pass in the sizes etc. and allocate the numpy
>>>> > array for the result.
>>>>
>>>> FWIW, the broader Python community seems to have largely deprecated
>>>> ctypes in favor of cffi. Unfortunately I don't know if anyone has
>>>> written helpers like numpy.ctypeslib for cffi...
>>>>
>>>> -n
>>>>
>>>> --
>>>> Nathaniel J. Smith -- https://vorpus.org
>>>> _______________________________________________
>>>> NumPy-Discussion mailing list
>>>> NumPy-Discussion at scipy.org
>>>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>>
>>>
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at scipy.org
>>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sebastian at sipsolutions.net Sat Sep 3 15:08:46 2016
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Sat, 03 Sep 2016 21:08:46 +0200
Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)
Message-ID: <1472929726.4472.23.camel@sipsolutions.net>

Hi all,

not that I am planning to spend much time on this right now, however, I
did a small rebase of the stuff I had (did not push yet) on oindex and
remembered the old problem ;).

The one remaining issue I have with adding things like (except making
the code prettier and writing tests):

arr.oindex[...]  # outer/orthogonal indexing
arr.vindex[...]  # Picking of elements (much like current)
arr.lindex[...]  # current behaviour for backward compat

is what to do about subclasses. Now what I can do (and have currently
in my branch) is to tell someone on `subclass.oindex[...]`: This won't
work, the subclass implements `__getitem__` or `__setitem__` so I don't
know if the result would be correct (it's a bit annoying if you also
warn about using those attributes, but...).
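For reference, the intended semantics can be sketched with today's tools;
np.ix_ emulates the outer behaviour that arr.oindex would spell directly
(this illustrates the proposal, it is not the final API):

```
import numpy as np

arr = np.arange(12).reshape(3, 4)
rows = np.array([0, 2])
cols = np.array([1, 3])

# Plain fancy indexing picks elements pairwise -- what vindex makes
# explicit: arr[0, 1] and arr[2, 3].
arr[rows, cols]          # array([ 1, 11]), shape (2,)

# Outer/orthogonal indexing combines every row with every column,
# which today needs np.ix_ and would be spelled arr.oindex[rows, cols].
arr[np.ix_(rows, cols)]  # array([[ 1,  3], [ 9, 11]]), shape (2, 2)
```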
However, with or without such error, we need a nice way for subclasses
to define these attributes! This is important even within numpy at
least for masked arrays (possibly also matrix and memmap).

They (typically) do some stuff before or after the plain indexing
operation, so how do we make it convenient to allow them to do the same
stuff for the special indexing attributes without weird code
duplication? I can think of things, but nothing too great yet so maybe
you guys got an elegant idea.

- Sebastian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part
URL: 

From sebastian at sipsolutions.net Sun Sep 4 08:10:23 2016
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Sun, 04 Sep 2016 14:10:23 +0200
Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)
In-Reply-To: <1472929726.4472.23.camel@sipsolutions.net>
 (sfid-20160903_211025_493240_E2648126)
References: <1472929726.4472.23.camel@sipsolutions.net>
 (sfid-20160903_211025_493240_E2648126)
Message-ID: <1472991023.2375.17.camel@sipsolutions.net>

On Sa, 2016-09-03 at 21:08 +0200, Sebastian Berg wrote:
> Hi all,
>
> not that I am planning to spend much time on this right now, however,
> I
> did a small rebase of the stuff I had (did not push yet) on oindex
> and
> remembered the old problem ;).
>
> The one remaining issue I have with adding things like (except making
> the code prettier and writing tests):
>
> arr.oindex[...]  # outer/orthogonal indexing
> arr.vindex[...]  # Picking of elements (much like current)
> arr.lindex[...]  # current behaviour for backward compat
>
> is what to do about subclasses. Now what I can do (and have currently
> in my branch) is to tell someone on `subclass.oindex[...]`: This
> won't
> work, the subclass implements `__getitem__` or `__setitem__` so I
> don't
> know if the result would be correct (it's a bit annoying if you also
> warn about using those attributes, but...).
>
Hmm, I am considering to expose a new indexing helper object. So that
subclasses could implement something like `__numpy_getitem__` and
`__numpy_setitem__` and if they do (and preferably nothing else) they
would get back passed a small object with some information about the
indexing operation. So that the subclass would implement:

```
def __numpy_setitem__(self, indexer, values):
    indexer.method  # one of {"plain", "oindex", "vindex", "lindex"}
    indexer.scalar  # Will the result be a scalar?
    indexer.view  # Will the result be a view or a copy?
    # More information might be possible (note that not all checks are
    # done at this point, just basic checks will have happened already).

    # Do some code, that prepares self or values, could also use
    # indexer for another array (e.g. mask) of the same shape.

    result = indexer(self, values)

    # Do some code to fix up the result if necessary.
    # Should discuss whether result is first a plain ndarray or
    # already wrapped.
```

This could be implemented in the C-side without much hassle, I think.
Of course it adds some new API which we would have to support
indefinitely. But it seems something like this would also fix the
hassle of identifying e.g. if the result should be a scalar for a
subclass (which may even be impossible in some cases).

Would be very happy about feedback from subclassers!

- Sebastian

> However, with or without such error, we need a nice way for
> subclasses
> to define these attributes!
This is important even within numpy at > least for masked arrays (possibly also matrix and memmap). > > They (typically) do some stuff before or after the plain indexing > operation, so how do we make it convenient to allow them to do the > same > stuff for the special indexing attributes without weird code > duplication? I can think of things, but nothing too great yet so > maybe > you guys got an elegant idea. > > - Sebastian > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From sebastian at sipsolutions.net Sun Sep 4 08:23:03 2016 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sun, 04 Sep 2016 14:23:03 +0200 Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!) In-Reply-To: <1472991023.2375.17.camel@sipsolutions.net> References: <1472929726.4472.23.camel@sipsolutions.net> (sfid-20160903_211025_493240_E2648126) <1472991023.2375.17.camel@sipsolutions.net> Message-ID: <1472991783.2375.23.camel@sipsolutions.net> On So, 2016-09-04 at 14:10 +0200, Sebastian Berg wrote: > On Sa, 2016-09-03 at 21:08 +0200, Sebastian Berg wrote: > > > > Hi all, > > > > not that I am planning to spend much time on this right now, > > however, > > I > > did a small rebase of the stuff I had (did not push yet) on oindex > > and > > remembered the old problem ;). > > > > The one remaining issue I have with adding things like (except > > making > > the code prettier and writing tests): > > > > arr.oindex[...] ?# outer/orthogonal indexing > > arr.vindex[...] ?# Picking of elements (much like current) > > arr.lindex[...] ?# current behaviour for backward compat > > > > is what to do about subclasses. Now what I can do (and have > > currently > > in my branch) is to tell someone on `subclass.oindex[...]`: This > > won't > > work, the subclass implements `__getitem__` or `__setitem__` so I > > don't > > know if the result would be correct (its a bit annoying if you also > > warn about using those attributes, but...). > > > Hmm, I am considering to expose a new indexing helper object. So that > subclasses could implement something like `__numpy_getitem__` and > `__numpy_setitem__` and if they do (and preferably nothing else) they > would get back passed a small object with some information about the > indexing operation. So that the subclass would implement: > > ``` > def __numpy_setitem__(self, indexer, values): > ? ? indexer.method ?# one of {"plain", "oindex", "vindex", "lindex"} > ? ? indexer.scalar ?# Will the result be a scalar? > ? ? indexer.view ?# Will the result be a view or a copy? > ? ? # More information might be possible (note that not all checks > are > ????# done at this point, just basic checks will have happened > already). > > ? ? # Do some code, that prepares self or values, could also use > ? ? # indexer for another array (e.g. mask) of the same shape. > > ? ? result = indexer(self, values) > > ? ? # Do some coded to fixup the result if necessary. > ? ? # Should discuss whether result is first a plain ndarray or > ? ? # already wrapped. > ``` Hmm, field access is a bit annoying, but I guess can/has to be included. > > This could be implemented in the C-side without much hassle, I think. 
> Of course it adds some new API which we would have to support > indefinitely. But it seems something like this would also fix the > hassle of identifying e.g. if the result should be a scalar for a > subclass (which may even be impossible in some cases). > > Would be very happy about feedback from subclassers! > > - Sebastian > > > > > > However, with or without such error, we need a nice way for > > subclasses > > to define these attributes! This is important even within numpy at > > least for masked arrays (possibly also matrix and memmap). > > > > They (typically) do some stuff before or after the plain indexing > > operation, so how do we make it convenient to allow them to do the > > same > > stuff for the special indexing attributes without weird code > > duplication? I can think of things, but nothing too great yet so > > maybe > > you guys got an elegant idea. > > > > - Sebastian > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > https://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From reiter.paul at gmail.com Sun Sep 4 09:39:32 2016 From: reiter.paul at gmail.com (Paul Reiter) Date: Sun, 4 Sep 2016 15:39:32 +0200 Subject: [Numpy-discussion] Pull Request regarding meshgrid Message-ID: https://github.com/numpy/numpy/pull/7984 Hi everybody, I created my first pull request for numpy and as mentioned in the numpy development workflow documentation I hereby post a link to it and a short description to the mailing list. Please take a look. I didn't find a good way to create a contour plot of data of the form: [(x1, y1, f(x1, y1)), (x2, y2, f(x2, y2)), ..., (xn, yn, f(xn, yn))]. In order to do a contour plot, one has to bring the data into the meshgrid format. One possibility would be complicated sorting and reshaping of the data, but this is not easily possible especially if values are missing (not all combinations of (x, y) contained in data). Another way, which is used in all tutorials about contour plotting, is to create the meshgrid beforehand and than apply the function to the meshgrid matrices: x = np.linspace(-3, 3, n) y = np.linspace(-3, 3, n) X, Y = np.meshgrid(x, y) Z = f(X, Y) plt.contourplot(X, Y, Z) But if one does not have the function but only the data, this is also no option. My function essentially creates a dictionary {(x1, y1): f(x1, y1), (x2, y2): f(x2, y2), ..., (xn, yn): f(xn, yn)} with the coordinate tuples as keys and function values as values. Then it creates a meshgrid from all unique x and y coordinates (X and Y). The dictionary is then used to create the matrix Z, filling in np.nan for all missing values. This allows to do the following, with x, y and z being the x, y coordinates and z being the according function value: plt.contourplot(*meshgridify(x, y, f=z)) Maybe there is a simpler solution, but I didn't find one. -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Sun Sep 4 11:20:53 2016 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Sun, 4 Sep 2016 11:20:53 -0400 Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!) 
In-Reply-To: <1472991783.2375.23.camel@sipsolutions.net> References: <1472929726.4472.23.camel@sipsolutions.net> <1472991023.2375.17.camel@sipsolutions.net> <1472991783.2375.23.camel@sipsolutions.net> Message-ID: Hi Sebastian, I haven't given this as much thought as it deserves, but thought I would comment from the astropy perspective, where we both have direct subclasses of `ndarray` (`Quantity`, `Column`, `MaskedColumn`) and classes that store their data internally as ndarray (subclass) instances (`Time`, `SkyCoord`, ...). One comment would be that if one were to introduce a special method, one should perhaps think a bit more broadly, and capture more than the indexing methods with it. I wonder about this because for the array-holding classes mentioned above, we initially just had `__getitem__`, which got the relevant items from the underlying arrays, and then constructed a new instance with those. But recently we realised that methods like `reshape`, `transpose`, etc., require essentially the same steps, and so we constructed a new `ShapedLikeNDArray` mixin, which provides all of those [1] as long as one defines a single `_apply` method. (Indeed, it turns out that the same technique works without any real change for some numpy functions such as `np.broadcast_to`.) That said, in the actual ndarray subclasses, we have not found a need to overwrite any of the reshaping methods, since those methods are all handled OK via `__array_finalize__`. We do overwrite `__getitem__` (and `item`) as we need to take care of scalars. And we would obviously have to overwrite `oindex`, etc., as well, for the same reason, so in that respect a common method might be useful. However, perhaps it is worth considering that the only reason we need to overwrite them in the first place, unlike what is the case for all the shape-changing methods, is that scalar output does not get put through `__array_finalize__`. Might it be an idea to have the new indexing methods return array scalars instead of normal ones so we can get rid of this? All the best, Marten [1] https://github.com/astropy/astropy/blob/master/astropy/utils/misc.py#L856 From sebastian at sipsolutions.net Mon Sep 5 01:48:25 2016 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 05 Sep 2016 07:48:25 +0200 Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!) In-Reply-To: References: <1472929726.4472.23.camel@sipsolutions.net> <1472991023.2375.17.camel@sipsolutions.net> <1472991783.2375.23.camel@sipsolutions.net> Message-ID: <1473054505.2375.35.camel@sipsolutions.net> On So, 2016-09-04 at 11:20 -0400, Marten van Kerkwijk wrote: > Hi Sebastian, > > I haven't given this as much thought as it deserves, but thought I > would comment from the astropy perspective, where we both have direct > subclasses of `ndarray` (`Quantity`, `Column`, `MaskedColumn`) and > classes that store their data internally as ndarray (subclass) > instances (`Time`, `SkyCoord`, ...). > > One comment would be that if one were to introduce a special method, > one should perhaps think a bit more broadly, and capture more than > the > indexing methods with it. I wonder about this because for the > array-holding classes mentioned above, we initially just had > `__getitem__`, which got the relevant items from the underlying > arrays, and then constructed a new instance with those. 
But recently > we realised that methods like `reshape`, `transpose`, etc., require > essentially the same steps, and so we constructed a new > `ShapedLikeNDArray` mixin, which provides all of those [1] as long as > one defines a single `_apply` method. (Indeed, it turns out that the > same technique works without any real change for some numpy functions > such as `np.broadcast_to`.) > > That said, in the actual ndarray subclasses, we have not found a need > to overwrite any of the reshaping methods, since those methods are > all > handled OK via `__array_finalize__`. We do overwrite `__getitem__` > (and `item`) as we need to take care of scalars. And we would > obviously have to overwrite `oindex`, etc., as well, for the same > reason, so in that respect a common method might be useful. > > However, perhaps it is worth considering that the only reason we need > to overwrite them in the first place, unlike what is the case for all > the shape-changing methods, is that scalar output does not get put > through `__array_finalize__`. Might it be an idea to have the new > indexing methods return array scalars instead of normal ones so we > can > get rid of this? I did not realize the new numpys are special with the scalar handling? The indexing (already before 1.9. I believe) always goes through PyArray_ScalarReturn or so, which I thought was used by almost all functions. If you mean the attributes (oindex, etc.), they could behave a bit different of course, though not sure to what it extend it actually helps since that would also create disparity. If we implement a new special method (__numpy_getitem__), they definitely should behave slightly different in some places. One option might be to not even do the wrapping, but leave it to the subclass. However, if you have an array with arrays inside, knowing whether to return a scalar correctly would have to rely on inspecting the index object, which is why I suggested the indexer to give a few extra informations (such as this one). Of course, since the scalar return goes through a ScalarReturn function, that function could maybe also be tought to indicate the scalar to `__array_finalize__`/`__array_wrap__` (not sure what exactly applies). - Sebastian > All the best, > > Marten > > [1] https://github.com/astropy/astropy/blob/master/astropy/utils/misc > .py#L856 > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From charlesr.harris at gmail.com Mon Sep 5 13:54:57 2016 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 5 Sep 2016 11:54:57 -0600 Subject: [Numpy-discussion] Correct error of invalid axis arguments. Message-ID: Hi All, At the moment there are two error types raised when invalid axis arguments are encountered: IndexError and ValueError. I prefer ValueError for arguments, IndexError seems more appropriate when the bad axis value is used as an index. In any case, having mixed error types is inconvenient, but also inconvenient to change. Should we worry about that? If so, what should the error be? Note that some of the mixup arises because the axis values are not checked before use, in which case IndexError is raised. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sebastian at sipsolutions.net Mon Sep 5 14:25:11 2016 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 05 Sep 2016 20:25:11 +0200 Subject: [Numpy-discussion] Correct error of invalid axis arguments. In-Reply-To: References: Message-ID: <1473099911.22224.3.camel@sipsolutions.net> On Mo, 2016-09-05 at 11:54 -0600, Charles R Harris wrote: > Hi All, > > At the moment there are two error types raised when invalid axis > arguments are encountered: IndexError and ValueError. I prefer > ValueError for arguments, IndexError seems more appropriate when the > bad axis value is used as an index. In any case, having mixed error > types is inconvenient, but also inconvenient to change. Should we > worry about that? If so, what should the error be? Note that some of > the mixup arises because the axis values are not checked before use, > in which case IndexError is raised. > I am not too bothered about it myself, but yes, it is a bit annoying. My gut feeling on it would be to not worry about it much, unless we implement some more general input validator for the python side (which possibly could do even more like validate/convert input arrays all in one go). Putting explicit guards to every single python side function is of course possible too, but I am not quite convinced its worth the trouble. - Sebastian > Chuck > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From m.h.vankerkwijk at gmail.com Mon Sep 5 14:54:07 2016 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Mon, 5 Sep 2016 14:54:07 -0400 Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!) In-Reply-To: <1473054505.2375.35.camel@sipsolutions.net> References: <1472929726.4472.23.camel@sipsolutions.net> <1472991023.2375.17.camel@sipsolutions.net> <1472991783.2375.23.camel@sipsolutions.net> <1473054505.2375.35.camel@sipsolutions.net> Message-ID: Hi Sebastian, Indeed, having the scalar pass through `__array_wrap__` would have been useful (_finalize__ is too late, since one cannot change the class any more, just set attributes). But that is water under the bridge, since we're stuck with people not expecting that. I think the slightly larger question, but one somewhat orthogonal to your suggestion of a new dundermethod, is whether one cannot avoid more such methods by the new indexing routines returning array scalars instead of regular ones. Obviously, though, this has larger scope, as it might be part of the merging of the now partially separate code paths for scalar and array arithmetic, etc. All the best, Marten From sebastian at sipsolutions.net Mon Sep 5 15:22:15 2016 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 05 Sep 2016 21:22:15 +0200 Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!) 
In-Reply-To: References: <1472929726.4472.23.camel@sipsolutions.net> <1472991023.2375.17.camel@sipsolutions.net> <1472991783.2375.23.camel@sipsolutions.net> <1473054505.2375.35.camel@sipsolutions.net> Message-ID: <1473103335.22224.17.camel@sipsolutions.net> On Mo, 2016-09-05 at 14:54 -0400, Marten van Kerkwijk wrote: > Hi Sebastian, > > Indeed, having the scalar pass through `__array_wrap__` would have > been useful (_finalize__ is too late, since one cannot change the > class any more, just set attributes).??But that is water under the > bridge, since we're stuck with people not expecting that. > > I think the slightly larger question, but one somewhat orthogonal to > your suggestion of a new dundermethod, is whether one cannot avoid > more such methods by the new indexing routines returning array > scalars > instead of regular ones. > > Obviously, though, this has larger scope, as it might be part of the > merging of the now partially separate code paths for scalar and array > arithmetic, etc. Thanks for the input. I am not quite sure about all of the things. Calling array wrap for the scalar returns does not sound like a problem (it would also effect other code paths). Calling it only for the new methods creates a bit of branching, but is not a big deal. Would it help you though? You could avoid implementing all the new indexing methods for many/most subclasses, but how do you tell numpy that you are supporting them? Right now I thought it would make sense to give an error if you try `subclass.vindex[...]` but the subclass has `__getitem__` implemented (and not overwritten vindex). The dundermethod gives a way to tell numpy: you know what to do. For the sake of masked arrays it is also convenient (you can use the indexer also on the mask), but masked arrays are rather special. It would be interesting if there are more complex subclasses out there, which implement `__getitem__` or `__setitem__`. Maybe all we need is some new trick for the scalars and most subclasses can just remove their `__getitem__` methods.... - Sebastian > > All the best, > > Marten > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From m.h.vankerkwijk at gmail.com Mon Sep 5 18:24:17 2016 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Mon, 5 Sep 2016 18:24:17 -0400 Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!) In-Reply-To: <1473103335.22224.17.camel@sipsolutions.net> References: <1472929726.4472.23.camel@sipsolutions.net> <1472991023.2375.17.camel@sipsolutions.net> <1472991783.2375.23.camel@sipsolutions.net> <1473054505.2375.35.camel@sipsolutions.net> <1473103335.22224.17.camel@sipsolutions.net> Message-ID: Hi Sebastian, It would seem to me that any subclass has to keep up to date with new features in ndarray, and while I think ndarray has a responsibility not to break backward compatibility, I do not think it has to protect against new features possibly not working as expected in subclasses. In particular, I think it is overly complicated (and an unnecessary maintenance burden) to error out if a subclass has `__getitem__` overwritten, but not `oindex`. 
For somewhat similar reasons, I'm not too keen on a new
`__numpy_getitem__` method; I realise it might reduce complexity for
some ndarray subclasses eventually, but it also is an additional
maintenance burden. If you really think it is useful, I think it might
be more helpful to define a new mixin class which provides a version
of all indexing methods that just call `__numpy_getitem__` if that is
provided by the class that uses the mixin. I would *not* put it in
`ndarray` proper.

Indeed, the above might even be handier for subclasses, since they can
choose, if they wish, to implement a similar mixin for older numpy
versions, so that all the numpy version stuff can be moved to a single
location. (E.g., I can imagine doing the same for `__numpy_ufunc__`.)

Overall, my sense would be to keep your PR to just implementing the
various new index methods (which are great -- I still don't really
like the names, but sadly don't have better suggestions...).

But it might be good if others pipe in here too, in particular those
maintaining ndarray subclasses!

All the best,

Marten

From m.h.vankerkwijk at gmail.com Mon Sep 5 18:31:36 2016
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Mon, 5 Sep 2016 18:31:36 -0400
Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)
In-Reply-To: 
References: <1472929726.4472.23.camel@sipsolutions.net>
 <1472991023.2375.17.camel@sipsolutions.net>
 <1472991783.2375.23.camel@sipsolutions.net>
 <1473054505.2375.35.camel@sipsolutions.net>
 <1473103335.22224.17.camel@sipsolutions.net>
Message-ID: 

Actually, on those names: an alternative to your proposal would be to
introduce only one new method which can do all types of indexing,
depending on a keyword argument, i.e., something like
```
def getitem(self, item, mode='outer'):
    ...
```

-- Marten

From nathan12343 at gmail.com Mon Sep 5 19:19:01 2016
From: nathan12343 at gmail.com (Nathan Goldbaum)
Date: Mon, 5 Sep 2016 18:19:01 -0500
Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)
In-Reply-To: 
References: <1472929726.4472.23.camel@sipsolutions.net>
 <1472991023.2375.17.camel@sipsolutions.net>
 <1472991783.2375.23.camel@sipsolutions.net>
 <1473054505.2375.35.camel@sipsolutions.net>
 <1473103335.22224.17.camel@sipsolutions.net>
Message-ID: 

On Monday, September 5, 2016, Marten van Kerkwijk wrote:

> Hi Sebastian,
>
> It would seem to me that any subclass has to keep up to date with new
> features in ndarray, and while I think ndarray has a responsibility
> not to break backward compatibility, I do not think it has to protect
> against new features possibly not working as expected in subclasses.
> In particular, I think it is overly complicated (and an unnecessary
> maintenance burden) to error out if a subclass has `__getitem__`
> overwritten, but not `oindex`.
>
> For somewhat similar reasons, I'm not too keen on a new
> `__numpy_getitem__` method; I realise it might reduce complexity for
> some ndarray subclasses eventually, but it also is an additional
> maintenance burden. If you really think it is useful, I think it might
> be more helpful to define a new mixin class which provides a version
> of all indexing methods that just call `__numpy_getitem__` if that is
> provided by the class that uses the mixin. I would *not* put it in
> `ndarray` proper.

I disagree that multiple inheritance (i.e. with your proposed mixin
and ndarray) is something that numpy should enshrine in its API for
subclasses.
As the maintainer of an ndarray subclass, I'd much rather
prefer just to implement a new dunder method that older numpy versions
will ignore.

>
> Indeed, the above might even be handier for subclasses, since they can
> choose, if they wish, to implement a similar mixin for older numpy
> versions, so that all the numpy version stuff can be moved to a single
> location. (E.g., I can imagine doing the same for `__numpy_ufunc__`.)
>
> Overall, my sense would be to keep your PR to just implementing the
> various new index methods (which are great -- I still don't really
> like the names, but sadly don't have better suggestions...).
>
> But it might be good if others pipe in here too, in particular those
> maintaining ndarray subclasses!
>
> All the best,
>
> Marten
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From m.h.vankerkwijk at gmail.com Mon Sep 5 21:00:42 2016
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Mon, 5 Sep 2016 21:00:42 -0400
Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)
In-Reply-To: 
References: <1472929726.4472.23.camel@sipsolutions.net>
 <1472991023.2375.17.camel@sipsolutions.net>
 <1472991783.2375.23.camel@sipsolutions.net>
 <1473054505.2375.35.camel@sipsolutions.net>
 <1473103335.22224.17.camel@sipsolutions.net>
Message-ID: 

Hi Nathan,

The question originally posed is whether `ndarray` should provide that
single method as a convenience already, even though it doesn't
actually use it itself. Do you think that is useful, i.e., a big
advantage over overwriting the new oindex, vindex, and another that I
forget?

My own feeling is that while it is good to provide some hooks for
subclasses (__array_prepare__, wrap, finalize, numpy_ufunc), this one
is too fine-grained and the benefits do not outweigh the cost,
especially since it could easily be done with a mixin (unlike those
other cases, which are not used to cover ndarray methods, but rather
numpy functions, i.e., they provide subclasses with hooks into those
functions, which no mixin could possibly do).

All the best,

Marten

From m.h.vankerkwijk at gmail.com Mon Sep 5 21:02:27 2016
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Mon, 5 Sep 2016 21:02:27 -0400
Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)
In-Reply-To: 
References: <1472929726.4472.23.camel@sipsolutions.net>
 <1472991023.2375.17.camel@sipsolutions.net>
 <1472991783.2375.23.camel@sipsolutions.net>
 <1473054505.2375.35.camel@sipsolutions.net>
 <1473103335.22224.17.camel@sipsolutions.net>
Message-ID: 

p.s. Just to be clear: personally, I think we should have neither
`__numpy_getitem__` nor a mixin; we should just get the quite
wonderful new indexing methods!

From sebastian at sipsolutions.net Tue Sep 6 02:48:15 2016
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 06 Sep 2016 08:48:15 +0200
Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)
In-Reply-To: 
References: <1472929726.4472.23.camel@sipsolutions.net>
 <1472991023.2375.17.camel@sipsolutions.net>
 <1472991783.2375.23.camel@sipsolutions.net>
 <1473054505.2375.35.camel@sipsolutions.net>
 <1473103335.22224.17.camel@sipsolutions.net>
Message-ID: <1473144495.22224.28.camel@sipsolutions.net>

On Mo, 2016-09-05 at 18:24 -0400, Marten van Kerkwijk wrote:
> Hi Sebastian,
>
> It would seem to me that any subclass has to keep up to date with new
> features in ndarray, and while I think ndarray has a responsibility
> not to break backward compatibility, I do not think it has to
> protect
> against new features possibly not working as expected in
> subclasses.
> In particular, I think it is overly complicated (and an unnecessary
> maintenance burden) to error out if a subclass has `__getitem__`
> overwritten, but not `oindex`.
>
It is not complicated implementation-wise to check for `__getitem__`
existence. However, I start to agree that a warning could be the
better option. It might work after all.

> For somewhat similar reasons, I'm not too keen on a new
> `__numpy_getitem__` method; I realise it might reduce complexity
> for
> some ndarray subclasses eventually, but it also is an additional
> maintenance burden. If you really think it is useful, I think it
> might
> be more helpful to define a new mixin class which provides a
> version
> of all indexing methods that just call `__numpy_getitem__` if that
> is
> provided by the class that uses the mixin. I would *not* put it in
> `ndarray` proper.
>
Yes, that is maybe a simpler option (in the sense of maintainability),
the other would have a bit of extra information available. If this
extra information is unnecessary, a MixIn is probably a bit simpler.

> Indeed, the above might even be handier for subclasses, since they
> can
> choose, if they wish, to implement a similar mixin for older numpy
> versions, so that all the numpy version stuff can be moved to a
> single
> location. (E.g., I can imagine doing the same for
> `__numpy_ufunc__`.)
>
You can always implement a mixin for older versions if you do all the
logic yourself, but I would prefer not to duplicate that logic (Jaime
wrote a python class that does it for normal arrays -- not sure if
it's 100% the same as I did, but you could probably use it in a
subclass). So a numpy-provided mixin would not help with supporting it
in old numpy versions, I think.

> Overall, my sense would be to keep your PR to just implementing the
> various new index methods (which are great -- I still don't really
> like the names, but sadly don't have better suggestions...).
>
Well... The thing is that we have to fix the subclasses within numpy
as well (most importantly MaskedArrays). Of course you could delay
things a bit, but in the end whatever we use internally could likely
also be whatever subclasses might end up using.

> But it might be good if others pipe in here too, in particular
> those
> maintaining ndarray subclasses!
>
Yeah :).

> All the best,
>
> Marten
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
A non-text attachment was scrubbed...
From sebastian at sipsolutions.net  Tue Sep  6 02:49:48 2016
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 06 Sep 2016 08:49:48 +0200
Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)
Message-ID: <1473144588.22224.29.camel@sipsolutions.net>

On Mo, 2016-09-05 at 18:19 -0500, Nathan Goldbaum wrote:
> On Monday, September 5, 2016, Marten van Kerkwijk
> <m.h.vankerkwijk at gmail.com> wrote:
> > Hi Sebastian,
> >
> > It would seem to me that any subclass has to keep up to date with
> > new features in ndarray, and while I think ndarray has a
> > responsibility not to break backward compatibility, I do not think
> > it has to protect against new features possibly not working as
> > expected in subclasses. In particular, I think it is overly
> > complicated (and an unnecessary maintenance burden) to error out
> > if a subclass has `__getitem__` overwritten, but not `oindex`.
> >
> > For somewhat similar reasons, I'm not too keen on a new
> > `__numpy_getitem__` method; I realise it might reduce complexity
> > for some ndarray subclasses eventually, but it also is an
> > additional maintenance burden. If you really think it is useful, I
> > think it might be more helpful to define a new mixin class which
> > provides a version of all indexing methods that just call
> > `__numpy_getitem__` if that is provided by the class that uses the
> > mixin. I would *not* put it in `ndarray` proper.
>
> I disagree that multiple inheritance (i.e. with your proposed mixin
> and ndarray) is something that numpy should enshrine in its API for
> subclasses. As the maintainer of an ndarray subclass, I'd much
> rather prefer just to implement a new dunder method that older numpy
> versions will ignore.
>

Hmm, OK, so that would be a + for the method solution even without the
need of any of the extra capabilities that may be possible.

> > Indeed, the above might even be handier for subclasses, since they
> > can choose, if they wish, to implement a similar mixin for older
> > numpy versions, so that all the numpy version stuff can be moved
> > to a single location. (E.g., I can imagine doing the same for
> > `__numpy_ufunc__`.)
> >
> > Overall, my sense would be to keep your PR to just implementing
> > the various new index methods (which are great -- I still don't
> > really like the names, but sadly don't have better
> > suggestions...).
> >
> > But it might be good if others pipe in here too, in particular
> > those maintaining ndarray subclasses!
> >
> > All the best,
> >
> > Marten
From sebastian at sipsolutions.net  Tue Sep  6 02:52:39 2016
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 06 Sep 2016 08:52:39 +0200
Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)
Message-ID: <1473144759.22224.32.camel@sipsolutions.net>

On Mo, 2016-09-05 at 21:02 -0400, Marten van Kerkwijk wrote:
> p.s. Just to be clear: personally, I think we should have neither
> `__numpy_getitem__` nor a mixin; we should just get the quite
> wonderful new indexing methods!

Hehe, yes, but see MaskedArrays. They need logic to also index the
mask, so `__getitem__`, etc. actually do a lot of things. Without some
complex changes (implementing a unified method within that class or
similar), the new indexing attributes simply cannot work right. And I
think at least a warning might be in order (from the numpy side) for
such a subclass. But maybe MaskedArrays are pretty much the most
complex subclass available in that regard....
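To make the MaskedArray point concrete, here is a toy sketch (nothing
like the real np.ma implementation, just the shape of the problem):

```
import numpy as np

class ToyMaskedArray(np.ndarray):
    # A toy: data plus a boolean mask that must be indexed in lockstep.
    def __new__(cls, data, mask):
        obj = np.asarray(data).view(cls)
        obj._mask = np.asarray(mask, dtype=bool)
        return obj

    def __getitem__(self, index):
        result = super(ToyMaskedArray, self).__getitem__(index)
        if isinstance(result, ToyMaskedArray):
            # The mask is indexed exactly like the data; a plain
            # inherited arr.oindex would not know to do this, leaving
            # the result with a stale mask.
            result._mask = self._mask[index]
        return result

m = ToyMaskedArray([1., 2., 3., 4.], [False, True, False, False])
sub = m[1:3]   # a data view with a matching mask
```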
From sebastian at sipsolutions.net  Tue Sep  6 02:53:44 2016
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 06 Sep 2016 08:53:44 +0200
Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)
Message-ID: <1473144824.22224.33.camel@sipsolutions.net>

On Mo, 2016-09-05 at 18:31 -0400, Marten van Kerkwijk wrote:
> Actually, on those names: an alternative to your proposal would be to
> introduce only one new method which can do all types of indexing,
> depending on a keyword argument, i.e., something like
> ```
> def getitem(self, item, mode='outer'):
>     ...
> ```

Yeah, we can do that easily. The only disadvantage is losing the
square bracket notation.

> -- Marten

From sebastian at sipsolutions.net  Tue Sep  6 03:37:44 2016
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 06 Sep 2016 09:37:44 +0200
Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)
Message-ID: <1473147464.22224.36.camel@sipsolutions.net>

On Mo, 2016-09-05 at 18:31 -0400, Marten van Kerkwijk wrote:
> Actually, on those names: an alternative to your proposal would be to
> introduce only one new method which can do all types of indexing,
> depending on a keyword argument, i.e., something like
> ```
> def getitem(self, item, mode='outer'):
>     ...
> ```

Have I been overthinking this, eh? Just making it `__getitem__(self,
index, mode=...)` and then from `vindex` calling the subclass's
`__getitem__(self, index, mode="vector")` or so would already solve
the issue almost fully? The only things I am not quite sure about:

1. Is `__getitem__` in some way special to make this difficult (also
   considering some new ideas like allowing object[a=4])?
2. Do we currently have serious deficiencies we want to fix, and could
   maybe not fix like that?

> -- Marten

From sebastian at sipsolutions.net  Tue Sep  6 03:46:20 2016
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 06 Sep 2016 09:46:20 +0200
Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)
Message-ID: <1473147980.22224.37.camel@sipsolutions.net>

On Di, 2016-09-06 at 09:37 +0200, Sebastian Berg wrote:
> Have I been overthinking this, eh? Just making it `__getitem__(self,
> index, mode=...)` and then from `vindex` calling the subclass's
> `__getitem__(self, index, mode="vector")` or so would already solve
> the issue almost fully? The only things I am not quite sure about:
>
> 1. Is `__getitem__` in some way special to make this difficult (also
>    considering some new ideas like allowing object[a=4])?

OK; I think the C-side slot likely cannot get the kwarg, but probably
you can find a solution for that....

> 2. Do we currently have serious deficiencies we want to fix, and
>    could maybe not fix like that?
> -- Marten

From robert.kern at gmail.com  Tue Sep  6 05:57:17 2016
From: robert.kern at gmail.com (Robert Kern)
Date: Tue, 6 Sep 2016 10:57:17 +0100
Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

On Tue, Sep 6, 2016 at 8:46 AM, Sebastian Berg
<sebastian at sipsolutions.net> wrote:
> On Di, 2016-09-06 at 09:37 +0200, Sebastian Berg wrote:
> > Have I been overthinking this, eh? Just making it
> > `__getitem__(self, index, mode=...)` and then from `vindex`
> > calling the subclass's `__getitem__(self, index, mode="vector")`
> > or so would already solve the issue almost fully? The only things
> > I am not quite sure about:
> >
> > 1. Is `__getitem__` in some way special to make this difficult
> >    (also considering some new ideas like allowing object[a=4])?
>
> OK; I think the C-side slot likely cannot get the kwarg, but
> probably you can find a solution for that....

Well, the solution is to use a different name, I think.

--
Robert Kern

From mail at telenczuk.pl  Tue Sep  6 08:05:55 2016
From: mail at telenczuk.pl (Bartosz Telenczuk)
Date: Tue, 06 Sep 2016 14:05:55 +0200
Subject: [Numpy-discussion] Which NumPy/Numpy/numpy spelling?
Message-ID: <57ceb12337960_26e1ec32683b@Pct-EqAlain-Z30.notmuch>

Hi,

The general consensus seems to be in favour of using "NumPy" when
referring to the project and "numpy" as a module name. Please note
that there are currently PRs in 3 different repositories implementing
this practice:

- numpy docs: https://github.com/numpy/numpy/pull/8021
- numpy.org website: https://github.com/numpy/numpy.org/pull/5
- Scipy Lecture Notes: https://github.com/scipy-lectures/scipy-lecture-notes/pull/265

The name of the mailing list still conflicts with this practice, but
perhaps renaming it would be more hassle than it's worth. :)

There are also some instances of the "Numpy" spelling in the numpy
sources, but changing them would probably need more care and time.
If everyone agrees, the PRs could be merged together. Please review
and comment!

Thanks in advance,

Bartosz

From sebastian at sipsolutions.net  Tue Sep  6 12:26:27 2016
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 06 Sep 2016 18:26:27 +0200
Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)
Message-ID: <1473179187.16087.14.camel@sipsolutions.net>

On Di, 2016-09-06 at 10:57 +0100, Robert Kern wrote:
> > OK; I think the C-side slot likely cannot get the kwarg, but
> > probably you can find a solution for that....
>
> Well, the solution is to use a different name, I think.

Yeah :). Which goes back to `__numpy_getitem__` or so, just with a
slightly different (simpler) API. Something more along:

1. If the subclass has `__numpy_getitem__`, call it with the method
   keyword. Or just add the argument to `__getitem__`, which should
   likely work as well.
2. Implement `ndarray.__numpy_getitem__`, which takes the method
   keyword; subclasses would call it instead of `ndarray.__getitem__`
   as their base-class call.

The option I first mentioned would be similar, but allows giving a bit
of extra information to the subclass, which may be useful. But if no
one actually needs that information (this information would be things
available after inspection of the indexing object), it just adds quite
a bit of code and thus a maintenance burden.

Such a new method could of course do things slightly differently (such
as the scalar cases; I really have to understand that wrapping thing,
and I am always worried about the array-of-array case as well. Or that
annoying setitem-calls-getitem trick. Or maybe not wrap the array at
all.).

- Sebastian
From shoyer at gmail.com  Tue Sep  6 13:10:19 2016
From: shoyer at gmail.com (Stephan Hoyer)
Date: Tue, 6 Sep 2016 10:10:19 -0700
Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

On Mon, Sep 5, 2016 at 6:02 PM, Marten van Kerkwijk
<m.h.vankerkwijk at gmail.com> wrote:
> p.s. Just to be clear: personally, I think we should have neither
> `__numpy_getitem__` nor a mixin; we should just get the quite
> wonderful new indexing methods!

+1

I don't maintain ndarray subclasses (I prefer composition), but I
don't think it's too difficult to require implementing vindex and
oindex properties from scratch.

Side note: I would prefer the more verbose "legacy_index" to "lindex".
We really want to discourage this one, and two new abbreviations are
bad enough.

From nathan12343 at gmail.com  Tue Sep  6 13:11:52 2016
From: nathan12343 at gmail.com (Nathan Goldbaum)
Date: Tue, 6 Sep 2016 12:11:52 -0500
Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

On Tuesday, September 6, 2016, Stephan Hoyer <shoyer at gmail.com> wrote:
> I don't maintain ndarray subclasses (I prefer composition), but I
> don't think it's too difficult to require implementing vindex and
> oindex properties from scratch.
>
> Side note: I would prefer the more verbose "legacy_index" to
> "lindex". We really want to discourage this one, and two new
> abbreviations are bad enough.

Very much agreed.

From perimosocordiae at gmail.com  Tue Sep  6 13:23:03 2016
From: perimosocordiae at gmail.com (CJ Carey)
Date: Tue, 6 Sep 2016 12:23:03 -0500
Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

I'm also in the non-subclass array-like camp, and I'd love to just
write vindex and oindex methods, then have:

    def __getitem__(self, idx):
        return np.dispatch_getitem(self, idx)

where "dispatch_getitem" does some basic argument checking and calls
either vindex or oindex as appropriate. Maybe that fits better as a
mixin; I don't really mind either way.

On Tue, Sep 6, 2016 at 12:11 PM, Nathan Goldbaum
<nathan12343 at gmail.com> wrote:
> On Tuesday, September 6, 2016, Stephan Hoyer <shoyer at gmail.com> wrote:
> > On Mon, Sep 5, 2016 at 6:02 PM, Marten van Kerkwijk
> > <m.h.vankerkwijk at gmail.com> wrote:
> > > p.s.
> > > Just to be clear: personally, I think we should have neither
> > > `__numpy_getitem__` nor a mixin; we should just get the quite
> > > wonderful new indexing methods!
> >
> > +1
> >
> > I don't maintain ndarray subclasses (I prefer composition), but I
> > don't think it's too difficult to require implementing vindex and
> > oindex properties from scratch.
> >
> > Side note: I would prefer the more verbose "legacy_index" to
> > "lindex". We really want to discourage this one, and two new
> > abbreviations are bad enough.
>
> Very much agreed.

From m.h.vankerkwijk at gmail.com  Tue Sep  6 13:56:12 2016
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Tue, 6 Sep 2016 13:56:12 -0400
Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

I'd love to solve it with `__getitem__`... Since most subclasses will
have defined it with just a single argument, calling it from `oindex`
with an extra mode argument set will properly fail, so good in that
sense (although one had better ensure a useful error message...).

Another option would be to store the mode as an additional part in the
index tuple (perhaps as a dict), i.e., in python form do something
like:
```
def __getitem__(self, index):
    if isinstance(index, tuple) and isinstance(index[-1], dict):
        *index, kwargs = index
    else:
        kwargs = {}
    return self._internal_indexer(index, **kwargs)
```
This way, if all a subclass does is call `super(SubClass,
self).__getitem__(index)` (as does `Quantity`; it does not look at the
index at all), it would work automagically.

Indeed, one could then decide the type of index even by regular
slicing, in the following way:
```
array[:, 10, {'mode': 'vector'}]
```
for which one could of course have a special token (like `np.newaxis`
for `None`), so that it would be something like:
```
array[:, 10, np.vector_index]
```

However, looking at the above, I fear this is too baroque even by my
standards!

-- Marten

From m.h.vankerkwijk at gmail.com  Tue Sep  6 13:59:14 2016
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Tue, 6 Sep 2016 13:59:14 -0400
Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

In a separate message, since perhaps a little less looney: yet another
option would be to work by analogy with np.ix_ and define pre-dispatch
index preparation routines np.ox_ and np.vx_ (say), which would be
used as in:
```
array[np.ox_[:, 10]]   -- or --   array[np.vx_[:, 10]]
```
This could work if those functions each return something appropriate
for the legacy indexer or, if that is not possible, a specific
subclass of tuple as a marker that gets interpreted further up.

In the end, though, probably also too complicated. It may remain best
to simply implement the new methods instead and keep it at that!

-- Marten
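For readers skimming the thread, the distinction these names encode
already exists for plain arrays via np.ix_; a quick, runnable
illustration:

```
import numpy as np

a = np.arange(12).reshape(3, 4)
rows = np.array([0, 2])
cols = np.array([1, 3])

# "vector" (current fancy) indexing pairs the indices up:
# a[0, 1] and a[2, 3] -> shape (2,)
print(a[rows, cols])

# "outer" indexing takes the cross product of the index arrays,
# selecting a 2x2 sub-block -> shape (2, 2)
print(a[np.ix_(rows, cols)])
```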
From sebastian at sipsolutions.net  Tue Sep  6 17:23:34 2016
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 06 Sep 2016 23:23:34 +0200
Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)
Message-ID: <1473197014.4429.16.camel@sipsolutions.net>

On Di, 2016-09-06 at 10:10 -0700, Stephan Hoyer wrote:
> On Mon, Sep 5, 2016 at 6:02 PM, Marten van Kerkwijk
> <m.h.vankerkwijk at gmail.com> wrote:
> > p.s. Just to be clear: personally, I think we should have neither
> > `__numpy_getitem__` nor a mixin; we should just get the quite
> > wonderful new indexing methods!
>
> +1
>
> I don't maintain ndarray subclasses (I prefer composition), but I
> don't think it's too difficult to require implementing vindex and
> oindex properties from scratch.
>

Well, in some sense the reason I brought it up is masked arrays. They
have quite a bit of code in `__getitem__`, including doing an
identical indexing operation to the mask. I suppose you can always
solve it with code such as:

    def __special_getitem__(self, index, method=None):
        # do magic
        if method is None:
            # (not sure if this gets passed to the base class)
            res = super().__getitem__(index)
        elif method == "outer":
            res = super().oindex[index]
        # ...
        # more magic.
        return res

    def __getitem__(self, index):
        return self.__special_getitem__(index)

    @property
    def oindex(self):
        # define the class elsewhere, I guess
        class _indexer(object):
            def __init__(self, arr):
                self.arr = arr
            def __getitem__(self, index):
                return self.arr.__special_getitem__(index,
                                                    method='outer')
        return _indexer(self)

Though I am not 100% sure without testing, a superclass method that
understands the `method` kwarg might work better. We can teach numpy
to pass in that `method` to getitem so that you don't have to
implement that `_indexer` class for the special attribute. I first
started to do that for MaskedArrays, and while it is not hard, it
seemed a bit tedious.

If we move this to a method with a new name, a slight advantage would
be that other oddities could maybe be removed. By now it seems to me
that nobody really needs the extra information (i.e. preprocessing
information of the indexing tuple)...? I thought it could be useful to
know things about the result, but I suppose you can check most things
(view vs. no view; field access; scalar access) afterwards as well?

> Side note: I would prefer the more verbose "legacy_index" to
> "lindex". We really want to discourage this one, and two new
> abbreviations are bad enough.

Sounds good to me.

From sebastian at sipsolutions.net  Tue Sep  6 17:29:55 2016
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 06 Sep 2016 23:29:55 +0200
Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)
Message-ID: <1473197395.4429.22.camel@sipsolutions.net>

On Di, 2016-09-06 at 13:59 -0400, Marten van Kerkwijk wrote:
> In a separate message, since perhaps a little less looney: yet
> another option would be to work by analogy with np.ix_ and define
> pre-dispatch index preparation routines np.ox_ and np.vx_ (say),
> which would be used as in:
> ```
> array[np.ox_[:, 10]]   -- or --   array[np.vx_[:, 10]]
> ```
> This could work if those functions each return something appropriate
> for the legacy indexer or, if that is not possible, a specific
> subclass of tuple as a marker that gets interpreted further up.

Sure, it would be a solution, but I am not sure it is any better
implementation-wise than just passing an extra argument. As for the
syntax for plain arrays, I am not convinced, to be honest.

- Sebastian

> In the end, though, probably also too complicated. It may remain best
> to simply implement the new methods instead and keep it at that!
>
> -- Marten

From chris.barker at noaa.gov  Tue Sep  6 18:42:03 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Tue, 6 Sep 2016 15:42:03 -0700
Subject: [Numpy-discussion] State-of-the-art to use a C/C++ library from Python

On Fri, Sep 2, 2016 at 1:16 AM, Peter Creasey wrote:
> > I'm not quite sure which approach is state-of-the-art as of 2016.
> > How would you do it if you had to make a C/C++ library available in
> > Python right now?
> >
> > In my case, I have a C library with some scientific functions on
> > matrices and vectors. You will typically call a few functions to
> > configure the computation, then hand over some pointers to existing
> > buffers containing vector data, then start the computation, and
> > finally read back the data. The library also can use MPI to
> > parallelize.

Cython works really well for this.

ctypes is a better option if you have a "black box" shared lib with a
couple of functions you want to call.

Cython works better if you want to write a little "thicker" wrapper
around your C code -- i.e. it may do a scalar computation, and you
want to apply it to an entire numpy array, at C speed.

Either would work in this case, but I like Cython better, as long as I
don't have compilation issues.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
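To make the ctypes route concrete, a sketch along the lines Peter
described; the shared library name and the C signature here are
invented for illustration:

```
import ctypes

import numpy as np
from numpy.ctypeslib import load_library, ndpointer

# Hypothetical shared library exposing:
#     void scale(double *x, size_t n, double factor);
lib = load_library('libmylib', '.')
lib.scale.restype = None
lib.scale.argtypes = [ndpointer(np.float64, flags='C_CONTIGUOUS'),
                      ctypes.c_size_t,
                      ctypes.c_double]

x = np.ascontiguousarray(np.arange(5.0))
lib.scale(x, x.size, 2.0)   # modifies x's buffer in place
```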
From paul.leopardi at gmail.com  Wed Sep  7 06:45:11 2016
From: paul.leopardi at gmail.com (Paul Leopardi)
Date: Wed, 07 Sep 2016 20:45:11 +1000
Subject: [Numpy-discussion] Has anyone ever used f2py with Cray ftn? How?
Message-ID: <2C2217DC-711A-4667-8248-89F295D6E9B3@gmail.com>

Hi,

Has anyone ever used f2py with the Cray ftn compiler driver? The
compiler driver can drive the Cray, GNU and Intel Fortran compilers,
including the necessary libraries, via loaded modules.

Assuming that this has never been done, or that the existing code to
do this is unavailable: to use Cray ftn with f2py, do I need to change
the source code under numpy/distutils/fcompiler, as suggested by this
blog post?
https://gehrcke.de/2014/02/building-numpy-and-scipy-with-intel-compilers-and-intel-mkl-on-a-64-bit-machine/

I would need to create a new file, cray.py, under this directory,
containing classes for each of the Cray, GNU and Intel compilers as
invoked by the ftn driver. What other files would I need to change?
How would I package tests? How would I contribute the resulting code
to NumPy?

All the best, Paul

--
Paul Leopardi
https://sites.google.com/site/paulleopardi/

From solipsis at pitrou.net  Wed Sep  7 09:35:39 2016
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 7 Sep 2016 15:35:39 +0200
Subject: [Numpy-discussion] State-of-the-art to use a C/C++ library from Python
Message-ID: <20160907153539.52373a9d@fsol>

On Fri, 2 Sep 2016 02:16:35 -0700 Nathaniel Smith wrote:
> > Depending on how minimal and universal you want to keep things, I
> > use the ctypes approach quite often, i.e. treat your numpy inputs
> > and outputs as arrays of doubles etc using the ndpointer(...)
> > syntax. I find it works well if you have a small number of
> > well-defined functions (not too many options) which are numerically
> > very heavy. With this approach I usually wrap each method in python
> > to check the inputs for contiguity, pass in the sizes etc. and
> > allocate the numpy array for the result.
>
> FWIW, the broader Python community seems to have largely deprecated
> ctypes in favor of cffi.

I'm not sure about "largely deprecated". For sure, that's the notion
spread by a number of people.

Regards,

Antoine.

From sebastian at sipsolutions.net  Wed Sep  7 12:02:59 2016
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 07 Sep 2016 18:02:59 +0200
Subject: [Numpy-discussion] New iterator API (nditer): Overlap detection in NumPy
Message-ID: <1473264179.7942.9.camel@sipsolutions.net>

Hi all,

Pauli just opened a nice pull request [1] to add overlap detection to
the new iterator; this means adding a new iterator flag:

`NPY_ITER_COPY_IF_OVERLAP`

If passed to the iterator (also exposed in python), the iterator will
copy the operands such that reading and writing should only occur for
identical operands. For now this is implemented by always copying the
output/writable operand (this could be improved, though, so I would
not say it is fixed API).

Since adding this flag is new API, please feel free to suggest other
names/approaches or even decline the change ;).

This is basically a first step, which should be easily followed by
adding overlap detection to ufuncs, removing traps such as the well
(or not so well) known `a += a.T`. Other parts of numpy may follow one
by one.

The work is based on his older awesome new memory overlap detection
implementation.

If there are no comments, I will probably merge it very soon, so we
can look at the follow-up things.

- Sebastian

[1] https://github.com/numpy/numpy/pull/8026
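For anyone who has not hit the trap mentioned here, a small
demonstration of why overlap matters:

```
import numpy as np

a = np.arange(9.).reshape(3, 3)
expected = a + a.T   # out-of-place: always correct

# In-place, `a.T` is just a view of `a`, so without overlap detection
# parts of the right-hand side can be read only after they have
# already been overwritten, silently corrupting the result.
a += a.T
```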
From njs at pobox.com  Wed Sep  7 12:22:24 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Wed, 7 Sep 2016 09:22:24 -0700
Subject: [Numpy-discussion] New iterator API (nditer): Overlap detection in NumPy

On Sep 7, 2016 9:03 AM, "Sebastian Berg" <sebastian at sipsolutions.net> wrote:
>
> Hi all,
>
> Pauli just opened a nice pull request [1] to add overlap detection to
> the new iterator; this means adding a new iterator flag:
>
> `NPY_ITER_COPY_IF_OVERLAP`
>
> If passed to the iterator (also exposed in python), the iterator will
> copy the operands such that reading and writing should only occur for
> identical operands. For now this is implemented by always copying the
> output/writable operand (this could be improved, though, so I would
> not say it is fixed API).

I wonder if there is any way we can avoid the flag, and just make this
happen automatically when appropriate? nditer has too many
"unbreak-me" flags already.

Are there any cases where we *don't* want the copy-if-overlap
behavior? Traditionally overlap has triggered undefined behavior, so
there's no backcompat issue, right?

-n

From sebastian at sipsolutions.net  Wed Sep  7 12:36:24 2016
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 07 Sep 2016 18:36:24 +0200
Subject: [Numpy-discussion] New iterator API (nditer): Overlap detection in NumPy
Message-ID: <1473266184.7942.21.camel@sipsolutions.net>

On Mi, 2016-09-07 at 09:22 -0700, Nathaniel Smith wrote:
> I wonder if there is any way we can avoid the flag, and just make
> this happen automatically when appropriate? nditer has too many
> "unbreak-me" flags already.
>
> Are there any cases where we *don't* want the copy-if-overlap
> behavior? Traditionally overlap has triggered undefined behavior, so
> there's no backcompat issue, right?

Puh, I remember weird abuses that sometimes stopped working. Even just
adding it to ufuncs might break some weird cases in someone's
script....

Whether or not we can just make it the default might be worth thinking
about. What do downstream projects that use the API think? My guess is
that would be projects such as numexpr, numba, or I think theano?

Maybe another approach is to think about some other way to make good
defaults for the iterator easier/saner. Heck, I wonder: if we
defaulted to things like "zero size ok" and warned about it, would
anyone even notice, except as in "oh, I should make it zero-size
ok" ;).

- Sebastian
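In Python, usage of the new flag would presumably look roughly like
the following sketch (the spelling 'copy_if_overlap' mirrors the C
flag in the PR and may differ in the final API):

```
import numpy as np

a = np.arange(9.).reshape(3, 3)

it = np.nditer([a, a.T], flags=['copy_if_overlap'],
               op_flags=[['readwrite'], ['readonly']])
for x, y in it:
    x[...] = x + y   # safe: overlapping operands were copied up front

del it  # make sure any delayed write-back to `a` has happened
```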
From pav at iki.fi  Wed Sep  7 14:16:55 2016
From: pav at iki.fi (Pauli Virtanen)
Date: Wed, 7 Sep 2016 18:16:55 +0000 (UTC)
Subject: [Numpy-discussion] New iterator API (nditer): Overlap detection in NumPy

Wed, 07 Sep 2016 09:22:24 -0700, Nathaniel Smith wrote:
[clip]
> I wonder if there is any way we can avoid the flag, and just make
> this happen automatically when appropriate? nditer has too many
> "unbreak-me" flags already.
>
> Are there any cases where we *don't* want the copy-if-overlap
> behavior? Traditionally overlap has triggered undefined behavior, so
> there's no backcompat issue, right?

I didn't put it on by default, because of backward compatibility and
side effects that break things.

On side effects: there are some bugs in ufunc code that need fixing if
the flag is turned on (wheremask code breaks, and ufuncs write to
wrong output arrays). Moreover, copying write operands with
updateifcopy marks the original arrays as read-only, until the copied
array is decrefed. There may also be other side effects that are not
so obvious.

The PR is not mergeable if the flag would be on by default --- that
requires inspecting all the uses of the iterator in the numpy codebase
and making sure there's no weird stuff done. I'm not sure how much 3rd
party code is using the iterator, but I'm a bit worried that copies
break assumptions there too.

It might be possible to turn it on by default for operands with COPY
or UPDATEIFCOPY flags --- but I'm not sure if that's helpful (now
you'd need to set the flags on all input operands).

-- Pauli Virtanen

From andrea at andreabedini.com  Wed Sep  7 21:38:44 2016
From: andrea at andreabedini.com (Andrea Bedini)
Date: Thu, 8 Sep 2016 09:38:44 +0800
Subject: [Numpy-discussion] Which NumPy/Numpy/numpy spelling?
Message-ID: <833AA6E1-0619-48CA-8C7B-0248C78A3780@andreabedini.com>

> On 6 Sep 2016, at 8:05 PM, Bartosz Telenczuk wrote:
>
> The name of the mailing list still conflicts with this practice, but
> perhaps renaming it would be more hassle than it's worth. :)

The footer appended by the mailing list shows that the name is right;
only the subject tag is wrong. It's trivial to fix.

> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion

Best wishes,
Andrea

--
Andrea Bedini
@andreabedini, http://www.andreabedini.com

See the impact of my research at https://impactstory.org/AndreaBedini
use https://keybase.io/andreabedini to send me encrypted messages
Key fingerprint = 17D5 FB49 FA18 A068 CF53 C5C2 9503 64C1 B2D5 9591

From mail at telenczuk.pl  Thu Sep  8 04:12:01 2016
From: mail at telenczuk.pl (Bartosz Telenczuk)
Date: Thu, 08 Sep 2016 10:12:01 +0200
Subject: [Numpy-discussion] Which NumPy/Numpy/numpy spelling?
Message-ID: <57d11d5139cb3_34a516525d8e1@Pct-EqAlain-Z30.notmuch>

> The footer appended by the mailing list shows that the name is right;
> only the subject tag is wrong. It's trivial to fix.

You are probably right, but I wouldn't like to mess with people's mail
filters (some of which may depend on the subject tag).

Cheers,

Bartosz

From jorisvandenbossche at gmail.com  Thu Sep  8 06:11:59 2016
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Thu, 8 Sep 2016 12:11:59 +0200
Subject: [Numpy-discussion] ANN: pandas v0.19.0rc1 - RELEASE CANDIDATE

Hi,

I'm pleased to announce the availability of the first release
candidate of Pandas 0.19.0. Please try this RC and report any issues
at the pandas issue tracker.

The release candidate can be installed with conda from our development
channel (builds for osx-64, linux-64 and win-64 are available for
Python 2.7, 3.4 and 3.5):

    conda install -c pandas pandas=0.19.0rc1

or with pip from PyPI (wheels are available):

    pip install --pre pandas==0.19.0rc1

THIS IS NOT A PRODUCTION RELEASE

This is a major release from 0.18.1 and includes a number of API
changes, several new features, enhancements, and performance
improvements along with a large number of bug fixes. Highlights
include:

- New method merge_asof for asof-style time-series joining
- The .rolling() method is now time-series aware
- read_csv now supports parsing Categorical data
- A function union_categorical has been added for combining
  categoricals
- PeriodIndex now has its own period dtype, and changed to be more
  consistent with other Index classes
- Sparse data structures gained enhanced support of int and bool
  dtypes
- Comparison operations with Series no longer ignore the index; see
  the whatsnew docs for an overview of the API changes
- Introduction of a pandas development API for utility functions
- Deprecation of Panel4D and PanelND. We recommend to represent these
  types of n-dimensional data with the xarray package
- Removal of the previously deprecated modules pandas.io.data,
  pandas.io.wb, pandas.tools.rplot

See the Whatsnew file for more information. Please report any issues
on the issue tracker.

A big thanks to all contributors!

Joris

From morph at debian.org  Fri Sep  9 03:20:01 2016
From: morph at debian.org (Sandro Tosi)
Date: Fri, 9 Sep 2016 08:20:01 +0100
Subject: [Numpy-discussion] Numpy 1.11.2

What is the status for this? I checked on GH and
https://github.com/numpy/numpy/milestone/43 seems to report no issues
pending. The reason I'm asking is that I still have to package 1.11.1
for Debian, but I don't want to do all the work and then have you
release a new version the next day (oh, dear Murphy :) )

On Mon, Aug 15, 2016 at 1:12 AM, Charles R Harris wrote:
> "Events, dear boy, events" ;) There were a couple of bugs that turned
> up at the last moment that needed fixing. At the moment there are
> two, possibly three, bugs that need finishing off.
> A fix for compilation on PPC running RHEL 7.2 (done, but not verified)
> Roll back the Numpy reload error: more than one project was reloading.
> Maybe fix the crash for quicksort of object arrays with a bogus
> comparison.
>
> Chuck
>
> On Sun, Aug 14, 2016 at 11:11 AM, Sandro Tosi wrote:
> > hey there, what happened here? do you still plan to release a
> > 1.11.2rc1 soon?
> >
> > On Wed, Aug 3, 2016 at 9:09 PM, Charles R Harris wrote:
> > > Hi All,
> > >
> > > I would like to release Numpy 1.11.2rc1 this weekend. It will
> > > contain a few small fixes and enhancements for windows and the
> > > last Scipy release. If there are any pending PRs that you think
> > > should go in or be backported for this release, please speak up.
> > >
> > > Chuck

--
Sandro "morph" Tosi
My website: http://sandrotosi.me/
Me at Debian: http://wiki.debian.org/SandroTosi
G+: https://plus.google.com/u/0/+SandroTosi

From charlesr.harris at gmail.com  Fri Sep  9 10:26:20 2016
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 9 Sep 2016 08:26:20 -0600
Subject: [Numpy-discussion] Numpy 1.11.2

On Fri, Sep 9, 2016 at 1:20 AM, Sandro Tosi wrote:
> What is the status for this? I checked on GH and
> https://github.com/numpy/numpy/milestone/43 seems to report no issues
> pending. The reason I'm asking is that I still have to package 1.11.1
> for Debian, but I don't want to do all the work and then have you
> release a new version the next day (oh, dear Murphy :) )

I'm planning on putting out 1.11.2rc1 this weekend, then 1-2 weeks to
the final.

Chuck

From charlesr.harris at gmail.com  Fri Sep  9 10:46:55 2016
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 9 Sep 2016 08:46:55 -0600
Subject: [Numpy-discussion] gmane

Hi All,

Looks like gmane is going down. Does anyone know of an alternative for
searching and referencing the NumPy mail archives?

Chuck

From nathan12343 at gmail.com  Fri Sep  9 10:50:58 2016
From: nathan12343 at gmail.com (Nathan Goldbaum)
Date: Fri, 9 Sep 2016 09:50:58 -0500
Subject: [Numpy-discussion] gmane

Hi Chuck,

Since that blog post, Gmane is under new management and will continue
to be available: http://home.gmane.org

Nathan

On Friday, September 9, 2016, Charles R Harris wrote:
> Hi All,
>
> Looks like gmane is going down. Does anyone know of an alternative
> for searching and referencing the NumPy mail archives?
>
> Chuck
From sebastian at sipsolutions.net  Sat Sep 10 06:01:41 2016
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Sat, 10 Sep 2016 12:01:41 +0200
Subject: [Numpy-discussion] Continued New Indexing Methods Revival #N (subclasses!)
Message-ID: <1473501701.7610.9.camel@sipsolutions.net>

Hi all,

from the discussion, I was thinking maybe something like this:

    class B():
        def __numpy_getitem__(self, index, indexing_method="plain"):
            # do magic.
            return super().__numpy_getitem__(
                index, indexing_method=indexing_method)

as new API. There are some issues, though. An old subclass may define
`__getitem__`. Now the behaviour that would seem nice to me is:

1. No new attribute (no `__numpy_getitem__`) and also no
   `__getitem__`/`__setitem__`: should just work.
2. No new attribute but old attributes defined: should at least give a
   warning (or an error) when using the new attributes, since the
   behaviour might be buggy.
3. `__numpy_getitem__` defined: will channel all indexing through it
   (except maybe some edge cases in python 2). Best, also avoid that
   use-getitem-in-setitem trick.... If you define both (which might
   make sense for some edge-case stuff), you should just channel it
   through this yourself.

Now the issue I have is that for 1. and 2. to work correctly, I need
to know which methods are overloaded by the subclass. Checking is a
bit tedious, and the method I hacked first for getitem and setitem
does not work for a normal method.

Can anyone think of a nicer way to do this trick that does not require
quite as much hackery? Or is there an easy way to do the overloading
check?

- Sebastian
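A rough sketch of such a check (none of this is existing NumPy API;
the hook name and the three categories are the ones from the proposal
above):

```
import numpy as np

def classify_subclass(cls):
    # 3. supports everything: the new hook is defined
    if getattr(cls, '__numpy_getitem__', None) is not None:
        return "supports everything"
    # 2. may need fixup: old indexing methods were overridden
    if (cls.__getitem__ is not np.ndarray.__getitem__ or
            cls.__setitem__ is not np.ndarray.__setitem__):
        return "may need fixup"
    # 1. requires no fixup: plain inherited indexing
    return "requires no fixup"
```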
From sebastian at sipsolutions.net  Sat Sep 10 09:49:10 2016
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Sat, 10 Sep 2016 15:49:10 +0200
Subject: [Numpy-discussion] Continued New Indexing Methods Revival #N (subclasses!)
Message-ID: <1473515350.11703.1.camel@sipsolutions.net>

On Sa, 2016-09-10 at 12:01 +0200, Sebastian Berg wrote:
> Hi all,
>
> from the discussion, I was thinking maybe something like this:
>
>     class B():
>         def __numpy_getitem__(self, index, indexing_method="plain"):
>             # do magic.
>             return super().__numpy_getitem__(
>                 index, indexing_method=indexing_method)
>
> as new API. There are some issues, though. An old subclass may define
> `__getitem__`. Now the behaviour that would seem nice to me is:
>
> 1. No new attribute (no `__numpy_getitem__`) and also no
>    `__getitem__`/`__setitem__`: should just work.
> 2. No new attribute but old attributes defined: should at least give
>    a warning (or an error) when using the new attributes, since the
>    behaviour might be buggy.
> 3. `__numpy_getitem__` defined: will channel all indexing through it
>    (except maybe some edge cases in python 2). Best, also avoid that
>    use-getitem-in-setitem trick.... If you define both (which might
>    make sense for some edge-case stuff), you should just channel it
>    through this yourself.

Maybe shorter: I would like to know if a subclass

1. requires no fixup,
2. may need fixup,
3. supports everything.

And I am not sure how to approach this.

> Now the issue I have is that for 1. and 2. to work correctly, I need
> to know which methods are overloaded by the subclass. Checking is a
> bit tedious, and the method I hacked first for getitem and setitem
> does not work for a normal method.
>
> Can anyone think of a nicer way to do this trick that does not
> require quite as much hackery? Or is there an easy way to do the
> overloading check?
>
> - Sebastian

From sebastian at sipsolutions.net  Sun Sep 11 09:07:09 2016
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Sun, 11 Sep 2016 15:07:09 +0200
Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)
Message-ID: <1473599229.23942.2.camel@sipsolutions.net>

On Di, 2016-09-06 at 13:59 -0400, Marten van Kerkwijk wrote:
> In a separate message, since perhaps a little less looney: yet
> another option would be to work by analogy with np.ix_ and define
> pre-dispatch index preparation routines np.ox_ and np.vx_ (say),
> which would be used as in:
> ```
> array[np.ox_[:, 10]]   -- or --   array[np.vx_[:, 10]]
> ```
> This could work if those functions each return something appropriate
> for the legacy indexer or, if that is not possible, a specific
> subclass of tuple as a marker that gets interpreted further up.

A specific subclass of tuple.... Part of me thinks this is horrifying,
but it actually would solve some of the subclassing issues if
`arr.vindex[...]` could end up calling `__getitem__` with a slightly
special indexing tuple value.

I simply can't quite find the end of the subclassing issues. We have
tests for things like masked arrays correctly calling the `_data`
subclass, but if the `_data` subclass does not implement the new
method, numpy would have to run in circles (or something)....

- Sebastian

> In the end, though, probably also too complicated. It may remain best
> to simply implement the new methods instead and keep it at that!
>
> -- Marten

From m.h.vankerkwijk at gmail.com  Sun Sep 11 11:19:33 2016
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Sun, 11 Sep 2016 11:19:33 -0400
Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

There remains the option to just let subclasses deal with new ndarray
features...
Certainly, for `Quantity`, I'll quite happily do that. And if it
allows the ndarray code to remain simple and efficient, it is probably
the best solution. -- Marten

From sebastian at sipsolutions.net  Sun Sep 11 12:28:51 2016
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Sun, 11 Sep 2016 18:28:51 +0200
Subject: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)
Message-ID: <1473611331.23942.4.camel@sipsolutions.net>

On So, 2016-09-11 at 11:19 -0400, Marten van Kerkwijk wrote:
> There remains the option to just let subclasses deal with new ndarray
> features... Certainly, for `Quantity`, I'll quite happily do that.
> And if it allows the ndarray code to remain simple and efficient, it
> is probably the best solution. -- Marten

Maybe, but I can't quite shake the feeling that we would see a lot of
annoying bugs for subclasses that don't adapt very quickly.

From pav at iki.fi  Sun Sep 11 17:21:13 2016
From: pav at iki.fi (Pauli Virtanen)
Date: Sun, 11 Sep 2016 21:21:13 +0000 (UTC)
Subject: [Numpy-discussion] New iterator APIs (nditer / MapIter): Overlap detection in NumPy

Hi,

In the end, some further API additions turned out to be needed:

* NPY_ITER_COPY_IF_OVERLAP, NPY_ITER_OVERLAP_NOT_SAME flags for
  NpyIter_New.

* A new API function PyArray_MapIterArrayCopyIfOverlap, as ufunc.at
  needs to check overlaps for index arrays before constructing
  iterators, and the parsing is done in multiarray.

Continuation here: https://github.com/numpy/numpy/pull/8043

Wed, 07 Sep 2016 18:02:59 +0200, Sebastian Berg wrote:
> Hi all,
>
> Pauli just opened a nice pull request [1] to add overlap detection to
> the new iterator; this means adding a new iterator flag:
>
> `NPY_ITER_COPY_IF_OVERLAP`
>
> If passed to the iterator (also exposed in python), the iterator will
> copy the operands such that reading and writing should only occur for
> identical operands. For now this is implemented by always copying the
> output/writable operand (this could be improved, though, so I would
> not say it is fixed API).
>
> Since adding this flag is new API, please feel free to suggest other
> names/approaches or even decline the change ;).
>
> This is basically a first step, which should be easily followed by
> adding overlap detection to ufuncs, removing traps such as the well
> (or not so well) known `a += a.T`. Other parts of numpy may follow
> one by one.
>
> The work is based on his older awesome new memory overlap detection
> implementation.
>
> If there are no comments, I will probably merge it very soon, so we
> can look at the follow-up things.
>
> - Sebastian
>
> [1] https://github.com/numpy/numpy/pull/8026
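As context for why ufunc.at needs its own overlap check: the index or
value arrays may themselves share memory with the operand being
modified, for example:

```
import numpy as np

a = np.arange(5.)
idx = np.array([0, 0, 2])

# Unbuffered in-place addition; repeated indices accumulate:
np.add.at(a, idx, 1.0)    # a[0] += 1 twice, a[2] += 1 once

# Here the values are a view of `a` itself, so the operation reads
# from memory it is also writing to -- the kind of case the new
# PyArray_MapIterArrayCopyIfOverlap check has to handle:
np.add.at(a, idx, a[:3])
```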
> - Sebastian
>
> [1] https://github.com/numpy/numpy/pull/8026
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion

From tom.kooij at gmail.com Mon Sep 12 04:01:30 2016
From: tom.kooij at gmail.com (Tom Kooij)
Date: Mon, 12 Sep 2016 10:01:30 +0200
Subject: [Numpy-discussion] ANN: PyTables 3.3.0
Message-ID:

===========================
 Announcing PyTables 3.3.0
===========================

We are happy to announce PyTables 3.3.0.

What's new
==========

- Single codebase Python 2 and 3 support (PR #493).
- Internal Blosc version updated to 1.11.1 (closes :issue:`541`).
- Full BitShuffle support for new Blosc versions (>= 1.8).
- It is now possible to remove all rows from a table.
- It is now possible to read reference types by dereferencing them as numpy array of objects (closes :issue:`518` and :issue:`519`). Thanks to Ehsan Azar.
- Fixed Windows 32 and 64-bit builds.

In case you want to know in more detail what has changed in this version, please refer to: http://www.pytables.org/release_notes.html

You can install it via pip or download a source package with generated PDF and HTML docs from: https://github.com/PyTables/PyTables/releases/tag/v3.3.0

For an online version of the manual, visit: http://www.pytables.org/usersguide/index.html

What it is?
===========

PyTables is a library for managing hierarchical datasets, designed to efficiently cope with extremely large amounts of data, with support for full 64-bit file addressing. PyTables runs on top of the HDF5 library and the NumPy package to achieve maximum throughput and convenient use. PyTables includes OPSI, a new indexing technology that allows data lookups in tables exceeding 10 gigarows (10**10 rows) in less than a tenth of a second.

Resources
=========

About PyTables: http://www.pytables.org
About the HDF5 library: http://hdfgroup.org/HDF5/
About NumPy: http://numpy.scipy.org/

Acknowledgments
===============

Thanks to the many users who provided feature improvements, patches, bug reports, support and suggestions. See the ``THANKS`` file in the distribution package for an (incomplete) list of contributors. Most especially, a lot of kudos go to the HDF5 and NumPy makers. Without them, PyTables simply would not exist.

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

----

**Enjoy data!**

-- The PyTables Developers

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sebastian at sipsolutions.net Mon Sep 12 05:31:07 2016
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Mon, 12 Sep 2016 11:31:07 +0200
Subject: [Numpy-discussion] New iterator APIs (nditer / MapIter): Overlap detection in NumPy
In-Reply-To: References: <1473264179.7942.9.camel@sipsolutions.net>
Message-ID: <1473672667.2681.4.camel@sipsolutions.net>

On So, 2016-09-11 at 21:21 +0000, Pauli Virtanen wrote:
> Hi,
>
> In the end some further API additions turn out to be needed:
>

Very nice :).

> * NPY_ITER_COPY_IF_OVERLAP, NPY_ITER_OVERLAP_NOT_SAME
>   flags for NpyIter_New.
>
> * New API function PyArray_MapIterArrayCopyIfOverlap,
>   as ufunc.at needs to check overlaps for index arrays
>   before constructing iterators, and the parsing is done
>   in multiarray.

I think here Nathaniel's point might be right.
It could be we can assume that copying is always fine, there is probably only one or two downstream projects using this function, plus it seems harder to create abusing structures that actually do something useful. It was only exposed for usage in `ufunc.at` if I remember right. I know theano uses it though, but not sure about anyone else, maybe numba. On the other hand.... It is not the worst API clutter in history. > > Continuation here: https://github.com/numpy/numpy/pull/8043 > > > > Wed, 07 Sep 2016 18:02:59 +0200, Sebastian Berg kirjoitti: > > > > > Hi all, > > > > Pauli just opened a nice pull request [1] to add overlap detection > > to > > the new iterator, this means adding a new iterator flag: > > > > `NPY_ITER_COPY_IF_OVERLAP` > > > > If passed to the iterator (also exposed in python), the iterator > > will > > copy the operands such that reading and writing should only occur > > for > > identical operands. For now this is implemented by always copying > > the > > output/writable operand (this could be improved though, so I would > > not > > say its fixed API). > > > > Since adding this flag is new API, please feel free to suggest > > other > > names/approaches or even decline the change ;). > > > > > > This is basically a first step, which should be easily followed by > > adding overlap detection to ufuncs, removing traps such as the well > > (or > > not so well known) `a += a.T`. Other parts of numpy may follow one > > by > > one. > > > > The work is based on his older awesome new memory overlap detection > > implementation. > > > > If there are no comments, I will probably merge it very soon, so we > > can > > look at the follow up things. > > > > - Sebastian > > > > > > [1]?https://github.com/numpy/numpy/ > pull/8026_______________________________________________ > > > > NumPy-Discussion mailing list NumPy-Discussion at scipy.org > > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From charlesr.harris at gmail.com Mon Sep 12 13:11:08 2016 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 12 Sep 2016 11:11:08 -0600 Subject: [Numpy-discussion] NumPy 1.11.2rc1 Message-ID: Hi All, I'm pleased to announce the release of Numpy 1.11.2rc1. This release supports Python 2.6 - 2.7, and 3.2 - 3.5 and fixes bugs and regressions found in Numpy 1.11.1. Wheels for Linux, Windows, and OSX can be found on PyPI. Sources are available on both PyPI and Sourceforge . Thanks to all who were involved in this release. The following pull requests have been merged. PRs overridden by later merges and trivial release notes updates have been omitted. - #7736 BUG: Many functions silently drop 'keepdims' kwarg. - #7738 ENH: Add extra kwargs and update doc of many MA methods. - #7778 DOC: Update Numpy 1.11.1 release notes. - #7793 BUG: MaskedArray.count treats negative axes incorrectly. - #7816 BUG: Fix array too big error for wide dtypes. - #7821 BUG: Make sure npy_mul_with_overflow_ detects overflow. - #7824 MAINT: Allocate fewer bytes for empty arrays. - #7847 MAINT,DOC: Fix some imp module uses and update f2py.compile docstring. - #7849 MAINT: Fix remaining uses of deprecated Python imp module. 
- #7851 BLD: Fix ATLAS version detection.
- #7896 BUG: Construct ma.array from np.array which contains padding.
- #7904 BUG: Fix float16 type not being called due to wrong ordering.
- #7917 BUG: Production install of numpy should not require nose.
- #7919 BLD: Fixed MKL detection for recent versions of this library.
- #7920 BUG: Fix for issue #7835 (ma.median of 1d).
- #7932 BUG: Monkey-patch _msvccompile.gen_lib_option like other compilers.
- #7939 BUG: Check for HAVE_LDOUBLE_DOUBLE_DOUBLE_LE in npy_math_complex.
- #7953 BUG: Guard against buggy comparisons in generic quicksort.
- #7954 BUG: Use keyword arguments to initialize Extension base class.
- #7955 BUG: Make sure numpy globals keep identity after reload.
- #7972 BUG: MSVCCompiler grows 'lib' & 'include' env strings exponentially.
- #8005 BLD: Remove __NUMPY_SETUP__ from builtins at end of setup.py.
- #8010 MAINT: Remove leftover imp module imports.
- #8020 BUG: Fix return of np.ma.count if keepdims is True and axis is None.
- #8024 BUG: Fix numpy.ma.median.
- #8031 BUG: Fix np.ma.median with only one non-masked value.
- #8044 BUG: Fix bug in NpyIter buffering with discontinuous arrays.

The following people contributed to this release. The '+' marks first-time contributors.

- Allan Haldane
- Bertrand Lefebvre
- Charles Harris
- Julian Taylor
- Loïc Estève
- Marshall Bockrath-Vandegrift+
- Michael Seifert+
- Pauli Virtanen
- Ralf Gommers
- Sebastian Berg
- Shota Kawabuchi+
- Thomas A Caswell
- Valentin Valls+
- Xavier Abellan Ecija+

Chuck

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From pav at iki.fi Mon Sep 12 16:22:40 2016
From: pav at iki.fi (Pauli Virtanen)
Date: Mon, 12 Sep 2016 20:22:40 +0000 (UTC)
Subject: [Numpy-discussion] New iterator APIs (nditer / MapIter): Overlap detection in NumPy
References: <1473264179.7942.9.camel@sipsolutions.net> <1473672667.2681.4.camel@sipsolutions.net>
Message-ID:

Mon, 12 Sep 2016 11:31:07 +0200, Sebastian Berg wrote:
>> * NPY_ITER_COPY_IF_OVERLAP, NPY_ITER_OVERLAP_NOT_SAME
>>   flags for NpyIter_New.
>>
>> * New API function PyArray_MapIterArrayCopyIfOverlap,
>>   as ufunc.at needs to check overlaps for index arrays before
>>   constructing iterators, and the parsing is done in multiarray.
>
> I think here Nathaniel's point might be right. It could be we can assume
> that copying is always fine, there is probably only one or two
> downstream projects using this function, plus it seems harder to create
> abusing structures that actually do something useful.
> It was only exposed for usage in `ufunc.at` if I remember right. I know
> theano uses it though, but not sure about anyone else, maybe numba. On
> the other hand.... It is not the worst API clutter in history.

Do you suggest that I break the PyArray_MapIterArray API?

One issue here is that the function doesn't make a distinction between read-only access and read-write access, so copying may give unnecessary slowdown. The second thing is that it will result in a bit uglier code, as I need to manage the overlap with the second operation in ufunc_at.

For NpyIter, I'd still be wary about copying by default, because it's not needed everywhere (the may_share_memory checks are better done earlier), and since the semantic change can break things inside Numpy.
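For readers following the thread, the trap and the checks being discussed look like this from Python. This is only a sketch: `np.shares_memory` / `np.may_share_memory` exist today, while the `'copy_if_overlap'` nditer flag assumes a NumPy build that already contains the work from these pull requests (it shipped in a later release):

```
import numpy as np

a = np.arange(9.).reshape(3, 3)

# `a` and `a.T` are two views of one buffer -- the classic
# `a += a.T` trap, where memory is read while it is being written.
print(np.may_share_memory(a, a.T))   # True (cheap, conservative check)
print(np.shares_memory(a, a.T))      # True (exact check)

expected = a + a.T                   # out-of-place reference result

# With the new flag, the iterator copies read operands that overlap
# a writable one, so the in-place update is order-independent:
it = np.nditer([a.T, a], flags=['copy_if_overlap'],
               op_flags=[['readonly'], ['readwrite']])
for x, y in it:
    y[...] += x

print(np.array_equal(a, expected))   # True
```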
Pauli From sebastian at sipsolutions.net Mon Sep 12 16:46:02 2016 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 12 Sep 2016 22:46:02 +0200 Subject: [Numpy-discussion] New iterator APIs (nditer / MapIter): Overlap detection in NumPy In-Reply-To: References: <1473264179.7942.9.camel@sipsolutions.net> <1473672667.2681.4.camel@sipsolutions.net> Message-ID: <1473713162.32113.7.camel@sipsolutions.net> On Mo, 2016-09-12 at 20:22 +0000, Pauli Virtanen wrote: > Mon, 12 Sep 2016 11:31:07 +0200, Sebastian Berg kirjoitti: > > > > > > > > * NPY_ITER_COPY_IF_OVERLAP, NPY_ITER_OVERLAP_NOT_SAME > > > ? flags for NpyIter_New. > > > > > > * New API function PyArray_MapIterArrayCopyIfOverlap, > > > ? as ufunc.at needs to check overlaps for index arrays before > > > ? constructing iterators, and the parsing is done in multiarray. > > I think here Nathaniels point might be right. It could be we can > > assume > > that copying is always fine, there is probably only one or two > > downstream projects using this function, plus it seems harder to > > create > > abusing structures that actually do something useful. > > It was only exposed for usage in `ufunc.at` if I remember right. I > > know > > theano uses it though, but not sure about anyone else, maybe numba. > > On > > the other hand.... It is not the worst API clutter in history. > Do you suggest that I break the PyArray_MapIterArray API? > > One issue here is that the function doesn't make distinction between > read- > only access and read-write access, so copying may give unnecessary? > slowdown. The second thing is that it will result to a bit uglier > code, as? > I need to manage the overlap with the second operation in ufunc_at. > Yeah, was only wondering about MapIterArray, because I might get away with the API break in the case that it works everywhere for our internal usage. But if its not quite straight forward, there is no point in thinking about it. > For NpyIter, I'd still be wary about copying by default, because it's > not? > needed everywhere (the may_share_memory checks are better done > earlier),? > and since the semantic change can break things inside Numpy. > Yes, I tend to agree here about it. You can always wonder whether its still the most convenient place to do the checks (at least for a few places), but from glancing at the code, it still seems elegant to me. If we are concerned about making the iterator more and more complex, maybe we can really do something else about it as well. I am not sure whether I will manage to look at it very closely soon, so would invite anyone to take a look; this is definitely a highlight! - Sebastian > Pauli > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From vilanova at ac.upc.edu Tue Sep 13 09:02:26 2016 From: vilanova at ac.upc.edu (=?utf-8?Q?Llu=C3=ADs_Vilanova?=) Date: Tue, 13 Sep 2016 15:02:26 +0200 Subject: [Numpy-discussion] String & unicode arrays vs text loading in python 3 Message-ID: <87fup4rlzx.fsf@fimbulvetr.bsc.es> Hi! I'm giving a shot to issue #3184 [1], based on the observation that the string dtype ('S') under python 3 uses byte arrays instead of unicode (the only readable string type in python 3). 
This brings two major problems:

* numpy code has to go through loops to open and read files as binary data to load text into a bytes array, and does not play well with users providing string (unicode) arguments

* the repr of these arrays shows strings as b'text' instead of 'text', which breaks doctests of software built on numpy

What I'm trying to do is make dtypes 'S' and 'U' equivalent (NPY_STRING and NPY_UNICODE).

Now the question. Keeping 'S' and 'U' as separate dtypes (but same internal implementation) will provide the best backwards compatibility, but is more cumbersome to implement.

Is it acceptable to internally just translate all appearances of 'S' (NPY_STRING) to 'U' (NPY_UNICODE) and get rid of one of the two when running in python 3?

The main drawback I see is that dtype reprs would not always be as expected:

   # python 2
   >>> np.array('foo', dtype='S')
   array('foo', dtype='|S3')

   # python 3
   >>> np.array('foo', dtype='S')
   array('foo', dtype='<U3')

[1] https://github.com/numpy/numpy/issues/3184

Cheers,
  Lluis
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion

From sebastian at sipsolutions.net Tue Sep 13 09:39:34 2016
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 13 Sep 2016 15:39:34 +0200
Subject: [Numpy-discussion] String & unicode arrays vs text loading in python 3
In-Reply-To: <87fup4rlzx.fsf@fimbulvetr.bsc.es>
References: <87fup4rlzx.fsf@fimbulvetr.bsc.es>
Message-ID: <1473773974.28390.4.camel@sipsolutions.net>

On Di, 2016-09-13 at 15:02 +0200, Lluís Vilanova wrote:
> Hi! I'm giving a shot to issue #3184 [1], based on the observation
> that the
> string dtype ('S') under python 3 uses byte arrays instead of unicode
> (the only
> readable string type in python 3).
>
> This brings two major problems:
>
> * numpy code has to go through loops to open and read files as binary
> data to
>   load text into a bytes array, and does not play well with users
> providing
>   string (unicode) arguments
>
> * the repr of these arrays shows strings as b'text' instead of
> 'text', which
>   breaks doctests of software built on numpy
>
> What I'm trying to do is make dtypes 'S' and 'U' equivalent
> (NPY_STRING and
> NPY_UNICODE).
>
> Now the question. Keeping 'S' and 'U' as separate dtypes (but same
> internal
> implementation) will provide the best backwards compatibility, but is
> more
> cumbersome to implement.

I am not sure how that can be possible. Those types are fundamentally different in how they store their data. String types use one byte per character, unicode types will use 4 bytes per character. You can maybe default to unicode in more cases in python 3, but you cannot make them identical internally.

What about giving `np.loadtxt` an encoding kwarg or something along that line?

- Sebastian

> Is it acceptable to internally just translate all appearances of 'S'
> (NPY_STRING) to 'U' (NPY_UNICODE) and get rid of one of the two when
> running in
> python 3?
>
> The main drawback I see is that dtype reprs would not always be as
> expected:
>
>    # python 2
>    >>> np.array('foo', dtype='S')
>    array('foo',
>          dtype='|S3')
>
>    # python 3
>    >>> np.array('foo', dtype='S')
>    array('foo',
>          dtype='<U3')
>
> [1] https://github.com/numpy/numpy/issues/3184
>
> Cheers,
>   Lluis
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From vilanova at ac.upc.edu Tue Sep 13 10:17:51 2016 From: vilanova at ac.upc.edu (=?utf-8?Q?Llu=C3=ADs_Vilanova?=) Date: Tue, 13 Sep 2016 16:17:51 +0200 Subject: [Numpy-discussion] String & unicode arrays vs text loading in python 3 In-Reply-To: <1473773974.28390.4.camel@sipsolutions.net> (Sebastian Berg's message of "Tue, 13 Sep 2016 15:39:34 +0200") References: <87fup4rlzx.fsf@fimbulvetr.bsc.es> <1473773974.28390.4.camel@sipsolutions.net> Message-ID: <871t0nq3xs.fsf@fimbulvetr.bsc.es> Sebastian Berg writes: > On Di, 2016-09-13 at 15:02 +0200, Llu?s Vilanova wrote: >> Hi! I'm giving a shot to issue #3184 [1], based on the observation >> that the >> string dtype ('S') under python 3 uses byte arrays instead of unicode >> (the only >> readable string type in python 3). >> >> This brings two major problems: >> >> * numpy code has to go through loops to open and read files as binary >> data to >> ? load text into a bytes array, and does not play well with users >> providing >> ? string (unicode) arguments >> >> * the repr of these arrays shows strings as b'text' instead of >> 'text', which >> ? breaks doctests of software built on numpy >> >> What I'm trying to do is make dtypes 'S' and 'U' equivalnt >> (NPY_STRING and >> NPY_UNICODE). >> >> Now the question. Keeping 'S' and 'U' as separate dtypes (but same >> internal >> implementation) will provide the best backwards compatibility, but is >> more >> cumbersome to implement. > I am not sure how that can be possible. Those types are fundamentally > different in how they store their data. String types use one byte per > character, unicode types will use 4 bytes per character. You can maybe > default to unicode in more cases in python 3, but you cannot make them > identical internally. > What about giving `np.loadtxt` an encoding kwarg or something along > that line? np.loadtxt and np.genfromtxt are already quite complex in handling the implicit conversion to byte-array imposed by numpy's port to python 3, and still fail in some corner cases. This conversion is also inherently surprising to users, since what I'd get in python 2: >>> np.array('foo', dtype='S') array('foo', dtype='|S3') In python 3 gives me a surprising (note the prefix on the resulting string): >>> np.array('foo', dtype='S') array(b'foo', dtype='|S3') It's not only surprising, but also breaks absolutely all the doctests I have with arrays that contain strings (it even breaks numpy's examples). That's why adding an encoding kwarg (better than the current auto-magical conversion to binary) won't solve my problems. The 'S' dtype will still be a binary array, which shows up in the repr. Since all strings in python 3 are unicode, I'm expecting "string" and "unicode" arrays in numpy to be the same *and* show up as strings (e.g., 'foo' instead of b'foo'). Yes, the difference between these types is in how they store their data. What I'm proposing is to always use unicode in python 3. If necessary, we can add a new dtype that lets users store raw byte arrays. By making them explicitly byte arrays, that shouldn't raise any new surprises. I already started doing the changes I described (as a result from the discussion in #3184 [1]), but wanted to double-check with the list before getting deeper into it. 
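For completeness, the workaround available to users today is an explicit decode, which is exactly the boilerplate being complained about here. A sketch (the 'S' to 'U' cast assumes ASCII-compatible payloads):

```
import numpy as np

a = np.array(['foo', 'bar'], dtype='S3')
print(repr(a))        # python 3: array([b'foo', b'bar'], dtype='|S3')

# Decode explicitly to get unicode-string semantics and repr back:
u = a.astype('U3')
print(repr(u))        # array(['foo', 'bar'], dtype='<U3')

# or with an explicit codec:
u2 = np.char.decode(a, 'utf-8')
```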
[1] https://github.com/numpy/numpy/issues/3184

Cheers,
 Lluis

From vilanova at ac.upc.edu Tue Sep 13 10:21:57 2016
From: vilanova at ac.upc.edu (=?utf-8?Q?Llu=C3=ADs_Vilanova?=)
Date: Tue, 13 Sep 2016 16:21:57 +0200
Subject: [Numpy-discussion] String & unicode arrays vs text loading in python 3
In-Reply-To: <1473773974.28390.4.camel@sipsolutions.net> (Sebastian Berg's message of "Tue, 13 Sep 2016 15:39:34 +0200")
References: <87fup4rlzx.fsf@fimbulvetr.bsc.es> <1473773974.28390.4.camel@sipsolutions.net>
Message-ID: <87fup3op6i.fsf@fimbulvetr.bsc.es>

Sebastian Berg writes:

> On Di, 2016-09-13 at 15:02 +0200, Lluís Vilanova wrote:
>> Hi! I'm giving a shot to issue #3184 [1], based on the observation
>> that the
>> string dtype ('S') under python 3 uses byte arrays instead of unicode
>> (the only
>> readable string type in python 3).
>>
>> This brings two major problems:
>>
>> * numpy code has to go through loops to open and read files as binary
>> data to
>>   load text into a bytes array, and does not play well with users
>> providing
>>   string (unicode) arguments
>>
>> * the repr of these arrays shows strings as b'text' instead of
>> 'text', which
>>   breaks doctests of software built on numpy
>>
>> What I'm trying to do is make dtypes 'S' and 'U' equivalent
>> (NPY_STRING and
>> NPY_UNICODE).
>>
>> Now the question. Keeping 'S' and 'U' as separate dtypes (but same
>> internal
>> implementation) will provide the best backwards compatibility, but is
>> more
>> cumbersome to implement.

> I am not sure how that can be possible. Those types are fundamentally
> different in how they store their data. String types use one byte per
> character, unicode types will use 4 bytes per character. You can maybe
> default to unicode in more cases in python 3, but you cannot make them
> identical internally.

BTW, by identical I mean having two externally visible types, but a common implementation in python 3 (that of NPY_UNICODE).

The as-sane but not backwards-compatible option (I'm asking if this is acceptable) is to only retain 'S' (NPY_STRING), but with the NPY_UNICODE implementation, and making 'U' (and np.unicode_) an alias for 'S' (and np.string_).

Cheers,
 Lluis

From shoyer at gmail.com Tue Sep 13 12:47:24 2016
From: shoyer at gmail.com (Stephan Hoyer)
Date: Tue, 13 Sep 2016 09:47:24 -0700
Subject: [Numpy-discussion] guvectorize, a helper for writing generalized ufuncs
Message-ID:

NumPy has the handy np.vectorize for turning Python code that operates on scalars into a vectorized function that works like a ufunc, but no helper function for creating generalized ufuncs (http://docs.scipy.org/doc/numpy/reference/c-api.generalized-ufuncs.html).

np.apply_along_axis accomplishes some of this, but it only allows a single core dimension on a single argument.

So I propose adding a new object, np.guvectorize(pyfunc, signature, otypes, ...), where pyfunc is defined over the core dimensions only of any inputs and signature is any valid gufunc signature (a string). Calling this object would apply the gufunc. This is inspired by the similar numba.guvectorize, which is currently the easiest way to write a gufunc in Python.

In addition to being handy like vectorize, such functionality would be especially useful for working with libraries that build upon NumPy to extend the capabilities of generalized ufuncs (e.g., xarray after https://github.com/pydata/xarray/pull/964).

Cheers,
Stephan

-------------- next part --------------
An HTML attachment was scrubbed...
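As a concrete illustration of the proposed interface, here is a minimal sketch using the gufunc-style `signature` argument that `np.vectorize` provides in NumPy 1.12 and later, which is the form this work eventually took (the pairwise-distance function is just an example):

```
import numpy as np

def euclidean(u, v):
    # Defined over the core dimensions only: two 1-d vectors in,
    # one scalar out.
    return np.sqrt(((u - v) ** 2).sum())

# '(n),(n)->()' is a standard gufunc signature: both inputs share a
# core dimension n, the output is scalar, loop dimensions broadcast.
pairwise = np.vectorize(euclidean, signature='(n),(n)->()')

x = np.random.rand(5, 3)
y = np.random.rand(3)        # broadcasts against x's loop dimension
print(pairwise(x, y).shape)  # (5,)
```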
URL:

From chris.barker at noaa.gov Tue Sep 13 12:55:38 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Tue, 13 Sep 2016 09:55:38 -0700
Subject: [Numpy-discussion] String & unicode arrays vs text loading in python 3
In-Reply-To: <87fup3op6i.fsf@fimbulvetr.bsc.es>
References: <87fup4rlzx.fsf@fimbulvetr.bsc.es> <1473773974.28390.4.camel@sipsolutions.net> <87fup3op6i.fsf@fimbulvetr.bsc.es>
Message-ID:

We had a big long discussion about this on this list a while back (maybe 2 yrs ago?); please search the archives to find it. Though I'm pretty sure that we never did come to a conclusion. I think it started with wanting better support for unicode in loadtxt and the like, and ended up delving into other encodings for the 'U' dtype, and maybe a single-byte string dtype (latin-1), or maybe a variable-size unicode object like Py3's, or...

However, it is absolutely a non-starter to change the binary representation of the 'S' type in any version of numpy. Due to the legacy of py2 (and, indeed, most computing environments) 'S' is a single-byte string representation. And the binary representation is often really key to numpy use.

Period, end of story.

And that maps to a py2 string and py3 bytes object.

py2 does, of course, have a Unicode object as well. If you want your code (and doctests, and ...) to be compatible, then you should probably go to Unicode strings everywhere. py3 now supports the u'string' no-op literal to make this easier.

(though I guess the __repr__ won't tack on that 'u', which is going to be a problem for docstrings).

Note also that py3 has added more and more "string-like" support to the bytes object, so it's not too bad to go bytes-only.

-CHB

On Tue, Sep 13, 2016 at 7:21 AM, Lluís Vilanova wrote:
> Sebastian Berg writes:
>
> > On Di, 2016-09-13 at 15:02 +0200, Lluís Vilanova wrote:
> >> Hi! I'm giving a shot to issue #3184 [1], based on the observation
> >> that the
> >> string dtype ('S') under python 3 uses byte arrays instead of unicode
> >> (the only
> >> readable string type in python 3).
> >>
> >> This brings two major problems:
> >>
> >> * numpy code has to go through loops to open and read files as binary
> >> data to
> >> load text into a bytes array, and does not play well with users
> >> providing
> >> string (unicode) arguments
> >>
> >> * the repr of these arrays shows strings as b'text' instead of
> >> 'text', which
> >> breaks doctests of software built on numpy
> >>
> >> What I'm trying to do is make dtypes 'S' and 'U' equivalent
> >> (NPY_STRING and
> >> NPY_UNICODE).
> >>
> >> Now the question. Keeping 'S' and 'U' as separate dtypes (but same
> >> internal
> >> implementation) will provide the best backwards compatibility, but is
> >> more
> >> cumbersome to implement.
>
> > I am not sure how that can be possible. Those types are fundamentally
> > different in how they store their data. String types use one byte per
> > character, unicode types will use 4 bytes per character. You can maybe
> > default to unicode in more cases in python 3, but you cannot make them
> > identical internally.
>
> BTW, by identical I mean having two externally visible types, but a common
> implementation in python 3 (that of NPY_UNICODE).
>
> The as-sane but not backwards-compatible option (I'm asking if this is
> acceptable) is to only retain 'S' (NPY_STRING), but with the NPY_UNICODE
> implementation, and making 'U' (and np.unicode_) an alias for 'S' (and
> np.string_).
> > > Cheers, > Lluis > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From nathan12343 at gmail.com Tue Sep 13 13:39:49 2016 From: nathan12343 at gmail.com (Nathan Goldbaum) Date: Tue, 13 Sep 2016 12:39:49 -0500 Subject: [Numpy-discussion] guvectorize, a helper for writing generalized ufuncs In-Reply-To: References: Message-ID: On Tue, Sep 13, 2016 at 11:47 AM, Stephan Hoyer wrote: > NumPy has the handy np.vectorize for turning Python code that operates on > scalars into a function that vectorizes works like a ufunc, but no helper > function for creating generalized ufuncs (http://docs.scipy.org/doc/ > numpy/reference/c-api.generalized-ufuncs.html). > > np.apply_along_axis accomplishes some of this, but it only allows a single > core dimension on a single argument. > > So I propose adding a new object, np.guvectorize(pyfunc, signature, > otypes, ...), where pyfunc is defined over the core dimensions only of any > inputs and signature is any valid gufunc signature (a string). Calling this > object would apply the gufunc. This is inspired by the similar > numba.guvectorize, which is currently the easiest way to write a gufunc in > Python. > > In addition to be handy like vectorize, such functionality would be > especially useful for with working libraries that build upon NumPy to > extend the capabilities of generalized ufuncs (e.g., xarray after > https://github.com/pydata/xarray/pull/964). > > First, this seems really cool. I hope it goes somewhere. I'm curious whether you have a plan to deal with the python functional call overhead. Numba gets around this by JIT-compiling python functions - is there something analogous you can do in NumPy or will this always be limited by the overhead of repeatedly calling a Python implementation of the "core" operation? -Nathan > Cheers, > Stephan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Tue Sep 13 13:59:32 2016 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 13 Sep 2016 10:59:32 -0700 Subject: [Numpy-discussion] guvectorize, a helper for writing generalized ufuncs In-Reply-To: References: Message-ID: On Tue, Sep 13, 2016 at 10:39 AM, Nathan Goldbaum wrote: > I'm curious whether you have a plan to deal with the python functional > call overhead. Numba gets around this by JIT-compiling python functions - > is there something analogous you can do in NumPy or will this always be > limited by the overhead of repeatedly calling a Python implementation of > the "core" operation? > I don't think there is any way to avoid this in NumPy proper, but that's OK (it's similar to the existing overhead of vectorize). Numba already has guvectorize (and it's own version of vectorize as well), which already does exactly this. -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From vilanova at ac.upc.edu Tue Sep 13 14:05:51 2016
From: vilanova at ac.upc.edu (=?utf-8?Q?Llu=C3=ADs_Vilanova?=)
Date: Tue, 13 Sep 2016 20:05:51 +0200
Subject: [Numpy-discussion] String & unicode arrays vs text loading in python 3
In-Reply-To: (Chris Barker's message of "Tue, 13 Sep 2016 09:55:38 -0700")
References: <87fup4rlzx.fsf@fimbulvetr.bsc.es> <1473773974.28390.4.camel@sipsolutions.net> <87fup3op6i.fsf@fimbulvetr.bsc.es>
Message-ID: <87intzhdz4.fsf@fimbulvetr.bsc.es>

Chris Barker writes:

> We had a big long discussion about this on this list a while back (maybe 2 yrs
> ago?); please search the archives to find it. Though I'm pretty sure that we
> never did come to a conclusion. I think it started with wanting better support
> for unicode in loadtxt and the like, and ended up delving into other encodings
> for the 'U' dtype, and maybe a single-byte string dtype (latin-1), or maybe a
> variable-size unicode object like Py3's, or...

> However, it is absolutely a non-starter to change the binary representation of
> the 'S' type in any version of numpy. Due to the legacy of py2 (and, indeed,
> most computing environments) 'S' is a single-byte string representation. And the
> binary representation is often really key to numpy use.

> Period, end of story.

Great, that's the type of info I wanted to get before going forward. I guess there's code relying on the binary representation of 'S' to do mmaps or access the array's raw contents. Is that right?

> And that maps to a py2 string and py3 bytes object.

> py2 does, of course, have a Unicode object as well. If you want your code (and
> doctests, and ...) to be compatible, then you should probably go to Unicode
> strings everywhere. py3 now supports the u'string' no-op literal to make this
> easier.

> (though I guess the __repr__ won't tack on that 'u', which is going to be a
> problem for docstrings).

That's exactly the problem. Doing all examples and doctests with 'U' instead of 'S' will break it for py2 instead of py3.

> Note also that py3 has added more and more "string-like" support to the bytes
> object, so it's not too bad to go bytes-only.

There is a fundamental semantic difference between a string and a byte array, that's the core of the problem.

Here's an alternative that only handles the repr. Separate fixes would be needed for loadtxt's and genfromtxt's problems (Sebastian Berg briefly pointed at that, but I'd like to know more).

Whenever we repr an array using 'S', we can instead show a unicode in py3. That keeps the binary representation, but will always show the expected result to users, and it's only a handful of lines added to dump_data().

If needed, I could easily add a bytes array to make the alternative explicit (where py3 would repr the contents as b'foo').

This would only leave the less-common paths inconsistent across python versions, which should not be a problem for most examples/doctests:

* A 'U' array will show u'foo' in py2 and 'foo' in py3.
* The new binary array will show 'foo' in py2 and b'foo' in py3 (that could also be patched on the repr code).
* A 'O' array will not be able to do any meaningful repr conversions.

A more complex alternative (and actually closer to what I'm proposing) is to modify numpy in py3 to restrict 'S' to using 8-bit code points in a unicode string. It would have the binary compatibility, while being a unicode string in practice.
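A user-level sketch of the display-only part of this idea, using the printing machinery rather than a dtype change (it decodes as latin-1 purely for printing; indexing still returns bytes):

```
import numpy as np

a = np.array([b'foo', b'bar'], dtype='S3')
print(repr(a))   # python 3: array([b'foo', b'bar'], dtype='|S3')

# 'numpystr' controls how string/bytes scalars are formatted when
# printing arrays; this changes the display only, not the data.
np.set_printoptions(
    formatter={'numpystr': lambda s: repr(s.decode('latin-1'))})
print(a)         # ['foo' 'bar']
```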
Cheers, Lluis From shoyer at gmail.com Tue Sep 13 14:21:21 2016 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 13 Sep 2016 11:21:21 -0700 Subject: [Numpy-discussion] String & unicode arrays vs text loading in python 3 In-Reply-To: <87intzhdz4.fsf@fimbulvetr.bsc.es> References: <87fup4rlzx.fsf@fimbulvetr.bsc.es> <1473773974.28390.4.camel@sipsolutions.net> <87fup3op6i.fsf@fimbulvetr.bsc.es> <87intzhdz4.fsf@fimbulvetr.bsc.es> Message-ID: On Tue, Sep 13, 2016 at 11:05 AM, Llu?s Vilanova wrote: > Whenever we repr an array using 'S', we can instead show a unicode in py3. > That > keeps the binary representation, but will always show the expected result > to > users, and it's only a handful of lines added to dump_data(). > > If needed, I could easily add a bytes array to make the alternative > explicit > (where py3 would repr the contents as b'foo'). > > This would only leave the less-common paths inconsistent across python > versions, > which should not be a problem for most examples/doctests: > > * A 'U' array will show u'foo' in py2 and 'foo' in py3. > * The new binary array will show 'foo' in py2 and b'foo' in py3 (that > could also > be patched on the repr code). > * A 'O' array will not be able to do any meaningful repr conversions. > > > A more complex alternative (and actually closer to what I'm proposing) is > to > modify numpy in py3 to restrict 'S' to using 8-bit points in a unicode > string. It would have the binary compatibility, while being a unicode > string in > practice. I'm afraid these are both also non-starters at this point. NumPy's string dtype corresponds to bytes on Python 3, and you can use it to store arbitrary binary values. Would it really be an improvement to change the repr, if the scalar value resulting from indexing is still bytes? The sanest approach is probably a new dtype for one-byte strings. We talked about this a few years ago, but nobody has implemented it yet: http://numpy-discussion.scipy.narkive.com/3nqDu3Zk/a-one-byte-string-dtype (normally I would link to the archives on scipy.org, but the certificate for HTTPS has expired so you see a big error message right now...) -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue Sep 13 16:44:53 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 13 Sep 2016 13:44:53 -0700 Subject: [Numpy-discussion] String & unicode arrays vs text loading in python 3 In-Reply-To: <87intzhdz4.fsf@fimbulvetr.bsc.es> References: <87fup4rlzx.fsf@fimbulvetr.bsc.es> <1473773974.28390.4.camel@sipsolutions.net> <87fup3op6i.fsf@fimbulvetr.bsc.es> <87intzhdz4.fsf@fimbulvetr.bsc.es> Message-ID: On Tue, Sep 13, 2016 at 11:05 AM, Llu?s Vilanova wrote: > Great, that's the type of info I wanted to get before going forward. I > guess > there's code relying on the binary representation of 'S' to do mmap's or > access > the array's raw contents. Is that right? yes, there is a LOT of code, most of it third party, that relies on particular binary representations of the numpy dtypes. There is a fundamental semantic difference between a string and a byte > array, > that's the core of the problem. > well yes. but they were mingled in py2, and the 'S' dtype is essentially a py2 string. But in py3, it maps more closely with bytes than string -- though yes, not exactly either :-( Here's an alternative that only handles the repr. > > Whenever we repr an array using 'S', we can instead show a unicode in py3. 
> That > keeps the binary representation, but will always show the expected result > to > users, and it's only a handful of lines added to dump_data(). > This would probably be more confusing than helpful -- if a 'S' object converts to a bytes object, than it's repr should show that. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Wed Sep 14 01:54:28 2016 From: travis at continuum.io (Travis Oliphant) Date: Wed, 14 Sep 2016 00:54:28 -0500 Subject: [Numpy-discussion] guvectorize, a helper for writing generalized ufuncs In-Reply-To: References: Message-ID: There has been some discussion on the Numba mailing list as well about a version of guvectorize that doesn't compile for testing and flexibility. Having this be inside NumPy itself seems ideal. -Travis On Tue, Sep 13, 2016 at 12:59 PM, Stephan Hoyer wrote: > On Tue, Sep 13, 2016 at 10:39 AM, Nathan Goldbaum > wrote: > >> I'm curious whether you have a plan to deal with the python functional >> call overhead. Numba gets around this by JIT-compiling python functions - >> is there something analogous you can do in NumPy or will this always be >> limited by the overhead of repeatedly calling a Python implementation of >> the "core" operation? >> > > I don't think there is any way to avoid this in NumPy proper, but that's > OK (it's similar to the existing overhead of vectorize). > > Numba already has guvectorize (and it's own version of vectorize as well), > which already does exactly this. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- *Travis Oliphant, PhD* *Co-founder and CEO* @teoliphant 512-222-5440 http://www.continuum.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From dpinte at enthought.com Wed Sep 14 07:00:30 2016 From: dpinte at enthought.com (Didrik Pinte) Date: Wed, 14 Sep 2016 13:00:30 +0200 Subject: [Numpy-discussion] mail.scipy.org update Message-ID: Hi everyone, While updating the scipy SSL certificates yesterday, it appeared that filesystem of the servers is corrupted (more than likely a hardware failure). The problem is restricted to one volume and impacts only the web services. The mailing list/mailman service works as expected. We're working on restoring all the different non-functional services. Thanks for you patience! -- Didrik -------------- next part -------------- An HTML attachment was scrubbed... URL: From vilanova at ac.upc.edu Wed Sep 14 10:36:48 2016 From: vilanova at ac.upc.edu (=?utf-8?Q?Llu=C3=ADs_Vilanova?=) Date: Wed, 14 Sep 2016 16:36:48 +0200 Subject: [Numpy-discussion] String & unicode arrays vs text loading in python 3 In-Reply-To: (Stephan Hoyer's message of "Tue, 13 Sep 2016 11:21:21 -0700") References: <87fup4rlzx.fsf@fimbulvetr.bsc.es> <1473773974.28390.4.camel@sipsolutions.net> <87fup3op6i.fsf@fimbulvetr.bsc.es> <87intzhdz4.fsf@fimbulvetr.bsc.es> Message-ID: <87h99ih7jz.fsf@fimbulvetr.bsc.es> Stephan Hoyer writes: > On Tue, Sep 13, 2016 at 11:05 AM, Llu?s Vilanova wrote: > Whenever we repr an array using 'S', we can instead show a unicode in py3. 
> That keeps the binary representation, but will always show the expected result to
> users, and it's only a handful of lines added to dump_data().

> If needed, I could easily add a bytes array to make the alternative explicit
> (where py3 would repr the contents as b'foo').

> This would only leave the less-common paths inconsistent across python
> versions, which should not be a problem for most examples/doctests:
>
> * A 'U' array will show u'foo' in py2 and 'foo' in py3.
> * The new binary array will show 'foo' in py2 and b'foo' in py3 (that could also
> be patched on the repr code).
> * A 'O' array will not be able to do any meaningful repr conversions.
>
> A more complex alternative (and actually closer to what I'm proposing) is to
> modify numpy in py3 to restrict 'S' to using 8-bit code points in a unicode
> string. It would have the binary compatibility, while being a unicode string in
> practice.

> I'm afraid these are both also non-starters at this point. NumPy's string dtype
> corresponds to bytes on Python 3, and you can use it to store arbitrary binary
> values. Would it really be an improvement to change the repr, if the scalar
> value resulting from indexing is still bytes?

> The sanest approach is probably a new dtype for one-byte strings. We talked
> about this a few years ago, but nobody has implemented it yet:
> http://numpy-discussion.scipy.narkive.com/3nqDu3Zk/a-one-byte-string-dtype

From the ref manual, 'S' is a "(byte-)string", which (to me) should never have non-printable characters. That's why I'm advocating that "S" become your proposed one-byte string type, while a new "B" dtype is added for arbitrary binary arrays. This has the added benefit of making docstrings correct on both py2 and py3.

But I won't keep pushing for this; I understand the backwards-compatibility issues mentioned before. Maybe "S" should just be deprecated, "s" (as the one-byte strings) and "B" added instead, and all docstrings and tests changed to "s".

In any case, after reading the whole thread, it's not clear to me what the consensus on a solution should be (Chris's summary is the closest thing to that).

Cheers,
 Lluis

From don.porges at gmail.com Wed Sep 14 11:42:07 2016
From: don.porges at gmail.com (Don Porges)
Date: Wed, 14 Sep 2016 11:42:07 -0400
Subject: [Numpy-discussion] Swig tests failing on Array2.resize()
Message-ID:

On both OSX and Linux, I am seeing 3 swig tests failing out of the entire suite. These all call Array2.resize() with 2 (non-self) arguments.
Note that the last two are in fact supposed to fail, but not because of a wrong number of arguments, as happens here: ====================================================================== ERROR: testResize0 (__main__.Array2TestCase) Test Array2 resize method, size ---------------------------------------------------------------------- Traceback (most recent call last): File "testArray.py", line 177, in testResize0 self.array2.resize(newRows, newCols) TypeError: resize() takes exactly 2 arguments (3 given) ====================================================================== ERROR: testResizeBad1 (__main__.Array2TestCase) Test Array2 resize method, negative nrows ---------------------------------------------------------------------- Traceback (most recent call last): File "testArray.py", line 188, in testResizeBad1 self.assertRaises(ValueError, self.array2.resize, -5, 5) File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/case.py", line 475, in assertRaises callableObj(*args, **kwargs) TypeError: resize() takes exactly 2 arguments (3 given) ====================================================================== ERROR: testResizeBad2 (__main__.Array2TestCase) Test Array2 resize method, negative ncols ---------------------------------------------------------------------- Traceback (most recent call last): File "testArray.py", line 192, in testResizeBad2 self.assertRaises(ValueError, self.array2.resize, 5, -5) File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/case.py", line 475, in assertRaises callableObj(*args, **kwargs) TypeError: resize() takes exactly 2 arguments (3 given) ------------------------------------------------------------------------------------------- This is with python 2.7.x, swig 3.0.10, and numpy 1.11.1 . I note that >> import Array >> help(Array) gives, for Array2.resize, only this signature: *resize*(self, nrows) Is this currently supposed to work, or have I done something clearly wrong? -------------- next part -------------- An HTML attachment was scrubbed... URL: From dpinte at fastmail.fm Mon Sep 19 06:55:40 2016 From: dpinte at fastmail.fm (Didrik Pinte) Date: Mon, 19 Sep 2016 12:55:40 +0200 Subject: [Numpy-discussion] test Message-ID: <1474282540.2160041.729991921.6B2557FC@webmail.messagingengine.com> checking if server works as expected after SSL cert update From dpinte at enthought.com Mon Sep 26 05:20:16 2016 From: dpinte at enthought.com (Didrik Pinte) Date: Mon, 26 Sep 2016 11:20:16 +0200 Subject: [Numpy-discussion] mail.scipy.org update In-Reply-To: References: Message-ID: If all the SSL certification updates have been done properly, this message should go through. -- Didrik On 14 September 2016 at 13:00, Didrik Pinte wrote: > Hi everyone, > > While updating the scipy SSL certificates yesterday, it appeared that > filesystem of the servers is corrupted (more than likely a hardware > failure). The problem is restricted to one volume and impacts only the web > services. The mailing list/mailman service works as expected. > > We're working on restoring all the different non-functional services. > > Thanks for you patience! > > -- Didrik > -- Didrik Pinte +32 475 665 668 +44 1223 969515 Enthought Inc. dpinte at enthought.com Scientific Computing Solutions http://www.enthought.com The information contained in this message is Enthought confidential & not to be dissiminated to outside parties without explicit prior approval from sender. 
This message is intended solely for the addressee(s), If you are not the intended recipient, please contact the sender by return e-mail and destroy all copies of the original message. -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Sun Sep 25 23:52:49 2016 From: shoyer at gmail.com (Stephan Hoyer) Date: Sun, 25 Sep 2016 20:52:49 -0700 Subject: [Numpy-discussion] guvectorize, a helper for writing generalized ufuncs In-Reply-To: References: Message-ID: I have put a pull request implementing numpy.guvectorize up for review: https://github.com/numpy/numpy/pull/8054 Cheers, Stephan On Tue, Sep 13, 2016 at 10:54 PM, Travis Oliphant wrote: > There has been some discussion on the Numba mailing list as well about a > version of guvectorize that doesn't compile for testing and flexibility. > > Having this be inside NumPy itself seems ideal. > > -Travis > > > On Tue, Sep 13, 2016 at 12:59 PM, Stephan Hoyer wrote: > >> On Tue, Sep 13, 2016 at 10:39 AM, Nathan Goldbaum >> wrote: >> >>> I'm curious whether you have a plan to deal with the python functional >>> call overhead. Numba gets around this by JIT-compiling python functions - >>> is there something analogous you can do in NumPy or will this always be >>> limited by the overhead of repeatedly calling a Python implementation of >>> the "core" operation? >>> >> >> I don't think there is any way to avoid this in NumPy proper, but that's >> OK (it's similar to the existing overhead of vectorize). >> >> Numba already has guvectorize (and it's own version of vectorize as >> well), which already does exactly this. >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > -- > > *Travis Oliphant, PhD* > *Co-founder and CEO* > > > @teoliphant > 512-222-5440 > http://www.continuum.io > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Sep 26 05:54:16 2016 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 26 Sep 2016 22:54:16 +1300 Subject: [Numpy-discussion] mail.scipy.org update In-Reply-To: References: Message-ID: On Mon, Sep 26, 2016 at 10:20 PM, Didrik Pinte wrote: > If all the SSL certification updates have been done properly, this message > should go through. > It did, thanks Didrik! Ralf > > -- Didrik > > On 14 September 2016 at 13:00, Didrik Pinte wrote: > >> Hi everyone, >> >> While updating the scipy SSL certificates yesterday, it appeared that >> filesystem of the servers is corrupted (more than likely a hardware >> failure). The problem is restricted to one volume and impacts only the web >> services. The mailing list/mailman service works as expected. >> >> We're working on restoring all the different non-functional services. >> >> Thanks for you patience! >> >> -- Didrik >> > > > > -- > Didrik Pinte +32 475 665 668 > +44 1223 969515 > Enthought Inc. dpinte at enthought.com > Scientific Computing Solutions http://www.enthought.com > > The information contained in this message is Enthought confidential & not > to be dissiminated to outside parties without explicit prior approval from > sender. 
This message is intended solely for the addressee(s), If you are
> not the intended recipient, please contact the sender by return e-mail and
> destroy all copies of the original message.
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From charlesr.harris at gmail.com Fri Sep 23 16:20:13 2016
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 23 Sep 2016 14:20:13 -0600
Subject: [Numpy-discussion] testing
Message-ID:

Testing if this gets posted...

Chuck

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From e.samehta at gmail.com Mon Sep 26 09:36:08 2016
From: e.samehta at gmail.com (Saurabh Mehta)
Date: Mon, 26 Sep 2016 15:36:08 +0200
Subject: [Numpy-discussion] [7949] How to handle these DeprecationWarning errors
Message-ID:

Hi

I am working on issue #7949, and need to use the "-3" switch while running python 2.7 for my tests.

python -3 -c "import numpy as np; np.test()"

Several errors are reported and all of them are DeprecationWarnings, which is ok. (https://travis-ci.org/njase/numpy/jobs/162733502)

*But now these errors must be either fixed or skipped. This is where I am facing a problem. Please suggest:*

1. *Identify that python was invoked with -3 and skip these cases*: There seems no way to know if python was invoked with -3; sys.argv only reports about "-c" and ignores other switches

2. *Fix these issues:* All of them are about deprecated APIs, and the new APIs have been introduced in python 3. Since I am using python 2.x, I don't see a way to fix them

What to do?

Regards
Saurabh

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sebastian at sipsolutions.net Mon Sep 26 11:08:46 2016
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Mon, 26 Sep 2016 17:08:46 +0200
Subject: [Numpy-discussion] [7949] How to handle these DeprecationWarning errors
In-Reply-To: References:
Message-ID: <1474902526.27722.20.camel@sipsolutions.net>

On Mo, 2016-09-26 at 15:36 +0200, Saurabh Mehta wrote:
> Hi
>
> I am working on issue #7949, and need to use the "-3" switch while
> running python 2.7 for my tests.
> python -3 -c "import numpy as np; np.test()"
>
> Several errors are reported and all of them are
> DeprecationWarnings, which is ok.
> (https://travis-ci.org/njase/numpy/jobs/162733502)
>

OK, most of them seem harmless (i.e. don't use the slice C-slot....). I think we should just get rid of the slice C-slot thing globally. The simplest way would be to go to numpy/testing/nosetester.py, look for things like `sup.filter(message='Not importing directory')`, and then, behind a specific `if sys.version_info.major == 2 and sys.py3kwarning:` check, just add some `sup.filter` calls such as:

```
if sys.version_info.major == 2 and sys.py3kwarning:
    sup.filter(DeprecationWarning, message="in 3.x, __setslice__")
    sup.filter(DeprecationWarning, message="in 3.x, __getslice__")
```

From a first scroll through the errors, my guess is that the other errors we can also silence more locally. This silencing might also leak into scipy, but frankly I am not worried about it for those slots.

The threading warnings seem also quite noisy (and useless), but I am not sure right away what the best approach for that would be.

- Sebastian

> But now these errors must be either fixed or skipped. This is where I
> am facing a problem. Please suggest:
>
> 1. Identify that python was invoked with -3 and skip these cases:
> There seems no way to know if python was invoked with -3;
> sys.argv only reports about "-c" and ignores other switches
>
> 2. Fix these issues: All of them are about deprecated APIs, and the new
> APIs have been introduced in python 3. Since I am using python 2.x, I
> don't see a way to fix them
>
> What to do?
>
> Regards
> Saurabh
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part
URL:

From mailinglists at xgm.de Mon Sep 26 11:17:37 2016
From: mailinglists at xgm.de (Florian Lindner)
Date: Mon, 26 Sep 2016 17:17:37 +0200
Subject: [Numpy-discussion] Parsing a file with dates to datetime64
Message-ID: <297efc41-28d0-80da-ca74-cd01e911d7b5@xgm.de>

Hey,

I have a file:

;;;;;;;;;Eintrittsdatum;;;
;;;;;;;;;04.03.16;;10,00 €;genehmigt
;;;;;;;;;04.03.16;;10,00 €;genehmigt

which I try to parse using

def dateToNumpyDate(s):
    s = s.decode("utf-8")
    ret = datetime.datetime.strptime(s, "%d.%m.%y").isoformat()
    return ret

def generateMembers():
    members = np.genfromtxt("test_CSC_Mitglieder.csv",
                            dtype = {"names" : ["EntryDate"],
                                     "formats" : ['datetime64[D]']},
                            converters = {9 : dateToNumpyDate},
                            skip_header = 1, delimiter = ";", usecols = (9))
    count = members.shape[0]
    y = np.linspace(1, count, count)
    print(members)
    print(members.dtype)
    plt.plot(members["EntryDate"], y)
    plt.show()

but the datatype was ignored somehow,

    generateMembers()
    File "CSC.py", line 76, in generateMembers
      plt.plot(members["EntryDate"], y)
    IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

I also tried to print the return of dateToNumpyDate:

2016-03-04T00:00:00
2016-03-04T00:00:00
2016-03-04T00:00:00

Is there a problem with the dtype argument?

Thanks,
Florian

From a.beloi at samsung.com Mon Sep 26 14:52:28 2016
From: a.beloi at samsung.com (Alex Beloi)
Date: Mon, 26 Sep 2016 11:52:28 -0700
Subject: [Numpy-discussion] PR 8053 np.random.multinomial tolerance param
References: Message-ID: <008401d21827$21dc42c0$6594c840$@samsung.com>

Hello,

Pull Request: https://github.com/numpy/numpy/pull/8053

I would like to expose a tolerance parameter for the function numpy.random.multinomial.

The function `multinomial(n, pvals, size=None)` correctly raises an exception when `sum(pvals) > 1 + 1e-12`, as these values should sum to 1. However, other libraries often cannot or do not guarantee such a level of precision.

Specifically, I have encountered issues with the tensorflow function tf.nn.softmax, which is expected to output a tensor whose values sum to 1, but often with precision of only 1e-8.

I propose to expose the `1e-12` tolerance as a non-negative float parameter with default value `1e-12`.

Alex

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From shoyer at gmail.com Mon Sep 26 14:59:00 2016
From: shoyer at gmail.com (Stephan Hoyer)
Date: Mon, 26 Sep 2016 11:59:00 -0700
Subject: [Numpy-discussion] PR 8053 np.random.multinomial tolerance param
In-Reply-To: <008401d21827$21dc42c0$6594c840$@samsung.com>
References: <008401d21827$21dc42c0$6594c840$@samsung.com>
Message-ID:

I would actually be just as happy to relax the tolerance here to 1e-8 always.
I doubt this would catch any fewer bugs than the current default. In
contrast, adding new parameters adds cognitive overload for everyone
encountering the function.

Also, for your use case, note that tensorflow has its own function for
generating random values from a multinomial distribution:
https://www.tensorflow.org/versions/r0.10/api_docs/python/constant_op.html#multinomial

On Mon, Sep 26, 2016 at 11:52 AM, Alex Beloi wrote:

> Hello,
>
> Pull Request: https://github.com/numpy/numpy/pull/8053
>
> I would like to expose a tolerance parameter for the function
> numpy.random.multinomial.
>
> The function `multinomial(n, pvals, size=None)` correctly raises an
> exception when `sum(pvals) > 1 + 1e-12`, as these values should sum to 1.
> However, other libraries often cannot or do not guarantee such a level
> of precision.
>
> Specifically, I have encountered issues with the tensorflow function
> tf.nn.softmax, which is expected to output a tensor whose values sum to
> 1, but often with a precision of only 1e-8.
>
> I propose to expose the `1e-12` tolerance as a non-negative float
> parameter with default value `1e-12`.
>
> Alex

From ralf.gommers at gmail.com Tue Sep 27 06:09:07 2016
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Tue, 27 Sep 2016 23:09:07 +1300
Subject: [Numpy-discussion] Fwd: [NumFOCUS Projects] Job Posting
Message-ID: 

Hi all,

The below job posting may be of interest. It's actually a key role for a
quite exciting program, and could really have an impact on a large set of
scientific computing projects.

Cheers,
Ralf

---------- Forwarded message ----------
From: Leah Silen
Date: Thu, Sep 22, 2016 at 6:05 AM
Subject: [NumFOCUS Projects] Job Posting
To: projects at numfocus.org

The last project email contained information on both the Summit and the
Sustainability Program Job Posting. In case you missed it, we wanted to
resend the job posting info. Please share with those in your circles.

Best,
Leah

NumFOCUS Projects Director

NumFOCUS is seeking to hire a full-time Projects Director to develop and
run a sustainability incubator program for NumFOCUS fiscally sponsored
open source projects. This is the first program of its kind, with
opportunity for enormous impact on the open source ecosystem. The
Projects Director will work with NumFOCUS fiscally sponsored Open Source
projects to develop appropriate business development strategies and
opportunities to engage with industry. The learnings from this program
will be public, meaning it has the potential to change how all open
source projects are managed.

NumFOCUS is a 501(c)(3) whose mission is to promote sustainable
high-level programming languages, open code development, and reproducible
scientific research. They accomplish this mission through their
educational programs and events as well as through fiscal sponsorship of
open source data science projects. They aim to increase collaboration and
communication within the scientific computing community.

The Projects Director position is initially funded for 2 years through a
grant from the Sloan Foundation. The incumbent will be hired as an
employee of NumFOCUS. Review of applications will begin October 10th, and
the position will remain open until filled.
Job description

The Projects Director will lead efforts to identify skills and
perspectives that Open Source Project leads need to establish
sustainability models and seek opportunities for funding, and will
consult with NumFOCUS projects and disseminate outcomes.

Responsibilities include:

- Organize a "business bootcamp" workshop, coach project leads, and lead
  an Advisory Board for these business development efforts
- Develop business and financial plans, improve communication, build
  industry relationships, and broadly develop strategies for fostering
  these relationships
- Lead efforts to identify skills and perspectives that Open Source
  Project leads need to establish sustainability models and seek
  opportunities for funding
- Using materials from workshops and other resources, develop and
  disseminate an Open Source Nurturing and Actionable Practices Kit
  (OSNAP Kit) to provide guidance to open source projects outside of
  NumFOCUS
- Establish assessment strategies to measure impact and help NumFOCUS
  open-source projects develop industry relationships

Qualifications

As this role provides a business, operations and marketing perspective to
Open Source project leads, the candidate should have:

- Preferred: 4 years of experience in business development, operations
  and/or marketing
- Strong oral and written communication skills
- Teaching experience of some capacity, e.g. teaching at universities or
  creating and running workshops
- Demonstrated ability to interact with a broad range of people from
  technical leads to industry leaders
- Experience working with open communities, e.g. open source software,
  hardware or makers
- A rich network of experts in business and technology to draw ideas and
  collaboration from

Salary

NumFOCUS offers a competitive salary commensurate with experience and a
comprehensive benefits package.

Applications

To apply, please submit your resume/CV and cover letter (up to two pages)
to NumFOCUS at: projects_director at numfocus.org

See the posting at:
http://www.numfocus.org/blog/projects-director-job-posting

---
Leah Silen
Executive Director, NumFOCUS
leah at numfocus.org
512-222-5449

From a.beloi at samsung.com Tue Sep 27 17:01:21 2016
From: a.beloi at samsung.com (Alex Beloi)
Date: Tue, 27 Sep 2016 14:01:21 -0700
Subject: [Numpy-discussion] PR 8053 np.random.multinomial tolerance param
In-Reply-To: 
References: <008401d21827$21dc42c0$6594c840$@samsung.com>
Message-ID: <00ba01d21902$4d4f3030$e7ed9090$@samsung.com>

Thanks for pointing out the tensorflow multinomial implementation; this
will cover my use case perfectly.

The documentation on raises is redundant as well; the relevant
information is mentioned in the parameter description. I've closed the
PR.
Cheers,
Alex

From: NumPy-Discussion [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Stephan Hoyer
Sent: Monday, September 26, 2016 11:59 AM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] PR 8053 np.random.multinomial tolerance param

I would actually be just as happy to relax the tolerance here to 1e-8
always. I doubt this would catch any fewer bugs than the current default.
In contrast, adding new parameters adds cognitive overload for everyone
encountering the function.

Also, for your use case, note that tensorflow has its own function for
generating random values from a multinomial distribution:
https://www.tensorflow.org/versions/r0.10/api_docs/python/constant_op.html#multinomial

On Mon, Sep 26, 2016 at 11:52 AM, Alex Beloi wrote:

Hello,

Pull Request: https://github.com/numpy/numpy/pull/8053

I would like to expose a tolerance parameter for the function
numpy.random.multinomial. [...]

Alex

From oleksandr.pavlyk at intel.com Tue Sep 27 17:09:59 2016
From: oleksandr.pavlyk at intel.com (Pavlyk, Oleksandr)
Date: Tue, 27 Sep 2016 21:09:59 +0000
Subject: [Numpy-discussion] Using library-specific headers
Message-ID: <4C9EDA7282E297428F3986994EB0FBD3868308@ORSMSX110.amr.corp.intel.com>

Suppose I would like to take advantage of some functions from MKL in
numpy C source code, which would require using

#include "mkl.h"

Ideally this include line must not break the build of numpy when MKL is
not present, so my initial approach was to use

#if defined(SCIPY_MKL_H)
#include "mkl.h"
#endif

Unfortunately, this did not work when building with gcc on a machine
where MKL is present on the default LD_LIBRARY_PATH, because then the
distutils code was setting the SCIPY_MKL_H preprocessor variable even
though the mkl headers are not on the C_INCLUDE_PATH.

What is the preferred solution for including an external library header
while ensuring that the code base continues to build in the most common
cases?

One approach I can think of is to set a preprocessor variable, say
HAVE_MKL_HEADERS, in numpy/core/includes/numpy/config.h depending on the
outcome of building a simple _configtest.c using config.try_compile(),
like it is done in numpy/core/setup.py

Is there a simpler, or a better way?

Thank you,
Oleksandr
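A minimal sketch of the config.try_compile() approach Oleksandr describes,
assuming a distutils config command object like the one numpy/core/setup.py
obtains via get_config_cmd(); the helper name and the HAVE_MKL_HEADERS
emission below are illustrative, not existing numpy configuration entries:

```python
# Rough sketch of the _configtest-style header check described above.
# `config_cmd` is assumed to be a distutils config command object;
# HAVE_MKL_HEADERS is an illustrative macro name.
def have_mkl_header(config_cmd):
    # try_compile builds a tiny translation unit; it only succeeds if
    # mkl.h is really reachable on the include path, independently of
    # whether the MKL libraries happen to be on LD_LIBRARY_PATH.
    body = "int main(void) { return 0; }\n"
    return config_cmd.try_compile(body, headers=["mkl.h"])

# A generate_config_h-style hook could then emit the macro:
#     if have_mkl_header(config_cmd):
#         moredefs.append("HAVE_MKL_HEADERS")
# so that C sources can safely guard the include with
# #if defined(HAVE_MKL_HEADERS).
```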
From jtaylor.debian at googlemail.com Thu Sep 29 09:09:34 2016
From: jtaylor.debian at googlemail.com (Julian Taylor)
Date: Thu, 29 Sep 2016 15:09:34 +0200
Subject: [Numpy-discussion] Using library-specific headers
In-Reply-To: <4C9EDA7282E297428F3986994EB0FBD3868308@ORSMSX110.amr.corp.intel.com>
References: <4C9EDA7282E297428F3986994EB0FBD3868308@ORSMSX110.amr.corp.intel.com>
Message-ID: <5994a3ab-567a-cafd-9c81-fc3d1a4d87d8@googlemail.com>

On 09/27/2016 11:09 PM, Pavlyk, Oleksandr wrote:
> Suppose I would like to take advantage of some functions from MKL in
> numpy C source code, which would require using
>
> #include "mkl.h"
>
> Ideally this include line must not break the build of numpy when MKL is
> not present, so my initial approach was to use
>
> #if defined(SCIPY_MKL_H)
> #include "mkl.h"
> #endif
>
> [...]
>
> Is there a simpler, or a better way?

hi,
you could put the header into OPTIONAL_HEADERS in
numpy/core/setup_common.py. This will define HAVE_HEADERFILENAME_H for
you, but this will not check that the corresponding library actually
exists and can be linked. For that SCIPY_MKL_H is probably the right
macro, though its name is confusing as it does not check for the header
presence ...

Can you tell us more about what from mkl you are attempting to add and
for what purpose, e.g. is it something that should go into numpy proper
or just for personal/internal use?

cheers,
Julian

From oleksandr.pavlyk at intel.com Thu Sep 29 13:27:23 2016
From: oleksandr.pavlyk at intel.com (Pavlyk, Oleksandr)
Date: Thu, 29 Sep 2016 17:27:23 +0000
Subject: [Numpy-discussion] Using library-specific headers
In-Reply-To: <5994a3ab-567a-cafd-9c81-fc3d1a4d87d8@googlemail.com>
References: <4C9EDA7282E297428F3986994EB0FBD3868308@ORSMSX110.amr.corp.intel.com> <5994a3ab-567a-cafd-9c81-fc3d1a4d87d8@googlemail.com>
Message-ID: <4C9EDA7282E297428F3986994EB0FBD3868901@ORSMSX110.amr.corp.intel.com>

Hi Julian,

Thank you very much for the response. It appears to work.

I work on "Intel Distribution for Python" at Intel Corp. This question
was motivated by work needed to prepare pull requests with our
changes/optimizations to numpy source code. In particular, the
numpy.random_intel package

https://mail.scipy.org/pipermail/numpy-discussion/2016-June/075693.html

relies on MKL, but its potential inclusion in numpy should not break the
build if MKL is unavailable.

Also, our benchmarking was pointing at Numpy's sequential memory copying
as a bottleneck. I am working to open a pull request into the main trunk
of numpy to take advantage of the multithreaded MKL BLAS dcopy function
to do memory copying in parallel for sufficiently large sizes.

Related to numpy.random_intel, I noticed that the randomstate package,
which extends numpy.random, was not being made a part of numpy, but
rather published on PyPI as a stand-alone module.
Does that mean that the community decided against including it in
numpy's codebase? If so, I would appreciate it if someone could
elaborate on or point me to the reasoning behind that decision.

Thank you,
Oleksandr

-----Original Message-----
From: NumPy-Discussion [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Julian Taylor
Sent: Thursday, September 29, 2016 8:10 AM
To: numpy-discussion at scipy.org
Subject: Re: [Numpy-discussion] Using library-specific headers

On 09/27/2016 11:09 PM, Pavlyk, Oleksandr wrote:
> Suppose I would like to take advantage of some functions from MKL in
> numpy C source code, which would require using
>
> #include "mkl.h"
> [...]
> Is there a simpler, or a better way?

hi,
you could put the header into OPTIONAL_HEADERS in
numpy/core/setup_common.py. This will define HAVE_HEADERFILENAME_H for
you, but this will not check that the corresponding library actually
exists and can be linked. [...]

cheers,
Julian

From robert.kern at gmail.com Thu Sep 29 14:01:14 2016
From: robert.kern at gmail.com (Robert Kern)
Date: Thu, 29 Sep 2016 19:01:14 +0100
Subject: [Numpy-discussion] Using library-specific headers
In-Reply-To: <4C9EDA7282E297428F3986994EB0FBD3868901@ORSMSX110.amr.corp.intel.com>
References: <4C9EDA7282E297428F3986994EB0FBD3868308@ORSMSX110.amr.corp.intel.com> <5994a3ab-567a-cafd-9c81-fc3d1a4d87d8@googlemail.com> <4C9EDA7282E297428F3986994EB0FBD3868901@ORSMSX110.amr.corp.intel.com>
Message-ID: 

On Thu, Sep 29, 2016 at 6:27 PM, Pavlyk, Oleksandr <
oleksandr.pavlyk at intel.com> wrote:

> Related to numpy.random_intel, I noticed that the randomstate package,
> which extends numpy.random, was not being made a part of numpy, but
> rather published on PyPI as a stand-alone module. Does that mean that
> the community decided against including it in numpy's codebase? If so,
> I would appreciate it if someone could elaborate on or point me to the
> reasoning behind that decision.

No, we are just working out the API and the extensibility machinery in a
separate package before committing to backwards compatibility.

--
Robert Kern
From robbmcleod at gmail.com Thu Sep 29 14:12:07 2016
From: robbmcleod at gmail.com (Robert McLeod)
Date: Thu, 29 Sep 2016 20:12:07 +0200
Subject: [Numpy-discussion] Using library-specific headers
In-Reply-To: <4C9EDA7282E297428F3986994EB0FBD3868901@ORSMSX110.amr.corp.intel.com>
References: <4C9EDA7282E297428F3986994EB0FBD3868308@ORSMSX110.amr.corp.intel.com> <5994a3ab-567a-cafd-9c81-fc3d1a4d87d8@googlemail.com> <4C9EDA7282E297428F3986994EB0FBD3868901@ORSMSX110.amr.corp.intel.com>
Message-ID: 

Pavlyk,

NumExpr optionally includes MKL's VML at compile-time. You may want to
look at its implementation. From what I recall it relies on a function in
a bootstrapped __config__.py to determine if MKL is present.

Robert

On Thu, Sep 29, 2016 at 7:27 PM, Pavlyk, Oleksandr <
oleksandr.pavlyk at intel.com> wrote:

> Hi Julian,
>
> Thank you very much for the response. It appears to work.
>
> I work on "Intel Distribution for Python" at Intel Corp. This question
> was motivated by work needed to prepare pull requests with our
> changes/optimizations to numpy source code. In particular, the
> numpy.random_intel package
>
> https://mail.scipy.org/pipermail/numpy-discussion/2016-June/075693.html
>
> relies on MKL, but its potential inclusion in numpy should not break
> the build if MKL is unavailable.
> [...]
>
> -----Original Message-----
> From: NumPy-Discussion [mailto:numpy-discussion-bounces at scipy.org] On
> Behalf Of Julian Taylor
>
> On 09/27/2016 11:09 PM, Pavlyk, Oleksandr wrote:
> > Suppose I would like to take advantage of some functions from MKL in
> > numpy C source code [...]
> > Is there a simpler, or a better way?
>
> hi,
> you could put the header into OPTIONAL_HEADERS in
> numpy/core/setup_common.py. This will define HAVE_HEADERFILENAME_H for
> you, but this will not check that the corresponding library actually
> exists and can be linked.
> For that SCIPY_MKL_H is probably the right macro, though its name is
> confusing as it does not check for the header presence ...
>
> Can you tell us more about what from mkl you are attempting to add and
> for what purpose, e.g. is it something that should go into numpy proper
> or just for personal/internal use?
>
> cheers,
> Julian

--
Robert McLeod, Ph.D.
Center for Cellular Imaging and Nano Analytics (C-CINA)
Biozentrum der Universität Basel
Mattenstrasse 26, 4058 Basel
Work: +41.061.387.3225
robert.mcleod at unibas.ch
robert.mcleod at bsse.ethz.ch
robbmcleod at gmail.com

From cimrman3 at ntc.zcu.cz Fri Sep 30 03:50:31 2016
From: cimrman3 at ntc.zcu.cz (Robert Cimrman)
Date: Fri, 30 Sep 2016 09:50:31 +0200
Subject: [Numpy-discussion] ANN: SfePy 2016.3
Message-ID: 

I am pleased to announce release 2016.3 of SfePy.

Description
-----------

SfePy (simple finite elements in Python) is a software for solving
systems of coupled partial differential equations by the finite element
method or by isogeometric analysis (limited support). It is distributed
under the new BSD license.

Home page: http://sfepy.org
Mailing list: http://groups.google.com/group/sfepy-devel
Git (source) repository, issue tracker: http://github.com/sfepy/sfepy

Highlights of this release
--------------------------

- Python 3 support
- testing with Travis CI
- new classes for homogenized coefficients
- using argparse instead of optparse

For full release notes see http://docs.sfepy.org/doc/release_notes.html#id1
(rather long and technical).

Cheers,
Robert Cimrman

---

Contributors to this release in alphabetical order:

Robert Cimrman
Jan Heczko
Thomas Kluyver
Vladimir Lukes

From jtaylor.debian at googlemail.com Fri Sep 30 09:38:37 2016
From: jtaylor.debian at googlemail.com (Julian Taylor)
Date: Fri, 30 Sep 2016 15:38:37 +0200
Subject: [Numpy-discussion] automatically avoiding temporary arrays
Message-ID: <283e3000-0b9c-886c-e322-1ff4d2e8cb26@googlemail.com>

hi,
Temporary arrays generated in expressions are expensive, as they imply
extra memory bandwidth, which is the bottleneck in most numpy operations.
For example:

r = a + b + c

creates the b + c temporary and then adds a to it.
This can be rewritten to be more efficient using inplace operations:

r = b + c
r += a

This saves some memory bandwidth and can speed up the operation by 50%
for very large arrays, or even more if the inplace operation allows it to
be completed entirely in the cpu cache.

The problem is that inplace operations are a lot less readable, so they
are often only used in well optimized code. But due to Python's
refcounting semantics we can actually do some inplace conversions
transparently. If an operand in python has a reference count of one it
must be a temporary, so we can use it as the destination array. CPython
itself does this optimization for string concatenations.
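A self-contained timing sketch of the rewrite above (not part of the PR or
the original message; the array size and iteration count are arbitrary,
and the exact ratio depends on how the arrays relate to the CPU caches):

```python
# Compares the expression that allocates extra temporaries with the
# manual inplace rewrite described above.
import timeit
import numpy as np

n = 10 ** 6  # large enough that the arrays spill out of the L2 cache
a = np.ones(n)
b = np.ones(n)
c = np.ones(n)

def with_temporaries():
    # (a + b) allocates a temporary, then adding c allocates the result.
    return a + b + c

def with_inplace():
    r = a + b   # one allocation for the intermediate result
    r += c      # c is added in place, no further allocation
    return r

print("temporaries:", timeit.timeit(with_temporaries, number=200))
print("inplace:    ", timeit.timeit(with_inplace, number=200))
```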
In numpy we have the issue that we can be called from the C-API directly,
where the reference count may be one for other reasons. To solve this we
can check the backtrace until the python frame evaluation function. If
there are only numpy and python functions in between that and our entry
point, we should be able to elide the temporary.

This PR implements this:
https://github.com/numpy/numpy/pull/7997

It currently only supports Linux with glibc (which has reliable
backtraces via unwinding) and maybe MacOS depending on how good their
backtrace is. On windows the backtrace APIs are different and I don't
know them, but in theory it could also be done there.

A problem is that checking the backtrace is quite expensive, so it should
only be enabled when the involved arrays are large enough for it to be
worthwhile. In my testing this seems to be around 180-300KiB sized
arrays, basically where they start spilling out of the CPU L2 cache.

I made a little crappy benchmark script to test this cutoff in this
branch:
https://github.com/juliantaylor/numpy/tree/elide-bench

If you are interested you can run it with:

python setup.py build_ext -j 4 --inplace
ipython --profile=null check.ipy

At the end it will plot the ratio between elided and non-elided runtime.
It should get larger than one around 180KiB on most cpus.

If no one points out some flaw in the approach, I'm hoping to get this
into the next numpy version.

cheers,
Julian

From charlesr.harris at gmail.com Fri Sep 30 10:30:10 2016
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 30 Sep 2016 08:30:10 -0600
Subject: [Numpy-discussion] Vendorize tempita
Message-ID: 

Hi All,

There is a PR to vendorize tempita. This removes tempita as a dependency
and simplifies some things. Feedback on this step is welcome. One
question is whether the package should be renamed to something like
`npy_tempita`, as otherwise an installed tempita, if any, has priority.

Thoughts?

Chuck

From shoyer at gmail.com Fri Sep 30 11:13:15 2016
From: shoyer at gmail.com (Stephan Hoyer)
Date: Fri, 30 Sep 2016 08:13:15 -0700
Subject: [Numpy-discussion] Vendorize tempita
In-Reply-To: 
References: 
Message-ID: 

One way to do this is to move the vendorized dependencies into a
submodule of numpy itself (e.g., sklearn.externals.joblib, though maybe
even a little more indirection than that would be valuable to make it
clear that it isn't part of the NumPy public API). This would avoid
further enlarging the set of namespaces we use.

In any case, I'm perfectly OK with using something like npy_tempita
internally, too, as long as we can be sure that we're using NumPy's
vendorized version, not whatever version is installed locally. We're not
planning to actually install "npy_tempita" when installing numpy (even
for dev installs), right?

On Fri, Sep 30, 2016 at 7:30 AM, Charles R Harris <
charlesr.harris at gmail.com> wrote:

> Hi All,
>
> There is a PR to vendorize tempita. This removes tempita as a dependency
> and simplifies some things. Feedback on this step is welcome. One
> question is whether the package should be renamed to something like
> `npy_tempita`, as otherwise an installed tempita, if any, has priority.
>
> Thoughts?
>
> Chuck

From ben.v.root at gmail.com Fri Sep 30 11:21:56 2016
From: ben.v.root at gmail.com (Benjamin Root)
Date: Fri, 30 Sep 2016 11:21:56 -0400
Subject: [Numpy-discussion] Vendorize tempita
In-Reply-To: 
References: 
Message-ID: 

This is the first I am hearing of tempita (looks to be a templating
language). How is it a dependency of numpy? Do I now need tempita in
order to use numpy, or is it a build-time-only dependency?

Ben

On Fri, Sep 30, 2016 at 11:13 AM, Stephan Hoyer wrote:

> One way to do this is to move the vendorized dependencies into a
> submodule of numpy itself (e.g., sklearn.externals.joblib, though maybe
> even a little more indirection than that would be valuable to make it
> clear that it isn't part of the NumPy public API). This would avoid
> further enlarging the set of namespaces we use.
>
> In any case, I'm perfectly OK with using something like npy_tempita
> internally, too, as long as we can be sure that we're using NumPy's
> vendorized version, not whatever version is installed locally. We're
> not planning to actually install "npy_tempita" when installing numpy
> (even for dev installs), right?
> [...]

From charlesr.harris at gmail.com Fri Sep 30 11:25:15 2016
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 30 Sep 2016 09:25:15 -0600
Subject: [Numpy-discussion] Vendorize tempita
In-Reply-To: 
References: 
Message-ID: 

On Fri, Sep 30, 2016 at 9:13 AM, Stephan Hoyer wrote:

> One way to do this is to move the vendorized dependencies into a
> submodule of numpy itself [...] We're not planning to actually install
> "npy_tempita" when installing numpy (even for dev installs), right?

The only thing in the tools directory included in a source distribution
is the swig directory. Tempita is currently only used by the cythonize
script, which is also in the tools directory.
The search path for the cythonize script is 1) installed modules, 2)
modules in the same directory, which is why it might be good to rename
the module `npy_tempita` so that it is always the one used.

Chuck

From charlesr.harris at gmail.com Fri Sep 30 11:29:36 2016
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 30 Sep 2016 09:29:36 -0600
Subject: [Numpy-discussion] Vendorize tempita
In-Reply-To: 
References: 
Message-ID: 

On Fri, Sep 30, 2016 at 9:21 AM, Benjamin Root wrote:

> This is the first I am hearing of tempita (looks to be a templating
> language). How is it a dependency of numpy? Do I now need tempita in
> order to use numpy, or is it a build-time-only dependency?

Build time only. The virtue of tempita is that it can be used to generate
cython sources. We could adapt one of our current templating scripts to
do that also, but that would seem to be more work. Note that tempita is
currently included in cython, but the cython folks consider that an
implementation detail that should not be depended upon.

Chuck

From evgeny.burovskiy at gmail.com Fri Sep 30 11:48:53 2016
From: evgeny.burovskiy at gmail.com (Evgeni Burovski)
Date: Fri, 30 Sep 2016 18:48:53 +0300
Subject: [Numpy-discussion] Vendorize tempita
In-Reply-To: 
References: 
Message-ID: 

On Fri, Sep 30, 2016 at 6:29 PM, Charles R Harris wrote:
>
> On Fri, Sep 30, 2016 at 9:21 AM, Benjamin Root wrote:
>>
>> This is the first I am hearing of tempita (looks to be a templating
>> language).
>> How is it a dependency of numpy? Do I now need tempita in order
>> to use numpy, or is it a build-time-only dependency?
>
> Build time only. The virtue of tempita is that it can be used to
> generate cython sources. We could adapt one of our current templating
> scripts to do that also, but that would seem to be more work. Note that
> tempita is currently included in cython, but the cython folks consider
> that an implementation detail that should not be depended upon.
>
> Chuck

Ideally, it's packaged in such a way that it's usable for scipy too --
at the moment it's used in scipy.sparse via Cython.Tempita + a
fallback to system installed tempita if Cython.Tempita is not
available (however I'm not sure that fallback is ever exercised).
Since scipy needs to support numpy down to 1.8.2, a vendorized copy
will not be usable for scipy for quite a while.

So, it'd be great to handle it like numpydoc: to have npy_tempita as a
small self-contained package with the repo under the numpy
organization and include it via a git submodule. Chuck, do you think
tempita would need much in terms of maintenance?

To put some money where my mouth is, I can offer to do some legwork
for packaging it up.

Cheers,
Evgeni

From charlesr.harris at gmail.com Fri Sep 30 12:10:38 2016
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 30 Sep 2016 10:10:38 -0600
Subject: [Numpy-discussion] Vendorize tempita
In-Reply-To: 
References: 
Message-ID: 

On Fri, Sep 30, 2016 at 9:48 AM, Evgeni Burovski wrote:

> So, it'd be great to handle it like numpydoc: to have npy_tempita as a
> small self-contained package with the repo under the numpy
> organization and include it via a git submodule. Chuck, do you think
> tempita would need much in terms of maintenance?
>
> To put some money where my mouth is, I can offer to do some legwork
> for packaging it up.

It might be better to keep tempita and cythonize together so that the
search path works out right. It is also possible that other scripts might
be wanted, as cythonize is currently restricted to cython files
(*.pyx.in, *.pxi.in). There are two other templating scripts in
numpy/distutils, and I think f2py has a dependency on one of those.

If there is a set of tools that would be common to both scipy and numpy,
having them included as a submodule would be a good idea.

Chuck

From charlesr.harris at gmail.com Fri Sep 30 12:36:47 2016
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 30 Sep 2016 10:36:47 -0600
Subject: [Numpy-discussion] Vendorize tempita
In-Reply-To: 
References: 
Message-ID: 

On Fri, Sep 30, 2016 at 10:10 AM, Charles R Harris <
charlesr.harris at gmail.com> wrote:

> It might be better to keep tempita and cythonize together so that the
> search path works out right. It is also possible that other scripts
> might be wanted, as cythonize is currently restricted to cython files
> (*.pyx.in, *.pxi.in). There are two other templating scripts in
> numpy/distutils, and I think f2py has a dependency on one of those.
>
> If there is a set of tools that would be common to both scipy and
> numpy, having them included as a submodule would be a good idea.

Hmm, I suppose it just depends on where the submodule is, so a
npy_tempita alone would work fine. There isn't much maintenance needed if
you resist the urge to refactor the code. I removed a six dependency, but
that is now upstream as well.

Chuck

From josef.pktd at gmail.com Fri Sep 30 17:09:04 2016
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Fri, 30 Sep 2016 17:09:04 -0400
Subject: [Numpy-discussion] automatically avoiding temporary arrays
In-Reply-To: <283e3000-0b9c-886c-e322-1ff4d2e8cb26@googlemail.com>
References: <283e3000-0b9c-886c-e322-1ff4d2e8cb26@googlemail.com>
Message-ID: 

On Fri, Sep 30, 2016 at 9:38 AM, Julian Taylor wrote:
> hi,
> Temporary arrays generated in expressions are expensive, as they imply
> extra memory bandwidth, which is the bottleneck in most numpy
> operations.
> For example:
>
> r = a + b + c
>
> creates the b + c temporary and then adds a to it.
> This can be rewritten to be more efficient using inplace operations:
>
> r = b + c
> r += a

general question (I wouldn't understand the details even if I looked.)

how is this affected by broadcasting and type promotion?

Some of the main reasons that I don't like to use inplace operations in
general are that I'm often not sure when type promotion occurs and when
arrays expand during broadcasting.

for example b + c is 1-D, a is 2-D, and r has the broadcasted shape.
another case where I switch away from inplace operations is when b + c is
int or bool and a is float. Thankfully, we get error messages for casting
now.

> This saves some memory bandwidth and can speed up the operation by 50%
> for very large arrays, or even more if the inplace operation allows it
> to be completed entirely in the cpu cache.

I didn't realize the difference can be so large. That would make
streamlining some code worth the effort.

Josef

> The problem is that inplace operations are a lot less readable, so they
> are often only used in well optimized code.
> But due to Python's refcounting semantics we can actually do some
> inplace conversions transparently. If an operand in python has a
> reference count of one it must be a temporary, so we can use it as the
> destination array. CPython itself does this optimization for string
> concatenations.
> [...]
>
> If no one points out some flaw in the approach, I'm hoping to get this
> into the next numpy version.
>
> cheers,
> Julian

From jtaylor.debian at googlemail.com Fri Sep 30 17:50:00 2016
From: jtaylor.debian at googlemail.com (Julian Taylor)
Date: Fri, 30 Sep 2016 23:50:00 +0200
Subject: [Numpy-discussion] automatically avoiding temporary arrays
In-Reply-To: 
References: <283e3000-0b9c-886c-e322-1ff4d2e8cb26@googlemail.com>
Message-ID: 

On 30.09.2016 23:09, josef.pktd at gmail.com wrote:
> general question (I wouldn't understand the details even if I looked.)
>
> how is this affected by broadcasting and type promotion?
>
> Some of the main reasons that I don't like to use inplace operations in
> general are that I'm often not sure when type promotion occurs and when
> arrays expand during broadcasting.
>
> for example b + c is 1-D, a is 2-D, and r has the broadcasted shape.
> another case where I switch away from inplace operations is when b + c
> is int or bool and a is float. Thankfully, we get error messages for
> casting now.

the temporary is only avoided when the casting follows the safe rule, so
it should be the same as what you get without inplace operations. E.g.
float32-temporary + float64 will not be converted to the unsafe float32
+= float64, which a normal inplace operation would allow. But
float64-temp + float32 is transformed.

Currently the only broadcasting that will be transformed is temporary +
scalar value; otherwise it will only work on matching array sizes. Though
there is not really anything that prevents full broadcasting, it's just
not implemented yet in the PR.

> I didn't realize the difference can be so large. That would make
> streamlining some code worth the effort.
>
> Josef
> [...]
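The safe-casting rule Julian describes can be checked from Python with
np.can_cast; a quick sketch of the two cases he mentions (this mirrors the
rule, it is not the PR's internal check):

```python
# Illustrates the safe-casting rule for the two dtype combinations above,
# using np.can_cast as a stand-in for the PR's internal check.
import numpy as np

# float32 temporary + float64 operand: writing the float64 result into
# the float32 temporary would be an unsafe downcast, so the temporary
# cannot be reused and no elision happens.
print(np.can_cast(np.float64, np.float32, casting='safe'))  # False

# float64 temporary + float32 operand: float32 upcasts safely to float64,
# so the float64 temporary can be reused as the output array.
print(np.can_cast(np.float32, np.float64, casting='safe'))  # True
```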
From charlesr.harris at gmail.com Fri Sep 30 20:42:28 2016
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 30 Sep 2016 18:42:28 -0600
Subject: [Numpy-discussion] Vendorize tempita
In-Reply-To: 
References: 
Message-ID: 

On Fri, Sep 30, 2016 at 10:36 AM, Charles R Harris <
charlesr.harris at gmail.com> wrote:

> Hmm, I suppose it just depends on where the submodule is, so a
> npy_tempita alone would work fine. There isn't much maintenance needed
> if you resist the urge to refactor the code. I removed a six
> dependency, but that is now upstream as well.

There don't seem to be any objections, so I will put the current
vendorization in. Evgeni, if you think it a good idea to make a repo for
this and use submodules, go ahead with that. I have left out the testing
infrastructure at https://github.com/gjhiggins/tempita which runs a
sparse set of doctests.

Chuck
From ralf.gommers at gmail.com Fri Sep 30 23:08:01 2016
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sat, 1 Oct 2016 16:08:01 +1300
Subject: [Numpy-discussion] Vendorize tempita
In-Reply-To: 
References: 
Message-ID: 

On Sat, Oct 1, 2016 at 1:42 PM, Charles R Harris wrote:

> There don't seem to be any objections, so I will put the current
> vendorization in.

LGTM as is. tools/ seems to be the right place; it's outside the numpy
package so no one can import it as numpy.something, which is better than
a numpy.externals or numpy.vendor submodule.
> Evgeni, if you think it a good idea to make a repo for this and use
> submodules, go ahead with that. I have left out the testing
> infrastructure at https://github.com/gjhiggins/tempita which runs a
> sparse set of doctests.

There's a problem with that: it will then not be possible to do:

git clone ...
python setup.py install  # or equivalent

We shouldn't force everyone to mess with "git submodule" for this. I
suspect a submodule would also break a pip install directly from github.

There'll be very few (if any) changes from upstream Tempita, so making a
reusable npy_tempita seems premature.

Cheers,
Ralf