From chris.barker at noaa.gov Sat Aug 1 18:55:02 2015 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Sat, 1 Aug 2015 15:55:02 -0700 Subject: [Numpy-discussion] Proposal: Deprecate np.int, np.float, etc.? In-Reply-To: <1278336859460085079.353193sturla.molden-gmail.com@news.gmane.org> References: <55B25F1A.70107@googlemail.com> <1281473095460055826.208120sturla.molden-gmail.com@news.gmane.org> <1051175736349575057@unknownmsgid> <1278336859460085079.353193sturla.molden-gmail.com@news.gmane.org> Message-ID: <-2868036826153660775@unknownmsgid> >> Turns out I was passing in numpy arrays that I had typed as "np.int". >> It worked OK two years ago when I was testing only on 32 bit pythons, >> but today I got a bunch of failed tests on 64 bit OS-X -- a np.int is >> now a C long! > > It has always been C long. It is the C long that varies between platforms. Of course, it's that a c long was a c int on the platform I wrote the code on the first time. Which is part of the problem with C -- if two types happen to be the same, the compiler is perfectly happy. But that was an error in the first place, it never should have passed. But that's just me. ;-) Anyway, as far as concrete proposals go. I say we deprecate the Python types in the numpy namespace (i.e int and float) Other than that, I'm not sure there's any problem. -Chris > > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Sat Aug 1 19:51:16 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 1 Aug 2015 17:51:16 -0600 Subject: [Numpy-discussion] Branching 1.10 Sunday, Aug 2. Message-ID: Hi All, Just a heads up. If anything absolutely needed has been left out, please make a noise. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Sat Aug 1 19:52:47 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Sat, 1 Aug 2015 23:52:47 +0000 (UTC) Subject: [Numpy-discussion] Proposal: Deprecate np.int, np.float, etc.? References: <55B25F1A.70107@googlemail.com> <1281473095460055826.208120sturla.molden-gmail.com@news.gmane.org> <1051175736349575057@unknownmsgid> <1278336859460085079.353193sturla.molden-gmail.com@news.gmane.org> <-2868036826153660775@unknownmsgid> Message-ID: <1157332328460165630.410393sturla.molden-gmail.com@news.gmane.org> Chris Barker - NOAA Federal wrote: > Which is part of the problem with C -- if two types happen to be the > same, the compiler is perfectly happy. That int and long int be the same is not more problematic than int and signed int be the same. Sturla From charlesr.harris at gmail.com Sun Aug 2 01:08:10 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 1 Aug 2015 23:08:10 -0600 Subject: [Numpy-discussion] mailmap update Message-ID: Hi All, I'm trying to update the .mailmap file on github and could use some help. The current version seems common to both numpy and scipy, hence the crosspost. Here is what I've got so far. 
Alex Griffing ncsu.edu> alex ncsu.edu> Alex Griffing ncsu.edu> argriffing ncsu.edu> Alex Griffing ncsu.edu> argriffing users.noreply.github.com> Behzad Nouri gmail.com> behzad nouri gmail.com> Carl Kleffner gmail.com> carlkl gmail.com> Christoph Gohlke uci.edu> Christolph Gohlke uci.edu> Christoph Gohlke uci.edu> cgholke ?> Christoph Gohlke uci.edu> cgohlke uci.edu> Han Genuit gmail.com> Han gmail.com> Jaime Fernandez gmail.com> Jaime gmail.com > Jaime Fernandez gmail.com> jaimefrio gmail.com> Mark Wiebe gmail.com> Mark gmail.com> Mark Wiebe gmail.com> Mark Wiebe enthought.com> Mark Wiebe gmail.com> Mark Wiebe georg.(none)> Nathaniel J. Smith pobox.com> njsmith pobox.com> Ond?ej ?ert?k gmail.com> Ondrej Certik gmail.com> Ralf Gommers googlemail.com> rgommers googlemail.com> Saullo Giovani gmail.com> saullogiovani gmail.com> Sebastian Berg sipsolutions.net> seberg sipsolutions.net> Anon abdulmuneer gmail.com> Anon amir gmail.com> Anon cel gmail.com> Anon chebee7i gmail.com> Anon empeeu yahoo.com> Anon endolith gmail.com> Anon hannaro gmx.net> Anon hpaulj myuw.net> Anon immerrr gmail.com> Anon jmrosen155 Jordans-MacBook-Pro.local> Anon jnothman student.usyd.edu.au> Anon kanhua gmail.com> Anon mamikony sig.com> Anon mbyt web.de> Anon mlai begws92.beg.utexas.edu> Anon ryanblak gmail.com> Anon styr gmail.com> Anon tdihp hotmail.com> Anon tpoole gmail.com> Anon wim glenn melbourneit.com.au> The Anon author is just a standing in for unknown author. I can make a guess at some of those, but would prefer it if the people in question could supply their proper name and address. TIA, Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From t.b.poole at gmail.com Sun Aug 2 05:04:07 2015 From: t.b.poole at gmail.com (Tom Poole) Date: Sun, 2 Aug 2015 10:04:07 +0100 Subject: [Numpy-discussion] mailmap update In-Reply-To: References: Message-ID: <3EC93933-DC4B-497E-B052-FE359CEA632F@gmail.com> Hi Chuck, Tom Poole gmail.com > tpoole gmail.com > Tom > On 2 Aug 2015, at 06:08, Charles R Harris wrote: > > Hi All, > > I'm trying to update the .mailmap file on github and could use some help. The current version seems common to both numpy and scipy, hence the crosspost. Here is what I've got so far. > > Alex Griffing ncsu.edu > alex ncsu.edu > > Alex Griffing ncsu.edu > argriffing ncsu.edu > > Alex Griffing ncsu.edu > argriffing users.noreply.github.com > > Behzad Nouri gmail.com > behzad nouri gmail.com > > Carl Kleffner gmail.com > carlkl gmail.com > > Christoph Gohlke uci.edu > Christolph Gohlke uci.edu > > Christoph Gohlke uci.edu > cgholke ?> > Christoph Gohlke uci.edu > cgohlke uci.edu > > Han Genuit gmail.com > Han gmail.com > > Jaime Fernandez gmail.com > Jaime gmail.com > > Jaime Fernandez gmail.com > jaimefrio gmail.com > > Mark Wiebe gmail.com > Mark gmail.com > > Mark Wiebe gmail.com > Mark Wiebe enthought.com > > Mark Wiebe gmail.com > Mark Wiebe georg.(none)> > Nathaniel J. 
Smith pobox.com > njsmith pobox.com > > Ond?ej ?ert?k gmail.com > Ondrej Certik gmail.com > > Ralf Gommers googlemail.com > rgommers googlemail.com > > Saullo Giovani gmail.com > saullogiovani gmail.com > > Sebastian Berg sipsolutions.net > seberg sipsolutions.net > > > Anon > abdulmuneer gmail.com > > Anon > amir gmail.com > > Anon > cel gmail.com > > Anon > chebee7i gmail.com > > Anon > empeeu yahoo.com > > Anon > endolith gmail.com > > Anon > hannaro gmx.net > > Anon > hpaulj myuw.net > > Anon > immerrr gmail.com > > Anon > jmrosen155 Jordans-MacBook-Pro.local> > Anon > jnothman student.usyd.edu.au > > Anon > kanhua gmail.com > > Anon > mamikony sig.com > > Anon > mbyt web.de > > Anon > mlai begws92.beg.utexas.edu > > Anon > ryanblak gmail.com > > Anon > styr gmail.com > > Anon > tdihp hotmail.com > > Anon > tpoole gmail.com > > Anon > wim glenn melbourneit.com.au > > > The Anon author is just a standing in for unknown author. I can make a guess at some of those, but would prefer it if the people in question could supply their proper name and address. > > TIA, > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From sturla.molden at gmail.com Sun Aug 2 08:13:14 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 02 Aug 2015 14:13:14 +0200 Subject: [Numpy-discussion] Proposal: Deprecate np.int, np.float, etc.? In-Reply-To: <55BB2611.10003@googlemail.com> References: <55B25F1A.70107@googlemail.com> <55BB2611.10003@googlemail.com> Message-ID: On 31/07/15 09:38, Julian Taylor wrote: > A long is only machine word wide on posix, in windows its not. Actually it is the opposite. A pointer is 64 bit on AMD64, but the native integer and pointer offset is only 32 bit. But it does not matter because it is int that should be machine word sized, not long, which it is on both platforms. Sturla From kwang24 at wisc.edu Sun Aug 2 09:55:54 2015 From: kwang24 at wisc.edu (Kang Wang) Date: Sun, 02 Aug 2015 08:55:54 -0500 Subject: [Numpy-discussion] Change default order to Fortran order Message-ID: <7740864542dd.55bddb1a@wiscmail.wisc.edu> Hi, I am an imaging researcher, and a new Python user. My first Python project is to somehow modify NumPy source code such that everything is Fortran column-major by default. I read about the information in the link below, but for us, the fact is that?we absolutely want to use Fortran column major, and we want to make it default. Explicitly writing " order = 'F' " all over the place is not acceptable to us. http://docs.scipy.org/doc/numpy/reference/internals.html#multidimensional-array-indexing-order-issues I tried searching in this email list, as well as google search in general. However, I have not found anything useful. This must be a common request/need, I believe. Can anyone provide any insight/help? Thank you very much, Kang -- Kang Wang, Ph.D. 1111 Highland Ave., Room 1113 Madison, WI 53705-2275 ---------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sturla.molden at gmail.com Sun Aug 2 10:27:08 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 02 Aug 2015 16:27:08 +0200 Subject: [Numpy-discussion] Change default order to Fortran order In-Reply-To: <7740864542dd.55bddb1a@wiscmail.wisc.edu> References: <7740864542dd.55bddb1a@wiscmail.wisc.edu> Message-ID: On 02/08/15 15:55, Kang Wang wrote: > Can anyone provide any insight/help? There is no "default order". There was before, but now all operators control the order of their return arrays from the order of their input array. The only thing that makes C order "default" is the keyword argument to np.empty, np.ones and np.zeros. Just monkey patch those functions and it should be fine. Sturla From sebastian at sipsolutions.net Sun Aug 2 13:19:43 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sun, 2 Aug 2015 17:19:43 +0000 Subject: [Numpy-discussion] Change default order to Fortran order In-Reply-To: References: <7740864542dd.55bddb1a@wiscmail.wisc.edu> Message-ID: Well, numpy has a tendency to prefer C order. There is nothing you can do about that really. But you just cannot be sure what you get in some cases. Often you need something specific for interfaceing other code. But in that case quite often you also do not need to fear the copy. - Sebastian On Sun Aug 2 16:27:08 2015 GMT+0200, Sturla Molden wrote: > On 02/08/15 15:55, Kang Wang wrote: > > > Can anyone provide any insight/help? > > There is no "default order". There was before, but now all operators > control the order of their return arrays from the order of their input > array. The only thing that makes C order "default" is the keyword > argument to np.empty, np.ones and np.zeros. Just monkey patch those > functions and it should be fine. > > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From kwang24 at wisc.edu Sun Aug 2 16:14:15 2015 From: kwang24 at wisc.edu (Kang Wang) Date: Sun, 02 Aug 2015 15:14:15 -0500 Subject: [Numpy-discussion] Change default order to Fortran order In-Reply-To: <772087c3198ed4.55be79da@wiscmail.wisc.edu> References: <7740864542dd.55bddb1a@wiscmail.wisc.edu> <74b0de5619c9ee.55be77b0@wiscmail.wisc.edu> <77208f8c19abe0.55be77ed@wiscmail.wisc.edu> <7740c3e01988b3.55be782b@wiscmail.wisc.edu> <7740821519faf2.55be7868@wiscmail.wisc.edu> <74b0aa0e19ff5a.55be78a5@wiscmail.wisc.edu> <7610d2e119e175.55be78e3@wiscmail.wisc.edu> <76b0a897198bde.55be7921@wiscmail.wisc.edu> <7450811719dc7d.55be795e@wiscmail.wisc.edu> <75e0cf9619906d.55be799c@wiscmail.wisc.edu> <772087c3198ed4.55be79da@wiscmail.wisc.edu> Message-ID: <7720ae8f19c5b1.55be33c7@wiscmail.wisc.edu> Thank you all for replying! I did a quick test, using python 2.6.6, and the original numpy package on my Linux computer without any change.== x = np.zeros((2,3),dtype=np.int32,order='F') print "x.strides =" print x.strides y = x + 1 print "y.strides =" print y.strides == Output: -------- x.strides = (4, 8) y.strides = (12, 4) -------- So, basically, "x" is Fortran-style column-major (because I explicitly write order='F'), but "y" is C-style row-major. This is going to be very annoying. What I really want is: - I do not have to write order='F' explicitly when declaring "x" - both "x" and "y" are Fortran-style column-major Which file should I modify to achieve this goal? 
Right now, I am just trying to get some basic stuff working with all arrays default to Fortran-style, and I can worry about interfacing with other code/libraries later. Thanks, Kang On 08/02/15, Sebastian Berg wrote: > Well, numpy has a tendency to prefer C order. There is nothing you can do about that really. But you just cannot be sure what you get in some cases. > Often you need something specific for interfaceing other code. But in that case quite often you also do not need to fear the copy. > > - Sebastian > > > On Sun Aug 2 16:27:08 2015 GMT+0200, Sturla Molden wrote: > > On 02/08/15 15:55, Kang Wang wrote: > > > > > Can anyone provide any insight/help? > > > > There is no "default order". There was before, but now all operators > > control the order of their return arrays from the order of their input > > array. The only thing that makes C order "default" is the keyword > > argument to np.empty, np.ones and np.zeros. Just monkey patch those > > functions and it should be fine. > > > > Sturla > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Kang Wang, Ph.D. 1111 Highland Ave., Room 1113 Madison, WI 53705-2275 TEL 608-263-0066 http://www.medphysics.wisc.edu/~kang/ ---------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Sun Aug 2 16:22:01 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 02 Aug 2015 22:22:01 +0200 Subject: [Numpy-discussion] Change default order to Fortran order In-Reply-To: <7720ae8f19c5b1.55be33c7@wiscmail.wisc.edu> References: <7740864542dd.55bddb1a@wiscmail.wisc.edu> <74b0de5619c9ee.55be77b0@wiscmail.wisc.edu> <77208f8c19abe0.55be77ed@wiscmail.wisc.edu> <7740c3e01988b3.55be782b@wiscmail.wisc.edu> <7740821519faf2.55be7868@wiscmail.wisc.edu> <74b0aa0e19ff5a.55be78a5@wiscmail.wisc.edu> <7610d2e119e175.55be78e3@wiscmail.wisc.edu> <76b0a897198bde.55be7921@wiscmail.wisc.edu> <7450811719dc7d.55be795e@wiscmail.wisc.edu> <75e0cf9619906d.55be799c@wiscmail.wisc.edu> <772087c3198ed4.55be79da@wiscmail.wisc.edu> <7720ae8f19c5b1.55be33c7@wiscmail.wisc.edu> Message-ID: On 02/08/15 22:14, Kang Wang wrote: > Thank you all for replying! > > I did a quick test, using python 2.6.6, and the original numpy package > on my Linux computer without any change. > == > x = np.zeros((2,3),dtype=np.int32,order='F') > print "x.strides =" > print x.strides > > y = x + 1 > print "y.strides =" > print y.strides > == > > Output: > -------- > x.strides = > (4, 8) > y.strides = > (12, 4) > -------- Update NumPy. This is the behavior I talked about that has changed. 
Now NumPy does this: In [21]: x = np.zeros((2,3),dtype=np.int32,order='F') In [22]: y = x + 1 In [24]: x.strides Out[24]: (4, 8) In [25]: y.strides Out[25]: (4, 8) Sturla From bryanv at continuum.io Sun Aug 2 16:28:24 2015 From: bryanv at continuum.io (Bryan Van de Ven) Date: Sun, 2 Aug 2015 15:28:24 -0500 Subject: [Numpy-discussion] Change default order to Fortran order In-Reply-To: References: <7740864542dd.55bddb1a@wiscmail.wisc.edu> <74b0de5619c9ee.55be77b0@wiscmail.wisc.edu> <77208f8c19abe0.55be77ed@wiscmail.wisc.edu> <7740c3e01988b3.55be782b@wiscmail.wisc.edu> <7740821519faf2.55be7868@wiscmail.wisc.edu> <74b0aa0e19ff5a.55be78a5@wiscmail.wisc.edu> <7610d2e119e175.55be78e3@wiscmail.wisc.edu> <76b0a897198bde.55be7921@wiscmail.wisc.edu> <7450811719dc7d.55be795e@wiscmail.wisc.edu> <75e0cf9619906d.55be799c@wiscmail.wisc.edu> <772087c3198ed4.55be79da@wiscmail.wisc.edu> <7720ae8f19c5b1.55be33c7@wiscmail.wisc.edu> Message-ID: <83FD48B4-2CF1-4D9E-AED4-FA0F17A07722@continuum.io> And to eliminate the order kwarg, use functools.partial to patch the zeros function (or any others, as needed): In [26]: import numpy as np In [27]: from functools import partial In [28]: np.zeros = partial(np.zeros, order="F") In [29]: x = np.zeros((2,3), dtype=np.int32) In [30]: y = x + 1 In [31]: x.strides Out[31]: (4, 8) In [32]: y.strides Out[32]: (4, 8) In [33]: np.__version__ Out[33]: '1.9.2' Bryan > On Aug 2, 2015, at 3:22 PM, Sturla Molden wrote: > > On 02/08/15 22:14, Kang Wang wrote: >> Thank you all for replying! >> >> I did a quick test, using python 2.6.6, and the original numpy package >> on my Linux computer without any change. >> == >> x = np.zeros((2,3),dtype=np.int32,order='F') >> print "x.strides =" >> print x.strides >> >> y = x + 1 >> print "y.strides =" >> print y.strides >> == >> >> Output: >> -------- >> x.strides = >> (4, 8) >> y.strides = >> (12, 4) >> -------- > > Update NumPy. This is the behavior I talked about that has changed. > > Now NumPy does this: > > > In [21]: x = np.zeros((2,3),dtype=np.int32,order='F') > > In [22]: y = x + 1 > > In [24]: x.strides > Out[24]: (4, 8) > > In [25]: y.strides > Out[25]: (4, 8) > > > > Sturla > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sturla.molden at gmail.com Sun Aug 2 16:46:50 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 02 Aug 2015 22:46:50 +0200 Subject: [Numpy-discussion] Change default order to Fortran order In-Reply-To: <83FD48B4-2CF1-4D9E-AED4-FA0F17A07722@continuum.io> References: <7740864542dd.55bddb1a@wiscmail.wisc.edu> <74b0de5619c9ee.55be77b0@wiscmail.wisc.edu> <77208f8c19abe0.55be77ed@wiscmail.wisc.edu> <7740c3e01988b3.55be782b@wiscmail.wisc.edu> <7740821519faf2.55be7868@wiscmail.wisc.edu> <74b0aa0e19ff5a.55be78a5@wiscmail.wisc.edu> <7610d2e119e175.55be78e3@wiscmail.wisc.edu> <76b0a897198bde.55be7921@wiscmail.wisc.edu> <7450811719dc7d.55be795e@wiscmail.wisc.edu> <75e0cf9619906d.55be799c@wiscmail.wisc.edu> <772087c3198ed4.55be79da@wiscmail.wisc.edu> <7720ae8f19c5b1.55be33c7@wiscmail.wisc.edu> <83FD48B4-2CF1-4D9E-AED4-FA0F17A07722@continuum.io> Message-ID: On 02/08/15 22:28, Bryan Van de Ven wrote: > And to eliminate the order kwarg, use functools.partial to patch the zeros function (or any others, as needed): This will probably break code that depends on NumPy, like SciPy and scikit-image. But if NumPy is all that matters, sure go ahead and monkey patch. 
Otherwise keep the patched functions in another namespace. :-) Sturla From njs at pobox.com Sun Aug 2 18:08:19 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 2 Aug 2015 22:08:19 +0000 Subject: [Numpy-discussion] Change default order to Fortran order In-Reply-To: <7740864542dd.55bddb1a@wiscmail.wisc.edu> References: <7740864542dd.55bddb1a@wiscmail.wisc.edu> Message-ID: On Aug 2, 2015 6:59 AM, "Kang Wang" wrote: > > Hi, > > I am an imaging researcher, and a new Python user. My first Python project is to somehow modify NumPy source code such that everything is Fortran column-major by default. > > I read about the information in the link below, but for us, the fact is that we absolutely want to use Fortran column major, and we want to make it default. Explicitly writing " order = 'F' " all over the place is not acceptable to us. > http://docs.scipy.org/doc/numpy/reference/internals.html#multidimensional-array-indexing-order-issues > > I tried searching in this email list, as well as google search in general. However, I have not found anything useful. This must be a common request/need, I believe. It isn't, I'm afraid. Basically what you're signing up for is to maintain your own copy of numpy all by yourself. You're totally within your rights to do this, but it isn't something I would really recommend as a first python project (to put it mildly). And unfortunately, there are plenty of libraries out there that use numpy and assume they will get C order by default, so your version of numpy will create lots of obscure errors, segfaults, etc. as you start using it with other packages. Obviously this will be a problem for you -- basically you may find yourself having to maintain your own copy of lots of libraries. Less obviously, this would also create a big problem for us, because your users will start filling bug reports on numpy, or on these random third party packages, and it will be massively confusing and a big waste of time because the problem will be with your package, not with any of our code. So if you do do this, please either (a) change the name of your package somehow ('import numpyfortran' or similar) so that everyone using it is clear that it's a non-standard product, or else (b) make sure that you only use it within your own team, don't allow anyone else to use it, and make a rule that no one is allowed to file bug reports, or ask or answer questions on mailing lists or stackoverflow, unless they have first double checked *every* time that what they're saying is also valid when using regular numpy. Again, I strongly recommend you not do this. There are literally millions of users who are using numpy as it currently is, and able to get stuff done. I don't know your specific situation, but maybe if you describe a bit more what it is you're doing and why you think you need all-Fortran-all-the-time, then people will be able to suggest strategies to work around things on your end, or find smaller tweaks to numpy that could go into the standard version. -n -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Sun Aug 2 18:17:51 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 2 Aug 2015 22:17:51 +0000 Subject: [Numpy-discussion] Change default order to Fortran order In-Reply-To: <7720ae8f19c5b1.55be33c7@wiscmail.wisc.edu> References: <7740864542dd.55bddb1a@wiscmail.wisc.edu> <74b0de5619c9ee.55be77b0@wiscmail.wisc.edu> <77208f8c19abe0.55be77ed@wiscmail.wisc.edu> <7740c3e01988b3.55be782b@wiscmail.wisc.edu> <7740821519faf2.55be7868@wiscmail.wisc.edu> <74b0aa0e19ff5a.55be78a5@wiscmail.wisc.edu> <7610d2e119e175.55be78e3@wiscmail.wisc.edu> <76b0a897198bde.55be7921@wiscmail.wisc.edu> <7450811719dc7d.55be795e@wiscmail.wisc.edu> <75e0cf9619906d.55be799c@wiscmail.wisc.edu> <772087c3198ed4.55be79da@wiscmail.wisc.edu> <7720ae8f19c5b1.55be33c7@wiscmail.wisc.edu> Message-ID: On Aug 2, 2015 1:17 PM, "Kang Wang" wrote: > > Thank you all for replying! > > I did a quick test, using python 2.6.6, There's pretty much no good reason these days to be using python 2.6 (which was released in *2008*). I assume you're using it because you're using redhat or some redhat derivative, and that's what they ship by default? Even redhat engineers officially recommend that users *not* use the default python -- it's basically only intended for use by their own built-in system management scripts. If you're just getting started with python, then at this point I'd recommend starting with python 3.4. Some easy ways to get this installed: - Anaconda: the most popular scientific python distribution -- you pretty much just download one file and get a full, up to date setup of python and all the main scientific packages, in your home directory. Supported on all popular platforms. Trivial to use and requires no special permissions. http://continuum.io/downloads#py34 - One of Anaconda's competitors: http://www.scipy.org/install.html - Software collections: redhat's official way to do things like this: https://www.softwarecollections.org/en/ -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From sank.daniel at gmail.com Sun Aug 2 18:24:51 2015 From: sank.daniel at gmail.com (Daniel Sank) Date: Sun, 2 Aug 2015 15:24:51 -0700 Subject: [Numpy-discussion] Change default order to Fortran order In-Reply-To: References: <7740864542dd.55bddb1a@wiscmail.wisc.edu> Message-ID: Could you please explain why you need 'F' ordering? It's pretty unlikely that you actually care about the internal memory layout, and you'll get better advice if you explain why you think you do care. > My first Python project is to somehow modify NumPy source > code such that everything is Fortran column-major by default. This is the road to pain. You'll have to maintain your own fork and will probably inject bugs when trying to rewrite. Nobody will want to help fix them because everyone else just uses numpy as is. > And to eliminate the order kwarg, use functools.partial to patch the > zeros function (or any others, as needed): Instead of monkey patching, why not just define your own shims: fortran_zeros = partial(np.zeros(order='F')) Seems like this would lead to a lot less confusion (although until you tell us why you care about the in-memory layout I don't know the point of doing this at all). -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Sun Aug 2 18:52:50 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 2 Aug 2015 22:52:50 +0000 Subject: [Numpy-discussion] Change default order to Fortran order In-Reply-To: References: <7740864542dd.55bddb1a@wiscmail.wisc.edu> Message-ID: On Aug 2, 2015 7:30 AM, "Sturla Molden" wrote: > > On 02/08/15 15:55, Kang Wang wrote: > > > Can anyone provide any insight/help? > > There is no "default order". There was before, but now all operators > control the order of their return arrays from the order of their input > array. This is... overoptimistic. I would not rely on this in code that I wrote. It's true that many numpy operations do preserve the input order. But there are also many operations that don't. And which are which often changes between releases. (Not on purpose usually, but it's an easy bug to introduce. And sometimes it is done intentionally, e.g. to make functions faster. It sucks to have to make a function slower for everyone because someone somewhere is depending on memory layout default details.) And there are operations where it's not even clear what preserving order means (indexing a C array with a Fortran array, add(C, fortran), ...), and even lots of operations that intrinsically break contiguity/ordering (transpose, diagonal, slicing, swapaxes, ...), so you will end up with mixed orderings one way or another in any non-trivial program. Instead, it's better to explicitly specify order= just at the places where you care. That way your code is *locally* correct (you can check it will work by just reading the one function). The alternative is to try and enforce a *global* property on your program ("everyone everywhere is very careful to only use contiguity-preserving operations", where "everyone" includes third party libraries like numpy and others). In software design, local invariants invariants are always better than global invariants -- the most well known example is local variables versus global variables, but the principle is much broader. -n -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kwang24 at wisc.edu Sun Aug 2 22:16:17 2015 From: kwang24 at wisc.edu (Kang Wang) Date: Sun, 02 Aug 2015 21:16:17 -0500 Subject: [Numpy-discussion] Change default order to Fortran order In-Reply-To: <74b0c506198959.55beced2@wiscmail.wisc.edu> References: <7740864542dd.55bddb1a@wiscmail.wisc.edu> <7610d3f519bbd6.55bec8c3@wiscmail.wisc.edu> <762089a419b1da.55bec901@wiscmail.wisc.edu> <7690e69c19c6f5.55bec93e@wiscmail.wisc.edu> <74508bf519b933.55bec97c@wiscmail.wisc.edu> <7600e48b19abd0.55bec9b9@wiscmail.wisc.edu> <76109628198969.55bec9f7@wiscmail.wisc.edu> <7620c52619dbc7.55beca36@wiscmail.wisc.edu> <7740b86c19a9af.55beca74@wiscmail.wisc.edu> <74b0c96f19dd2e.55becab2@wiscmail.wisc.edu> <7720e87c19fe7f.55becaf0@wiscmail.wisc.edu> <76b0c5ac19a331.55becb2e@wiscmail.wisc.edu> <7620b4de199872.55becb6d@wiscmail.wisc.edu> <74b0bbd319feec.55becbab@wiscmail.wisc.edu> <7620d52d19bee2.55becbeb@wiscmail.wisc.edu> <76b0e43919dfdb.55becc29@wiscmail.wisc.edu> <75e0a7dc19bdab.55becc68@wiscmail.wisc.edu> <75e0c028198583.55becca6@wiscmail.wisc.edu> <74b0a09d19c79c.55becce4@wiscmail.wisc.edu> <76b0808619d856.55becd22@wiscmail.wisc.edu> <7610891d1992e0.55becd5f@wiscmail.wisc.edu> <7740b16819811d.55becd9d@wiscmail.wisc.edu> <75e0d05a19a210.55becddb@wiscmail.wisc.edu> <7620f86d19cdfb.55bece19@wiscmail.wisc.edu> <74b0e26619d88e.55bece57@wiscmail.wisc.edu> <7450aa2219d85b.55bece94@wiscmail.wisc.edu> <74b0c506198959.55beced2@wiscmail.wisc.edu> Message-ID: <74b0e2ba198afc.55be88a1@wiscmail.wisc.edu> Thank you all for replying and providing useful insights and suggestions. The reasons I really want to use column-major are: I am image-oriented user (not matrix-oriented, as explained in?http://docs.scipy.org/doc/numpy/reference/internals.html#multidimensional-array-indexing-order-issues) I am so used to read/write "I(x, y, z)" in textbook and code, and it is very likely that if the environment (row-major environment) forces me to write I(z, y, x), I will write a bug if I am not 100% focused. When this happens, it is difficult to debug, because everything compile and build fine. You will see run time error. Depending on environment, you may get useful error message (i.e. index out of range), but sometimes you just get bad image results. It actually has not too much to do with the actual data layout in memory. In imaging processing, especially medical imaging where I am working in, if you have a 3D image, everyone will agree that in memory, the X index is the fasted changing index, and the Z dimension (we often call it the "slice" dimension) has the largest stride in memory. So, if data?layout?is like this in memory, and image-oriented users are so used to read/write "I(x,y,z)", the only storage order that makes sense is column-major I also write code in MATLAB and C/C++. In MATLAB, matrix is column-major array. In C/C++, we often use ITK, which is also column-major (http://www.itk.org/Doxygen/html/classitk_1_1Image.html). I really prefer always read/write column-major code to minimize coding bugs related to storage order. I also prefer index to be 0-based; however, there is nothing I can do about it for MATLAB (which is 1-based). I can see that my original thought about "modifying NumPy source and re-compile" is probably a bad idea. The suggestions about using "fortran_zeros = partial(np.zeros(order='F'))" is probably the best way so far, in my opinion, and I am going to give it a try. Again, thank you all for replying. 
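For reference, a minimal sketch of that shim approach (the extra helper names fortran_ones and fortran_empty below are illustrative additions, not anything from the thread): functools.partial has to be handed the function object itself, as in Bryan's monkey-patch example, rather than the result of calling it, so the working form is partial(np.zeros, order='F') rather than partial(np.zeros(order='F')).

import numpy as np
from functools import partial

# Shims kept in your own module/namespace, so numpy itself is untouched
# and libraries such as SciPy or scikit-image still see the stock functions.
fortran_zeros = partial(np.zeros, order='F')
fortran_ones = partial(np.ones, order='F')
fortran_empty = partial(np.empty, order='F')

x = fortran_zeros((2, 3), dtype=np.int32)
print(x.flags['F_CONTIGUOUS'])   # True
print(x.strides)                 # (4, 8), as in the strides test above
# On a recent NumPy (1.9.2 in Bryan's transcript) y = x + 1 keeps the
# Fortran order, so y.strides is also (4, 8); on older releases it may not.

Keeping the shims in a separate namespace rather than rebinding np.zeros follows Sturla's warning about monkey patching breaking downstream packages.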
Kang On 08/02/15, Nathaniel Smith wrote: > > On Aug 2, 2015 7:30 AM, "Sturla Molden" wrote: > > > > > > On 02/08/15 15:55, Kang Wang wrote: > > > > > > > Can anyone provide any insight/help? > > > > > > There is no "default order". There was before, but now all operators > > > control the order of their return arrays from the order of their input > > > array. > > This is... overoptimistic. I would not rely on this in code that I wrote. > > It's true that many numpy operations do preserve the input order. But there are also many operations that don't. And which are which often changes between releases. (Not on purpose usually, but it's an easy bug to introduce. And sometimes it is done intentionally, e.g. to make functions faster. It sucks to have to make a function slower for everyone because someone somewhere is depending on memory layout default details.) And there are operations where it's not even clear what preserving order means (indexing a C array with a Fortran array, add(C, fortran), ...), and even lots of operations that intrinsically break contiguity/ordering (transpose, diagonal, slicing, swapaxes, ...), so you will end up with mixed orderings one way or another in any non-trivial program. > > Instead, it's better to explicitly specify order= just at the places where you care. That way your code is *locally* correct (you can check it will work by just reading the one function). The alternative is to try and enforce a *global* property on your program ("everyone everywhere is very careful to only use contiguity-preserving operations", where "everyone" includes third party libraries like numpy and others). In software design, local invariants invariants are always better than global invariants -- the most well known example is local variables versus global variables, but the principle is much broader. > > -n > > -- Kang Wang, Ph.D. 1111 Highland Ave., Room 1113 Madison, WI 53705-2275 ---------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From nevion at gmail.com Sun Aug 2 22:55:53 2015 From: nevion at gmail.com (Jason Newton) Date: Sun, 2 Aug 2015 19:55:53 -0700 Subject: [Numpy-discussion] Change default order to Fortran order In-Reply-To: <74b0e2ba198afc.55be88a1@wiscmail.wisc.edu> References: <7740864542dd.55bddb1a@wiscmail.wisc.edu> <7610d3f519bbd6.55bec8c3@wiscmail.wisc.edu> <762089a419b1da.55bec901@wiscmail.wisc.edu> <7690e69c19c6f5.55bec93e@wiscmail.wisc.edu> <74508bf519b933.55bec97c@wiscmail.wisc.edu> <7600e48b19abd0.55bec9b9@wiscmail.wisc.edu> <76109628198969.55bec9f7@wiscmail.wisc.edu> <7620c52619dbc7.55beca36@wiscmail.wisc.edu> <7740b86c19a9af.55beca74@wiscmail.wisc.edu> <74b0c96f19dd2e.55becab2@wiscmail.wisc.edu> <7720e87c19fe7f.55becaf0@wiscmail.wisc.edu> <76b0c5ac19a331.55becb2e@wiscmail.wisc.edu> <7620b4de199872.55becb6d@wiscmail.wisc.edu> <74b0bbd319feec.55becbab@wiscmail.wisc.edu> <7620d52d19bee2.55becbeb@wiscmail.wisc.edu> <76b0e43919dfdb.55becc29@wiscmail.wisc.edu> <75e0a7dc19bdab.55becc68@wiscmail.wisc.edu> <75e0c028198583.55becca6@wiscmail.wisc.edu> <74b0a09d19c79c.55becce4@wiscmail.wisc.edu> <76b0808619d856.55becd22@wiscmail.wisc.edu> <7610891d1992e0.55becd5f@wiscmail.wisc.edu> <7740b16819811d.55becd9d@wiscmail.wisc.edu> <75e0d05a19a210.55becddb@wiscmail.wisc.edu> <7620f86d19cdfb.55bece19@wiscmail.wisc.edu> <74b0e26619d88e.55bece57@wiscmail.wisc.edu> <7450aa2219d85b.55bece94@wiscmail.wisc.edu> <74b0c506198959.55beced2@wiscmail.wisc.edu> <74b0e2ba198afc.55be88a1@wiscmail.wisc.edu> Message-ID: Just chiming in with my 2 cents, in direct response to your points... - Image oriented processing is most typically done with row-major storage layout. From hardware to general software implementations. - Well really think of it as [slice,] row, column (logical)... you don't actually have to be concerned about the layout unless you want higher performance - in which case for a better access pattern you process a fundamental image-line at a time. I also find it helps me avoid bugs with xyz semantics by working with rows and columns only and remembering x=col, y = row. - I'm most familiar with having slice first like the above. - ITK is stored as row-major actually, but it's index type has dimensions specified as column,row, slice . Matlab does alot of things column order and thus acts different from implementations which can result in different outputs, but matlab seems perfectly happy living on an island where it's the only implementation providing a specific answer given a specific input. - Numpy is 0 based...? Good luck keeping it all sane though, -Jason On Sun, Aug 2, 2015 at 7:16 PM, Kang Wang wrote: > Thank you all for replying and providing useful insights and suggestions. > > The reasons I really want to use column-major are: > > - I am image-oriented user (not matrix-oriented, as explained in > http://docs.scipy.org/doc/numpy/reference/internals.html#multidimensional-array-indexing-order-issues > ) > - I am so used to read/write "I(x, y, z)" in textbook and code, and it > is very likely that if the environment (row-major environment) forces me to > write I(z, y, x), I will write a bug if I am not 100% focused. When this > happens, it is difficult to debug, because everything compile and build > fine. You will see run time error. Depending on environment, you may get > useful error message (i.e. index out of range), but sometimes you just get > bad image results. > - It actually has not too much to do with the actual data layout in > memory. 
In imaging processing, especially medical imaging where I am > working in, if you have a 3D image, everyone will agree that in memory, the > X index is the fasted changing index, and the Z dimension (we often call it > the "slice" dimension) has the largest stride in memory. So, if > data layout is like this in memory, and image-oriented users are so used to > read/write "I(x,y,z)", the only storage order that makes sense is > column-major > - I also write code in MATLAB and C/C++. In MATLAB, matrix is > column-major array. In C/C++, we often use ITK, which is also column-major ( > http://www.itk.org/Doxygen/html/classitk_1_1Image.html). I really > prefer always read/write column-major code to minimize coding bugs related > to storage order. > - I also prefer index to be 0-based; however, there is nothing I can > do about it for MATLAB (which is 1-based). > > I can see that my original thought about "modifying NumPy source and > re-compile" is probably a bad idea. The suggestions about using > "fortran_zeros = partial(np.zeros(order='F'))" is probably the best way so > far, in my opinion, and I am going to give it a try. > > Again, thank you all for replying. > > Kang > > On 08/02/15, *Nathaniel Smith * wrote: > > On Aug 2, 2015 7:30 AM, "Sturla Molden" wrote: > > > > On 02/08/15 15:55, Kang Wang wrote: > > > > > Can anyone provide any insight/help? > > > > There is no "default order". There was before, but now all operators > > control the order of their return arrays from the order of their input > > array. > > This is... overoptimistic. I would not rely on this in code that I wrote. > > It's true that many numpy operations do preserve the input order. But > there are also many operations that don't. And which are which often > changes between releases. (Not on purpose usually, but it's an easy bug to > introduce. And sometimes it is done intentionally, e.g. to make functions > faster. It sucks to have to make a function slower for everyone because > someone somewhere is depending on memory layout default details.) And there > are operations where it's not even clear what preserving order means > (indexing a C array with a Fortran array, add(C, fortran), ...), and even > lots of operations that intrinsically break contiguity/ordering (transpose, > diagonal, slicing, swapaxes, ...), so you will end up with mixed orderings > one way or another in any non-trivial program. > > Instead, it's better to explicitly specify order= just at the places where > you care. That way your code is *locally* correct (you can check it will > work by just reading the one function). The alternative is to try and > enforce a *global* property on your program ("everyone everywhere is very > careful to only use contiguity-preserving operations", where "everyone" > includes third party libraries like numpy and others). In software design, > local invariants invariants are always better than global invariants -- the > most well known example is local variables versus global variables, but the > principle is much broader. > > -n > > -- > *Kang Wang, Ph.D.* > 1111 Highland Ave., Room 1113 > Madison, WI 53705-2275 > ---------------------------------------- > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Sun Aug 2 23:22:38 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 2 Aug 2015 21:22:38 -0600 Subject: [Numpy-discussion] 1.10.x is branched Message-ID: Hi All, Numpy 1.10.x is branched. There is still some cleanup to do before the alpha release, but that should be coming in a couple of days. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sank.daniel at gmail.com Mon Aug 3 00:27:53 2015 From: sank.daniel at gmail.com (Daniel Sank) Date: Sun, 2 Aug 2015 21:27:53 -0700 Subject: [Numpy-discussion] Change default order to Fortran order In-Reply-To: <74b0e2ba198afc.55be88a1@wiscmail.wisc.edu> References: <7740864542dd.55bddb1a@wiscmail.wisc.edu> <7610d3f519bbd6.55bec8c3@wiscmail.wisc.edu> <762089a419b1da.55bec901@wiscmail.wisc.edu> <7690e69c19c6f5.55bec93e@wiscmail.wisc.edu> <74508bf519b933.55bec97c@wiscmail.wisc.edu> <7600e48b19abd0.55bec9b9@wiscmail.wisc.edu> <76109628198969.55bec9f7@wiscmail.wisc.edu> <7620c52619dbc7.55beca36@wiscmail.wisc.edu> <7740b86c19a9af.55beca74@wiscmail.wisc.edu> <74b0c96f19dd2e.55becab2@wiscmail.wisc.edu> <7720e87c19fe7f.55becaf0@wiscmail.wisc.edu> <76b0c5ac19a331.55becb2e@wiscmail.wisc.edu> <7620b4de199872.55becb6d@wiscmail.wisc.edu> <74b0bbd319feec.55becbab@wiscmail.wisc.edu> <7620d52d19bee2.55becbeb@wiscmail.wisc.edu> <76b0e43919dfdb.55becc29@wiscmail.wisc.edu> <75e0a7dc19bdab.55becc68@wiscmail.wisc.edu> <75e0c028198583.55becca6@wiscmail.wisc.edu> <74b0a09d19c79c.55becce4@wiscmail.wisc.edu> <76b0808619d856.55becd22@wiscmail.wisc.edu> <7610891d1992e0.55becd5f@wiscmail.wisc.edu> <7740b16819811d.55becd9d@wiscmail.wisc.edu> <75e0d05a19a210.55becddb@wiscmail.wisc.edu> <7620f86d19cdfb.55bece19@wiscmail.wisc.edu> <74b0e26619d88e.55bece57@wiscmail.wisc.edu> <7450aa2219d85b.55bece94@wiscmail.wisc.edu> <74b0c506198959.55beced2@wiscmail.wisc.edu> <74b0e2ba198afc.55be88a1@wiscmail.wisc.edu> Message-ID: Kang, Thank you for explaining your motivation. It's clear from your last note, as you said, that your desire for column-first indexing has nothing to do with in-memory data layout. That being the case, I strongly urge you to just use bare numpy and do not use the "fortran_zeros" function I recommended before. Changing the in-memory layout via the "order" keyword in numpy.zeros will not change the way indexing works at all. You gain absolutely nothing by changing the in-memory order unless you are writing some C or Fortran code which will interact with the data in memory. To see what I mean, consider the following examples: x = np.array([1, 2, 3], [4, 5, 6]]) x.shape >>> (2, 3) and x = np.array([1, 2, 3], [4, 5, 6]], order='F') x.shape >>> (2, 3) You see that changing the in-memory order has nothing whatsoever to do with the array's shape or how you access it. > You will see run time error. Depending on environment, you may get useful error message > (i.e. index out of range), but sometimes you just get bad image results. Could you give a very simple example of what you mean? I can't think of how this could ever happen and your fear here makes me think there's a fundamental misunderstanding about how array operations in numpy and other programming languages work. As an example, iteration in numpy goes through the first index: x = np.array([[1, 2, 3], [4, 5, 6]]) for foo in x: ... Inside the for loop, foo takes on the values [1, 2, 3] on the first iteration and [4, 5, 6] on the second. 
If you want to iterate through the columns just do this instead x = np.array([[1, 2, 3], [4, 5, 6]]) for foo in x.T: ... If your complaint is that you want np.array([[1, 2, 3], [4, 5, 6]]) to produce an array with shape (3, 2) then you should own up to the fact that the array constructor expects it the other way around and do this x = np.array([[1, 2, 3], [4, 5, 6]]).T instead. This is infinity times better than trying to write a shim function or patch numpy because with .T you're using (fast) built-in functionality which other people your code will understand. The real message here is that whether the first index runs over rows or columns is actually meaningless. The only places the row versus column issue has any meaning is when doing input/output (in which case you should use the transpose if you actually need it), or when doing iteration. One thing that would make sense if you're reading from a binary file format which uses column-major format would be to write your own reader function: def read_fortran_style_binary_file(file): return np.fromfile(file).T Note that if you do this then you already have a column major array in numpy and you don't have to worry about any other transposes (except, again, when doing more I/O or passing to something like a plotting function). On Sun, Aug 2, 2015 at 7:16 PM, Kang Wang wrote: > Thank you all for replying and providing useful insights and suggestions. > > The reasons I really want to use column-major are: > > - I am image-oriented user (not matrix-oriented, as explained in > http://docs.scipy.org/doc/numpy/reference/internals.html#multidimensional-array-indexing-order-issues > ) > - I am so used to read/write "I(x, y, z)" in textbook and code, and it > is very likely that if the environment (row-major environment) forces me to > write I(z, y, x), I will write a bug if I am not 100% focused. When this > happens, it is difficult to debug, because everything compile and build > fine. You will see run time error. Depending on environment, you may get > useful error message (i.e. index out of range), but sometimes you just get > bad image results. > - It actually has not too much to do with the actual data layout in > memory. In imaging processing, especially medical imaging where I am > working in, if you have a 3D image, everyone will agree that in memory, the > X index is the fasted changing index, and the Z dimension (we often call it > the "slice" dimension) has the largest stride in memory. So, if > data layout is like this in memory, and image-oriented users are so used to > read/write "I(x,y,z)", the only storage order that makes sense is > column-major > - I also write code in MATLAB and C/C++. In MATLAB, matrix is > column-major array. In C/C++, we often use ITK, which is also column-major ( > http://www.itk.org/Doxygen/html/classitk_1_1Image.html). I really > prefer always read/write column-major code to minimize coding bugs related > to storage order. > - I also prefer index to be 0-based; however, there is nothing I can > do about it for MATLAB (which is 1-based). > > I can see that my original thought about "modifying NumPy source and > re-compile" is probably a bad idea. The suggestions about using > "fortran_zeros = partial(np.zeros(order='F'))" is probably the best way so > far, in my opinion, and I am going to give it a try. > > Again, thank you all for replying. 
> > Kang > > On 08/02/15, *Nathaniel Smith * wrote: > > On Aug 2, 2015 7:30 AM, "Sturla Molden" wrote: > > > > On 02/08/15 15:55, Kang Wang wrote: > > > > > Can anyone provide any insight/help? > > > > There is no "default order". There was before, but now all operators > > control the order of their return arrays from the order of their input > > array. > > This is... overoptimistic. I would not rely on this in code that I wrote. > > It's true that many numpy operations do preserve the input order. But > there are also many operations that don't. And which are which often > changes between releases. (Not on purpose usually, but it's an easy bug to > introduce. And sometimes it is done intentionally, e.g. to make functions > faster. It sucks to have to make a function slower for everyone because > someone somewhere is depending on memory layout default details.) And there > are operations where it's not even clear what preserving order means > (indexing a C array with a Fortran array, add(C, fortran), ...), and even > lots of operations that intrinsically break contiguity/ordering (transpose, > diagonal, slicing, swapaxes, ...), so you will end up with mixed orderings > one way or another in any non-trivial program. > > Instead, it's better to explicitly specify order= just at the places where > you care. That way your code is *locally* correct (you can check it will > work by just reading the one function). The alternative is to try and > enforce a *global* property on your program ("everyone everywhere is very > careful to only use contiguity-preserving operations", where "everyone" > includes third party libraries like numpy and others). In software design, > local invariants invariants are always better than global invariants -- the > most well known example is local variables versus global variables, but the > principle is much broader. > > -n > > -- > *Kang Wang, Ph.D.* > 1111 Highland Ave., Room 1113 > Madison, WI 53705-2275 > ---------------------------------------- > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Daniel Sank -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jni.soma at gmail.com Mon Aug 3 00:54:31 2015 From: jni.soma at gmail.com (Juan Nunez-Iglesias) Date: Mon, 03 Aug 2015 04:54:31 +0000 Subject: [Numpy-discussion] Change default order to Fortran order In-Reply-To: References: <7740864542dd.55bddb1a@wiscmail.wisc.edu> <7610d3f519bbd6.55bec8c3@wiscmail.wisc.edu> <762089a419b1da.55bec901@wiscmail.wisc.edu> <7690e69c19c6f5.55bec93e@wiscmail.wisc.edu> <74508bf519b933.55bec97c@wiscmail.wisc.edu> <7600e48b19abd0.55bec9b9@wiscmail.wisc.edu> <76109628198969.55bec9f7@wiscmail.wisc.edu> <7620c52619dbc7.55beca36@wiscmail.wisc.edu> <7740b86c19a9af.55beca74@wiscmail.wisc.edu> <74b0c96f19dd2e.55becab2@wiscmail.wisc.edu> <7720e87c19fe7f.55becaf0@wiscmail.wisc.edu> <76b0c5ac19a331.55becb2e@wiscmail.wisc.edu> <7620b4de199872.55becb6d@wiscmail.wisc.edu> <74b0bbd319feec.55becbab@wiscmail.wisc.edu> <7620d52d19bee2.55becbeb@wiscmail.wisc.edu> <76b0e43919dfdb.55becc29@wiscmail.wisc.edu> <75e0a7dc19bdab.55becc68@wiscmail.wisc.edu> <75e0c028198583.55becca6@wiscmail.wisc.edu> <74b0a09d19c79c.55becce4@wiscmail.wisc.edu> <76b0808619d856.55becd22@wiscmail.wisc.edu> <7610891d1992e0.55becd5f@wiscmail.wisc.edu> <7740b16819811d.55becd9d@wiscmail.wisc.edu> <75e0d05a19a210.55becddb@wiscmail.wisc.edu> <7620f86d19cdfb.55bece19@wiscmail.wisc.edu> <74b0e26619d88e.55bece57@wiscmail.wisc.edu> <7450aa2219d85b.55bece94@wiscmail.wisc.edu> <74b0c506198959.55beced2@wiscmail.wisc.edu> <74b0e2ba198afc.55be88a1@wiscmail.wisc.edu> Message-ID: Hi Kang, Feel free to come chat about your application on the scikit-image list [1]! I'll note that we've been through the array order discussion many times there and even have a doc page about it [2]. The short version is that you'll save yourself a lot of pain by starting to think of your images as (plane, row, column) instead of (x, y, z). The syntax actually becomes friendlier too. For example, to do something to each slice of data, you do: for plane in image: plane += foo instead of for z in image.shape[2]: image[:, :, z] += foo for example. Juan. [1] scikit-image at googlegroups.com [2] http://scikit-image.org/docs/dev/user_guide/numpy_images.html#coordinate-conventions PS: As to the renamed Fortran-ordered numpy, may I suggest "funpy". The F is for Fortran and the fun is for all the fun you'll have maintaining it. =P On Mon, 3 Aug 2015 at 6:28 am Daniel Sank wrote: > Kang, > > Thank you for explaining your motivation. It's clear from your last note, > as you said, that your desire for column-first indexing has nothing to do > with in-memory data layout. That being the case, I strongly urge you to > just use bare numpy and do not use the "fortran_zeros" function I > recommended before. Changing the in-memory layout via the "order" keyword > in numpy.zeros will not change the way indexing works at all. You gain > absolutely nothing by changing the in-memory order unless you are writing > some C or Fortran code which will interact with the data in memory. > > To see what I mean, consider the following examples: > > x = np.array([1, 2, 3], [4, 5, 6]]) > x.shape > >>> (2, 3) > > and > > x = np.array([1, 2, 3], [4, 5, 6]], order='F') > x.shape > >>> (2, 3) > > You see that changing the in-memory order has nothing whatsoever to do > with the array's shape or how you access it. > > > You will see run time error. Depending on environment, you may get > useful error message > > (i.e. index out of range), but sometimes you just get bad image results. > > Could you give a very simple example of what you mean? 
I can't think of > how this could ever happen and your fear here makes me think there's a > fundamental misunderstanding about how array operations in numpy and other > programming languages work. As an example, iteration in numpy goes through > the first index: > > x = np.array([[1, 2, 3], [4, 5, 6]]) > for foo in x: > ... > > Inside the for loop, foo takes on the values [1, 2, 3] on the first > iteration and [4, 5, 6] on the second. If you want to iterate through the > columns just do this instead > > x = np.array([[1, 2, 3], [4, 5, 6]]) > for foo in x.T: > ... > > If your complaint is that you want np.array([[1, 2, 3], [4, 5, 6]]) to > produce an array with shape (3, 2) then you should own up to the fact that > the array constructor expects it the other way around and do this > > x = np.array([[1, 2, 3], [4, 5, 6]]).T > > instead. This is infinity times better than trying to write a shim > function or patch numpy because with .T you're using (fast) built-in > functionality which other people your code will understand. > > The real message here is that whether the first index runs over rows or > columns is actually meaningless. The only places the row versus column > issue has any meaning is when doing input/output (in which case you should > use the transpose if you actually need it), or when doing iteration. One > thing that would make sense if you're reading from a binary file format > which uses column-major format would be to write your own reader function: > > def read_fortran_style_binary_file(file): > return np.fromfile(file).T > > Note that if you do this then you already have a column major array in > numpy and you don't have to worry about any other transposes (except, > again, when doing more I/O or passing to something like a plotting > function). > > > > > On Sun, Aug 2, 2015 at 7:16 PM, Kang Wang wrote: > >> Thank you all for replying and providing useful insights and suggestions. >> >> The reasons I really want to use column-major are: >> >> - I am image-oriented user (not matrix-oriented, as explained in >> http://docs.scipy.org/doc/numpy/reference/internals.html#multidimensional-array-indexing-order-issues >> ) >> - I am so used to read/write "I(x, y, z)" in textbook and code, and >> it is very likely that if the environment (row-major environment) forces me >> to write I(z, y, x), I will write a bug if I am not 100% focused. When >> this happens, it is difficult to debug, because everything compile and >> build fine. You will see run time error. Depending on environment, you may >> get useful error message (i.e. index out of range), but sometimes you just >> get bad image results. >> - It actually has not too much to do with the actual data layout in >> memory. In imaging processing, especially medical imaging where I am >> working in, if you have a 3D image, everyone will agree that in memory, the >> X index is the fasted changing index, and the Z dimension (we often call it >> the "slice" dimension) has the largest stride in memory. So, if >> data layout is like this in memory, and image-oriented users are so used to >> read/write "I(x,y,z)", the only storage order that makes sense is >> column-major >> - I also write code in MATLAB and C/C++. In MATLAB, matrix is >> column-major array. In C/C++, we often use ITK, which is also column-major ( >> http://www.itk.org/Doxygen/html/classitk_1_1Image.html). I really >> prefer always read/write column-major code to minimize coding bugs related >> to storage order. 
>> - I also prefer index to be 0-based; however, there is nothing I can >> do about it for MATLAB (which is 1-based). >> >> I can see that my original thought about "modifying NumPy source and >> re-compile" is probably a bad idea. The suggestions about using >> "fortran_zeros = partial(np.zeros(order='F'))" is probably the best way so >> far, in my opinion, and I am going to give it a try. >> >> Again, thank you all for replying. >> >> Kang >> >> On 08/02/15, *Nathaniel Smith * wrote: >> >> On Aug 2, 2015 7:30 AM, "Sturla Molden" wrote: >> > >> > On 02/08/15 15:55, Kang Wang wrote: >> > >> > > Can anyone provide any insight/help? >> > >> > There is no "default order". There was before, but now all operators >> > control the order of their return arrays from the order of their input >> > array. >> >> This is... overoptimistic. I would not rely on this in code that I wrote. >> >> It's true that many numpy operations do preserve the input order. But >> there are also many operations that don't. And which are which often >> changes between releases. (Not on purpose usually, but it's an easy bug to >> introduce. And sometimes it is done intentionally, e.g. to make functions >> faster. It sucks to have to make a function slower for everyone because >> someone somewhere is depending on memory layout default details.) And there >> are operations where it's not even clear what preserving order means >> (indexing a C array with a Fortran array, add(C, fortran), ...), and even >> lots of operations that intrinsically break contiguity/ordering (transpose, >> diagonal, slicing, swapaxes, ...), so you will end up with mixed orderings >> one way or another in any non-trivial program. >> >> Instead, it's better to explicitly specify order= just at the places >> where you care. That way your code is *locally* correct (you can check it >> will work by just reading the one function). The alternative is to try and >> enforce a *global* property on your program ("everyone everywhere is very >> careful to only use contiguity-preserving operations", where "everyone" >> includes third party libraries like numpy and others). In software design, >> local invariants invariants are always better than global invariants -- the >> most well known example is local variables versus global variables, but the >> principle is much broader. >> >> -n >> >> -- >> *Kang Wang, Ph.D.* >> 1111 Highland Ave., Room 1113 >> Madison, WI 53705-2275 >> ---------------------------------------- >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > -- > Daniel Sank > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kwang24 at wisc.edu Mon Aug 3 02:02:42 2015 From: kwang24 at wisc.edu (Kang Wang) Date: Mon, 03 Aug 2015 01:02:42 -0500 Subject: [Numpy-discussion] Change default order to Fortran order In-Reply-To: <75e0a7e7199aa0.55bf03c5@wiscmail.wisc.edu> References: <7740864542dd.55bddb1a@wiscmail.wisc.edu> <7610d3f519bbd6.55bec8c3@wiscmail.wisc.edu> <762089a419b1da.55bec901@wiscmail.wisc.edu> <7690e69c19c6f5.55bec93e@wiscmail.wisc.edu> <74508bf519b933.55bec97c@wiscmail.wisc.edu> <7600e48b19abd0.55bec9b9@wiscmail.wisc.edu> <76109628198969.55bec9f7@wiscmail.wisc.edu> <7620c52619dbc7.55beca36@wiscmail.wisc.edu> <7740b86c19a9af.55beca74@wiscmail.wisc.edu> <74b0c96f19dd2e.55becab2@wiscmail.wisc.edu> <7720e87c19fe7f.55becaf0@wiscmail.wisc.edu> <76b0c5ac19a331.55becb2e@wiscmail.wisc.edu> <7620b4de199872.55becb6d@wiscmail.wisc.edu> <74b0bbd319feec.55becbab@wiscmail.wisc.edu> <7620d52d19bee2.55becbeb@wiscmail.wisc.edu> <76b0e43919dfdb.55becc29@wiscmail.wisc.edu> <75e0a7dc19bdab.55becc68@wiscmail.wisc.edu> <75e0c028198583.55becca6@wiscmail.wisc.edu> <74b0a09d19c79c.55becce4@wiscmail.wisc.edu> <76b0808619d856.55becd22@wiscmail.wisc.edu> <7610891d1992e0.55becd5f@wiscmail.wisc.edu> <7740b16819811d.55becd9d@wiscmail.wisc.edu> <75e0d05a19a210.55becddb@wiscmail.wisc.edu> <7620f86d19cdfb.55bece19@wiscmail.wisc.edu> <74b0e26619d88e.55bece57@wiscmail.wisc.edu> <7450aa2219d85b.55bece94@wiscmail.wisc.edu> <74b0c506198959.55beced2@wiscmail.wisc.edu> <74b0e2ba198afc.55be88a1@wiscmail.wisc.edu> <7790f915199428.55befb9a@wiscmail.wisc.edu> <7790906919a1f1.55befbd8@wiscmail.wisc.edu> <7790852f19d571.55befc17@wiscmail.wisc.edu> <77209c0219e46d.55befc54@wiscmail.wisc.edu> <7720f14319db76.55befc92@wiscmail.wisc.edu> <7740cea919c20b.55befcd0@wiscmail.wisc.edu> <75e091da1983c6.55befd0d@wiscmail.wisc.edu> <75e092e119d0d4.55befd4c@wiscmail.wisc.edu> <74508dfa19c665.55befd89@wiscmail.wisc.edu> <779098271984ef.55befdc7@wiscmail.wisc.edu> <7790a0c619efcf.55befe05@wiscmail.wisc.edu> <7790e85019cb3c.55befe43@wiscmail.wisc.edu> <774081cb19c91f.55befe81@wiscmail.wisc.edu> <7740fc9119942d.55befebf@wiscmail.wisc.edu> <7740800619f499.55befefc@wiscmail.wisc.edu> <77408ff719acd0.55beffef@wiscmail.wisc.edu> <7720f55919b92b.55bf0068@wiscmail.wisc.edu> <7720c87719822d.55bf00a6@wiscmail.wisc.edu> <7720ae66199b7c.55bf00e4@wiscmail.wisc.edu> <7720bc3819aa29.55bf0122@wiscmail.wisc.edu> <7620dfc219e016.55bf0160@wiscmail.wisc.edu> <7620af8f19a909.55bf019e@wiscmail.wisc.edu> <7690986219ed18.55bf01dc@wiscmail.wisc.edu> <7690bc7419a786.55bf0219@wiscmail.wisc.edu> <7690ea4e19b283.55bf0258@wiscmail.wisc.edu> <76008e80199c29.55bf0296@wiscmail.wisc.edu> <75e0cd8219e481.55bf034b@wiscmail.wisc.edu> <75e0a7e7199aa0.55bf03c5@wiscmail.wisc.edu> Message-ID: <75e08cd219963d.55bebdb2@wiscmail.wisc.edu> This is very good discussion. Thank you all for replying. I can see the fundamental difference is that I always think/talk/read/write a 3D image as I(x, y, z), not (plane, row, column) . I am coming from MRI (magnetic resonance imaging) research, and I can assure you that the entire MRI community is using (x, y, z), including books, journal papers, conference abstracts, presentations, everything. We even talk about what we called "logical x/y/z" and "physical x/y/z", and the rotation matrix that converts the two coordinate systems. The radiologists are also used to (x, y, z). For example, we always say "my image is 256 by 256 by 20 slices", and we never say "20 by 256 by 256". 
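A minimal sketch of what that convention means on the NumPy side (the 256 x 256 x 20 shape is just the example above, and float64 is an arbitrary choice): a column-major allocation does put x on the smallest stride.

import numpy as np

vol = np.zeros((256, 256, 20), order='F')   # (x, y, z), Fortran/column-major
print(vol.flags['F_CONTIGUOUS'])            # True
print(vol.strides)                          # (8, 2048, 524288): x varies fastest in memory
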
So, basically, at least in MRI, we always talk about an image as I(x, y, z), and we always assume that "x" is the fastest changing index. That's why I prefer column-major (because it is more natural). Of course, I can totally get my work done by using row-major, I just have to always remind myself "write last dimension index first" when coding. I actually have done this before, and I found it would be so much easier if just using column-major. Kang On 08/02/15, Juan Nunez-Iglesias wrote: > Hi Kang, > > Feel free to come chat about your application on the scikit-image list [1]! I'll note that we've been through the array order discussion many times there and even have a doc page about it [2]. > > The short version is that you'll save yourself a lot of pain by starting to think of your images as (plane, row, column) instead of (x, y, z). The syntax actually becomes friendlier too. For example, to do something to each slice of data, you do: > > for plane in image: > plane?+= foo > > > instead of > > > for z in image.shape[2]: > image[:, :, z]?+= foo > > > for example. > > > Juan. > > > [1] scikit-image at googlegroups.com > [2]?http://scikit-image.org/docs/dev/user_guide/numpy_images.html#coordinate-conventions > > > PS: As to the renamed Fortran-ordered numpy, may I suggest "funpy". The F is for Fortran and the fun is for all the fun you'll have maintaining it. =P > > On Mon, 3 Aug 2015 at 6:28 am Daniel Sank wrote: > > > > Kang, > > > > Thank you for explaining your motivation. It's clear from your last note, as you said, that your desire for column-first indexing has nothing to do with in-memory data layout. That being the case, I strongly urge you to just use bare numpy and do not use the "fortran_zeros" function I recommended before. Changing the in-memory layout via the "order" keyword in numpy.zeros will not change the way indexing works at all. You gain absolutely nothing by changing the in-memory order unless you are writing some C or Fortran code which will interact with the data in memory. > > > > > > To see what I mean, consider the following examples: > > > > > > x = np.array([1, 2, 3], [4, 5, 6]]) > > x.shape > > >>> (2, 3) > > > > > > and > > > > > > x = np.array([1, 2, 3], [4, 5, 6]], order='F') > > x.shape > > >>> (2, 3) > > > > > > > > You see that changing the in-memory order has nothing whatsoever to do with the array's shape or how you access it. > > > > > > > > > You will see run time error. Depending on environment, you may get useful error message > > > (i.e. index out of range), but sometimes you just get bad image results. > > > > > > > > Could you give a very simple example of what you mean? I can't think of how this could ever happen and your fear here makes me think there's a fundamental misunderstanding about how array operations in numpy and other programming languages work. As an example, iteration in numpy goes through the first index: > > > > > > x = np.array([[1, 2, 3], [4, 5, 6]]) > > for foo in x: > > ... > > > > > > Inside the for loop, foo takes on the values [1, 2, 3] on the first iteration and [4, 5, 6] on the second. If you want to iterate through the columns just do this instead > > > > > > x = np.array([[1, 2, 3], [4, 5, 6]]) > > for foo in x.T: > > ... 
> > > > > > > > If your complaint is that you want np.array([[1, 2, 3], [4, 5, 6]]) to produce an array with shape (3, 2) then you should own up to the fact that the array constructor expects it the other way around and do this > > > > > > > > x = np.array([[1, 2, 3], [4, 5, 6]]).T > > > > > > > > instead. This is infinity times better than trying to write a shim function or patch numpy because with .T you're using (fast) built-in functionality which other people your code will understand. > > > > The real message here is that whether the first index runs over rows or columns is actually meaningless. The only places the row versus column issue has any meaning is when doing input/output (in which case you should use the transpose if you actually need it), or when doing iteration. One thing that would make sense if you're reading from a binary file format which uses column-major format would be to write your own reader function: > > > > > > > > def read_fortran_style_binary_file(file): > > return np.fromfile(file).T > > > > > > Note that if you do this then you already have a column major array in numpy and you don't have to worry about any other transposes (except, again, when doing more I/O or passing to something like a plotting function). > > > > > > > > > > > > > > > > > > > > > > On Sun, Aug 2, 2015 at 7:16 PM, Kang Wang wrote: > > > > > > > > > Thank you all for replying and providing useful insights and suggestions. > > > > > > The reasons I really want to use column-major are: > > > > > > I am image-oriented user (not matrix-oriented, as explained in?http://docs.scipy.org/doc/numpy/reference/internals.html#multidimensional-array-indexing-order-issues) > > > I am so used to read/write "I(x, y, z)" in textbook and code, and it is very likely that if the environment (row-major environment) forces me to write I(z, y, x), I will write a bug if I am not 100% focused. When this happens, it is difficult to debug, because everything compile and build fine. You will see run time error. Depending on environment, you may get useful error message (i.e. index out of range), but sometimes you just get bad image results. > > > It actually has not too much to do with the actual data layout in memory. In imaging processing, especially medical imaging where I am working in, if you have a 3D image, everyone will agree that in memory, the X index is the fasted changing index, and the Z dimension (we often call it the "slice" dimension) has the largest stride in memory. So, if data?layout?is like this in memory, and image-oriented users are so used to read/write "I(x,y,z)", the only storage order that makes sense is column-major > > > I also write code in MATLAB and C/C++. In MATLAB, matrix is column-major array. In C/C++, we often use ITK, which is also column-major (http://www.itk.org/Doxygen/html/classitk_1_1Image.html). I really prefer always read/write column-major code to minimize coding bugs related to storage order. > > > I also prefer index to be 0-based; however, there is nothing I can do about it for MATLAB (which is 1-based). > > > > > > I can see that my original thought about "modifying NumPy source and re-compile" is probably a bad idea. The suggestions about using "fortran_zeros = partial(np.zeros(order='F'))" is probably the best way so far, in my opinion, and I am going to give it a try. > > > > > > > > > Again, thank you all for replying. 
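For reference, a working form of the "fortran_zeros" helper suggested above: functools.partial needs the function and the keyword passed separately, i.e. partial(np.zeros, order='F') rather than partial(np.zeros(order='F')).

from functools import partial
import numpy as np

fortran_zeros = partial(np.zeros, order='F')   # note the comma: function first, keyword second

a = fortran_zeros((256, 256, 20))
print(a.flags['F_CONTIGUOUS'])                 # True
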
> > > > > > > > > Kang > > > > > > On 08/02/15, Nathaniel Smith wrote: > > > > > > > > On Aug 2, 2015 7:30 AM, "Sturla Molden" wrote: > > > > > > > > > > > > > > > > > > On 02/08/15 15:55, Kang Wang wrote: > > > > > > > > > > > > > > > > > > > Can anyone provide any insight/help? > > > > > > > > > > > > > > > > > > There is no "default order". There was before, but now all operators > > > > > > > > > control the order of their return arrays from the order of their input > > > > > > > > > array. > > > > > > > > This is... overoptimistic. I would not rely on this in code that I wrote. > > > > > > > > It's true that many numpy operations do preserve the input order. But there are also many operations that don't. And which are which often changes between releases. (Not on purpose usually, but it's an easy bug to introduce. And sometimes it is done intentionally, e.g. to make functions faster. It sucks to have to make a function slower for everyone because someone somewhere is depending on memory layout default details.) And there are operations where it's not even clear what preserving order means (indexing a C array with a Fortran array, add(C, fortran), ...), and even lots of operations that intrinsically break contiguity/ordering (transpose, diagonal, slicing, swapaxes, ...), so you will end up with mixed orderings one way or another in any non-trivial program. > > > > > > > > Instead, it's better to explicitly specify order= just at the places where you care. That way your code is *locally* correct (you can check it will work by just reading the one function). The alternative is to try and enforce a *global* property on your program ("everyone everywhere is very careful to only use contiguity-preserving operations", where "everyone" includes third party libraries like numpy and others). In software design, local invariants invariants are always better than global invariants -- the most well known example is local variables versus global variables, but the principle is much broader. > > > > > > > > -n > > > > > > > > > > > > > > > > > -- > > > Kang Wang, Ph.D. > > > 1111 Highland Ave., Room 1113 > > > Madison, WI 53705-2275 > > > ---------------------------------------- > > > > > > > > > _______________________________________________ > > > > > > NumPy-Discussion mailing list > > > > > > NumPy-Discussion at scipy.org > > > > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > > > > > > > > > > > > > -- > > Daniel Sank > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion at scipy.org > > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > -- Kang Wang, Ph.D. 1111 Highland Ave., Room 1113 Madison, WI 53705-2275 ---------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Aug 3 02:42:05 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 3 Aug 2015 08:42:05 +0200 Subject: [Numpy-discussion] 1.10.x is branched In-Reply-To: References: Message-ID: On Mon, Aug 3, 2015 at 5:22 AM, Charles R Harris wrote: > Hi All, > > Numpy 1.10.x is branched. There is still some cleanup to do before the > alpha release, but that should be coming in a couple of days. > > Thanks Chuck. Looks like it's shaping up nicely. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Mon Aug 3 03:09:00 2015 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 3 Aug 2015 07:09:00 +0000 Subject: [Numpy-discussion] Change default order to Fortran order In-Reply-To: <75e08cd219963d.55bebdb2@wiscmail.wisc.edu> References: <7740864542dd.55bddb1a@wiscmail.wisc.edu> <7610d3f519bbd6.55bec8c3@wiscmail.wisc.edu> <762089a419b1da.55bec901@wiscmail.wisc.edu> <7690e69c19c6f5.55bec93e@wiscmail.wisc.edu> <74508bf519b933.55bec97c@wiscmail.wisc.edu> <7600e48b19abd0.55bec9b9@wiscmail.wisc.edu> <76109628198969.55bec9f7@wiscmail.wisc.edu> <7620c52619dbc7.55beca36@wiscmail.wisc.edu> <7740b86c19a9af.55beca74@wiscmail.wisc.edu> <74b0c96f19dd2e.55becab2@wiscmail.wisc.edu> <7720e87c19fe7f.55becaf0@wiscmail.wisc.edu> <76b0c5ac19a331.55becb2e@wiscmail.wisc.edu> <7620b4de199872.55becb6d@wiscmail.wisc.edu> <74b0bbd319feec.55becbab@wiscmail.wisc.edu> <7620d52d19bee2.55becbeb@wiscmail.wisc.edu> <76b0e43919dfdb.55becc29@wiscmail.wisc.edu> <75e0a7dc19bdab.55becc68@wiscmail.wisc.edu> <75e0c028198583.55becca6@wiscmail.wisc.edu> <74b0a09d19c79c.55becce4@wiscmail.wisc.edu> <76b0808619d856.55becd22@wiscmail.wisc.edu> <7610891d1992e0.55becd5f@wiscmail.wisc.edu> <7740b16819811d.55becd9d@wiscmail.wisc.edu> <75e0d05a19a210.55becddb@wiscmail.wisc.edu> <7620f86d19cdfb.55bece19@wiscmail.wisc.edu> <74b0e26619d88e.55bece57@wiscmail.wisc.edu> <7450aa2219d85b.55bece94@wiscmail.wisc.edu> <74b0c506198959.55beced2@wiscmail.wisc.edu> <74b0e2ba198afc.55be88a1@wiscmail.wisc.edu> <7790f915199428.55befb9a@wiscmail.wisc.edu> <7790906919a1f1.55befbd8@wiscmail.wisc.edu> <7790852f19d571.55befc17@wiscmail.wisc.edu> <77209c0219e46d.55befc54@wiscmail.wisc.edu> <7720f14319db76.55befc92@wiscmail.wisc.edu> <7740cea919c20b.55befcd0@wiscmail.wisc.edu> <75e091da1983c6.55befd0d@wiscmail.wisc.edu> <75e092e119d0d4.55befd4c@wiscmail.wisc.edu> <74508dfa19c665.55befd89@wiscmail.wisc.edu> <779098271984ef.55befdc7@wiscmail.wisc.edu> <7790a0c619efcf.55befe05@wiscmail.wisc.edu> <7790e85019cb3c.55befe43@wiscmail.wisc.edu> <774081cb19c91f.55befe81@wiscmail.wisc.edu> <7740fc9119942d.55befebf@wiscmail.wisc.edu> <7740800619f499.55befefc@wiscmail.wisc.edu> <77408ff719acd0.55beffef@wiscmail.wisc.edu> <7720f55919b92b.55bf0068@wiscmail.wisc.edu> <7720c87719822d.55bf00a6@wiscmail.wisc.edu> <7720ae66199b7c.55bf00e4@wiscmail.wisc.edu> <7720bc3819aa29.55bf0122@wiscmail.wisc.edu> <7620dfc219e016.55bf0160@wiscmail.wisc.edu> <7620af8f19a909.55bf019e@wiscmail.wisc.edu> <7690986219ed18.55bf01dc@wiscmail.wisc.edu> <7690bc7419a786.55bf0219@wiscmail.wisc.edu> <7690ea4e19b283.55bf0258@wiscmail.wisc.edu> <76008e80199c29.55bf0296@wiscmail.wisc.edu> <75e0cd8219e481.55bf034b@wiscmail.wisc.edu> <75e0a7e7199aa0.55bf03c5@wiscmail.wisc.edu> <75e08cd219963d.55bebdb2@wiscmail.wisc.edu> Message-ID: On Aug 2, 2015 11:06 PM, "Kang Wang" wrote: > > This is very good discussion. Thank you all for replying. > > I can see the fundamental difference is that I always think/talk/read/write a 3D image as I(x, y, z), not (plane, row, column) . I am coming from MRI (magnetic resonance imaging) research, and I can assure you that the entire MRI community is using (x, y, z), including books, journal papers, conference abstracts, presentations, everything. We even talk about what we called "logical x/y/z" and "physical x/y/z", and the rotation matrix that converts the two coordinate systems. The radiologists are also used to (x, y, z). For example, we always say "my image is 256 by 256 by 20 slices", and we never say "20 by 256 by 256". 
> > So, basically, at least in MRI, we always talk about an image as I(x, y, z), and we always assume that "x" is the fastest changing index. That's why I prefer column-major (because it is more natural). > > Of course, I can totally get my work done by using row-major, I just have to always remind myself "write last dimension index first" when coding. I actually have done this before, and I found it would be so much easier if just using column-major. Why not just use I[x, y, z] like you're used to, and let the computer worry about the physical layout in memory? Sometimes this will be Fortran order and sometimes C order and sometimes something else, but you don't have to know or care; 99% of the time it doesn't matter. The worst case is that when you use a python wrapper to call into a library that can only handle Fortran order, then the wrapper will quietly have to convert the memory order around and it will be slightly slower than if you had used Fortran order in the first place. But in practice you'll barely ever notice this, and when you do, *then* you can tell numpy explicitly what memory layout you want in the situation where it matters. General principle: do what's easiest for the programmer, not what's easiest for the computer. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon Aug 3 04:49:35 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 3 Aug 2015 09:49:35 +0100 Subject: [Numpy-discussion] Change default order to Fortran order In-Reply-To: References: <7740864542dd.55bddb1a@wiscmail.wisc.edu> <7610d3f519bbd6.55bec8c3@wiscmail.wisc.edu> <762089a419b1da.55bec901@wiscmail.wisc.edu> <7690e69c19c6f5.55bec93e@wiscmail.wisc.edu> <74508bf519b933.55bec97c@wiscmail.wisc.edu> <7600e48b19abd0.55bec9b9@wiscmail.wisc.edu> <76109628198969.55bec9f7@wiscmail.wisc.edu> <7620c52619dbc7.55beca36@wiscmail.wisc.edu> <7740b86c19a9af.55beca74@wiscmail.wisc.edu> <74b0c96f19dd2e.55becab2@wiscmail.wisc.edu> <7720e87c19fe7f.55becaf0@wiscmail.wisc.edu> <76b0c5ac19a331.55becb2e@wiscmail.wisc.edu> <7620b4de199872.55becb6d@wiscmail.wisc.edu> <74b0bbd319feec.55becbab@wiscmail.wisc.edu> <7620d52d19bee2.55becbeb@wiscmail.wisc.edu> <76b0e43919dfdb.55becc29@wiscmail.wisc.edu> <75e0a7dc19bdab.55becc68@wiscmail.wisc.edu> <75e0c028198583.55becca6@wiscmail.wisc.edu> <74b0a09d19c79c.55becce4@wiscmail.wisc.edu> <76b0808619d856.55becd22@wiscmail.wisc.edu> <7610891d1992e0.55becd5f@wiscmail.wisc.edu> <7740b16819811d.55becd9d@wiscmail.wisc.edu> <75e0d05a19a210.55becddb@wiscmail.wisc.edu> <7620f86d19cdfb.55bece19@wiscmail.wisc.edu> <74b0e26619d88e.55bece57@wiscmail.wisc.edu> <7450aa2219d85b.55bece94@wiscmail.wisc.edu> <74b0c506198959.55beced2@wiscmail.wisc.edu> <74b0e2ba198afc.55be88a1@wiscmail.wisc.edu> <7790f915199428.55befb9a@wiscmail.wisc.edu> <7790906919a1f1.55befbd8@wiscmail.wisc.edu> <7790852f19d571.55befc17@wiscmail.wisc.edu> <77209c0219e46d.55befc54@wiscmail.wisc.edu> <7720f14319db76.55befc92@wiscmail.wisc.edu> <7740cea919c20b.55befcd0@wiscmail.wisc.edu> <75e091da1983c6.55befd0d@wiscmail.wisc.edu> <75e092e119d0d4.55befd4c@wiscmail.wisc.edu> <74508dfa19c665.55befd89@wiscmail.wisc.edu> <779098271984ef.55befdc7@wiscmail.wisc.edu> <7790a0c619efcf.55befe05@wiscmail.wisc.edu> <7790e85019cb3c.55befe43@wiscmail.wisc.edu> <774081cb19c91f.55befe81@wiscmail.wisc.edu> <7740fc9119942d.55befebf@wiscmail.wisc.edu> <7740800619f499.55befefc@wiscmail.wisc.edu> <77408ff719acd0.55beffef@wiscmail.wisc.edu> <7720f55919b92b.55bf0068@wiscmail.wisc.edu> 
<7720c87719822d.55bf00a6@wiscmail.wisc.edu> <7720ae66199b7c.55bf00e4@wiscmail.wisc.edu> <7720bc3819aa29.55bf0122@wiscmail.wisc.edu> <7620dfc219e016.55bf0160@wiscmail.wisc.edu> <7620af8f19a909.55bf019e@wiscmail.wisc.edu> <7690986219ed18.55bf01dc@wiscmail.wisc.edu> <7690bc7419a786.55bf0219@wiscmail.wisc.edu> <7690ea4e19b283.55bf0258@wiscmail.wisc.edu> <76008e80199c29.55bf0296@wiscmail.wisc.edu> <75e0cd8219e481.55bf034b@wiscmail.wisc.edu> <75e0a7e7199aa0.55bf03c5@wiscmail.wisc.edu> <75e08cd219963d.55bebdb2@wiscmail.wisc.edu> Message-ID: Hi, On Mon, Aug 3, 2015 at 8:09 AM, Nathaniel Smith wrote: > On Aug 2, 2015 11:06 PM, "Kang Wang" wrote: >> >> This is very good discussion. Thank you all for replying. >> >> I can see the fundamental difference is that I always >> think/talk/read/write a 3D image as I(x, y, z), not (plane, row, column) . I >> am coming from MRI (magnetic resonance imaging) research, and I can assure >> you that the entire MRI community is using (x, y, z), including books, >> journal papers, conference abstracts, presentations, everything. We even >> talk about what we called "logical x/y/z" and "physical x/y/z", and the >> rotation matrix that converts the two coordinate systems. The radiologists >> are also used to (x, y, z). For example, we always say "my image is 256 by >> 256 by 20 slices", and we never say "20 by 256 by 256". >> >> So, basically, at least in MRI, we always talk about an image as I(x, y, >> z), and we always assume that "x" is the fastest changing index. That's why >> I prefer column-major (because it is more natural). >> >> Of course, I can totally get my work done by using row-major, I just have >> to always remind myself "write last dimension index first" when coding. I >> actually have done this before, and I found it would be so much easier if >> just using column-major. > > Why not just use I[x, y, z] like you're used to, and let the computer worry > about the physical layout in memory? Sometimes this will be Fortran order > and sometimes C order and sometimes something else, but you don't have to > know or care; 99% of the time it doesn't matter. The worst case is that when > you use a python wrapper to call into a library that can only handle Fortran > order, then the wrapper will quietly have to convert the memory order around > and it will be slightly slower than if you had used Fortran order in the > first place. But in practice you'll barely ever notice this, and when you > do, *then* you can tell numpy explicitly what memory layout you want in the > situation where it matters. Yes - if you are using numpy, you really have to look numpy in the eye and say: "I will let you worry about the array element order in memory, and in return, you promise to make indexing work as I would expect" Just for example, let's say you loaded an MRI image into memory: In [1]: import nibabel In [2]: img = nibabel.load('my_mri.nii') In [3]: data = img.get_data() Because NIfTI images are Fortran memory layout, this happens to be the memory layout you get for your array: In [4]: data.flags Out[4]: C_CONTIGUOUS : False F_CONTIGUOUS : True OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False But now - in Python - all I care about is what data I have on the first, second, third axes. 
For example, I could do this: In [5]: data_copy = data.copy() This has exactly the same values as the original array, and at the same index positions: In [7]: import numpy as np In [8]: np.all(data == data) Out[8]: memmap(True, dtype=bool) but I now have a C memory layout array. In [9]: data_copy.flags Out[9]: C_CONTIGUOUS : True F_CONTIGUOUS : False OWNDATA : True WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False Worse than that, if I slice my original data array, then I get an array that is neither C- or Fortran- compatible in memory: In [10]: data_view = data[:, :, ::2] In [11]: data_view.flags Out[11]: C_CONTIGUOUS : False F_CONTIGUOUS : False OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False So - if you want every array to be Fortran-contiguous in memory, I would not start with numpy at all, I would write your own array library. The alternative - or "the numpy way" - is to give up on enforcing a particular layout in memory, until you need to pass an array to some C or C++ or Fortran code that needs some particular layout, in which case you get your extension code to copy the array into the required layout on entry. Of course this is what numpy itself has to do when interfacing with external libraries like BLAS or LAPACK. Cheers, Matthew From sebastian at sipsolutions.net Mon Aug 3 08:02:15 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 3 Aug 2015 12:02:15 +0000 Subject: [Numpy-discussion] Change default order to Fortran order In-Reply-To: References: <7740864542dd.55bddb1a@wiscmail.wisc.edu> <7610d3f519bbd6.55bec8c3@wiscmail.wisc.edu> <762089a419b1da.55bec901@wiscmail.wisc.edu> <7690e69c19c6f5.55bec93e@wiscmail.wisc.edu> <74508bf519b933.55bec97c@wiscmail.wisc.edu> <7600e48b19abd0.55bec9b9@wiscmail.wisc.edu> <76109628198969.55bec9f7@wiscmail.wisc.edu> <7620c52619dbc7.55beca36@wiscmail.wisc.edu> <7740b86c19a9af.55beca74@wiscmail.wisc.edu> <74b0c96f19dd2e.55becab2@wiscmail.wisc.edu> <7720e87c19fe7f.55becaf0@wiscmail.wisc.edu> <76b0c5ac19a331.55becb2e@wiscmail.wisc.edu> <7620b4de199872.55becb6d@wiscmail.wisc.edu> <74b0bbd319feec.55becbab@wiscmail.wisc.edu> <7620d52d19bee2.55becbeb@wiscmail.wisc.edu> <76b0e43919dfdb.55becc29@wiscmail.wisc.edu> <75e0a7dc19bdab.55becc68@wiscmail.wisc.edu> <75e0c028198583.55becca6@wiscmail.wisc.edu> <74b0a09d19c79c.55becce4@wiscmail.wisc.edu> <76b0808619d856.55becd22@wiscmail.wisc.edu> <7610891d1992e0.55becd5f@wiscmail.wisc.edu> <7740b16819811d.55becd9d@wiscmail.wisc.edu> <75e0d05a19a210.55becddb@wiscmail.wisc.edu> <7620f86d19cdfb.55bece19@wiscmail.wisc.edu> <74b0e26619d88e.55bece57@wiscmail.wisc.edu> <7450aa2219d85b.55bece94@wiscmail.wisc.edu> <74b0c506198959.55beced2@wiscmail.wisc.edu> <74b0e2ba198afc.55be88a1@wiscmail.wisc.edu> <7790f915199428.55befb9a@wiscmail.wisc.edu> <7790906919a1f1.55befbd8@wiscmail.wisc.edu> <7790852f19d571.55befc17@wiscmail.wisc.edu> <77209c0219e46d.55befc54@wiscmail.wisc.edu> <7720f14319db76.55befc92@wiscmail.wisc.edu> <7740cea919c20b.55befcd0@wiscmail.wisc.edu> <75e091da1983c6.55befd0d@wiscmail.wisc.edu> <75e092e119d0d4.55befd4c@wiscmail.wisc.edu> <74508dfa19c665.55befd89@wiscmail.wisc.edu> <779098271984ef.55befdc7@wiscmail.wisc.edu> <7790a0c619efcf.55befe05@wiscmail.wisc.edu> <7790e85019cb3c.55befe43@wiscmail.wisc.edu> <774081cb19c91f.55befe81@wiscmail.wisc.edu> <7740fc9119942d.55befebf@wiscmail.wisc.edu> <7740800619f499.55befefc@wiscmail.wisc.edu> <77408ff719acd0.55beffef@wiscmail.wisc.edu> <7720f55919b92b.55bf0068@wiscmail.wisc.edu> <7720c87719822d.55bf00a6@wiscmail.wisc.edu> 
<7720ae66199b7c.55bf00e4@wiscmail.wisc.edu> <7720bc3819aa29.55bf0122@wiscmail.wisc.edu> <7620dfc219e016.55bf0160@wiscmail.wisc.edu> <7620af8f19a909.55bf019e@wiscmail.wisc.edu> <7690986219ed18.55bf01dc@wiscmail.wisc.edu> <7690bc7419a786.55bf0219@wiscmail.wisc.edu> <7690ea4e19b283.55bf0258@wiscmail.wisc.edu> <76008e80199c29.55bf0296@wiscmail.wisc.edu> <75e0cd8219e481.55bf034b@wiscmail.wisc.edu> <75e0a7e7199aa0.55bf03c5@wiscmail.wisc.edu> <75e08cd219963d.55bebdb2@wiscmail.wisc.edu> Message-ID: On Mon Aug 3 10:49:35 2015 GMT+0200, Matthew Brett wrote: > Hi, > > On Mon, Aug 3, 2015 at 8:09 AM, Nathaniel Smith wrote: > > On Aug 2, 2015 11:06 PM, "Kang Wang" wrote: > >> > >> This is very good discussion. Thank you all for replying. > >> > >> I can see the fundamental difference is that I always > >> think/talk/read/write a 3D image as I(x, y, z), not (plane, row, column) . I > >> am coming from MRI (magnetic resonance imaging) research, and I can assure > >> you that the entire MRI community is using (x, y, z), including books, > >> journal papers, conference abstracts, presentations, everything. We even > >> talk about what we called "logical x/y/z" and "physical x/y/z", and the > >> rotation matrix that converts the two coordinate systems. The radiologists > >> are also used to (x, y, z). For example, we always say "my image is 256 by > >> 256 by 20 slices", and we never say "20 by 256 by 256". > >> > >> So, basically, at least in MRI, we always talk about an image as I(x, y, > >> z), and we always assume that "x" is the fastest changing index. That's why > >> I prefer column-major (because it is more natural). > >> > >> Of course, I can totally get my work done by using row-major, I just have > >> to always remind myself "write last dimension index first" when coding. I > >> actually have done this before, and I found it would be so much easier if > >> just using column-major. > > > > Why not just use I[x, y, z] like you're used to, and let the computer worry > > about the physical layout in memory? Sometimes this will be Fortran order > > and sometimes C order and sometimes something else, but you don't have to > > know or care; 99% of the time it doesn't matter. The worst case is that when > > you use a python wrapper to call into a library that can only handle Fortran > > order, then the wrapper will quietly have to convert the memory order around > > and it will be slightly slower than if you had used Fortran order in the > > first place. But in practice you'll barely ever notice this, and when you > > do, *then* you can tell numpy explicitly what memory layout you want in the > > situation where it matters. > > Yes - if you are using numpy, you really have to look numpy in the eye and say: > > "I will let you worry about the array element order in memory, and in > return, you promise to make indexing work as I would expect" > > Just for example, let's say you loaded an MRI image into memory: > > In [1]: import nibabel > In [2]: img = nibabel.load('my_mri.nii') > In [3]: data = img.get_data() > > Because NIfTI images are Fortran memory layout, this happens to be the > memory layout you get for your array: > > In [4]: data.flags > Out[4]: > C_CONTIGUOUS : False > F_CONTIGUOUS : True > OWNDATA : False > WRITEABLE : True > ALIGNED : True > UPDATEIFCOPY : False > > But now - in Python - all I care about is what data I have on the > first, second, third axes. 
For example, I could do this: > > In [5]: data_copy = data.copy() > > This has exactly the same values as the original array, and at the > same index positions: > > In [7]: import numpy as np > In [8]: np.all(data == data) > Out[8]: memmap(True, dtype=bool) > > but I now have a C memory layout array. > > In [9]: data_copy.flags > Out[9]: > C_CONTIGUOUS : True > F_CONTIGUOUS : False > OWNDATA : True > WRITEABLE : True > ALIGNED : True > UPDATEIFCOPY : False > Yeah, I would like to second those arguments. Most of the time, there is no need to worry about layout. For large chunks you allocate, it may make sense for speed, etc. So you can alias creation functions. Generally, I would suggest to simply not worry about the memory layout. Also do not *trust* the layout for most function returns. If you need a specific layout to interface other code, always check what you got it. -Sebastian > Worse than that, if I slice my original data array, then I get an > array that is neither C- or Fortran- compatible in memory: > > In [10]: data_view = data[:, :, ::2] > In [11]: data_view.flags > Out[11]: > C_CONTIGUOUS : False > F_CONTIGUOUS : False > OWNDATA : False > WRITEABLE : True > ALIGNED : True > UPDATEIFCOPY : False > > So - if you want every array to be Fortran-contiguous in memory, I > would not start with numpy at all, I would write your own array > library. > > The alternative - or "the numpy way" - is to give up on enforcing a > particular layout in memory, until you need to pass an array to some C > or C++ or Fortran code that needs some particular layout, in which > case you get your extension code to copy the array into the required > layout on entry. Of course this is what numpy itself has to do when > interfacing with external libraries like BLAS or LAPACK. 
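Coming back to the "always check what you got" advice above, a minimal sketch of such a check (the array here is just a throwaway example):

import numpy as np

x = np.ones((4, 5))[:, ::2]        # a sliced view: neither C- nor F-contiguous
print(x.flags['F_CONTIGUOUS'])     # False

y = np.asfortranarray(x)           # copies only when the layout is not already Fortran
print(y.flags['F_CONTIGUOUS'])     # True
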
> > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From grlee77 at gmail.com Mon Aug 3 10:13:56 2015 From: grlee77 at gmail.com (Gregory Lee) Date: Mon, 3 Aug 2015 10:13:56 -0400 Subject: [Numpy-discussion] Change default order to Fortran order In-Reply-To: References: <7740864542dd.55bddb1a@wiscmail.wisc.edu> <7610d3f519bbd6.55bec8c3@wiscmail.wisc.edu> <762089a419b1da.55bec901@wiscmail.wisc.edu> <7690e69c19c6f5.55bec93e@wiscmail.wisc.edu> <74508bf519b933.55bec97c@wiscmail.wisc.edu> <7600e48b19abd0.55bec9b9@wiscmail.wisc.edu> <76109628198969.55bec9f7@wiscmail.wisc.edu> <7620c52619dbc7.55beca36@wiscmail.wisc.edu> <7740b86c19a9af.55beca74@wiscmail.wisc.edu> <74b0c96f19dd2e.55becab2@wiscmail.wisc.edu> <7720e87c19fe7f.55becaf0@wiscmail.wisc.edu> <76b0c5ac19a331.55becb2e@wiscmail.wisc.edu> <7620b4de199872.55becb6d@wiscmail.wisc.edu> <74b0bbd319feec.55becbab@wiscmail.wisc.edu> <7620d52d19bee2.55becbeb@wiscmail.wisc.edu> <76b0e43919dfdb.55becc29@wiscmail.wisc.edu> <75e0a7dc19bdab.55becc68@wiscmail.wisc.edu> <75e0c028198583.55becca6@wiscmail.wisc.edu> <74b0a09d19c79c.55becce4@wiscmail.wisc.edu> <76b0808619d856.55becd22@wiscmail.wisc.edu> <7610891d1992e0.55becd5f@wiscmail.wisc.edu> <7740b16819811d.55becd9d@wiscmail.wisc.edu> <75e0d05a19a210.55becddb@wiscmail.wisc.edu> <7620f86d19cdfb.55bece19@wiscmail.wisc.edu> <74b0e26619d88e.55bece57@wiscmail.wisc.edu> <7450aa2219d85b.55bece94@wiscmail.wisc.edu> <74b0c506198959.55beced2@wiscmail.wisc.edu> <74b0e2ba198afc.55be88a1@wiscmail.wisc.edu> <7790f915199428.55befb9a@wiscmail.wisc.edu> <7790906919a1f1.55befbd8@wiscmail.wisc.edu> <7790852f19d571.55befc17@wiscmail.wisc.edu> <77209c0219e46d.55befc54@wiscmail.wisc.edu> <7720f14319db76.55befc92@wiscmail.wisc.edu> <7740cea919c20b.55befcd0@wiscmail.wisc.edu> <75e091da1983c6.55befd0d@wiscmail.wisc.edu> <75e092e119d0d4.55befd4c@wiscmail.wisc.edu> <74508dfa19c665.55befd89@wiscmail.wisc.edu> <779098271984ef.55befdc7@wiscmail.wisc.edu> <7790a0c619efcf.55befe05@wiscmail.wisc.edu> <7790e85019cb3c.55befe43@wiscmail.wisc.edu> <774081cb19c91f.55befe81@wiscmail.wisc.edu> <7740fc9119942d.55befebf@wiscmail.wisc.edu> <7740800619f499.55befefc@wiscmail.wisc.edu> <77408ff719acd0.55beffef@wiscmail.wisc.edu> <7720f55919b92b.55bf0068@wiscmail.wisc.edu> <7720c87719822d.55bf00a6@wiscmail.wisc.edu> <7720ae66199b7c.55bf00e4@wiscmail.wisc.edu> <7720bc3819aa29.55bf0122@wiscmail.wisc.edu> <7620dfc219e016.55bf0160@wiscmail.wisc.edu> <7620af8f19a909.55bf019e@wiscmail.wisc.edu> <7690986219ed18.55bf01dc@wiscmail.wisc.edu> <7690bc7419a786.55bf0219@wiscmail.wisc.edu> <7690ea4e19b283.55bf0258@wiscmail.wisc.edu> <76008e80199c29.55bf0296@wiscmail.wisc.edu> <75e0cd8219e481.55bf034b@wiscmail.wisc.edu> <75e0a7e7199aa0.55bf03c5@wiscmail.wisc.edu> <75e08cd219963d.55bebdb2@wiscmail.wisc.edu> Message-ID: I agree that often you don't need to worry about the memory order. However, it is not uncommon in medical imaging to go back and forth between a 2D or 3D image representation and a 1D array representation (e.g. as often used in image reconstruction algorithms). I found that the main time it was necessary to pay careful attention to the memory layout was when converting Matlab scripts that involve reshaping operations. 
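A tiny sketch of the difference that usually shows up when porting such MATLAB reshapes (the array is just a toy example): NumPy's default reshape/ravel is row-major, while order='F' reproduces the MATLAB-style column-major traversal.

import numpy as np

a = np.arange(6).reshape(2, 3)

print(a.ravel())             # [0 1 2 3 4 5]  -- C order, last axis fastest
print(a.ravel(order='F'))    # [0 3 1 4 2 5]  -- Fortran order, first axis fastest (MATLAB-like)
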
On Mon, Aug 3, 2015 at 8:02 AM, Sebastian Berg wrote: > On Mon Aug 3 10:49:35 2015 GMT+0200, Matthew Brett wrote: > > Hi, > > > > On Mon, Aug 3, 2015 at 8:09 AM, Nathaniel Smith wrote: > > > On Aug 2, 2015 11:06 PM, "Kang Wang" wrote: > > >> > > >> This is very good discussion. Thank you all for replying. > > >> > > >> I can see the fundamental difference is that I always > > >> think/talk/read/write a 3D image as I(x, y, z), not (plane, row, > column) . I > > >> am coming from MRI (magnetic resonance imaging) research, and I can > assure > > >> you that the entire MRI community is using (x, y, z), including books, > > >> journal papers, conference abstracts, presentations, everything. We > even > > >> talk about what we called "logical x/y/z" and "physical x/y/z", and > the > > >> rotation matrix that converts the two coordinate systems. The > radiologists > > >> are also used to (x, y, z). For example, we always say "my image is > 256 by > > >> 256 by 20 slices", and we never say "20 by 256 by 256". > > >> > > >> So, basically, at least in MRI, we always talk about an image as I(x, > y, > > >> z), and we always assume that "x" is the fastest changing index. > That's why > > >> I prefer column-major (because it is more natural). > > >> > > >> Of course, I can totally get my work done by using row-major, I just > have > > >> to always remind myself "write last dimension index first" when > coding. I > > >> actually have done this before, and I found it would be so much > easier if > > >> just using column-major. > > > > > > Why not just use I[x, y, z] like you're used to, and let the computer > worry > > > about the physical layout in memory? Sometimes this will be Fortran > order > > > and sometimes C order and sometimes something else, but you don't have > to > > > know or care; 99% of the time it doesn't matter. The worst case is > that when > > > you use a python wrapper to call into a library that can only handle > Fortran > > > order, then the wrapper will quietly have to convert the memory order > around > > > and it will be slightly slower than if you had used Fortran order in > the > > > first place. But in practice you'll barely ever notice this, and when > you > > > do, *then* you can tell numpy explicitly what memory layout you want > in the > > > situation where it matters. > > > > Yes - if you are using numpy, you really have to look numpy in the eye > and say: > > > > "I will let you worry about the array element order in memory, and in > > return, you promise to make indexing work as I would expect" > > > > Just for example, let's say you loaded an MRI image into memory: > > > > In [1]: import nibabel > > In [2]: img = nibabel.load('my_mri.nii') > > In [3]: data = img.get_data() > > > > Because NIfTI images are Fortran memory layout, this happens to be the > > memory layout you get for your array: > > > > In [4]: data.flags > > Out[4]: > > C_CONTIGUOUS : False > > F_CONTIGUOUS : True > > OWNDATA : False > > WRITEABLE : True > > ALIGNED : True > > UPDATEIFCOPY : False > > > > But now - in Python - all I care about is what data I have on the > > first, second, third axes. For example, I could do this: > > > > In [5]: data_copy = data.copy() > > > > This has exactly the same values as the original array, and at the > > same index positions: > > > > In [7]: import numpy as np > > In [8]: np.all(data == data) > > Out[8]: memmap(True, dtype=bool) > > > > but I now have a C memory layout array. 
> > > > In [9]: data_copy.flags > > Out[9]: > > C_CONTIGUOUS : True > > F_CONTIGUOUS : False > > OWNDATA : True > > WRITEABLE : True > > ALIGNED : True > > UPDATEIFCOPY : False > > > > Yeah, I would like to second those arguments. Most of the time, there is > no need to worry about layout. For large chunks you allocate, it may make > sense for speed, etc. So you can alias creation functions. Generally, I > would suggest to simply not worry about the memory layout. Also do not > *trust* the layout for most function returns. If you need a specific layout > to interface other code, always check what you got it. > > -Sebastian > > > > Worse than that, if I slice my original data array, then I get an > > array that is neither C- or Fortran- compatible in memory: > > > > In [10]: data_view = data[:, :, ::2] > > In [11]: data_view.flags > > Out[11]: > > C_CONTIGUOUS : False > > F_CONTIGUOUS : False > > OWNDATA : False > > WRITEABLE : True > > ALIGNED : True > > UPDATEIFCOPY : False > > > > So - if you want every array to be Fortran-contiguous in memory, I > > would not start with numpy at all, I would write your own array > > library. > > > > The alternative - or "the numpy way" - is to give up on enforcing a > > particular layout in memory, until you need to pass an array to some C > > or C++ or Fortran code that needs some particular layout, in which > > case you get your extension code to copy the array into the required > > layout on entry. Of course this is what numpy itself has to do when > > interfacing with external libraries like BLAS or LAPACK. > > > > Cheers, > > > > Matthew > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matthew.brett at gmail.com Mon Aug 3 10:26:10 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 3 Aug 2015 15:26:10 +0100 Subject: [Numpy-discussion] Change default order to Fortran order In-Reply-To: References: <7740864542dd.55bddb1a@wiscmail.wisc.edu> <7610d3f519bbd6.55bec8c3@wiscmail.wisc.edu> <762089a419b1da.55bec901@wiscmail.wisc.edu> <7690e69c19c6f5.55bec93e@wiscmail.wisc.edu> <74508bf519b933.55bec97c@wiscmail.wisc.edu> <7600e48b19abd0.55bec9b9@wiscmail.wisc.edu> <76109628198969.55bec9f7@wiscmail.wisc.edu> <7620c52619dbc7.55beca36@wiscmail.wisc.edu> <7740b86c19a9af.55beca74@wiscmail.wisc.edu> <74b0c96f19dd2e.55becab2@wiscmail.wisc.edu> <7720e87c19fe7f.55becaf0@wiscmail.wisc.edu> <76b0c5ac19a331.55becb2e@wiscmail.wisc.edu> <7620b4de199872.55becb6d@wiscmail.wisc.edu> <74b0bbd319feec.55becbab@wiscmail.wisc.edu> <7620d52d19bee2.55becbeb@wiscmail.wisc.edu> <76b0e43919dfdb.55becc29@wiscmail.wisc.edu> <75e0a7dc19bdab.55becc68@wiscmail.wisc.edu> <75e0c028198583.55becca6@wiscmail.wisc.edu> <74b0a09d19c79c.55becce4@wiscmail.wisc.edu> <76b0808619d856.55becd22@wiscmail.wisc.edu> <7610891d1992e0.55becd5f@wiscmail.wisc.edu> <7740b16819811d.55becd9d@wiscmail.wisc.edu> <75e0d05a19a210.55becddb@wiscmail.wisc.edu> <7620f86d19cdfb.55bece19@wiscmail.wisc.edu> <74b0e26619d88e.55bece57@wiscmail.wisc.edu> <7450aa2219d85b.55bece94@wiscmail.wisc.edu> <74b0c506198959.55beced2@wiscmail.wisc.edu> <74b0e2ba198afc.55be88a1@wiscmail.wisc.edu> <7790f915199428.55befb9a@wiscmail.wisc.edu> <7790906919a1f1.55befbd8@wiscmail.wisc.edu> <7790852f19d571.55befc17@wiscmail.wisc.edu> <77209c0219e46d.55befc54@wiscmail.wisc.edu> <7720f14319db76.55befc92@wiscmail.wisc.edu> <7740cea919c20b.55befcd0@wiscmail.wisc.edu> <75e091da1983c6.55befd0d@wiscmail.wisc.edu> <75e092e119d0d4.55befd4c@wiscmail.wisc.edu> <74508dfa19c665.55befd89@wiscmail.wisc.edu> <779098271984ef.55befdc7@wiscmail.wisc.edu> <7790a0c619efcf.55befe05@wiscmail.wisc.edu> <7790e85019cb3c.55befe43@wiscmail.wisc.edu> <774081cb19c91f.55befe81@wiscmail.wisc.edu> <7740fc9119942d.55befebf@wiscmail.wisc.edu> <7740800619f499.55befefc@wiscmail.wisc.edu> <77408ff719acd0.55beffef@wiscmail.wisc.edu> <7720f55919b92b.55bf0068@wiscmail.wisc.edu> <7720c87719822d.55bf00a6@wiscmail.wisc.edu> <7720ae66199b7c.55bf00e4@wiscmail.wisc.edu> <7720bc3819aa29.55bf0122@wiscmail.wisc.edu> <7620dfc219e016.55bf0160@wiscmail.wisc.edu> <7620af8f19a909.55bf019e@wiscmail.wisc.edu> <7690986219ed18.55bf01dc@wiscmail.wisc.edu> <7690bc7419a786.55bf0219@wiscmail.wisc.edu> <7690ea4e19b283.55bf0258@wiscmail.wisc.edu> <76008e80199c29.55bf0296@wiscmail.wisc.edu> <75e0cd8219e481.55bf034b@wiscmail.wisc.edu> <75e0a7e7199aa0.55bf03c5@wiscmail.wisc.edu> <75e08cd219963d.55bebdb2@wiscmail.wisc.edu> Message-ID: Hi, On Mon, Aug 3, 2015 at 3:13 PM, Gregory Lee wrote: > I agree that often you don't need to worry about the memory order. However, > it is not uncommon in medical imaging to go back and forth between a 2D or > 3D image representation and a 1D array representation (e.g. as often used in > image reconstruction algorithms). I found that the main time it was > necessary to pay careful attention to the memory layout was when converting > Matlab scripts that involve reshaping operations. Yes, good point. 
A typical example would be this kind of thing: # data is a 4D array with time / volume axis last data_2d = data.reshape((-1, data.shape[-1]) For MATLAB, the columns of this array would (by default) have the values on the first axis fastest changing, then the second, then the third, whereas numpy's default is the other way round. I find I usually don't have to worry about this, because I'm later going to do: data_processed_4d = data_2d.reshape(data.shape) which will reverse the previous reshape in the correct way. But in any case - this is not directly to do with the array memory layout. You will get the same output from reshape whether the memory layout of `data` was Fortran or C. Cheers, Matthew From sebastian at sipsolutions.net Mon Aug 3 10:53:23 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 3 Aug 2015 14:53:23 +0000 Subject: [Numpy-discussion] Change default order to Fortran order In-Reply-To: References: <7740864542dd.55bddb1a@wiscmail.wisc.edu> <7610d3f519bbd6.55bec8c3@wiscmail.wisc.edu> <762089a419b1da.55bec901@wiscmail.wisc.edu> <7690e69c19c6f5.55bec93e@wiscmail.wisc.edu> <74508bf519b933.55bec97c@wiscmail.wisc.edu> <7600e48b19abd0.55bec9b9@wiscmail.wisc.edu> <76109628198969.55bec9f7@wiscmail.wisc.edu> <7620c52619dbc7.55beca36@wiscmail.wisc.edu> <7740b86c19a9af.55beca74@wiscmail.wisc.edu> <74b0c96f19dd2e.55becab2@wiscmail.wisc.edu> <7720e87c19fe7f.55becaf0@wiscmail.wisc.edu> <76b0c5ac19a331.55becb2e@wiscmail.wisc.edu> <7620b4de199872.55becb6d@wiscmail.wisc.edu> <74b0bbd319feec.55becbab@wiscmail.wisc.edu> <7620d52d19bee2.55becbeb@wiscmail.wisc.edu> <76b0e43919dfdb.55becc29@wiscmail.wisc.edu> <75e0a7dc19bdab.55becc68@wiscmail.wisc.edu> <75e0c028198583.55becca6@wiscmail.wisc.edu> <74b0a09d19c79c.55becce4@wiscmail.wisc.edu> <76b0808619d856.55becd22@wiscmail.wisc.edu> <7610891d1992e0.55becd5f@wiscmail.wisc.edu> <7740b16819811d.55becd9d@wiscmail.wisc.edu> <75e0d05a19a210.55becddb@wiscmail.wisc.edu> <7620f86d19cdfb.55bece19@wiscmail.wisc.edu> <74b0e26619d88e.55bece57@wiscmail.wisc.edu> <7450aa2219d85b.55bece94@wiscmail.wisc.edu> <74b0c506198959.55beced2@wiscmail.wisc.edu> <74b0e2ba198afc.55be88a1@wiscmail.wisc.edu> <7790f915199428.55befb9a@wiscmail.wisc.edu> <7790906919a1f1.55befbd8@wiscmail.wisc.edu> <7790852f19d571.55befc17@wiscmail.wisc.edu> <77209c0219e46d.55befc54@wiscmail.wisc.edu> <7720f14319db76.55befc92@wiscmail.wisc.edu> <7740cea919c20b.55befcd0@wiscmail.wisc.edu> <75e091da1983c6.55befd0d@wiscmail.wisc.edu> <75e092e119d0d4.55befd4c@wiscmail.wisc.edu> <74508dfa19c665.55befd89@wiscmail.wisc.edu> <779098271984ef.55befdc7@wiscmail.wisc.edu> <7790a0c619efcf.55befe05@wiscmail.wisc.edu> <7790e85019cb3c.55befe43@wiscmail.wisc.edu> <774081cb19c91f.55befe81@wiscmail.wisc.edu> <7740fc9119942d.55befebf@wiscmail.wisc.edu> <7740800619f499.55befefc@wiscmail.wisc.edu> <77408ff719acd0.55beffef@wiscmail.wisc.edu> <7720f55919b92b.55bf0068@wiscmail.wisc.edu> <7720c87719822d.55bf00a6@wiscmail.wisc.edu> <7720ae66199b7c.55bf00e4@wiscmail.wisc.edu> <7720bc3819aa29.55bf0122@wiscmail.wisc.edu> <7620dfc219e016.55bf0160@wiscmail.wisc.edu> <7620af8f19a909.55bf019e@wiscmail.wisc.edu> <7690986219ed18.55bf01dc@wiscmail.wisc.edu> <7690bc7419a786.55bf0219@wiscmail.wisc.edu> <7690ea4e19b283.55bf0258@wiscmail.wisc.edu> <76008e80199c29.55bf0296@wiscmail.wisc.edu> <75e0cd8219e481.55bf034b@wiscmail.wisc.edu> <75e0a7e7199aa0.55bf03c5@wiscmail.wisc.edu> <75e08cd219963d.55bebdb2@wiscmail.wisc.edu> Message-ID: On Mon Aug 3 16:26:10 2015 GMT+0200, Matthew Brett wrote: > Hi, > > On Mon, Aug 3, 2015 at 
3:13 PM, Gregory Lee wrote: > > I agree that often you don't need to worry about the memory order. However, > > it is not uncommon in medical imaging to go back and forth between a 2D or > > 3D image representation and a 1D array representation (e.g. as often used in > > image reconstruction algorithms). I found that the main time it was > > necessary to pay careful attention to the memory layout was when converting > > Matlab scripts that involve reshaping operations. > > Yes, good point. A typical example would be this kind of thing: > > # data is a 4D array with time / volume axis last > data_2d = data.reshape((-1, data.shape[-1]) > > For MATLAB, the columns of this array would (by default) have the > values on the first axis fastest changing, then the second, then the > third, whereas numpy's default is the other way round. > > I find I usually don't have to worry about this, because I'm later going to do: > > data_processed_4d = data_2d.reshape(data.shape) > > which will reverse the previous reshape in the correct way. > > But in any case - this is not directly to do with the array memory > layout. You will get the same output from reshape whether the memory > layout of `data` was Fortran or C. > Just as a remark. Reshape has an (iteration not really memory) order parameter, thou it may do more copies if those do not match. - Sebastian > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From sturla.molden at gmail.com Mon Aug 3 10:55:04 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 3 Aug 2015 14:55:04 +0000 (UTC) Subject: [Numpy-discussion] Change default order to Fortran order References: <74b0e2ba198afc.55be88a1@wiscmail.wisc.edu> Message-ID: <1809670063460305340.813019sturla.molden-gmail.com@news.gmane.org> Juan Nunez-Iglesias wrote: > The short version is that you'll save yourself a lot of pain by starting to > think of your images as (plane, row, column) instead of (x, y, z). There are several things to consider here. 1. The vertices in computer graphics (OpenGL) are (x,y,z). 2. OpenGL rotation matrices and projection matrice are stored in column major order. 3. OpenGL frame buffers are indexed (x,y) in column major order with (0,0) being lower left. 4. ITK and VTK depends on OpenGL and are thus using column major order. 5. Those who use Matlab or Fortran in addition to Python prefer column major order. 6. BLAS and LAPACK use column major order. 7. The common notation in image prorcessing (as opposed to computer graphics in geberal) is indexing (row, column), in row major order, with (0,0) being upper left. All in all, this is a strong case for prefering column major order and the common mathematical notation (x,y,z). Also notice how the ususal notation in image pricessing differs from OpenGL. Sturla From c99.smruti at gmail.com Mon Aug 3 11:00:27 2015 From: c99.smruti at gmail.com (SMRUTI RANJAN SAHOO) Date: Mon, 3 Aug 2015 20:30:27 +0530 Subject: [Numpy-discussion] Change default order to Fortran order In-Reply-To: <7740864542dd.55bddb1a@wiscmail.wisc.edu> References: <7740864542dd.55bddb1a@wiscmail.wisc.edu> Message-ID: well its really great idea. i can help on python but i don't have any knowledge on fortran. On Sun, Aug 2, 2015 at 7:25 PM, Kang Wang wrote: > Hi, > > I am an imaging researcher, and a new Python user. 
My first Python project > is to somehow modify NumPy source code such that everything is Fortran > column-major by default. > > I read about the information in the link below, but for us, the fact is > that *we absolutely want to use Fortran column major, and we want to > make it default. Explicitly writing " order = 'F' " all over the place is > not acceptable to us*. > > http://docs.scipy.org/doc/numpy/reference/internals.html#multidimensional-array-indexing-order-issues > > I tried searching in this email list, as well as google search in general. > However, I have not found anything useful. This must be a common > request/need, I believe. > > Can anyone provide any insight/help? > > Thank you very much, > > Kang > > -- > *Kang Wang, Ph.D.* > 1111 Highland Ave., Room 1113 > Madison, WI 53705-2275 > ---------------------------------------- > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon Aug 3 11:16:02 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 3 Aug 2015 16:16:02 +0100 Subject: [Numpy-discussion] Change default order to Fortran order In-Reply-To: <1809670063460305340.813019sturla.molden-gmail.com@news.gmane.org> References: <74b0e2ba198afc.55be88a1@wiscmail.wisc.edu> <1809670063460305340.813019sturla.molden-gmail.com@news.gmane.org> Message-ID: On Mon, Aug 3, 2015 at 3:55 PM, Sturla Molden wrote: > Juan Nunez-Iglesias wrote: > >> The short version is that you'll save yourself a lot of pain by starting to >> think of your images as (plane, row, column) instead of (x, y, z). > > There are several things to consider here. > > 1. The vertices in computer graphics (OpenGL) are (x,y,z). > > 2. OpenGL rotation matrices and projection matrice are stored in column > major order. > > 3. OpenGL frame buffers are indexed (x,y) in column major order with (0,0) > being lower left. > > 4. ITK and VTK depends on OpenGL and are thus using column major order. > > 5. Those who use Matlab or Fortran in addition to Python prefer column > major order. > > 6. BLAS and LAPACK use column major order. > > 7. The common notation in image prorcessing (as opposed to computer > graphics in geberal) is indexing (row, column), in row major order, with > (0,0) being upper left. > > All in all, this is a strong case for prefering column major order and the > common mathematical notation (x,y,z). > > Also notice how the ususal notation in image pricessing differs from > OpenGL. Sure, but to avoid confusion, maybe move the discussion of image indexing order to another thread? I think this thread is about memory layout, which is a different issue. Cheers, Matthew From sturla.molden at gmail.com Mon Aug 3 11:42:03 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 3 Aug 2015 15:42:03 +0000 (UTC) Subject: [Numpy-discussion] Change default order to Fortran order References: <7740864542dd.55bddb1a@wiscmail.wisc.edu> Message-ID: <111980011460308224.682392sturla.molden-gmail.com@news.gmane.org> SMRUTI RANJAN SAHOO wrote: > well its really great idea. i can help on python but i don't have any > knowledge on fortran. I have been thinking in these lines too. But I have always thought it would be too much work for very little in return, and it might not interop properly with libraries written for NumPy (though PEP3118 might have changed that). 
I am not sure using Fortran in addition to Cython is a good idea, but it might be. At least if we limit the number of dimenstions to, say, 4 or less, ot would be easy to implement most of the code in vectorized Fortran. Sturla From sturla.molden at gmail.com Mon Aug 3 12:01:09 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 3 Aug 2015 16:01:09 +0000 (UTC) Subject: [Numpy-discussion] Change default order to Fortran order References: <74b0e2ba198afc.55be88a1@wiscmail.wisc.edu> <1809670063460305340.813019sturla.molden-gmail.com@news.gmane.org> Message-ID: <240594636460309376.989533sturla.molden-gmail.com@news.gmane.org> Matthew Brett wrote: > Sure, but to avoid confusion, maybe move the discussion of image > indexing order to another thread? > > I think this thread is about memory layout, which is a different issue. It is actually a bit convoluted and not completely orthogonal. Memory layout does not matter for 2d ndexing, i.e. (x,y) vs. (row, column), if you are careful when iterating, but it does matter for Nd indexing. There is a reason to prefer (x,y,z,t,r) in column major order or (recording, time, slice, row, column) in row major order. Otherwise you can get very inefficient memory traversals. Then if you work with visualization libraries that expects (x,y,z) and column major order, e.g. ITK, VTK and OpenGL, this is really what you want to use. And the choise of indexing (x,y,z) cannot be seen as independent of the memory layout. Remember, it is not just a matter of mapping coordinates to pixels. The data sets are so large in MRI processing that memory layout does matter. Sturla From matthew.brett at gmail.com Mon Aug 3 12:24:51 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 3 Aug 2015 17:24:51 +0100 Subject: [Numpy-discussion] Change default order to Fortran order In-Reply-To: <240594636460309376.989533sturla.molden-gmail.com@news.gmane.org> References: <74b0e2ba198afc.55be88a1@wiscmail.wisc.edu> <1809670063460305340.813019sturla.molden-gmail.com@news.gmane.org> <240594636460309376.989533sturla.molden-gmail.com@news.gmane.org> Message-ID: On Mon, Aug 3, 2015 at 5:01 PM, Sturla Molden wrote: > Matthew Brett wrote: > >> Sure, but to avoid confusion, maybe move the discussion of image >> indexing order to another thread? >> >> I think this thread is about memory layout, which is a different issue. > > It is actually a bit convoluted and not completely orthogonal. Memory > layout does not matter for 2d ndexing, i.e. (x,y) vs. (row, column), if you > are careful when iterating, but it does matter for Nd indexing. There is a > reason to prefer (x,y,z,t,r) in column major order or (recording, time, > slice, row, column) in row major order. Otherwise you can get very > inefficient memory traversals. Then if you work with visualization > libraries that expects (x,y,z) and column major order, e.g. ITK, VTK and > OpenGL, this is really what you want to use. And the choise of indexing > (x,y,z) cannot be seen as independent of the memory layout. Remember, it is > not just a matter of mapping coordinates to pixels. The data sets are so > large in MRI processing that memory layout does matter. I completely agree that memory layout can affect performance, and this can be important. On the other hand, I think you agree that the relationship of axis order to memory layout is just a matter of mapping coordinates to pixels. So you can change the axis ordering without changing the memory layout and the memory layout without changing the axis ordering. 
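Concretely, a small sketch of those two independent changes (the shapes are arbitrary):

import numpy as np

a = np.zeros((3, 4, 5))                    # C-order by default

b = a.transpose(2, 1, 0)                   # axis order changes, memory is untouched (a view)
print(b.shape, b.base is a)                # (5, 4, 3) True

c = np.asfortranarray(a)                   # memory layout changes, indexing order stays the same
print(c.shape, c.flags['F_CONTIGUOUS'])    # (3, 4, 5) True
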
Of course you could argue that it would be simpler to fuse the two issues, and enforce one memory layout - say Fortran. The result might well be easier think about, but it wouldn't be much like numpy, and it would have lots of performance and memory disadvantages. Cheers, Matthew From chris.barker at noaa.gov Mon Aug 3 12:25:22 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 3 Aug 2015 09:25:22 -0700 Subject: [Numpy-discussion] Proposal: Deprecate np.int, np.float, etc.? In-Reply-To: References: <55B25F1A.70107@googlemail.com> <55BB2611.10003@googlemail.com> Message-ID: On Sun, Aug 2, 2015 at 5:13 AM, Sturla Molden wrote: > > A long is only machine word wide on posix, in windows its not. > > Actually it is the opposite. A pointer is 64 bit on AMD64, but the > native integer and pointer offset is only 32 bit. But it does not matter > because it is int that should be machine word sized, not long, which it > is on both platforms. > All this illustrates that there is a lot of platform independence and complexity to the "standard" C types. I suppose it's a good thing -- you can use something like "int" in C code, and presto! more precision in the future when you re-compile on a newer system. However, for any code that needs some kind of binary compatibility between systems (or is dynamic, like python -- i.e. types are declared at run-time, not compile time), the "fixed width types are a lot safer (or at least easier to reason about). So we have tow issue with numpy: 1) confusing python types with C types -- e.g. np.int is currently a python integer, NOT a C int -- I think this is a litte too confusing, and should be depricated. (and np.long -- even more confusing!!!) 2) The vagaries of the standard C types: int, long, etc (spelled np.intc, which is a int32 on my machine, anyway) [NOTE: is there a C long dtype? I can't find it at the moment...] It's probably a good idea to keep these, particularly for interfacing with C code (like my example of calling C code that use int). Though it would be good to make sure the docstring make it clear what they are. However, I"d like to see a recommended practice of using sized types wherevver you can: uint8 int32 float32 float54 etc.... not sure how to propagate that practice, but I'd love to see it become common. Should we add aliases for the stdint names? np.int_32_t, etc??? might be good to adhere to an established standard. -CHB > > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chris.barker at noaa.gov Mon Aug 3 12:30:06 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 3 Aug 2015 09:30:06 -0700 Subject: [Numpy-discussion] Change default order to Fortran order In-Reply-To: References: <7740864542dd.55bddb1a@wiscmail.wisc.edu> <74b0de5619c9ee.55be77b0@wiscmail.wisc.edu> <77208f8c19abe0.55be77ed@wiscmail.wisc.edu> <7740c3e01988b3.55be782b@wiscmail.wisc.edu> <7740821519faf2.55be7868@wiscmail.wisc.edu> <74b0aa0e19ff5a.55be78a5@wiscmail.wisc.edu> <7610d2e119e175.55be78e3@wiscmail.wisc.edu> <76b0a897198bde.55be7921@wiscmail.wisc.edu> <7450811719dc7d.55be795e@wiscmail.wisc.edu> <75e0cf9619906d.55be799c@wiscmail.wisc.edu> <772087c3198ed4.55be79da@wiscmail.wisc.edu> <7720ae8f19c5b1.55be33c7@wiscmail.wisc.edu> <83FD48B4-2CF1-4D9E-AED4-FA0F17A07722@continuum.io> Message-ID: On Sun, Aug 2, 2015 at 1:46 PM, Sturla Molden wrote: > On 02/08/15 22:28, Bryan Van de Ven wrote: > > And to eliminate the order kwarg, use functools.partial to patch the > zeros function (or any others, as needed): > > This will probably break code that depends on NumPy, like SciPy and > scikit-image. But if NumPy is all that matters, sure go ahead and monkey > patch. Otherwise keep the patched functions in another namespace. > I"d be really careful about this -- sure it's annoying, but a kind of global change of behavior could wreak havok. I'd create a set of Fortran-order constructors -- if it were me, I do: fzeros fones, etc..... but you could, I suppose, create a namespace and ut hem all there, then create a fnumpy that would write those over: import numpy as np and away you go -- but that wouldn't change any code that imports numpy in the usual way. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Mon Aug 3 13:03:58 2015 From: allanhaldane at gmail.com (Allan Haldane) Date: Mon, 3 Aug 2015 13:03:58 -0400 Subject: [Numpy-discussion] Proposal: Deprecate np.int, np.float, etc.? In-Reply-To: References: <55B25F1A.70107@googlemail.com> <55BB2611.10003@googlemail.com> Message-ID: <55BF9EFE.4030503@gmail.com> On 08/03/2015 12:25 PM, Chris Barker wrote: > 2) The vagaries of the standard C types: int, long, etc (spelled > np.intc, which is a int32 on my machine, anyway) > [NOTE: is there a C long dtype? I can't find it at the moment...] Numpy does define "the platform dependent C integer types short, long, longlong and their unsigned versions" according to the docs. size_t is the same size as intc. Even though float and double are virtually always IEEE single and double precision, maybe for consistency we should also define np.floatc, np.doublec and np.longdoublec? 
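Those platform-dependent aliases are easy to inspect directly; the itemsizes printed
below are whatever the local build reports, and they differ between platforms, which is
rather the point of the thread:

import numpy as np

for name in ('short', 'intc', 'int_', 'longlong', 'intp',
             'uint8', 'int32', 'int64', 'float32', 'float64'):
    dt = np.dtype(getattr(np, name))
    print(name, dt, dt.itemsize)    # e.g. intc -> int32 (4 bytes) on most current platforms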
Allan From charlesr.harris at gmail.com Mon Aug 3 13:11:02 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 3 Aug 2015 11:11:02 -0600 Subject: [Numpy-discussion] Change default order to Fortran order In-Reply-To: References: <74b0e2ba198afc.55be88a1@wiscmail.wisc.edu> <1809670063460305340.813019sturla.molden-gmail.com@news.gmane.org> <240594636460309376.989533sturla.molden-gmail.com@news.gmane.org> Message-ID: On Mon, Aug 3, 2015 at 10:24 AM, Matthew Brett wrote: > On Mon, Aug 3, 2015 at 5:01 PM, Sturla Molden > wrote: > > Matthew Brett wrote: > > > >> Sure, but to avoid confusion, maybe move the discussion of image > >> indexing order to another thread? > >> > >> I think this thread is about memory layout, which is a different issue. > > > > It is actually a bit convoluted and not completely orthogonal. Memory > > layout does not matter for 2d ndexing, i.e. (x,y) vs. (row, column), if > you > > are careful when iterating, but it does matter for Nd indexing. There is > a > > reason to prefer (x,y,z,t,r) in column major order or (recording, time, > > slice, row, column) in row major order. Otherwise you can get very > > inefficient memory traversals. Then if you work with visualization > > libraries that expects (x,y,z) and column major order, e.g. ITK, VTK and > > OpenGL, this is really what you want to use. And the choise of indexing > > (x,y,z) cannot be seen as independent of the memory layout. Remember, it > is > > not just a matter of mapping coordinates to pixels. The data sets are so > > large in MRI processing that memory layout does matter. > > I completely agree that memory layout can affect performance, and this > can be important. > > On the other hand, I think you agree that the relationship of axis > order to memory layout is just a matter of mapping coordinates to > pixels. > > So you can change the axis ordering without changing the memory layout > and the memory layout without changing the axis ordering. > > Of course you could argue that it would be simpler to fuse the two > issues, and enforce one memory layout - say Fortran. The result > might well be easier think about, but it wouldn't be much like numpy, > and it would have lots of performance and memory disadvantages. > > Cheers, > I would also strongly suggest that once you have decided on a convention that you thoroughly document it somewhere. That will not only help you, but anyone who later needs to maintain the code will bless you rather than d*nm you to eternal torture by the seven demons of Ipsos. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Mon Aug 3 14:05:57 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 03 Aug 2015 20:05:57 +0200 Subject: [Numpy-discussion] Proposal: Deprecate np.int, np.float, etc.? In-Reply-To: References: <55B25F1A.70107@googlemail.com> <55BB2611.10003@googlemail.com> Message-ID: On 03/08/15 18:25, Chris Barker wrote: > 2) The vagaries of the standard C types: int, long, etc (spelled > np.intc, which is a int32 on my machine, anyway) > [NOTE: is there a C long dtype? I can't find it at the moment...] There is, it is called np.int. This just illustrates the problem... Sturla From chris.barker at noaa.gov Mon Aug 3 14:51:27 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 3 Aug 2015 11:51:27 -0700 Subject: [Numpy-discussion] Proposal: Deprecate np.int, np.float, etc.? 
In-Reply-To: References: <55B25F1A.70107@googlemail.com> <55BB2611.10003@googlemail.com> Message-ID: On Mon, Aug 3, 2015 at 11:05 AM, Sturla Molden wrote: > On 03/08/15 18:25, Chris Barker wrote: > > > [NOTE: is there a C long dtype? I can't find it at the moment...] > > There is, it is called np.int. well, IIUC, np.int is the python integer type, which is a C long in all the implemtations of cPython that I know about -- but is that a guarantee? in the future as well? For instance, if it were up to me, I'd use an int_64_t on all 64 bit platforms, rather than having that odd 32 bit on Windows, 64 bit on *nix silliness.... This just illustrates the problem... So another minor proposal: add a numpy.longc type, which would be platform C long. (and probably just an alias to something already there). -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Mon Aug 3 15:32:47 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 03 Aug 2015 21:32:47 +0200 Subject: [Numpy-discussion] Proposal: Deprecate np.int, np.float, etc.? In-Reply-To: References: <55B25F1A.70107@googlemail.com> <55BB2611.10003@googlemail.com> Message-ID: On 03/08/15 20:51, Chris Barker wrote: > well, IIUC, np.int is the python integer type, which is > a C long in all the implemtations of cPython that I know about -- but is > that a guarantee?in the future as well? It is a Python int on Python 2. On Python 3 dtype=np.int means the dtype will be C long, because a Python int has no size limit. But np.int aliases Python int. And creating an array with dype=int therefore does not create an array of Python int, it creates an array of C long. To actually get dtype=int we have to write dtype=object, which is just crazy. Sturla From antonio.valentino at tiscali.it Tue Aug 4 01:35:38 2015 From: antonio.valentino at tiscali.it (Antonio Valentino) Date: Tue, 4 Aug 2015 07:35:38 +0200 Subject: [Numpy-discussion] [pytables-dev] ANN: PyTables 3.2.1 released In-Reply-To: <40397670-4BC4-4272-A5D9-D24B3005F896@andreabedini.com> References: <40397670-4BC4-4272-A5D9-D24B3005F896@andreabedini.com> Message-ID: <05EA129D-2931-4D9C-84C6-665D5ED39962@tiscali.it> Good job. Thanks Andrea -- Antonio Valentino > Il giorno 04/ago/2015, alle ore 02:38, Andrea Bedini ha scritto: > > =========================== > Announcing PyTables 3.2.1 > =========================== > > We are happy to announce PyTables 3.2.1. > > > What's new > ========== > > This is a bug fix release. It contains a fix for a segv fault in > indexesextension.keysort(). > > In case you want to know more in detail what has changed in this > version, please refer to: http://www.pytables.org/release_notes.html > > For an online version of the manual, visit: > http://www.pytables.org/usersguide/index.html > > > What it is? > =========== > > PyTables is a library for managing hierarchical datasets and > designed to efficiently cope with extremely large amounts of data with > support for full 64-bit file addressing. PyTables runs on top of > the HDF5 library and NumPy package for achieving maximum throughput and > convenient use. PyTables includes OPSI, a new indexing technology, > allowing to perform data lookups in tables exceeding 10 gigarows > (10**10 rows) in less than a tenth of a second. 
> > > Resources > ========= > > About PyTables: http://www.pytables.org > > About the HDF5 library: http://hdfgroup.org/HDF5/ > > About NumPy: http://numpy.scipy.org/ > > > Acknowledgments > =============== > > Thanks to many users who provided feature improvements, patches, bug > reports, support and suggestions. See the ``THANKS`` file in the > distribution package for a (incomplete) list of contributors. Most > specially, a lot of kudos go to the HDF5 and NumPy makers. > Without them, PyTables simply would not exist. > > > Share your experience > ===================== > > Let us know of any bugs, suggestions, gripes, kudos, etc. you may have. > > > ---- > > **Enjoy data!** > > -- The PyTables Developers > > > -- > You received this message because you are subscribed to the Google Groups "pytables-dev" group. > To unsubscribe from this group and stop receiving emails from it, send an email to pytables-dev+unsubscribe at googlegroups.com. > Visit this group at http://groups.google.com/group/pytables-dev. > For more options, visit https://groups.google.com/d/optout. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 842 bytes Desc: Message signed with OpenPGP using GPGMail URL: From sebastian at sipsolutions.net Tue Aug 4 04:39:56 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 04 Aug 2015 10:39:56 +0200 Subject: [Numpy-discussion] Proposal: Deprecate np.int, np.float, etc.? In-Reply-To: References: <55B25F1A.70107@googlemail.com> <55BB2611.10003@googlemail.com> Message-ID: <1438677596.2070.6.camel@sipsolutions.net> On Mo, 2015-08-03 at 21:32 +0200, Sturla Molden wrote: > On 03/08/15 20:51, Chris Barker wrote: > > > well, IIUC, np.int is the python integer type, which is > > a C long in all the implemtations of cPython that I know about -- but is > > that a guarantee?in the future as well? > > It is a Python int on Python 2. > > On Python 3 dtype=np.int means the dtype will be C long, because a > Python int has no size limit. But np.int aliases Python int. And > creating an array with dype=int therefore does not create an array of > Python int, it creates an array of C long. To actually get dtype=int we > have to write dtype=object, which is just crazy. > Since it seemes there may be a few half truths flying around in this thread. See http://docs.scipy.org/doc/numpy/user/basics.types.html and also note the sentence below the table (maybe the table should also note these): Additionally to intc the platform dependent C integer types short, long, longlong and their unsigned versions are defined. - Sebastian > > Sturla > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From sebastian at sipsolutions.net Tue Aug 4 04:41:49 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 04 Aug 2015 10:41:49 +0200 Subject: [Numpy-discussion] Proposal: Deprecate np.int, np.float, etc.? 
In-Reply-To: References: <55B25F1A.70107@googlemail.com> <55BB2611.10003@googlemail.com> Message-ID: <1438677709.2070.7.camel@sipsolutions.net> On Mo, 2015-08-03 at 21:32 +0200, Sturla Molden wrote: > On 03/08/15 20:51, Chris Barker wrote: > > > well, IIUC, np.int is the python integer type, which is > > a C long in all the implemtations of cPython that I know about -- but is > > that a guarantee?in the future as well? > > It is a Python int on Python 2. > > On Python 3 dtype=np.int means the dtype will be C long, because a > Python int has no size limit. But np.int aliases Python int. And > creating an array with dype=int therefore does not create an array of > Python int, it creates an array of C long. To actually get dtype=int we > have to write dtype=object, which is just crazy. > PS: I guess longdouble/complexlongdouble (and its floatXXX variants) are missing. And it might be a good place to note that floatXXX is not IEEE floatXXX. > > Sturla > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From josef.pktd at gmail.com Tue Aug 4 05:57:40 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 4 Aug 2015 05:57:40 -0400 Subject: [Numpy-discussion] Proposal: Deprecate np.int, np.float, etc.? In-Reply-To: <1438677596.2070.6.camel@sipsolutions.net> References: <55B25F1A.70107@googlemail.com> <55BB2611.10003@googlemail.com> <1438677596.2070.6.camel@sipsolutions.net> Message-ID: On Tue, Aug 4, 2015 at 4:39 AM, Sebastian Berg wrote: > On Mo, 2015-08-03 at 21:32 +0200, Sturla Molden wrote: > > On 03/08/15 20:51, Chris Barker wrote: > > > > > well, IIUC, np.int is the python integer type, which > is > > > a C long in all the implemtations of cPython that I know about -- but > is > > > that a guarantee?in the future as well? > > > > It is a Python int on Python 2. > > > > On Python 3 dtype=np.int means the dtype will be C long, because a > > Python int has no size limit. But np.int aliases Python int. And > > creating an array with dype=int therefore does not create an array of > > Python int, it creates an array of C long. To actually get dtype=int we > > have to write dtype=object, which is just crazy. > > > > Since it seemes there may be a few half truths flying around in this > thread. See http://docs.scipy.org/doc/numpy/user/basics.types.html Quote: "Note that, above, we use the *Python* float object as a dtype. NumPy knows that int refers to np.int_, bool meansnp.bool_, that float is np.float_ and complex is np.complex_. The other data-types do not have Python equivalents." Is there a conflict with the current thread? Josef (I'm not a C person, so most of this is outside my scope, except for watching bugfixes to make older code work for larger datasets. Use `intp`, Luke.) > > > and also note the sentence below the table (maybe the table should also > note these): > > Additionally to intc the platform dependent C integer types short, long, > longlong and their unsigned versions are defined. 
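A short session makes the distinction concrete (output shown for a 64-bit Linux build of
this era; Windows differs because its C long is 32-bit):

import numpy as np

print(np.dtype(int))        # int64 -- dtype=int maps to C long, not to Python's unlimited int
print(np.dtype(np.intc))    # int32 -- the C int
print(np.dtype(np.intp))    # int64 -- pointer-sized, the safe choice for indexing
print(np.array([1, 2, 3], dtype=object).dtype)   # object -- actual Python ints, as Sturla says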
> > - Sebastian > > > > > Sturla > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Tue Aug 4 06:20:57 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 04 Aug 2015 12:20:57 +0200 Subject: [Numpy-discussion] Proposal: Deprecate np.int, np.float, etc.? In-Reply-To: References: <55B25F1A.70107@googlemail.com> <55BB2611.10003@googlemail.com> <1438677596.2070.6.camel@sipsolutions.net> Message-ID: <1438683657.2070.12.camel@sipsolutions.net> On Di, 2015-08-04 at 05:57 -0400, josef.pktd at gmail.com wrote: > > > On Tue, Aug 4, 2015 at 4:39 AM, Sebastian Berg > wrote: > On Mo, 2015-08-03 at 21:32 +0200, Sturla Molden wrote: > > On 03/08/15 20:51, Chris Barker wrote: > > > > > well, IIUC, np.int is the python integer > type, which is > > > a C long in all the implemtations of cPython that I know > about -- but is > > > that a guarantee?in the future as well? > > > > It is a Python int on Python 2. > > > > On Python 3 dtype=np.int means the dtype will be C long, > because a > > Python int has no size limit. But np.int aliases Python int. > And > > creating an array with dype=int therefore does not create an > array of > > Python int, it creates an array of C long. To actually get > dtype=int we > > have to write dtype=object, which is just crazy. > > > > Since it seemes there may be a few half truths flying around > in this > thread. See > http://docs.scipy.org/doc/numpy/user/basics.types.html > > > > > Quote: > > > "Note that, above, we use the Python float object as a dtype. NumPy > knows that int refers to np.int_, bool meansnp.bool_, > that float is np.float_ and complex is np.complex_. The other > data-types do not have Python equivalents." > > > Is there a conflict with the current thread? > No, but I had the impression that the C compatible type names "short", "cint", "long", etc. where forgotten. > > Josef > > (I'm not a C person, so most of this is outside my scope, except for > watching bugfixes to make older code work for larger datasets. Use > `intp`, Luke.) > > > > and also note the sentence below the table (maybe the table > should also > note these): > > Additionally to intc the platform dependent C integer types > short, long, > longlong and their unsigned versions are defined. > > - Sebastian > > > > > Sturla > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From scopatz at gmail.com Tue Aug 4 10:13:03 2015 From: scopatz at gmail.com (Anthony Scopatz) Date: Tue, 04 Aug 2015 14:13:03 +0000 Subject: [Numpy-discussion] [pytables-dev] ANN: PyTables 3.2.1 released In-Reply-To: <40397670-4BC4-4272-A5D9-D24B3005F896@andreabedini.com> References: <40397670-4BC4-4272-A5D9-D24B3005F896@andreabedini.com> Message-ID: Congrats! On Mon, Aug 3, 2015 at 7:38 PM Andrea Bedini wrote: > =========================== > Announcing PyTables 3.2.1 > =========================== > > We are happy to announce PyTables 3.2.1. > > > What's new > ========== > > This is a bug fix release. It contains a fix for a segv fault in > indexesextension.keysort(). > > In case you want to know more in detail what has changed in this > version, please refer to: http://www.pytables.org/release_notes.html > > For an online version of the manual, visit: > http://www.pytables.org/usersguide/index.html > > > What it is? > =========== > > PyTables is a library for managing hierarchical datasets and > designed to efficiently cope with extremely large amounts of data with > support for full 64-bit file addressing. PyTables runs on top of > the HDF5 library and NumPy package for achieving maximum throughput and > convenient use. PyTables includes OPSI, a new indexing technology, > allowing to perform data lookups in tables exceeding 10 gigarows > (10**10 rows) in less than a tenth of a second. > > > Resources > ========= > > About PyTables: http://www.pytables.org > > About the HDF5 library: http://hdfgroup.org/HDF5/ > > About NumPy: http://numpy.scipy.org/ > > > Acknowledgments > =============== > > Thanks to many users who provided feature improvements, patches, bug > reports, support and suggestions. See the ``THANKS`` file in the > distribution package for a (incomplete) list of contributors. Most > specially, a lot of kudos go to the HDF5 and NumPy makers. > Without them, PyTables simply would not exist. > > > Share your experience > ===================== > > Let us know of any bugs, suggestions, gripes, kudos, etc. you may have. > > > ---- > > **Enjoy data!** > > -- The PyTables Developers > > > -- > You received this message because you are subscribed to the Google Groups > "pytables-dev" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to pytables-dev+unsubscribe at googlegroups.com. > Visit this group at http://groups.google.com/group/pytables-dev. > For more options, visit https://groups.google.com/d/optout. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Aug 6 18:17:08 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 6 Aug 2015 16:17:08 -0600 Subject: [Numpy-discussion] Numpy-vendor vcvarsall.bat problem. Message-ID: Anyone know how to fix this? I've run into it before and never got it figured out. [192.168.121.189:22] out: File "C:\Python34\lib\distutils\msvc9compiler.py", line 259, in query_vcvarsall [192.168.121.189:22] out: [192.168.121.189:22] out: raise DistutilsPlatformError("Unable to find vcvarsall.bat") [192.168.121.189:22] out: [192.168.121.189:22] out: distutils.errors.DistutilsPlatformError: Unable to find vcvarsall.bat Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cournape at gmail.com Thu Aug 6 18:22:47 2015 From: cournape at gmail.com (David Cournapeau) Date: Thu, 6 Aug 2015 23:22:47 +0100 Subject: [Numpy-discussion] Numpy-vendor vcvarsall.bat problem. In-Reply-To: References: Message-ID: Sorry if that's obvious, but do you have Visual Studio 2010 installed ? On Thu, Aug 6, 2015 at 11:17 PM, Charles R Harris wrote: > Anyone know how to fix this? I've run into it before and never got it > figured out. > > [192.168.121.189:22] out: File > "C:\Python34\lib\distutils\msvc9compiler.py", line 259, in query_vcvarsall > [192.168.121.189:22] out: > [192.168.121.189:22] out: raise DistutilsPlatformError("Unable to > find vcvarsall.bat") > [192.168.121.189:22] out: > [192.168.121.189:22] out: distutils.errors.DistutilsPlatformError: Unable > to find vcvarsall.bat > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Aug 6 19:11:01 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 6 Aug 2015 17:11:01 -0600 Subject: [Numpy-discussion] Numpy-vendor vcvarsall.bat problem. In-Reply-To: References: Message-ID: On Thu, Aug 6, 2015 at 4:22 PM, David Cournapeau wrote: > Sorry if that's obvious, but do you have Visual Studio 2010 installed ? > > On Thu, Aug 6, 2015 at 11:17 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> Anyone know how to fix this? I've run into it before and never got it >> figured out. >> >> [192.168.121.189:22] out: File >> "C:\Python34\lib\distutils\msvc9compiler.py", line 259, in query_vcvarsall >> [192.168.121.189:22] out: >> [192.168.121.189:22] out: raise DistutilsPlatformError("Unable to >> find vcvarsall.bat") >> [192.168.121.189:22] out: >> [192.168.121.189:22] out: distutils.errors.DistutilsPlatformError: >> Unable to find vcvarsall.bat >> >> Chuck >> >> >> I'm running numpy-vendor, which is running wine. I think it is all mingw with a few installed dll's. The error is coming from the Python distutils as part of `has_cblas`. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Aug 6 19:19:07 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 6 Aug 2015 17:19:07 -0600 Subject: [Numpy-discussion] Numpy-vendor vcvarsall.bat problem. In-Reply-To: References: Message-ID: On Thu, Aug 6, 2015 at 5:11 PM, Charles R Harris wrote: > > > On Thu, Aug 6, 2015 at 4:22 PM, David Cournapeau > wrote: > >> Sorry if that's obvious, but do you have Visual Studio 2010 installed ? >> >> On Thu, Aug 6, 2015 at 11:17 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> Anyone know how to fix this? I've run into it before and never got it >>> figured out. >>> >>> [192.168.121.189:22] out: File >>> "C:\Python34\lib\distutils\msvc9compiler.py", line 259, in query_vcvarsall >>> [192.168.121.189:22] out: >>> [192.168.121.189:22] out: raise DistutilsPlatformError("Unable to >>> find vcvarsall.bat") >>> [192.168.121.189:22] out: >>> [192.168.121.189:22] out: distutils.errors.DistutilsPlatformError: >>> Unable to find vcvarsall.bat >>> >>> Chuck >>> >>> >>> > I'm running numpy-vendor, which is running wine. I think it is all mingw > with a few installed dll's. The error is coming from the Python distutils > as part of `has_cblas`. 
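A hedged sketch of why that happens: distutils' new_compiler() called with no arguments
hands back the platform default (MSVC on Windows), so a feature probe compiled through it
goes hunting for vcvarsall.bat even when the rest of the build was told to use mingw32.
The selectable names, including 'mingw32', are listed in distutils itself:

import distutils.ccompiler as ccompiler

cc = ccompiler.new_compiler()            # platform default: UnixCCompiler on Linux, MSVCCompiler on Windows
print(type(cc).__name__)
print(sorted(ccompiler.compiler_class))  # 'mingw32' is one of the names that can be passed through

Passing the chosen compiler name into that probe, rather than relying on the default, is
presumably the direction any fix would take.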
> > It's not impossible that we have changed the build somewhere along the line. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Aug 6 20:44:02 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 6 Aug 2015 18:44:02 -0600 Subject: [Numpy-discussion] numpy-vendor cythonize problem Message-ID: I note that current numpy-vendor fails to cythonize in windows builds. Cython is installed, but I assume it needs to also be installed in each of the python versions in wine. Because the need to cythonize was already present in 1.9, I assume that the problem has been solved but the solution is not present in numpy-vendor in the numpy repos. Julian, do you have a solution for that? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Thu Aug 6 22:59:15 2015 From: allanhaldane at gmail.com (Allan Haldane) Date: Thu, 06 Aug 2015 22:59:15 -0400 Subject: [Numpy-discussion] improving structured array assignment Message-ID: <55C41F03.2060706@gmail.com> Hello all, I've written up a tentative PR which tidies up structured array assignment, https://github.com/numpy/numpy/pull/6053 It has a backward incompatible change which I'd especially like to get some feedback on: Structure assignment now always works "by field position" instead of "by field name". Consider the following assignment: >>> v1 = np.array([(1,2,3)], ... dtype=[('a', 'i4'), ('b', 'i4'), ('c', 'i4')]) >>> v2 = np.array([(4,5,6)], ... dtype=[('b', 'i4'), ('a', 'i4'), ('c', 'i4')]) >>> v1[:] = v2 Previously, v1 would be set to "(5,4,6)" but with the PR it is set to "(4,5,6)". This might seem like negligible improvement, but assignment "by field name" has lots of inconsistent/broken edge cases which I've listed in the PR, which disappear with assignment "by field position". The PR doesn't seem to break much of anything in scipy, pandas, and astropy. If possible, I'd like to try getting a deprecation warning for this change into 1.10. I also changed a few more minor things about structure assignment, expanded the docs on structured arrays, and made a multi-field index (arr[['f1', 'f0']]) return a view instead of a copy, which had been planned for 1.10 but didn't get in because of the strange behavior of structure assignment. Allan From ralf.gommers at gmail.com Fri Aug 7 02:38:56 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 7 Aug 2015 08:38:56 +0200 Subject: [Numpy-discussion] numpy-vendor cythonize problem In-Reply-To: References: Message-ID: On Fri, Aug 7, 2015 at 2:44 AM, Charles R Harris wrote: > I note that current numpy-vendor fails to cythonize in windows builds. > Cython is installed, but I assume it needs to also be installed in each of > the python versions in wine. Because the need to cythonize was already > present in 1.9, I assume that the problem has been solved but the solution > is not present in numpy-vendor in the numpy repos. > It's easy to work around by running cythonize in the Linux env you're using numpy-vendor in, then the files don't need to be generated in the Windows build. I think I've done that before. You only need one Windows Cython installed if the "cython" script is found. If not, you go to this except clause which indeed needs a Cython for every Python version: https://github.com/numpy/numpy/commit/dd220014373f A change similar to the f2py fix in https://github.com/numpy/numpy/commit/dd220014373f will likely fix it. 
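Back on the structured-assignment proposal: code that has to behave identically before
and after such a change can spell the by-name copy out, since an explicit field loop is
unambiguous under either rule (the values below are just Allan's example data):

import numpy as np

v1 = np.array([(1, 2, 3)], dtype=[('a', 'i4'), ('b', 'i4'), ('c', 'i4')])
v2 = np.array([(4, 5, 6)], dtype=[('b', 'i4'), ('a', 'i4'), ('c', 'i4')])

for name in v1.dtype.names:   # copy field by field, matching on the name
    v1[name] = v2[name]

print(v1)                     # [(5, 4, 6)] -- by-name semantics, whichever version is installed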
Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From srean.list at gmail.com Fri Aug 7 03:44:51 2015 From: srean.list at gmail.com (srean) Date: Fri, 7 Aug 2015 13:14:51 +0530 Subject: [Numpy-discussion] Shared memory check on in-place modification. In-Reply-To: References: Message-ID: Wait, when assignments and slicing mix wasn't the behavior supposed to be equivalent to copying the RHS to a temporary and then assigning using the temporary. Is that a false memory ? Or has the behavior changed ? As long as the behavior is well defined and succinct it should be ok On Tuesday, July 28, 2015, Sebastian Berg wrote: > > On Mon Jul 27 22:51:52 2015 GMT+0200, Sturla Molden wrote: > > On 27/07/15 22:10, Anton Akhmerov wrote: > > > Hi everyone, > > > > > > I have encountered an initially rather confusing problem in a piece of > > > code that attempted to symmetrize a matrix: `h += h.T` > > > The problem of course appears due to `h.T` being a view of `h`, and > > > some elements being overwritten during the __iadd__ call. > > > > I think the typical proposal is to raise a warning. Note there is > np.may_share_memoty. But the logic to give the warning is possibly not > quite easy, since this is ok to use sometimes. If someone figures it out > (mostly) I would be very happy zo see such warnings. > > > > Here is another example > > > > >>> a = np.ones(10) > > >>> a[1:] += a[:-1] > > >>> a > > array([ 1., 2., 3., 2., 3., 2., 3., 2., 3., 2.]) > > > > I am not sure I totally dislike this behavior. If it could be made > > constent it could be used to vectorize recursive algorithms. In the case > > above I would prefer the output to be: > > > > array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.]) > > > > It does not happen because we do not enforce that the result of one > > operation is stored before the next two operands are read. The only way > > to speed up recursive equations today is to use compiled code. > > > > > > Sturla > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Fri Aug 7 04:08:48 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 07 Aug 2015 10:08:48 +0200 Subject: [Numpy-discussion] Shared memory check on in-place modification. In-Reply-To: References: Message-ID: <1438934928.18578.87.camel@sipsolutions.net> On Fr, 2015-08-07 at 13:14 +0530, srean wrote: > Wait, when assignments and slicing mix wasn't the behavior supposed to > be equivalent to copying the RHS to a temporary and then assigning > using the temporary. Is that a false memory ? Or has the behavior > changed ? As long as the behavior is well defined and succinct it > should be ok > No, NumPy has never done that as far as I know. And since SIMD instructions etc. make this even less predictable (you used to be able to abuse in-place logic, even if usually the same can be done with ufunc.accumulate so it was a bad idea anyway), you have to avoid it. Pauli is working currently on implementing the logic needed to find if such a copy is necessary [1] which is very cool indeed. So I think it is likely we will such copy logic in NumPy 1.11. 
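Putting the hazard and the unambiguous spellings side by side (small arrays, purely for
illustration):

import numpy as np

h = np.arange(9.0).reshape(3, 3)

risky = h.copy()
risky += risky.T                   # RHS is a view of the LHS; elements may be read after being overwritten

safe = h.copy()
safe = safe + safe.T               # RHS evaluated into a fresh array before the assignment
print(np.allclose(safe, safe.T))   # True: a correctly symmetrized matrix

a = np.ones(10)
a[1:] += a[:-1]                    # overlapping views: not the cumulative sum
print(a)
print(np.cumsum(np.ones(10)))      # the recurrence written out explicitly: 1, 2, ..., 10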
- Sebastian [1] See https://github.com/numpy/numpy/pull/6166 it is not an easy problem. > On Tuesday, July 28, 2015, Sebastian Berg > wrote: > > > > On Mon Jul 27 22:51:52 2015 GMT+0200, Sturla Molden wrote: > > On 27/07/15 22:10, Anton Akhmerov wrote: > > > Hi everyone, > > > > > > I have encountered an initially rather confusing problem > in a piece of > > > code that attempted to symmetrize a matrix: `h += h.T` > > > The problem of course appears due to `h.T` being a view of > `h`, and > > > some elements being overwritten during the __iadd__ call. > > > > I think the typical proposal is to raise a warning. Note there > is np.may_share_memoty. But the logic to give the warning is > possibly not quite easy, since this is ok to use sometimes. If > someone figures it out (mostly) I would be very happy zo see > such warnings. > > > > Here is another example > > > > >>> a = np.ones(10) > > >>> a[1:] += a[:-1] > > >>> a > > array([ 1., 2., 3., 2., 3., 2., 3., 2., 3., 2.]) > > > > I am not sure I totally dislike this behavior. If it could > be made > > constent it could be used to vectorize recursive algorithms. > In the case > > above I would prefer the output to be: > > > > array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.]) > > > > It does not happen because we do not enforce that the result > of one > > operation is stored before the next two operands are read. > The only way > > to speed up recursive equations today is to use compiled > code. > > > > > > Sturla > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From cournape at gmail.com Fri Aug 7 05:33:21 2015 From: cournape at gmail.com (David Cournapeau) Date: Fri, 7 Aug 2015 10:33:21 +0100 Subject: [Numpy-discussion] Numpy-vendor vcvarsall.bat problem. In-Reply-To: References: Message-ID: Which command exactly did you run to have that error ? Normally, the code in msvc9compiler should not be called if you call the setup.py with the mingw compiler as expected by distutils On Fri, Aug 7, 2015 at 12:19 AM, Charles R Harris wrote: > > > On Thu, Aug 6, 2015 at 5:11 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Thu, Aug 6, 2015 at 4:22 PM, David Cournapeau >> wrote: >> >>> Sorry if that's obvious, but do you have Visual Studio 2010 installed ? >>> >>> On Thu, Aug 6, 2015 at 11:17 PM, Charles R Harris < >>> charlesr.harris at gmail.com> wrote: >>> >>>> Anyone know how to fix this? I've run into it before and never got it >>>> figured out. 
>>>> >>>> [192.168.121.189:22] out: File >>>> "C:\Python34\lib\distutils\msvc9compiler.py", line 259, in query_vcvarsall >>>> [192.168.121.189:22] out: >>>> [192.168.121.189:22] out: raise DistutilsPlatformError("Unable to >>>> find vcvarsall.bat") >>>> [192.168.121.189:22] out: >>>> [192.168.121.189:22] out: distutils.errors.DistutilsPlatformError: >>>> Unable to find vcvarsall.bat >>>> >>>> Chuck >>>> >>>> >>>> >> I'm running numpy-vendor, which is running wine. I think it is all mingw >> with a few installed dll's. The error is coming from the Python distutils >> as part of `has_cblas`. >> >> > It's not impossible that we have changed the build somewhere along the > line. > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From srean.list at gmail.com Fri Aug 7 06:18:27 2015 From: srean.list at gmail.com (srean) Date: Fri, 7 Aug 2015 15:48:27 +0530 Subject: [Numpy-discussion] Shared memory check on in-place modification. In-Reply-To: <1438934928.18578.87.camel@sipsolutions.net> References: <1438934928.18578.87.camel@sipsolutions.net> Message-ID: I got_misled_by (extrapolated erroneously from) this description of temporaries in the documentation http://docs.scipy.org/doc/numpy/user/basics.indexing.html#assigning-values-to-indexed-arrays ,,,])]" ... new array is extracted from the original (as a temporary) containing the values at 1, 1, 3, 1, then the value 1 is added to the temporary, and then the temporary is assigned back to the original array. Thus the value of the array at x[1]+1 is assigned to x[1] three times, rather than being incremented 3 times." It is talking about a slightly different scenario of course, the temporary corresponds to the LHS. Anyhow, as long as the behavior is defined rigorously it should not be a problem. Now, I vaguely remember abusing ufuncs and aliasing in interactive sessions for some weird cumsum like operations (I plead bashfully guilty). On Fri, Aug 7, 2015 at 1:38 PM, Sebastian Berg wrote: > On Fr, 2015-08-07 at 13:14 +0530, srean wrote: > > Wait, when assignments and slicing mix wasn't the behavior supposed to > > be equivalent to copying the RHS to a temporary and then assigning > > using the temporary. Is that a false memory ? Or has the behavior > > changed ? As long as the behavior is well defined and succinct it > > should be ok > > > > No, NumPy has never done that as far as I know. And since SIMD > instructions etc. make this even less predictable (you used to be able > to abuse in-place logic, even if usually the same can be done with > ufunc.accumulate so it was a bad idea anyway), you have to avoid it. > > Pauli is working currently on implementing the logic needed to find if > such a copy is necessary [1] which is very cool indeed. So I think it is > likely we will such copy logic in NumPy 1.11. > > - Sebastian > > > [1] See https://github.com/numpy/numpy/pull/6166 it is not an easy > problem. 
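The fancy-indexing behaviour quoted from the docs, next to the unbuffered ufunc method
that really does accumulate over repeated indices:

import numpy as np

x = np.zeros(5, dtype=int)
x[[1, 1, 3, 1]] += 1
print(x)                        # [0 1 0 1 0] -- index 1 is assigned once, not incremented three times

y = np.zeros(5, dtype=int)
np.add.at(y, [1, 1, 3, 1], 1)   # unbuffered: applies the add for every occurrence of the index
print(y)                        # [0 3 0 1 0]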
> > > > On Tuesday, July 28, 2015, Sebastian Berg > > wrote: > > > > > > > > On Mon Jul 27 22:51:52 2015 GMT+0200, Sturla Molden wrote: > > > On 27/07/15 22:10, Anton Akhmerov wrote: > > > > Hi everyone, > > > > > > > > I have encountered an initially rather confusing problem > > in a piece of > > > > code that attempted to symmetrize a matrix: `h += h.T` > > > > The problem of course appears due to `h.T` being a view of > > `h`, and > > > > some elements being overwritten during the __iadd__ call. > > > > > > > I think the typical proposal is to raise a warning. Note there > > is np.may_share_memoty. But the logic to give the warning is > > possibly not quite easy, since this is ok to use sometimes. If > > someone figures it out (mostly) I would be very happy zo see > > such warnings. > > > > > > > Here is another example > > > > > > >>> a = np.ones(10) > > > >>> a[1:] += a[:-1] > > > >>> a > > > array([ 1., 2., 3., 2., 3., 2., 3., 2., 3., 2.]) > > > > > > I am not sure I totally dislike this behavior. If it could > > be made > > > constent it could be used to vectorize recursive algorithms. > > In the case > > > above I would prefer the output to be: > > > > > > array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.]) > > > > > > It does not happen because we do not enforce that the result > > of one > > > operation is stored before the next two operands are read. > > The only way > > > to speed up recursive equations today is to use compiled > > code. > > > > > > > > > Sturla > > > > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at scipy.org > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Fri Aug 7 09:29:36 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Fri, 7 Aug 2015 06:29:36 -0700 Subject: [Numpy-discussion] Numpy-vendor vcvarsall.bat problem. In-Reply-To: References: Message-ID: On Fri, Aug 7, 2015 at 2:33 AM, David Cournapeau wrote: > Which command exactly did you run to have that error ? Normally, the code > in msvc9compiler should not be called if you call the setup.py with the > mingw compiler as expected by distutils > FWIW, the incantation that works for me to compile numpy on Windows with mingw is: python setup.py config --compiler=mingw32 build --compiler=mingw32 install but I am not sure I have ever tried it with Python 3. I think my source for this was: http://nipy.sourceforge.net/nipy/devel/devel/install/windows_scipy_build.html Jaime > > On Fri, Aug 7, 2015 at 12:19 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Thu, Aug 6, 2015 at 5:11 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> >>> On Thu, Aug 6, 2015 at 4:22 PM, David Cournapeau >>> wrote: >>> >>>> Sorry if that's obvious, but do you have Visual Studio 2010 installed ? 
>>>> >>>> On Thu, Aug 6, 2015 at 11:17 PM, Charles R Harris < >>>> charlesr.harris at gmail.com> wrote: >>>> >>>>> Anyone know how to fix this? I've run into it before and never got it >>>>> figured out. >>>>> >>>>> [192.168.121.189:22] out: File >>>>> "C:\Python34\lib\distutils\msvc9compiler.py", line 259, in query_vcvarsall >>>>> [192.168.121.189:22] out: >>>>> [192.168.121.189:22] out: raise DistutilsPlatformError("Unable to >>>>> find vcvarsall.bat") >>>>> [192.168.121.189:22] out: >>>>> [192.168.121.189:22] out: distutils.errors.DistutilsPlatformError: >>>>> Unable to find vcvarsall.bat >>>>> >>>>> Chuck >>>>> >>>>> >>>>> >>> I'm running numpy-vendor, which is running wine. I think it is all mingw >>> with a few installed dll's. The error is coming from the Python distutils >>> as part of `has_cblas`. >>> >>> >> It's not impossible that we have changed the build somewhere along the >> line. >> >> Chuck >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Aug 7 10:02:53 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 7 Aug 2015 08:02:53 -0600 Subject: [Numpy-discussion] Numpy-vendor vcvarsall.bat problem. In-Reply-To: References: Message-ID: On Fri, Aug 7, 2015 at 3:33 AM, David Cournapeau wrote: > Which command exactly did you run to have that error ? Normally, the code > in msvc9compiler should not be called if you call the setup.py with the > mingw compiler as expected by distutils > I'm running numpy-vendor which is running wine inside ubuntu inside a vm. The relevant commands are run("rm -rf ../local") run("paver sdist") run("python setup.py install --prefix ../local") run("paver pdf") run("paver bdist_superpack -p 3.4") run("paver bdist_superpack -p 3.3") run("paver bdist_superpack -p 2.7") run("paver write_release_and_log") run("paver bdist_wininst_simple -p 2.7") run("paver bdist_wininst_simple -p 3.3") run("paver bdist_wininst_simple -p 3.4") Which don't look suspicious. I think we may have changed something in numpy/distutils, possibly as part of https://github.com/numpy/numpy/pull/6152 Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Aug 7 10:16:06 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 7 Aug 2015 08:16:06 -0600 Subject: [Numpy-discussion] Numpy-vendor vcvarsall.bat problem. In-Reply-To: References: Message-ID: On Fri, Aug 7, 2015 at 8:02 AM, Charles R Harris wrote: > > > On Fri, Aug 7, 2015 at 3:33 AM, David Cournapeau > wrote: > >> Which command exactly did you run to have that error ? Normally, the code >> in msvc9compiler should not be called if you call the setup.py with the >> mingw compiler as expected by distutils >> > > I'm running numpy-vendor which is running wine inside ubuntu inside a vm. 
> The relevant commands are > > run("rm -rf ../local") > run("paver sdist") > run("python setup.py install --prefix ../local") > run("paver pdf") > run("paver bdist_superpack -p 3.4") > run("paver bdist_superpack -p 3.3") > run("paver bdist_superpack -p 2.7") > run("paver write_release_and_log") > run("paver bdist_wininst_simple -p 2.7") > run("paver bdist_wininst_simple -p 3.3") > run("paver bdist_wininst_simple -p 3.4") > > Which don't look suspicious. I think we may have changed something in > numpy/distutils, possibly as part of > https://github.com/numpy/numpy/pull/6152 > Actually, looks like b6d0263239926e8b14ebc26a0d7b9469fa7866d4. Hmm..., strange. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Aug 7 11:15:58 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 7 Aug 2015 09:15:58 -0600 Subject: [Numpy-discussion] Numpy-vendor vcvarsall.bat problem. In-Reply-To: References: Message-ID: On Fri, Aug 7, 2015 at 8:16 AM, Charles R Harris wrote: > > > On Fri, Aug 7, 2015 at 8:02 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Fri, Aug 7, 2015 at 3:33 AM, David Cournapeau >> wrote: >> >>> Which command exactly did you run to have that error ? Normally, the >>> code in msvc9compiler should not be called if you call the setup.py with >>> the mingw compiler as expected by distutils >>> >> >> I'm running numpy-vendor which is running wine inside ubuntu inside a vm. >> The relevant commands are >> >> run("rm -rf ../local") >> run("paver sdist") >> run("python setup.py install --prefix ../local") >> run("paver pdf") >> run("paver bdist_superpack -p 3.4") >> run("paver bdist_superpack -p 3.3") >> run("paver bdist_superpack -p 2.7") >> run("paver write_release_and_log") >> run("paver bdist_wininst_simple -p 2.7") >> run("paver bdist_wininst_simple -p 3.3") >> run("paver bdist_wininst_simple -p 3.4") >> >> Which don't look suspicious. I think we may have changed something in >> numpy/distutils, possibly as part of >> https://github.com/numpy/numpy/pull/6152 >> > > Actually, looks like b6d0263239926e8b14ebc26a0d7b9469fa7866d4. Hmm..., > strange. > OK, that just leads to an earlier cythonize error because random.pyx changed, so not the root cause. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Aug 7 11:36:38 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 7 Aug 2015 09:36:38 -0600 Subject: [Numpy-discussion] Numpy-vendor vcvarsall.bat problem. In-Reply-To: References: Message-ID: So the problem comes from the has_cblas function def has_cblas(self): # primitive cblas check by looking for the header res = False c = distutils.ccompiler.new_compiler() tmpdir = tempfile.mkdtemp() s = """#include """ src = os.path.join(tmpdir, 'source.c') try: with open(src, 'wt') as f: f.write(s) try: c.compile([src], output_dir=tmpdir, include_dirs=self.get_include_dirs()) res = True except distutils.ccompiler.CompileError: res = False finally: shutil.rmtree(tmpdir) return res The problem is the test compile, which does not use the mingw compiler, but falls back to the compiler found in python distutils. Not sure what the fix is. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Fri Aug 7 12:17:27 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 7 Aug 2015 10:17:27 -0600 Subject: [Numpy-discussion] Numpy-vendor vcvarsall.bat problem. In-Reply-To: References: Message-ID: On Fri, Aug 7, 2015 at 9:36 AM, Charles R Harris wrote: > So the problem comes from the has_cblas function > > def has_cblas(self): > # primitive cblas check by looking for the header > res = False > c = distutils.ccompiler.new_compiler() > tmpdir = tempfile.mkdtemp() > s = """#include """ > src = os.path.join(tmpdir, 'source.c') > try: > with open(src, 'wt') as f: > f.write(s) > try: > c.compile([src], output_dir=tmpdir, > include_dirs=self.get_include_dirs()) > res = True > except distutils.ccompiler.CompileError: > res = False > finally: > shutil.rmtree(tmpdir) > return res > > The problem is the test compile, which does not use the mingw compiler, > but falls back to the compiler found in python distutils. Not sure what the > fix is. > See #6175 . Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.v.root at gmail.com Mon Aug 10 12:09:13 2015 From: ben.v.root at gmail.com (Benjamin Root) Date: Mon, 10 Aug 2015 12:09:13 -0400 Subject: [Numpy-discussion] np.in1d() & sets, bug? Message-ID: Just came across this one today: >>> np.in1d([1], set([0, 1, 2]), assume_unique=True) array([ False], dtype=bool) >>> np.in1d([1], [0, 1, 2], assume_unique=True) array([ True], dtype=bool) I am assuming this has something to do with the fact that order is not guaranteed with set() objects? I was kind of hoping that setting "assume_unique=True" would be sufficient to overcome that problem. Should sets be rejected as an error? This was using v1.9.0 Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Mon Aug 10 13:10:04 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 10 Aug 2015 19:10:04 +0200 Subject: [Numpy-discussion] np.in1d() & sets, bug? In-Reply-To: References: Message-ID: <1439226604.3041.8.camel@sipsolutions.net> On Mo, 2015-08-10 at 12:09 -0400, Benjamin Root wrote: > Just came across this one today: > > >>> np.in1d([1], set([0, 1, 2]), assume_unique=True) > array([ False], dtype=bool) > > >>> np.in1d([1], [0, 1, 2], assume_unique=True) > > array([ True], dtype=bool) > > > I am assuming this has something to do with the fact that order is not > guaranteed with set() objects? I was kind of hoping that setting > "assume_unique=True" would be sufficient to overcome that problem. > Should sets be rejected as an error? > Not really, it is "simply" because ``np.asarray(set([1, 2, 3]))`` returns an object array and 1 is not the same as ``set([1, 2, 3])``. I think earlier numpy versions may have had "short cuts" for short lists or something so this may have worked in some cases.... - Sebastian > > This was using v1.9.0 > > > Cheers! > > Ben Root > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From njs at pobox.com Mon Aug 10 13:38:18 2015 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 10 Aug 2015 10:38:18 -0700 Subject: [Numpy-discussion] np.in1d() & sets, bug? 
In-Reply-To: <1439226604.3041.8.camel@sipsolutions.net> References: <1439226604.3041.8.camel@sipsolutions.net> Message-ID: Another case where refusing to implicitly create object arrays would have avoided a lot of confusion... On Aug 10, 2015 10:13 AM, "Sebastian Berg" wrote: > On Mo, 2015-08-10 at 12:09 -0400, Benjamin Root wrote: > > Just came across this one today: > > > > >>> np.in1d([1], set([0, 1, 2]), assume_unique=True) > > array([ False], dtype=bool) > > > > >>> np.in1d([1], [0, 1, 2], assume_unique=True) > > > > array([ True], dtype=bool) > > > > > > I am assuming this has something to do with the fact that order is not > > guaranteed with set() objects? I was kind of hoping that setting > > "assume_unique=True" would be sufficient to overcome that problem. > > Should sets be rejected as an error? > > > > Not really, it is "simply" because ``np.asarray(set([1, 2, 3]))`` > returns an object array and 1 is not the same as ``set([1, 2, 3])``. > > I think earlier numpy versions may have had "short cuts" for short lists > or something so this may have worked in some cases.... > > - Sebastian > > > > > > This was using v1.9.0 > > > > > > Cheers! > > > > Ben Root > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Mon Aug 10 13:40:38 2015 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 10 Aug 2015 13:40:38 -0400 Subject: [Numpy-discussion] np.in1d() & sets, bug? In-Reply-To: <1439226604.3041.8.camel@sipsolutions.net> References: <1439226604.3041.8.camel@sipsolutions.net> Message-ID: > Not really, it is "simply" because ``np.asarray(set([1, 2, 3]))`` > returns an object array Holy crap! To be pedantic, it looks like it turns it into a numpy scalar, but still! I wouldn't have expected np.asarray() on a set (or dictionary, for that matter) to work because order is not guaranteed. Is this expected behavior? Digging into the implementation of in1d(), I can see now how passing a set() wouldn't be useful at all (as an aside, pretty clever algorithm). I know sets aren't array-like, but the code that used this seemed to work at first, and this problem wasn't revealed until I created some unit tests to exercise some possible corner cases. Silently producing possibly erroneous results is dangerous. Don't know if better documentation or some better sanity checking would be called for here, though. Ben Root On Mon, Aug 10, 2015 at 1:10 PM, Sebastian Berg wrote: > On Mo, 2015-08-10 at 12:09 -0400, Benjamin Root wrote: > > Just came across this one today: > > > > >>> np.in1d([1], set([0, 1, 2]), assume_unique=True) > > array([ False], dtype=bool) > > > > >>> np.in1d([1], [0, 1, 2], assume_unique=True) > > > > array([ True], dtype=bool) > > > > > > I am assuming this has something to do with the fact that order is not > > guaranteed with set() objects? I was kind of hoping that setting > > "assume_unique=True" would be sufficient to overcome that problem. > > Should sets be rejected as an error? > > > > Not really, it is "simply" because ``np.asarray(set([1, 2, 3]))`` > returns an object array and 1 is not the same as ``set([1, 2, 3])``. 
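What np.asarray actually produces for a set, and one way to get the membership test that
was intended (convert the set to a list or array first):

import numpy as np

s = {0, 1, 2}
wrapped = np.asarray(s)
print(wrapped.dtype, wrapped.shape)   # object () -- a 0-d object array holding the set itself
print(np.in1d([1], list(s)))          # [ True]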
> > I think earlier numpy versions may have had "short cuts" for short lists > or something so this may have worked in some cases.... > > - Sebastian > > > > > > This was using v1.9.0 > > > > > > Cheers! > > > > Ben Root > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Aug 10 14:08:07 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 10 Aug 2015 14:08:07 -0400 Subject: [Numpy-discussion] np.in1d() & sets, bug? In-Reply-To: References: <1439226604.3041.8.camel@sipsolutions.net> Message-ID: On Mon, Aug 10, 2015 at 1:40 PM, Benjamin Root wrote: > > Not really, it is "simply" because ``np.asarray(set([1, 2, 3]))`` > > returns an object array > > Holy crap! To be pedantic, it looks like it turns it into a numpy scalar, > but still! I wouldn't have expected np.asarray() on a set (or dictionary, > for that matter) to work because order is not guaranteed. Is this expected > behavior? > > Digging into the implementation of in1d(), I can see now how passing a > set() wouldn't be useful at all (as an aside, pretty clever algorithm). I > know sets aren't array-like, but the code that used this seemed to work at > first, and this problem wasn't revealed until I created some unit tests to > exercise some possible corner cases. Silently producing possibly erroneous > results is dangerous. Don't know if better documentation or some better > sanity checking would be called for here, though. > > Ben Root > > > On Mon, Aug 10, 2015 at 1:10 PM, Sebastian Berg < > sebastian at sipsolutions.net> wrote: > >> On Mo, 2015-08-10 at 12:09 -0400, Benjamin Root wrote: >> > Just came across this one today: >> > >> > >>> np.in1d([1], set([0, 1, 2]), assume_unique=True) >> > array([ False], dtype=bool) >> > >> > >>> np.in1d([1], [0, 1, 2], assume_unique=True) >> > >> > array([ True], dtype=bool) >> > >> > >> > I am assuming this has something to do with the fact that order is not >> > guaranteed with set() objects? I was kind of hoping that setting >> > "assume_unique=True" would be sufficient to overcome that problem. >> > Should sets be rejected as an error? >> > >> >> Not really, it is "simply" because ``np.asarray(set([1, 2, 3]))`` >> returns an object array and 1 is not the same as ``set([1, 2, 3])``. >> >> I think earlier numpy versions may have had "short cuts" for short lists >> or something so this may have worked in some cases.... >> > is it possible to get at least a UserWarning when creating an object array and dtype object hasn't been explicitly requested or underlying data is already in an object dtype? Josef > >> - Sebastian >> >> >> > >> > This was using v1.9.0 >> > >> > >> > Cheers! 
>> > >> > Ben Root >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Aug 10 18:34:59 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 10 Aug 2015 16:34:59 -0600 Subject: [Numpy-discussion] mingw32 and numpy 1.10 Message-ID: Mingw32 will not compile current numpy due to initialization of a static structure slot with a Python C-API function. The function is not considered a constant expression by the old gcc in mingw32. Compilation does work with more recent compilers; evidently the meaning of "constant expression" is up to the vendor. So, this is fixable if we initialize the slot with 0, but that loses some precision/functionality. The question is, do we want to support mingw32, and numpy-vendor as well, for numpy 1.10.0? I think the answer is probably "yes", but we may want to reconsider for numpy 1.11, when we may want to use Carl's mingw64 toolchain instead. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Aug 10 18:53:46 2015 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 10 Aug 2015 15:53:46 -0700 Subject: [Numpy-discussion] mingw32 and numpy 1.10 In-Reply-To: References: Message-ID: On Aug 10, 2015 3:38 PM, "Charles R Harris" wrote: > > Mingw32 will not compile current numpy due to initialization of a static structure slot with a Python C-API function. The function is not considered a constant expression by the old gcc in mingw32. Compilation does work with more recent compilers; evidently the meaning of "constant expression" is up to the vendor. I think in this particular case, we should be able to fill in the slot with an assignment just before calling PyType_Ready? > So, this is fixable if we initialize the slot with 0, but that loses some precision/functionality. The question is, do we want to support mingw32, and numpy-vendor as well, for numpy 1.10.0? I think the answer is probably "yes", but we may want to reconsider for numpy 1.11, when we may want to use Carl's mingw64 toolchain instead. While it's obviously not what we want to do in the long run, if these problems turn out to be intractable then yeah, IMO it wouldn't be the end of the world to temporarily give up on providing the sourceforge win32 downloads, given that we already don't provide win64, the current win32 build strategy is almost certainly a dead end going forward, and that win32 and win64 builds are widely available elsewhere. Esp. since time spent trying to keep our win32 builds limping along both delays the release for everyone and wastes time that you could probably find other things to do with... I'm not sure this particular problem is the tipping point, but it's a calculation we should keep in mind. -n -------------- next part -------------- An HTML attachment was scrubbed... 
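Returning briefly to the np.in1d()/set() thread above: a minimal sketch of the pitfall and one possible workaround, assuming the behaviour Sebastian describes (np.asarray(set(...)) silently becomes a 0-d object array, so the membership test never sees the individual elements):

    import numpy as np

    needles = [1]
    haystack = set([0, 1, 2])

    # The set is wrapped in a 0-d object array, so each needle is compared
    # against the set object itself and the result is wrongly all-False.
    np.in1d(needles, haystack, assume_unique=True)
    # -> array([False], dtype=bool)

    # Converting the set to a real array first restores the expected answer.
    np.in1d(needles, np.array(sorted(haystack)), assume_unique=True)
    # -> array([ True], dtype=bool)
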
URL: From charlesr.harris at gmail.com Mon Aug 10 19:21:41 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 10 Aug 2015 17:21:41 -0600 Subject: [Numpy-discussion] mingw32 and numpy 1.10 In-Reply-To: References: Message-ID: On Mon, Aug 10, 2015 at 4:53 PM, Nathaniel Smith wrote: > On Aug 10, 2015 3:38 PM, "Charles R Harris" > wrote: > > > > Mingw32 will not compile current numpy due to initialization of a static > structure slot with a Python C-API function. The function is not considered > a constant expression by the old gcc in mingw32. Compilation does work with > more recent compilers; evidently the meaning of "constant expression" is up > to the vendor. > > I think in this particular case, we should be able to fill in the slot > with an assignment just before calling PyType_Ready? > > > So, this is fixable if we initialize the slot with 0, but that loses > some precision/functionality. The question is, do we want to support > mingw32, and numpy-vendor as well, for numpy 1.10.0? I think the answer is > probably "yes", but we may want to reconsider for numpy 1.11, when we may > want to use Carl's mingw64 toolchain instead. > > While it's obviously not what we want to do in the long run, if these > problems turn out to be intractable then yeah, IMO it wouldn't be the end > of the world to temporarily give up on providing the sourceforge win32 > downloads, given that we already don't provide win64, the current win32 > build strategy is almost certainly a dead end going forward, and that win32 > and win64 builds are widely available elsewhere. Esp. since time spent > trying to keep our win32 builds limping along both delays the release for > everyone and wastes time that you could probably find other things to do > with... I'm not sure this particular problem is the tipping point, but it's > a calculation we should keep in mind. > See https://github.com/numpy/numpy/pull/6190. I don't have a problem reinitializing the slot later if that looks like the best way to go. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From pieter.eendebak at gmail.com Tue Aug 11 04:36:23 2015 From: pieter.eendebak at gmail.com (Pieter Eendebak) Date: Tue, 11 Aug 2015 10:36:23 +0200 Subject: [Numpy-discussion] overhead in np.matrix Message-ID: The overhead of the np.matrix class is quite high for small matrices. See for example the following code: import time import math import numpy as np def rot2D(phi): c=math.cos(phi); return np.matrix(c) _b=np.matrix(np.zeros( (1,))) def rot2Dx(phi): global _b r=_b.copy() c=math.cos(phi); r.itemset(0, c) return r phi=.023 %timeit rot2D(phi) %timeit rot2Dx(phi) The second implementation performs much better by using a copy instead of a constructor. Is there a way to efficiency create a new np.matrix object? For other functions in my code I do not have the option to copy an existing matrix, but I need to construct a new object or perform a cast from np.array to np.matrix. I am already aware of two alternatives: - Using the new multiplication operator ( https://www.python.org/dev/peps/pep-0465/). This is a good solution, but only python 3.5 - Using the .dot functions from np.array. This works, but personally I like the notation using np.matrix much better. I also created an issue on github: https://github.com/numpy/numpy/issues/6186 With kind regards, Pieter Eendebak -------------- next part -------------- An HTML attachment was scrubbed... 
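On the np.matrix overhead question above, one hedged workaround sketch (the function names here are illustrative, not part of numpy): build the small result as a plain ndarray and only view it as np.matrix at the end, since ndarray.view() bypasses most of the np.matrix constructor's argument parsing:

    import math
    import numpy as np

    def rot2d_array(phi):
        # plain 2x2 rotation built as an ndarray; no np.matrix constructor
        c, s = math.cos(phi), math.sin(phi)
        return np.array([[c, -s], [s, c]])

    def rot2d_matrix(phi):
        # reinterpret the same data as np.matrix only where matrix
        # semantics (e.g. the * operator) are actually wanted
        return rot2d_array(phi).view(np.matrix)

Whether this is fast enough for the use case above would need timing with %timeit, as in the original message.
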
URL: From michael.klemm at intel.com Tue Aug 11 10:37:38 2015 From: michael.klemm at intel.com (Klemm, Michael) Date: Tue, 11 Aug 2015 14:37:38 +0000 Subject: [Numpy-discussion] ANN: pyMIC v0.6 Released Message-ID: <0DAB4B4FC42EAA41802458ADA9C2F82444DB2BBF@IRSMSX104.ger.corp.intel.com> Announcement: pyMIC v0.6 ========================= I'm happy to announce the release of pyMIC v0.6. pyMIC is a Python module to offload computation in a Python program to the Intel Xeon Phi coprocessor. It contains offloadable arrays and device management functions. It supports invocation of native kernels (C/C++, Fortran) and blends in with Numpy's array types for float, complex, and int data types. For more information and downloads please visit pyMIC's Github page: https://github.com/01org/pyMIC. You can find pyMIC's mailinglist at https://lists.01.org/mailman/listinfo/pymic. Full change log: ================= Version 0.6 ---------------------------- - Experimental support for the Windows operating system. - Switched to Cython to generate the glue code for pyMIC. - Now using Markdown for README and CHANGELOG. - Introduced PYMIC_DEBUG=3 to trace argument passing for kernels. - Bugfix: added back the translate_device_pointer() function. - Bugfix: example SVD now respects order of the passed matrices when applying the `dgemm` routine. - Bugfix: fixed memory leak when invoking kernels. - Bugfix: fixed broken translation of fake pointers. - Refactoring: simplified bridge between pyMIC and LIBXSTREAM. Version 0.5 ---------------------------- - Introduced new kernel API that avoids insane pointer unpacking. - pyMIC now uses libxstreams as the offload back-end (https://github.com/hfp/libxstream). - Added smart pointers to make handling of fake pointers easier. Version 0.4 ---------------------------- - New low-level API to allocate, deallocate, and transfer data (see OffloadStream). - Support for in-place binary operators. - New internal design to handle offloads. Version 0.3 ---------------------------- - Improved handling of libraries and kernel invocation. - Trace collection (PYMIC_TRACE=1, PYMIC_TRACE_STACKS={none,compact,full}). - Replaced the device-centric API with a stream API. - Refactoring to better match PEP8 recommendations. - Added support for int(int64) and complex(complex128) data types. - Reworked the benchmarks and examples to fit the new API. - Bugfix: fixed syntax errors in OffloadArray. Version 0.2 ---------------------------- - Small improvements to the README files. - New example: Singular Value Decomposition. - Some documentation for the API functions. - Added a basic testsuite for unit testing (WIP). - Bugfix: benchmarks now use the latest interface. - Bugfix: numpy.ndarray does not offer an attribute 'order'. - Bugfix: number_of_devices was not visible after import. - Bugfix: member offload_array.device is now initialized. - Bugfix: use exception for errors w/ invoke_kernel & load_library. Version 0.1 ---------------------------- Initial release. Dr.-Ing. Michael Klemm Senior Application Engineer Software and Services Group Developer Relations Division Phone +49 89 9914 2340 Cell +49 174 2417583 Intel Deutschland GmbH Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany Tel: +49 89 99 8853-0, www.intel.de Managing Directors: Christin Eisenschmid, Prof. Dr. 
Hermann Eul Chairperson of the Supervisory Board: Tiffany Doon Silva Registered Office: Munich Commercial Register: Amtsgericht Muenchen HRB 186928 From charlesr.harris at gmail.com Tue Aug 11 17:23:08 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 11 Aug 2015 15:23:08 -0600 Subject: [Numpy-discussion] ANN: Numpy 1.10.0b1 release Message-ID: Hi All, give this release a whirl and report any problems either on the numpy-discussion list or by opening an issue on github. I'm pleased to announce the first beta release of Numpy 1.10.0. There is over a year's worth of enhancements and bug fixes in the 1.10.0 release, so please give this release a whirl and report any problems either on the numpy-discussion list or by opening an issue on github. Tarballs, installers, and release notes may be found in the usual place at Sourceforge . Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From nathan12343 at gmail.com Tue Aug 11 17:44:45 2015 From: nathan12343 at gmail.com (Nathan Goldbaum) Date: Tue, 11 Aug 2015 16:44:45 -0500 Subject: [Numpy-discussion] ANN: Numpy 1.10.0b1 release In-Reply-To: References: Message-ID: Maybe this is just me, I tried to build the tarball I got from sourceforge in a fresh virtualenv on my mac and received the following error: clang: numpy/core/src/multiarray/buffer.c clang: src/multiarray/cblasfuncs.c clang: error: no such file or directory: 'src/multiarray/cblasfuncs.c' clang: error: no input files clang: error: no such file or directory: 'src/multiarray/cblasfuncs.c' clang: error: no input files Indeed, that file is not present in the tarball. My download of the tarball from sourceforge has sha1 hash 424bcee49507a655260068e767bd198094fc5604. It looks like the tarball from github ( https://github.com/numpy/numpy/archive/v1.10.0b1.tar.gz) has the needed file. On Tue, Aug 11, 2015 at 4:23 PM, Charles R Harris wrote: > Hi All, > > give this release a whirl and report any problems either on the > numpy-discussion list or by opening an issue on github. > I'm pleased to announce the first beta release of Numpy 1.10.0. There is > over a year's worth of enhancements and bug fixes in the 1.10.0 release, so > please give this release a whirl and report any problems either on the > numpy-discussion list or by opening an issue on github. Tarballs, > installers, and release notes may be found in the usual place at > Sourceforge > . > > Chuck > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Aug 11 18:13:23 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 11 Aug 2015 16:13:23 -0600 Subject: [Numpy-discussion] ANN: Numpy 1.10.0b1 release In-Reply-To: References: Message-ID: On Tue, Aug 11, 2015 at 3:44 PM, Nathan Goldbaum wrote: > Maybe this is just me, I tried to build the tarball I got from sourceforge > in a fresh virtualenv on my mac and received the following error: > > clang: numpy/core/src/multiarray/buffer.c > clang: src/multiarray/cblasfuncs.c > clang: error: no such file or directory: 'src/multiarray/cblasfuncs.c' > clang: error: no input files > clang: error: no such file or directory: 'src/multiarray/cblasfuncs.c' > clang: error: no input files > > Indeed, that file is not present in the tarball. 
My download of the > tarball from sourceforge has sha1 > hash 424bcee49507a655260068e767bd198094fc5604. > > It looks like the tarball from github ( > https://github.com/numpy/numpy/archive/v1.10.0b1.tar.gz) has the needed > file. > Hmm, interesting. I'll take a look. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Aug 11 18:22:13 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 11 Aug 2015 16:22:13 -0600 Subject: [Numpy-discussion] ANN: Numpy 1.10.0b1 release In-Reply-To: References: Message-ID: On Tue, Aug 11, 2015 at 4:13 PM, Charles R Harris wrote: > > > On Tue, Aug 11, 2015 at 3:44 PM, Nathan Goldbaum > wrote: > >> Maybe this is just me, I tried to build the tarball I got from >> sourceforge in a fresh virtualenv on my mac and received the following >> error: >> >> clang: numpy/core/src/multiarray/buffer.c >> clang: src/multiarray/cblasfuncs.c >> clang: error: no such file or directory: 'src/multiarray/cblasfuncs.c' >> clang: error: no input files >> clang: error: no such file or directory: 'src/multiarray/cblasfuncs.c' >> clang: error: no input files >> >> Indeed, that file is not present in the tarball. My download of the >> tarball from sourceforge has sha1 >> hash 424bcee49507a655260068e767bd198094fc5604. >> >> It looks like the tarball from github ( >> https://github.com/numpy/numpy/archive/v1.10.0b1.tar.gz) has the needed >> file. >> > > Hmm, interesting. I'll take a look. > I'm uploading replacements for the tar and zip files.. Not sure why numpy-vendor didn't do the job. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sandro.tosi at gmail.com Tue Aug 11 18:32:47 2015 From: sandro.tosi at gmail.com (Sandro Tosi) Date: Tue, 11 Aug 2015 23:32:47 +0100 Subject: [Numpy-discussion] ANN: Numpy 1.10.0b1 release In-Reply-To: References: Message-ID: On Tue, Aug 11, 2015 at 11:22 PM, Charles R Harris wrote: > I'm uploading replacements for the tar and zip files.. Not sure why > numpy-vendor didn't do the job. if so then please upload a b2, so it will avoid confusion with those who have downloaded a b1-pre-fix and those with a b1-post-fix Regards, -- Sandro Tosi (aka morph, morpheus, matrixhasu) My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi From charlesr.harris at gmail.com Tue Aug 11 18:46:31 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 11 Aug 2015 16:46:31 -0600 Subject: [Numpy-discussion] ANN: Numpy 1.10.0b1 release In-Reply-To: References: Message-ID: On Tue, Aug 11, 2015 at 4:32 PM, Sandro Tosi wrote: > On Tue, Aug 11, 2015 at 11:22 PM, Charles R Harris > wrote: > > I'm uploading replacements for the tar and zip files.. Not sure why > > numpy-vendor didn't do the job. > > if so then please upload a b2, so it will avoid confusion with those > who have downloaded a b1-pre-fix and those with a b1-post-fix > > I think it was caught early enough that it won't be much of a problem. I expect there will be other problems that will require a beta 2 soon enough ;) Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
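For anyone trying to tell the original and re-uploaded 1.10.0b1 tarballs apart, a small sketch using only the standard library (the filename is hypothetical):

    import hashlib

    def sha1_of(path, chunk=1 << 20):
        # stream the file in blocks so large tarballs need not fit in memory
        h = hashlib.sha1()
        with open(path, 'rb') as f:
            for block in iter(lambda: f.read(chunk), b''):
                h.update(block)
        return h.hexdigest()

    print(sha1_of('numpy-1.10.0b1.tar.gz'))
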
URL: From sandro.tosi at gmail.com Tue Aug 11 20:04:53 2015 From: sandro.tosi at gmail.com (Sandro Tosi) Date: Wed, 12 Aug 2015 01:04:53 +0100 Subject: [Numpy-discussion] ANN: Numpy 1.10.0b1 release In-Reply-To: References: Message-ID: On Tue, Aug 11, 2015 at 11:46 PM, Charles R Harris wrote: > On Tue, Aug 11, 2015 at 4:32 PM, Sandro Tosi wrote: >> >> On Tue, Aug 11, 2015 at 11:22 PM, Charles R Harris >> wrote: >> > I'm uploading replacements for the tar and zip files.. Not sure why >> > numpy-vendor didn't do the job. >> >> if so then please upload a b2, so it will avoid confusion with those >> who have downloaded a b1-pre-fix and those with a b1-post-fix >> > > I think it was caught early enough that it won't be much of a problem. I > expect there will be other problems that will require a beta 2 soon enough > ;) we can agree to disagree, but nevermind :) I gave it a quick build test on debian unstable amd64 and it's building correctly, I will finalize the package in the next few days and upload to the various weird architectures Debian supports and see where it fails. Regards, -- Sandro Tosi (aka morph, morpheus, matrixhasu) My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi From jensj at fysik.dtu.dk Wed Aug 12 03:41:57 2015 From: jensj at fysik.dtu.dk (=?windows-1252?Q?Jens_J=F8rgen_Mortensen?=) Date: Wed, 12 Aug 2015 09:41:57 +0200 Subject: [Numpy-discussion] [SciPy-Dev] ANN: Numpy 1.10.0b1 release In-Reply-To: References: Message-ID: <55CAF8C5.8060202@fysik.dtu.dk> On 08/11/2015 11:23 PM, Charles R Harris wrote: > Hi All, > > give this release a whirl and report any problems either on the > numpy-discussion list or by opening an issue on github. > I'm pleased to announce the first beta release of Numpy 1.10.0. There > is over a year's worth of enhancements and bug fixes in the 1.10.0 > release, so please give this release a whirl and report any problems > either on the numpy-discussion list or by opening an issue on github. > Tarballs, installers, and release notes may be found in the usual > place at Sourceforge > . This looks a bit strange: Python 2.7.9 (default, Apr 2 2015, 15:33:21) [GCC 4.9.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import numpy as np >>> np.zeros(1).strides (9223372036854775807,) >>> np.zeros(42).strides (8,) >>> np.__version__ '1.10.0b1' This is on Ubuntu 15.04. Jens J?rgen > > Chuck > > > > > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at scipy.org > http://mail.scipy.org/mailman/listinfo/scipy-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Wed Aug 12 03:51:39 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 12 Aug 2015 09:51:39 +0200 Subject: [Numpy-discussion] [SciPy-Dev] ANN: Numpy 1.10.0b1 release In-Reply-To: <55CAF8C5.8060202@fysik.dtu.dk> References: <55CAF8C5.8060202@fysik.dtu.dk> Message-ID: <1439365899.17032.7.camel@sipsolutions.net> On Mi, 2015-08-12 at 09:41 +0200, Jens J?rgen Mortensen wrote: > On 08/11/2015 11:23 PM, Charles R Harris wrote: > > Hi All, > > > > give this release a whirl and report any problems either on the > > numpy-discussion list or by opening an issue on github. > > > > I'm pleased to announce the first beta release of Numpy 1.10.0. 
> > There is over a year's worth of enhancements and bug fixes in the > > 1.10.0 release, so please give this release a whirl and report any > > problems either on the numpy-discussion list or by opening an issue > > on github. Tarballs, installers, and release notes may be found in > > the usual place at Sourceforge. > > > > This looks a bit strange: > It is intentional, it will not be the case in the final release. And thanks Chuck for all the release work! - Sebastian > Python 2.7.9 (default, Apr 2 2015, 15:33:21) > [GCC 4.9.2] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import numpy as np > >>> np.zeros(1).strides > (9223372036854775807,) > >>> np.zeros(42).strides > (8,) > >>> np.__version__ > '1.10.0b1' > > This is on Ubuntu 15.04. > > Jens J?rgen > > > > > > > Chuck > > > > > > > > > > > > > > _______________________________________________ > > SciPy-Dev mailing list > > SciPy-Dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/scipy-dev > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From njs at pobox.com Wed Aug 12 04:07:37 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 12 Aug 2015 01:07:37 -0700 Subject: [Numpy-discussion] [SciPy-Dev] ANN: Numpy 1.10.0b1 release In-Reply-To: <1439365899.17032.7.camel@sipsolutions.net> References: <55CAF8C5.8060202@fysik.dtu.dk> <1439365899.17032.7.camel@sipsolutions.net> Message-ID: On Wed, Aug 12, 2015 at 12:51 AM, Sebastian Berg wrote: > On Mi, 2015-08-12 at 09:41 +0200, Jens J?rgen Mortensen wrote: >> On 08/11/2015 11:23 PM, Charles R Harris wrote: >> > Hi All, >> > >> > give this release a whirl and report any problems either on the >> > numpy-discussion list or by opening an issue on github. >> > >> > I'm pleased to announce the first beta release of Numpy 1.10.0. >> > There is over a year's worth of enhancements and bug fixes in the >> > 1.10.0 release, so please give this release a whirl and report any >> > problems either on the numpy-discussion list or by opening an issue >> > on github. Tarballs, installers, and release notes may be found in >> > the usual place at Sourceforge. >> > >> >> This looks a bit strange: >> >> Python 2.7.9 (default, Apr 2 2015, 15:33:21) >> [GCC 4.9.2] on linux2 >> Type "help", "copyright", "credits" or "license" for more information. >> >>> import numpy as np >> >>> np.zeros(1).strides >> (9223372036854775807,) >> >>> np.zeros(42).strides >> (8,) >> >>> np.__version__ >> '1.10.0b1' > > It is intentional, it will not be the case in the final release. Given how quickly this surprised someone, it looks like it would be helpful to have some single link we could give people to explain what's going on here. Do we have such a thing? In a few minutes of searching all I was able to find was http://docs.scipy.org/doc/numpy/release.html#npy-relaxed-strides-checking https://github.com/numpy/numpy/blob/master/doc/release/1.10.0-notes.rst#relaxed-stride-checking which together kinda sorta hint at what's going on if you squint, but not really? Maybe we should add a paragraph to the 1.10 release notes? > And thanks Chuck for all the release work! Indeed! -n -- Nathaniel J. 
Smith -- http://vorpus.org From sebastian at sipsolutions.net Wed Aug 12 07:23:46 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 12 Aug 2015 13:23:46 +0200 Subject: [Numpy-discussion] [SciPy-Dev] ANN: Numpy 1.10.0b1 release In-Reply-To: References: <55CAF8C5.8060202@fysik.dtu.dk> <1439365899.17032.7.camel@sipsolutions.net> Message-ID: <1439378626.17032.15.camel@sipsolutions.net> On Mi, 2015-08-12 at 01:07 -0700, Nathaniel Smith wrote: > On Wed, Aug 12, 2015 at 12:51 AM, Sebastian Berg > wrote: > > On Mi, 2015-08-12 at 09:41 +0200, Jens J?rgen Mortensen wrote: > >> On 08/11/2015 11:23 PM, Charles R Harris wrote: > >> > Hi All, > >> > > >> > give this release a whirl and report any problems either on the > >> > numpy-discussion list or by opening an issue on github. > >> > > >> > I'm pleased to announce the first beta release of Numpy 1.10.0. > >> > There is over a year's worth of enhancements and bug fixes in the > >> > 1.10.0 release, so please give this release a whirl and report any > >> > problems either on the numpy-discussion list or by opening an issue > >> > on github. Tarballs, installers, and release notes may be found in > >> > the usual place at Sourceforge. > >> > > >> > >> This looks a bit strange: > >> > >> Python 2.7.9 (default, Apr 2 2015, 15:33:21) > >> [GCC 4.9.2] on linux2 > >> Type "help", "copyright", "credits" or "license" for more information. > >> >>> import numpy as np > >> >>> np.zeros(1).strides > >> (9223372036854775807,) > >> >>> np.zeros(42).strides > >> (8,) > >> >>> np.__version__ > >> '1.10.0b1' > > > > It is intentional, it will not be the case in the final release. > > Given how quickly this surprised someone, it looks like it would be > helpful to have some single link we could give people to explain > what's going on here. Do we have such a thing? In a few minutes of > searching all I was able to find was > > http://docs.scipy.org/doc/numpy/release.html#npy-relaxed-strides-checking > https://github.com/numpy/numpy/blob/master/doc/release/1.10.0-notes.rst#relaxed-stride-checking > > which together kinda sorta hint at what's going on if you squint, but > not really? Maybe we should add a paragraph to the 1.10 release notes? > True, frankly, after I hit send I thought I should have explained more in any case. I think relaxed strides is explained (though 1.10 could possible link/include more). The issue is that I/we forgot to mention the "funny" stride messing up to expose bugs/help debugging.... So in case someone wonders. When relaxed strides is active, we intentionally give this funny stride that Jens saw (we will not do this in a final release) because failure to work correctly with it hints to general bugs and tests are likely to miss without this "help". Of course some software that is totally fine might still stumble on these strides, though I expect making it work fine with them should not be hard and make it more robust in any case. - Sebastian > > And thanks Chuck for all the release work! > > Indeed! > > -n > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From deen at mpia.de Wed Aug 12 12:12:28 2015 From: deen at mpia.de (Casey Deen) Date: Wed, 12 Aug 2015 18:12:28 +0200 Subject: [Numpy-discussion] f2py and callbacks with variables Message-ID: <55CB706C.2020502@mpia.de> Hi all- I've run into what I think might be a bug in f2py and callbacks to python. 
Or, maybe I'm not using things correctly. I have created a very minimal example which illustrates my problem at: https://github.com/soylentdeen/fluffy-kumquat The issue seems to affect call backs with variables, but only when they are called indirectly (i.e. from other fortran routines). For example, if I have a python function def show_number(n): print("%d" % n) and I setup a callback in a fortran routine: subroutine cb cf2py intent(callback, hide) blah external blah call blah(5) end and connect it to the python routine fortranObject.blah = show_number I can successfully call the cb routine from python: >fortranObject.cb 5 However, if I call the cb routine from within another fortran routine, it seems to lose its marbles subroutine no_cb call cb end capi_return is NULL Call-back cb_blah_in_cb__user__routines failed. For more information, please have a look at the github repository. I've reproduced the behavior on both linux and mac. I'm not sure if this is an error in the way I'm using the code, or if it is an actual bug. Any and all help would be very much appreciated. Cheers, Casey -- Dr. Casey Deen Post-doctoral Researcher deen at mpia.de +49-6221-528-375 Max Planck Institut f?r Astronomie (MPIA) K?nigstuhl 17 D-69117 Heidelberg, Germany From christian.engwer at uni-muenster.de Wed Aug 12 12:23:16 2015 From: christian.engwer at uni-muenster.de (Christian Engwer) Date: Wed, 12 Aug 2015 18:23:16 +0200 Subject: [Numpy-discussion] Problems using add_npy_pkg_config Message-ID: <20150812162047.GA26389@sansibar.localdomain> Dear all, I'm trying to use the numpy distutils to install native C libraries. These are part of a larger roject and should be usable standalone. I managed to install headers and libs, but now I experience problems writing the corresponding pkg file. I first tried to do the trick without numpy, but getting all the pathes right in all different setups is really a mess. Please a find a m.w.e. attached to this mail. It consists of foo.c foo.ini.in and setup.py. I'm sure I missed some important part, but somehow the distribution variable in build_src seems to be uniinitalized. Calling > python setup.py install --prefix=/tmp/foo.inst fils with ... File "/usr/lib/python2.7/dist-packages/numpy/distutils/command/build_src.py", line 257, in build_npy_pkg_config pkg_path = self.distribution.package_dir[pkg] TypeError: 'NoneType' object has no attribute '__getitem__' I also tried to adopt parts of the numpy setup, but these use sub-modules, which I don't need... might this the the cause of my problems? Any help is highly appreciated ;-) Cheers Christian -------------- next part -------------- A non-text attachment was scrubbed... Name: foo.c Type: text/x-csrc Size: 25 bytes Desc: not available URL: -------------- next part -------------- [meta] Name=@foo@ Version=1.0 Description=dummy description [default] Cflags=-I at prefix@/include Libs= -------------- next part -------------- A non-text attachment was scrubbed... 
Name: setup.py Type: text/x-python Size: 525 bytes Desc: not available URL: From ralf.gommers at gmail.com Wed Aug 12 12:50:52 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 12 Aug 2015 18:50:52 +0200 Subject: [Numpy-discussion] Problems using add_npy_pkg_config In-Reply-To: <20150812162047.GA26389@sansibar.localdomain> References: <20150812162047.GA26389@sansibar.localdomain> Message-ID: On Wed, Aug 12, 2015 at 6:23 PM, Christian Engwer < christian.engwer at uni-muenster.de> wrote: > Dear all, > > I'm trying to use the numpy distutils to install native C > libraries. These are part of a larger roject and should be usable > standalone. I managed to install headers and libs, but now I > experience problems writing the corresponding pkg file. I first tried > to do the trick without numpy, but getting all the pathes right in all > different setups is really a mess. > This doesn't answer your question but: why? If you're not distributing a Python project, there is no reason to use distutils instead of a sane build system. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Aug 12 13:23:01 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 12 Aug 2015 11:23:01 -0600 Subject: [Numpy-discussion] Problems using add_npy_pkg_config In-Reply-To: References: <20150812162047.GA26389@sansibar.localdomain> Message-ID: On Wed, Aug 12, 2015 at 10:50 AM, Ralf Gommers wrote: > > > On Wed, Aug 12, 2015 at 6:23 PM, Christian Engwer < > christian.engwer at uni-muenster.de> wrote: > >> Dear all, >> >> I'm trying to use the numpy distutils to install native C >> libraries. These are part of a larger roject and should be usable >> standalone. I managed to install headers and libs, but now I >> experience problems writing the corresponding pkg file. I first tried >> to do the trick without numpy, but getting all the pathes right in all >> different setups is really a mess. >> > > This doesn't answer your question but: why? If you're not distributing a > Python project, there is no reason to use distutils instead of a sane build > system. > Believe it or not, distutils *is* one of the saner build systems when you want something cross platform. Sad, isn't it... Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Wed Aug 12 13:35:06 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 12 Aug 2015 19:35:06 +0200 Subject: [Numpy-discussion] Problems using add_npy_pkg_config In-Reply-To: References: <20150812162047.GA26389@sansibar.localdomain> Message-ID: On Wed, Aug 12, 2015 at 7:23 PM, Charles R Harris wrote: > > > On Wed, Aug 12, 2015 at 10:50 AM, Ralf Gommers > wrote: > >> >> >> On Wed, Aug 12, 2015 at 6:23 PM, Christian Engwer < >> christian.engwer at uni-muenster.de> wrote: >> >>> Dear all, >>> >>> I'm trying to use the numpy distutils to install native C >>> libraries. These are part of a larger roject and should be usable >>> standalone. I managed to install headers and libs, but now I >>> experience problems writing the corresponding pkg file. I first tried >>> to do the trick without numpy, but getting all the pathes right in all >>> different setups is really a mess. >>> >> >> This doesn't answer your question but: why? If you're not distributing a >> Python project, there is no reason to use distutils instead of a sane build >> system. 
>> > > Believe it or not, distutils *is* one of the saner build systems when you > want something cross platform. Sad, isn't it... > Come on. We don't take it seriously, and neither do the Python core devs. It's also pretty much completely unsupported. Numpy.distutils is a bit better in that respect than Python distutils, which doesn't even get sane patches merged. Try Scons, Tup, Gradle, Shake, Waf or anything else that's at least somewhat modern and supported. Do not use numpy.distutils unless there's no other mature choice (i.e. you're developing a Python project). Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From pearu.peterson at gmail.com Wed Aug 12 15:34:08 2015 From: pearu.peterson at gmail.com (Pearu Peterson) Date: Wed, 12 Aug 2015 22:34:08 +0300 Subject: [Numpy-discussion] f2py and callbacks with variables In-Reply-To: <55CB706C.2020502@mpia.de> References: <55CB706C.2020502@mpia.de> Message-ID: Hi Casey, What you observe, is not a f2py bug. When f2py sees a code like subroutine foo call bar end subroutine foo then it will not make an attempt to analyze bar because of implicit assumption that all statements that has no references to foo arguments are irrelevant for wrapper function generation. For your example, f2py needs some help. Try the following signature in .pyf file: subroutine barney ! in :flintstone:nocallback.f use test__user__routines, fred=>fred, bambam=>bambam intent(callback, hide) fred external fred intent(callback,hide) bambam external bambam end subroutine barney Btw, instead of f2py -c -m flintstone flintstone.pyf callback.f nocallback.f use f2py -c flintstone.pyf callback.f nocallback.f because module name comes from the .pyf file. HTH, Pearu On Wed, Aug 12, 2015 at 7:12 PM, Casey Deen wrote: > Hi all- > > I've run into what I think might be a bug in f2py and callbacks to > python. Or, maybe I'm not using things correctly. I have created a > very minimal example which illustrates my problem at: > > https://github.com/soylentdeen/fluffy-kumquat > > The issue seems to affect call backs with variables, but only when they > are called indirectly (i.e. from other fortran routines). For example, > if I have a python function > > def show_number(n): > print("%d" % n) > > and I setup a callback in a fortran routine: > > subroutine cb > cf2py intent(callback, hide) blah > external blah > call blah(5) > end > > and connect it to the python routine > fortranObject.blah = show_number > > I can successfully call the cb routine from python: > > >fortranObject.cb > 5 > > However, if I call the cb routine from within another fortran routine, > it seems to lose its marbles > > subroutine no_cb > call cb > end > > capi_return is NULL > Call-back cb_blah_in_cb__user__routines failed. > > For more information, please have a look at the github repository. I've > reproduced the behavior on both linux and mac. I'm not sure if this is > an error in the way I'm using the code, or if it is an actual bug. Any > and all help would be very much appreciated. > > Cheers, > Casey > > > -- > Dr. Casey Deen > Post-doctoral Researcher > deen at mpia.de +49-6221-528-375 > Max Planck Institut f?r Astronomie (MPIA) > K?nigstuhl 17 D-69117 Heidelberg, Germany > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
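A short Python-side sketch of how the wrapped module would then be used, mirroring the pattern from Casey's first message and Pearu's .pyf signature above (the module and callback names come from that example repository and should be treated as assumptions):

    import flintstone

    def show_number(n):
        print("%d" % n)

    # attach python callables to the hidden callback slots before calling in
    flintstone.fred = show_number
    flintstone.bambam = show_number

    # the fortran routine, and any routine it calls, can now call back into python
    flintstone.barney()
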
URL: From edisongustavo at gmail.com Wed Aug 12 15:53:33 2015 From: edisongustavo at gmail.com (Edison Gustavo Muenz) Date: Wed, 12 Aug 2015 16:53:33 -0300 Subject: [Numpy-discussion] Problems using add_npy_pkg_config In-Reply-To: References: <20150812162047.GA26389@sansibar.localdomain> Message-ID: Why don't you use CMake ? It's pretty standard for C/C++. On Wed, Aug 12, 2015 at 2:35 PM, Ralf Gommers wrote: > > > On Wed, Aug 12, 2015 at 7:23 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Wed, Aug 12, 2015 at 10:50 AM, Ralf Gommers >> wrote: >> >>> >>> >>> On Wed, Aug 12, 2015 at 6:23 PM, Christian Engwer < >>> christian.engwer at uni-muenster.de> wrote: >>> >>>> Dear all, >>>> >>>> I'm trying to use the numpy distutils to install native C >>>> libraries. These are part of a larger roject and should be usable >>>> standalone. I managed to install headers and libs, but now I >>>> experience problems writing the corresponding pkg file. I first tried >>>> to do the trick without numpy, but getting all the pathes right in all >>>> different setups is really a mess. >>>> >>> >>> This doesn't answer your question but: why? If you're not distributing a >>> Python project, there is no reason to use distutils instead of a sane build >>> system. >>> >> >> Believe it or not, distutils *is* one of the saner build systems when you >> want something cross platform. Sad, isn't it... >> > > Come on. We don't take it seriously, and neither do the Python core devs. > It's also pretty much completely unsupported. Numpy.distutils is a bit > better in that respect than Python distutils, which doesn't even get sane > patches merged. > > Try Scons, Tup, Gradle, Shake, Waf or anything else that's at least > somewhat modern and supported. Do not use numpy.distutils unless there's no > other mature choice (i.e. you're developing a Python project). > > Ralf > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nathan12343 at gmail.com Wed Aug 12 17:03:13 2015 From: nathan12343 at gmail.com (Nathan Goldbaum) Date: Wed, 12 Aug 2015 16:03:13 -0500 Subject: [Numpy-discussion] Changes to np.digitize since NumPy 1.9? Message-ID: Hi all, I've been testing the package I spend most of my time on, yt, under numpy 1.10b1 since the announcement went out. I think I've narrowed down and fixed all of the test failures that cropped up except for one last issue. It seems that the behavior of np.digitize with respect to ndarray subclasses has changed since the NumPy 1.9 series. Consider the following test script: ```python import numpy as np class MyArray(np.ndarray): def __new__(cls, *args, **kwargs): return np.ndarray.__new__(cls, *args, **kwargs) data = np.arange(100) bins = np.arange(100) + 0.5 data = data.view(MyArray) bins = bins.view(MyArray) digits = np.digitize(data, bins) print type(digits) ``` Under NumPy 1.9.2, this prints "", but under the 1.10 beta, it prints "" I'm curious why this change was made. Since digitize outputs index arrays, it doesn't make sense to me why it should return anything but a plain ndarray. I see in the release notes that digitize now uses searchsorted under the hood. Is this related? We can "fix" this in our codebase by wrapping digitize or by adding numpy version checks in places where the output type matters. 
Is it also possible for me to customize the return type here by exploiting the ufunc machinery and the __array_wrap__ and __array_finalize__ functions? Thanks for any help or advice you might have, Nathan -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Wed Aug 12 17:08:58 2015 From: efiring at hawaii.edu (Eric Firing) Date: Wed, 12 Aug 2015 10:08:58 -1100 Subject: [Numpy-discussion] Problems using add_npy_pkg_config In-Reply-To: References: <20150812162047.GA26389@sansibar.localdomain> Message-ID: I used to use scons, but I've been pretty happy with switching to waf. (Very limited use in both cases: two relatively simple packages.) One of the nicest things is how light it is--no external dependencies, everything can be included in the package itself. From deen at mpia.de Wed Aug 12 16:46:42 2015 From: deen at mpia.de (Casey Deen) Date: Wed, 12 Aug 2015 22:46:42 +0200 Subject: [Numpy-discussion] f2py and callbacks with variables In-Reply-To: References: <55CB706C.2020502@mpia.de> Message-ID: <55CBB0B2.9060202@mpia.de> Hi Pearu- Thanks so much! This works! Can you point me to a reference for the format of the .pyf files? My ~day of searching found a few pages on the scipy website, but nothing which went into this amount of detail. I also asked Stackoverflow, and unless you object, I'd like to add your explanation and mark it as SOLVED for future poor souls wrestling with this problem. I'll also update the github repository with before and after versions of the .pyf file. Cheers, Casey On 08/12/2015 09:34 PM, Pearu Peterson wrote: > Hi Casey, > > What you observe, is not a f2py bug. When f2py sees a code like > > subroutine foo > call bar > end subroutine foo > > then it will not make an attempt to analyze bar because of implicit > assumption that all statements that has no references to foo arguments > are irrelevant for wrapper function generation. > For your example, f2py needs some help. Try the following signature in > .pyf file: > > subroutine barney ! in :flintstone:nocallback.f > use test__user__routines, fred=>fred, bambam=>bambam > intent(callback, hide) fred > external fred > intent(callback,hide) bambam > external bambam > end subroutine barney > > Btw, instead of > > f2py -c -m flintstone flintstone.pyf callback.f nocallback.f > > use > > f2py -c flintstone.pyf callback.f nocallback.f > > because module name comes from the .pyf file. > > HTH, > Pearu > > On Wed, Aug 12, 2015 at 7:12 PM, Casey Deen > wrote: > > Hi all- > > I've run into what I think might be a bug in f2py and callbacks to > python. Or, maybe I'm not using things correctly. I have created a > very minimal example which illustrates my problem at: > > https://github.com/soylentdeen/fluffy-kumquat > > The issue seems to affect call backs with variables, but only when they > are called indirectly (i.e. from other fortran routines). For example, > if I have a python function > > def show_number(n): > print("%d" % n) > > and I setup a callback in a fortran routine: > > subroutine cb > cf2py intent(callback, hide) blah > external blah > call blah(5) > end > > and connect it to the python routine > fortranObject.blah = show_number > > I can successfully call the cb routine from python: > > >fortranObject.cb > 5 > > However, if I call the cb routine from within another fortran routine, > it seems to lose its marbles > > subroutine no_cb > call cb > end > > capi_return is NULL > Call-back cb_blah_in_cb__user__routines failed. 
> > For more information, please have a look at the github repository. I've > reproduced the behavior on both linux and mac. I'm not sure if this is > an error in the way I'm using the code, or if it is an actual bug. Any > and all help would be very much appreciated. > > Cheers, > Casey > > > -- > Dr. Casey Deen > Post-doctoral Researcher > deen at mpia.de > +49-6221-528-375 > Max Planck Institut f?r Astronomie (MPIA) > K?nigstuhl 17 D-69117 Heidelberg, Germany > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Dr. Casey Deen Post-doctoral Researcher deen at mpia.de +49-6221-528-375 Max Planck Institut f?r Astronomie (MPIA) K?nigstuhl 17 D-69117 Heidelberg, Germany From njs at pobox.com Thu Aug 13 01:42:46 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 12 Aug 2015 22:42:46 -0700 Subject: [Numpy-discussion] Changes to np.digitize since NumPy 1.9? In-Reply-To: References: Message-ID: On Aug 12, 2015 2:06 PM, "Nathan Goldbaum" wrote: > > Hi all, > > I've been testing the package I spend most of my time on, yt, under numpy 1.10b1 since the announcement went out. > > I think I've narrowed down and fixed all of the test failures that cropped up except for one last issue. This doesn't respond to your main question -- sorry! -- but is there a list of the changes you had to make somewhere? We generally do want to know when we break things -- that's why we do pre-releases! -- but it's often hard to know :-). -n From jaime.frio at gmail.com Thu Aug 13 02:09:17 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Wed, 12 Aug 2015 23:09:17 -0700 Subject: [Numpy-discussion] Changes to np.digitize since NumPy 1.9? In-Reply-To: References: Message-ID: On Wed, Aug 12, 2015 at 2:03 PM, Nathan Goldbaum wrote: > Hi all, > > I've been testing the package I spend most of my time on, yt, under numpy > 1.10b1 since the announcement went out. > > I think I've narrowed down and fixed all of the test failures that cropped > up except for one last issue. It seems that the behavior of np.digitize > with respect to ndarray subclasses has changed since the NumPy 1.9 series. > Consider the following test script: > > ```python > import numpy as np > > > class MyArray(np.ndarray): > def __new__(cls, *args, **kwargs): > return np.ndarray.__new__(cls, *args, **kwargs) > > data = np.arange(100) > > bins = np.arange(100) + 0.5 > > data = data.view(MyArray) > > bins = bins.view(MyArray) > > digits = np.digitize(data, bins) > > print type(digits) > ``` > > Under NumPy 1.9.2, this prints "", but under the > 1.10 beta, it prints "" > > I'm curious why this change was made. Since digitize outputs index arrays, > it doesn't make sense to me why it should return anything but a plain > ndarray. I see in the release notes that digitize now uses searchsorted > under the hood. Is this related? 
> It is indeed searchsorted's fault, as it returns an object of the same type as the needle (the items to search for): >>> import numpy as np >>> class A(np.ndarray): pass >>> class B(np.ndarray): pass >>> np.arange(10).view(A).searchsorted(np.arange(5).view(B)) B([0, 1, 2, 3, 4]) I am all for making index-returning functions always return a base ndarray, and will be more than happy to send a PR fixing this if there is some agreement. Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Aug 13 07:47:27 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 13 Aug 2015 11:47:27 +0000 Subject: [Numpy-discussion] Changes to np.digitize since NumPy 1.9? In-Reply-To: References: Message-ID: On Aug 12, 2015 11:12 PM, "Jaime Fern?ndez del R?o" wrote: > > On Wed, Aug 12, 2015 at 2:03 PM, Nathan Goldbaum wrote: >> >> Hi all, >> >> I've been testing the package I spend most of my time on, yt, under numpy 1.10b1 since the announcement went out. >> >> I think I've narrowed down and fixed all of the test failures that cropped up except for one last issue. It seems that the behavior of np.digitize with respect to ndarray subclasses has changed since the NumPy 1.9 series. Consider the following test script: >> >> ```python >> import numpy as np >> >> >> class MyArray(np.ndarray): >> def __new__(cls, *args, **kwargs): >> return np.ndarray.__new__(cls, *args, **kwargs) >> >> data = np.arange(100) >> >> bins = np.arange(100) + 0.5 >> >> data = data.view(MyArray) >> >> bins = bins.view(MyArray) >> >> digits = np.digitize(data, bins) >> >> print type(digits) >> ``` >> >> Under NumPy 1.9.2, this prints "", but under the 1.10 beta, it prints "" >> >> I'm curious why this change was made. Since digitize outputs index arrays, it doesn't make sense to me why it should return anything but a plain ndarray. I see in the release notes that digitize now uses searchsorted under the hood. Is this related? > > > It is indeed searchsorted's fault, as it returns an object of the same type as the needle (the items to search for): > > >>> import numpy as np > >>> class A(np.ndarray): pass > >>> class B(np.ndarray): pass > >>> np.arange(10).view(A).searchsorted(np.arange(5).view(B)) > B([0, 1, 2, 3, 4]) > > I am all for making index-returning functions always return a base ndarray, and will be more than happy to send a PR fixing this if there is some agreement. Makes sense to me. I won't be surprised if someone else then shows up saying that of course they depend on index array return types matching the input, but if that happens then I guess we can let them and Nathan fight it out :-). -n -------------- next part -------------- An HTML attachment was scrubbed... 
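Until something like that lands in numpy itself, a minimal user-side sketch of the wrapper Nathan mentioned (assuming the only problem is the subclass of the returned index array):

    import numpy as np

    def digitize_plain(x, bins, right=False):
        # np.asarray() drops any ndarray subclass from the returned indices,
        # so callers always get a plain ndarray of bin indices back
        return np.asarray(np.digitize(x, bins, right=right))
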
URL: From matthew.brett at gmail.com Thu Aug 13 08:04:34 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 13 Aug 2015 13:04:34 +0100 Subject: [Numpy-discussion] [SciPy-Dev] ANN: Numpy 1.10.0b1 release In-Reply-To: <1439378626.17032.15.camel@sipsolutions.net> References: <55CAF8C5.8060202@fysik.dtu.dk> <1439365899.17032.7.camel@sipsolutions.net> <1439378626.17032.15.camel@sipsolutions.net> Message-ID: Hi, On Wed, Aug 12, 2015 at 12:23 PM, Sebastian Berg wrote: > On Mi, 2015-08-12 at 01:07 -0700, Nathaniel Smith wrote: >> On Wed, Aug 12, 2015 at 12:51 AM, Sebastian Berg >> wrote: >> > On Mi, 2015-08-12 at 09:41 +0200, Jens J?rgen Mortensen wrote: >> >> On 08/11/2015 11:23 PM, Charles R Harris wrote: >> >> > Hi All, >> >> > >> >> > give this release a whirl and report any problems either on the >> >> > numpy-discussion list or by opening an issue on github. >> >> > >> >> > I'm pleased to announce the first beta release of Numpy 1.10.0. >> >> > There is over a year's worth of enhancements and bug fixes in the >> >> > 1.10.0 release, so please give this release a whirl and report any >> >> > problems either on the numpy-discussion list or by opening an issue >> >> > on github. Tarballs, installers, and release notes may be found in >> >> > the usual place at Sourceforge. I'm getting test errors on the standard OSX numpy / scipy compilation rig: Python.org Python OSX 10.9 clang gfortran 4.2.3 Compiling from the `maintenance/1.10.x` branch (is there a 1.10.0b1 tag)? ====================================================================== ERROR: test_accelerate_framework_sgemv_fix (test_multiarray.TestDot) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/numpy/core/tests/test_multiarray.py", line 4218, in test_accelerate_framework_sgemv_fix m = aligned_array(100, 15, np.float32) File "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/numpy/core/tests/test_multiarray.py", line 4200, in aligned_array d = np.dtype() TypeError: Required argument 'dtype' (pos 1) not found This one should be fixed by https://github.com/numpy/numpy/pull/6202 ====================================================================== ERROR: test_callback.TestF77Callback.test_string_callback ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/nose/case.py", line 381, in setUp try_run(self.inst, ('setup', 'setUp')) File "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/nose/util.py", line 471, in try_run return func() File "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 362, in setUp module_name=self.module_name) File "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 79, in wrapper memo[key] = func(*a, **kw) File "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 170, in build_code module_name=module_name) File "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 79, in wrapper memo[key] = func(*a, **kw) File "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 150, in build_module __import__(module_name) ImportError: dlopen(/var/folders/s7/r25pn2xj48n4cm76_mgsb78h0000gn/T/tmpa39XPB/_test_ext_module_5403.so, 2): Symbol not found: _func0_ Referenced from: 
/var/folders/s7/r25pn2xj48n4cm76_mgsb78h0000gn/T/tmpa39XPB/_test_ext_module_5403.so Expected in: dynamic lookup Any ideas about this second one? Cheers, Matthew From charlesr.harris at gmail.com Thu Aug 13 10:44:25 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 13 Aug 2015 08:44:25 -0600 Subject: [Numpy-discussion] Changes to np.digitize since NumPy 1.9? In-Reply-To: References: Message-ID: On Thu, Aug 13, 2015 at 12:09 AM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > On Wed, Aug 12, 2015 at 2:03 PM, Nathan Goldbaum > wrote: > >> Hi all, >> >> I've been testing the package I spend most of my time on, yt, under numpy >> 1.10b1 since the announcement went out. >> >> I think I've narrowed down and fixed all of the test failures that >> cropped up except for one last issue. It seems that the behavior of >> np.digitize with respect to ndarray subclasses has changed since the NumPy >> 1.9 series. Consider the following test script: >> >> ```python >> import numpy as np >> >> >> class MyArray(np.ndarray): >> def __new__(cls, *args, **kwargs): >> return np.ndarray.__new__(cls, *args, **kwargs) >> >> data = np.arange(100) >> >> bins = np.arange(100) + 0.5 >> >> data = data.view(MyArray) >> >> bins = bins.view(MyArray) >> >> digits = np.digitize(data, bins) >> >> print type(digits) >> ``` >> >> Under NumPy 1.9.2, this prints "", but under the >> 1.10 beta, it prints "" >> >> I'm curious why this change was made. Since digitize outputs index >> arrays, it doesn't make sense to me why it should return anything but a >> plain ndarray. I see in the release notes that digitize now uses >> searchsorted under the hood. Is this related? >> > > It is indeed searchsorted's fault, as it returns an object of the same > type as the needle (the items to search for): > > >>> import numpy as np > >>> class A(np.ndarray): pass > >>> class B(np.ndarray): pass > >>> np.arange(10).view(A).searchsorted(np.arange(5).view(B)) > B([0, 1, 2, 3, 4]) > > I am all for making index-returning functions always return a base > ndarray, and will be more than happy to send a PR fixing this if there is > some agreement. > I think that is the right thing to do. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From nathan12343 at gmail.com Thu Aug 13 10:59:33 2015 From: nathan12343 at gmail.com (Nathan Goldbaum) Date: Thu, 13 Aug 2015 09:59:33 -0500 Subject: [Numpy-discussion] Changes to np.digitize since NumPy 1.9? In-Reply-To: References: Message-ID: On Thu, Aug 13, 2015 at 9:44 AM, Charles R Harris wrote: > > > On Thu, Aug 13, 2015 at 12:09 AM, Jaime Fern?ndez del R?o < > jaime.frio at gmail.com> wrote: > >> On Wed, Aug 12, 2015 at 2:03 PM, Nathan Goldbaum >> wrote: >> >>> Hi all, >>> >>> I've been testing the package I spend most of my time on, yt, under >>> numpy 1.10b1 since the announcement went out. >>> >>> I think I've narrowed down and fixed all of the test failures that >>> cropped up except for one last issue. It seems that the behavior of >>> np.digitize with respect to ndarray subclasses has changed since the NumPy >>> 1.9 series. 
Consider the following test script: >>> >>> ```python >>> import numpy as np >>> >>> >>> class MyArray(np.ndarray): >>> def __new__(cls, *args, **kwargs): >>> return np.ndarray.__new__(cls, *args, **kwargs) >>> >>> data = np.arange(100) >>> >>> bins = np.arange(100) + 0.5 >>> >>> data = data.view(MyArray) >>> >>> bins = bins.view(MyArray) >>> >>> digits = np.digitize(data, bins) >>> >>> print type(digits) >>> ``` >>> >>> Under NumPy 1.9.2, this prints "", but under the >>> 1.10 beta, it prints "" >>> >>> I'm curious why this change was made. Since digitize outputs index >>> arrays, it doesn't make sense to me why it should return anything but a >>> plain ndarray. I see in the release notes that digitize now uses >>> searchsorted under the hood. Is this related? >>> >> >> It is indeed searchsorted's fault, as it returns an object of the same >> type as the needle (the items to search for): >> >> >>> import numpy as np >> >>> class A(np.ndarray): pass >> >>> class B(np.ndarray): pass >> >>> np.arange(10).view(A).searchsorted(np.arange(5).view(B)) >> B([0, 1, 2, 3, 4]) >> >> I am all for making index-returning functions always return a base >> ndarray, and will be more than happy to send a PR fixing this if there is >> some agreement. >> > > I think that is the right thing to do. > Awesome, I'd appreciate having a PR to fix this. Arguably the return type *could* be the same type as the inputs, but given that it's a behavior change I agree that it's best to add a patch so the output of serachsorted is "sanitized" to be an ndarray before it's returned by digitize. To answer Nathaniel's question, I opened an issue on yt's bitbucket page to record the test failures: https://bitbucket.org/yt_analysis/yt/issues/1063/new-test-failures-using-numpy-110-beta I've fixed two of the classes of errors in that bug in yt itself, since it looks like we were relying on buggy or deprecated behavior in NumPy. Here are the PRs for those fixes: https://bitbucket.org/yt_analysis/yt/pull-requests/1697/cast-enzo-grid-start-index-to-int-arrays/diff https://bitbucket.org/yt_analysis/yt/pull-requests/1696/add-assert_allclose_units-like/diff > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From peridot.faceted at gmail.com Thu Aug 13 11:52:22 2015 From: peridot.faceted at gmail.com (Anne Archibald) Date: Thu, 13 Aug 2015 15:52:22 +0000 Subject: [Numpy-discussion] Development workflow (not git tutorial) Message-ID: Hi, What is a sensible way to work on (modify, compile, and test) numpy? There is documentation about "contributing to numpy" at: http://docs.scipy.org/doc/numpy-dev/dev/index.html and: http://docs.scipy.org/doc/numpy-dev/dev/gitwash/development_workflow.html but these are entirely focused on using git. I have no problem with that aspect. It is building and testing that I am looking for the Right Way to do. My current approach is to build an empty virtualenv, pip install nose, and from the numpy root directory do "python setup.py build_ext --inplace" and "python -c 'import numpy; numpy.test()'". This works, for my stock system python, though I get a lot of weird messages suggesting distutils problems (for example "python setup.py develop", although suggested by setup.py itself, claims that "develop" is not a command). 
But I don't know how (for example) to test with python3 without starting from a separate clean source tree. What do you recommend: use virtualenvs? Is building inplace the way to go? Is there a better way to run all tests? Are there other packages that should go into the virtualenv? What is the best way to test on multiple python versions? Switch cleanly between feature branches? Surely I can't be the only person wishing for advice on a sensible way to work with an in-development version of numpy? Perhaps this would be a good addition to CONTRIBUTING.md or the website? Thanks, Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Thu Aug 13 12:00:10 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 13 Aug 2015 18:00:10 +0200 Subject: [Numpy-discussion] Development workflow (not git tutorial) In-Reply-To: References: Message-ID: <1439481610.10782.24.camel@sipsolutions.net> On Do, 2015-08-13 at 15:52 +0000, Anne Archibald wrote: > Hi, > > > What is a sensible way to work on (modify, compile, and test) numpy? > > > There is documentation about "contributing to numpy" at: > http://docs.scipy.org/doc/numpy-dev/dev/index.html > > and: > http://docs.scipy.org/doc/numpy-dev/dev/gitwash/development_workflow.html > > but these are entirely focused on using git. I have no problem with > that aspect. It is building and testing that I am looking for the > Right Way to do. > > > My current approach is to build an empty virtualenv, pip install nose, > and from the numpy root directory do "python setup.py build_ext > --inplace" and "python -c 'import numpy; numpy.test()'". This works, > for my stock system python, though I get a lot of weird messages > suggesting distutils problems (for example "python setup.py develop", > although suggested by setup.py itself, claims that "develop" is not a > command). But I don't know how (for example) to test with python3 > without starting from a separate clean source tree. > We have the `runtests.py` script which will do exactly this (don't think it gives lots of weird warnings normally). I think that is the only real tip I can give. - Sebastian > > What do you recommend: use virtualenvs? Is building inplace the way to > go? Is there a better way to run all tests? Are there other packages > that should go into the virtualenv? What is the best way to test on > multiple python versions? Switch cleanly between feature branches? > > > Surely I can't be the only person wishing for advice on a sensible way > to work with an in-development version of numpy? Perhaps this would be > a good addition to CONTRIBUTING.md or the website? > > > Thanks, > Anne > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From charlesr.harris at gmail.com Thu Aug 13 12:32:52 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 13 Aug 2015 10:32:52 -0600 Subject: [Numpy-discussion] Development workflow (not git tutorial) In-Reply-To: <1439481610.10782.24.camel@sipsolutions.net> References: <1439481610.10782.24.camel@sipsolutions.net> Message-ID: On Thu, Aug 13, 2015 at 10:00 AM, Sebastian Berg wrote: > On Do, 2015-08-13 at 15:52 +0000, Anne Archibald wrote: > > Hi, > > > > > > What is a sensible way to work on (modify, compile, and test) numpy? > > > > > > There is documentation about "contributing to numpy" at: > > http://docs.scipy.org/doc/numpy-dev/dev/index.html > > > > and: > > > http://docs.scipy.org/doc/numpy-dev/dev/gitwash/development_workflow.html > > > > but these are entirely focused on using git. I have no problem with > > that aspect. It is building and testing that I am looking for the > > Right Way to do. > > > > > > My current approach is to build an empty virtualenv, pip install nose, > > and from the numpy root directory do "python setup.py build_ext > > --inplace" and "python -c 'import numpy; numpy.test()'". This works, > > for my stock system python, though I get a lot of weird messages > > suggesting distutils problems (for example "python setup.py develop", > > although suggested by setup.py itself, claims that "develop" is not a > > command). But I don't know how (for example) to test with python3 > > without starting from a separate clean source tree. > > > > We have the `runtests.py` script which will do exactly this (don't think > it gives lots of weird warnings normally). I think that is the only real > tip I can give. > +1 for `runtests.py`. Do `python runtests.py --help` to get started. If you want another python, say 3.5, `python3.5 runtests.py`. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Thu Aug 13 12:57:18 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Thu, 13 Aug 2015 09:57:18 -0700 Subject: [Numpy-discussion] Changes to np.digitize since NumPy 1.9? In-Reply-To: References: Message-ID: On Thu, Aug 13, 2015 at 7:59 AM, Nathan Goldbaum wrote: > > > On Thu, Aug 13, 2015 at 9:44 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Thu, Aug 13, 2015 at 12:09 AM, Jaime Fern?ndez del R?o < >> jaime.frio at gmail.com> wrote: >> >>> On Wed, Aug 12, 2015 at 2:03 PM, Nathan Goldbaum >>> wrote: >>> >>>> Hi all, >>>> >>>> I've been testing the package I spend most of my time on, yt, under >>>> numpy 1.10b1 since the announcement went out. >>>> >>>> I think I've narrowed down and fixed all of the test failures that >>>> cropped up except for one last issue. It seems that the behavior of >>>> np.digitize with respect to ndarray subclasses has changed since the NumPy >>>> 1.9 series. 
Consider the following test script: >>>> >>>> ```python >>>> import numpy as np >>>> >>>> >>>> class MyArray(np.ndarray): >>>> def __new__(cls, *args, **kwargs): >>>> return np.ndarray.__new__(cls, *args, **kwargs) >>>> >>>> data = np.arange(100) >>>> >>>> bins = np.arange(100) + 0.5 >>>> >>>> data = data.view(MyArray) >>>> >>>> bins = bins.view(MyArray) >>>> >>>> digits = np.digitize(data, bins) >>>> >>>> print type(digits) >>>> ``` >>>> >>>> Under NumPy 1.9.2, this prints "", but under the >>>> 1.10 beta, it prints "" >>>> >>>> I'm curious why this change was made. Since digitize outputs index >>>> arrays, it doesn't make sense to me why it should return anything but a >>>> plain ndarray. I see in the release notes that digitize now uses >>>> searchsorted under the hood. Is this related? >>>> >>> >>> It is indeed searchsorted's fault, as it returns an object of the same >>> type as the needle (the items to search for): >>> >>> >>> import numpy as np >>> >>> class A(np.ndarray): pass >>> >>> class B(np.ndarray): pass >>> >>> np.arange(10).view(A).searchsorted(np.arange(5).view(B)) >>> B([0, 1, 2, 3, 4]) >>> >>> I am all for making index-returning functions always return a base >>> ndarray, and will be more than happy to send a PR fixing this if there is >>> some agreement. >>> >> >> I think that is the right thing to do. >> > > Awesome, I'd appreciate having a PR to fix this. Arguably the return type > *could* be the same type as the inputs, but given that it's a behavior > change I agree that it's best to add a patch so the output of serachsorted > is "sanitized" to be an ndarray before it's returned by digitize. > It is relatively simple to do, just replace Py_TYPE(ap2) with &PyArray_Type in this line: https://github.com/numpy/numpy/blob/maintenance/1.10.x/numpy/core/src/multiarray/item_selection.c#L1725 Then fix all the tests that are expecting searchsorted to return something else than a base ndarray. We already have modified nonzero to return base ndarray's in this release, see the release notes, so it will go with the same theme. For 1.11 I think we should try to extend this "if it returns an index, it will be a base ndarray" to all other functions that don't right now. Then sit back and watch AstroPy come down in flames... ;-))) Seriously, I think this makes a lot of sense, and should be documented as the way NumPy handles index arrays. Anyway, I will try to find time tonight to put this PR together, unless someone beats me to it, which I would be totally fine with. Jaime > > To answer Nathaniel's question, I opened an issue on yt's bitbucket page > to record the test failures: > > > https://bitbucket.org/yt_analysis/yt/issues/1063/new-test-failures-using-numpy-110-beta > > I've fixed two of the classes of errors in that bug in yt itself, since it > looks like we were relying on buggy or deprecated behavior in NumPy. Here > are the PRs for those fixes: > > > https://bitbucket.org/yt_analysis/yt/pull-requests/1697/cast-enzo-grid-start-index-to-int-arrays/diff > > https://bitbucket.org/yt_analysis/yt/pull-requests/1696/add-assert_allclose_units-like/diff > >> >> Chuck >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- (\__/) ( O.o) ( > <) Este es Conejo. 
Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanv at berkeley.edu Thu Aug 13 14:25:09 2015 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Thu, 13 Aug 2015 11:25:09 -0700 Subject: [Numpy-discussion] Development workflow (not git tutorial) In-Reply-To: References: Message-ID: <87lhdfxcze.fsf@berkeley.edu> On 2015-08-13 08:52:22, Anne Archibald wrote: > My current approach is to build an empty virtualenv, pip install > nose, and from the numpy root directory do "python setup.py > build_ext --inplace" and "python -c 'import numpy; > numpy.test()'". This works, for my stock system python, though I > get a lot of weird messages suggesting distutils problems (for > example "python setup.py develop", although suggested by > setup.py itself, claims that "develop" is not a command). But I > don't know how (for example) to test with python3 without > starting from a separate clean source tree. Nowadays, you can use pip install -e . to install an in-place "editable" version of numpy. This should also execute "build_ext" for you. St?fan From sebastian at sipsolutions.net Thu Aug 13 14:34:24 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 13 Aug 2015 20:34:24 +0200 Subject: [Numpy-discussion] Multiarray API size mismatch 301 302? Message-ID: <1439490864.10782.28.camel@sipsolutions.net> Hey, just for hacking/testing, I tried to add to shape.c: /*NUMPY_API * * Checks if memory overlap exists */ NPY_NO_EXPORT int PyArray_ArraysShareMemory(PyArrayObject *arr1, PyArrayObject *arr2, int work) { return solve_may_share_memory(arr1, arr2, work); } and to numpy_api.py: # End 1.10 API 'PyArray_ArraysShareMemory': (301,), But I am getting the error: File "numpy/core/code_generators/generate_numpy_api.py", line 230, in do_generate_api (len(multiarray_api_dict), len(multiarray_api_index))) AssertionError: Multiarray API size mismatch 301 302 It is puzzling me, so anyone got a quick idea? - Sebastian -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From ben.root at ou.edu Thu Aug 13 14:36:31 2015 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 13 Aug 2015 14:36:31 -0400 Subject: [Numpy-discussion] Multiarray API size mismatch 301 302? In-Reply-To: <1439490864.10782.28.camel@sipsolutions.net> References: <1439490864.10782.28.camel@sipsolutions.net> Message-ID: Did you do a "git clean -fxd" before re-installing? On Thu, Aug 13, 2015 at 2:34 PM, Sebastian Berg wrote: > Hey, > > just for hacking/testing, I tried to add to shape.c: > > > /*NUMPY_API > * > * Checks if memory overlap exists > */ > NPY_NO_EXPORT int > PyArray_ArraysShareMemory(PyArrayObject *arr1, PyArrayObject *arr2, int > work) { > return solve_may_share_memory(arr1, arr2, work); > } > > > > and to numpy_api.py: > > # End 1.10 API > 'PyArray_ArraysShareMemory': (301,), > > > But I am getting the error: > > File "numpy/core/code_generators/generate_numpy_api.py", line 230, in > do_generate_api > (len(multiarray_api_dict), len(multiarray_api_index))) > AssertionError: Multiarray API size mismatch 301 302 > > It is puzzling me, so anyone got a quick idea? 
> > - Sebastian > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Thu Aug 13 14:42:17 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 13 Aug 2015 20:42:17 +0200 Subject: [Numpy-discussion] Multiarray API size mismatch 301 302? In-Reply-To: References: <1439490864.10782.28.camel@sipsolutions.net> Message-ID: <1439491337.10782.29.camel@sipsolutions.net> On Do, 2015-08-13 at 14:36 -0400, Benjamin Root wrote: > Did you do a "git clean -fxd" before re-installing? > Yup. > > On Thu, Aug 13, 2015 at 2:34 PM, Sebastian Berg > wrote: > Hey, > > just for hacking/testing, I tried to add to shape.c: > > > /*NUMPY_API > * > * Checks if memory overlap exists > */ > NPY_NO_EXPORT int > PyArray_ArraysShareMemory(PyArrayObject *arr1, PyArrayObject > *arr2, int > work) { > return solve_may_share_memory(arr1, arr2, work); > } > > > > and to numpy_api.py: > > # End 1.10 API > 'PyArray_ArraysShareMemory': (301,), > > > But I am getting the error: > > File "numpy/core/code_generators/generate_numpy_api.py", > line 230, in > do_generate_api > (len(multiarray_api_dict), len(multiarray_api_index))) > AssertionError: Multiarray API size mismatch 301 302 > > It is puzzling me, so anyone got a quick idea? > > - Sebastian > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From christian.engwer at uni-muenster.de Thu Aug 13 14:45:07 2015 From: christian.engwer at uni-muenster.de (Christian Engwer) Date: Thu, 13 Aug 2015 20:45:07 +0200 Subject: [Numpy-discussion] Problems using add_npy_pkg_config In-Reply-To: References: <20150812162047.GA26389@sansibar.localdomain> Message-ID: <20150813184507.GA24303@sansibar.localdomain> > >> This doesn't answer your question but: why? If you're not distributing a > >> Python project, there is no reason to use distutils instead of a sane build > >> system. > > Come on. We don't take it seriously, and neither do the Python core devs. > It's also pretty much completely unsupported. Numpy.distutils is a bit > better in that respect than Python distutils, which doesn't even get sane > patches merged. > > Try Scons, Tup, Gradle, Shake, Waf or anything else that's at least > somewhat modern and supported. Do not use numpy.distutils unless there's no > other mature choice (i.e. you're developing a Python project). Sorry, reading my mail again, it seems that I didn't make this point clear. I have a project which is python + c-lib. The later which should be used by other c-projects as well. The minimal working example is without any python code, as I only have problems with the pkg config file. ... and concerning cmake, yes we tried this as well, but using cmake to distribute the python code is also a pita ;-) ... 
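For anyone following along, the setup.py pattern being exercised here is roughly the one from the add_npy_pkg_config docstring. The sketch below uses made-up names ('mypkg', 'foo', foo.ini.in) and shows the pattern that runs into the failure discussed in this thread, not a workaround for it:

```python
# Sketch of a setup.py following the add_npy_pkg_config docstring pattern.
# 'mypkg', 'foo' and foo.ini.in are placeholder names.
def configuration(parent_package='', top_path=None):
    from numpy.distutils.misc_util import Configuration
    config = Configuration('mypkg', parent_package, top_path)
    # Build foo.c into an installed static library next to the package ...
    config.add_installed_library('foo', sources=['foo.c'], install_dir='lib')
    # ... and generate lib/foo.ini from the foo.ini.in template, so that
    # other projects can locate the library via
    # numpy.distutils.misc_util.get_info.  The substitution keys depend on
    # what the .ini.in template actually references.
    config.add_npy_pkg_config('foo.ini.in', 'lib', {'foo': config.name})
    return config

if __name__ == '__main__':
    from numpy.distutils.core import setup
    setup(configuration=configuration)
```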
Christian From pearu.peterson at gmail.com Thu Aug 13 15:50:55 2015 From: pearu.peterson at gmail.com (Pearu Peterson) Date: Thu, 13 Aug 2015 22:50:55 +0300 Subject: [Numpy-discussion] f2py and callbacks with variables In-Reply-To: <55CBB0B2.9060202@mpia.de> References: <55CB706C.2020502@mpia.de> <55CBB0B2.9060202@mpia.de> Message-ID: Hi Casey, On Wed, Aug 12, 2015 at 11:46 PM, Casey Deen wrote: > Hi Pearu- > > Thanks so much! This works! Can you point me to a reference for the > format of the .pyf files? My ~day of searching found a few pages on the > scipy website, but nothing which went into this amount of detail. > > Try this: https://sysbio.ioc.ee/projects/f2py2e/usersguide/index.html#signature-file > I also asked Stackoverflow, and unless you object, I'd like to add your > explanation and mark it as SOLVED for future poor souls wrestling with > this problem. I'll also update the github repository with before and > after versions of the .pyf file. > > Go ahead with stackoverflow. Best regards, Pearu Cheers, > Casey > > On 08/12/2015 09:34 PM, Pearu Peterson wrote: > > Hi Casey, > > > > What you observe, is not a f2py bug. When f2py sees a code like > > > > subroutine foo > > call bar > > end subroutine foo > > > > then it will not make an attempt to analyze bar because of implicit > > assumption that all statements that has no references to foo arguments > > are irrelevant for wrapper function generation. > > For your example, f2py needs some help. Try the following signature in > > .pyf file: > > > > subroutine barney ! in :flintstone:nocallback.f > > use test__user__routines, fred=>fred, bambam=>bambam > > intent(callback, hide) fred > > external fred > > intent(callback,hide) bambam > > external bambam > > end subroutine barney > > > > Btw, instead of > > > > f2py -c -m flintstone flintstone.pyf callback.f nocallback.f > > > > use > > > > f2py -c flintstone.pyf callback.f nocallback.f > > > > because module name comes from the .pyf file. > > > > HTH, > > Pearu > > > > On Wed, Aug 12, 2015 at 7:12 PM, Casey Deen > > wrote: > > > > Hi all- > > > > I've run into what I think might be a bug in f2py and callbacks to > > python. Or, maybe I'm not using things correctly. I have created a > > very minimal example which illustrates my problem at: > > > > https://github.com/soylentdeen/fluffy-kumquat > > > > The issue seems to affect call backs with variables, but only when > they > > are called indirectly (i.e. from other fortran routines). For > example, > > if I have a python function > > > > def show_number(n): > > print("%d" % n) > > > > and I setup a callback in a fortran routine: > > > > subroutine cb > > cf2py intent(callback, hide) blah > > external blah > > call blah(5) > > end > > > > and connect it to the python routine > > fortranObject.blah = show_number > > > > I can successfully call the cb routine from python: > > > > >fortranObject.cb > > 5 > > > > However, if I call the cb routine from within another fortran > routine, > > it seems to lose its marbles > > > > subroutine no_cb > > call cb > > end > > > > capi_return is NULL > > Call-back cb_blah_in_cb__user__routines failed. > > > > For more information, please have a look at the github repository. > I've > > reproduced the behavior on both linux and mac. I'm not sure if this > is > > an error in the way I'm using the code, or if it is an actual bug. > Any > > and all help would be very much appreciated. > > > > Cheers, > > Casey > > > > > > -- > > Dr. 
Casey Deen > > Post-doctoral Researcher > > deen at mpia.de > > +49-6221-528-375 > > Max Planck Institut f?r Astronomie (MPIA) > > K?nigstuhl 17 D-69117 Heidelberg, Germany > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > -- > Dr. Casey Deen > Post-doctoral Researcher > deen at mpia.de +49-6221-528-375 > Max Planck Institut f?r Astronomie (MPIA) > K?nigstuhl 17 D-69117 Heidelberg, Germany > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Thu Aug 13 16:13:26 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 13 Aug 2015 22:13:26 +0200 Subject: [Numpy-discussion] Multiarray API size mismatch 301 302? In-Reply-To: <1439491337.10782.29.camel@sipsolutions.net> References: <1439490864.10782.28.camel@sipsolutions.net> <1439491337.10782.29.camel@sipsolutions.net> Message-ID: <1439496806.26334.0.camel@sipsolutions.net> So as Julian helped me, it was the wrong style of the function, the curly bracket has to go on the next line for the API generation to pick it up. - Sebastian On Do, 2015-08-13 at 20:42 +0200, Sebastian Berg wrote: > On Do, 2015-08-13 at 14:36 -0400, Benjamin Root wrote: > > Did you do a "git clean -fxd" before re-installing? > > > > Yup. > > > > > On Thu, Aug 13, 2015 at 2:34 PM, Sebastian Berg > > wrote: > > Hey, > > > > just for hacking/testing, I tried to add to shape.c: > > > > > > /*NUMPY_API > > * > > * Checks if memory overlap exists > > */ > > NPY_NO_EXPORT int > > PyArray_ArraysShareMemory(PyArrayObject *arr1, PyArrayObject > > *arr2, int > > work) { > > return solve_may_share_memory(arr1, arr2, work); > > } > > > > > > > > and to numpy_api.py: > > > > # End 1.10 API > > 'PyArray_ArraysShareMemory': (301,), > > > > > > But I am getting the error: > > > > File "numpy/core/code_generators/generate_numpy_api.py", > > line 230, in > > do_generate_api > > (len(multiarray_api_dict), len(multiarray_api_index))) > > AssertionError: Multiarray API size mismatch 301 302 > > > > It is puzzling me, so anyone got a quick idea? > > > > - Sebastian > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From ralf.gommers at gmail.com Thu Aug 13 17:09:24 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 13 Aug 2015 23:09:24 +0200 Subject: [Numpy-discussion] Problems using add_npy_pkg_config In-Reply-To: <20150813184507.GA24303@sansibar.localdomain> References: <20150812162047.GA26389@sansibar.localdomain> <20150813184507.GA24303@sansibar.localdomain> Message-ID: On Thu, Aug 13, 2015 at 8:45 PM, Christian Engwer < christian.engwer at uni-muenster.de> wrote: > > >> This doesn't answer your question but: why? If you're not > distributing a > > >> Python project, there is no reason to use distutils instead of a sane > build > > >> system. > > > > Come on. We don't take it seriously, and neither do the Python core devs. > > It's also pretty much completely unsupported. Numpy.distutils is a bit > > better in that respect than Python distutils, which doesn't even get sane > > patches merged. > > > > Try Scons, Tup, Gradle, Shake, Waf or anything else that's at least > > somewhat modern and supported. Do not use numpy.distutils unless there's > no > > other mature choice (i.e. you're developing a Python project). > > Sorry, reading my mail again, it seems that I didn't make this point > clear. I have a project which is python + c-lib. The later which should be > used by other c-projects as well. > Thanks for clarifying. It makes more sense now:) > The minimal working example is without any python code, as I only have > problems with the pkg config file. > I stared at it for a while, and can't figure it out despite you following the example in the add_npy_pkg_config docstring pretty much to the letter. When you see that the error is generated in a function that starts with ``# XXX: another ugly workaround to circumvent distutils brain damage.``, you're usually in trouble..... Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Thu Aug 13 18:34:15 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 13 Aug 2015 23:34:15 +0100 Subject: [Numpy-discussion] [SciPy-Dev] ANN: Numpy 1.10.0b1 release In-Reply-To: References: <55CAF8C5.8060202@fysik.dtu.dk> <1439365899.17032.7.camel@sipsolutions.net> <1439378626.17032.15.camel@sipsolutions.net> Message-ID: On Thu, Aug 13, 2015 at 1:04 PM, Matthew Brett wrote: > Hi, > > On Wed, Aug 12, 2015 at 12:23 PM, Sebastian Berg > wrote: >> On Mi, 2015-08-12 at 01:07 -0700, Nathaniel Smith wrote: >>> On Wed, Aug 12, 2015 at 12:51 AM, Sebastian Berg >>> wrote: >>> > On Mi, 2015-08-12 at 09:41 +0200, Jens J?rgen Mortensen wrote: >>> >> On 08/11/2015 11:23 PM, Charles R Harris wrote: >>> >> > Hi All, >>> >> > >>> >> > give this release a whirl and report any problems either on the >>> >> > numpy-discussion list or by opening an issue on github. >>> >> > >>> >> > I'm pleased to announce the first beta release of Numpy 1.10.0. >>> >> > There is over a year's worth of enhancements and bug fixes in the >>> >> > 1.10.0 release, so please give this release a whirl and report any >>> >> > problems either on the numpy-discussion list or by opening an issue >>> >> > on github. Tarballs, installers, and release notes may be found in >>> >> > the usual place at Sourceforge. 
> > I'm getting test errors on the standard OSX numpy / scipy compilation rig: > > Python.org Python > OSX 10.9 > clang > gfortran 4.2.3 > Compiling from the `maintenance/1.10.x` branch (is there a 1.10.0b1 tag)? > > ====================================================================== > ERROR: test_accelerate_framework_sgemv_fix (test_multiarray.TestDot) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/numpy/core/tests/test_multiarray.py", > line 4218, in test_accelerate_framework_sgemv_fix > m = aligned_array(100, 15, np.float32) > File "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/numpy/core/tests/test_multiarray.py", > line 4200, in aligned_array > d = np.dtype() > TypeError: Required argument 'dtype' (pos 1) not found > > This one should be fixed by https://github.com/numpy/numpy/pull/6202 > > ====================================================================== > ERROR: test_callback.TestF77Callback.test_string_callback > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/nose/case.py", > line 381, in setUp > try_run(self.inst, ('setup', 'setUp')) > File "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/nose/util.py", > line 471, in try_run > return func() > File "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/numpy/f2py/tests/util.py", > line 362, in setUp > module_name=self.module_name) > File "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/numpy/f2py/tests/util.py", > line 79, in wrapper > memo[key] = func(*a, **kw) > File "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/numpy/f2py/tests/util.py", > line 170, in build_code > module_name=module_name) > File "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/numpy/f2py/tests/util.py", > line 79, in wrapper > memo[key] = func(*a, **kw) > File "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/numpy/f2py/tests/util.py", > line 150, in build_module > __import__(module_name) > ImportError: dlopen(/var/folders/s7/r25pn2xj48n4cm76_mgsb78h0000gn/T/tmpa39XPB/_test_ext_module_5403.so, > 2): Symbol not found: _func0_ > Referenced from: > /var/folders/s7/r25pn2xj48n4cm76_mgsb78h0000gn/T/tmpa39XPB/_test_ext_module_5403.so > Expected in: dynamic lookup > > Any ideas about this second one? I don't get this second error when building with homebrew gfortran 4.8. Is this expected? Should we be raising an error for earlier gfortrans? 
Cheers, Matthew From charlesr.harris at gmail.com Thu Aug 13 19:37:27 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 13 Aug 2015 17:37:27 -0600 Subject: [Numpy-discussion] [SciPy-Dev] ANN: Numpy 1.10.0b1 release In-Reply-To: References: <55CAF8C5.8060202@fysik.dtu.dk> <1439365899.17032.7.camel@sipsolutions.net> <1439378626.17032.15.camel@sipsolutions.net> Message-ID: On Thu, Aug 13, 2015 at 4:34 PM, Matthew Brett wrote: > On Thu, Aug 13, 2015 at 1:04 PM, Matthew Brett > wrote: > > Hi, > > > > On Wed, Aug 12, 2015 at 12:23 PM, Sebastian Berg > > wrote: > >> On Mi, 2015-08-12 at 01:07 -0700, Nathaniel Smith wrote: > >>> On Wed, Aug 12, 2015 at 12:51 AM, Sebastian Berg > >>> wrote: > >>> > On Mi, 2015-08-12 at 09:41 +0200, Jens J?rgen Mortensen wrote: > >>> >> On 08/11/2015 11:23 PM, Charles R Harris wrote: > >>> >> > Hi All, > >>> >> > > >>> >> > give this release a whirl and report any problems either on the > >>> >> > numpy-discussion list or by opening an issue on github. > >>> >> > > >>> >> > I'm pleased to announce the first beta release of Numpy 1.10.0. > >>> >> > There is over a year's worth of enhancements and bug fixes in the > >>> >> > 1.10.0 release, so please give this release a whirl and report any > >>> >> > problems either on the numpy-discussion list or by opening an > issue > >>> >> > on github. Tarballs, installers, and release notes may be found in > >>> >> > the usual place at Sourceforge. > > > > I'm getting test errors on the standard OSX numpy / scipy compilation > rig: > > > > Python.org Python > > OSX 10.9 > > clang > > gfortran 4.2.3 > > Compiling from the `maintenance/1.10.x` branch (is there a 1.10.0b1 tag)? > > > > ====================================================================== > > ERROR: test_accelerate_framework_sgemv_fix (test_multiarray.TestDot) > > ---------------------------------------------------------------------- > > Traceback (most recent call last): > > File > "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/numpy/core/tests/test_multiarray.py", > > line 4218, in test_accelerate_framework_sgemv_fix > > m = aligned_array(100, 15, np.float32) > > File > "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/numpy/core/tests/test_multiarray.py", > > line 4200, in aligned_array > > d = np.dtype() > > TypeError: Required argument 'dtype' (pos 1) not found > > > > This one should be fixed by https://github.com/numpy/numpy/pull/6202 > > > > ====================================================================== > > ERROR: test_callback.TestF77Callback.test_string_callback > > ---------------------------------------------------------------------- > > Traceback (most recent call last): > > File > "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/nose/case.py", > > line 381, in setUp > > try_run(self.inst, ('setup', 'setUp')) > > File > "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/nose/util.py", > > line 471, in try_run > > return func() > > File > "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/numpy/f2py/tests/util.py", > > line 362, in setUp > > module_name=self.module_name) > > File > "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/numpy/f2py/tests/util.py", > > line 79, in wrapper > > memo[key] = func(*a, **kw) > > File > "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/numpy/f2py/tests/util.py", > > line 170, in build_code > > module_name=module_name) > > File > "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/numpy/f2py/tests/util.py", > > 
line 79, in wrapper > > memo[key] = func(*a, **kw) > > File > "/Users/mb312/.virtualenvs/test/lib/python2.7/site-packages/numpy/f2py/tests/util.py", > > line 150, in build_module > > __import__(module_name) > > ImportError: > dlopen(/var/folders/s7/r25pn2xj48n4cm76_mgsb78h0000gn/T/tmpa39XPB/_test_ext_module_5403.so, > > 2): Symbol not found: _func0_ > > Referenced from: > > > /var/folders/s7/r25pn2xj48n4cm76_mgsb78h0000gn/T/tmpa39XPB/_test_ext_module_5403.so > > Expected in: dynamic lookup > > > > Any ideas about this second one? > > I don't get this second error when building with homebrew gfortran > 4.8. Is this expected? Should we be raising an error for earlier > gfortrans? > > Probaby, or we could make the failing bit, if we can find it, gcc version dependent. gcc 4.2 is eight years old, which OS X versions depend on it? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Fri Aug 14 01:11:36 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Thu, 13 Aug 2015 22:11:36 -0700 Subject: [Numpy-discussion] Changes to np.digitize since NumPy 1.9? In-Reply-To: References: Message-ID: On Thu, Aug 13, 2015 at 9:57 AM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > On Thu, Aug 13, 2015 at 7:59 AM, Nathan Goldbaum > wrote: > >> >> >> On Thu, Aug 13, 2015 at 9:44 AM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> >>> On Thu, Aug 13, 2015 at 12:09 AM, Jaime Fern?ndez del R?o < >>> jaime.frio at gmail.com> wrote: >>> >>>> On Wed, Aug 12, 2015 at 2:03 PM, Nathan Goldbaum >>> > wrote: >>>> >>>>> Hi all, >>>>> >>>>> I've been testing the package I spend most of my time on, yt, under >>>>> numpy 1.10b1 since the announcement went out. >>>>> >>>>> I think I've narrowed down and fixed all of the test failures that >>>>> cropped up except for one last issue. It seems that the behavior of >>>>> np.digitize with respect to ndarray subclasses has changed since the NumPy >>>>> 1.9 series. Consider the following test script: >>>>> >>>>> ```python >>>>> import numpy as np >>>>> >>>>> >>>>> class MyArray(np.ndarray): >>>>> def __new__(cls, *args, **kwargs): >>>>> return np.ndarray.__new__(cls, *args, **kwargs) >>>>> >>>>> data = np.arange(100) >>>>> >>>>> bins = np.arange(100) + 0.5 >>>>> >>>>> data = data.view(MyArray) >>>>> >>>>> bins = bins.view(MyArray) >>>>> >>>>> digits = np.digitize(data, bins) >>>>> >>>>> print type(digits) >>>>> ``` >>>>> >>>>> Under NumPy 1.9.2, this prints "", but under the >>>>> 1.10 beta, it prints "" >>>>> >>>>> I'm curious why this change was made. Since digitize outputs index >>>>> arrays, it doesn't make sense to me why it should return anything but a >>>>> plain ndarray. I see in the release notes that digitize now uses >>>>> searchsorted under the hood. Is this related? >>>>> >>>> >>>> It is indeed searchsorted's fault, as it returns an object of the same >>>> type as the needle (the items to search for): >>>> >>>> >>> import numpy as np >>>> >>> class A(np.ndarray): pass >>>> >>> class B(np.ndarray): pass >>>> >>> np.arange(10).view(A).searchsorted(np.arange(5).view(B)) >>>> B([0, 1, 2, 3, 4]) >>>> >>>> I am all for making index-returning functions always return a base >>>> ndarray, and will be more than happy to send a PR fixing this if there is >>>> some agreement. >>>> >>> >>> I think that is the right thing to do. >>> >> >> Awesome, I'd appreciate having a PR to fix this. 
Arguably the return type >> *could* be the same type as the inputs, but given that it's a behavior >> change I agree that it's best to add a patch so the output of serachsorted >> is "sanitized" to be an ndarray before it's returned by digitize. >> > > It is relatively simple to do, just replace Py_TYPE(ap2) with > &PyArray_Type in this line: > > > https://github.com/numpy/numpy/blob/maintenance/1.10.x/numpy/core/src/multiarray/item_selection.c#L1725 > > Then fix all the tests that are expecting searchsorted to return something > else than a base ndarray. We already have modified nonzero to return base > ndarray's in this release, see the release notes, so it will go with the > same theme. > > For 1.11 I think we should try to extend this "if it returns an index, it > will be a base ndarray" to all other functions that don't right now. Then > sit back and watch AstroPy come down in flames... ;-))) > > Seriously, I think this makes a lot of sense, and should be documented as > the way NumPy handles index arrays. > > Anyway, I will try to find time tonight to put this PR together, unless > someone beats me to it, which I would be totally fine with. > PR #6206 it is: https://github.com/numpy/numpy/pull/6206 Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From subbarao.athmuri at gmail.com Fri Aug 14 01:01:27 2015 From: subbarao.athmuri at gmail.com (subro) Date: Thu, 13 Aug 2015 22:01:27 -0700 (MST) Subject: [Numpy-discussion] Help in understanding Message-ID: <1439528487720-40827.post@n7.nabble.com> Hi, I am new to NumPy, Can someone help me in understanding below code. >>> names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe']) >>> data = np.random.random((7,4)) >>> print data [[ 0.85402649 0.12827655 0.5805555 0.86288236] [ 0.30162683 0.45269508 0.98098039 0.1291469 ] [ 0.21229924 0.37497112 0.57367496 0.08607771] [ 0.302866 0.42160468 0.26879288 0.68032467] [ 0.60612492 0.35210577 0.91355096 0.57872181] [ 0.11583826 0.81988882 0.39214077 0.51377566] [ 0.03767641 0.1920532 0.24872009 0.36068313]] >>> data[names == 'Bob'] array([[ 0.85402649, 0.12827655, 0.5805555 , 0.86288236], [ 0.302866 , 0.42160468, 0.26879288, 0.68032467]]) Also, can someone help me where and how to practice NumPy? -- View this message in context: http://numpy-discussion.10968.n7.nabble.com/Help-in-understanding-tp40827.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From deadmanwalking.aditya at gmail.com Fri Aug 14 03:06:08 2015 From: deadmanwalking.aditya at gmail.com (Aditya Krishnamurthy) Date: Fri, 14 Aug 2015 12:36:08 +0530 Subject: [Numpy-discussion] Help in understanding In-Reply-To: <1439528487720-40827.post@n7.nabble.com> References: <1439528487720-40827.post@n7.nabble.com> Message-ID: names == 'Bob' returns a boolean array [True, False, False, True, False, False, False], and data[boolean_array] returns all those elements of data where the boolean array is True. data is a list of 7 lists, so the two lists corresponding to True values are returned. Read the Numpy basics and Advanced Numpy chapters from Python for Data Analysis by Wes McKinney, it is available ol. On Fri, Aug 14, 2015 at 10:31 AM, subro wrote: > Hi, > > I am new to NumPy, Can someone help me in understanding below code. 
> > >>> names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe']) > > >>> data = np.random.random((7,4)) > > >>> print data > [[ 0.85402649 0.12827655 0.5805555 0.86288236] > [ 0.30162683 0.45269508 0.98098039 0.1291469 ] > [ 0.21229924 0.37497112 0.57367496 0.08607771] > [ 0.302866 0.42160468 0.26879288 0.68032467] > [ 0.60612492 0.35210577 0.91355096 0.57872181] > [ 0.11583826 0.81988882 0.39214077 0.51377566] > [ 0.03767641 0.1920532 0.24872009 0.36068313]] > > >>> data[names == 'Bob'] > > array([[ 0.85402649, 0.12827655, 0.5805555 , 0.86288236], > [ 0.302866 , 0.42160468, 0.26879288, 0.68032467]]) > > Also, can someone help me where and how to practice NumPy? > > > > -- > View this message in context: > http://numpy-discussion.10968.n7.nabble.com/Help-in-understanding-tp40827.html > Sent from the Numpy-discussion mailing list archive at Nabble.com. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Fri Aug 14 05:15:39 2015 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Fri, 14 Aug 2015 11:15:39 +0200 Subject: [Numpy-discussion] Changes to np.digitize since NumPy 1.9? In-Reply-To: References: Message-ID: For what it's worth, also from my astropy perspective I think hat any index array should be a base ndarray! -- Marten On Fri, Aug 14, 2015 at 7:11 AM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > On Thu, Aug 13, 2015 at 9:57 AM, Jaime Fern?ndez del R?o < > jaime.frio at gmail.com> wrote: > >> On Thu, Aug 13, 2015 at 7:59 AM, Nathan Goldbaum >> wrote: >> >>> >>> >>> On Thu, Aug 13, 2015 at 9:44 AM, Charles R Harris < >>> charlesr.harris at gmail.com> wrote: >>> >>>> >>>> >>>> On Thu, Aug 13, 2015 at 12:09 AM, Jaime Fern?ndez del R?o < >>>> jaime.frio at gmail.com> wrote: >>>> >>>>> On Wed, Aug 12, 2015 at 2:03 PM, Nathan Goldbaum < >>>>> nathan12343 at gmail.com> wrote: >>>>> >>>>>> Hi all, >>>>>> >>>>>> I've been testing the package I spend most of my time on, yt, under >>>>>> numpy 1.10b1 since the announcement went out. >>>>>> >>>>>> I think I've narrowed down and fixed all of the test failures that >>>>>> cropped up except for one last issue. It seems that the behavior of >>>>>> np.digitize with respect to ndarray subclasses has changed since the NumPy >>>>>> 1.9 series. Consider the following test script: >>>>>> >>>>>> ```python >>>>>> import numpy as np >>>>>> >>>>>> >>>>>> class MyArray(np.ndarray): >>>>>> def __new__(cls, *args, **kwargs): >>>>>> return np.ndarray.__new__(cls, *args, **kwargs) >>>>>> >>>>>> data = np.arange(100) >>>>>> >>>>>> bins = np.arange(100) + 0.5 >>>>>> >>>>>> data = data.view(MyArray) >>>>>> >>>>>> bins = bins.view(MyArray) >>>>>> >>>>>> digits = np.digitize(data, bins) >>>>>> >>>>>> print type(digits) >>>>>> ``` >>>>>> >>>>>> Under NumPy 1.9.2, this prints "", but under >>>>>> the 1.10 beta, it prints "" >>>>>> >>>>>> I'm curious why this change was made. Since digitize outputs index >>>>>> arrays, it doesn't make sense to me why it should return anything but a >>>>>> plain ndarray. I see in the release notes that digitize now uses >>>>>> searchsorted under the hood. Is this related? 
>>>>>> >>>>> >>>>> It is indeed searchsorted's fault, as it returns an object of the same >>>>> type as the needle (the items to search for): >>>>> >>>>> >>> import numpy as np >>>>> >>> class A(np.ndarray): pass >>>>> >>> class B(np.ndarray): pass >>>>> >>> np.arange(10).view(A).searchsorted(np.arange(5).view(B)) >>>>> B([0, 1, 2, 3, 4]) >>>>> >>>>> I am all for making index-returning functions always return a base >>>>> ndarray, and will be more than happy to send a PR fixing this if there is >>>>> some agreement. >>>>> >>>> >>>> I think that is the right thing to do. >>>> >>> >>> Awesome, I'd appreciate having a PR to fix this. Arguably the return >>> type *could* be the same type as the inputs, but given that it's a behavior >>> change I agree that it's best to add a patch so the output of serachsorted >>> is "sanitized" to be an ndarray before it's returned by digitize. >>> >> >> It is relatively simple to do, just replace Py_TYPE(ap2) with >> &PyArray_Type in this line: >> >> >> https://github.com/numpy/numpy/blob/maintenance/1.10.x/numpy/core/src/multiarray/item_selection.c#L1725 >> >> Then fix all the tests that are expecting searchsorted to return >> something else than a base ndarray. We already have modified nonzero to >> return base ndarray's in this release, see the release notes, so it will go >> with the same theme. >> >> For 1.11 I think we should try to extend this "if it returns an index, it >> will be a base ndarray" to all other functions that don't right now. Then >> sit back and watch AstroPy come down in flames... ;-))) >> >> Seriously, I think this makes a lot of sense, and should be documented as >> the way NumPy handles index arrays. >> >> Anyway, I will try to find time tonight to put this PR together, unless >> someone beats me to it, which I would be totally fine with. >> > > PR #6206 it is: https://github.com/numpy/numpy/pull/6206 > > Jaime > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes > de dominaci?n mundial. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Fri Aug 14 12:12:46 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 14 Aug 2015 09:12:46 -0700 Subject: [Numpy-discussion] Development workflow (not git tutorial) In-Reply-To: <87lhdfxcze.fsf@berkeley.edu> References: <87lhdfxcze.fsf@berkeley.edu> Message-ID: On Thu, Aug 13, 2015 at 11:25 AM, Stefan van der Walt wrote: > >(for > > example "python setup.py develop", although suggested by > > setup.py itself, claims that "develop" is not a command). develop is a command provided by setuptools, not distutils itself. I find it absolutely invaluable -- it is THE way to go when actively working on any package under development. if numpy doesn't currently use setuptools, it probably should (though maybe it's gets messy with numpy's distutils extensions...) Nowadays, you can use > > pip install -e . > pip "injects" setuptools into the mix -- so this may be develope mode with a different name. but yes, a fine option for a package that doesn't use setuptools out of the box. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Fri Aug 14 13:08:11 2015 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 14 Aug 2015 13:08:11 -0400 Subject: [Numpy-discussion] Development workflow (not git tutorial) In-Reply-To: References: <87lhdfxcze.fsf@berkeley.edu> Message-ID: I used to be a huge advocate for the "develop" mode, but not anymore. I have run into way too many Heisenbugs that would clear up if I nuked my source tree and re-clone. I should also note that there is currently an open issue with "pip install -e" and namespace packages. This has been reported to matplotlib with regards to mpl_toolkits. Essentially, if you have namespace packages, it doesn't get installed correctly in this mode, and python won't find them. On Fri, Aug 14, 2015 at 12:12 PM, Chris Barker wrote: > On Thu, Aug 13, 2015 at 11:25 AM, Stefan van der Walt < > stefanv at berkeley.edu> wrote: > >> >(for >> > example "python setup.py develop", although suggested by >> > setup.py itself, claims that "develop" is not a command). > > > develop is a command provided by setuptools, not distutils itself. > > I find it absolutely invaluable -- it is THE way to go when actively > working on any package under development. > > if numpy doesn't currently use setuptools, it probably should (though > maybe it's gets messy with numpy's distutils extensions...) > > Nowadays, you can use >> >> pip install -e . >> > > pip "injects" setuptools into the mix -- so this may be develope mode with > a different name. but yes, a fine option for a package that doesn't use > setuptools out of the box. > > -Chris > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Fri Aug 14 13:45:33 2015 From: allanhaldane at gmail.com (Allan Haldane) Date: Fri, 14 Aug 2015 13:45:33 -0400 Subject: [Numpy-discussion] Development workflow (not git tutorial) In-Reply-To: References: Message-ID: <55CE293D.70502@gmail.com> On 08/13/2015 11:52 AM, Anne Archibald wrote: > Hi, > > What is a sensible way to work on (modify, compile, and test) numpy? > > There is documentation about "contributing to numpy" at: > http://docs.scipy.org/doc/numpy-dev/dev/index.html > and: > http://docs.scipy.org/doc/numpy-dev/dev/gitwash/development_workflow.html > but these are entirely focused on using git. I have no problem with that > aspect. It is building and testing that I am looking for the Right Way > to do. Related to this, does anyone know how to debug numpy in gdb with proper symbols/source lines, like I can do with other C extensions? I've tried modifying numpy distutils to try to add the right compiler/linker flags, without success. 
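(The runtests.py route discussed below is the supported way to get a debug build.) As a rough manual alternative, distutils appends the CFLAGS environment variable to its compile flags, so an unoptimized in-place build with symbols can usually be forced like this; a sketch assuming a gcc-compatible compiler, run from the numpy source root:

```python
import os
import subprocess

# CFLAGS from the environment is appended after the default flags, and for
# gcc/clang the last -O flag wins, so this effectively gives -O0 -g.
env = dict(os.environ, CFLAGS='-O0 -g')
subprocess.check_call(
    ['python', 'setup.py', 'build_ext', '--inplace'], env=env)
# then attach the debugger, e.g.:  gdb --args python your_script.py
```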
Allan From pav at iki.fi Fri Aug 14 13:52:24 2015 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 14 Aug 2015 20:52:24 +0300 Subject: [Numpy-discussion] Development workflow (not git tutorial) In-Reply-To: <55CE293D.70502@gmail.com> References: <55CE293D.70502@gmail.com> Message-ID: 14.08.2015, 20:45, Allan Haldane kirjoitti: [clip] > Related to this, does anyone know how to debug numpy in gdb with proper > symbols/source lines, like I can do with other C extensions? I've tried > modifying numpy distutils to try to add the right compiler/linker flags, > without success. runtests.py --help gdb --args python runtests.py -g --python script.py grep env runtests.py From allanhaldane at gmail.com Fri Aug 14 13:57:21 2015 From: allanhaldane at gmail.com (Allan Haldane) Date: Fri, 14 Aug 2015 13:57:21 -0400 Subject: [Numpy-discussion] Development workflow (not git tutorial) In-Reply-To: References: <55CE293D.70502@gmail.com> Message-ID: <55CE2C01.4000504@gmail.com> On 08/14/2015 01:52 PM, Pauli Virtanen wrote: > 14.08.2015, 20:45, Allan Haldane kirjoitti: > [clip] >> Related to this, does anyone know how to debug numpy in gdb with proper >> symbols/source lines, like I can do with other C extensions? I've tried >> modifying numpy distutils to try to add the right compiler/linker flags, >> without success. > > runtests.py --help > > gdb --args python runtests.py -g --python script.py > > grep env runtests.py Oh! Thank you, I missed that. From njs at pobox.com Fri Aug 14 15:19:16 2015 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 14 Aug 2015 12:19:16 -0700 Subject: [Numpy-discussion] Development workflow (not git tutorial) In-Reply-To: References: <87lhdfxcze.fsf@berkeley.edu> Message-ID: On Aug 14, 2015 09:16, "Chris Barker" wrote: > > On Thu, Aug 13, 2015 at 11:25 AM, Stefan van der Walt < stefanv at berkeley.edu> wrote: >> >> >(for >> > example "python setup.py develop", although suggested by >> > setup.py itself, claims that "develop" is not a command). > > > develop is a command provided by setuptools, not distutils itself. > > I find it absolutely invaluable -- it is THE way to go when actively working on any package under development. > > if numpy doesn't currently use setuptools, it probably should (though maybe it's gets messy with numpy's distutils extensions...) Regarding using setuptools by default, one problem is that it actually acts rather differently from distutils by default. See https://bitbucket.org/pypa/setuptools/issues/371/setuptools-and-state-of-pep-376 >> Nowadays, you can use >> >> pip install -e . > > > pip "injects" setuptools into the mix -- so this may be develope mode with a different name. but yes, a fine option for a package that doesn't use setuptools out of the box. The version of setuptools that pip injects is also monkeypatched by pip to fix some of setuptools more obnoxious defaults. (The ones described in that bug report.) Using pip is also is the only way to reliably install all the right metadata needed to avoid problems later -- in particular pip will record the information needed to do uninstall/upgrades correctly, which neither distutils nor setuptools will do if you run setup.py directly. Basically this means running 'setup.py install' is always broken, for all projects and no matter how setup.py is written, and you should always run a pip command instead, even when building from the source tree. This is true for every python package, though, not just numpy. So setuptools doesn't provide much that's compelling for us... 
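Whichever of these setups is used (in-place build, develop/-e install, or a plain install into a virtualenv), a quick sanity check that often saves confusion is to confirm which copy of numpy is actually being imported; a minimal sketch:

```python
import numpy as np

# The reported path shows whether the interpreter picked up the in-place
# source tree, a develop/-e install, or a copy in site-packages.
print(np.__version__)
print(np.__file__)
```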
I believe if you really want it, though, you can run numpy's setupegg.py, which is the same as setup.py but using setuptools. Or something like that? I share Benjamin's doubts about the whole 'develop' approach, though, however accessed. For pure python packages, just importing from the source tree directly works fine and is way less error prone. For non-pure packages, I don't trust develop much anyway... build_ext --inplace can work nicely, or for numpy in particular runtests solves all my problems. (Though even then I still sometimes need to nuke the build directory or run clean manually.) -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Fri Aug 14 16:19:56 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 14 Aug 2015 13:19:56 -0700 Subject: [Numpy-discussion] Development workflow (not git tutorial) In-Reply-To: References: <87lhdfxcze.fsf@berkeley.edu> Message-ID: On Fri, Aug 14, 2015 at 10:08 AM, Benjamin Root wrote: > I used to be a huge advocate for the "develop" mode, but not anymore. I > have run into way too many Heisenbugs that would clear up if I nuked my > source tree and re-clone. > well, you do need to remember to clean out once in a while, when somethign weird is happening... But I prefer that to the other options, which are: * re-builda nd re-install with every frikin' change * do sys.path manipulations, which is ugly,, error prone, and has the same problems as develop mode anyway * rely on relative imports for all your tests and the like -- error prone and ugly -- oh, and you still have the problems above... I should also note that there is currently an open issue with "pip install > -e" and namespace packages. > yeah, I actually gave up on namespace packages due to them not working right with develop mode. (I'm not sure if -e and develop mode are exactly the same or not...) -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From jenshnielsen at gmail.com Fri Aug 14 16:34:16 2015 From: jenshnielsen at gmail.com (Jens Nielsen) Date: Fri, 14 Aug 2015 20:34:16 +0000 Subject: [Numpy-discussion] Development workflow (not git tutorial) In-Reply-To: References: <87lhdfxcze.fsf@berkeley.edu> Message-ID: I think it's clear that develop/-e does not work well together with namespace packages. As noted on the relevant matplotlib issue https://github.com/matplotlib/matplotlib/issues/4907 I think the issue with namespace packages is essentially this well known one https://github.com/pypa/pip/issues/3 which I think I agree with Chris is enough to drop namespace packages if possible. >From the output of pip install -e I would say that it clear that it calls develop. Since pip install -e and and pip install uses fundamentally different ways of manage namespace packages they can't work together. In the case of matplotlib issue #4907 basemap is probably installed into the namespace with pip install while matplotlib is installed with pip install -e which clearly triggers the issue in https://github.com/pypa/pip/issues/3 best Jens fre. 14. aug. 2015 kl. 21.21 skrev Chris Barker : > On Fri, Aug 14, 2015 at 10:08 AM, Benjamin Root wrote: > >> I used to be a huge advocate for the "develop" mode, but not anymore. 
I >> have run into way too many Heisenbugs that would clear up if I nuked my >> source tree and re-clone. >> > > well, you do need to remember to clean out once in a while, when somethign > weird is happening... > > But I prefer that to the other options, which are: > > * re-builda nd re-install with every frikin' change > > * do sys.path manipulations, which is ugly,, error prone, and has the same > problems as develop mode anyway > > * rely on relative imports for all your tests and the like -- error prone > and ugly -- oh, and you still have the problems above... > > > I should also note that there is currently an open issue with "pip install >> -e" and namespace packages. >> > > yeah, I actually gave up on namespace packages due to them not working > right with develop mode. > > (I'm not sure if -e and develop mode are exactly the same or not...) > > -CHB > > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanv at berkeley.edu Fri Aug 14 13:15:14 2015 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Fri, 14 Aug 2015 10:15:14 -0700 Subject: [Numpy-discussion] Development workflow (not git tutorial) In-Reply-To: References: <87lhdfxcze.fsf@berkeley.edu> Message-ID: <87lhddx04d.fsf@berkeley.edu> On 2015-08-14 10:08:11, Benjamin Root wrote: > I should also note that there is currently an open issue with > "pip install -e" and namespace packages. This has been reported > to matplotlib with regards to mpl_toolkits. Essentially, if you > have namespace packages, it doesn't get installed correctly in > this mode, and python won't find them. There are lots of issues with namespace packages, which is why what used to be scikits.learn and scikits.image are now all standalone packages. Perhaps mpl_toolkits should think of becoming mpl_3d, mpl_basemaps, etc.? St?fan From christian.engwer at uni-muenster.de Fri Aug 14 17:25:33 2015 From: christian.engwer at uni-muenster.de (Christian Engwer) Date: Fri, 14 Aug 2015 23:25:33 +0200 Subject: [Numpy-discussion] Problems using add_npy_pkg_config In-Reply-To: References: <20150812162047.GA26389@sansibar.localdomain> <20150813184507.GA24303@sansibar.localdomain> Message-ID: <20150814212533.GA8656@sansibar.localdomain> Dear Ralf, > I stared at it for a while, and can't figure it out despite you following > the example in the add_npy_pkg_config docstring pretty much to the letter. > When you see that the error is generated in a function that starts with ``# > XXX: another ugly workaround to circumvent distutils brain damage.``, > you're usually in trouble..... what a pity... do you have an alternative suggestion? Is there a good alternative, e.g. using cmake, to distribute python modules? 
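For reference, the pattern I have been trying to follow is essentially the one from the add_npy_pkg_config docstring, roughly like this ('mypkg', foo.c and foo.ini.in are placeholder names, not the real ones):

    # setup.py -- condensed sketch of the add_npy_pkg_config docstring example
    def configuration(parent_package='', top_path=None):
        from numpy.distutils.misc_util import Configuration
        config = Configuration('mypkg', parent_package, top_path)
        # build a static 'foo' library and install it alongside the package
        config.add_installed_library('foo', sources=['foo.c'], install_dir='lib')
        # generate foo.ini from the template, so that other packages can later
        # query the build flags with numpy.distutils.misc_util.get_info('foo')
        config.add_npy_pkg_config('foo.ini.in', 'lib', subst_dict={'version': '1.0'})
        return config

    if __name__ == '__main__':
        from numpy.distutils.core import setup
        setup(configuration=configuration)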
Ciao Christian From chris.barker at noaa.gov Fri Aug 14 18:44:24 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 14 Aug 2015 15:44:24 -0700 Subject: [Numpy-discussion] Development workflow (not git tutorial) In-Reply-To: <87lhddx04d.fsf@berkeley.edu> References: <87lhdfxcze.fsf@berkeley.edu> <87lhddx04d.fsf@berkeley.edu> Message-ID: On Fri, Aug 14, 2015 at 10:15 AM, Stefan van der Walt wrote: > Perhaps mpl_toolkits should think of > becoming mpl_3d, mpl_basemaps, etc.? > namespace packages are a fine idea, but the implementation(s) are just one big kludge... I think so, but we're getting off-topic here. numpy doesn't use namespace packages, so develop mode works there. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Fri Aug 14 19:08:05 2015 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 15 Aug 2015 02:08:05 +0300 Subject: [Numpy-discussion] Development workflow (not git tutorial) In-Reply-To: References: <87lhdfxcze.fsf@berkeley.edu> <87lhddx04d.fsf@berkeley.edu> Message-ID: 15.08.2015, 01:44, Chris Barker kirjoitti: [clip] > numpy doesn't use namespace packages, so develop mode works there. The develop mode is mainly useful with a virtualenv. Otherwise, you install work-in-progress development version into your ~/.local which then breaks everything else. In addition to this, "python setupegg.py develop --uninstall" says "Note: you must uninstall or replace scripts manually!", and since the scripts end up with dev version requirement hardcoded, and you have to delete the scripts manually. Virtualenvs are annoying to manage, and at least for me personally it's easier to just deal with pythonpath, especially as runtests.py manages that. Anyway, TIMTOWTDI From ralf.gommers at gmail.com Sat Aug 15 04:19:24 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 15 Aug 2015 10:19:24 +0200 Subject: [Numpy-discussion] Development workflow (not git tutorial) In-Reply-To: References: <87lhdfxcze.fsf@berkeley.edu> <87lhddx04d.fsf@berkeley.edu> Message-ID: On Sat, Aug 15, 2015 at 1:08 AM, Pauli Virtanen wrote: > 15.08.2015, 01:44, Chris Barker kirjoitti: > [clip] > > numpy doesn't use namespace packages, so develop mode works there. > > The develop mode is mainly useful with a virtualenv. > > Otherwise, you install work-in-progress development version into your > ~/.local which then breaks everything else. In addition to this, "python > setupegg.py develop --uninstall" says "Note: you must uninstall or > replace scripts manually!", and since the scripts end up with dev > version requirement hardcoded, and you have to delete the scripts manually. > > Virtualenvs are annoying to manage, and at least for me personally it's > easier to just deal with pythonpath, especially as runtests.py manages > that. > I completely agree. Virtualenv/pip/setuptools all too many issues and corner cases where things don't quite work for development purposes. Using runtests.py is the most reliable approach for working on numpy (or just use an in-place build + pythonpath management if you prefer). 
To get back to the original question of Anne (as well as the gdb one): most of what was said and recommended in this thread is fairly well documented in https://github.com/numpy/numpy/blob/master/doc/source/dev/development_environment.rst Unfortunately it doesn't yet show up in http://docs.scipy.org/doc/numpy-dev/dev/index.html because that hasn't been updated in a while. If someone who knows how to do that could push an update, that would be great. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Aug 15 05:19:51 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 15 Aug 2015 11:19:51 +0200 Subject: [Numpy-discussion] Problems using add_npy_pkg_config In-Reply-To: <20150814212533.GA8656@sansibar.localdomain> References: <20150812162047.GA26389@sansibar.localdomain> <20150813184507.GA24303@sansibar.localdomain> <20150814212533.GA8656@sansibar.localdomain> Message-ID: On Fri, Aug 14, 2015 at 11:25 PM, Christian Engwer < christian.engwer at uni-muenster.de> wrote: > Dear Ralf, > > > I stared at it for a while, and can't figure it out despite you following > > the example in the add_npy_pkg_config docstring pretty much to the > letter. > > When you see that the error is generated in a function that starts with > ``# > > XXX: another ugly workaround to circumvent distutils brain damage.``, > > you're usually in trouble..... > > what a pity... do you have an alternative suggestion? Is there a good > alternative, e.g. using cmake, to distribute python modules? > I wouldn't give up on distutils here (yet). For distributing/installing python packages, PyPi + pip are the de-facto standard and pip is currently tied to distutils/setuptools unfortunately. That I can't figure out your issue in 20 minutes doesn't mean it's not fixable, it just means that I'm not smart enough to keep the distutils "design" in my head:) The code you're trying to use isn't well tested because while a lot of packages use numpy.distutils with compiled code, very few Python packages expose a C API. For example Scipy doesn't use `add_npy_pkg_config` or `add_installed_library` at all. Those functions work for numpy itself though, so they can't be completely broken. If no one has an answer here, what I would do if I were you is break out your debugger and figure out what's in `pkg` when you build numpy itself and why it's None when you build your own code. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Aug 15 05:27:27 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 15 Aug 2015 11:27:27 +0200 Subject: [Numpy-discussion] Problems using add_npy_pkg_config In-Reply-To: References: <20150812162047.GA26389@sansibar.localdomain> <20150813184507.GA24303@sansibar.localdomain> <20150814212533.GA8656@sansibar.localdomain> Message-ID: On Sat, Aug 15, 2015 at 11:19 AM, Ralf Gommers wrote: > > > > On Fri, Aug 14, 2015 at 11:25 PM, Christian Engwer < > christian.engwer at uni-muenster.de> wrote: > >> Dear Ralf, >> >> > I stared at it for a while, and can't figure it out despite you >> following >> > the example in the add_npy_pkg_config docstring pretty much to the >> letter. >> > When you see that the error is generated in a function that starts with >> ``# >> > XXX: another ugly workaround to circumvent distutils brain damage.``, >> > you're usually in trouble..... >> >> what a pity... do you have an alternative suggestion? Is there a good >> alternative, e.g. 
using cmake, to distribute python modules? >> > > > I wouldn't give up on distutils here (yet). For distributing/installing > python packages, PyPi + pip are the de-facto standard and pip is currently > tied to distutils/setuptools unfortunately. > Correction: the above is only completely true if you rely on source builds. You can't avoid those with PyPi on Linux, but if you only need to support Windows and OS X nowadays you can get away with no disutils if you upload only binary wheels for those OSes to PyPi. Regarding alternatives, this discussion is a bit older but mostly still relevant: http://article.gmane.org/gmane.comp.python.numeric.general/27788 Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From Nicolas.Rougier at inria.fr Sun Aug 16 14:59:05 2015 From: Nicolas.Rougier at inria.fr (Nicolas P. Rougier) Date: Sun, 16 Aug 2015 20:59:05 +0200 Subject: [Numpy-discussion] Numpy 100 exercices Message-ID: Hi all, I've just updated the collection of numpy exercices (collected from this list and stack overflow) that lives at: https://github.com/rougier/numpy-100 http://www.labri.fr/perso/nrougier/teaching/numpy.100/index.html Unfortunately, I also realized there are currently "only" 60 exercices... So, if you remember a nice question that has been answered on this list (or elsewhere)... Or you can also make a PR on github. Nicolas From charlesr.harris at gmail.com Sun Aug 16 16:04:33 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 16 Aug 2015 14:04:33 -0600 Subject: [Numpy-discussion] Numpy 1.11 Message-ID: Hi All, While waiting for Christoph to drop the other shoe on 1.10.0b1, I thought I'd try again to start a discussion on the 1.11 release. If we want to get the next release out in a timely manner there should be some advance planning to cover at least the following three items - Release manager: I can do that again, but would be more than happy to let someone else give it a shot. - Goals to meet: - __numpy_ufunc__ - masked array and recarray fixups for astropy - ? - Release date to shoot for, I'd suggest Feb 1, 2016 Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Mon Aug 17 04:53:01 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 17 Aug 2015 10:53:01 +0200 Subject: [Numpy-discussion] Numpy 1.11 In-Reply-To: References: Message-ID: <1439801581.1893.29.camel@sipsolutions.net> On So, 2015-08-16 at 14:04 -0600, Charles R Harris wrote: > Hi All, > > > While waiting for Christoph to drop the other shoe on 1.10.0b1, I > thought I'd try again to start a discussion on the 1.11 release. If we > want to get the next release out in a timely manner there should be > some advance planning to cover at least the following three items > > * Release manager: I can do that again, but would be more than > happy to let someone else give it a shot. > * Goals to meet: > * __numpy_ufunc__ > * masked array and recarray fixups for astropy > * ? > * Release date to shoot for, I'd suggest Feb 1, 2016 > Sounds good to me. Feb. 1 probably would mean a feature freeze in the beginning of January? I guess we could try to push a bit harder to a stricter 4 months cycle, but with the common and of year stress and holiday season it sounds reasonable. Can't think of any goal to explicitly meet [1], but I guess that is mostly downstream wishes anyway ;). 
Personally, I would like to see Pauli's overlap detection and the proposed indexing changes by the next release. That should be doable, but they are not explicit goals. Our Austin discussions might have some smaller stuff/wishes that should have some priority. - Sebastian [1] However, I would very much like to see the organizational stuff to be finalized this year. But I know it is probably quite a bit of work which can drag on. > Thoughts? > > Chuck > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From mailinglists at xgm.de Mon Aug 17 07:11:43 2015 From: mailinglists at xgm.de (Florian Lindner) Date: Mon, 17 Aug 2015 13:11:43 +0200 Subject: [Numpy-discussion] NPY_DOUBLE not declared Message-ID: Hello, I am trying to convert a piece of C code to the new NumPy API to get rid of the deprecation warning. #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" As a first step I replaced arrayobject.h by numpy/npy_math.h: #include #include But this gives errors that NPY_DOUBLE is not declared. http://docs.scipy.org/doc/numpy/reference/c-api.dtype.html#c.NPY_DOUBLE gives no information where NPY_DOUBLE is declared, so I used the standard npy_math.h header. src/action/PythonAction.cpp:90:51: error: 'NPY_DOUBLE' was not declared in this scope PyArray_SimpleNewFromData(1, sourceDim, NPY_DOUBLE, sourceValues); Including numpy/npy_common.h does not change it either.
> > Thanks, > Florian > From ralf.gommers at gmail.com Mon Aug 17 15:53:06 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 17 Aug 2015 21:53:06 +0200 Subject: [Numpy-discussion] Numpy 1.11 In-Reply-To: <1439801581.1893.29.camel@sipsolutions.net> References: <1439801581.1893.29.camel@sipsolutions.net> Message-ID: On Mon, Aug 17, 2015 at 10:53 AM, Sebastian Berg wrote: > On So, 2015-08-16 at 14:04 -0600, Charles R Harris wrote: > > Hi All, > > > > > > While waiting for Christoph to drop the other shoe on 1.10.0b1, I > > thought I'd try again to start a discussion on the 1.11 release. If we > > want to get the next release out in a timely manner there should be > > some advance planning to cover at least the following three items > > > > * Release manager: I can do that again, but would be more than > > happy to let someone else give it a shot. > While I don't want to do the job, I will volunteer to help fix the documentation on how to do this job where needed. > > * Goals to meet: > > * __numpy_ufunc__ > > * masked array and recarray fixups for astropy > > * ? > > * Release date to shoot for, I'd suggest Feb 1, 2016 > > > > Sounds good to me. Feb. 1 probably would mean a feature freeze in the > beginning of January? I guess we could try to push a bit harder to a > stricter 4 months cycle, but with the common and of year stress and > holiday season it sounds reasonable. > That means branching early to mid December, sounds fine to me. > Can't think of any goal to explicitly meet [1], but I guess that is > mostly downstream wishes anyway ;). > Personally, I would like to see Pauli's overlap detection and the > proposed indexing changes by the next release. That should be doable, > but they are not that explicit goals. > Our Austin discussions might have some smaller stuff/wishes that should > have some priority. > > - Sebastian > > > [1] However, I would very much like to see the organizational stuff to > be finalized this year. But I know it is probably quite a bit of work > which can drag on. > not related to a release, but +10 for this goal Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Aug 18 16:07:17 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 18 Aug 2015 14:07:17 -0600 Subject: [Numpy-discussion] Proposal to remove the Bento build. Message-ID: Hi All, . I'm bringing up this topic again on account of the discussion at https://github.com/numpy/numpy/pull/6199. The proposal is to stop (trying) to support the Bento build system for Numpy and remove it. Votes and discussion welcome. Along the same lines, Pauli has suggested removing the single file builds, but Nathaniel has pointed out that it may be the only way to produce static python + numpy builds. If anyone does that or has more information about it, please comment. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Tue Aug 18 19:15:39 2015 From: cournape at gmail.com (David Cournapeau) Date: Wed, 19 Aug 2015 00:15:39 +0100 Subject: [Numpy-discussion] Proposal to remove the Bento build. In-Reply-To: References: Message-ID: If everybody wants to remove bento, we should remove it. Regarding single file builds, why would it help for static builds ? I understand it would make things slightly easier to have one .o per extension, but it does not change the fundamental process as the exported symbols are the same in the end ? 
David On Tue, Aug 18, 2015 at 9:07 PM, Charles R Harris wrote: > Hi All, > . > I'm bringing up this topic again on account of the discussion at > https://github.com/numpy/numpy/pull/6199. The proposal is to stop > (trying) to support the Bento build system for Numpy and remove it. Votes > and discussion welcome. > > Along the same lines, Pauli has suggested removing the single file builds, > but Nathaniel has pointed out that it may be the only way to produce static > python + numpy builds. If anyone does that or has more information about > it, please comment. > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Aug 18 20:22:17 2015 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 18 Aug 2015 17:22:17 -0700 Subject: [Numpy-discussion] Proposal to remove the Bento build. In-Reply-To: References: Message-ID: On Tue, Aug 18, 2015 at 4:15 PM, David Cournapeau wrote: > If everybody wants to remove bento, we should remove it. FWIW, I don't really have an opinion either way on bento versus distutils, I just feel that we shouldn't maintain two build systems unless we're actively planning to get rid of one of them, and for several years now we haven't really been learning anything by keeping the bento build working, nor has there been any movement towards switching to bento as the one-and-only build system, or even a clear consensus that this would be a good thing. (Obviously distutils and numpy.distutils are junk, so that's a point in bento's favor, but it isn't *totally* cut and dried -- we know numpy.distutils works and we have to maintain it regardless for backcompat, while bento doesn't seem to have any activity upstream or any other users...). So I'd be totally in favor of adding bento back later if/when such a plan materializes; I just don't think it makes sense to keep continuously investing effort into it just in case such a plan materializes later. > Regarding single file builds, why would it help for static builds ? I > understand it would make things slightly easier to have one .o per > extension, but it does not change the fundamental process as the exported > symbols are the same in the end ? IIUC they aren't: with the multi-file build we control exported symbols using __attribute__((visibility("hidden")) or equivalent, which hides symbols from the shared object export table, but not from other translation units that are statically linked. So if you want to statically link cpython and numpy, you need some other way to let numpy .o files see each others's symbols without exposing them to cpython's .o files, and the single-file build provides one mechanism to do that: make the numpy symbols 'static' and then combine them all into a single translation unit. I would love to be wrong about this though. The single file build is pretty klugey :-). -n -- Nathaniel J. Smith -- http://vorpus.org From cournape at gmail.com Wed Aug 19 08:43:04 2015 From: cournape at gmail.com (David Cournapeau) Date: Wed, 19 Aug 2015 13:43:04 +0100 Subject: [Numpy-discussion] Proposal to remove the Bento build. In-Reply-To: References: Message-ID: On Wed, Aug 19, 2015 at 1:22 AM, Nathaniel Smith wrote: > On Tue, Aug 18, 2015 at 4:15 PM, David Cournapeau > wrote: > > If everybody wants to remove bento, we should remove it. 
> > FWIW, I don't really have an opinion either way on bento versus > distutils, I just feel that we shouldn't maintain two build systems > unless we're actively planning to get rid of one of them, and for > several years now we haven't really been learning anything by keeping > the bento build working, nor has there been any movement towards > switching to bento as the one-and-only build system, or even a clear > consensus that this would be a good thing. (Obviously distutils and > numpy.distutils are junk, so that's a point in bento's favor, but it > isn't *totally* cut and dried -- we know numpy.distutils works and we > have to maintain it regardless for backcompat, while bento doesn't > seem to have any activity upstream or any other users...). > > So I'd be totally in favor of adding bento back later if/when such a > plan materializes; I just don't think it makes sense to keep > continuously investing effort into it just in case such a plan > materializes later. > > > Regarding single file builds, why would it help for static builds ? I > > understand it would make things slightly easier to have one .o per > > extension, but it does not change the fundamental process as the exported > > symbols are the same in the end ? > > IIUC they aren't: with the multi-file build we control exported > symbols using __attribute__((visibility("hidden")) or equivalent, > which hides symbols from the shared object export table, but not from > other translation units that are statically linked. So if you want to > statically link cpython and numpy, you need some other way to let > numpy .o files see each others's symbols without exposing them to > cpython's .o files, It is less a problem than in shared linking because you can detect the conflicts at linking time (instead of loading time). and the single-file build provides one mechanism > to do that: make the numpy symbols 'static' and then combine them all > into a single translation unit. > > I would love to be wrong about this though. The single file build is > pretty klugey :-). > I know, it took me a while to split the files to go out of single file build in the first place :) David > > -n > > -- > Nathaniel J. Smith -- http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rainwoodman at gmail.com Wed Aug 19 16:10:51 2015 From: rainwoodman at gmail.com (Feng Yu) Date: Wed, 19 Aug 2015 13:10:51 -0700 Subject: [Numpy-discussion] Fwd: Reverse(DESC)-ordered sorting In-Reply-To: References: Message-ID: Dear list, This is forwarded from issue 6217 https://github.com/numpy/numpy/issues/6217 "What is the way to implement DESC ordering in the sorting routines of numpy?" (I am borrowing DESC/ASC from the SQL notation) For a stable DESC ordering sort, one can not revert the sorted array via argsort()[::-1] . I propose the following API change to argsorts/sort. (haven't thought about lexsort yet) I will use argsort as an example. Currently, argsort supports sorting by keys ('order') and by 'axis'. These two somewhat orthonal interfaces need to be treated differently. 1. by axis. Since there is just one sorting key, a single 'reversed' keyword argument is sufficient: a.argsort(axis=0, kind='merge', reversed=True) Jaime suggested this can be implemented efficiently as a post-processing step. 
(https://github.com/numpy/numpy/issues/6217#issuecomment-132604920) Is there a reference to the algorithm? Because all of the sorting algorithms for 'atomic' dtypes are using _LT functions, a post processing step seems to be the only viable solution, if possible. 2. by fields, ('order' kwarg) A single 'reversed' keyword argument will not work, because some keys are ASC but others are DESC, for example, sorting my LastName ASC, then Salary DESC. a.argsort(kind='merge', order=[('LastName', ('FirstName', 'asc'), ('Salary', 'desc'))]) The parsing rule of order is: - if an item is tuple, the first item is the fieldname, the second item is DESC/ASC - if an item is scalar, the fieldname is the item, the ordering is ASC. This part of the code already goes to VOID_compare, which walks a temporary copy of a.dtype to call the comparison functions. If I understood the purpose of c_metadata (numpy 1.7+) correctly, the ASC/DESC flags, offsets, and comparison functions can all be pre-compiled and passed into VOID_compare via c_metadata of the temporary type-descriptor. By just looking this will actually make VOID_compare faster by avoiding calling several Python C-API functions. negating the return value of cmp seems to be a negligable overhead in such a complex function. 3. If both reversed and order is given, the ASC/DESC fields in 'order' are effectively reversed. Any comments? Best, Yu From jaime.frio at gmail.com Thu Aug 20 00:43:04 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Wed, 19 Aug 2015 21:43:04 -0700 Subject: [Numpy-discussion] Fwd: Reverse(DESC)-ordered sorting In-Reply-To: References: Message-ID: On Wed, Aug 19, 2015 at 1:10 PM, Feng Yu wrote: > Dear list, > > This is forwarded from issue 6217 > https://github.com/numpy/numpy/issues/6217 > > "What is the way to implement DESC ordering in the sorting routines of > numpy?" > > (I am borrowing DESC/ASC from the SQL notation) > > For a stable DESC ordering sort, one can not revert the sorted array via > argsort()[::-1] . > > I propose the following API change to argsorts/sort. (haven't thought > about lexsort yet) I will use argsort as an example. > > Currently, argsort supports sorting by keys ('order') and by 'axis'. > These two somewhat orthonal interfaces need to be treated differently. > > 1. by axis. > > Since there is just one sorting key, a single 'reversed' keyword > argument is sufficient: > > a.argsort(axis=0, kind='merge', reversed=True) > > Jaime suggested this can be implemented efficiently as a > post-processing step. > (https://github.com/numpy/numpy/issues/6217#issuecomment-132604920) Is > there a reference to the algorithm? > My thinking was that, for native types, you can stably reverse a sorted permutation in-place by first reversing item-by-item, then reversing every chunk of repeated entries. Sort of the way you would reverse the words in a sentence in-place: first reverse every letter, then reverse everything bounded by spaces: TURN ME AROUND -> DNUORA EM NRUT -> AROUND EM NRUT -> AROUND ME NRUT -> AROUND ME TURN We could even add a type-specific function to do this for each of the native types in the npy_sort library. As I mentioned in Yu's very nice PR , probably it is best to leave the signature of the function alone, and have something like order='desc' be the trigger for the proposed reversed=True. Jaime > > Because all of the sorting algorithms for 'atomic' dtypes are using > _LT functions, a post processing step seems to be the only viable > solution, if possible. > > 2. 
by fields, ('order' kwarg) > > A single 'reversed' keyword argument will not work, because some keys > are ASC but others are DESC, for example, sorting my LastName ASC, > then Salary DESC. > > a.argsort(kind='merge', order=[('LastName', ('FirstName', 'asc'), > ('Salary', 'desc'))]) > > The parsing rule of order is: > > - if an item is tuple, the first item is the fieldname, the second > item is DESC/ASC > - if an item is scalar, the fieldname is the item, the ordering is ASC. > > This part of the code already goes to VOID_compare, which walks a > temporary copy of a.dtype to call the comparison functions. > > If I understood the purpose of c_metadata (numpy 1.7+) correctly, the > ASC/DESC flags, offsets, and comparison functions can all be > pre-compiled and passed into VOID_compare via c_metadata of the > temporary type-descriptor. > > By just looking this will actually make VOID_compare faster by > avoiding calling several Python C-API functions. negating the return > value of cmp seems to be a negligable overhead in such a complex > function. > 3. If both reversed and order is given, the ASC/DESC fields in 'order' > are effectively reversed. > > Any comments? > > Best, > > Yu > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From fabien.maussion at gmail.com Sun Aug 23 13:54:29 2015 From: fabien.maussion at gmail.com (Fabien) Date: Sun, 23 Aug 2015 19:54:29 +0200 Subject: [Numpy-discussion] Numpy helper function for __getitem__? Message-ID: Folks, My search engine was not able to help me on this one, possibly because I don't know exactly *what* I am looking for. I need to override __getitem__ for a class that wrapps a numpy array. I know the dimensions of my array (which can be variable from instance to instance), and I know what I want to do: for one preselected dimension, I need to select another slice than requested by the user, do something with the data, and return the variable. I am looking for a function that helps me to "clean" the input of __getitem__. There are so many possible cases, when the user uses [:] or [..., 1:2] or [0, ..., :] and so forth. But all these cases have an equivalent index array of len(ndimensions) with only valid slice() objects in it. This array would be much easier for me to work with. in pseudo code: def __getitem__(self, item): # clean input item = np.clean_item(item, ndimensions=4) # Ok now item is guaranteed to be of len 4 item[2] = slice() # Continue etc. Is there such a function in numpy? I hope I have been clear enough... Thanks a lot! Fabien From shoyer at gmail.com Sun Aug 23 14:08:02 2015 From: shoyer at gmail.com (Stephan Hoyer) Date: Sun, 23 Aug 2015 11:08:02 -0700 (PDT) Subject: [Numpy-discussion] Numpy helper function for __getitem__? In-Reply-To: References: Message-ID: <1440353282711.d9fa3274@Nodemailer> I don't think NumPy has a function like this (at least, not exposed to Python), but I wrote one for xray, "expanded_indexer", that you are welcome to borrow: https://github.com/xray/xray/blob/v0.6.0/xray/core/indexing.py#L10 ?Stephan On Sunday, Aug 23, 2015 at 7:54 PM, Fabien , wrote: Folks, My search engine was not able to help me on this one, possibly because I don't know exactly *what* I am looking for. 
I need to override __getitem__ for a class that wrapps a numpy array. I know the dimensions of my array (which can be variable from instance to instance), and I know what I want to do: for one preselected dimension, I need to select another slice than requested by the user, do something with the data, and return the variable. I am looking for a function that helps me to "clean" the input of __getitem__. There are so many possible cases, when the user uses [:] or [..., 1:2] or [0, ..., :] and so forth. But all these cases have an equivalent index array of len(ndimensions) with only valid slice() objects in it. This array would be much easier for me to work with. in pseudo code: def __getitem__(self, item): # clean input item = np.clean_item(item, ndimensions=4) # Ok now item is guaranteed to be of len 4 item[2] = slice() # Continue etc. Is there such a function in numpy? I hope I have been clear enough... Thanks a lot! Fabien _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From fabien.maussion at gmail.com Sun Aug 23 17:24:06 2015 From: fabien.maussion at gmail.com (Fabien) Date: Sun, 23 Aug 2015 23:24:06 +0200 Subject: [Numpy-discussion] Numpy helper function for __getitem__? In-Reply-To: <1440353282711.d9fa3274@Nodemailer> References: <1440353282711.d9fa3274@Nodemailer> Message-ID: On 08/23/2015 08:08 PM, Stephan Hoyer wrote: > I don't think NumPy has a function like this (at least, not exposed to > Python), but I wrote one for xray, "expanded_indexer", that you are > welcome to borrow: Hi Stephan, that's perfect, thanks! Fabien From chris.laumann at gmail.com Sun Aug 23 18:02:07 2015 From: chris.laumann at gmail.com (Chris Laumann) Date: Sun, 23 Aug 2015 18:02:07 -0400 Subject: [Numpy-discussion] py2/py3 pickling Message-ID: <406F1C7C-7B83-4F78-A614-84DEF2857495@gmail.com> Hi all- Is there documentation about the limits and workarounds for py2/py3 pickle/np.save/load compatibility? I haven't found anything except developer bug tracking discussions (eg. #4879 in github numpy). The kinds of errors you get can be really obscure when save/loading complicated objects or pickles containing numpy scalars. It's really unclear to me why the following shouldn't work -- it doesn't have anything apparent to do with string handling and unicode. Run in py2: import pickle import numpy as np a = np.float64(0.99) pickle.dump(a, open('test.pkl', 'wb')) And then in py3: import pickle import numpy as np b = pickle.load(open('test.pkl', 'rb')) And you get: UnicodeDecodeError: 'ascii' codec can't decode byte 0xae in position 0: ordinal not in range(128) If you force encoding='bytes' in the load, it works. Is this explained anywhere? Best, C From sebastian at sipsolutions.net Mon Aug 24 04:23:22 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 24 Aug 2015 10:23:22 +0200 Subject: [Numpy-discussion] Numpy helper function for __getitem__? 
In-Reply-To: <1440353282711.d9fa3274@Nodemailer> References: <1440353282711.d9fa3274@Nodemailer> Message-ID: <1440404602.2051.14.camel@sipsolutions.net> On So, 2015-08-23 at 11:08 -0700, Stephan Hoyer wrote: > I don't think NumPy has a function like this (at least, not exposed to > Python), but I wrote one for xray, "expanded_indexer", that you are > welcome to borrow: > https://github.com/xray/xray/blob/v0.6.0/xray/core/indexing.py#L10 > > Yeah, we have no such functionality. We do have a function which does all of this in C but it is somewhat more complex not exposed in any case. That function seems nice, though on its own not complete? It does not seem to handle `np.newaxis`/`None` or boolean indexing arrays well. One other thing which is not really important, we are deprecating the use of multiple ellipsis. Fabien, just to make sure you are aware. If you are overriding `__getitem__`, you should also implement `__setitem__`. NumPy does some magic if you do not. That will seem to make `__setitem__` work fine, but breaks down if you have advanced indexing involved (or if you return copies, though it spits warnings in that case). - Sebastian > > > ?Stephan > > > > On Sunday, Aug 23, 2015 at 7:54 PM, Fabien > , wrote: > Folks, > > My search engine was not able to help me on this one, possibly > because I > don't know exactly *what* I am looking for. > > I need to override __getitem__ for a class that wrapps a numpy > array. I > know the dimensions of my array (which can be variable from > instance to > instance), and I know what I want to do: for one preselected > dimension, > I need to select another slice than requested by the user, do > something > with the data, and return the variable. > > I am looking for a function that helps me to "clean" the input > of > __getitem__. There are so many possible cases, when the user > uses [:] or > [..., 1:2] or [0, ..., :] and so forth. But all these cases > have an > equivalent index array of len(ndimensions) with only valid > slice() > objects in it. This array would be much easier for me to work > with. > > in pseudo code: > > def __getitem__(self, item): > # clean input > item = np.clean_item(item, ndimensions=4) > # Ok now item is guaranteed to be of len 4 > item[2] = slice() > # Continue > etc. > > Is there such a function in numpy? > > I hope I have been clear enough... Thanks a lot! > > Fabien > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From pav at iki.fi Mon Aug 24 12:25:49 2015 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 24 Aug 2015 19:25:49 +0300 Subject: [Numpy-discussion] py2/py3 pickling In-Reply-To: <406F1C7C-7B83-4F78-A614-84DEF2857495@gmail.com> References: <406F1C7C-7B83-4F78-A614-84DEF2857495@gmail.com> Message-ID: 24.08.2015, 01:02, Chris Laumann kirjoitti: [clip] > Is there documentation about the limits and workarounds for py2/py3 > pickle/np.save/load compatibility? I haven't found anything except > developer bug tracking discussions (eg. #4879 in github numpy). 
Not sure if it's written down somewhere but: - You should consider pickles not portable between Py2/3. - Setting encoding='bytes' or encoding='latin1' should produce correct results for numerical data. However, neither is "safe" because the option also affects other data than numpy arrays that you may have possibly saved. - np.save/np.load are portable, as long as you don't save object arrays or anything that gets converted to one by np.array (these are saved by pickling) From njs at pobox.com Mon Aug 24 14:30:14 2015 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 24 Aug 2015 11:30:14 -0700 Subject: [Numpy-discussion] py2/py3 pickling In-Reply-To: References: <406F1C7C-7B83-4F78-A614-84DEF2857495@gmail.com> Message-ID: On Aug 24, 2015 9:29 AM, "Pauli Virtanen" wrote: > > 24.08.2015, 01:02, Chris Laumann kirjoitti: > [clip] > > Is there documentation about the limits and workarounds for py2/py3 > > pickle/np.save/load compatibility? I haven't found anything except > > developer bug tracking discussions (eg. #4879 in github numpy). > > Not sure if it's written down somewhere but: > > - You should consider pickles not portable between Py2/3. > > - Setting encoding='bytes' or encoding='latin1' should produce correct > results for numerical data. However, neither is "safe" because the > option also affects other data than numpy arrays that you may have > possibly saved. For those wondering what's going on here: if you pickled a str in python 2, then python 3 wants to unpickle it as a str. But in python 2 str was a vector of arbitrary bytes in some assumed encoding, and in python 3 str is a vector of Unicode characters. So it needs to know what encoding to use, which is fine and what you'd expect for the py2->py3 transition. But: when pickling arrays, numpy on py2 used a str to store the raw memory of your array. Trying to run this data through a character decoder then obviously makes a mess of everything. So the fundamental problem is that on py2, there's no way to distinguish between a string of text and a string of bytes -- they're encoded in exactly the same way in the pickle file -- and the python 3 unpickler just has to guess. You can tell it to guess in a way that works for raw bytes -- that's what the encoding= options Pauli mentions above do -- but obviously this will then be incorrect if you have any actual non-latin1 textual strings in your pickle, and you can't get it to handle both correctly at the same time. If you're desperate, it should be possible to get your data out of py2 pickles by loading then with one of the encoding options above, and then going through the resulting object and converting all the actual textual strings back to the correct encoding by hand. No data is actually lost. And of course even this is unnecessary if your file contains only ASCII/latin1. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.laumann at gmail.com Mon Aug 24 18:15:26 2015 From: chris.laumann at gmail.com (Chris Laumann) Date: Mon, 24 Aug 2015 18:15:26 -0400 Subject: [Numpy-discussion] py2/py3 pickling In-Reply-To: References: <406F1C7C-7B83-4F78-A614-84DEF2857495@gmail.com> Message-ID: <9159236E-D8A9-4C7B-87E3-AD802EF57A18@gmail.com> Hi- Would it be possible then (in relatively short order) to create a py2 -> py3 numpy pickle converter? This would run in py2, np.load or unpickle a pickle in the usual way and then repickle and/or save using a pickler that uses an explicit pickle type for encoding the bytes associated with numpy dtypes. 
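To make the "save" option concrete, a rough sketch of what I mean (run under py2, reusing the test.pkl from my first mail; as Pauli notes, np.save/np.load are portable for plain numerical data, so this half needs nothing new):

    # Python 2: read the legacy pickle, then re-save as .npy, which py3's
    # np.load reads back without any encoding guesswork.
    import pickle
    import numpy as np

    with open('test.pkl', 'rb') as f:
        a = pickle.load(f)              # loads fine under py2

    np.save('test.npy', np.asarray(a))  # portable, as long as it is not an object array

The repickle option would instead keep everything in pickle format, with the raw data written as an explicit bytes type.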
The numpy unpickler in py3 would then know what to do. IE. is there a way to make the numpy py2 pickler be explicit about byte strings? Presumably this would cover most use-cases even for complicated pickled objects and could be used transparently within py2 or py3. Best, C > On Aug 24, 2015, at 2:30 PM, Nathaniel Smith wrote: > > On Aug 24, 2015 9:29 AM, "Pauli Virtanen" > wrote: > > > > 24.08.2015, 01:02, Chris Laumann kirjoitti: > > [clip] > > > Is there documentation about the limits and workarounds for py2/py3 > > > pickle/np.save/load compatibility? I haven't found anything except > > > developer bug tracking discussions (eg. #4879 in github numpy). > > > > Not sure if it's written down somewhere but: > > > > - You should consider pickles not portable between Py2/3. > > > > - Setting encoding='bytes' or encoding='latin1' should produce correct > > results for numerical data. However, neither is "safe" because the > > option also affects other data than numpy arrays that you may have > > possibly saved. > > For those wondering what's going on here: if you pickled a str in python 2, then python 3 wants to unpickle it as a str. But in python 2 str was a vector of arbitrary bytes in some assumed encoding, and in python 3 str is a vector of Unicode characters. So it needs to know what encoding to use, which is fine and what you'd expect for the py2->py3 transition. > > But: when pickling arrays, numpy on py2 used a str to store the raw memory of your array. Trying to run this data through a character decoder then obviously makes a mess of everything. So the fundamental problem is that on py2, there's no way to distinguish between a string of text and a string of bytes -- they're encoded in exactly the same way in the pickle file -- and the python 3 unpickler just has to guess. You can tell it to guess in a way that works for raw bytes -- that's what the encoding= options Pauli mentions above do -- but obviously this will then be incorrect if you have any actual non-latin1 textual strings in your pickle, and you can't get it to handle both correctly at the same time. > > If you're desperate, it should be possible to get your data out of py2 pickles by loading then with one of the encoding options above, and then going through the resulting object and converting all the actual textual strings back to the correct encoding by hand. No data is actually lost. And of course even this is unnecessary if your file contains only ASCII/latin1. > > -n > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Aug 25 06:03:41 2015 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 25 Aug 2015 03:03:41 -0700 Subject: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015 Message-ID: Hi all, These are the notes from the NumPy dev meeting held July 7, 2015, at the SciPy conference in Austin, presented here so the list can keep up with what happens, and so you can give feedback. Please do give feedback, none of this is final! (Also, if anyone who was there notices anything I left out or mischaracterized, please speak up -- these are a lot of notes I'm trying to gather together, so I could easily have missed something!) 
Thanks to Jill Cowan and the rest of the SciPy organizers for donating space and organizing logistics for us, and to the Berkeley Institute for Data Science for funding travel for Jaime, Nathaniel, and Sebastian. Attendees ========= Present in the room for all or part: Daniel Allan, Chris Barker, Sebastian Berg, Thomas Caswell, Jeff Reback, Jaime Fern?ndez del R?o, Chuck Harris, Nathaniel Smith, St?fan van der Walt. (Note: I'm pretty sure this list is incomplete) Joining remotely for all or part: Stephan Hoyer, Julian Taylor. Formalizing our governance/decision making ========================================== This was a major focus of discussion. At a high level, the consensus was to steal IPython's governance document ("IPEP 29") and modify it to remove its use of a BDFL as a "backstop" to normal community consensus-based decision, and replace it with a new "backstop" based on Apache-project-style consensus voting amongst the core team. I'll send out a proper draft of this shortly for further discussion. Development roadmap =================== General consensus: Let's assume NumPy is going to remain important indefinitely, and try to make it better, instead of waiting for something better to come along. (This is unlikely to be wasted effort even if something better does come along, and it's hardly a sure thing that that will happen anyway.) Let's focus on evolving numpy as far as we can without major break-the-world changes (no "numpy 2.0", at least in the foreseeable future). And, as a target for that evolution, let's change our focus from numpy as "NumPy is the library that gives you the np.ndarray object (plus some attached infrastructure)", to "NumPy provides the standard framework for working with arrays and array-like objects in Python" This means, creating defined interfaces between array-like objects / ufunc objects / dtype objects, so that it becomes possible for third parties to add their own and mix-and-match. Right now ufuncs are pretty good at this, but if you want a new array class or dtype then in most cases you pretty much have to modify numpy itself. Vision: instead of everyone who wants a new container type having to reimplement all of numpy, Alice can implement an array class using (sparse / distributed / compressed / tiled / gpu / out-of-core / delayed / ...) storage, pass it to code that was written using direct calls to np.* functions, and it just works. (Instead of np.sin being "the way you calculate the sine of an ndarray", it's "the way you calculate the sine of any array-like container object".) Vision: Darryl can implement a new dtype for (categorical data / astronomical dates / integers-with-missing-values / ...) without having to touch the numpy core. Vision: Chandni can then come along and combine them by doing a = alice_array([...], dtype=darryl_dtype) and it just works. Vision: no-one is tempted to subclass ndarray, because anything you can do with an ndarray subclass you can also easily do by defining your own new class that implements the "array protocol". Supporting third-party array types ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Sub-goals: - Get __numpy_ufunc__ done, which will cover a good chunk of numpy's API right there. - Go through the rest of the stuff in numpy, and figure out some story for how to let it handle third-party array classes: - ufunc ALL the things: Some things can be converted directly into (g)ufuncs and then use __numpy_ufunc__ (e.g., np.std); some things could be converted into (g)ufuncs if we extended the (g)ufunc interface a bit (e.g. 
np.sort, np.matmul). - Some things probably need their own __numpy_ufunc__-like extensions (__numpy_concatenate__?) - Provide tools to make it easier to implement the more complicated parts of an array object (e.g. the bazillion different methods, many of which are ufuncs in disguise, or indexing) - Longer-run interesting research project: __numpy_ufunc__ requires that one or the other object have explicit knowledge of how to handle the other, so to handle binary ufuncs with N array types you need something like N**2 __numpy_ufunc__ code paths. As an alternative, if there were some interface that an object could export that provided the operations nditer needs to efficiently iterate over (chunks of) it, then you would only need N implementations of this interface to handle all N**2 operations. This would solve a lot of problems for projects like: - blosc - dask - distarray - numpy.ma - pandas - scipy.sparse - xray Supporting third-party dtypes ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We already have something like a C level "dtype protocol". Conceptually, the way you define a new dtype is by defining a new class whose instances have data attributes defining the parameters of the dtype (what fields are in *this* record dtype, how many characters are in *this* string dtype, what units are used for *this* datetime64, etc.), and you define a bunch of methods to do things like convert an object from a Python object to your dtype or vice-versa, to copy an array of your dtype from one place to another, to cast to and from your new dtype, etc. This part is great. The problem is, in the current implementation, we don't actually use the Python object system to define these classes / attributes / methods. Instead, all possible dtypes are jammed into a single Python-level class, whose struct has fields for the union of all possible dtype's attributes, and instead of Python-style method slots there's just a big table of function pointers attached to each object. So the main proposal is that we keep the basic design, but switch it so that the float64 dtype, the int64 dtype, etc. actually literally are subclasses of np.dtype, each implementing their own fields and Python-style methods. Some of the pieces involved in doing this: - The current dtype methods should be cleaned up -- e.g. 'dot' and 'less_than' are both dtype methods, when conceptually they're much more like ufuncs. - The ufunc inner-loop interface currently does not get a reference to the dtype object, so they can't see its attributes and this is a big obstacle to many interesting dtypes (e.g., it's hard to implement np.equal for categoricals if you don't know what categories each has). So we need to add new arguments to the core ufunc loop signature. (Fortunately this can be done in a backwards-compatible way.) - We need to figure out what exactly the dtype methods should be, and add them to the dtype class (possibly with backwards compatibility shims for anyone who is accessing PyArray_ArrFuncs directly). - Casting will be possibly the trickiest thing to work out, though the basic idea of using dunder-dispatch-like __cast__ and __rcast__ methods seems workable. (Encouragingly, this is also exactly what dynd also does, though unfortunately dynd does not yet support user-defined dtypes even to the extent that numpy does, so there isn't much else we can steal from them.) - We may also want to rethink the casting rules while we're at it, since they have some very weird corners right now (e.g. 
Supporting third-party dtypes ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We already have something like a C level "dtype protocol". Conceptually, the way you define a new dtype is by defining a new class whose instances have data attributes defining the parameters of the dtype (what fields are in *this* record dtype, how many characters are in *this* string dtype, what units are used for *this* datetime64, etc.), and you define a bunch of methods to do things like convert an object from a Python object to your dtype or vice-versa, to copy an array of your dtype from one place to another, to cast to and from your new dtype, etc. This part is great. The problem is, in the current implementation, we don't actually use the Python object system to define these classes / attributes / methods. Instead, all possible dtypes are jammed into a single Python-level class, whose struct has fields for the union of all possible dtype's attributes, and instead of Python-style method slots there's just a big table of function pointers attached to each object. So the main proposal is that we keep the basic design, but switch it so that the float64 dtype, the int64 dtype, etc. actually literally are subclasses of np.dtype, each implementing their own fields and Python-style methods. Some of the pieces involved in doing this: - The current dtype methods should be cleaned up -- e.g. 'dot' and 'less_than' are both dtype methods, when conceptually they're much more like ufuncs. - The ufunc inner-loop interface currently does not get a reference to the dtype object, so the inner loops can't see its attributes, and this is a big obstacle to many interesting dtypes (e.g., it's hard to implement np.equal for categoricals if you don't know what categories each has). So we need to add new arguments to the core ufunc loop signature. (Fortunately this can be done in a backwards-compatible way.) - We need to figure out what exactly the dtype methods should be, and add them to the dtype class (possibly with backwards compatibility shims for anyone who is accessing PyArray_ArrFuncs directly). - Casting will be possibly the trickiest thing to work out, though the basic idea of using dunder-dispatch-like __cast__ and __rcast__ methods seems workable. (Encouragingly, this is also exactly what dynd does, though unfortunately dynd does not yet support user-defined dtypes even to the extent that numpy does, so there isn't much else we can steal from them.) - We may also want to rethink the casting rules while we're at it, since they have some very weird corners right now (e.g. see [https://github.com/numpy/numpy/issues/6240]) - We need to migrate the current dtypes over to the new system, which can be done in stages: - First stick them all in a single "legacy dtype" class whose methods just dispatch to the PyArray_ArrFuncs per-object "method table" - Then move each of them into their own classes - We should provide a Python-level wrapper for the protocol, so that you can call dtype methods from Python - And vice-versa, it should be possible to subclass dtype at the Python level - etc. Fortunately, AFAICT pretty much all of this can be done while maintaining backwards compatibility (though we may want to break some obscure cases to avoid expending *too* much effort with weird backcompat contortions that will only help a vanishingly small proportion of the userbase), and a lot of the above changes can be done as semi-independent mini-projects, so there's no need for some branch to go off and spend a year rewriting the world. Obviously there are still a lot of details to work out, though. But overall, there was widespread agreement that this is one of the #1 pain points for our users (e.g. it's the single main request from pandas), and fixing it is very high priority. Some features that would become straightforward to implement (e.g. even in third-party libraries) if this were fixed: - missing value support - physical unit tracking (meters / seconds -> array of velocity; meters + seconds -> error) - better and more diverse datetime representations (e.g. datetimes with attached timezones, or using funky geophysical or astronomical calendars) - categorical data - variable length strings - strings-with-encodings (e.g. latin1) - forward mode automatic differentiation (write a function that computes f(x) where x is an array of float64; pass that function an array with a special dtype and get out both f(x) and f'(x)) - probably others I'm forgetting right now
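[Editor's note: as a purely hypothetical illustration of what "dtypes as real Python classes" could feel like for one of the features above (categorical data): none of these hooks, method names, or base classes exist in numpy today; the sketch only shows the shape of the proposed protocol, written as a plain Python class so that it actually runs.]

    import numpy as np

    class CategoricalDtype(object):  # in the proposal this would subclass np.dtype
        def __init__(self, categories):
            # Per-instance parameters, like the fields of a record dtype
            # or the unit of a datetime64.
            self.categories = list(categories)
            self.itemsize = 8            # stored as an int64 code per element

        # Python object <-> array element (hypothetical method names)
        def to_element(self, obj):
            return self.categories.index(obj)

        def from_element(self, code):
            return self.categories[code]

        # Hypothetical casting hook, in the spirit of the __cast__/__rcast__
        # idea discussed above.
        def __cast__(self, other_dtype, codes):
            if other_dtype == np.dtype(object):
                return np.array([self.from_element(c) for c in codes],
                                dtype=object)
            return NotImplemented

    colors = CategoricalDtype(["red", "green", "blue"])
    colors.to_element("green")    # -> 1
    colors.from_element(2)        # -> "blue"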
I should also note that there was one substantial objection to this plan, from Travis Oliphant (in discussions later in the conference). I'm not confident I understand his objections well enough to reproduce them here, though -- perhaps he'll elaborate. Money ===== There was an extensive discussion on the topic of: "if we had money, what would we do with it?" This is partially motivated by the realization that there are a number of sources that we could probably get money from, if we had a good story for what we wanted to do, so it's not just an idle question. Points of general agreement: - Doing the in-person meeting was a good thing. We should plan to do that again, at least once a year. So one thing to spend money on is travel subsidies to make sure that happens and is productive. - While it's tempting to imagine hiring junior people for the more frustrating/boring work like maintaining buildbots, release infrastructure, updating docs, etc., this seems difficult to do realistically with our current resources -- how do we hire for this, who would manage them, etc.? - On the other hand, the general feeling was that if we found the money to hire a few more senior people who could take care of themselves more, then that would be good and we could realistically absorb that extra work without totally unbalancing the project. - A major open question is how we would recruit someone for a position like this, since apparently all the obvious candidates who are already active on the NumPy team already have other things going on. [For calibration on how hard this can be: NYU has apparently had an open position for a year with the job description of "come work at NYU full-time with a private-industry-competitive-salary on whatever your personal open-source scientific project is" (!) and still is having an extremely difficult time filling it: [http://cds.nyu.edu/research-engineer/]] - General consensus, though, was that there isn't much to be done about this, except try it and see. - (By the way, if you're someone who's reading this and potentially interested in like a postdoc or better working on numpy, then let's talk...) More specific changes to numpy that had general consensus, but don't really fit into a high-level roadmap ========================================================================================================= - Resolved: we should merge multiarray.so and umath.so into a single extension module, so that they can share utility code without the current awkward contortions. - Resolved: we should start hiding new fields in the ufunc and dtype structs as soon as possible going forward. (I.e. they would not be present in the version of the structs that are exposed through the C API, but internally we would use a more detailed struct.) - Mayyyyyybe we should even go ahead and hide the subset of the existing fields that are really internal details that no-one should be using. If we did this without changing anything else then it would preserve ABI (the fields would still be where existing compiled extensions expect them to be, if any such extensions exist) while breaking API (trying to compile such extensions would give a clear error), so would be a smoother ramp if we think we need to eventually break those fields for real. (As discussed above, there are a bunch of fields in the dtype base class that only make sense for specific dtype subclasses, e.g. only record dtypes need a list of field names, but right now all dtypes have one anyway. So it would be nice to remove these from the base class entirely, but that is potentially ABI-breaking.) - Resolved: np.array should never return an object array unless explicitly requested (e.g. with dtype=object); it just causes too many surprising problems. - First step: add a deprecation warning - Eventually: make it an error. - The matrix class - Resolved: We won't add warnings yet, but we will prominently document that it is deprecated and should be avoided wherever possible. - Stéfan van der Walt volunteers to do this. - We'd all like to deprecate it properly, but the feeling was that the precondition for this is for scipy.sparse to provide sparse "arrays" that don't return np.matrix objects on ordinary operations. Until that happens we can't reasonably tell people that using np.matrix is a bug. - Resolved: we should add a similar prominent note to the "subclassing ndarray" documentation, warning people that this is painful and barely works and please don't do it if you have any alternatives. - Resolved: we want more, smaller releases -- every 6 months at least, aiming to go even faster (every 4 months?) - On the question of using Cython inside numpy core: - Everyone agrees that there are places where this would be an improvement (e.g., Python<->C interfaces, and places "when you want to do computer science", e.g.
complicated algorithmic stuff like graph traversals) - Chuck wanted it to be clear though that he doesn't think it would be a good goal to try and rewrite all of numpy in Cython -- there also exist places where Cython ends up being "an uglier version of C". No-one disagreed. - Our text reader is apparently not very functional on Python 3, and generally slow and hard to work with. - Resolved: We should extract Pandas's awesome text reader/parser and convert it into its own package, that could then become a new backend for both pandas and numpy.loadtxt. - Jeff thinks this is a great idea - Thomas Caswell volunteers to do the extraction. - We should work on improving our tools for evolving the ABI, so that we will eventually be less constrained by decisions made decades ago. - One idea that had a lot of support was to switch from our current append-only C-API to a "sliding window" API based on explicit versions. So a downstream package might say #define NUMPY_API_VERSION 4 and they'd get the functions and behaviour provided in "version 4" of the numpy C api. If they wanted to get access to new stuff that was added in version 5, then they'd need to switch that #define, and at the same time clean up any usage of stuff that was removed or changed in version 5. And to provide a smooth migration path, one version of numpy would support multiple versions at once, gradually deprecating and dropping old versions. - If anyone wants to help bring pip up to scratch WRT tracking ABI dependencies (e.g., 'pip install numpy==' -> triggers rebuild of scipy against the new ABI), then that would be an extremely useful thing. Policies that should be documented ================================== ...together with some notes about what the contents of the document should be: How we manage bugs in the bug tracker. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Github "milestones" should *only* be assigned to release-blocker bugs (which mostly means "regression from the last release"). In particular, if you're tempted to push a bug forward to the next release... then it's clearly not a blocker, so don't set it to the next release's milestone, just remove the milestone entirely. (Obvious exception to this: deprecation followup bugs where we decide that we want to keep the deprecation around a bit longer are a case where a bug actually does switch from being a blocker to release 1.x to being a blocker for release 1.(x+1).) - Don't hesitate to close an issue if there's no way forward -- e.g. a PR where the author has disappeared. Just post a link to this policy and close, with a polite note that we need to keep our tracker useful as a todo list, but they're welcome to re-open if things change. Deprecations and breakage policy: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - How long do we need to keep DeprecationWarnings around before we break things? This is tricky because on the one hand an aggressive (short) deprecation period lets us deliver new features and important cleanups more quickly, but on the other hand a too-aggressive deprecation period is difficult for our more conservative downstream users. - Idea that had the most support: pick a somewhat-aggressive warning period as our default, and make a rule that if someone asks for an extension during the beta cycle for the release that removes it, then we put it back for another release or two worth of grace period. (While also possibly upgrading the warning to be more visible during the grace period.) 
This gives us deprecation periods that are more adaptive on a case-by-case basis. - Lament: it would be really nice if we could get more people to test our beta releases, because in practice right now 1.x.0 ends up being where we actually discover all the bugs, and 1.x.1 is where it actually becomes usable. Which sucks, and makes it difficult to have a solid policy about what counts as a regression, etc. Is there anything we can do about this? - ABI breakage: we distinguish between an ABI break that breaks everything (e.g., "import scipy" segfaults), versus an ABI break that breaks an occasional rare case (e.g., only apps that poke around in some obscure corner of some struct are affected). - The "break-the-world" type remains off-limits for now: the pain is still too large (conda helps, but there are lots of people who don't use conda!), and there aren't really any compelling improvements that this would enable anyway. - For the "break-0.1%-of-users" type, it is *not* ruled out by fiat, though we remain conservative: we should treat it like other API breaks in principle, and do a careful case-by-case analysis of the details of the situation, taking into account what kind of code would be broken, how common these cases are, how important the benefits are, whether there are any specific mitigation strategies we can use, etc. -- with this process of course taking into account that a segfault is nastier than a Python exception. Other points that were discussed ================================ - There was inconclusive discussion of what we should do with dot() in the places where it disagrees with the PEP 465 matmul semantics (specifically this is when both arguments have ndim >= 3, or one argument has ndim == 0). - The concern is that the current behavior is not very useful, and as far as we can tell no-one is using it; but, as people get used to the more-useful PEP 465 behavior, they will increasingly try to use it on the assumption that np.dot will work the same way, and this will create pain for lots of people. So Nathaniel argued that we should start at least issuing a visible warning when people invoke the corner-case behavior. - But OTOH, np.dot is such a core piece of infrastructure, and there's such a large landscape of code out there using numpy that we can't see, that others were reasonably wary of making any change. - For now: document prominently, but no change in behavior.
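[Editor's note: for readers who have not hit this corner, a small example of where the two semantics diverge for stacked (ndim >= 3) operands, using np.matmul as it appears in the 1.10 betas; the shapes in the comments are the whole point of the example.]

    import numpy as np

    a = np.ones((2, 3, 4))
    b = np.ones((2, 4, 5))

    np.dot(a, b).shape      # (2, 3, 2, 5): dot pairs every "matrix" in a
                            # with every "matrix" in b (an outer product
                            # over the stacked axes)
    np.matmul(a, b).shape   # (2, 3, 5): PEP 465 broadcasts the leading
                            # axis and multiplies the stacks elementwise

    np.dot(2.0, a).shape    # (2, 3, 4): dot silently accepts a scalar...
    # np.matmul(2.0, a)     # ...while matmul raises an error, since
                            # PEP 465 deliberately rejects scalar operands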
Links to raw notes ================== Main page: [https://github.com/numpy/numpy/wiki/SciPy-2015-developer-meeting] Notes from the meeting proper: [https://docs.google.com/document/d/1IJcYdsHtk8MVAM4AZqFDBSf_nVG-mrB4Tv2bh9u1g4Y/edit?usp=sharing] Slides from the followup BoF: [https://gist.github.com/njsmith/eb42762054c88e810786/raw/b74f978ce10a972831c582485c80fb5b8e68183b/future-of-numpy-bof.odp] Notes from the followup BoF: [https://docs.google.com/document/d/11AuTPms5dIPo04JaBOWEoebXfk-tUzEZ-CvFnLIt33w/edit] -n -- Nathaniel J. Smith -- http://vorpus.org From fabien.maussion at gmail.com Tue Aug 25 11:41:36 2015 From: fabien.maussion at gmail.com (Fabien) Date: Tue, 25 Aug 2015 17:41:36 +0200 Subject: [Numpy-discussion] Numpy helper function for __getitem__? In-Reply-To: <1440404602.2051.14.camel@sipsolutions.net> References: <1440353282711.d9fa3274@Nodemailer> <1440404602.2051.14.camel@sipsolutions.net> Message-ID: On 08/24/2015 10:23 AM, Sebastian Berg wrote: > Fabien, just to make sure you are aware. If you are overriding > `__getitem__`, you should also implement `__setitem__`. NumPy does some magic if you do not. That will seem to make `__setitem__` work fine, but breaks down if you have advanced indexing involved (or if you return copies, though it spits warnings in that case). Hi Sebastian, thanks for the info. I am writing a duck NetCDF4 Variable object, and therefore I am not trying to override Numpy arrays. I think that Stephan's function for xray is very useful. A possible improvement (probably at a certain performance cost) would be to be able to provide a shape instead of a number of dimensions. The output would then be slices with valid start and ends. Current behavior: In[9]: expanded_indexer(slice(None), 2) Out[9]: (slice(None, None, None), slice(None, None, None)) With shape: In[9]: expanded_indexer(slice(None), (3, 4)) Out[9]: (slice(0, 3, 1), slice(0, 4, 1)) But if nobody needed something like this before me, I think that I might have a design problem in my code (still quite new to python). Cheers and thanks, Fabien
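[Editor's note: a minimal sketch of the kind of helper Fabien describes -- the name and behaviour are illustrative only (this is not an existing xray or numpy function). It leans on slice.indices() to clip open slices to a concrete shape; Ellipsis and np.newaxis handling are left out for brevity.]

    def expanded_indexer_with_shape(key, shape):
        # Expand `key` to one entry per dimension of `shape`, turning open
        # slices into slices with explicit, in-bounds start/stop/step.
        if not isinstance(key, tuple):
            key = (key,)
        # Implicit trailing dimensions get full slices.
        key = key + (slice(None),) * (len(shape) - len(key))
        out = []
        for k, n in zip(key, shape):
            if isinstance(k, slice):
                out.append(slice(*k.indices(n)))  # clip to axis length n
            else:
                out.append(k)
        return tuple(out)

    expanded_indexer_with_shape(slice(None), (3, 4))
    # -> (slice(0, 3, 1), slice(0, 4, 1))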
From pav at iki.fi Tue Aug 25 12:12:30 2015 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 25 Aug 2015 19:12:30 +0300 Subject: [Numpy-discussion] py2/py3 pickling In-Reply-To: <9159236E-D8A9-4C7B-87E3-AD802EF57A18@gmail.com> References: <406F1C7C-7B83-4F78-A614-84DEF2857495@gmail.com> <9159236E-D8A9-4C7B-87E3-AD802EF57A18@gmail.com> Message-ID: 25.08.2015, 01:15, Chris Laumann kirjoitti: > Would it be possible then (in relatively short order) to create > a py2 -> py3 numpy pickle converter? You probably need to modify the pickle stream directly, replacing *STRING opcodes with *BYTES opcodes when it comes to objects that are needed for constructing Numpy arrays. https://hg.python.org/cpython/file/tip/Modules/_pickle.c#l82 Or, use a custom pickler class that emits the new opcodes when it comes to data that is part of Numpy arrays, as Python 2 pickler doesn't know how to write bytes opcodes. It's probably doable, although likely annoying to implement. The pickles created won't be loadable on Py2, only Py3. You'd need to find a volunteer who wants to work on this or just do it yourself, though. From charlesr.harris at gmail.com Tue Aug 25 12:26:02 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 25 Aug 2015 10:26:02 -0600 Subject: [Numpy-discussion] 1.10.0rc1 Message-ID: Hi All, The silence after the 1.10 beta has been eerie. Consequently, I'm thinking of making a first release candidate this weekend. If you haven't yet tested the beta, please do so. It would be good to discover as many problems as we can before the first release. Chuck From solipsis at pitrou.net Tue Aug 25 12:26:31 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 25 Aug 2015 18:26:31 +0200 Subject: [Numpy-discussion] py2/py3 pickling References: <406F1C7C-7B83-4F78-A614-84DEF2857495@gmail.com> <9159236E-D8A9-4C7B-87E3-AD802EF57A18@gmail.com> Message-ID: <20150825182631.77815a92@fsol> On Tue, 25 Aug 2015 19:12:30 +0300 Pauli Virtanen wrote: > 25.08.2015, 01:15, Chris Laumann kirjoitti: > > Would it be possible then (in relatively short order) to create > > a py2 -> py3 numpy pickle converter? > > You probably need to modify the pickle stream directly, replacing > *STRING opcodes with *BYTES opcodes when it comes to objects that are > needed for constructing Numpy arrays. > > https://hg.python.org/cpython/file/tip/Modules/_pickle.c#l82 > > Or, use a custom pickler class that emits the new opcodes when it comes > to data that is part of Numpy arrays, as Python 2 pickler doesn't know > how to write bytes opcodes. > > It's probably doable, although likely annoying to implement. The pickles > created won't be loadable on Py2, only Py3. One could take a look at how the built-in bytearray type achieves pickle compatibility between 2.x and 3.x. The solution is to serialize the binary data as a latin-1 decoded unicode string, and to return the right reconstructor from __reduce__. The solution is less space-efficient than pure bytes pickling, since the unicode string is serialized as utf-8 (so bytes > 0x80 are multibyte-encoded). There's also some CPU overhead, due to the successive decoding and encoding steps. You can take a look at the bytearray_reduce() function in Objects/bytearrayobject.c, both for 2.x and 3.x. (also note how the 3.x version does it only for protocols < 3, to achieve better efficiency on newer protocol versions) Another possibility would be a custom Unpickler class for 3.x, dealing specifically with 2.x-produced Numpy array pickles. That way the pickles themselves could be cross-version. Regards Antoine.
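[Editor's note: for the most common direction (arrays pickled on Python 2, loaded on Python 3), a minimal sketch of the workaround implied by the discussion above. The helper names and file paths are invented; pickle.load's encoding argument is standard Python 3, and 'latin1' is what keeps the raw array bytes intact. Any genuine py2 text strings in the pickle will come back latin-1 decoded, so non-array payloads deserve a check.]

    import pickle

    def load_py2_numpy_pickle(path):
        # On Python 3, decode py2 str objects as latin-1 so that the byte
        # strings backing numpy arrays round-trip unchanged.
        with open(path, 'rb') as f:
            return pickle.load(f, encoding='latin1')

    def convert_to_py3_pickle(src, dst):
        # One-shot converter: read a py2-written pickle, re-dump it so that
        # future loads on Python 3 need no special handling.
        obj = load_py2_numpy_pickle(src)
        with open(dst, 'wb') as f:
            pickle.dump(obj, f, protocol=2)

    # arrays = load_py2_numpy_pickle('data_py2.pkl')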
From charlesr.harris at gmail.com Tue Aug 25 12:43:19 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 25 Aug 2015 10:43:19 -0600 Subject: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015 In-Reply-To: References: Message-ID: On Tue, Aug 25, 2015 at 4:03 AM, Nathaniel Smith wrote: > Hi all, > > These are the notes from the NumPy dev meeting held July 7, 2015, at > the SciPy conference in Austin, presented here so the list can keep up > with what happens, and so you can give feedback. Please do give > feedback, none of this is final! [full quote of Nathaniel's notes snipped -- it duplicates the post above] Hi Nathaniel. Thanks for putting this together. Chuck
URL: From nathan12343 at gmail.com Tue Aug 25 12:52:42 2015 From: nathan12343 at gmail.com (Nathan Goldbaum) Date: Tue, 25 Aug 2015 11:52:42 -0500 Subject: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015 In-Reply-To: References: Message-ID: On Tue, Aug 25, 2015 at 5:03 AM, Nathaniel Smith wrote: > Hi all, > > These are the notes from the NumPy dev meeting held July 7, 2015, at > the SciPy conference in Austin, presented here so the list can keep up > with what happens, and so you can give feedback. Please do give > feedback, none of this is final! > > (Also, if anyone who was there notices anything I left out or > mischaracterized, please speak up -- these are a lot of notes I'm > trying to gather together, so I could easily have missed something!) > > Thanks to Jill Cowan and the rest of the SciPy organizers for donating > space and organizing logistics for us, and to the Berkeley Institute > for Data Science for funding travel for Jaime, Nathaniel, and > Sebastian. > > > Attendees > ========= > > Present in the room for all or part: Daniel Allan, Chris Barker, > Sebastian Berg, Thomas Caswell, Jeff Reback, Jaime Fern?ndez del > R?o, Chuck Harris, Nathaniel Smith, St?fan van der Walt. (Note: I'm > pretty sure this list is incomplete) > > Joining remotely for all or part: Stephan Hoyer, Julian Taylor. > > > Formalizing our governance/decision making > ========================================== > > This was a major focus of discussion. At a high level, the consensus > was to steal IPython's governance document ("IPEP 29") and modify it > to remove its use of a BDFL as a "backstop" to normal community > consensus-based decision, and replace it with a new "backstop" based > on Apache-project-style consensus voting amongst the core team. > > I'll send out a proper draft of this shortly for further discussion. > > > Development roadmap > =================== > > General consensus: > > Let's assume NumPy is going to remain important indefinitely, and > try to make it better, instead of waiting for something better to > come along. (This is unlikely to be wasted effort even if something > better does come along, and it's hardly a sure thing that that will > happen anyway.) > > Let's focus on evolving numpy as far as we can without major > break-the-world changes (no "numpy 2.0", at least in the foreseeable > future). > > And, as a target for that evolution, let's change our focus from > numpy as "NumPy is the library that gives you the np.ndarray object > (plus some attached infrastructure)", to "NumPy provides the > standard framework for working with arrays and array-like objects in > Python" > > This means, creating defined interfaces between array-like objects / > ufunc objects / dtype objects, so that it becomes possible for third > parties to add their own and mix-and-match. Right now ufuncs are > pretty good at this, but if you want a new array class or dtype then > in most cases you pretty much have to modify numpy itself. > > Vision: instead of everyone who wants a new container type having to > reimplement all of numpy, Alice can implement an array class using > (sparse / distributed / compressed / tiled / gpu / out-of-core / > delayed / ...) storage, pass it to code that was written using > direct calls to np.* functions, and it just works. (Instead of > np.sin being "the way you calculate the sine of an ndarray", it's > "the way you calculate the sine of any array-like container > object".) 
> > Vision: Darryl can implement a new dtype for (categorical data / > astronomical dates / integers-with-missing-values / ...) without > having to touch the numpy core. > > Vision: Chandni can then come along and combine them by doing > > a = alice_array([...], dtype=darryl_dtype) > > and it just works. > > Vision: no-one is tempted to subclass ndarray, because anything you > can do with an ndarray subclass you can also easily do by defining > your own new class that implements the "array protocol". > > > Supporting third-party array types > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > Sub-goals: > - Get __numpy_ufunc__ done, which will cover a good chunk of numpy's > API right there. > - Go through the rest of the stuff in numpy, and figure out some > story for how to let it handle third-party array classes: > - ufunc ALL the things: Some things can be converted directly into > (g)ufuncs and then use __numpy_ufunc__ (e.g., np.std); some > things could be converted into (g)ufuncs if we extended the > (g)ufunc interface a bit (e.g. np.sort, np.matmul). > - Some things probably need their own __numpy_ufunc__-like > extensions (__numpy_concatenate__?) > - Provide tools to make it easier to implement the more complicated > parts of an array object (e.g. the bazillion different methods, > many of which are ufuncs in disguise, or indexing) > - Longer-run interesting research project: __numpy_ufunc__ requires > that one or the other object have explicit knowledge of how to > handle the other, so to handle binary ufuncs with N array types > you need something like N**2 __numpy_ufunc__ code paths. As an > alternative, if there were some interface that an object could > export that provided the operations nditer needs to efficiently > iterate over (chunks of) it, then you would only need N > implementations of this interface to handle all N**2 operations. > > This would solve a lot of problems for projects like: > - blosc > - dask > - distarray > - numpy.ma > - pandas > - scipy.sparse > - xray > > > Supporting third-party dtypes > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > We already have something like a C level "dtype > protocol". Conceptually, the way you define a new dtype is by > defining a new class whose instances have data attributes defining > the parameters of the dtype (what fields are in *this* record dtype, > how many characters are in *this* string dtype, what units are used > for *this* datetime64, etc.), and you define a bunch of methods to > do things like convert an object from a Python object to your dtype > or vice-versa, to copy an array of your dtype from one place to > another, to cast to and from your new dtype, etc. This part is > great. > > The problem is, in the current implementation, we don't actually use > the Python object system to define these classes / attributes / > methods. Instead, all possible dtypes are jammed into a single > Python-level class, whose struct has fields for the union of all > possible dtype's attributes, and instead of Python-style method > slots there's just a big table of function pointers attached to each > object. > > So the main proposal is that we keep the basic design, but switch it > so that the float64 dtype, the int64 dtype, etc. actually literally > are subclasses of np.dtype, each implementing their own fields and > Python-style methods. > > Some of the pieces involved in doing this: > > - The current dtype methods should be cleaned up -- e.g. 'dot' and > 'less_than' are both dtype methods, when conceptually they're much > more like ufuncs. 
> > - The ufunc inner-loop interface currently does not get a reference > to the dtype object, so they can't see its attributes and this is > a big obstacle to many interesting dtypes (e.g., it's hard to > implement np.equal for categoricals if you don't know what > categories each has). So we need to add new arguments to the core > ufunc loop signature. (Fortunately this can be done in a > backwards-compatible way.) > > - We need to figure out what exactly the dtype methods should be, > and add them to the dtype class (possibly with backwards > compatibility shims for anyone who is accessing PyArray_ArrFuncs > directly). > > - Casting will be possibly the trickiest thing to work out, though > the basic idea of using dunder-dispatch-like __cast__ and > __rcast__ methods seems workable. (Encouragingly, this is also > exactly what dynd also does, though unfortunately dynd does not > yet support user-defined dtypes even to the extent that numpy > does, so there isn't much else we can steal from them.) > - We may also want to rethink the casting rules while we're at it, > since they have some very weird corners right now (e.g. see > [https://github.com/numpy/numpy/issues/6240]) > > - We need to migrate the current dtypes over to the new system, > which can be done in stages: > > - First stick them all in a single "legacy dtype" class whose > methods just dispatch to the PyArray_ArrFuncs per-object "method > table" > > - Then move each of them into their own classes > > - We should provide a Python-level wrapper for the protocol, so that > you can call dtype methods from Python > > - And vice-versa, it should be possible to subclass dtype at the > Python level > > - etc. > > Fortunately, AFAICT pretty much all of this can be done while > maintaining backwards compatibility (though we may want to break > some obscure cases to avoid expending *too* much effort with weird > backcompat contortions that will only help a vanishingly small > proportion of the userbase), and a lot of the above changes can be > done as semi-independent mini-projects, so there's no need for some > branch to go off and spend a year rewriting the world. > > Obviously there are still a lot of details to work out, though. But > overall, there was widespread agreement that this is one of the #1 > pain points for our users (e.g. it's the single main request from > pandas), and fixing it is very high priority. > > Some features that would become straightforward to implement > (e.g. even in third-party libraries) if this were fixed: > - missing value support > - physical unit tracking (meters / seconds -> array of velocity; > meters + seconds -> error) > - better and more diverse datetime representations (e.g. datetimes > with attached timezones, or using funky geophysical or > astronomical calendars) > - categorical data > - variable length strings > - strings-with-encodings (e.g. latin1) > - forward mode automatic differentiation (write a function that > computes f(x) where x is an array of float64; pass that function > an array with a special dtype and get out both f(x) and f'(x)) > - probably others I'm forgetting right now > > I should also note that there was one substantial objection to this > plan, from Travis Oliphant (in discussions later in the > conference). I'm not confident I understand his objections well > enough to reproduce them here, though -- perhaps he'll elaborate. > > > Money > ===== > > There was an extensive discussion on the topic of: "if we had money, > what would we do with it?" 
> > This is partially motivated by the realization that there are a > number of sources that we could probably get money from, if we had a > good story for what we wanted to do, so it's not just an idle > question. > > Points of general agreement: > > - Doing the in-person meeting was a good thing. We should plan do > that again, at least once a year. So one thing to spend money on > is travel subsidies to make sure that happens and is productive. > > - While it's tempting to imagine hiring junior people for the more > frustrating/boring work like maintaining buildbots, release > infrastructure, updating docs, etc., this seems difficult to do > realistically with our current resources -- how do we hire for > this, who would manage them, etc.? > > - On the other hand, the general feeling was that if we found the > money to hire a few more senior people who could take care of > themselves more, then that would be good and we could > realistically absorb that extra work without totally unbalancing > the project. > > - A major open question is how we would recruit someone for a > position like this, since apparently all the obvious candidates > who are already active on the NumPy team already have other > things going on. [For calibration on how hard this can be: NYU > has apparently had an open position for a year with the job > description of "come work at NYU full-time with a > private-industry-competitive-salary on whatever your personal > open-source scientific project is" (!) and still is having an > extremely difficult time filling it: > [http://cds.nyu.edu/research-engineer/]] > > - General consensus though was that there isn't much to be done > about this though, except try it and see. > > - (By the way, if you're someone who's reading this and > potentially interested in like a postdoc or better working on > numpy, then let's talk...) > > > More specific changes to numpy that had general consensus, but don't > really fit into a high-level roadmap > > ========================================================================================================= > > - Resolved: we should merge multiarray.so and umath.so into a single > extension module, so that they can share utility code without the > current awkward contortions. > > - Resolved: we should start hiding new fields in the ufunc and dtype > structs as soon as possible going forward. (I.e. they would not be > present in the version of the structs that are exposed through the > C API, but internally we would use a more detailed struct.) > - Mayyyyyybe we should even go ahead and hide the subset of the > existing fields that are really internal details that no-one > should be using. If we did this without changing anything else > then it would preserve ABI (the fields would still be where > existing compiled extensions expect them to be, if any such > extensions exist) while breaking API (trying to compile such > extensions would give a clear error), so would be a smoother > ramp if we think we need to eventually break those fields for > real. (As discussed above, there are a bunch of fields in the > dtype base class that only make sense for specific dtype > subclasses, e.g. only record dtypes need a list of field names, > but right now all dtypes have one anyway. So it would be nice to > remove these from the base class entirely, but that is > potentially ABI-breaking.) > > - Resolved: np.array should never return an object array unless > explicitly requested (e.g. 
with dtype=object); it just causes too > many surprising problems. > - First step: add a deprecation warning > - Eventually: make it an error. > > - The matrix class > - Resolved: We won't add warnings yet, but we will prominently > document that it is deprecated and should be avoided where-ever > possible. > - St?fan van der Walt volunteers to do this. > - We'd all like to deprecate it properly, but the feeling was that > the precondition for this is for scipy.sparse to provide sparse > "arrays" that don't return np.matrix objects on ordinary > operatoins. Until that happens we can't reasonably tell people > that using np.matrix is a bug. > > - Resolved: we should add a similar prominent note to the > "subclassing ndarray" documentation, warning people that this is > painful and barely works and please don't do it if you have any > alternatives. > > - Resolved: we want more, smaller releases -- every 6 months at > least, aiming to go even faster (every 4 months?) > > - On the question of using Cython inside numpy core: > - Everyone agrees that there are places where this would be an > improvement (e.g., Python<->C interfaces, and places "when you > want to do computer science", e.g. complicated algorithmic stuff > like graph traversals) > - Chuck wanted it to be clear though that he doesn't think it > would be a good goal to try and rewrite all of numpy in Cython > -- there also exist places where Cython ends up being "an uglier > version of C". No-one disagreed. > > - Our text reader is apparently not very functional on Python 3, and > generally slow and hard to work with. > - Resolved: We should extract Pandas's awesome text reader/parser > and convert it into its own package, that could then become a > new backend for both pandas and numpy.loadtxt. > - Jeff thinks this is a great idea > - Thomas Caswell volunteers to do the extraction. > > - We should work on improving our tools for evolving the ABI, so > that we will eventually be less constrained by decisions made > decades ago. > - One idea that had a lot of support was to switch from our > current append-only C-API to a "sliding window" API based on > explicit versions. So a downstream package might say > > #define NUMPY_API_VERSION 4 > > and they'd get the functions and behaviour provided in "version > 4" of the numpy C api. If they wanted to get access to new stuff > that was added in version 5, then they'd need to switch that > #define, and at the same time clean up any usage of stuff that > was removed or changed in version 5. And to provide a smooth > migration path, one version of numpy would support multiple > versions at once, gradually deprecating and dropping old > versions. > > - If anyone wants to help bring pip up to scratch WRT tracking ABI > dependencies (e.g., 'pip install numpy==' > -> triggers rebuild of scipy against the new ABI), then that > would be an extremely useful thing. > > > Policies that should be documented > ================================== > > ...together with some notes about what the contents of the document > should be: > > > How we manage bugs in the bug tracker. > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > - Github "milestones" should *only* be assigned to release-blocker > bugs (which mostly means "regression from the last release"). > > In particular, if you're tempted to push a bug forward to the next > release... then it's clearly not a blocker, so don't set it to the > next release's milestone, just remove the milestone entirely. 
> > (Obvious exception to this: deprecation followup bugs where we > decide that we want to keep the deprecation around a bit longer > are a case where a bug actually does switch from being a blocker > to release 1.x to being a blocker for release 1.(x+1).) > > - Don't hesitate to close an issue if there's no way forward -- > e.g. a PR where the author has disappeared. Just post a link to > this policy and close, with a polite note that we need to keep our > tracker useful as a todo list, but they're welcome to re-open if > things change. > > > Deprecations and breakage policy: > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > - How long do we need to keep DeprecationWarnings around before we > break things? This is tricky because on the one hand an aggressive > (short) deprecation period lets us deliver new features and > important cleanups more quickly, but on the other hand a > too-aggressive deprecation period is difficult for our more > conservative downstream users. > > - Idea that had the most support: pick a somewhat-aggressive > warning period as our default, and make a rule that if someone > asks for an extension during the beta cycle for the release that > removes it, then we put it back for another release or two worth > of grace period. (While also possibly upgrading the warning to > be more visible during the grace period.) This gives us > deprecation periods that are more adaptive on a case-by-case > basis. > > - Lament: it would be really nice if we could get more people to > test our beta releases, because in practice right now 1.x.0 ends > up being where we actually discover all the bugs, and 1.x.1 is > where it actually becomes usable. Which sucks, and makes it > difficult to have a solid policy about what counts as a > regression, etc. Is there anything we can do about this? > Just a note in here - have you all thought about running the test suites for downstream projects as part of the numpy test suite? Thanks so much for the summary - lots of interesting ideas in here! > > - ABI breakage: we distinguish between an ABI break that breaks > everything (e.g., "import scipy" segfaults), versus an ABI break > that breaks an occasional rare case (e.g., only apps that poke > around in some obscure corner of some struct are affected). > > - The "break-the-world" type remains off-limits for now: the pain > is still too large (conda helps, but there are lots of people > who don't use conda!), and there aren't really any compelling > improvements that this would enable anyway. 
> - The concern is that the current behavior is not very useful, and > as far as we can tell no-one is using it; but, as people get > used to the more-useful PEP 465 behavior, they will increasingly > try to use it on the assumption that np.dot will work the same > way, and this will create pain for lots of people. So Nathaniel > argued that we should start at least issuing a visible warning > when people invoke the corner-case behavior. > - But OTOH, np.dot is such a core piece of infrastructure, and > there's such a large landscape of code out there using numpy > that we can't see, that others were reasonably wary of making > any change. > - For now: document prominently, but no change in behavior. > > > Links to raw notes > ================== > > Main page: > [https://github.com/numpy/numpy/wiki/SciPy-2015-developer-meeting] > > Notes from the meeting proper: > [ > https://docs.google.com/document/d/1IJcYdsHtk8MVAM4AZqFDBSf_nVG-mrB4Tv2bh9u1g4Y/edit?usp=sharing > ] > > Slides from the followup BoF: > [ > https://gist.github.com/njsmith/eb42762054c88e810786/raw/b74f978ce10a972831c582485c80fb5b8e68183b/future-of-numpy-bof.odp > ] > > Notes from the followup BoF: > [ > https://docs.google.com/document/d/11AuTPms5dIPo04JaBOWEoebXfk-tUzEZ-CvFnLIt33w/edit > ] > > -n > > -- > Nathaniel J. Smith -- http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Tue Aug 25 15:00:54 2015 From: travis at continuum.io (Travis Oliphant) Date: Tue, 25 Aug 2015 14:00:54 -0500 Subject: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015 In-Reply-To: References: Message-ID: Thanks for the write-up Nathaniel. There is a lot of great detail and interesting ideas here. I've am very eager to understand how to help NumPy and the wider community move forward however I can (my passions on this have not changed since 1999, though what I myself spend time on has changed). There are a lot of ways to think about approaching this, though. It's hard to get all the ideas on the table, and it was unfortunate we couldn't get everybody wyho are core NumPy devs together in person to have this discussion as there are still a lot of questions unanswered and a lot of thought that has gone into other approaches that was not brought up or represented in the meeting (how does Numba fit into this, what about data-shape, dynd, memory-views and Python type system, etc.). If NumPy becomes just an interface-specification, then why don't we just do that *outside* NumPy itself in a way that doesn't jeopardize the stability of NumPy today. These are some of the real questions I have. I will try to write up my thoughts in more depth soon, but I won't be able to respond in-depth right now. I just wanted to comment because Nathaniel said I disagree which is only partly true. The three most important things for me are 1) let's make sure we have representation from as wide of the community as possible (this is really hard), 2) let's look around at the broader community and the prior art that is happening in this space right now and 3) let's not pretend we are going to be able to make all this happen without breaking ABI compatibility. 
Let's just break ABI compatibility with NumPy 2.0 *and* have as much fidelity with the API and semantics of current NumPy as possible (though there will be some changes necessary long-term). I don't think we should intentionally break ABI if we can avoid it, but I also don't think we should spend in-ordinate amounts of time trying to pretend that we won't break ABI (for at least some people), and most importantly we should not pretend *not* to break the ABI when we actually do. We did this once before with the roll-out of date-time, and it was really un-necessary. When I released NumPy 1.0, there were several things that I knew should be fixed very soon (NumPy was never designed to not break ABI). Those problems are still there. Now, that we have quite a bit better understanding of what NumPy *should* be (there have been tremendous strides in understanding and community size over the past 10 years), let's actually make the infrastructure we think will last for the next 20 years (instead of trying to shoe-horn new ideas into a 20-year old code-base that wasn't designed for it). NumPy is a hard code-base. It has been since Numeric days in 1995. I could be wrong, but my guess is that we will be passed by as a community if we don't seize the opportunity to build something better than we can build if we are forced to use a 20 year old code-base. It is more important to not break people's code and to be clear when a re-compile is necessary for dependencies. Those to me are the most important constraints. There are a lot of great ideas that we all have about what we want NumPy to be able to do. Some of this are pretty transformational (and the more exciting they are, the harder I think they are going to be to implement without breaking at least the ABI). There is probably some CAP-like theorem around Stability-Features-Speed-of-Development (pick 2) when it comes to Open Source Software development and making feature-progress with NumPy *is going* to create in-stability which concerns me. I would like to see a little-bit-of-pain one time with a NumPy 2.0, rather than a constant pain because of constant churn over many years approach that Nathaniel seems to advocate. To me NumPy 2.0 is an ABI-breaking release that is as API-compatible as possible and whose semantics are not dramatically different. There are at least 3 areas of compatibility (ABI, API, and semantic). ABI-compatibility is a non-feature in today's world. There are so many distributions of the NumPy stack (and conda makes it trivial for anyone to build their own or for you to build one yourself). Making less-optimal software-engineering choices because of fear of breaking the ABI is not something I'm supportive of at all. We should not break ABI every release, but a release every 3 years that breaks ABI is not a problem. API compatibility should be much more sacrosanct, but it is also something that can also be managed. Any NumPy 2.0 should definitely support the full NumPy API (though there could be deprecated swaths). I think the community has done well in using deprecation and limiting the public API to make this more manageable and I would love to see a NumPy 2.0 that solidifies a future-oriented API along with a back-ward compatible API that is also available. Semantic compatibility is the hardest. We have already broken this on multiple occasions throughout the 1.x NumPy releases. Every time you change the code, this can change. This is what I fear causing deep instability over the course of many years. 
These are things like the casting rule details, the effect of indexing changes, any change to the calculations approaches. It is and has been the most at risk during any code-changes. My view is that a NumPy 2.0 (with a new low-level architecture) minimizes these changes to a single release rather than unavoidably spreading them out over many, many releases. I think that summarizes my main concerns. I will write-up more forward thinking ideas for what else is possible in the coming weeks. In the mean time, thanks for keeping the discussion going. It is extremely exciting to see the help people have continued to provide to maintain and improve NumPy. It will be exciting to see what the next few years bring as well. Best, -Travis On Tue, Aug 25, 2015 at 5:03 AM, Nathaniel Smith wrote: > Hi all, > > These are the notes from the NumPy dev meeting held July 7, 2015, at > the SciPy conference in Austin, presented here so the list can keep up > with what happens, and so you can give feedback. Please do give > feedback, none of this is final! > > (Also, if anyone who was there notices anything I left out or > mischaracterized, please speak up -- these are a lot of notes I'm > trying to gather together, so I could easily have missed something!) > > Thanks to Jill Cowan and the rest of the SciPy organizers for donating > space and organizing logistics for us, and to the Berkeley Institute > for Data Science for funding travel for Jaime, Nathaniel, and > Sebastian. > > > Attendees > ========= > > Present in the room for all or part: Daniel Allan, Chris Barker, > Sebastian Berg, Thomas Caswell, Jeff Reback, Jaime Fern?ndez del > R?o, Chuck Harris, Nathaniel Smith, St?fan van der Walt. (Note: I'm > pretty sure this list is incomplete) > > Joining remotely for all or part: Stephan Hoyer, Julian Taylor. > > > Formalizing our governance/decision making > ========================================== > > This was a major focus of discussion. At a high level, the consensus > was to steal IPython's governance document ("IPEP 29") and modify it > to remove its use of a BDFL as a "backstop" to normal community > consensus-based decision, and replace it with a new "backstop" based > on Apache-project-style consensus voting amongst the core team. > > I'll send out a proper draft of this shortly for further discussion. > > > Development roadmap > =================== > > General consensus: > > Let's assume NumPy is going to remain important indefinitely, and > try to make it better, instead of waiting for something better to > come along. (This is unlikely to be wasted effort even if something > better does come along, and it's hardly a sure thing that that will > happen anyway.) > > Let's focus on evolving numpy as far as we can without major > break-the-world changes (no "numpy 2.0", at least in the foreseeable > future). > > And, as a target for that evolution, let's change our focus from > numpy as "NumPy is the library that gives you the np.ndarray object > (plus some attached infrastructure)", to "NumPy provides the > standard framework for working with arrays and array-like objects in > Python" > > This means, creating defined interfaces between array-like objects / > ufunc objects / dtype objects, so that it becomes possible for third > parties to add their own and mix-and-match. Right now ufuncs are > pretty good at this, but if you want a new array class or dtype then > in most cases you pretty much have to modify numpy itself. 
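For concreteness, here is a minimal sketch of the kind of third-party hook this "defined interfaces" idea is pointing at -- the __numpy_ufunc__ protocol named in the sub-goals below. The ScaledArray class and the direct call at the end are illustrative assumptions only: the hook had not shipped in a released NumPy at the time of this thread, and its final signature could still change.

    import numpy as np

    class ScaledArray(object):
        """Toy array-like container: data plus a lazy scale factor."""

        def __init__(self, data, scale=1.0):
            self.data = np.asarray(data, dtype=np.float64)
            self.scale = float(scale)

        def __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs):
            # Proposed hook: numpy would call this instead of coercing
            # `self` to an ndarray; `i` is the position of self in `inputs`.
            if method != "__call__":
                return NotImplemented
            args = [x.scale * x.data if isinstance(x, ScaledArray) else x
                    for x in inputs]
            return ScaledArray(ufunc(*args, **kwargs))

    a = ScaledArray([1.0, 2.0, 3.0], scale=10.0)
    # Once the hook is wired up, np.sin(a) would return a ScaledArray.
    # Released NumPy does not dispatch yet, so call the hook directly
    # to show the intended control flow:
    result = a.__numpy_ufunc__(np.sin, "__call__", 0, (a,))
    print(result.data)

The point of the protocol is that code written against plain np.* calls keeps working when handed such a container, which is exactly the mix-and-match goal described in the visions that follow.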
> > Vision: instead of everyone who wants a new container type having to > reimplement all of numpy, Alice can implement an array class using > (sparse / distributed / compressed / tiled / gpu / out-of-core / > delayed / ...) storage, pass it to code that was written using > direct calls to np.* functions, and it just works. (Instead of > np.sin being "the way you calculate the sine of an ndarray", it's > "the way you calculate the sine of any array-like container > object".) > > Vision: Darryl can implement a new dtype for (categorical data / > astronomical dates / integers-with-missing-values / ...) without > having to touch the numpy core. > > Vision: Chandni can then come along and combine them by doing > > a = alice_array([...], dtype=darryl_dtype) > > and it just works. > > Vision: no-one is tempted to subclass ndarray, because anything you > can do with an ndarray subclass you can also easily do by defining > your own new class that implements the "array protocol". > > > Supporting third-party array types > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > Sub-goals: > - Get __numpy_ufunc__ done, which will cover a good chunk of numpy's > API right there. > - Go through the rest of the stuff in numpy, and figure out some > story for how to let it handle third-party array classes: > - ufunc ALL the things: Some things can be converted directly into > (g)ufuncs and then use __numpy_ufunc__ (e.g., np.std); some > things could be converted into (g)ufuncs if we extended the > (g)ufunc interface a bit (e.g. np.sort, np.matmul). > - Some things probably need their own __numpy_ufunc__-like > extensions (__numpy_concatenate__?) > - Provide tools to make it easier to implement the more complicated > parts of an array object (e.g. the bazillion different methods, > many of which are ufuncs in disguise, or indexing) > - Longer-run interesting research project: __numpy_ufunc__ requires > that one or the other object have explicit knowledge of how to > handle the other, so to handle binary ufuncs with N array types > you need something like N**2 __numpy_ufunc__ code paths. As an > alternative, if there were some interface that an object could > export that provided the operations nditer needs to efficiently > iterate over (chunks of) it, then you would only need N > implementations of this interface to handle all N**2 operations. > > This would solve a lot of problems for projects like: > - blosc > - dask > - distarray > - numpy.ma > - pandas > - scipy.sparse > - xray > > > Supporting third-party dtypes > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > We already have something like a C level "dtype > protocol". Conceptually, the way you define a new dtype is by > defining a new class whose instances have data attributes defining > the parameters of the dtype (what fields are in *this* record dtype, > how many characters are in *this* string dtype, what units are used > for *this* datetime64, etc.), and you define a bunch of methods to > do things like convert an object from a Python object to your dtype > or vice-versa, to copy an array of your dtype from one place to > another, to cast to and from your new dtype, etc. This part is > great. > > The problem is, in the current implementation, we don't actually use > the Python object system to define these classes / attributes / > methods. 
Instead, all possible dtypes are jammed into a single > Python-level class, whose struct has fields for the union of all > possible dtype's attributes, and instead of Python-style method > slots there's just a big table of function pointers attached to each > object. > > So the main proposal is that we keep the basic design, but switch it > so that the float64 dtype, the int64 dtype, etc. actually literally > are subclasses of np.dtype, each implementing their own fields and > Python-style methods. > > Some of the pieces involved in doing this: > > - The current dtype methods should be cleaned up -- e.g. 'dot' and > 'less_than' are both dtype methods, when conceptually they're much > more like ufuncs. > > - The ufunc inner-loop interface currently does not get a reference > to the dtype object, so they can't see its attributes and this is > a big obstacle to many interesting dtypes (e.g., it's hard to > implement np.equal for categoricals if you don't know what > categories each has). So we need to add new arguments to the core > ufunc loop signature. (Fortunately this can be done in a > backwards-compatible way.) > > - We need to figure out what exactly the dtype methods should be, > and add them to the dtype class (possibly with backwards > compatibility shims for anyone who is accessing PyArray_ArrFuncs > directly). > > - Casting will be possibly the trickiest thing to work out, though > the basic idea of using dunder-dispatch-like __cast__ and > __rcast__ methods seems workable. (Encouragingly, this is also > exactly what dynd also does, though unfortunately dynd does not > yet support user-defined dtypes even to the extent that numpy > does, so there isn't much else we can steal from them.) > - We may also want to rethink the casting rules while we're at it, > since they have some very weird corners right now (e.g. see > [https://github.com/numpy/numpy/issues/6240]) > > - We need to migrate the current dtypes over to the new system, > which can be done in stages: > > - First stick them all in a single "legacy dtype" class whose > methods just dispatch to the PyArray_ArrFuncs per-object "method > table" > > - Then move each of them into their own classes > > - We should provide a Python-level wrapper for the protocol, so that > you can call dtype methods from Python > > - And vice-versa, it should be possible to subclass dtype at the > Python level > > - etc. > > Fortunately, AFAICT pretty much all of this can be done while > maintaining backwards compatibility (though we may want to break > some obscure cases to avoid expending *too* much effort with weird > backcompat contortions that will only help a vanishingly small > proportion of the userbase), and a lot of the above changes can be > done as semi-independent mini-projects, so there's no need for some > branch to go off and spend a year rewriting the world. > > Obviously there are still a lot of details to work out, though. But > overall, there was widespread agreement that this is one of the #1 > pain points for our users (e.g. it's the single main request from > pandas), and fixing it is very high priority. > > Some features that would become straightforward to implement > (e.g. even in third-party libraries) if this were fixed: > - missing value support > - physical unit tracking (meters / seconds -> array of velocity; > meters + seconds -> error) > - better and more diverse datetime representations (e.g. 
datetimes > with attached timezones, or using funky geophysical or > astronomical calendars) > - categorical data > - variable length strings > - strings-with-encodings (e.g. latin1) > - forward mode automatic differentiation (write a function that > computes f(x) where x is an array of float64; pass that function > an array with a special dtype and get out both f(x) and f'(x)) > - probably others I'm forgetting right now > > I should also note that there was one substantial objection to this > plan, from Travis Oliphant (in discussions later in the > conference). I'm not confident I understand his objections well > enough to reproduce them here, though -- perhaps he'll elaborate. > > > Money > ===== > > There was an extensive discussion on the topic of: "if we had money, > what would we do with it?" > > This is partially motivated by the realization that there are a > number of sources that we could probably get money from, if we had a > good story for what we wanted to do, so it's not just an idle > question. > > Points of general agreement: > > - Doing the in-person meeting was a good thing. We should plan do > that again, at least once a year. So one thing to spend money on > is travel subsidies to make sure that happens and is productive. > > - While it's tempting to imagine hiring junior people for the more > frustrating/boring work like maintaining buildbots, release > infrastructure, updating docs, etc., this seems difficult to do > realistically with our current resources -- how do we hire for > this, who would manage them, etc.? > > - On the other hand, the general feeling was that if we found the > money to hire a few more senior people who could take care of > themselves more, then that would be good and we could > realistically absorb that extra work without totally unbalancing > the project. > > - A major open question is how we would recruit someone for a > position like this, since apparently all the obvious candidates > who are already active on the NumPy team already have other > things going on. [For calibration on how hard this can be: NYU > has apparently had an open position for a year with the job > description of "come work at NYU full-time with a > private-industry-competitive-salary on whatever your personal > open-source scientific project is" (!) and still is having an > extremely difficult time filling it: > [http://cds.nyu.edu/research-engineer/]] > > - General consensus though was that there isn't much to be done > about this though, except try it and see. > > - (By the way, if you're someone who's reading this and > potentially interested in like a postdoc or better working on > numpy, then let's talk...) > > > More specific changes to numpy that had general consensus, but don't > really fit into a high-level roadmap > > ========================================================================================================= > > - Resolved: we should merge multiarray.so and umath.so into a single > extension module, so that they can share utility code without the > current awkward contortions. > > - Resolved: we should start hiding new fields in the ufunc and dtype > structs as soon as possible going forward. (I.e. they would not be > present in the version of the structs that are exposed through the > C API, but internally we would use a more detailed struct.) > - Mayyyyyybe we should even go ahead and hide the subset of the > existing fields that are really internal details that no-one > should be using. 
If we did this without changing anything else > then it would preserve ABI (the fields would still be where > existing compiled extensions expect them to be, if any such > extensions exist) while breaking API (trying to compile such > extensions would give a clear error), so would be a smoother > ramp if we think we need to eventually break those fields for > real. (As discussed above, there are a bunch of fields in the > dtype base class that only make sense for specific dtype > subclasses, e.g. only record dtypes need a list of field names, > but right now all dtypes have one anyway. So it would be nice to > remove these from the base class entirely, but that is > potentially ABI-breaking.) > > - Resolved: np.array should never return an object array unless > explicitly requested (e.g. with dtype=object); it just causes too > many surprising problems. > - First step: add a deprecation warning > - Eventually: make it an error. > > - The matrix class > - Resolved: We won't add warnings yet, but we will prominently > document that it is deprecated and should be avoided where-ever > possible. > - St?fan van der Walt volunteers to do this. > - We'd all like to deprecate it properly, but the feeling was that > the precondition for this is for scipy.sparse to provide sparse > "arrays" that don't return np.matrix objects on ordinary > operatoins. Until that happens we can't reasonably tell people > that using np.matrix is a bug. > > - Resolved: we should add a similar prominent note to the > "subclassing ndarray" documentation, warning people that this is > painful and barely works and please don't do it if you have any > alternatives. > > - Resolved: we want more, smaller releases -- every 6 months at > least, aiming to go even faster (every 4 months?) > > - On the question of using Cython inside numpy core: > - Everyone agrees that there are places where this would be an > improvement (e.g., Python<->C interfaces, and places "when you > want to do computer science", e.g. complicated algorithmic stuff > like graph traversals) > - Chuck wanted it to be clear though that he doesn't think it > would be a good goal to try and rewrite all of numpy in Cython > -- there also exist places where Cython ends up being "an uglier > version of C". No-one disagreed. > > - Our text reader is apparently not very functional on Python 3, and > generally slow and hard to work with. > - Resolved: We should extract Pandas's awesome text reader/parser > and convert it into its own package, that could then become a > new backend for both pandas and numpy.loadtxt. > - Jeff thinks this is a great idea > - Thomas Caswell volunteers to do the extraction. > > - We should work on improving our tools for evolving the ABI, so > that we will eventually be less constrained by decisions made > decades ago. > - One idea that had a lot of support was to switch from our > current append-only C-API to a "sliding window" API based on > explicit versions. So a downstream package might say > > #define NUMPY_API_VERSION 4 > > and they'd get the functions and behaviour provided in "version > 4" of the numpy C api. If they wanted to get access to new stuff > that was added in version 5, then they'd need to switch that > #define, and at the same time clean up any usage of stuff that > was removed or changed in version 5. And to provide a smooth > migration path, one version of numpy would support multiple > versions at once, gradually deprecating and dropping old > versions. 
> > - If anyone wants to help bring pip up to scratch WRT tracking ABI > dependencies (e.g., 'pip install numpy==' > -> triggers rebuild of scipy against the new ABI), then that > would be an extremely useful thing. > > > Policies that should be documented > ================================== > > ...together with some notes about what the contents of the document > should be: > > > How we manage bugs in the bug tracker. > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > - Github "milestones" should *only* be assigned to release-blocker > bugs (which mostly means "regression from the last release"). > > In particular, if you're tempted to push a bug forward to the next > release... then it's clearly not a blocker, so don't set it to the > next release's milestone, just remove the milestone entirely. > > (Obvious exception to this: deprecation followup bugs where we > decide that we want to keep the deprecation around a bit longer > are a case where a bug actually does switch from being a blocker > to release 1.x to being a blocker for release 1.(x+1).) > > - Don't hesitate to close an issue if there's no way forward -- > e.g. a PR where the author has disappeared. Just post a link to > this policy and close, with a polite note that we need to keep our > tracker useful as a todo list, but they're welcome to re-open if > things change. > > > Deprecations and breakage policy: > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > - How long do we need to keep DeprecationWarnings around before we > break things? This is tricky because on the one hand an aggressive > (short) deprecation period lets us deliver new features and > important cleanups more quickly, but on the other hand a > too-aggressive deprecation period is difficult for our more > conservative downstream users. > > - Idea that had the most support: pick a somewhat-aggressive > warning period as our default, and make a rule that if someone > asks for an extension during the beta cycle for the release that > removes it, then we put it back for another release or two worth > of grace period. (While also possibly upgrading the warning to > be more visible during the grace period.) This gives us > deprecation periods that are more adaptive on a case-by-case > basis. > > - Lament: it would be really nice if we could get more people to > test our beta releases, because in practice right now 1.x.0 ends > up being where we actually the discover all the bugs, and 1.x.1 is > where it actually becomes usable. Which sucks, and makes it > difficult to have a solid policy about what counts as a > regression, etc. Is there anything we can do about this? > > - ABI breakage: we distinguish between an ABI break that breaks > everything (e.g., "import scipy" segfaults), versus an ABI break > that breaks an occasional rare case (e.g., only apps that poke > around in some obscure corner of some struct are affected). > > - The "break-the-world" type remains off-limit for now: the pain > is still too large (conda helps, but there are lots of people > who don't use conda!), and there aren't really any compelling > improvements that this would enable anyway. 
> > - For the "break-0.1%-of-users" type, it is *not* ruled out by > fiat, though we remain conservative: we should treat it like > other API breaks in principle, and do a careful case-by-case > analysis of the details of the situation, taking into account > what kind of code would be broken, how common these cases are, > how important the benefits are, whether there are any specific > mitigation strategies we can use, etc. -- with this process of > course taking into account that a segfault is nastier than a > Python exception. > > > Other points that were discussed > ================================ > > - There was inconclusive discussion of what we should do with dot() > in the places where it disagrees with the PEP 465 matmul semantics > (specifically this is when both arguments have ndim >= 3, or one > argument has ndim == 0). > - The concern is that the current behavior is not very useful, and > as far as we can tell no-one is using it; but, as people get > used to the more-useful PEP 465 behavior, they will increasingly > try to use it on the assumption that np.dot will work the same > way, and this will create pain for lots of people. So Nathaniel > argued that we should start at least issuing a visible warning > when people invoke the corner-case behavior. > - But OTOH, np.dot is such a core piece of infrastructure, and > there's such a large landscape of code out there using numpy > that we can't see, that others were reasonably wary of making > any change. > - For now: document prominently, but no change in behavior. > > > Links to raw notes > ================== > > Main page: > [https://github.com/numpy/numpy/wiki/SciPy-2015-developer-meeting] > > Notes from the meeting proper: > [ > https://docs.google.com/document/d/1IJcYdsHtk8MVAM4AZqFDBSf_nVG-mrB4Tv2bh9u1g4Y/edit?usp=sharing > ] > > Slides from the followup BoF: > [ > https://gist.github.com/njsmith/eb42762054c88e810786/raw/b74f978ce10a972831c582485c80fb5b8e68183b/future-of-numpy-bof.odp > ] > > Notes from the followup BoF: > [ > https://docs.google.com/document/d/11AuTPms5dIPo04JaBOWEoebXfk-tUzEZ-CvFnLIt33w/edit > ] > > -n > > -- > Nathaniel J. Smith -- http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *Travis Oliphant* *Co-founder and CEO* @teoliphant 512-222-5440 http://www.continuum.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Tue Aug 25 15:21:59 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 25 Aug 2015 21:21:59 +0200 Subject: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015 References: Message-ID: <20150825212159.39cee394@fsol> On Tue, 25 Aug 2015 03:03:41 -0700 Nathaniel Smith wrote: > > Supporting third-party dtypes > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > [...] > > Some features that would become straightforward to implement > (e.g. even in third-party libraries) if this were fixed: > - missing value support > - physical unit tracking (meters / seconds -> array of velocity; > meters + seconds -> error) > - better and more diverse datetime representations (e.g. datetimes > with attached timezones, or using funky geophysical or > astronomical calendars) > - categorical data > - variable length strings > - strings-with-encodings (e.g. 
latin1) > - forward mode automatic differentiation (write a function that > computes f(x) where x is an array of float64; pass that function > an array with a special dtype and get out both f(x) and f'(x)) > - probably others I'm forgetting right now It should also be the opportunity to streamline datetime64 and timedelta64 dtypes. Currently the unit information is IIRC hidden in some weird metadata thing called the PyArray_DatetimeMetaData. Also, thanks the notes. It has been an interesting read. Regards Antoine. From rainwoodman at gmail.com Tue Aug 25 15:46:11 2015 From: rainwoodman at gmail.com (Feng Yu) Date: Tue, 25 Aug 2015 12:46:11 -0700 Subject: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015 In-Reply-To: <20150825212159.39cee394@fsol> References: <20150825212159.39cee394@fsol> Message-ID: Hi Nathaniel, Thanks for the notes. In some sense, the new dtype class(es) will provided a way of formalizing these `weird` metadata, and probably exposing them to Python. May I add that please consider adding a way to declare the sorting order (priority and direction) of fields in a structured array in the new dtype as well? Regards, Yu On Tue, Aug 25, 2015 at 12:21 PM, Antoine Pitrou wrote: > On Tue, 25 Aug 2015 03:03:41 -0700 > Nathaniel Smith wrote: >> >> Supporting third-party dtypes >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> > [...] >> >> Some features that would become straightforward to implement >> (e.g. even in third-party libraries) if this were fixed: >> - missing value support >> - physical unit tracking (meters / seconds -> array of velocity; >> meters + seconds -> error) >> - better and more diverse datetime representations (e.g. datetimes >> with attached timezones, or using funky geophysical or >> astronomical calendars) >> - categorical data >> - variable length strings >> - strings-with-encodings (e.g. latin1) >> - forward mode automatic differentiation (write a function that >> computes f(x) where x is an array of float64; pass that function >> an array with a special dtype and get out both f(x) and f'(x)) >> - probably others I'm forgetting right now > > It should also be the opportunity to streamline datetime64 and > timedelta64 dtypes. Currently the unit information is IIRC hidden in > some weird metadata thing called the PyArray_DatetimeMetaData. > > Also, thanks the notes. It has been an interesting read. > > Regards > > Antoine. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Tue Aug 25 16:58:46 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 25 Aug 2015 14:58:46 -0600 Subject: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015 In-Reply-To: References: Message-ID: On Tue, Aug 25, 2015 at 1:00 PM, Travis Oliphant wrote: > Thanks for the write-up Nathaniel. There is a lot of great detail and > interesting ideas here. > > I've am very eager to understand how to help NumPy and the wider community > move forward however I can (my passions on this have not changed since > 1999, though what I myself spend time on has changed). > > There are a lot of ways to think about approaching this, though. 
It's > hard to get all the ideas on the table, and it was unfortunate we couldn't > get everybody wyho are core NumPy devs together in person to have this > discussion as there are still a lot of questions unanswered and a lot of > thought that has gone into other approaches that was not brought up or > represented in the meeting (how does Numba fit into this, what about > data-shape, dynd, memory-views and Python type system, etc.). If NumPy > becomes just an interface-specification, then why don't we just do that > *outside* NumPy itself in a way that doesn't jeopardize the stability of > NumPy today. These are some of the real questions I have. I will try > to write up my thoughts in more depth soon, but I won't be able to respond > in-depth right now. I just wanted to comment because Nathaniel said I > disagree which is only partly true. > > The three most important things for me are 1) let's make sure we have > representation from as wide of the community as possible (this is really > hard), 2) let's look around at the broader community and the prior art that > is happening in this space right now and 3) let's not pretend we are going > to be able to make all this happen without breaking ABI compatibility. > Let's just break ABI compatibility with NumPy 2.0 *and* have as much > fidelity with the API and semantics of current NumPy as possible (though > there will be some changes necessary long-term). > > I don't think we should intentionally break ABI if we can avoid it, but I > also don't think we should spend in-ordinate amounts of time trying to > pretend that we won't break ABI (for at least some people), and most > importantly we should not pretend *not* to break the ABI when we actually > do. We did this once before with the roll-out of date-time, and it was > really un-necessary. When I released NumPy 1.0, there were several > things that I knew should be fixed very soon (NumPy was never designed to > not break ABI). Those problems are still there. Now, that we have > quite a bit better understanding of what NumPy *should* be (there have been > tremendous strides in understanding and community size over the past 10 > years), let's actually make the infrastructure we think will last for the > next 20 years (instead of trying to shoe-horn new ideas into a 20-year old > code-base that wasn't designed for it). > > NumPy is a hard code-base. It has been since Numeric days in 1995. I > could be wrong, but my guess is that we will be passed by as a community if > we don't seize the opportunity to build something better than we can build > if we are forced to use a 20 year old code-base. > > It is more important to not break people's code and to be clear when a > re-compile is necessary for dependencies. Those to me are the most > important constraints. There are a lot of great ideas that we all have > about what we want NumPy to be able to do. Some of this are pretty > transformational (and the more exciting they are, the harder I think they > are going to be to implement without breaking at least the ABI). There > is probably some CAP-like theorem around > Stability-Features-Speed-of-Development (pick 2) when it comes to Open > Source Software development and making feature-progress with NumPy *is > going* to create in-stability which concerns me. > > I would like to see a little-bit-of-pain one time with a NumPy 2.0, rather > than a constant pain because of constant churn over many years approach > that Nathaniel seems to advocate. 
To me NumPy 2.0 is an ABI-breaking > release that is as API-compatible as possible and whose semantics are not > dramatically different. > > There are at least 3 areas of compatibility (ABI, API, and semantic). > ABI-compatibility is a non-feature in today's world. There are so many > distributions of the NumPy stack (and conda makes it trivial for anyone to > build their own or for you to build one yourself). Making less-optimal > software-engineering choices because of fear of breaking the ABI is not > something I'm supportive of at all. We should not break ABI every > release, but a release every 3 years that breaks ABI is not a problem. > > API compatibility should be much more sacrosanct, but it is also something > that can also be managed. Any NumPy 2.0 should definitely support the > full NumPy API (though there could be deprecated swaths). I think the > community has done well in using deprecation and limiting the public API to > make this more manageable and I would love to see a NumPy 2.0 that > solidifies a future-oriented API along with a back-ward compatible API that > is also available. > > Semantic compatibility is the hardest. We have already broken this on > multiple occasions throughout the 1.x NumPy releases. Every time you > change the code, this can change. This is what I fear causing deep > instability over the course of many years. These are things like the > casting rule details, the effect of indexing changes, any change to the > calculations approaches. It is and has been the most at risk during any > code-changes. My view is that a NumPy 2.0 (with a new low-level > architecture) minimizes these changes to a single release rather than > unavoidably spreading them out over many, many releases. > > I think that summarizes my main concerns. I will write-up more forward > thinking ideas for what else is possible in the coming weeks. In the mean > time, thanks for keeping the discussion going. It is extremely exciting to > see the help people have continued to provide to maintain and improve > NumPy. It will be exciting to see what the next few years bring as > well. > I think the only thing that looks even a little bit like a numpy 2.0 at this time is dynd. Rewriting numpy, let alone producing numpy 2.0 is a major project. Dynd is 2.5+ years old, 3500+ commits in, and still in progress. If there is a decision to pursue Dynd I could support that, but I think we would want to think deeply about how to make the transition as painless as possible. It would be good at this point to get some feedback from people currently using dynd. IIRC, part of the reason for starting dynd was the perception that is was not possible to evolve numpy without running into compatibility road blocks. Travis, could you perhaps summarize the thinking that went into the decision to make dynd a separate project? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Tue Aug 25 20:53:00 2015 From: cournape at gmail.com (David Cournapeau) Date: Wed, 26 Aug 2015 01:53:00 +0100 Subject: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015 In-Reply-To: References: Message-ID: Thanks for the good summary Nathaniel. Regarding dtype machinery, I agree casting is the hardest part. Unless the code has changed dramatically, this was the main reason why you could not make most of the dtypes separate from numpy codebase (I tried to move the datetime dtype out of multiarray into a separate C extension some years ago). 
Being able to separate the dtypes from the multiarray module would be an obvious way to drive the internal API change. Regarding the use of cython in numpy, was there any discussion about the compilation/size cost of using cython, and talking to the cython team to improve this ? Or was that considered acceptable with current cython for numpy. I am convinced cleanly separating the low level parts from the python C API plumbing would be the single most important thing one could do to make the codebase more amenable. David On Tue, Aug 25, 2015 at 9:58 PM, Charles R Harris wrote: > > > On Tue, Aug 25, 2015 at 1:00 PM, Travis Oliphant > wrote: > >> Thanks for the write-up Nathaniel. There is a lot of great detail and >> interesting ideas here. >> >> I've am very eager to understand how to help NumPy and the wider >> community move forward however I can (my passions on this have not changed >> since 1999, though what I myself spend time on has changed). >> >> There are a lot of ways to think about approaching this, though. It's >> hard to get all the ideas on the table, and it was unfortunate we couldn't >> get everybody wyho are core NumPy devs together in person to have this >> discussion as there are still a lot of questions unanswered and a lot of >> thought that has gone into other approaches that was not brought up or >> represented in the meeting (how does Numba fit into this, what about >> data-shape, dynd, memory-views and Python type system, etc.). If NumPy >> becomes just an interface-specification, then why don't we just do that >> *outside* NumPy itself in a way that doesn't jeopardize the stability of >> NumPy today. These are some of the real questions I have. I will try >> to write up my thoughts in more depth soon, but I won't be able to respond >> in-depth right now. I just wanted to comment because Nathaniel said I >> disagree which is only partly true. >> >> The three most important things for me are 1) let's make sure we have >> representation from as wide of the community as possible (this is really >> hard), 2) let's look around at the broader community and the prior art that >> is happening in this space right now and 3) let's not pretend we are going >> to be able to make all this happen without breaking ABI compatibility. >> Let's just break ABI compatibility with NumPy 2.0 *and* have as much >> fidelity with the API and semantics of current NumPy as possible (though >> there will be some changes necessary long-term). >> >> I don't think we should intentionally break ABI if we can avoid it, but I >> also don't think we should spend in-ordinate amounts of time trying to >> pretend that we won't break ABI (for at least some people), and most >> importantly we should not pretend *not* to break the ABI when we actually >> do. We did this once before with the roll-out of date-time, and it was >> really un-necessary. When I released NumPy 1.0, there were several >> things that I knew should be fixed very soon (NumPy was never designed to >> not break ABI). Those problems are still there. Now, that we have >> quite a bit better understanding of what NumPy *should* be (there have been >> tremendous strides in understanding and community size over the past 10 >> years), let's actually make the infrastructure we think will last for the >> next 20 years (instead of trying to shoe-horn new ideas into a 20-year old >> code-base that wasn't designed for it). >> >> NumPy is a hard code-base. It has been since Numeric days in 1995. 
I >> could be wrong, but my guess is that we will be passed by as a community if >> we don't seize the opportunity to build something better than we can build >> if we are forced to use a 20 year old code-base. >> >> It is more important to not break people's code and to be clear when a >> re-compile is necessary for dependencies. Those to me are the most >> important constraints. There are a lot of great ideas that we all have >> about what we want NumPy to be able to do. Some of this are pretty >> transformational (and the more exciting they are, the harder I think they >> are going to be to implement without breaking at least the ABI). There >> is probably some CAP-like theorem around >> Stability-Features-Speed-of-Development (pick 2) when it comes to Open >> Source Software development and making feature-progress with NumPy *is >> going* to create in-stability which concerns me. >> >> I would like to see a little-bit-of-pain one time with a NumPy 2.0, >> rather than a constant pain because of constant churn over many years >> approach that Nathaniel seems to advocate. To me NumPy 2.0 is an >> ABI-breaking release that is as API-compatible as possible and whose >> semantics are not dramatically different. >> >> There are at least 3 areas of compatibility (ABI, API, and semantic). >> ABI-compatibility is a non-feature in today's world. There are so many >> distributions of the NumPy stack (and conda makes it trivial for anyone to >> build their own or for you to build one yourself). Making less-optimal >> software-engineering choices because of fear of breaking the ABI is not >> something I'm supportive of at all. We should not break ABI every >> release, but a release every 3 years that breaks ABI is not a problem. >> >> API compatibility should be much more sacrosanct, but it is also >> something that can also be managed. Any NumPy 2.0 should definitely >> support the full NumPy API (though there could be deprecated swaths). I >> think the community has done well in using deprecation and limiting the >> public API to make this more manageable and I would love to see a NumPy 2.0 >> that solidifies a future-oriented API along with a back-ward compatible API >> that is also available. >> >> Semantic compatibility is the hardest. We have already broken this on >> multiple occasions throughout the 1.x NumPy releases. Every time you >> change the code, this can change. This is what I fear causing deep >> instability over the course of many years. These are things like the >> casting rule details, the effect of indexing changes, any change to the >> calculations approaches. It is and has been the most at risk during any >> code-changes. My view is that a NumPy 2.0 (with a new low-level >> architecture) minimizes these changes to a single release rather than >> unavoidably spreading them out over many, many releases. >> >> I think that summarizes my main concerns. I will write-up more forward >> thinking ideas for what else is possible in the coming weeks. In the mean >> time, thanks for keeping the discussion going. It is extremely exciting to >> see the help people have continued to provide to maintain and improve >> NumPy. It will be exciting to see what the next few years bring as >> well. >> > > I think the only thing that looks even a little bit like a numpy 2.0 at > this time is dynd. Rewriting numpy, let alone producing numpy 2.0 is a > major project. Dynd is 2.5+ years old, 3500+ commits in, and still in > progress. 
If there is a decision to pursue Dynd I could support that, but > I think we would want to think deeply about how to make the transition as > painless as possible. It would be good at this point to get some feedback > from people currently using dynd. IIRC, part of the reason for starting > dynd was the perception that is was not possible to evolve numpy without > running into compatibility road blocks. Travis, could you perhaps summarize > the thinking that went into the decision to make dynd a separate project? > > > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Tue Aug 25 23:34:25 2015 From: travis at continuum.io (Travis Oliphant) Date: Tue, 25 Aug 2015 22:34:25 -0500 Subject: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015 In-Reply-To: References: Message-ID: On Tue, Aug 25, 2015 at 3:58 PM, Charles R Harris wrote: > > > On Tue, Aug 25, 2015 at 1:00 PM, Travis Oliphant > wrote: > >> Thanks for the write-up Nathaniel. There is a lot of great detail and >> interesting ideas here. >> >> >> > > There are at least 3 areas of compatibility (ABI, API, and semantic). >> ABI-compatibility is a non-feature in today's world. There are so many >> distributions of the NumPy stack (and conda makes it trivial for anyone to >> build their own or for you to build one yourself). Making less-optimal >> software-engineering choices because of fear of breaking the ABI is not >> something I'm supportive of at all. We should not break ABI every >> release, but a release every 3 years that breaks ABI is not a problem. >> >> API compatibility should be much more sacrosanct, but it is also >> something that can also be managed. Any NumPy 2.0 should definitely >> support the full NumPy API (though there could be deprecated swaths). I >> think the community has done well in using deprecation and limiting the >> public API to make this more manageable and I would love to see a NumPy 2.0 >> that solidifies a future-oriented API along with a back-ward compatible API >> that is also available. >> >> Semantic compatibility is the hardest. We have already broken this on >> multiple occasions throughout the 1.x NumPy releases. Every time you >> change the code, this can change. This is what I fear causing deep >> instability over the course of many years. These are things like the >> casting rule details, the effect of indexing changes, any change to the >> calculations approaches. It is and has been the most at risk during any >> code-changes. My view is that a NumPy 2.0 (with a new low-level >> architecture) minimizes these changes to a single release rather than >> unavoidably spreading them out over many, many releases. >> >> I think that summarizes my main concerns. I will write-up more forward >> thinking ideas for what else is possible in the coming weeks. In the mean >> time, thanks for keeping the discussion going. It is extremely exciting to >> see the help people have continued to provide to maintain and improve >> NumPy. It will be exciting to see what the next few years bring as >> well. >> > > I think the only thing that looks even a little bit like a numpy 2.0 at > this time is dynd. Rewriting numpy, let alone producing numpy 2.0 is a > major project. Dynd is 2.5+ years old, 3500+ commits in, and still in > progress. 
If there is a decision to pursue Dynd I could support that, but > I think we would want to think deeply about how to make the transition as > painless as possible. It would be good at this point to get some feedback > from people currently using dynd. IIRC, part of the reason for starting > dynd was the perception that is was not possible to evolve numpy without > running into compatibility road blocks. Travis, could you perhaps summarize > the thinking that went into the decision to make dynd a separate project? > Thanks Chuck. I'll do this in a separate email, but I just wanted to point out that when I say NumPy 2.0, I'm actually only specifically talking about a release of NumPy that breaks ABI compatibility --- not some potential re-write. I'm not ruling that out, but I'm not necessarily implying such a thing by saying NumPy 2.0. > > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- *Travis Oliphant* *Co-founder and CEO* @teoliphant 512-222-5440 http://www.continuum.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Wed Aug 26 00:55:49 2015 From: travis at continuum.io (Travis Oliphant) Date: Tue, 25 Aug 2015 23:55:49 -0500 Subject: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015 In-Reply-To: References: Message-ID: On Tue, Aug 25, 2015 at 3:58 PM, Charles R Harris wrote: > > > On Tue, Aug 25, 2015 at 1:00 PM, Travis Oliphant > wrote: > >> Thanks for the write-up Nathaniel. There is a lot of great detail and >> interesting ideas here. >> >> >> > > I think that summarizes my main concerns. I will write-up more forward >> thinking ideas for what else is possible in the coming weeks. In the mean >> time, thanks for keeping the discussion going. It is extremely exciting to >> see the help people have continued to provide to maintain and improve >> NumPy. It will be exciting to see what the next few years bring as >> well. >> > > I think the only thing that looks even a little bit like a numpy 2.0 at > this time is dynd. Rewriting numpy, let alone producing numpy 2.0 is a > major project. Dynd is 2.5+ years old, 3500+ commits in, and still in > progress. If there is a decision to pursue Dynd I could support that, but > I think we would want to think deeply about how to make the transition as > painless as possible. It would be good at this point to get some feedback > from people currently using dynd. IIRC, part of the reason for starting > dynd was the perception that is was not possible to evolve numpy without > running into compatibility road blocks. Travis, could you perhaps summarize > the thinking that went into the decision to make dynd a separate project? > I think it would be best if Mark Wiebe speaks up here. I can explain why Continuum supported DyND with some fraction of Mark's time for a few years and give my perspective, but ultimately DyND is Mark's story to tell (and a few talented people have now joined him in the effort). Mark Wiebe was a productive NumPy developer. He was one of a few people that jumped in on the code-base and made substantial and significant changes and came to understand just how hard it can be to develop in the NumPy code-base. He also is a C++ developer who really likes the beauty and power of that language (which definitely biases his NumPy work, but he did put a lot of effort into making NumPy better). 
Before Peter and I started Continuum, Mark had begun the DyND project as an example of a general-purpose dynamic array library that could be used by any dynamic language to make arrays. In the early days of Continuum, we spent time from at least Mark W, Bryan Van de Ven, Jay Borque, and Francesc Alted looking at how to extend NumPy to add 1) categorical data-types, 2) variable-length strings, and 3) better date-time types. Bryan, a good developer, who has gone on to be a primary developer of Bokeh spent quite a bit of time and had a prototype of categoricals *nearly* working. He did not like working on the NumPy code-base "at all". He struggled with it and found it very difficult to extend. He worked closely with Mark Wiebe who helped him the best he could. What took him 4 weeks in NumPy took him 3 days in DyND to build. I think that experience, convinced him and Mark W both that working with NumPy code-base would take too long to make significant progress. Also, during 2012 I was trying to help with release-management (though I ended up just hiring Ondrej Certek to actually do the work and he did a great job of getting a release of NumPy out the door --- thanks to much help from many of you). At that point, I realized very clearly, that what I could best do at this point was to try and get more resources for open source and for the NumPy stack rather than work on the code directly. We also did work with several clients that helped me realize just how many disruptive changes had happened from 1.4 to 1.7 for extensive users of NumPy (much more than would be justified from a "we don't break the ABI" mantra that was the stated goal). We also realized that the kind of experimentation we wanted to do in the first 2 years of Continuum would just not be possible on the NumPy code-base and the need for getting community buy-in on every decision would slow us down too much --- as we had to iterate rapidly on so many things and find our center as a startup. It also would not be fair to the NumPy community. Our decision to do *all* of our exploration outside the NumPy code base was basically 1) the kinds of changes we wanted ultimately were potentially dramatic and disruptive, 2) it would be too difficult and time-consuming to decide all things in public discussions with the NumPy community --- especially when some things were experimental 3) tying ourselves to releases of NumPy would be difficult at that time, and 4) the design of the NumPy code-base makes it difficult to contribute to --- both Mark W and Bryan V felt they could make progress *much* faster in a new code-base. Continuum did not have enough start-up funding to devote significant time on DyND in the early days. So Mark rallied what resources he could and we supported him the best we could and he made progress. My only real requirement with sponsoring his work when we did was that it must have a python interface that did not use Boost. He stretched Cython and found a lot of holes in it and that took a bit of his time as well. I think he is now a "just write your own wrapper believer" but I shouldn't put words in his mouth or digress. DyND became part of the Blaze effort once we received DARPA money (though the grant was primarily for Bokeh but we also received permission to use some of the funds for Numba and Blaze development). Because of the other work around Numba and Blaze, DyND work was delayed quite often. 
For the Blaze project, mostly DyND became another implementation of the data-shape data description mechanism and a way to proto-type computed columns and remote arrays (now in Blaze server). The Blaze team struggled for the first 18 months with the lack of a gelled team and a concrete vision for what it should be exactly. Thanks to Andy Terrel, Phillip Cloud, Mark Wiebe, and Matt Rocklin as well as others who are currently on the project, Blaze is now much more clear in its goals as a high-level array and table logical object for scientists, data-scientists, and engineers that can be backed by larger-than-memory (i.e. Dask) and cluster-based computational systems (i.e. Spark and Impala). This clarity was not present as we looked for people to collaborate with and explored the space of code-compilation, delayed evaluation, and data-type-systems that are necessary and useful for distributed array-systems generally. If you look today at Ibis and Bolt-project you see other examples of what Blaze is. I see massive overlap between Blaze and these projects. I think the description of those projects can help you understand Blaze which is why I mention them. In that confusion, Mark continued to make progress on his C++-based container-type (at one point we even called it "Blaze-local") that had the advantage of not requiring a Python-runtime and could fully parse the data-shape data-description system that is a generalization of NumPy dtypes (some on Continuum time, some on his own time). Last year, he attracted the attention of Irwin Zaid who added GPU-computation capability. Last fall, Pandas was able to make DyND an optional dependency because DyND has better support for some of the key things Pandas needs and does not require the full NumPy API. In January, Mark W left Continuum to go back to work in the digital effects industry on his old code-base though he continues to take interest in DyND. A month ago, Continuum began to again sponsor Irwin to work on DyND in order to continue its development at least sufficient to support 1) Pandas and 2) processing of semi-structured data (like a collection of JSON objects). DyND is a bigger system than NumPy (as it doesn't rely on Python at all for its core functionality). The Python-interface has not always been as up to date as it could be and Irwin is currently working on that as well as making it easier to install. I'm sure he would love the help if anyone wants to join him. At the same time in 2012, I became very enamored with Numba and the potential for how Numba could make it possible to not even *have* to depend on a single container library like NumPy. I often say that If Numba and Conda had existed 15 years ago, there would not even *be* a SciPy library. Instead there would be a collection of numba-modules that do all the same things. We might not even have Julia, as well --- but that is a longer and more controversial conversation. With Numba you can write your own array-code as needed. We moved the basic array-type into an llvm specification (llvm_array.py) in old llvm.py: https://github.com/llvmpy/llvmpy/blob/master/llvm_array/array.py. (Note that llvm.py is no longer maintained, though). At this point quite a bit of the NumPy API is implemented outside of NumPy in Numba (there is still much more to do, though). As Numba has developed, I have seen how *both* DyND *and* Numba could independently be an architecture to underly a new array abstraction that could effectively replace NumPy for people. 
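(A rough illustration, not from the message above, of "the NumPy API implemented outside of NumPy in Numba": a jitted function that allocates and fills an array in nopython mode. The function name is made up, and it assumes a Numba version recent enough to support array allocation under @njit.)

    import numpy as np
    from numba import njit

    @njit
    def linspace_like(start, stop, n):
        # the allocation and the loop are compiled; nothing here touches
        # NumPy's C API beyond handing the finished array back to Python
        out = np.empty(n, dtype=np.float64)
        step = (stop - start) / (n - 1)
        for i in range(n):
            out[i] = start + i * step
        return out

    print(linspace_like(0.0, 1.0, 5))   # [0.   0.25 0.5  0.75 1.  ]
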
A combination of the two would be quite powerful -- especially when combined now with Dask. Numba needs 2 things presently before I can confidently say that a numpy module could be built that is fully backwards API compatible with current NumPy in about 6 months (though not necessarily semantically in all corner cases). These 2 things are currently on the near-term Numba road-map: 1) the ability to ship a Python extension module that does not require numba to be installed, and 2) jit-classes (so that you can build native-classes and have that be part of the type-specification. So, basically you have 2 additional options for NumPy future besides what Nathaniel laid out: 1) DyND-based or 2) Numba-based. A combination of the two (DyND for a pre-compiled run-time library) and Numba for JIT extensions is also a corollary. A third approach has even more potential to change super-charge Python 3.X for array-oriented programming. This approach could also be combined with DyND and/or Numba as desired. This approach is to use the fact that the buffer protocol in Python exists and therefore we *can* have more than one array-type. In fact, the basic array-structure exists as the memory-view object in Python (rescued from its unfinished form by Antoine and now supported in Cython). The main problem with it as an underlying array-type for computation 1) it's type-system is low-level struct-string syntax that is hard to build-on and 2) there are no basic computations on memory-views. These are both easily remedied. So, the approach would be to: 1) build a Python-type-to-struct-string syntax translator that would allow you to create memory-views from a Python-based type-system that replaces dtype 2) make a new gufunc sub-system that works with memory-views as containers. I think this would be an interesting project in it's own right and could borrow from current NumPy a great deal --- I think it would be simpler than the re-factor of gufuncs that Nathaniel proposes to enable dtype-information to be available to the low-level multi-methods. You can basically eliminate NumPy with something that provides those 2 things --- and that is potentially something you could rally PyPy and Jython and any other Python implementation behind (rather than numpypy and/or numpy4j). If anyone is interested in pursuing this last idea, please let me know. It hit me like a brick at PyCon this year after talking with Nathaniel about what he wanted to do with dtypes and watching Guido's talk on type-hinting now in Python 3. Finally, as I've been thinking more and more about *big* data and the needs of scaling, I've toned-down my infatuation with "typed pointers" (which NumPy basically is). The real value of "typed pointers" is that there is so much low-level code out there that does interesting things that use "typed pointers" for their basic shared abstraction. However, what we really need shared abstractions around are "typed iterators" and a whole lot of code that uses these "typed iterators" for all kinds of calculations. The problem is that there is no C-ABI equivalent for typed iterators. Where is the BLAS or LAPACK for typed-iterators that doesn't rely on a particular C++ compiler to get the memory-layout?. Every language stack implements iterators in their own way --- so you have silos and not shared abstractions across run-times. The NumPy stack on typed-iterators is now a *whole lot* harder to build. This is part of why I want to see jit-classes on Numba -- I want to end up with a defined ABI for abstractions. 
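(To make the memory-view approach sketched above a bit more concrete -- a minimal toy, not Travis's design; the translator and its type table are invented for illustration. It only covers step 1, turning a Python-level type description into a buffer-protocol format string and viewing raw memory through it; the gufunc layer of step 2 is where the real work would be.)

    from struct import calcsize

    # assumed mapping from Python-level types to struct-module format codes
    _FORMATS = {int: 'q', float: 'd', bool: '?'}

    def to_format(spec):
        """Translate a toy Python-level type spec into a struct format string."""
        if isinstance(spec, dict):              # e.g. {'x': float, 'y': float}
            return ''.join(to_format(t) for t in spec.values())
        return _FORMATS[spec]

    fmt = to_format(float)                      # -> 'd'
    buf = bytearray(calcsize(fmt) * 4)          # raw, untyped bytes
    view = memoryview(buf).cast(fmt)            # typed view over the same memory
    view[0] = 3.14
    print(view.tolist())                        # [3.14, 0.0, 0.0, 0.0]

    # record-like specs yield multi-field format strings; a real system would
    # need the full struct syntax (and computations!) layered on top of this
    print(to_format({'x': float, 'y': float}))  # 'dd'
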
Abstractions are great. Shared abstractions can be *viral* and are exponentially better. We need more of those! My plea to anyone reading this is: Please make more shared abstractions ;-) Of course no one person can make a shared abstraction --- they have to emerge! One person can make abstractions though --- and that is the pre-requisite to getting them adopted by others and therefore shared. I know this is a dump of a lot of information. Some of it might even make sense and perhaps a little bit might be useful to some of you. Now for a blatant plea -- if you are interested in working on NumPy (with ideas from whatever source --- not just mine), please talk to me --- we are hiring and I can arrange for some of your time to be spent contributing to any of these ideas (including what Nathaniel wrote about --- as long as we plan for ABI breakage). Guido offered this for Python, and I will offer it for NumPy --- if you are a woman with the right back-ground I will personally commit to training you to be able to work more on NumPy. But, be warned, working on NumPy is not the path to riches and fame is fleeting ;-) Best, -Travis > > > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- *Travis Oliphant* *Co-founder and CEO* @teoliphant 512-222-5440 http://www.continuum.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Wed Aug 26 01:24:21 2015 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 25 Aug 2015 22:24:21 -0700 Subject: [Numpy-discussion] Python extensions for Python 3.5 - useful info... Message-ID: Just an FYI for the upcoming Python release, a very detailed post from Steve Dower, the Microsoft developer who is now in charge of the Windows releases for Python, on how the build process will change in 3.5 regarding extensions: http://stevedower.id.au/blog/building-for-python-3-5/ Cheers, f -- Fernando Perez (@fperez_org; http://fperez.org) fperez.net-at-gmail: mailing lists only (I ignore this when swamped!) fernando.perez-at-berkeley: contact me here for any direct mail -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Aug 26 02:41:16 2015 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 25 Aug 2015 23:41:16 -0700 Subject: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015 In-Reply-To: References: Message-ID: Hi Travis, Thanks for taking the time to write up your thoughts! I have many thoughts in return, but I will try to restrict myself to two main ones :-). 1) On the question of whether work should be directed towards improving NumPy-as-it-is or instead towards a compatibility-breaking replacement: There's plenty of room for debate about whether it's better engineering practice to try and evolve an existing system in place versus starting over, and I guess we have some fundamental disagreements there, but I actually think this debate is a distraction -- we can agree to disagree, because in fact we have to try both. At a practical level: NumPy *is* going to continue to evolve, because it has users and people interested in evolving it; similarly, dynd and other alternatives libraries will also continue to evolve, because they also have people interested in doing it. And at a normative level, this is a good thing! 
If NumPy and dynd both get better, than that's awesome: the worst case is that NumPy adds the new features that we talked about at the meeting, and dynd simultaneously becomes so awesome that everyone wants to switch to it, and the result of this would be... that those NumPy features are exactly the ones that will make the transition to dynd easier. Or if some part of that plan goes wrong, then well, NumPy will still be there as a fallback, and in the mean time we've actually fixed the major pain points our users are begging us to fix. You seem to be urging us all to make a double-or-nothing wager that your extremely ambitious plans will all work out, with the entire numerical Python ecosystem as the stakes. I think this ambition is awesome, but maybe it'd be wise to hedge our bets a bit? 2) You really emphasize this idea of an ABI-breaking (but not API-breaking) release, and I think this must indicate some basic gap in how we're looking at things. Where I'm getting stuck here is that... I actually can't think of anything important that we can't do now, but could if we were allowed to break ABI compatibility. The kinds of things that break ABI but keep API are like... rearranging what order the fields in a struct fall in, or changing the numeric value of opaque constants like NPY_ARRAY_WRITEABLE. The biggest win I can think of is that we could save a few bytes per array by arranging the fields inside the ndarray struct more optimally, but that's hardly a feature to hang a 2.0 on. You seem to have a vision of this ABI-breaking release as being something very different from that, and I'm not clear on what this vision is. The main reason I personally am against having a big ABI-breaking release is not that I hate ABI breakage a priori, it's that all the big features that I care about and the are users are asking for seem to be ones that... don't actually require doing that. At most they seem to get a mild benefit from breaking some obscure corner cases. So the cost/benefits don't make any sense to me. So: can you give a concrete example of a change you have in mind where breaking ABI would be the key enabler? (I guess you might also be thinking of a separate issue that you sort of allude to: Perhaps we will try to make changes which we think don't involve breaking the ABI, but discover too late that we have failed to fully understand the implications and have broken it by mistake. IIUC this is what happened in the 1.4 timeframe when datetime64 was merged and accidentally renumbered some of the NPY_* constants. Partially I am less worried about this because I have a fair amount of confidence that our review and QA process has improved these days to the point that we would not let a change like that slip through by accident -- we have a lot more active reviewers, people are sensitized to the issues, we've successfully landed intrusive changes like Sebastian's indexing rewrite, ... though this is very much second-hand impressions on my part, and I'd welcome input from folks like Chuck who have a clearer view on how things have changed from then to now. But more importantly, even if this is true, then I can't see how your proposal helps. If we aren't good enough at our jobs to predict when we'll break ABI, then by assumption it makes no sense to pick one release and decide that this is the one time that we'll break ABI.) On Tue, Aug 25, 2015 at 12:00 PM, Travis Oliphant wrote: > Thanks for the write-up Nathaniel. There is a lot of great detail and > interesting ideas here. 
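(For readers less familiar with the distinction being drawn here, a small made-up illustration of why reordering struct fields breaks ABI but not API: a compiled extension bakes in byte offsets rather than field names, so only stale binaries misread the struct, and a recompile fixes them. Field names and values below are invented.)

    import struct

    # "version 1" layout of a hypothetical object header: (nd, flags, data_ptr).
    # An extension compiled against v1 hard-codes the byte offset of `flags`.
    OFFSET_OF_FLAGS_V1 = struct.calcsize('i')                # 4 bytes in

    # "version 2" swaps the first two fields: same field names (API intact),
    # different byte offsets (ABI broken).
    v2_header = struct.pack('iiq', 0x0400, 2, 0xdeadbeef)    # (flags, nd, data_ptr)

    # A stale binary still reads offset 4 and silently gets `nd` where it
    # expected `flags`; a rebuilt extension would pick up the new offset.
    stale_read = struct.unpack_from('i', v2_header, OFFSET_OF_FLAGS_V1)[0]
    print(hex(stale_read))                                   # 0x2 -- wrong field
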
> > I've am very eager to understand how to help NumPy and the wider community > move forward however I can (my passions on this have not changed since > 1999, though what I myself spend time on has changed). > > There are a lot of ways to think about approaching this, though. It's > hard to get all the ideas on the table, and it was unfortunate we couldn't > get everybody wyho are core NumPy devs together in person to have this > discussion as there are still a lot of questions unanswered and a lot of > thought that has gone into other approaches that was not brought up or > represented in the meeting (how does Numba fit into this, what about > data-shape, dynd, memory-views and Python type system, etc.). If NumPy > becomes just an interface-specification, then why don't we just do that > *outside* NumPy itself in a way that doesn't jeopardize the stability of > NumPy today. These are some of the real questions I have. I will try > to write up my thoughts in more depth soon, but I won't be able to respond > in-depth right now. I just wanted to comment because Nathaniel said I > disagree which is only partly true. > > The three most important things for me are 1) let's make sure we have > representation from as wide of the community as possible (this is really > hard), 2) let's look around at the broader community and the prior art that > is happening in this space right now and 3) let's not pretend we are going > to be able to make all this happen without breaking ABI compatibility. > Let's just break ABI compatibility with NumPy 2.0 *and* have as much > fidelity with the API and semantics of current NumPy as possible (though > there will be some changes necessary long-term). > > I don't think we should intentionally break ABI if we can avoid it, but I > also don't think we should spend in-ordinate amounts of time trying to > pretend that we won't break ABI (for at least some people), and most > importantly we should not pretend *not* to break the ABI when we actually > do. We did this once before with the roll-out of date-time, and it was > really un-necessary. When I released NumPy 1.0, there were several > things that I knew should be fixed very soon (NumPy was never designed to > not break ABI). Those problems are still there. Now, that we have > quite a bit better understanding of what NumPy *should* be (there have been > tremendous strides in understanding and community size over the past 10 > years), let's actually make the infrastructure we think will last for the > next 20 years (instead of trying to shoe-horn new ideas into a 20-year old > code-base that wasn't designed for it). > > NumPy is a hard code-base. It has been since Numeric days in 1995. I > could be wrong, but my guess is that we will be passed by as a community if > we don't seize the opportunity to build something better than we can build > if we are forced to use a 20 year old code-base. > > It is more important to not break people's code and to be clear when a > re-compile is necessary for dependencies. Those to me are the most > important constraints. There are a lot of great ideas that we all have > about what we want NumPy to be able to do. Some of this are pretty > transformational (and the more exciting they are, the harder I think they > are going to be to implement without breaking at least the ABI). 
There > is probably some CAP-like theorem around > Stability-Features-Speed-of-Development (pick 2) when it comes to Open > Source Software development and making feature-progress with NumPy *is > going* to create in-stability which concerns me. > > I would like to see a little-bit-of-pain one time with a NumPy 2.0, rather > than a constant pain because of constant churn over many years approach > that Nathaniel seems to advocate. To me NumPy 2.0 is an ABI-breaking > release that is as API-compatible as possible and whose semantics are not > dramatically different. > > There are at least 3 areas of compatibility (ABI, API, and semantic). > ABI-compatibility is a non-feature in today's world. There are so many > distributions of the NumPy stack (and conda makes it trivial for anyone to > build their own or for you to build one yourself). Making less-optimal > software-engineering choices because of fear of breaking the ABI is not > something I'm supportive of at all. We should not break ABI every > release, but a release every 3 years that breaks ABI is not a problem. > > API compatibility should be much more sacrosanct, but it is also something > that can also be managed. Any NumPy 2.0 should definitely support the > full NumPy API (though there could be deprecated swaths). I think the > community has done well in using deprecation and limiting the public API to > make this more manageable and I would love to see a NumPy 2.0 that > solidifies a future-oriented API along with a back-ward compatible API that > is also available. > > Semantic compatibility is the hardest. We have already broken this on > multiple occasions throughout the 1.x NumPy releases. Every time you > change the code, this can change. This is what I fear causing deep > instability over the course of many years. These are things like the > casting rule details, the effect of indexing changes, any change to the > calculations approaches. It is and has been the most at risk during any > code-changes. My view is that a NumPy 2.0 (with a new low-level > architecture) minimizes these changes to a single release rather than > unavoidably spreading them out over many, many releases. > > I think that summarizes my main concerns. I will write-up more forward > thinking ideas for what else is possible in the coming weeks. In the mean > time, thanks for keeping the discussion going. It is extremely exciting to > see the help people have continued to provide to maintain and improve > NumPy. It will be exciting to see what the next few years bring as well. > > > Best, > > -Travis > > > > > > > On Tue, Aug 25, 2015 at 5:03 AM, Nathaniel Smith wrote: > >> Hi all, >> >> These are the notes from the NumPy dev meeting held July 7, 2015, at >> the SciPy conference in Austin, presented here so the list can keep up >> with what happens, and so you can give feedback. Please do give >> feedback, none of this is final! >> >> (Also, if anyone who was there notices anything I left out or >> mischaracterized, please speak up -- these are a lot of notes I'm >> trying to gather together, so I could easily have missed something!) >> >> Thanks to Jill Cowan and the rest of the SciPy organizers for donating >> space and organizing logistics for us, and to the Berkeley Institute >> for Data Science for funding travel for Jaime, Nathaniel, and >> Sebastian. 
>> >> >> Attendees >> ========= >> >> Present in the room for all or part: Daniel Allan, Chris Barker, >> Sebastian Berg, Thomas Caswell, Jeff Reback, Jaime Fern?ndez del >> R?o, Chuck Harris, Nathaniel Smith, St?fan van der Walt. (Note: I'm >> pretty sure this list is incomplete) >> >> Joining remotely for all or part: Stephan Hoyer, Julian Taylor. >> >> >> Formalizing our governance/decision making >> ========================================== >> >> This was a major focus of discussion. At a high level, the consensus >> was to steal IPython's governance document ("IPEP 29") and modify it >> to remove its use of a BDFL as a "backstop" to normal community >> consensus-based decision, and replace it with a new "backstop" based >> on Apache-project-style consensus voting amongst the core team. >> >> I'll send out a proper draft of this shortly for further discussion. >> >> >> Development roadmap >> =================== >> >> General consensus: >> >> Let's assume NumPy is going to remain important indefinitely, and >> try to make it better, instead of waiting for something better to >> come along. (This is unlikely to be wasted effort even if something >> better does come along, and it's hardly a sure thing that that will >> happen anyway.) >> >> Let's focus on evolving numpy as far as we can without major >> break-the-world changes (no "numpy 2.0", at least in the foreseeable >> future). >> >> And, as a target for that evolution, let's change our focus from >> numpy as "NumPy is the library that gives you the np.ndarray object >> (plus some attached infrastructure)", to "NumPy provides the >> standard framework for working with arrays and array-like objects in >> Python" >> >> This means, creating defined interfaces between array-like objects / >> ufunc objects / dtype objects, so that it becomes possible for third >> parties to add their own and mix-and-match. Right now ufuncs are >> pretty good at this, but if you want a new array class or dtype then >> in most cases you pretty much have to modify numpy itself. >> >> Vision: instead of everyone who wants a new container type having to >> reimplement all of numpy, Alice can implement an array class using >> (sparse / distributed / compressed / tiled / gpu / out-of-core / >> delayed / ...) storage, pass it to code that was written using >> direct calls to np.* functions, and it just works. (Instead of >> np.sin being "the way you calculate the sine of an ndarray", it's >> "the way you calculate the sine of any array-like container >> object".) >> >> Vision: Darryl can implement a new dtype for (categorical data / >> astronomical dates / integers-with-missing-values / ...) without >> having to touch the numpy core. >> >> Vision: Chandni can then come along and combine them by doing >> >> a = alice_array([...], dtype=darryl_dtype) >> >> and it just works. >> >> Vision: no-one is tempted to subclass ndarray, because anything you >> can do with an ndarray subclass you can also easily do by defining >> your own new class that implements the "array protocol". >> >> >> Supporting third-party array types >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> >> Sub-goals: >> - Get __numpy_ufunc__ done, which will cover a good chunk of numpy's >> API right there. 
>> - Go through the rest of the stuff in numpy, and figure out some >> story for how to let it handle third-party array classes: >> - ufunc ALL the things: Some things can be converted directly into >> (g)ufuncs and then use __numpy_ufunc__ (e.g., np.std); some >> things could be converted into (g)ufuncs if we extended the >> (g)ufunc interface a bit (e.g. np.sort, np.matmul). >> - Some things probably need their own __numpy_ufunc__-like >> extensions (__numpy_concatenate__?) >> - Provide tools to make it easier to implement the more complicated >> parts of an array object (e.g. the bazillion different methods, >> many of which are ufuncs in disguise, or indexing) >> - Longer-run interesting research project: __numpy_ufunc__ requires >> that one or the other object have explicit knowledge of how to >> handle the other, so to handle binary ufuncs with N array types >> you need something like N**2 __numpy_ufunc__ code paths. As an >> alternative, if there were some interface that an object could >> export that provided the operations nditer needs to efficiently >> iterate over (chunks of) it, then you would only need N >> implementations of this interface to handle all N**2 operations. >> >> This would solve a lot of problems for projects like: >> - blosc >> - dask >> - distarray >> - numpy.ma >> - pandas >> - scipy.sparse >> - xray >> >> >> Supporting third-party dtypes >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> >> We already have something like a C level "dtype >> protocol". Conceptually, the way you define a new dtype is by >> defining a new class whose instances have data attributes defining >> the parameters of the dtype (what fields are in *this* record dtype, >> how many characters are in *this* string dtype, what units are used >> for *this* datetime64, etc.), and you define a bunch of methods to >> do things like convert an object from a Python object to your dtype >> or vice-versa, to copy an array of your dtype from one place to >> another, to cast to and from your new dtype, etc. This part is >> great. >> >> The problem is, in the current implementation, we don't actually use >> the Python object system to define these classes / attributes / >> methods. Instead, all possible dtypes are jammed into a single >> Python-level class, whose struct has fields for the union of all >> possible dtype's attributes, and instead of Python-style method >> slots there's just a big table of function pointers attached to each >> object. >> >> So the main proposal is that we keep the basic design, but switch it >> so that the float64 dtype, the int64 dtype, etc. actually literally >> are subclasses of np.dtype, each implementing their own fields and >> Python-style methods. >> >> Some of the pieces involved in doing this: >> >> - The current dtype methods should be cleaned up -- e.g. 'dot' and >> 'less_than' are both dtype methods, when conceptually they're much >> more like ufuncs. >> >> - The ufunc inner-loop interface currently does not get a reference >> to the dtype object, so they can't see its attributes and this is >> a big obstacle to many interesting dtypes (e.g., it's hard to >> implement np.equal for categoricals if you don't know what >> categories each has). So we need to add new arguments to the core >> ufunc loop signature. (Fortunately this can be done in a >> backwards-compatible way.) 
>> >> - We need to figure out what exactly the dtype methods should be, >> and add them to the dtype class (possibly with backwards >> compatibility shims for anyone who is accessing PyArray_ArrFuncs >> directly). >> >> - Casting will be possibly the trickiest thing to work out, though >> the basic idea of using dunder-dispatch-like __cast__ and >> __rcast__ methods seems workable. (Encouragingly, this is also >> exactly what dynd also does, though unfortunately dynd does not >> yet support user-defined dtypes even to the extent that numpy >> does, so there isn't much else we can steal from them.) >> - We may also want to rethink the casting rules while we're at it, >> since they have some very weird corners right now (e.g. see >> [https://github.com/numpy/numpy/issues/6240]) >> >> - We need to migrate the current dtypes over to the new system, >> which can be done in stages: >> >> - First stick them all in a single "legacy dtype" class whose >> methods just dispatch to the PyArray_ArrFuncs per-object "method >> table" >> >> - Then move each of them into their own classes >> >> - We should provide a Python-level wrapper for the protocol, so that >> you can call dtype methods from Python >> >> - And vice-versa, it should be possible to subclass dtype at the >> Python level >> >> - etc. >> >> Fortunately, AFAICT pretty much all of this can be done while >> maintaining backwards compatibility (though we may want to break >> some obscure cases to avoid expending *too* much effort with weird >> backcompat contortions that will only help a vanishingly small >> proportion of the userbase), and a lot of the above changes can be >> done as semi-independent mini-projects, so there's no need for some >> branch to go off and spend a year rewriting the world. >> >> Obviously there are still a lot of details to work out, though. But >> overall, there was widespread agreement that this is one of the #1 >> pain points for our users (e.g. it's the single main request from >> pandas), and fixing it is very high priority. >> >> Some features that would become straightforward to implement >> (e.g. even in third-party libraries) if this were fixed: >> - missing value support >> - physical unit tracking (meters / seconds -> array of velocity; >> meters + seconds -> error) >> - better and more diverse datetime representations (e.g. datetimes >> with attached timezones, or using funky geophysical or >> astronomical calendars) >> - categorical data >> - variable length strings >> - strings-with-encodings (e.g. latin1) >> - forward mode automatic differentiation (write a function that >> computes f(x) where x is an array of float64; pass that function >> an array with a special dtype and get out both f(x) and f'(x)) >> - probably others I'm forgetting right now >> >> I should also note that there was one substantial objection to this >> plan, from Travis Oliphant (in discussions later in the >> conference). I'm not confident I understand his objections well >> enough to reproduce them here, though -- perhaps he'll elaborate. >> >> >> Money >> ===== >> >> There was an extensive discussion on the topic of: "if we had money, >> what would we do with it?" >> >> This is partially motivated by the realization that there are a >> number of sources that we could probably get money from, if we had a >> good story for what we wanted to do, so it's not just an idle >> question. >> >> Points of general agreement: >> >> - Doing the in-person meeting was a good thing. We should plan do >> that again, at least once a year. 
So one thing to spend money on >> is travel subsidies to make sure that happens and is productive. >> >> - While it's tempting to imagine hiring junior people for the more >> frustrating/boring work like maintaining buildbots, release >> infrastructure, updating docs, etc., this seems difficult to do >> realistically with our current resources -- how do we hire for >> this, who would manage them, etc.? >> >> - On the other hand, the general feeling was that if we found the >> money to hire a few more senior people who could take care of >> themselves more, then that would be good and we could >> realistically absorb that extra work without totally unbalancing >> the project. >> >> - A major open question is how we would recruit someone for a >> position like this, since apparently all the obvious candidates >> who are already active on the NumPy team already have other >> things going on. [For calibration on how hard this can be: NYU >> has apparently had an open position for a year with the job >> description of "come work at NYU full-time with a >> private-industry-competitive-salary on whatever your personal >> open-source scientific project is" (!) and still is having an >> extremely difficult time filling it: >> [http://cds.nyu.edu/research-engineer/]] >> >> - General consensus though was that there isn't much to be done >> about this though, except try it and see. >> >> - (By the way, if you're someone who's reading this and >> potentially interested in like a postdoc or better working on >> numpy, then let's talk...) >> >> >> More specific changes to numpy that had general consensus, but don't >> really fit into a high-level roadmap >> >> ========================================================================================================= >> >> - Resolved: we should merge multiarray.so and umath.so into a single >> extension module, so that they can share utility code without the >> current awkward contortions. >> >> - Resolved: we should start hiding new fields in the ufunc and dtype >> structs as soon as possible going forward. (I.e. they would not be >> present in the version of the structs that are exposed through the >> C API, but internally we would use a more detailed struct.) >> - Mayyyyyybe we should even go ahead and hide the subset of the >> existing fields that are really internal details that no-one >> should be using. If we did this without changing anything else >> then it would preserve ABI (the fields would still be where >> existing compiled extensions expect them to be, if any such >> extensions exist) while breaking API (trying to compile such >> extensions would give a clear error), so would be a smoother >> ramp if we think we need to eventually break those fields for >> real. (As discussed above, there are a bunch of fields in the >> dtype base class that only make sense for specific dtype >> subclasses, e.g. only record dtypes need a list of field names, >> but right now all dtypes have one anyway. So it would be nice to >> remove these from the base class entirely, but that is >> potentially ABI-breaking.) >> >> - Resolved: np.array should never return an object array unless >> explicitly requested (e.g. with dtype=object); it just causes too >> many surprising problems. >> - First step: add a deprecation warning >> - Eventually: make it an error. >> >> - The matrix class >> - Resolved: We won't add warnings yet, but we will prominently >> document that it is deprecated and should be avoided where-ever >> possible. 
>> - St?fan van der Walt volunteers to do this. >> - We'd all like to deprecate it properly, but the feeling was that >> the precondition for this is for scipy.sparse to provide sparse >> "arrays" that don't return np.matrix objects on ordinary >> operatoins. Until that happens we can't reasonably tell people >> that using np.matrix is a bug. >> >> - Resolved: we should add a similar prominent note to the >> "subclassing ndarray" documentation, warning people that this is >> painful and barely works and please don't do it if you have any >> alternatives. >> >> - Resolved: we want more, smaller releases -- every 6 months at >> least, aiming to go even faster (every 4 months?) >> >> - On the question of using Cython inside numpy core: >> - Everyone agrees that there are places where this would be an >> improvement (e.g., Python<->C interfaces, and places "when you >> want to do computer science", e.g. complicated algorithmic stuff >> like graph traversals) >> - Chuck wanted it to be clear though that he doesn't think it >> would be a good goal to try and rewrite all of numpy in Cython >> -- there also exist places where Cython ends up being "an uglier >> version of C". No-one disagreed. >> >> - Our text reader is apparently not very functional on Python 3, and >> generally slow and hard to work with. >> - Resolved: We should extract Pandas's awesome text reader/parser >> and convert it into its own package, that could then become a >> new backend for both pandas and numpy.loadtxt. >> - Jeff thinks this is a great idea >> - Thomas Caswell volunteers to do the extraction. >> >> - We should work on improving our tools for evolving the ABI, so >> that we will eventually be less constrained by decisions made >> decades ago. >> - One idea that had a lot of support was to switch from our >> current append-only C-API to a "sliding window" API based on >> explicit versions. So a downstream package might say >> >> #define NUMPY_API_VERSION 4 >> >> and they'd get the functions and behaviour provided in "version >> 4" of the numpy C api. If they wanted to get access to new stuff >> that was added in version 5, then they'd need to switch that >> #define, and at the same time clean up any usage of stuff that >> was removed or changed in version 5. And to provide a smooth >> migration path, one version of numpy would support multiple >> versions at once, gradually deprecating and dropping old >> versions. >> >> - If anyone wants to help bring pip up to scratch WRT tracking ABI >> dependencies (e.g., 'pip install numpy==' >> -> triggers rebuild of scipy against the new ABI), then that >> would be an extremely useful thing. >> >> >> Policies that should be documented >> ================================== >> >> ...together with some notes about what the contents of the document >> should be: >> >> >> How we manage bugs in the bug tracker. >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> >> - Github "milestones" should *only* be assigned to release-blocker >> bugs (which mostly means "regression from the last release"). >> >> In particular, if you're tempted to push a bug forward to the next >> release... then it's clearly not a blocker, so don't set it to the >> next release's milestone, just remove the milestone entirely. >> >> (Obvious exception to this: deprecation followup bugs where we >> decide that we want to keep the deprecation around a bit longer >> are a case where a bug actually does switch from being a blocker >> to release 1.x to being a blocker for release 1.(x+1).) 
>> >> - Don't hesitate to close an issue if there's no way forward -- >> e.g. a PR where the author has disappeared. Just post a link to >> this policy and close, with a polite note that we need to keep our >> tracker useful as a todo list, but they're welcome to re-open if >> things change. >> >> >> Deprecations and breakage policy: >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> >> - How long do we need to keep DeprecationWarnings around before we >> break things? This is tricky because on the one hand an aggressive >> (short) deprecation period lets us deliver new features and >> important cleanups more quickly, but on the other hand a >> too-aggressive deprecation period is difficult for our more >> conservative downstream users. >> >> - Idea that had the most support: pick a somewhat-aggressive >> warning period as our default, and make a rule that if someone >> asks for an extension during the beta cycle for the release that >> removes it, then we put it back for another release or two worth >> of grace period. (While also possibly upgrading the warning to >> be more visible during the grace period.) This gives us >> deprecation periods that are more adaptive on a case-by-case >> basis. >> >> - Lament: it would be really nice if we could get more people to >> test our beta releases, because in practice right now 1.x.0 ends >> up being where we actually the discover all the bugs, and 1.x.1 is >> where it actually becomes usable. Which sucks, and makes it >> difficult to have a solid policy about what counts as a >> regression, etc. Is there anything we can do about this? >> >> - ABI breakage: we distinguish between an ABI break that breaks >> everything (e.g., "import scipy" segfaults), versus an ABI break >> that breaks an occasional rare case (e.g., only apps that poke >> around in some obscure corner of some struct are affected). >> >> - The "break-the-world" type remains off-limit for now: the pain >> is still too large (conda helps, but there are lots of people >> who don't use conda!), and there aren't really any compelling >> improvements that this would enable anyway. >> >> - For the "break-0.1%-of-users" type, it is *not* ruled out by >> fiat, though we remain conservative: we should treat it like >> other API breaks in principle, and do a careful case-by-case >> analysis of the details of the situation, taking into account >> what kind of code would be broken, how common these cases are, >> how important the benefits are, whether there are any specific >> mitigation strategies we can use, etc. -- with this process of >> course taking into account that a segfault is nastier than a >> Python exception. >> >> >> Other points that were discussed >> ================================ >> >> - There was inconclusive discussion of what we should do with dot() >> in the places where it disagrees with the PEP 465 matmul semantics >> (specifically this is when both arguments have ndim >= 3, or one >> argument has ndim == 0). >> - The concern is that the current behavior is not very useful, and >> as far as we can tell no-one is using it; but, as people get >> used to the more-useful PEP 465 behavior, they will increasingly >> try to use it on the assumption that np.dot will work the same >> way, and this will create pain for lots of people. So Nathaniel >> argued that we should start at least issuing a visible warning >> when people invoke the corner-case behavior. 
>> - But OTOH, np.dot is such a core piece of infrastructure, and >> there's such a large landscape of code out there using numpy >> that we can't see, that others were reasonably wary of making >> any change. >> - For now: document prominently, but no change in behavior. >> >> >> Links to raw notes >> ================== >> >> Main page: >> [https://github.com/numpy/numpy/wiki/SciPy-2015-developer-meeting] >> >> Notes from the meeting proper: >> [ >> https://docs.google.com/document/d/1IJcYdsHtk8MVAM4AZqFDBSf_nVG-mrB4Tv2bh9u1g4Y/edit?usp=sharing >> ] >> >> Slides from the followup BoF: >> [ >> https://gist.github.com/njsmith/eb42762054c88e810786/raw/b74f978ce10a972831c582485c80fb5b8e68183b/future-of-numpy-bof.odp >> ] >> >> Notes from the followup BoF: >> [ >> https://docs.google.com/document/d/11AuTPms5dIPo04JaBOWEoebXfk-tUzEZ-CvFnLIt33w/edit >> ] >> >> -n >> >> -- >> Nathaniel J. Smith -- http://vorpus.org >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > > -- > > *Travis Oliphant* > *Co-founder and CEO* > > > @teoliphant > 512-222-5440 > http://www.continuum.io > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Nathaniel J. Smith -- http://vorpus.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Aug 26 02:42:10 2015 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 25 Aug 2015 23:42:10 -0700 Subject: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015 In-Reply-To: <20150825212159.39cee394@fsol> References: <20150825212159.39cee394@fsol> Message-ID: On Tue, Aug 25, 2015 at 12:21 PM, Antoine Pitrou wrote: > On Tue, 25 Aug 2015 03:03:41 -0700 > Nathaniel Smith wrote: >> >> Supporting third-party dtypes >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> > [...] >> >> Some features that would become straightforward to implement >> (e.g. even in third-party libraries) if this were fixed: >> - missing value support >> - physical unit tracking (meters / seconds -> array of velocity; >> meters + seconds -> error) >> - better and more diverse datetime representations (e.g. datetimes >> with attached timezones, or using funky geophysical or >> astronomical calendars) >> - categorical data >> - variable length strings >> - strings-with-encodings (e.g. latin1) >> - forward mode automatic differentiation (write a function that >> computes f(x) where x is an array of float64; pass that function >> an array with a special dtype and get out both f(x) and f'(x)) >> - probably others I'm forgetting right now > > It should also be the opportunity to streamline datetime64 and > timedelta64 dtypes. Currently the unit information is IIRC hidden in > some weird metadata thing called the PyArray_DatetimeMetaData. Yeah, and PyArray_DatetimeMetaData is an "NpyAuxData", which is its own personal little object system implemented in C with its own reference counting system... the design of dtypes has great bones, but the current implementation has a lot of, um, historical baggage. -n -- Nathaniel J. 
Smith -- http://vorpus.org From njs at pobox.com Wed Aug 26 02:59:30 2015 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 25 Aug 2015 23:59:30 -0700 Subject: [Numpy-discussion] testing numpy with downstream testsuites (was: Re: Notes from the numpy dev meeting at scipy 2015) Message-ID: [Popping this off to its own thread to try and keep things easier to follow] On Tue, Aug 25, 2015 at 9:52 AM, Nathan Goldbaum wrote: >> - Lament: it would be really nice if we could get more people to >> test our beta releases, because in practice right now 1.x.0 ends >> up being where we actually the discover all the bugs, and 1.x.1 is >> where it actually becomes usable. Which sucks, and makes it >> difficult to have a solid policy about what counts as a >> regression, etc. Is there anything we can do about this? > > Just a note in here - have you all thought about running the test suites for > downstream projects as part of the numpy test suite? I don't think it came up, but it's not a bad idea! The main problems I can foresee are: 1) Since we don't know the downstream code, it can be hard to interpret test suite failures. OTOH for changes we're uncertain of we already do often end up running some downstream test suites by hand, so it can only be an improvement on that... 2) Sometimes everyone including downstream agrees that breaking something is actually a good idea and they should just deal, but what do you do then? These both seem solvable though. I guess a good strategy would be to compile a travis-compatible wheel of $PACKAGE version $latest-stable against numpy 1.x, and then in the 1.(x+1) development period numpy would have an additional travis run which, instead of running the numpy test suite, instead does: pip install . pip install $PACKAGE-$latest-stable.whl python -c 'import package; package.test()' # adjust as necessary ? Where $PACKAGE is something like scipy / pandas / astropy / ... matplotlib would be nice but maybe impractical...? Maybe someone else will have objections but it seems like a reasonable idea to me. Want to put together a PR? Asides from fame and fortune and our earnest appreciation, your reward is you get to make sure that the packages you care about are included so that we break them less often in the future ;-). -n -- Nathaniel J. Smith -- http://vorpus.org From njs at pobox.com Wed Aug 26 03:05:41 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 26 Aug 2015 00:05:41 -0700 Subject: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015 In-Reply-To: References: Message-ID: On Tue, Aug 25, 2015 at 5:53 PM, David Cournapeau wrote: > Thanks for the good summary Nathaniel. > > Regarding dtype machinery, I agree casting is the hardest part. Unless the > code has changed dramatically, this was the main reason why you could not > make most of the dtypes separate from numpy codebase (I tried to move the > datetime dtype out of multiarray into a separate C extension some years > ago). Being able to separate the dtypes from the multiarray module would be > an obvious way to drive the internal API change. For practical reasons I don't imagine we'll ever want to actually move the core dtypes out of multiarray -- if nothing else they will always remain a little bit special, like np.array([1.0, 2.0]) will just "know" that this should use the float64 dtype. But yeah, in general a good heuristic would be that -- aside from a few limited cases like that -- we want to make built-in dtypes and user-defined dtypes use the same APIs. 
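(A toy picture of what "the same APIs" could mean, in the spirit of the dtype redesign from the quoted meeting notes -- a self-contained mock, not NumPy code, with all names invented: each dtype is an ordinary class carrying its own parameters and methods, instead of one slot in a giant table of function pointers.)

    import struct

    class DType:
        """Toy stand-in for a dtype base class; the real design is TBD."""
        itemsize = None
        def to_python(self, raw): raise NotImplementedError
        def from_python(self, obj): raise NotImplementedError

    class Float64(DType):
        itemsize = 8
        def to_python(self, raw): return struct.unpack('<d', raw)[0]
        def from_python(self, obj): return struct.pack('<d', float(obj))

    class Categorical(DType):
        itemsize = 1
        def __init__(self, categories):
            self.categories = list(categories)    # per-instance dtype parameters
        def to_python(self, raw): return self.categories[raw[0]]
        def from_python(self, obj): return bytes([self.categories.index(obj)])

    c = Categorical(['red', 'green', 'blue'])
    print(c.to_python(c.from_python('green')))                # 'green'
    print(Float64().to_python(Float64().from_python(2.5)))    # 2.5
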
> Regarding the use of cython in numpy, was there any discussion about the > compilation/size cost of using cython, and talking to the cython team to > improve this ? Or was that considered acceptable with current cython for > numpy. I am convinced cleanly separating the low level parts from the python > C API plumbing would be the single most important thing one could do to make > the codebase more amenable. It's still a more blue-sky idea than that... the discussion was more at the level of "is this something that is even worth trying to make work and seeing where the problems are?" The big immediate problem, before we got into code size issues, would be that we would need to be able to compile a mix of .pyx files and .c files into a single .so, while cython generated code currently makes some strong assumptions about how each .pyx file will live in its own .so. From playing around with it I suspect the first version of making this work will be klugey indeed. But yeah, the thing to do would be for someone to dig in and make the kluges and then decide how to clean them up once you know where they are. -n -- Nathaniel J. Smith -- http://vorpus.org From solipsis at pitrou.net Wed Aug 26 04:28:22 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 26 Aug 2015 10:28:22 +0200 Subject: [Numpy-discussion] SHA256 mismatch on SourceForge downloads Message-ID: <20150826102822.40f83fef@fsol> Hello, The SourceForge download page for 1.10.0b1 mentions: 89e467cec774527dd254c1e039801726db1367433053801f0d8bc68deac74009 numpy-1.10.0b1.tar.gz But after downloading the file I get: $ sha256sum numpy-1.10.0b1.tar.gz 855695405092686264dc8ce7b3f5c939a6cf1a5639833e841a5bb6fb799cd6a8 numpy-1.10.0b1.tar.gz Also, since SouceForge doesn't provide any HTTPS downloads (it actually redirects HTTPS to HTTP (*)), this all looks a bit pointless. (*) seems like SourceForge is becoming a poster child of worst practices... Regards Antoine. From sebastian at sipsolutions.net Wed Aug 26 04:57:57 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 26 Aug 2015 10:57:57 +0200 Subject: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015 In-Reply-To: References: Message-ID: <1440579477.20263.60.camel@sipsolutions.net> On Mi, 2015-08-26 at 00:05 -0700, Nathaniel Smith wrote: > On Tue, Aug 25, 2015 at 5:53 PM, David Cournapeau wrote: > > Thanks for the good summary Nathaniel. > > > > Regarding dtype machinery, I agree casting is the hardest part. Unless the > > code has changed dramatically, this was the main reason why you could not > > make most of the dtypes separate from numpy codebase (I tried to move the > > datetime dtype out of multiarray into a separate C extension some years > > ago). Being able to separate the dtypes from the multiarray module would be > > an obvious way to drive the internal API change. > > For practical reasons I don't imagine we'll ever want to actually move > the core dtypes out of multiarray -- if nothing else they will always > remain a little bit special, like np.array([1.0, 2.0]) will just > "know" that this should use the float64 dtype. But yeah, in general a > good heuristic would be that -- aside from a few limited cases like > that -- we want to make built-in dtypes and user-defined dtypes use > the same APIs. > Well, casting is the conceptional hardest part. Marrying it to the rest of numpy is probably just as hard ;). With the chance of not having thought this through enough, maybe some points about the general discussion. 
I think I would like some more clarity of what we want and especially *need* [1]. From SciPy, there were two things I particularly remember: 1. the dtype/scalar issue 2. making an interface to make array-likes interaction more sane (this I think can go quite far, and we are already going part of it) The dtypes/scalars seem a particularly dark corner of numpy and if it is feasible for us to replace it with something new, then I would be willing to do some breaks for it (admittingly, given protest, I would back down from that and another solution would be needed). The point for me is, I currently think a dtype/scalar could get numpy a big way, especially from the point of view of downstream packages. Of course it would be harder to do in numpy then in something new, but it should also be of much more immediate use. Maybe I am going a bit too far with this right now, but I could imagine that if we cannot clean up the dtype/scalars, numpy may indeed be doomed or at least a brick slowing down a lot of other people. And if it is not possible to do this without a numpy 2, then likely that is the way to go. But I am not convinced we should aim to fix all the other stuff at the same time. I am afraid it would just accumulate to grow over everyones heads. In other words, I think if we can muster the resources I would like to see this problem attacked within numpy. If this proves impossible a new dtype abstraction may well be reason for numpy 2, or used by a DyND or similar? But I do believe we should not give up on Numpy here from the start, at least I do not see a compelling reason to do. Instead giving up on numpy seems like the last way out of a misery. And much of the different opinions to me seem to be whether we think this will clearly happen or not or has already happened (or maybe whether it is too costly to do in numpy). Cleaning it up, would open doors to many things. Note that I think it would make the numpy source much less scary, because I think it is the one big piece of code that is maybe not clearly a separate chunk [2]. After making it sane, I would argue that numpy does become much more maintainable and extensible. From my current view, probably enough so for a long time. Also, I think it would give us abstraction to make different/new projects work together better and if done well enough, some grand new project set to replace numpy could reuse it. Of course it is entirely possible that more things need to be changed in numpy and that some others would be just as hard or even harder to do. But if we can identify this as the "one big thing that gets us 90%" then I refuse to give up hope of doing it in numpy just yet. - Sebastian [1] Travis has said quite a lot about it, but it is not yet clear to me what is a priority/real pain point. Take "datashape" for example. By now I think that the datashape is likely a good idea to make structured arrays nicer, since it moves the "structured" part into the array object and not the dtype, which makes sense to me. However, I am not convinced that the datashape is something that would make numpy a compelling amount better. In fact I could imagine that for many things it would make it unnecessarily more complicated for users. [2] Take indexing, I like to think I did not break that much when redoing it (except on purpose, which I hope did not create much trouble). In some sense indexing was simple to redo, because it does not overlap at all with anything else directly. 
If we get dtypes/scalars more separated, I think we are at a point where this is possible with pretty much any part of numpy. > > Regarding the use of cython in numpy, was there any discussion about the > > compilation/size cost of using cython, and talking to the cython team to > > improve this ? Or was that considered acceptable with current cython for > > numpy. I am convinced cleanly separating the low level parts from the python > > C API plumbing would be the single most important thing one could do to make > > the codebase more amenable. > > It's still a more blue-sky idea than that... the discussion was more > at the level of "is this something that is even worth trying to make > work and seeing where the problems are?" > > The big immediate problem, before we got into code size issues, would > be that we would need to be able to compile a mix of .pyx files and .c > files into a single .so, while cython generated code currently makes > some strong assumptions about how each .pyx file will live in its own > .so. From playing around with it I suspect the first version of making > this work will be klugey indeed. But yeah, the thing to do would be > for someone to dig in and make the kluges and then decide how to clean > them up once you know where they are. > > -n > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From jtaylor.debian at googlemail.com Wed Aug 26 05:17:10 2015 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Wed, 26 Aug 2015 11:17:10 +0200 Subject: [Numpy-discussion] SHA256 mismatch on SourceForge downloads In-Reply-To: <20150826102822.40f83fef@fsol> References: <20150826102822.40f83fef@fsol> Message-ID: <55DD8416.1090609@googlemail.com> The file is also not signed so the checksums are not trustworthy anyway. Please sign the releases as we did in the past. On 08/26/2015 10:28 AM, Antoine Pitrou wrote: > > Hello, > > The SourceForge download page for 1.10.0b1 mentions: > > 89e467cec774527dd254c1e039801726db1367433053801f0d8bc68deac74009 > numpy-1.10.0b1.tar.gz > > But after downloading the file I get: > > $ sha256sum numpy-1.10.0b1.tar.gz > 855695405092686264dc8ce7b3f5c939a6cf1a5639833e841a5bb6fb799cd6a8 > numpy-1.10.0b1.tar.gz > > > Also, since SouceForge doesn't provide any HTTPS downloads (it > actually redirects HTTPS to HTTP (*)), this all looks a bit pointless. > > (*) seems like SourceForge is becoming a poster child of worst > practices... > > Regards > > Antoine. From jenshnielsen at gmail.com Wed Aug 26 06:38:39 2015 From: jenshnielsen at gmail.com (Jens Nielsen) Date: Wed, 26 Aug 2015 10:38:39 +0000 Subject: [Numpy-discussion] testing numpy with downstream testsuites (was: Re: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: References: Message-ID: As a Matplotlib developer I try to test our code manually with all betas and rc of new numpy versions. (And already pushed fixed a few new deprecation warnings with 1.10beta1 which otherwise passes our test suite. I forgot to report this back since there were no issues to report ) However, we could actually do this automatically if numpy betas were uploaded as prereleases on pypi. 
We are already using Travis's allow failure mode to test python 3.5 betas and rc's along with all our dependencies installed with `pip --pre` https://pip.pypa.io/en/latest/reference/pip_install.html#pre-release-versions Putting prereleases on pypi would thus automate most of the testing of new Numpy versions for us. Best Jens ons. 26. aug. 2015 kl. 07.59 skrev Nathaniel Smith : > [Popping this off to its own thread to try and keep things easier to > follow] > > On Tue, Aug 25, 2015 at 9:52 AM, Nathan Goldbaum > wrote: > >> - Lament: it would be really nice if we could get more people to > >> test our beta releases, because in practice right now 1.x.0 ends > >> up being where we actually the discover all the bugs, and 1.x.1 is > >> where it actually becomes usable. Which sucks, and makes it > >> difficult to have a solid policy about what counts as a > >> regression, etc. Is there anything we can do about this? > > > > Just a note in here - have you all thought about running the test suites > for > > downstream projects as part of the numpy test suite? > > I don't think it came up, but it's not a bad idea! The main problems I > can foresee are: > 1) Since we don't know the downstream code, it can be hard to > interpret test suite failures. OTOH for changes we're uncertain of we > already do often end up running some downstream test suites by hand, > so it can only be an improvement on that... > 2) Sometimes everyone including downstream agrees that breaking > something is actually a good idea and they should just deal, but what > do you do then? > > These both seem solvable though. > > I guess a good strategy would be to compile a travis-compatible wheel > of $PACKAGE version $latest-stable against numpy 1.x, and then in the > 1.(x+1) development period numpy would have an additional travis run > which, instead of running the numpy test suite, instead does: > pip install . > pip install $PACKAGE-$latest-stable.whl > python -c 'import package; package.test()' # adjust as necessary > ? Where $PACKAGE is something like scipy / pandas / astropy / ... > matplotlib would be nice but maybe impractical...? > > Maybe someone else will have objections but it seems like a reasonable > idea to me. Want to put together a PR? Asides from fame and fortune > and our earnest appreciation, your reward is you get to make sure that > the packages you care about are included so that we break them less > often in the future ;-). > > -n > > -- > Nathaniel J. Smith -- http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at gmail.com Wed Aug 26 07:14:19 2015 From: faltet at gmail.com (Francesc Alted) Date: Wed, 26 Aug 2015 13:14:19 +0200 Subject: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015 In-Reply-To: References: Message-ID: Hi, Thanks Nathaniel and others for sparking this discussion as I think it is very timely. 2015-08-25 12:03 GMT+02:00 Nathaniel Smith : > Let's focus on evolving numpy as far as we can without major > break-the-world changes (no "numpy 2.0", at least in the foreseeable > future). 
> > And, as a target for that evolution, let's change our focus from > numpy as "NumPy is the library that gives you the np.ndarray object > (plus some attached infrastructure)", to "NumPy provides the > standard framework for working with arrays and array-like objects in > Python" >
Sorry to disagree here, but in my opinion NumPy *already* provides the standard framework for working with arrays and array-like objects in Python, as its huge popularity shows. If what you mean is that there are too many efforts trying to provide other, specialized data containers (things like DataFrame in pandas, DataArray/Dataset in xarray or carray/ctable in bcolz, just to mention a few), then let me say that I am of the opinion that there can't be a silver bullet for tackling all the problems that the PyData community is facing.

The libraries using specialized data containers (pandas, xray, bcolz...) may have more or less machinery on top of them, so that conversion to NumPy does not necessarily happen internally (many times we don't want conversions, for efficiency), but it is the capability of producing NumPy arrays out of them (or parts of them) that makes these specialized containers incredibly more useful to users, because they can use NumPy to fill the missing gaps, or just use NumPy as an intermediate container that acts as input for other libraries.

On the subject of why I don't think a universal data container is feasible for PyData, you just have to look at how many data structures Python provides in the language itself (tuples, lists, dicts, sets...), and how many are added in the standard library (like those in the collections sub-package). Every data container is designed to do a couple of things (maybe three) well, and for other use cases it is the responsibility of the user to choose the most appropriate one depending on her needs. In the same vein, I also think that it makes little sense to try to come up with a standard solution that is going to satisfy everyone's needs. IMHO, and despite all efforts, neither NumPy, NumPy 2.0, DyND, bcolz nor any other is going to offer the universal data container.

Instead of that, let me summarize what users/developers like me need from NumPy in order to continue creating more specialized data containers:

1) Keep NumPy simple. NumPy is truly the cornerstone of PyData right now, and it will be for the foreseeable future, so please keep it usable and *minimal*. Before adding any more features, the increase in complexity should be carefully weighed.

2) Make NumPy more flexible. Any rewrite that allows arrays or dtypes to be subclassed and extended more easily will be a huge win. *But* if, in order to allow flexibility, you have to make NumPy much more complex, then point 1) should prevail.

3) Make NumPy a sustainable project. Historically NumPy has depended on the heroic efforts of individuals to make it what it is now: *an industry standard*. But individual efforts, while laudable, are not enough, so please, please, please continue the effort of constituting a governance team that ensures the future of NumPy (and with it, the whole PyData community).

Finally, the question of whether NumPy 2.0 or projects like DyND should be chosen instead for implementing new features is still legitimate, and while I have my own opinions (favourable to DyND), I still see (such is the price of technological debt) a distant future where we will find NumPy as we know it, allowing more innovation to happen in the Python data space.
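To make the point above about specialized containers producing NumPy arrays concrete: a container only has to implement the array protocol for np.asarray() and most NumPy functions to consume it. A toy sketch, not any real library's API:

```
import numpy as np

class MyColumn:
    """Toy stand-in for a pandas/bcolz-style container column."""
    def __init__(self, data):
        self._data = list(data)

    def __array__(self, dtype=None):
        # Hand NumPy an ndarray view of the data on demand.
        return np.array(self._data, dtype=dtype)

col = MyColumn([1.0, 2.5, 4.0])
a = np.asarray(col)   # conversion goes through __array__
b = np.sin(col)       # ufuncs coerce unfamiliar inputs the same way
```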
Again, thanks to all those braves that are allowing others to build on top of NumPy's shoulders. -- Francesc Alted -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Aug 26 07:14:29 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 26 Aug 2015 12:14:29 +0100 Subject: [Numpy-discussion] testing numpy with downstream testsuites (was: Re: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: References: Message-ID: Hi, On Wed, Aug 26, 2015 at 7:59 AM, Nathaniel Smith wrote: > [Popping this off to its own thread to try and keep things easier to follow] > > On Tue, Aug 25, 2015 at 9:52 AM, Nathan Goldbaum wrote: >>> - Lament: it would be really nice if we could get more people to >>> test our beta releases, because in practice right now 1.x.0 ends >>> up being where we actually the discover all the bugs, and 1.x.1 is >>> where it actually becomes usable. Which sucks, and makes it >>> difficult to have a solid policy about what counts as a >>> regression, etc. Is there anything we can do about this? >> >> Just a note in here - have you all thought about running the test suites for >> downstream projects as part of the numpy test suite? > > I don't think it came up, but it's not a bad idea! The main problems I > can foresee are: > 1) Since we don't know the downstream code, it can be hard to > interpret test suite failures. OTOH for changes we're uncertain of we > already do often end up running some downstream test suites by hand, > so it can only be an improvement on that... > 2) Sometimes everyone including downstream agrees that breaking > something is actually a good idea and they should just deal, but what > do you do then? > > These both seem solvable though. > > I guess a good strategy would be to compile a travis-compatible wheel > of $PACKAGE version $latest-stable against numpy 1.x, and then in the > 1.(x+1) development period numpy would have an additional travis run > which, instead of running the numpy test suite, instead does: > pip install . > pip install $PACKAGE-$latest-stable.whl > python -c 'import package; package.test()' # adjust as necessary > ? Where $PACKAGE is something like scipy / pandas / astropy / ... > matplotlib would be nice but maybe impractical...? > > Maybe someone else will have objections but it seems like a reasonable > idea to me. Want to put together a PR? Asides from fame and fortune > and our earnest appreciation, your reward is you get to make sure that > the packages you care about are included so that we break them less > often in the future ;-). 
One simple way to get going would be for the release manager to trigger a build from this repo: https://github.com/matthew-brett/travis-wheel-builder This build would then upload a wheel to: http://travis-wheels.scikit-image.org/ The upstream packages would have a test grid which included an entry with something like: pip install -f http://travis-wheels.scikit-image.org --pre numpy Cheers, Matthew From charlesr.harris at gmail.com Wed Aug 26 08:42:15 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 26 Aug 2015 06:42:15 -0600 Subject: [Numpy-discussion] SHA256 mismatch on SourceForge downloads In-Reply-To: <20150826102822.40f83fef@fsol> References: <20150826102822.40f83fef@fsol> Message-ID: On Wed, Aug 26, 2015 at 2:28 AM, Antoine Pitrou wrote: > > Hello, > > The SourceForge download page for 1.10.0b1 mentions: > > 89e467cec774527dd254c1e039801726db1367433053801f0d8bc68deac74009 > numpy-1.10.0b1.tar.gz > > But after downloading the file I get: > > $ sha256sum numpy-1.10.0b1.tar.gz > 855695405092686264dc8ce7b3f5c939a6cf1a5639833e841a5bb6fb799cd6a8 > numpy-1.10.0b1.tar.gz > > > Also, since SouceForge doesn't provide any HTTPS downloads (it > actually redirects HTTPS to HTTP (*)), this all looks a bit pointless. > > (*) seems like SourceForge is becoming a poster child of worst > practices... > > I know what happened there The original tarball generated by numpy-vendor was missing a file, so I uploaded new tar and zip files but neglected to change the sha256 signature. My bad. I'll try to do better for the 1.10.0rc1 Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeffreback at gmail.com Wed Aug 26 09:03:45 2015 From: jeffreback at gmail.com (Jeff Reback) Date: Wed, 26 Aug 2015 09:03:45 -0400 Subject: [Numpy-discussion] testing numpy with downstream testsuites (was: Re: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: References: Message-ID: Pandas has for quite a while has a travis build where we install numpy master and then run our test suite. e.g. here: https://travis-ci.org/pydata/pandas/jobs/77256007 Over the last year this has uncovered a couple of changes which affected pandas (mainly using something deprecated which was turned off :) This was pretty simple to setup. Note that this adds 2+ minutes to the build (though our builds take a while anyhow so its not a big deal). On Wed, Aug 26, 2015 at 7:14 AM, Matthew Brett wrote: > Hi, > > On Wed, Aug 26, 2015 at 7:59 AM, Nathaniel Smith wrote: > > [Popping this off to its own thread to try and keep things easier to > follow] > > > > On Tue, Aug 25, 2015 at 9:52 AM, Nathan Goldbaum > wrote: > >>> - Lament: it would be really nice if we could get more people to > >>> test our beta releases, because in practice right now 1.x.0 ends > >>> up being where we actually the discover all the bugs, and 1.x.1 is > >>> where it actually becomes usable. Which sucks, and makes it > >>> difficult to have a solid policy about what counts as a > >>> regression, etc. Is there anything we can do about this? > >> > >> Just a note in here - have you all thought about running the test > suites for > >> downstream projects as part of the numpy test suite? > > > > I don't think it came up, but it's not a bad idea! The main problems I > > can foresee are: > > 1) Since we don't know the downstream code, it can be hard to > > interpret test suite failures. 
OTOH for changes we're uncertain of we > > already do often end up running some downstream test suites by hand, > > so it can only be an improvement on that... > > 2) Sometimes everyone including downstream agrees that breaking > > something is actually a good idea and they should just deal, but what > > do you do then? > > > > These both seem solvable though. > > > > I guess a good strategy would be to compile a travis-compatible wheel > > of $PACKAGE version $latest-stable against numpy 1.x, and then in the > > 1.(x+1) development period numpy would have an additional travis run > > which, instead of running the numpy test suite, instead does: > > pip install . > > pip install $PACKAGE-$latest-stable.whl > > python -c 'import package; package.test()' # adjust as necessary > > ? Where $PACKAGE is something like scipy / pandas / astropy / ... > > matplotlib would be nice but maybe impractical...? > > > > Maybe someone else will have objections but it seems like a reasonable > > idea to me. Want to put together a PR? Asides from fame and fortune > > and our earnest appreciation, your reward is you get to make sure that > > the packages you care about are included so that we break them less > > often in the future ;-). > > One simple way to get going would be for the release manager to > trigger a build from this repo: > > https://github.com/matthew-brett/travis-wheel-builder > > This build would then upload a wheel to: > > http://travis-wheels.scikit-image.org/ > > The upstream packages would have a test grid which included an entry > with something like: > > pip install -f http://travis-wheels.scikit-image.org --pre numpy > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Wed Aug 26 09:11:41 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 26 Aug 2015 15:11:41 +0200 Subject: [Numpy-discussion] 1.10.0rc1 References: Message-ID: <20150826151141.17db3046@fsol> On Tue, 25 Aug 2015 10:26:02 -0600 Charles R Harris wrote: > Hi All, > > The silence after the 1.10 beta has been eerie. Consequently, I'm thinking > of making a first release candidate this weekend. If you haven't yet tested > the beta, please do so. It would be good to discover as many problems as we > can before the first release. Has typing of ufunc parameters become much stricter? I can't find anything in the release notes, but see (1.10b1): >>> arr = np.linspace(0, 5, 10) >>> out = np.empty_like(arr, dtype=np.intp) >>> np.round(arr, out=out) Traceback (most recent call last): File "", line 1, in File "/home/antoine/np110/lib/python3.4/site-packages/numpy/core/fromnumeric.py", line 2778, in round_ return round(decimals, out) TypeError: ufunc 'rint' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind'' It used to work (1.9): >>> arr = np.linspace(0, 5, 10) >>> out = np.empty_like(arr, dtype=np.intp) >>> np.round(arr, out=out) array([0, 1, 1, 2, 2, 3, 3, 4, 4, 5]) >>> out array([0, 1, 1, 2, 2, 3, 3, 4, 4, 5]) Regards Antoine. 
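For code that trips over this under the stricter default, one way to adapt is to spell out the lossy cast; a sketch against the example above (the traceback shows that np.round on a float array goes through the rint ufunc):

```
import numpy as np

# Same setup as the example in the previous message.
arr = np.linspace(0, 5, 10)
out = np.empty_like(arr, dtype=np.intp)

# Request the float -> integer cast explicitly at the ufunc level...
np.rint(arr, out=out, casting='unsafe')

# ...or keep np.round and let assignment perform the cast instead.
out[...] = np.round(arr)
```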
From charlesr.harris at gmail.com Wed Aug 26 09:31:45 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 26 Aug 2015 07:31:45 -0600 Subject: [Numpy-discussion] 1.10.0rc1 In-Reply-To: <20150826151141.17db3046@fsol> References: <20150826151141.17db3046@fsol> Message-ID: On Wed, Aug 26, 2015 at 7:11 AM, Antoine Pitrou wrote: > On Tue, 25 Aug 2015 10:26:02 -0600 > Charles R Harris wrote: > > Hi All, > > > > The silence after the 1.10 beta has been eerie. Consequently, I'm > thinking > > of making a first release candidate this weekend. If you haven't yet > tested > > the beta, please do so. It would be good to discover as many problems as > we > > can before the first release. > > Has typing of ufunc parameters become much stricter? I can't find > anything in the release notes, but see (1.10b1): > > >>> arr = np.linspace(0, 5, 10) > >>> out = np.empty_like(arr, dtype=np.intp) > >>> np.round(arr, out=out) > Traceback (most recent call last): > File "", line 1, in > File > "/home/antoine/np110/lib/python3.4/site-packages/numpy/core/fromnumeric.py", > line 2778, in round_ > return round(decimals, out) > TypeError: ufunc 'rint' output (typecode 'd') could not be coerced to > provided output parameter (typecode 'l') according to the casting rule > ''same_kind'' > > > It used to work (1.9): > > >>> arr = np.linspace(0, 5, 10) > >>> out = np.empty_like(arr, dtype=np.intp) > >>> np.round(arr, out=out) > array([0, 1, 1, 2, 2, 3, 3, 4, 4, 5]) > >>> out > array([0, 1, 1, 2, 2, 3, 3, 4, 4, 5]) > The default casting mode has been changed. I think this has been raising a warning since 1.7 and was mentioned as a future change in 1.10, but you are right, it needs to be mentioned in the 1.10 release notes. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Aug 26 09:32:30 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 26 Aug 2015 07:32:30 -0600 Subject: [Numpy-discussion] 1.10.0rc1 In-Reply-To: References: <20150826151141.17db3046@fsol> Message-ID: On Wed, Aug 26, 2015 at 7:31 AM, Charles R Harris wrote: > > > On Wed, Aug 26, 2015 at 7:11 AM, Antoine Pitrou > wrote: > >> On Tue, 25 Aug 2015 10:26:02 -0600 >> Charles R Harris wrote: >> > Hi All, >> > >> > The silence after the 1.10 beta has been eerie. Consequently, I'm >> thinking >> > of making a first release candidate this weekend. If you haven't yet >> tested >> > the beta, please do so. It would be good to discover as many problems >> as we >> > can before the first release. >> >> Has typing of ufunc parameters become much stricter? I can't find >> anything in the release notes, but see (1.10b1): >> >> >>> arr = np.linspace(0, 5, 10) >> >>> out = np.empty_like(arr, dtype=np.intp) >> >>> np.round(arr, out=out) >> Traceback (most recent call last): >> File "", line 1, in >> File >> "/home/antoine/np110/lib/python3.4/site-packages/numpy/core/fromnumeric.py", >> line 2778, in round_ >> return round(decimals, out) >> TypeError: ufunc 'rint' output (typecode 'd') could not be coerced to >> provided output parameter (typecode 'l') according to the casting rule >> ''same_kind'' >> >> >> It used to work (1.9): >> >> >>> arr = np.linspace(0, 5, 10) >> >>> out = np.empty_like(arr, dtype=np.intp) >> >>> np.round(arr, out=out) >> array([0, 1, 1, 2, 2, 3, 3, 4, 4, 5]) >> >>> out >> array([0, 1, 1, 2, 2, 3, 3, 4, 4, 5]) >> > > The default casting mode has been changed. 
I think this has been raising a > warning since 1.7 and was mentioned as a future change in 1.10, but you are > right, it needs to be mentioned in the 1.10 release notes. > Make that warned of in the 1.9.0 release notes. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Aug 26 09:52:09 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 26 Aug 2015 07:52:09 -0600 Subject: [Numpy-discussion] 1.10.0rc1 In-Reply-To: References: <20150826151141.17db3046@fsol> Message-ID: On Wed, Aug 26, 2015 at 7:32 AM, Charles R Harris wrote: > > > On Wed, Aug 26, 2015 at 7:31 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Wed, Aug 26, 2015 at 7:11 AM, Antoine Pitrou >> wrote: >> >>> On Tue, 25 Aug 2015 10:26:02 -0600 >>> Charles R Harris wrote: >>> > Hi All, >>> > >>> > The silence after the 1.10 beta has been eerie. Consequently, I'm >>> thinking >>> > of making a first release candidate this weekend. If you haven't yet >>> tested >>> > the beta, please do so. It would be good to discover as many problems >>> as we >>> > can before the first release. >>> >>> Has typing of ufunc parameters become much stricter? I can't find >>> anything in the release notes, but see (1.10b1): >>> >>> >>> arr = np.linspace(0, 5, 10) >>> >>> out = np.empty_like(arr, dtype=np.intp) >>> >>> np.round(arr, out=out) >>> Traceback (most recent call last): >>> File "", line 1, in >>> File >>> "/home/antoine/np110/lib/python3.4/site-packages/numpy/core/fromnumeric.py", >>> line 2778, in round_ >>> return round(decimals, out) >>> TypeError: ufunc 'rint' output (typecode 'd') could not be coerced to >>> provided output parameter (typecode 'l') according to the casting rule >>> ''same_kind'' >>> >>> >>> It used to work (1.9): >>> >>> >>> arr = np.linspace(0, 5, 10) >>> >>> out = np.empty_like(arr, dtype=np.intp) >>> >>> np.round(arr, out=out) >>> array([0, 1, 1, 2, 2, 3, 3, 4, 4, 5]) >>> >>> out >>> array([0, 1, 1, 2, 2, 3, 3, 4, 4, 5]) >>> >> >> The default casting mode has been changed. I think this has been raising >> a warning since 1.7 and was mentioned as a future change in 1.10, but you >> are right, it needs to be mentioned in the 1.10 release notes. >> > > Make that warned of in the 1.9.0 release notes. > > Here it is in 1.9.0 with deprecation warning made visible. ``` In [3]: import warnings In [4]: warnings.simplefilter('always') In [5]: arr = np.linspace(0, 5, 10) In [6]: out = np.empty_like(arr, dtype=np.intp) In [7]: np.round(arr, out=out) /home/charris/.local/lib/python2.7/site-packages/numpy/core/fromnumeric.py:2640: DeprecationWarning: Implicitly casting between incompatible kinds. In a future numpy release, this will raise an error. Use casting="unsafe" if this is intentional. return round(decimals, out) Out[7]: array([0, 1, 1, 2, 2, 3, 3, 4, 4, 5]) ``` Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Wed Aug 26 10:06:14 2015 From: travis at continuum.io (Travis Oliphant) Date: Wed, 26 Aug 2015 09:06:14 -0500 Subject: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015 In-Reply-To: References: Message-ID: On Wed, Aug 26, 2015 at 1:41 AM, Nathaniel Smith wrote: > Hi Travis, > > Thanks for taking the time to write up your thoughts! > > I have many thoughts in return, but I will try to restrict myself to two > main ones :-). 
> > 1) On the question of whether work should be directed towards improving > NumPy-as-it-is or instead towards a compatibility-breaking replacement: > There's plenty of room for debate about whether it's better engineering > practice to try and evolve an existing system in place versus starting > over, and I guess we have some fundamental disagreements there, but I > actually think this debate is a distraction -- we can agree to disagree, > because in fact we have to try both. > Yes, on this we agree. I think NumPy can improve *and* we can have new innovative array objects. I don't disagree about that. > > At a practical level: NumPy *is* going to continue to evolve, because it > has users and people interested in evolving it; similarly, dynd and other > alternatives libraries will also continue to evolve, because they also have > people interested in doing it. And at a normative level, this is a good > thing! If NumPy and dynd both get better, than that's awesome: the worst > case is that NumPy adds the new features that we talked about at the > meeting, and dynd simultaneously becomes so awesome that everyone wants to > switch to it, and the result of this would be... that those NumPy features > are exactly the ones that will make the transition to dynd easier. Or if > some part of that plan goes wrong, then well, NumPy will still be there as > a fallback, and in the mean time we've actually fixed the major pain points > our users are begging us to fix. > > You seem to be urging us all to make a double-or-nothing wager that your > extremely ambitious plans will all work out, with the entire numerical > Python ecosystem as the stakes. I think this ambition is awesome, but maybe > it'd be wise to hedge our bets a bit? > You are mis-characterizing my view. I think NumPy can evolve (though I would personally rather see a bigger change to the underlying system like I outlined before). But, I don't believe it can even evolve easily in the direction needed without breaking ABI and that insisting on not breaking it or even putting too much effort into not breaking it will continue to create less-optimal solutions that are harder to maintain and do not take advantage of knowledge this community now has. I'm also very concerned that 'evolving' NumPy will create a situation where there are regular semantic and subtle API changes that will cause NumPy to be less stable for it's user-base. I've watched this happen. This at a time that people are already looking around for new and different approaches anyway. > > 2) You really emphasize this idea of an ABI-breaking (but not > API-breaking) release, and I think this must indicate some basic gap in how > we're looking at things. Where I'm getting stuck here is that... I actually > can't think of anything important that we can't do now, but could if we > were allowed to break ABI compatibility. The kinds of things that break ABI > but keep API are like... rearranging what order the fields in a struct fall > in, or changing the numeric value of opaque constants like > NPY_ARRAY_WRITEABLE. The biggest win I can think of is that we could save a > few bytes per array by arranging the fields inside the ndarray struct more > optimally, but that's hardly a feature to hang a 2.0 on. You seem to have a > vision of this ABI-breaking release as being something very different from > that, and I'm not clear on what this vision is. > > We already broke the ABI with date-time changes --- it's still broken for a certain percentage of users last I checked. 
So, part of my disagreement is that we've tried this and it didn't work --- even though smart people thought it would. I've had to deal with this personally and I'm not enthusiastic about having to deal with it for the next 5 years because of even more attempts to make changes while not breaking the ABI. I think the group is more careful now --- but I still think the API is broad enough and uses of NumPy deep enough that the effort involved in trying not to break the ABI is just not worth it (because ABI compatibility is a non-feature today). Adding new dtypes without breaking the ABI is tricky (and doing it that way is ugly). I also continue to believe that putting out a new ABI-breaking NumPy will allow re-compiling *once* (with some porting changes needed), rather than subtle breakages requiring code changes every time a release is made. If subtle changes aren't made, then the new features won't come. Right now, I'd rather have stability from NumPy than new features. New features can come from other libraries.

One specific change that could easily be made in NumPy 2.0 (the current code but with an ABI change) is that dtypes should become true type objects and array-scalars (which are the current type objects) should become instances of those dtypes. That is the biggest clean-up needed on the array front, I think. There should not be *both* array-scalars and dtype objects; they are fundamentally the same thing, and it was a mistake to have both of them. I don't see how to make that change without breaking the ABI. Perhaps it could be done in a creative way --- but why put the effort into that and end up with an even more hacky code-base.

NumPy's ABI was influenced by and evolved from Numeric and Numarray. It was not "designed" to last 30 years. I think the dtype "types" should potentially have different member-structures. The ufunc sub-system needs an overhaul --- its member structures need upgrades. With generalized ufuncs and the iteration protocols of Mark Wiebe we know a whole lot more about ufuncs now. Ufuncs are the same 1995 structure that Jim Hugunin wrote. I suppose you *could* just tack new functions on the end of the structure and keep growing the list (while leaving old, unused structures as unused or deprecated) --- or you can take the opportunity to tidy up a bit. The longer you leave everything the same, the harder you make the code-base and the more costly maintenance becomes. I just don't see the value there --- and I see a lot of pain.

Regarding the ufunc subsystem: we've argued before about the lack of multi-methods in NumPy. Continuing to add dunder-methods to try and get around it will continue to make the system harder to maintain and more brittle.

You mention making NumPy an interface to multiple things, along with many other ideas. I don't believe you can get there without real changes that break things (at the very least semantic changes). I'm not excited about those changes causing instability (which they will cause --- to me the burden of proof that they won't is on you, who wants to make the change, not on me to say how they will). I also think it will take much longer to get there incrementally (if at all) than just creating something on top of newer ideas. > The main reason I personally am against having a big ABI-breaking release > is not that I hate ABI breakage a priori, it's that all the big features > that I care about and that users are asking for seem to be ones that... > don't actually require doing that.
At most they seem to get a mild benefit > from breaking some obscure corner cases. So the cost/benefits don't make > any sense to me. > > So: can you give a concrete example of a change you have in mind where > breaking ABI would be the key enabler? > > (I guess you might also be thinking of a separate issue that you sort of > allude to: Perhaps we will try to make changes which we think don't involve > breaking the ABI, but discover too late that we have failed to fully > understand the implications and have broken it by mistake. IIUC this is > what happened in the 1.4 timeframe when datetime64 was merged and > accidentally renumbered some of the NPY_* constants. > Yes, this is what I'm mainly worried about. But, more than that, I'm concerned about general *semantic* and API changes at a rapid pace for a community that is just looking for stability and bug-fixes from NumPy itself --- with innovation happening elsewhere. > Partially I am less worried about this because I have a fair amount of > confidence that our review and QA process has improved these days to the > point that we would not let a change like that slip through by accident -- > we have a lot more active reviewers, people are sensitized to the issues, > we've successfully landed intrusive changes like Sebastian's indexing > rewrite, ... though this is very much second-hand impressions on my part, > and I'd welcome input from folks like Chuck who have a clearer view on how > things have changed from then to now. > > But more importantly, even if this is true, then I can't see how your > proposal helps. If we aren't good enough at our jobs to predict when we'll > break ABI, then by assumption it makes no sense to pick one release and > decide that this is the one time that we'll break ABI.) > I don't understand your point. Picking a release to break the ABI allows you to actually do things like change macros to functions and move structures around to be more consistent with a new design that is easier to maintain and allows more growth. It has nothing to do with "whether you are good at your job". Everyone has strengths and weaknesses. This kind of clean-up may be needed regularly --- every 3 years would not be a crazy pattern, but it could also be every 5 years if you wanted more discipline. I already knew we needed to break the ABI "soonish" when I released NumPy 1.0. The fact that we haven't officially done it yet (but have done it unofficially) is a great injustice to "what could be" and has slowed development of NumPy tremendously. We've gone back and forth on this. I'm fine if we disagree, but I just hope the disagreement doesn't lead to lack of cooperation as we both have the same ultimate interests in seeing array-computing in Python improve. I just don't support *major* changes without breaking the ABI without a whole lot of proof that it is possible (without hackiness). You have mentioned on your roadmap a lot of what I would consider *major* changes. Some of it you describe how to get there. The most important change (improving the dtype system) you don't. Part of my point is that we now *know* how to improve the dtype system. Let's do it. Let's not try "yet again" to do it differently inside an old system designed by a scientist who didn't understand type-theory or type systems (that was me by the way). Look at data-shape in the blaze project. Take that and build a Python type-system that also outputs struct-string syntax for memory-views. 
That's the data-description system that NumPy should be using --- not trying to hack on a mixed array-scalar, dtype-object system that may never support everything we now know is needed. Trying to incrementing from where we are now will only lead to a sub-optimal outcome and unfortunate instability when we already know what to do differently. I doubt I will convince you --- certainly not via email. I apologize in advance that I likely won't be able to respond in depth to any more questions that are really just "prove to me that I can't" kind of questions. Of course I can't prove that. All I'm saying is that to me the evidence and my experience leads me to not be able to support major changes like you have proposed without also intentionally breaking the ABI (and thus calling it NumPy 2.0). If I find time to write, I will try to use it to outline more specifically what I think is a better approach to array- and table-computing in Python that keeps the stability of NumPy and adds new features using different approaches. -Travis > > On Tue, Aug 25, 2015 at 12:00 PM, Travis Oliphant > wrote: > >> Thanks for the write-up Nathaniel. There is a lot of great detail and >> interesting ideas here. >> >> I've am very eager to understand how to help NumPy and the wider >> community move forward however I can (my passions on this have not changed >> since 1999, though what I myself spend time on has changed). >> >> There are a lot of ways to think about approaching this, though. It's >> hard to get all the ideas on the table, and it was unfortunate we couldn't >> get everybody wyho are core NumPy devs together in person to have this >> discussion as there are still a lot of questions unanswered and a lot of >> thought that has gone into other approaches that was not brought up or >> represented in the meeting (how does Numba fit into this, what about >> data-shape, dynd, memory-views and Python type system, etc.). If NumPy >> becomes just an interface-specification, then why don't we just do that >> *outside* NumPy itself in a way that doesn't jeopardize the stability of >> NumPy today. These are some of the real questions I have. I will try >> to write up my thoughts in more depth soon, but I won't be able to respond >> in-depth right now. I just wanted to comment because Nathaniel said I >> disagree which is only partly true. >> >> The three most important things for me are 1) let's make sure we have >> representation from as wide of the community as possible (this is really >> hard), 2) let's look around at the broader community and the prior art that >> is happening in this space right now and 3) let's not pretend we are going >> to be able to make all this happen without breaking ABI compatibility. >> Let's just break ABI compatibility with NumPy 2.0 *and* have as much >> fidelity with the API and semantics of current NumPy as possible (though >> there will be some changes necessary long-term). >> >> I don't think we should intentionally break ABI if we can avoid it, but I >> also don't think we should spend in-ordinate amounts of time trying to >> pretend that we won't break ABI (for at least some people), and most >> importantly we should not pretend *not* to break the ABI when we actually >> do. We did this once before with the roll-out of date-time, and it was >> really un-necessary. When I released NumPy 1.0, there were several >> things that I knew should be fixed very soon (NumPy was never designed to >> not break ABI). Those problems are still there. 
Now, that we have >> quite a bit better understanding of what NumPy *should* be (there have been >> tremendous strides in understanding and community size over the past 10 >> years), let's actually make the infrastructure we think will last for the >> next 20 years (instead of trying to shoe-horn new ideas into a 20-year old >> code-base that wasn't designed for it). >> >> NumPy is a hard code-base. It has been since Numeric days in 1995. I >> could be wrong, but my guess is that we will be passed by as a community if >> we don't seize the opportunity to build something better than we can build >> if we are forced to use a 20 year old code-base. >> >> It is more important to not break people's code and to be clear when a >> re-compile is necessary for dependencies. Those to me are the most >> important constraints. There are a lot of great ideas that we all have >> about what we want NumPy to be able to do. Some of this are pretty >> transformational (and the more exciting they are, the harder I think they >> are going to be to implement without breaking at least the ABI). There >> is probably some CAP-like theorem around >> Stability-Features-Speed-of-Development (pick 2) when it comes to Open >> Source Software development and making feature-progress with NumPy *is >> going* to create in-stability which concerns me. >> >> I would like to see a little-bit-of-pain one time with a NumPy 2.0, >> rather than a constant pain because of constant churn over many years >> approach that Nathaniel seems to advocate. To me NumPy 2.0 is an >> ABI-breaking release that is as API-compatible as possible and whose >> semantics are not dramatically different. >> >> There are at least 3 areas of compatibility (ABI, API, and semantic). >> ABI-compatibility is a non-feature in today's world. There are so many >> distributions of the NumPy stack (and conda makes it trivial for anyone to >> build their own or for you to build one yourself). Making less-optimal >> software-engineering choices because of fear of breaking the ABI is not >> something I'm supportive of at all. We should not break ABI every >> release, but a release every 3 years that breaks ABI is not a problem. >> >> API compatibility should be much more sacrosanct, but it is also >> something that can also be managed. Any NumPy 2.0 should definitely >> support the full NumPy API (though there could be deprecated swaths). I >> think the community has done well in using deprecation and limiting the >> public API to make this more manageable and I would love to see a NumPy 2.0 >> that solidifies a future-oriented API along with a back-ward compatible API >> that is also available. >> >> Semantic compatibility is the hardest. We have already broken this on >> multiple occasions throughout the 1.x NumPy releases. Every time you >> change the code, this can change. This is what I fear causing deep >> instability over the course of many years. These are things like the >> casting rule details, the effect of indexing changes, any change to the >> calculations approaches. It is and has been the most at risk during any >> code-changes. My view is that a NumPy 2.0 (with a new low-level >> architecture) minimizes these changes to a single release rather than >> unavoidably spreading them out over many, many releases. >> >> I think that summarizes my main concerns. I will write-up more forward >> thinking ideas for what else is possible in the coming weeks. In the mean >> time, thanks for keeping the discussion going. 
It is extremely exciting to >> see the help people have continued to provide to maintain and improve >> NumPy. It will be exciting to see what the next few years bring as well. >> >> >> Best, >> >> -Travis >> >> >> >> >> >> >> On Tue, Aug 25, 2015 at 5:03 AM, Nathaniel Smith wrote: >> >>> Hi all, >>> >>> These are the notes from the NumPy dev meeting held July 7, 2015, at >>> the SciPy conference in Austin, presented here so the list can keep up >>> with what happens, and so you can give feedback. Please do give >>> feedback, none of this is final! >>> >>> (Also, if anyone who was there notices anything I left out or >>> mischaracterized, please speak up -- these are a lot of notes I'm >>> trying to gather together, so I could easily have missed something!) >>> >>> Thanks to Jill Cowan and the rest of the SciPy organizers for donating >>> space and organizing logistics for us, and to the Berkeley Institute >>> for Data Science for funding travel for Jaime, Nathaniel, and >>> Sebastian. >>> >>> >>> Attendees >>> ========= >>> >>> Present in the room for all or part: Daniel Allan, Chris Barker, >>> Sebastian Berg, Thomas Caswell, Jeff Reback, Jaime Fern?ndez del >>> R?o, Chuck Harris, Nathaniel Smith, St?fan van der Walt. (Note: I'm >>> pretty sure this list is incomplete) >>> >>> Joining remotely for all or part: Stephan Hoyer, Julian Taylor. >>> >>> >>> Formalizing our governance/decision making >>> ========================================== >>> >>> This was a major focus of discussion. At a high level, the consensus >>> was to steal IPython's governance document ("IPEP 29") and modify it >>> to remove its use of a BDFL as a "backstop" to normal community >>> consensus-based decision, and replace it with a new "backstop" based >>> on Apache-project-style consensus voting amongst the core team. >>> >>> I'll send out a proper draft of this shortly for further discussion. >>> >>> >>> Development roadmap >>> =================== >>> >>> General consensus: >>> >>> Let's assume NumPy is going to remain important indefinitely, and >>> try to make it better, instead of waiting for something better to >>> come along. (This is unlikely to be wasted effort even if something >>> better does come along, and it's hardly a sure thing that that will >>> happen anyway.) >>> >>> Let's focus on evolving numpy as far as we can without major >>> break-the-world changes (no "numpy 2.0", at least in the foreseeable >>> future). >>> >>> And, as a target for that evolution, let's change our focus from >>> numpy as "NumPy is the library that gives you the np.ndarray object >>> (plus some attached infrastructure)", to "NumPy provides the >>> standard framework for working with arrays and array-like objects in >>> Python" >>> >>> This means, creating defined interfaces between array-like objects / >>> ufunc objects / dtype objects, so that it becomes possible for third >>> parties to add their own and mix-and-match. Right now ufuncs are >>> pretty good at this, but if you want a new array class or dtype then >>> in most cases you pretty much have to modify numpy itself. >>> >>> Vision: instead of everyone who wants a new container type having to >>> reimplement all of numpy, Alice can implement an array class using >>> (sparse / distributed / compressed / tiled / gpu / out-of-core / >>> delayed / ...) storage, pass it to code that was written using >>> direct calls to np.* functions, and it just works. 
(Instead of >>> np.sin being "the way you calculate the sine of an ndarray", it's >>> "the way you calculate the sine of any array-like container >>> object".) >>> >>> Vision: Darryl can implement a new dtype for (categorical data / >>> astronomical dates / integers-with-missing-values / ...) without >>> having to touch the numpy core. >>> >>> Vision: Chandni can then come along and combine them by doing >>> >>> a = alice_array([...], dtype=darryl_dtype) >>> >>> and it just works. >>> >>> Vision: no-one is tempted to subclass ndarray, because anything you >>> can do with an ndarray subclass you can also easily do by defining >>> your own new class that implements the "array protocol". >>> >>> >>> Supporting third-party array types >>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>> >>> Sub-goals: >>> - Get __numpy_ufunc__ done, which will cover a good chunk of numpy's >>> API right there. >>> - Go through the rest of the stuff in numpy, and figure out some >>> story for how to let it handle third-party array classes: >>> - ufunc ALL the things: Some things can be converted directly into >>> (g)ufuncs and then use __numpy_ufunc__ (e.g., np.std); some >>> things could be converted into (g)ufuncs if we extended the >>> (g)ufunc interface a bit (e.g. np.sort, np.matmul). >>> - Some things probably need their own __numpy_ufunc__-like >>> extensions (__numpy_concatenate__?) >>> - Provide tools to make it easier to implement the more complicated >>> parts of an array object (e.g. the bazillion different methods, >>> many of which are ufuncs in disguise, or indexing) >>> - Longer-run interesting research project: __numpy_ufunc__ requires >>> that one or the other object have explicit knowledge of how to >>> handle the other, so to handle binary ufuncs with N array types >>> you need something like N**2 __numpy_ufunc__ code paths. As an >>> alternative, if there were some interface that an object could >>> export that provided the operations nditer needs to efficiently >>> iterate over (chunks of) it, then you would only need N >>> implementations of this interface to handle all N**2 operations. >>> >>> This would solve a lot of problems for projects like: >>> - blosc >>> - dask >>> - distarray >>> - numpy.ma >>> - pandas >>> - scipy.sparse >>> - xray >>> >>> >>> Supporting third-party dtypes >>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>> >>> We already have something like a C level "dtype >>> protocol". Conceptually, the way you define a new dtype is by >>> defining a new class whose instances have data attributes defining >>> the parameters of the dtype (what fields are in *this* record dtype, >>> how many characters are in *this* string dtype, what units are used >>> for *this* datetime64, etc.), and you define a bunch of methods to >>> do things like convert an object from a Python object to your dtype >>> or vice-versa, to copy an array of your dtype from one place to >>> another, to cast to and from your new dtype, etc. This part is >>> great. >>> >>> The problem is, in the current implementation, we don't actually use >>> the Python object system to define these classes / attributes / >>> methods. Instead, all possible dtypes are jammed into a single >>> Python-level class, whose struct has fields for the union of all >>> possible dtype's attributes, and instead of Python-style method >>> slots there's just a big table of function pointers attached to each >>> object. >>> >>> So the main proposal is that we keep the basic design, but switch it >>> so that the float64 dtype, the int64 dtype, etc. 
actually literally >>> are subclasses of np.dtype, each implementing their own fields and >>> Python-style methods. >>> >>> Some of the pieces involved in doing this: >>> >>> - The current dtype methods should be cleaned up -- e.g. 'dot' and >>> 'less_than' are both dtype methods, when conceptually they're much >>> more like ufuncs. >>> >>> - The ufunc inner-loop interface currently does not get a reference >>> to the dtype object, so they can't see its attributes and this is >>> a big obstacle to many interesting dtypes (e.g., it's hard to >>> implement np.equal for categoricals if you don't know what >>> categories each has). So we need to add new arguments to the core >>> ufunc loop signature. (Fortunately this can be done in a >>> backwards-compatible way.) >>> >>> - We need to figure out what exactly the dtype methods should be, >>> and add them to the dtype class (possibly with backwards >>> compatibility shims for anyone who is accessing PyArray_ArrFuncs >>> directly). >>> >>> - Casting will be possibly the trickiest thing to work out, though >>> the basic idea of using dunder-dispatch-like __cast__ and >>> __rcast__ methods seems workable. (Encouragingly, this is also >>> exactly what dynd also does, though unfortunately dynd does not >>> yet support user-defined dtypes even to the extent that numpy >>> does, so there isn't much else we can steal from them.) >>> - We may also want to rethink the casting rules while we're at it, >>> since they have some very weird corners right now (e.g. see >>> [https://github.com/numpy/numpy/issues/6240]) >>> >>> - We need to migrate the current dtypes over to the new system, >>> which can be done in stages: >>> >>> - First stick them all in a single "legacy dtype" class whose >>> methods just dispatch to the PyArray_ArrFuncs per-object "method >>> table" >>> >>> - Then move each of them into their own classes >>> >>> - We should provide a Python-level wrapper for the protocol, so that >>> you can call dtype methods from Python >>> >>> - And vice-versa, it should be possible to subclass dtype at the >>> Python level >>> >>> - etc. >>> >>> Fortunately, AFAICT pretty much all of this can be done while >>> maintaining backwards compatibility (though we may want to break >>> some obscure cases to avoid expending *too* much effort with weird >>> backcompat contortions that will only help a vanishingly small >>> proportion of the userbase), and a lot of the above changes can be >>> done as semi-independent mini-projects, so there's no need for some >>> branch to go off and spend a year rewriting the world. >>> >>> Obviously there are still a lot of details to work out, though. But >>> overall, there was widespread agreement that this is one of the #1 >>> pain points for our users (e.g. it's the single main request from >>> pandas), and fixing it is very high priority. >>> >>> Some features that would become straightforward to implement >>> (e.g. even in third-party libraries) if this were fixed: >>> - missing value support >>> - physical unit tracking (meters / seconds -> array of velocity; >>> meters + seconds -> error) >>> - better and more diverse datetime representations (e.g. datetimes >>> with attached timezones, or using funky geophysical or >>> astronomical calendars) >>> - categorical data >>> - variable length strings >>> - strings-with-encodings (e.g. 
latin1) >>> - forward mode automatic differentiation (write a function that >>> computes f(x) where x is an array of float64; pass that function >>> an array with a special dtype and get out both f(x) and f'(x)) >>> - probably others I'm forgetting right now >>> >>> I should also note that there was one substantial objection to this >>> plan, from Travis Oliphant (in discussions later in the >>> conference). I'm not confident I understand his objections well >>> enough to reproduce them here, though -- perhaps he'll elaborate. >>> >>> >>> Money >>> ===== >>> >>> There was an extensive discussion on the topic of: "if we had money, >>> what would we do with it?" >>> >>> This is partially motivated by the realization that there are a >>> number of sources that we could probably get money from, if we had a >>> good story for what we wanted to do, so it's not just an idle >>> question. >>> >>> Points of general agreement: >>> >>> - Doing the in-person meeting was a good thing. We should plan do >>> that again, at least once a year. So one thing to spend money on >>> is travel subsidies to make sure that happens and is productive. >>> >>> - While it's tempting to imagine hiring junior people for the more >>> frustrating/boring work like maintaining buildbots, release >>> infrastructure, updating docs, etc., this seems difficult to do >>> realistically with our current resources -- how do we hire for >>> this, who would manage them, etc.? >>> >>> - On the other hand, the general feeling was that if we found the >>> money to hire a few more senior people who could take care of >>> themselves more, then that would be good and we could >>> realistically absorb that extra work without totally unbalancing >>> the project. >>> >>> - A major open question is how we would recruit someone for a >>> position like this, since apparently all the obvious candidates >>> who are already active on the NumPy team already have other >>> things going on. [For calibration on how hard this can be: NYU >>> has apparently had an open position for a year with the job >>> description of "come work at NYU full-time with a >>> private-industry-competitive-salary on whatever your personal >>> open-source scientific project is" (!) and still is having an >>> extremely difficult time filling it: >>> [http://cds.nyu.edu/research-engineer/]] >>> >>> - General consensus though was that there isn't much to be done >>> about this though, except try it and see. >>> >>> - (By the way, if you're someone who's reading this and >>> potentially interested in like a postdoc or better working on >>> numpy, then let's talk...) >>> >>> >>> More specific changes to numpy that had general consensus, but don't >>> really fit into a high-level roadmap >>> >>> ========================================================================================================= >>> >>> - Resolved: we should merge multiarray.so and umath.so into a single >>> extension module, so that they can share utility code without the >>> current awkward contortions. >>> >>> - Resolved: we should start hiding new fields in the ufunc and dtype >>> structs as soon as possible going forward. (I.e. they would not be >>> present in the version of the structs that are exposed through the >>> C API, but internally we would use a more detailed struct.) >>> - Mayyyyyybe we should even go ahead and hide the subset of the >>> existing fields that are really internal details that no-one >>> should be using. 
If we did this without changing anything else >>> then it would preserve ABI (the fields would still be where >>> existing compiled extensions expect them to be, if any such >>> extensions exist) while breaking API (trying to compile such >>> extensions would give a clear error), so would be a smoother >>> ramp if we think we need to eventually break those fields for >>> real. (As discussed above, there are a bunch of fields in the >>> dtype base class that only make sense for specific dtype >>> subclasses, e.g. only record dtypes need a list of field names, >>> but right now all dtypes have one anyway. So it would be nice to >>> remove these from the base class entirely, but that is >>> potentially ABI-breaking.) >>> >>> - Resolved: np.array should never return an object array unless >>> explicitly requested (e.g. with dtype=object); it just causes too >>> many surprising problems. >>> - First step: add a deprecation warning >>> - Eventually: make it an error. >>> >>> - The matrix class >>> - Resolved: We won't add warnings yet, but we will prominently >>> document that it is deprecated and should be avoided where-ever >>> possible. >>> - St?fan van der Walt volunteers to do this. >>> - We'd all like to deprecate it properly, but the feeling was that >>> the precondition for this is for scipy.sparse to provide sparse >>> "arrays" that don't return np.matrix objects on ordinary >>> operatoins. Until that happens we can't reasonably tell people >>> that using np.matrix is a bug. >>> >>> - Resolved: we should add a similar prominent note to the >>> "subclassing ndarray" documentation, warning people that this is >>> painful and barely works and please don't do it if you have any >>> alternatives. >>> >>> - Resolved: we want more, smaller releases -- every 6 months at >>> least, aiming to go even faster (every 4 months?) >>> >>> - On the question of using Cython inside numpy core: >>> - Everyone agrees that there are places where this would be an >>> improvement (e.g., Python<->C interfaces, and places "when you >>> want to do computer science", e.g. complicated algorithmic stuff >>> like graph traversals) >>> - Chuck wanted it to be clear though that he doesn't think it >>> would be a good goal to try and rewrite all of numpy in Cython >>> -- there also exist places where Cython ends up being "an uglier >>> version of C". No-one disagreed. >>> >>> - Our text reader is apparently not very functional on Python 3, and >>> generally slow and hard to work with. >>> - Resolved: We should extract Pandas's awesome text reader/parser >>> and convert it into its own package, that could then become a >>> new backend for both pandas and numpy.loadtxt. >>> - Jeff thinks this is a great idea >>> - Thomas Caswell volunteers to do the extraction. >>> >>> - We should work on improving our tools for evolving the ABI, so >>> that we will eventually be less constrained by decisions made >>> decades ago. >>> - One idea that had a lot of support was to switch from our >>> current append-only C-API to a "sliding window" API based on >>> explicit versions. So a downstream package might say >>> >>> #define NUMPY_API_VERSION 4 >>> >>> and they'd get the functions and behaviour provided in "version >>> 4" of the numpy C api. If they wanted to get access to new stuff >>> that was added in version 5, then they'd need to switch that >>> #define, and at the same time clean up any usage of stuff that >>> was removed or changed in version 5. 
And to provide a smooth >>> migration path, one version of numpy would support multiple >>> versions at once, gradually deprecating and dropping old >>> versions. >>> >>> - If anyone wants to help bring pip up to scratch WRT tracking ABI >>> dependencies (e.g., 'pip install numpy==' >>> -> triggers rebuild of scipy against the new ABI), then that >>> would be an extremely useful thing. >>> >>> >>> Policies that should be documented >>> ================================== >>> >>> ...together with some notes about what the contents of the document >>> should be: >>> >>> >>> How we manage bugs in the bug tracker. >>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>> >>> - Github "milestones" should *only* be assigned to release-blocker >>> bugs (which mostly means "regression from the last release"). >>> >>> In particular, if you're tempted to push a bug forward to the next >>> release... then it's clearly not a blocker, so don't set it to the >>> next release's milestone, just remove the milestone entirely. >>> >>> (Obvious exception to this: deprecation followup bugs where we >>> decide that we want to keep the deprecation around a bit longer >>> are a case where a bug actually does switch from being a blocker >>> to release 1.x to being a blocker for release 1.(x+1).) >>> >>> - Don't hesitate to close an issue if there's no way forward -- >>> e.g. a PR where the author has disappeared. Just post a link to >>> this policy and close, with a polite note that we need to keep our >>> tracker useful as a todo list, but they're welcome to re-open if >>> things change. >>> >>> >>> Deprecations and breakage policy: >>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>> >>> - How long do we need to keep DeprecationWarnings around before we >>> break things? This is tricky because on the one hand an aggressive >>> (short) deprecation period lets us deliver new features and >>> important cleanups more quickly, but on the other hand a >>> too-aggressive deprecation period is difficult for our more >>> conservative downstream users. >>> >>> - Idea that had the most support: pick a somewhat-aggressive >>> warning period as our default, and make a rule that if someone >>> asks for an extension during the beta cycle for the release that >>> removes it, then we put it back for another release or two worth >>> of grace period. (While also possibly upgrading the warning to >>> be more visible during the grace period.) This gives us >>> deprecation periods that are more adaptive on a case-by-case >>> basis. >>> >>> - Lament: it would be really nice if we could get more people to >>> test our beta releases, because in practice right now 1.x.0 ends >>> up being where we actually the discover all the bugs, and 1.x.1 is >>> where it actually becomes usable. Which sucks, and makes it >>> difficult to have a solid policy about what counts as a >>> regression, etc. Is there anything we can do about this? >>> >>> - ABI breakage: we distinguish between an ABI break that breaks >>> everything (e.g., "import scipy" segfaults), versus an ABI break >>> that breaks an occasional rare case (e.g., only apps that poke >>> around in some obscure corner of some struct are affected). >>> >>> - The "break-the-world" type remains off-limit for now: the pain >>> is still too large (conda helps, but there are lots of people >>> who don't use conda!), and there aren't really any compelling >>> improvements that this would enable anyway. 
>>> >>> - For the "break-0.1%-of-users" type, it is *not* ruled out by >>> fiat, though we remain conservative: we should treat it like >>> other API breaks in principle, and do a careful case-by-case >>> analysis of the details of the situation, taking into account >>> what kind of code would be broken, how common these cases are, >>> how important the benefits are, whether there are any specific >>> mitigation strategies we can use, etc. -- with this process of >>> course taking into account that a segfault is nastier than a >>> Python exception. >>> >>> >>> Other points that were discussed >>> ================================ >>> >>> - There was inconclusive discussion of what we should do with dot() >>> in the places where it disagrees with the PEP 465 matmul semantics >>> (specifically this is when both arguments have ndim >= 3, or one >>> argument has ndim == 0). >>> - The concern is that the current behavior is not very useful, and >>> as far as we can tell no-one is using it; but, as people get >>> used to the more-useful PEP 465 behavior, they will increasingly >>> try to use it on the assumption that np.dot will work the same >>> way, and this will create pain for lots of people. So Nathaniel >>> argued that we should start at least issuing a visible warning >>> when people invoke the corner-case behavior. >>> - But OTOH, np.dot is such a core piece of infrastructure, and >>> there's such a large landscape of code out there using numpy >>> that we can't see, that others were reasonably wary of making >>> any change. >>> - For now: document prominently, but no change in behavior. >>> >>> >>> Links to raw notes >>> ================== >>> >>> Main page: >>> [https://github.com/numpy/numpy/wiki/SciPy-2015-developer-meeting] >>> >>> Notes from the meeting proper: >>> [ >>> https://docs.google.com/document/d/1IJcYdsHtk8MVAM4AZqFDBSf_nVG-mrB4Tv2bh9u1g4Y/edit?usp=sharing >>> ] >>> >>> Slides from the followup BoF: >>> [ >>> https://gist.github.com/njsmith/eb42762054c88e810786/raw/b74f978ce10a972831c582485c80fb5b8e68183b/future-of-numpy-bof.odp >>> ] >>> >>> Notes from the followup BoF: >>> [ >>> https://docs.google.com/document/d/11AuTPms5dIPo04JaBOWEoebXfk-tUzEZ-CvFnLIt33w/edit >>> ] >>> >>> -n >>> >>> -- >>> Nathaniel J. Smith -- http://vorpus.org >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> >> >> -- >> >> *Travis Oliphant* >> *Co-founder and CEO* >> >> >> @teoliphant >> 512-222-5440 >> http://www.continuum.io >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > -- > Nathaniel J. Smith -- http://vorpus.org > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- *Travis Oliphant* *Co-founder and CEO* @teoliphant 512-222-5440 http://www.continuum.io -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From faltet at gmail.com Wed Aug 26 11:38:20 2015
From: faltet at gmail.com (Francesc Alted)
Date: Wed, 26 Aug 2015 17:38:20 +0200
Subject: [Numpy-discussion] UTC-based datetime64
Message-ID:

Hi,

We've found that NumPy uses the local TZ for printing datetime64 timestamps:

In [22]: t = datetime.utcnow()

In [23]: print t
2015-08-26 11:52:10.662745

In [24]: np.array([t], dtype="datetime64[s]")
Out[24]: array(['2015-08-26T13:52:10+0200'], dtype='datetime64[s]')

Googling for a way to print UTC out of the box, the best thing I could find is:

In [40]: [str(i.item()) for i in np.array([t], dtype="datetime64[s]")]
Out[40]: ['2015-08-26 11:52:10']

Now, is there a better way to specify that I want the datetimes printed always in UTC?

Thanks,

-- Francesc Alted
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
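[One possible answer, added here as a sketch rather than taken from the thread: np.datetime_as_string accepts a timezone argument (assuming a NumPy version recent enough to provide it; check the docs for the version at hand), which avoids the detour through .item():]

import numpy as np
from datetime import datetime

t = datetime.utcnow()
arr = np.array([t], dtype="datetime64[s]")

# Render the timestamps in UTC (with a trailing 'Z') rather than the local
# timezone. This only affects the strings returned here; it does not change
# how repr() of the array itself formats the values.
utc_strings = np.datetime_as_string(arr, timezone='UTC')

So this is a per-call rendering choice rather than a global "always print UTC" switch.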
From izaid at continuum.io Wed Aug 26 12:45:51 2015
From: izaid at continuum.io (Irwin Zaid)
Date: Wed, 26 Aug 2015 16:45:51 +0000 (UTC)
Subject: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015
References:
Message-ID:

Hello everyone,

Mark and I thought it would be good to weigh in here and also be explicitly around to discuss DyND. To be clear, neither of us has strong feelings on what NumPy *should* do -- we are both long-time NumPy users and we both see NumPy being around for a while. But, as Francesc mentioned, there is also the open question of where the community should be implementing new features. It would certainly be nice to not have duplication of effort, but a decision like that can only arise naturally from a broad consensus.

Travis covered DyND's history and its relationship with Continuum pretty well, so what's really missing here is what DyND is, where it is going, and how long we think it'll take to get there. We'll try to stick to those topics.

We designed DyND to fill what we saw as fundamental gaps in NumPy. These are not only missing features, but also limitations of its architecture. Many of these gaps have been mentioned several times before in this thread and elsewhere, but a brief list would include: better support for missing values, variable-length strings, GPUs, more extensible types, categoricals, more datetime features, ... Some of these were indeed on Nathaniel's list and many of them are already working (albeit sometimes partially) in DyND.

And, yes, we strongly feel that NumPy's fundamental dependence on Python itself is a limitation. Why should we not take the fantastic success of NumPy and generalize it across other languages?

So, we see DyND as having a twofold purpose. The first is to expand upon the kinds of data that NumPy can represent and do computations upon. The second is to provide a standard array package that can cross the language barrier and easily interoperate between C++, Python, or whatever you want.

DyND, at the moment, is quite functional in some areas and lacking a bit in others. There is no doubt that it is still "experimental" and a bit unstable. But, it has advanced by a lot recently, and we are steadily working towards something like a version 1.0. In fact, DyND's internal C++ architecture stabilized some time ago -- what's missing now is really solid coverage of some common use cases, alongside up-to-date Python bindings and an easy installation process. All of these are in progress and advancing as quick as we can make them.

On the other hand, we are also building out some other features. To give just one example that might excite people, DyND now has Numba interoperability -- one can write DyND's equivalent of a ufunc in Python and, with a single decorator, have a broadcasting or reduction callable that gets JITed or (soon) ahead-of-time compiled.

Over the next few months, we are hopeful that we can get DyND into a state where it is largely usable by those familiar with NumPy semantics. The reason why we can be a bit more aggressive in our timeline now is because of the great support we are getting from Continuum.

With all that said, we are happy to be a part of any broader conversation involving NumPy and the community.

All the best,

Irwin and Mark

From solipsis at pitrou.net Wed Aug 26 13:11:01 2015
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 26 Aug 2015 19:11:01 +0200
Subject: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015
References:
Message-ID: <20150826191101.3cf475e1@fsol>

On Wed, 26 Aug 2015 16:45:51 +0000 (UTC) Irwin Zaid wrote: > > So, we see DyND is having a twofold purpose. The first is to expand upon the > kinds of data that NumPy can represent and do computations upon. The second > is to provide a standard array package that can cross the language barrier > and easily interoperate between C++, Python, or whatever you want.

One possible limitation is that the lingua franca for language interoperability is C, not C++. DyND doesn't have to be written in C, but exposing a nice C API may help make it attractive to the various language runtimes out there.

(even those languages whose runtime doesn't have a compile-time interface to C generally have some kind of cffi or ctypes equivalent to load external C routines at runtime)

Regards

Antoine.

From izaid at continuum.io Wed Aug 26 13:20:13 2015
From: izaid at continuum.io (Irwin Zaid)
Date: Wed, 26 Aug 2015 18:20:13 +0100
Subject: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015
In-Reply-To: <20150826191101.3cf475e1@fsol>
References: <20150826191101.3cf475e1@fsol>
Message-ID:

On Wed, Aug 26, 2015 at 6:11 PM, Antoine Pitrou wrote: > One possible limitation is that the lingua franca for language > interoperability is C, not C++. DyND doesn't have to be written in C, > but exposing a nice C API may help make it attractive to the various > language runtimes out there. >

That is absolutely true and a C API is on the long-term roadmap. At the moment, a C API is not needed for DyND to be stable and usable from Python, which is one reason we aren't doing it now.

Irwin
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mwwiebe at gmail.com Wed Aug 26 13:41:59 2015
From: mwwiebe at gmail.com (Mark Wiebe)
Date: Wed, 26 Aug 2015 10:41:59 -0700
Subject: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015
In-Reply-To:
References:
Message-ID:

I thought I'd add a little more specifically about the kind of graphics/point cloud work I'm doing right now at Thinkbox, and how it relates.

To echo Francesc's point about NumPy already being an industry standard, within the VFX/graphics industry there is a reference platform definition on Linux, and the most recent iteration of that specifies a version of NumPy. It also includes a bunch of other open source libraries worth taking a look at if you haven't seen them before: http://www.vfxplatform.com/

Point cloud/particle system data, mesh geometry, numerical grids (both dense and sparse), and many other primitive components in graphics are built out of arrays.
What NumPy represents for that kind of data is amazing. The extra baggage of an API tied to the CPython GIL can be a hard pill to swallow, though, and this is one of the reasons I'm hopeful that as DyND continues maturing, it can make inroads into places NumPy hasn't been able to. Thanks, Mark On Wed, Aug 26, 2015 at 9:45 AM, Irwin Zaid wrote: > Hello everyone, > > Mark and I thought it would be good to weigh in here and also be explicitly > around to discuss DyND. To be clear, neither of us has strong feelings on > what NumPy *should* do -- we are both long-time NumPy users and we both see > NumPy being around for a while. But, as Francesc mentioned, there is also > the open question of where the community should be implementing new > features. It would certainly be nice to not have duplication of effort, but > a decision like that can only arise naturally from a broad consensus. > > Travis covered DyND's history and it's relationship with Continuum pretty > well, so what's really missing here is what DyND is, where it is going, and > how long we think it'll take to get there. We'll try to stick to those > topics. > > We designed DyND to fill what we saw as fundamental gaps in NumPy. These > are > not only missing features, but also limitations of its architecture. Many > of > these gaps have been mentioned several times before in this thread and > elsewhere, but a brief list would include: better support for missing > values, variable-length strings, GPUs, more extensible types, categoricals, > more datetime features, ... Some of these were indeed on Nathaniel's list > and many of them are already working (albeit sometimes partially) in DyND. > > And, yes, we strongly feel that NumPy's fundamental dependence on Python > itself is a limitation. Why should we not take the fantastic success of > NumPy and generalize it across other languages? > > So, we see DyND is having a twofold purpose. The first is to expand upon > the > kinds of data that NumPy can represent and do computations upon. The second > is to provide a standard array package that can cross the language barrier > and easily interoperate between C++, Python, or whatever you want. > > DyND, at the moment, is quite functional in some areas and lacking a bit in > others. There is no doubt that it is still "experimental" and a bit > unstable. But, it has advanced by a lot recently, and we are steadily > working towards something like a version 1.0. In fact, DyND's internal C++ > architecture stabilized some time ago -- what's missing now is really solid > coverage of some common use cases, alongside up-to-date Python bindings and > an easy installation process. All of these are in progress and advancing as > quick as we can make them. > > On the other hand, we are also building out some other features. To give > just one example that might excite people, DyND now has Numba > interoperability -- one can write DyND's equivalent of a ufunc in Python > and, with a single decorator, have a broadcasting or reduction callable > that > gets JITed or (soon) ahead-of-time compiled. > > Over the next few months, we are hopeful that we can get DyND into a state > where it is largely usable by those familiar with NumPy semantics. The > reason why we can be a bit more aggressive in our timeline now is because > of > the great support we are getting from Continuum. > > With all that said, we are happy to be a part of of any broader > conversation > involving NumPy and the community. 
> > All the best, > > Irwin and Mark > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Wed Aug 26 13:44:19 2015 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 26 Aug 2015 10:44:19 -0700 Subject: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015 In-Reply-To: <20150826191101.3cf475e1@fsol> References: <20150826191101.3cf475e1@fsol> Message-ID: On Wed, Aug 26, 2015 at 10:11 AM, Antoine Pitrou wrote: > On Wed, 26 Aug 2015 16:45:51 +0000 (UTC) > Irwin Zaid wrote: > > > > So, we see DyND is having a twofold purpose. The first is to expand upon > the > > kinds of data that NumPy can represent and do computations upon. The > second > > is to provide a standard array package that can cross the language > barrier > > and easily interoperate between C++, Python, or whatever you want. > > One possible limitation is that the lingua franca for language > interoperability is C, not C++. DyND doesn't have to be written in C, > but exposing a nice C API may help make it attractive to the various > language runtimes out there. > > (even those languages whose runtime doesn't have a compile-time > interface to C generally have some kind of cffi or ctypes equivalent to > load external C routines at runtime) > I kind of like the path LLVM has chosen here, of a stable C API and an unstable C++ API. This has both pros and cons though, so I'm not sure what will be right for DyND in the long term. -Mark > Regards > > Antoine. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Aug 26 13:50:47 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 26 Aug 2015 18:50:47 +0100 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) Message-ID: Hi, Splitting this one off too because it's a rather different discussion, although related. On Tue, Aug 25, 2015 at 11:03 AM, Nathaniel Smith wrote: [snip] > Formalizing our governance/decision making > ========================================== > > This was a major focus of discussion. At a high level, the consensus > was to steal IPython's governance document ("IPEP 29") and modify it > to remove its use of a BDFL as a "backstop" to normal community > consensus-based decision, and replace it with a new "backstop" based > on Apache-project-style consensus voting amongst the core team. Here's a plea to avoid a 'core' structure if at all possible. Historically it seems to have some severe risks, and experienced people have blamed this structure for the decline of various projects including NetBSD and Xfree86, summaries here: http://asterisk.dynevor.org/melting-core.html http://asterisk.dynevor.org/xfree-forked.html In short, the core structure seems to be characteristically associated with a conservatism and lack of vision that causes the project to stagnate. There's also evidence from the NetBSD / OpenBSD split [1] and the XFree86 / X.org split [2] - that the core structure can lead to bad decisions being taken in private that no or few members of the core group are prepared to defend. 
I guess what is happening is that distributed responsibility leads to poor accountability, and therefore poor decisions. So, I hope very much we can avoid that trap in our own governance. Best, Matthew [1] http://mail-index.netbsd.org/netbsd-users/1994/12/23/0000.html [2] http://www.xfree86.org/pipermail/forum/2003-March/001997.html From pav at iki.fi Wed Aug 26 13:58:26 2015 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 26 Aug 2015 20:58:26 +0300 Subject: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015 In-Reply-To: References: Message-ID: 26.08.2015, 14:14, Francesc Alted kirjoitti: [clip] > 2015-08-25 12:03 GMT+02:00 Nathaniel Smith : >> Let's focus on evolving numpy as far as we can without major >> break-the-world changes (no "numpy 2.0", at least in the foreseeable >> future). >> >> And, as a target for that evolution, let's change our focus from >> numpy as "NumPy is the library that gives you the np.ndarray object >> (plus some attached infrastructure)", to "NumPy provides the >> standard framework for working with arrays and array-like objects in >> Python" > > Sorry to disagree here, but in my opinion NumPy *already* provides the > standard framework for working with arrays and array-like objects in Python > as its huge popularity shows. If what you mean is that there are too many > efforts trying to provide other, specialized data containers (things like > DataFrame in pandas, DataArray/Dataset in xarray or carray/ctable in bcolz > just to mention a few), then let me say that I am of the opinion that there > can't be a silver bullet for tackling all the problems that the PyData > community is facing. My reading of the above was that this was about multimethods, and allowing different types of containers to interoperate beyond the array interface and Python's builtin operator hooks. The exact performance details of course vary, and an algorithm written for in-memory arrays just fails for too large on-disk or distributed arrays. However, a case for a minimal common API probably could be made, esp. in algorithms mainly relying on linear algebra. This is to a degree different from subclassing, as many of the array-like objects you might want do not have a simple strided memory model. Pauli From shoyer at gmail.com Wed Aug 26 13:59:32 2015 From: shoyer at gmail.com (Stephan Hoyer) Date: Wed, 26 Aug 2015 10:59:32 -0700 Subject: [Numpy-discussion] Numpy helper function for __getitem__? In-Reply-To: References: <1440353282711.d9fa3274@Nodemailer> <1440404602.2051.14.camel@sipsolutions.net> Message-ID: Indeed, the helper function I wrote for xray was not designed to handle None/np.newaxis or non-1d Boolean indexers, because those are not valid indexers for xray objects. I think it could be straightforwardly extended to handle None simply by not counting them towards the total number of dimensions. On Tue, Aug 25, 2015 at 8:41 AM, Fabien wrote: > I think that Stephan's function for xray is very useful. A possible > improvement (probably at a certain performance cost) would be to be able > to provide a shape instead of a number of dimensions. The output would > then be slices with valid start and ends. > > Current behavior: > In[9]: expanded_indexer(slice(None), 2) > Out[9]: (slice(None, None, None), slice(None, None, None)) > > With shape: > In[9]: expanded_indexer(slice(None), (3, 4)) > Out[9]: (slice(0, 4, 1), slice(0, 5, 1)) > > But if nobody needed something like this before me, I think that I might > have a design problem in my code (still quite new to python). 
> Glad you found it helpful! Python's slice object has the indices method which implements this logic, e.g.,

In [15]: s = slice(None, 10)

In [16]: s.indices(100)
Out[16]: (0, 10, 1)

Cheers,
Stephan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stefanv at berkeley.edu Wed Aug 26 18:46:55 2015
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Wed, 26 Aug 2015 15:46:55 -0700
Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015)
In-Reply-To:
References:
Message-ID: <87y4gxllb4.fsf@berkeley.edu>

Hi Matthew

On 2015-08-26 10:50:47, Matthew Brett wrote: > In short, the core structure seems to be characteristically > associated with a conservatism and lack of vision that causes > the project to stagnate.

Can you describe how a democratic governance structure would look? It's not clear from the discussions linked where successful examples are to be found.

Thanks
Stéfan

From daniel.p.bliss at gmail.com Wed Aug 26 18:51:27 2015
From: daniel.p.bliss at gmail.com (Daniel Bliss)
Date: Wed, 26 Aug 2015 15:51:27 -0700
Subject: [Numpy-discussion] Defining a white noise process using numpy
Message-ID:

Hi all,

Can anyone give me some advice for translating this equation into code using numpy?

eta(t) = lim(dt -> 0) N(0, 1/sqrt(dt)),

where N(a, b) is a Gaussian random variable of mean a and variance b**2.

This is a heuristic definition of a white noise process.

Thanks,
Dan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
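[A minimal sketch of one way to read that definition, added here for reference; the grid spacing dt, the sample count, and the seed are arbitrary illustrative choices, not part of the question. On a time grid with spacing dt, the white noise is approximated by independent draws from N(0, 1/sqrt(dt)), i.e. standard deviation 1/sqrt(dt) in the mean/standard-deviation convention used above, so that the time-integral of eta behaves like a Wiener process as dt -> 0.]

import numpy as np

dt = 1e-3                        # grid spacing; eta is only defined in the dt -> 0 limit
n = 10000                        # number of samples to draw
rng = np.random.RandomState(0)   # fixed seed, purely for reproducibility

# Independent draws from N(0, 1/sqrt(dt)): mean 0, standard deviation 1/sqrt(dt).
eta = rng.normal(loc=0.0, scale=1.0 / np.sqrt(dt), size=n)

# Sanity check: the running integral of eta approximates a Wiener process,
# so w[k] should have variance close to the elapsed time (k + 1) * dt.
w = dt * np.cumsum(eta)

Scaled this way, the integrated noise has variance equal to the elapsed time, which is the usual behaviour wanted from a discretized white-noise forcing term in a stochastic differential equation.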
From ben.v.root at gmail.com Wed Aug 26 21:59:43 2015
From: ben.v.root at gmail.com (Benjamin Root)
Date: Wed, 26 Aug 2015 21:59:43 -0400
Subject: [Numpy-discussion] 1.10.0rc1
In-Reply-To:
References: <20150826151141.17db3046@fsol>
Message-ID:

Just a data point, I just tested 1.9.0rc1 (built from source) with matplotlib master, and things appear to be fine there. In fact, matplotlib was built against 1.7.x (I was hunting down a regression), and worked against the 1.9.0 install, so the ABI appears intact.

Cheers!
Ben Root

On Wed, Aug 26, 2015 at 9:52 AM, Charles R Harris wrote: > > > On Wed, Aug 26, 2015 at 7:32 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Wed, Aug 26, 2015 at 7:31 AM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> >>> On Wed, Aug 26, 2015 at 7:11 AM, Antoine Pitrou >>> wrote: >>> >>>> On Tue, 25 Aug 2015 10:26:02 -0600 >>>> Charles R Harris wrote: >>>> > Hi All, >>>> > >>>> > The silence after the 1.10 beta has been eerie. Consequently, I'm >>>> thinking >>>> > of making a first release candidate this weekend. If you haven't yet >>>> tested >>>> > the beta, please do so. It would be good to discover as many problems >>>> as we >>>> > can before the first release. >>>> >>>> Has typing of ufunc parameters become much stricter? I can't find >>>> anything in the release notes, but see (1.10b1): >>>> >>>> >>> arr = np.linspace(0, 5, 10) >>>> >>> out = np.empty_like(arr, dtype=np.intp) >>>> >>> np.round(arr, out=out) >>>> Traceback (most recent call last): >>>> File "", line 1, in >>>> File >>>> "/home/antoine/np110/lib/python3.4/site-packages/numpy/core/fromnumeric.py", >>>> line 2778, in round_ >>>> return round(decimals, out) >>>> TypeError: ufunc 'rint' output (typecode 'd') could not be coerced to >>>> provided output parameter (typecode 'l') according to the casting rule >>>> ''same_kind'' >>>> >>>> >>>> It used to work (1.9): >>>> >>>> >>> arr = np.linspace(0, 5, 10) >>>> >>> out = np.empty_like(arr, dtype=np.intp) >>>> >>> np.round(arr, out=out) >>>> array([0, 1, 1, 2, 2, 3, 3, 4, 4, 5]) >>>> >>> out >>>> array([0, 1, 1, 2, 2, 3, 3, 4, 4, 5]) >>>> >>> >>> The default casting mode has been changed. I think this has been raising >>> a warning since 1.7 and was mentioned as a future change in 1.10, but you >>> are right, it needs to be mentioned in the 1.10 release notes. >>> >> >> Make that warned of in the 1.9.0 release notes. >> >> > Here it is in 1.9.0 with deprecation warning made visible. > ``` > In [3]: import warnings > > In [4]: warnings.simplefilter('always') > > In [5]: arr = np.linspace(0, 5, 10) > > In [6]: out = np.empty_like(arr, dtype=np.intp) > > In [7]: np.round(arr, out=out) > /home/charris/.local/lib/python2.7/site-packages/numpy/core/fromnumeric.py:2640: > DeprecationWarning: Implicitly casting between incompatible kinds. In a > future numpy release, this will raise an error. Use casting="unsafe" if > this is intentional. > return round(decimals, out) > Out[7]: array([0, 1, 1, 2, 2, 3, 3, 4, 4, 5]) > ``` > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > >
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From njs at pobox.com Wed Aug 26 22:52:53 2015
From: njs at pobox.com (Nathaniel Smith)
Date: Wed, 26 Aug 2015 19:52:53 -0700
Subject: [Numpy-discussion] 1.10.0rc1
In-Reply-To:
References: <20150826151141.17db3046@fsol>
Message-ID:

On Aug 26, 2015 7:03 PM, "Benjamin Root" wrote:
> > Just a data point, I just tested 1.9.0rc1 (built from source) with matplotlib master, and things appear to be fine there. In fact, matplotlib was built against 1.7.x (I was hunting down a regression), and worked against the 1.9.0 install, so the ABI appears intact.

1.9, or 1.10?
> > -n > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Aug 27 02:33:51 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 26 Aug 2015 23:33:51 -0700 Subject: [Numpy-discussion] [DRAFT] numpy governance document Message-ID: Hi all, Here's a first draft of a governance document for NumPy. A few people have seen sneak peeks and have suggested possibly reorganizing it either by taking some of the "how consensus works" stuff out into a separate document, or alternatively keeping that stuff in the foreground and moving some of the more legalistic bits into footnotes or something. I think there's some value in having a single document that sorta lays out everything a new contributor needs to know, both how things work normally and how they work when we need to fall back on formal processes, and have tried to make the different aspects fit together better in this draft, but I'm happy to reorganize however if that's what people want -- I just figured that I should at least get this out so the content is available for discussion, even if the form isn't perfect yet :-). In case it's useful for context, large chunks of text are taken from the Jupyter/IPython project's IPEP 29: https://github.com/ipython/ipython/wiki/IPEP-29:-Project-Governance There are some random tweaks throughout, but the parts that are mostly new are the "Summary", "Consensus-based decision making by the community", and "Council decision making" sections. -n --------------------- The purpose of this document is to formalize the governance process used by the NumPy project in both ordinary and extraordinary situations, and to clarify how decisions are made and how the various elements of our community interact, including the relationship between open source collaborative development and work that may be funded by for-profit or non-profit entities. Summary ======= NumPy is a community-owned and community-run project. To the maximum extent possible, decisions about project direction are made by community consensus (but note that "consensus" here has a somewhat technical meaning that might not match everyone's expectations -- see below). Some members of the community additionally contribute by serving on the NumPy steering council, where they are responsible for facilitating the establishment of community consensus, for stewarding project resources, and -- in extreme cases -- for making project decisions if the normal community-based process breaks down. The Project =========== The NumPy Project (The Project) is an open source software project affiliated with the 501(c)3 NumFocus Foundation. The goal of The Project is to develop open source software for array-based computing in Python, and in particular the `numpy` package, along with related software such as `f2py` and the NumPy Sphinx extensions. The Software developed by The Project is released under the BSD (or similar) open source license, developed openly and hosted on public GitHub repositories under the `numpy` GitHub organization. The Project is developed by a team of distributed developers, called Contributors. Contributors are individuals who have contributed code, documentation, designs or other work to the Project. Anyone can be a Contributor. Contributors can be affiliated with any legal entity or none. 
Contributors participate in the project by submitting, reviewing and discussing GitHub Pull Requests and Issues and participating in open and public Project discussions on GitHub, mailing lists, and other channels. The foundation of Project participation is openness and transparency. Here is a list of the current Contributors to the main NumPy repository: [https://github.com/numpy/numpy/graphs/contributors](https://github.com/numpy/numpy/graphs/contributors) The Project Community consists of all Contributors and Users of the Project. Contributors work on behalf of and are responsible to the larger Project Community and we strive to keep the barrier between Contributors and Users as low as possible. The Project is formally affiliated with the 501(c)3 NumFOCUS Foundation ([http://numfocus.org](http://numfocus.org)), which serves as its fiscal sponsor, may hold project trademarks and other intellectual property, helps manage project donations and acts as a parent legal entity. NumFOCUS is the only legal entity that has a formal relationship with the project (see Institutional Partners section below). Governance ========== This section describes the governance and leadership model of The Project. The foundations of Project governance are: - Openness & Transparency - Active Contribution - Institutional Neutrality Consensus-based decision making by the community ------------------------------------------------ Normally, all project decisions will be made by consensus of all interested Contributors. The primary goal of this approach is to ensure that the people who are most affected by and involved in any given change can contribute their knowledge in the confidence that their voices will be heard, because thoughtful review from a broad community is the best mechanism we know of for creating high-quality software. The mechanism we use to accomplish this goal may be unfamiliar for those who are not experienced with the cultural norms around free/open-source software development. We provide a summary here, and highly recommend that all Contributors additionally read [Chapter 4: Social and Political Infrastructure](http://producingoss.com/en/producingoss.html#social-infrastructure) of Karl Fogel's classic *Producing Open Source Software*, and in particular the section on [Consensus-based Democracy](http://producingoss.com/en/producingoss.html#consensus-democracy), for a more detailed discussion. In this context, consensus does *not* require: - that we wait to solicit everybody's opinion on every change, - that we ever hold a vote on anything, - or that everybody is happy or agrees with every decision. For us, what consensus means is that we entrust *everyone* with the right to veto any change if they feel it necessary. While this may sound like a recipe for obstruction and pain, this is not what happens. Instead, we find that most people take this responsibility seriously, and only invoke their veto when they judge that a serious problem is being ignored, and that their veto is necessary to protect the project. And in practice, it turns out that such vetoes are almost never formally invoked, because their mere possibility ensures that Contributors are motivated from the start to find some solution that everyone can live with -- thus accomplishing our goal of ensuring that all interested perspectives are taken into account. How do we know when consensus has been achieved? 
In principle, this is rather difficult, since consensus is defined by the absence of vetos, which requires us to somehow prove a negative. In practice, we use a combination of our best judgement (e.g., a simple and uncontroversial bug fix posted on GitHub and reviewed by a core developer is probably fine) and best efforts (e.g., all substantive API changes must be posted to the mailing list in order to give the broader community a chance to catch any problems and suggest improvements; we assume that anyone who cares enough about NumPy to invoke their veto right should be on the mailing list). If no-one bothers to comment on the mailing list after a few days, then it's probably fine. And worst case, if a change is more controversial than expected, or a crucial critique is delayed because someone was on vacation, then it's no big deal: we apologize for misjudging the situation, [back up, and sort things out](http://producingoss.com/en/producingoss.html#version-control-relaxation). If one does need to invoke a formal veto, then it should consist of: - an unambiguous statement that a veto is being invoked, - an explanation of why it is being invoked, and - a description of what conditions (if any) would convince the vetoer to withdraw their veto. If all proposals for resolving some issue are vetoed, then the status quo wins by default. In the worst case, if a Contributor is genuinely misusing their veto in an obstructive fashion to the detriment of the project, then they can be ejected from the project by consensus of the Steering Council -- see below. Steering Council ---------------- The Project will have a Steering Council that consists of Project Contributors who have produced contributions that are substantial in quality and quantity, and sustained over at least one year. The overall role of the Council is to ensure, with input from the Community, the long-term well-being of the project, both technically and as a community. During the everyday project activities, council members participate in all discussions, code review and other project activities as peers with all other Contributors and the Community. In these everyday activities, Council Members do not have any special power or privilege through their membership on the Council. However, it is expected that because of the quality and quantity of their contributions and their expert knowledge of the Project Software and Services that Council Members will provide useful guidance, both technical and in terms of project direction, to potentially less experienced contributors. The Steering Council and its Members play a special role in certain situations. In particular, the Council may, if necessary: - Make decisions about the overall scope, vision and direction of the project. - Make decisions about strategic collaborations with other organizations or individuals. - Make decisions about specific technical issues, features, bugs and pull requests. They are the primary mechanism of guiding the code review process and merging pull requests. - Make decisions about the Services that are run by The Project and manage those Services for the benefit of the Project and Community. - Update policy documents such as this one. - Make decisions when regular community discussion doesn?t produce consensus on an issue in a reasonable time frame. However, the Council's primary responsibility is to facilitate the ordinary community-based decision making procedure described above. 
If we ever have to step in and formally override the community for the health of the Project, then we will do so, but we will consider reaching this point to indicate a failure in our leadership. ### Council decision making If it becomes necessary for the Steering Council to produce a formal decision, then they will use a form of the [Apache Foundation voting process](https://www.apache.org/foundation/voting.html). This is a formalized version of consensus, in which +1 votes indicate agreement, -1 votes are vetoes (and must be accompanied with a rationale, as above), and one can also vote fractionally (e.g. -0.5, +0.5) if one wishes to express an opinion without registering a full veto. These numeric votes are also often used informally as a way of getting a general sense of people's feelings on some issue, and should not normally be taken as formal votes. A formal vote only occurs if explicitly declared, and if this does occur then the vote should be held open for long enough to give all interested Council Members a chance to respond -- at least one week. In practice, we anticipate that for most Steering Council decisions (e.g., voting in new members) a more informal process will suffice. ### Council membership To become eligible to join the Steering Council, an individual must be a Project Contributor who has produced contributions that are substantial in quality and quantity, and sustained over at least one year. Potential Council Members are nominated by existing Council members and voted upon by the existing Council after asking if the potential Member is interested and willing to serve in that capacity. The Council will be initially formed from the set of existing Core Developers who, as of late 2015, have been significantly active over the last year. When considering potential Members, the Council will look at candidates with a comprehensive view of their contributions. This will include but is not limited to code, code review, infrastructure work, mailing list and chat participation, community help/building, education and outreach, design work, etc. We are deliberately not setting arbitrary quantitative metrics (like ?100 commits in this repo?) to avoid encouraging behavior that plays to the metrics rather than the project?s overall well-being. We want to encourage a diverse array of backgrounds, viewpoints and talents in our team, which is why we explicitly do not define code as the sole metric on which council membership will be evaluated. If a Council member becomes inactive in the project for a period of one year, they will be considered for removal from the Council. Before removal, inactive Member will be approached to see if they plan on returning to active participation. If not they will be removed immediately upon a Council vote. If they plan on returning to active participation soon, they will be given a grace period of one year. If they don?t return to active participation within that time period they will be removed by vote of the Council without further grace period. All former Council members can be considered for membership again at any time in the future, like any other Project Contributor. Retired Council members will be listed on the project website, acknowledging the period during which they were active in the Council. The Council reserves the right to eject current Members, if they are deemed to be actively harmful to the project?s well-being, and attempts at communication and conflict resolution have failed. This requires the consensus of the remaining Members. 
[We also have to decide on the initial membership for the Council. While the above text makes pains to distinguish between "committer" and "Council Member", in the past we've pretty much treated them as the same. So to keep things simple and deterministic, I propose that we seed the Council with everyone who has reviewed/merged a pull request since Jan 1, 2014, and move those who haven't used their commit bit in >1.5 years to the emeritus list. Based on the output of git log --grep="^Merge pull request" --since 2014-01-01 | grep Author: | sort -u I believe this would give us an initial Steering Council of: @argriffing, Sebastian Berg, Jaime Fern?ndez del R?o, Ralf Gommers, Charles Harris, Nathaniel Smith, Julian Taylor, and Pauli Virtanen (assuming everyone on that list is interested/willing to serve).] ### Conflict of interest It is expected that the Council Members will be employed at a wide range of companies, universities and non-profit organizations. Because of this, it is possible that Members will have conflict of interests. Such conflict of interests include, but are not limited to: - Financial interests, such as investments, employment or contracting work, outside of The Project that may influence their work on The Project. - Access to proprietary information of their employer that could potentially leak into their work with the Project. All members of the Council shall disclose to the rest of the Council any conflict of interest they may have. Members with a conflict of interest in a particular issue may participate in Council discussions on that issue, but must recuse themselves from voting on the issue. ### Private communications of the Council Unless specifically required, all Council discussions and activities will be public and done in collaboration and discussion with the Project Contributors and Community. The Council will have a private mailing list that will be used sparingly and only when a specific matter requires privacy. When private communications and decisions are needed, the Council will do its best to summarize those to the Community after eliding personal/private/sensitive information that should not be posted to the public internet. ### Subcommittees The Council can create subcommittees that provide leadership and guidance for specific aspects of the project. Like the Council as a whole, subcommittees should conduct their business in an open and public manner unless privacy is specifically called for. Private subcommittee communications should happen on the main private mailing list of the Council unless specifically called for. ### NumFOCUS Subcommittee The Council will maintain one narrowly focused subcommittee to manage its interactions with NumFOCUS. - The NumFOCUS Subcommittee is comprised of 5 persons who manage project funding that comes through NumFOCUS. It is expected that these funds will be spent in a manner that is consistent with the non-profit mission of NumFOCUS and the direction of the Project as determined by the full Council. - This Subcommittee shall NOT make decisions about the direction, scope or technical direction of the Project. - This Subcommittee will have 5 members, 4 of whom will be current Council Members and 1 of whom will be external to the Steering Council. No more than 2 Subcommitee Members can report to one person through employment or contracting work (including the reportee, i.e. the reportee + 1 is the max). This avoids effective majorities resting on one person. 
[Initially, the NumFOCUS subcommittee will consist of: Chuck Harris, Ralf Gommers, Nathaniel Smith, and ???? as internal members, and Thomas Caswell as external member.] Institutional Partners and Funding ================================== The Steering Council are the primary leadership for the project. No outside institution, individual or legal entity has the ability to own, control, usurp or influence the project other than by participating in the Project as Contributors and Council Members. However, because institutions can be an important funding mechanism for the project, it is important to formally acknowledge institutional participation in the project. These are Institutional Partners. An Institutional Contributor is any individual Project Contributor who contributes to the project as part of their official duties at an Institutional Partner. Likewise, an Institutional Council Member is any Project Steering Council Member who contributes to the project as part of their official duties at an Institutional Partner. With these definitions, an Institutional Partner is any recognized legal entity in the United States or elsewhere that employs at least 1 Institutional Contributor of Institutional Council Member. Institutional Partners can be for-profit or non-profit entities. Institutions become eligible to become an Institutional Partner by employing individuals who actively contribute to The Project as part of their official duties. To state this another way, the only way for a Partner to influence the project is by actively contributing to the open development of the project, in equal terms to any other member of the community of Contributors and Council Members. Merely using Project Software in institutional context does not allow an entity to become an Institutional Partner. Financial gifts do not enable an entity to become an Institutional Partner. Once an institution becomes eligible for Institutional Partnership, the Steering Council must nominate and approve the Partnership. If an existing Institutional Partner no longer has a contributing employee, they will be given a 1 year grace period for remaining employees to begin contributing. An Institutional Partner is free to pursue funding for their work on The Project through any legal means. This could involve a non-profit organization raising money from private foundations and donors or a for-profit company building proprietary products and services that leverage Project Software and Services. Funding acquired by Institutional Partners to work on The Project is called Institutional Funding. However, no funding obtained by an Institutional Partner can override the Steering Council. If a Partner has funding to do NumPy work and the Council decides to not pursue that work as a project, the Partner is free to pursue it on their own. However in this situation, that part of the Partner?s work will not be under the NumPy umbrella and cannot use the Project trademarks in a way that suggests a formal relationship. Institutional Partner benefits are: - Acknowledgement on the NumPy websites, in talks and T-shirts. - Ability to acknowledge their own funding sources on the NumPy websites, in talks and T-shirts. - Ability to influence the project through the participation of their Council Member. - Council Members invited to NumPy Developer Meetings. 
Existing Institutional Partners: - UC Berkeley (Nathaniel Smith) Acknowledgements ================ Substantial portions of this document were ~~inspired~~ stolen wholesale from the Jupyter/IPython project's governance document, [IPEP 29](https://github.com/ipython/ipython/wiki/IPEP-29:-Project-Governance). -- Nathaniel J. Smith -- http://vorpus.org From matthew.brett at gmail.com Thu Aug 27 04:36:27 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 27 Aug 2015 09:36:27 +0100 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: <87y4gxllb4.fsf@berkeley.edu> References: <87y4gxllb4.fsf@berkeley.edu> Message-ID: Hi, On Wed, Aug 26, 2015 at 11:46 PM, Stefan van der Walt wrote: > Hi Matthew > > On 2015-08-26 10:50:47, Matthew Brett > wrote: >> In short, the core structure seems to be characteristically >> associated with a conservatism and lack of vision that causes >> the project to stagnate. > > Can you describe how a democratic governance structure would look? > It's not clear from the discussions linked where successful > examples are to be found. Ah yes - as I was writing at the top of the xfree86 summary, it's difficult to assess governance models, because you cannot tell if a project that has a particular governance model would have been more successful with another model. For example, would clang be competing so successfully with gcc, if gcc had had a different governance model? Would Apache be further ahead of the many competitors in the web-server space with different management? Difficult to know. The advantage of studying forks is that they usually arise from disagreements about how a project is managed. All other things being equal, we might expect a fork to fail, given the general aversion to forks and the considerable new work that has to be done to get one going. So, if a fork succeeds in the long term, that is probably an indication that the governance / management of the fork is indeed an improvement on the previous model. So, in answer to your question, it's difficult to know if a particular governance model is successful. It isn't enough that a project has lasted, or is still active, because there are so many factors in play. On the other hand, I think it is possible to point to models that have a tendency to fail in particular ways, and the by-invitation meritocratic 'core' group (I think this is close to the 'steering committee' in our current draft) is the model that failed for NetBSD and XFree86, with a particular pattern of poor or absent accountability and lack of project vision. Cheers, Matthew From matthew.brett at gmail.com Thu Aug 27 04:44:05 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 27 Aug 2015 09:44:05 +0100 Subject: [Numpy-discussion] [DRAFT] numpy governance document In-Reply-To: References: Message-ID: Hi, On Thu, Aug 27, 2015 at 7:33 AM, Nathaniel Smith wrote: > Hi all, > > Here's a first draft of a governance document for NumPy. Thanks for this. I wasn't sure from your email whether you were asking for feedback as to whether this was the right governance model? I mean that - for code - I think the usual procedure would be to discuss various potential solutions on the mailing list, and then follow up with something like a NEP that lays out the various alternatives with their pros and cons. But I have the impression here that you consider the general form to be set, and that you are asking for comments on the detail. Is that right? 
Cheers, Matthew From bryanv at continuum.io Thu Aug 27 05:15:36 2015 From: bryanv at continuum.io (Bryan Van de Ven) Date: Thu, 27 Aug 2015 10:15:36 +0100 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: References: <87y4gxllb4.fsf@berkeley.edu> Message-ID: > On Aug 27, 2015, at 9:36 AM, Matthew Brett wrote: > > > So, in answer to your question, it's difficult to know if a particular > governance model is successful. It isn't enough that a project has > lasted, or is still active, because there are so many factors in play. > On the other hand, I think it is possible to point to models that > have a tendency to fail in particular ways, and the by-invitation > meritocratic 'core' group (I think this is close to the 'steering > committee' in our current draft) is the model that failed for NetBSD > and XFree86, with a particular pattern of poor or absent > accountability and lack of project vision. Anecdotes about two projects is not compelling evidence of anything unless you can also point to a comparison of the corresponding success rate. Two failures out of three is suggestive. Two failures out of three hundred is significantly less interesting. More useful would be actual details of an alternative proposal or pointers to examples of alternative arrangements that could be modeled. Bryan From matthew.brett at gmail.com Thu Aug 27 05:16:31 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 27 Aug 2015 10:16:31 +0100 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: References: <87y4gxllb4.fsf@berkeley.edu> Message-ID: On Thu, Aug 27, 2015 at 9:36 AM, Matthew Brett wrote: > Hi, > > On Wed, Aug 26, 2015 at 11:46 PM, Stefan van der Walt > wrote: >> Hi Matthew >> >> On 2015-08-26 10:50:47, Matthew Brett >> wrote: >>> In short, the core structure seems to be characteristically >>> associated with a conservatism and lack of vision that causes >>> the project to stagnate. >> >> Can you describe how a democratic governance structure would look? >> It's not clear from the discussions linked where successful >> examples are to be found. > > Ah yes - as I was writing at the top of the xfree86 summary, it's > difficult to assess governance models, because you cannot tell if a > project that has a particular governance model would have been more > successful with another model. For example, would clang be competing > so successfully with gcc, if gcc had had a different governance model? > Would Apache be further ahead of the many competitors in the > web-server space with different management? Difficult to know. > > The advantage of studying forks is that they usually arise from > disagreements about how a project is managed. All other things being > equal, we might expect a fork to fail, given the general aversion to > forks and the considerable new work that has to be done to get one > going. So, if a fork succeeds in the long term, that is probably an > indication that the governance / management of the fork is indeed an > improvement on the previous model. > > So, in answer to your question, it's difficult to know if a particular > governance model is successful. It isn't enough that a project has > lasted, or is still active, because there are so many factors in play. 
> On the other hand, I think it is possible to point to models that > have a tendency to fail in particular ways, and the by-invitation > meritocratic 'core' group (I think this is close to the 'steering > committee' in our current draft) is the model that failed for NetBSD > and XFree86, with a particular pattern of poor or absent > accountability and lack of project vision. Sorry to follow up on my own email, but: I'm just speculating here, without data, but I suspect one the key elements that led to the decline and fall of NetBSD and XFree86 was the perception that there was no way for the community to depose the government. It seems these projects managed to combine aspects of the dictatorship model, with lots of emphasis on personal loyalty and expected gratitude, with a dysfunctional oligarchy, in which no-one felt able or willing to change the project direction, when the project was failing. The other problem with the meritocracy / invitation model, is that some people are terrible managers. In the XFree86 project, for example, I think David Dawes did a terrible job of guiding the project when it ran into trouble. He was in the position he was in because of his huge commitment and contributions to the project, but I think he was not the right person to manage the project. The standard 'core' model, doesn't take that into account. For example, I suspect that, if we had a David Dawes, no matter how terrible we thought they were at managing, we would feel obliged to put them onto the steering committee. It is much easier to count or review commits than it is to assess someone for their qualities as a leader or manager. We most of us would hate to be the person to make that assessment, and it's very tempting to negotiate ourselves into a world-view where this assessment is not necessary. So, I speculate, that a good governance model would have: * one 'president' who has to take final responsibility for all decisions; * this president might well have a fixed term, maybe with limits on the number of terms they can serve. * the president would be chosen by community vote and explicitly on the basis that they were good managers as well as coders; * for the presidential election, the candidates should set out what their vision for the project is, and how they plan to achieve that vision; The point about these features is that we explicitly emphasize accountability, vision and management ability. Instead of a small number of people being in the position of assessing their peers for their ability to manage, the whole community (somehow defined) takes responsibility for that assessment, therefore making it easier to think about without distracting issues of personal loyalty or implied obligation. See you, Matthew From matthew.brett at gmail.com Thu Aug 27 05:22:32 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 27 Aug 2015 10:22:32 +0100 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: References: <87y4gxllb4.fsf@berkeley.edu> Message-ID: Hi, On Thu, Aug 27, 2015 at 10:15 AM, Bryan Van de Ven wrote: > >> On Aug 27, 2015, at 9:36 AM, Matthew Brett wrote: >> >> >> So, in answer to your question, it's difficult to know if a particular >> governance model is successful. It isn't enough that a project has >> lasted, or is still active, because there are so many factors in play. 
>> On the other hand, I think it is possible to point to models that >> have a tendency to fail in particular ways, and the by-invitation >> meritocratic 'core' group (I think this is close to the 'steering >> committee' in our current draft) is the model that failed for NetBSD >> and XFree86, with a particular pattern of poor or absent >> accountability and lack of project vision. > > Anecdotes about two projects is not compelling evidence of anything unless you can also point to a comparison of the corresponding success rate. Two failures out of three is suggestive. Two failures out of three hundred is significantly less interesting. More useful would be actual details of an alternative proposal or pointers to examples of alternative arrangements that could be modeled. > Unfortunately, I don't think we have much choice but to do our best in sifting through the anecdotal evidence we have available, weak and contradictory as it is. Successful forks in large projects are pretty rare, and as I was arguing before, they are particularly useful as evidence about governance models. In the case of the 'core' model, we have some compelling testimony from someone with a great deal of experience: """ Much of this early structure (CVS, web site, cabal ["core" group], etc.) was copied verbatim by other open source (this term not being in wide use yet) projects -- even the form of the project name and the term "core". This later became a kind of standard template for starting up an open source project. [...] I'm sorry to say that I helped create this problem, and that most of the projects which modeled themselves after NetBSD (probably due to its high popularity in 1993 and 1994) have suffered similar problems. FreeBSD and XFree86, for example, have both forked successor projects (Dragonfly and X.org) for very similar reasons. """ http://mail-index.netbsd.org/netbsd-users/2006/08/30/0016.html Cheers, Matthew From bryanv at continuum.io Thu Aug 27 05:35:01 2015 From: bryanv at continuum.io (Bryan Van de Ven) Date: Thu, 27 Aug 2015 10:35:01 +0100 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: References: <87y4gxllb4.fsf@berkeley.edu> Message-ID: > On Aug 27, 2015, at 10:22 AM, Matthew Brett wrote: > > In the case of the 'core' model, we have some compelling testimony > from someone with a great deal of experience: > > """ > Much of this early structure (CVS, web site, cabal ["core" group], > etc.) was copied verbatim by other open source (this term not being in > wide use yet) projects -- even the form of the project name and the > term "core". This later became a kind of standard template for > starting up an open source project. [...] I'm sorry to say that I > helped create this problem, and that most of the projects which > modeled themselves after NetBSD (probably due to its high popularity > in 1993 and 1994) have suffered similar problems. FreeBSD and XFree86, > for example, have both forked successor projects (Dragonfly and X.org) > for very similar reasons. > """ Who goes on to propose: 7) The "core" group must be replaced with people who are actually competent and dedicated enough to review proposals, accept feedback, and make good decisions. More to the point, though, the "core" group must only act when *needed* -- most technical decisions should be left to the community to hash out; it must not preempt the community from developing better solutions. 
(This is how the "core" group worked during most of the project's growth period.) Which, AFAICT, is exactly in line with the Numpy proposal: """ During the everyday project activities, council members participate in all discussions, code review and other project activities as peers with all other Contributors and the Community. In these everyday activities, Council Members do not have any special power or privilege through their membership on the Council. ... However, the Council's primary responsibility is to facilitate the ordinary community-based decision making procedure described above. If we ever have to step in and formally override the community for the health of the Project, then we will do so, but we will consider reaching this point to indicate a failure in our leadership. """ Bryan From matthew.brett at gmail.com Thu Aug 27 05:45:48 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 27 Aug 2015 10:45:48 +0100 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: References: <87y4gxllb4.fsf@berkeley.edu> Message-ID: Hi, On Thu, Aug 27, 2015 at 10:35 AM, Bryan Van de Ven wrote: > >> On Aug 27, 2015, at 10:22 AM, Matthew Brett wrote: >> >> In the case of the 'core' model, we have some compelling testimony >> from someone with a great deal of experience: >> >> """ >> Much of this early structure (CVS, web site, cabal ["core" group], >> etc.) was copied verbatim by other open source (this term not being in >> wide use yet) projects -- even the form of the project name and the >> term "core". This later became a kind of standard template for >> starting up an open source project. [...] I'm sorry to say that I >> helped create this problem, and that most of the projects which >> modeled themselves after NetBSD (probably due to its high popularity >> in 1993 and 1994) have suffered similar problems. FreeBSD and XFree86, >> for example, have both forked successor projects (Dragonfly and X.org) >> for very similar reasons. >> """ > > Who goes on to propose: > > 7) The "core" group must be replaced with people who are actually > competent and dedicated enough to review proposals, accept feedback, > and make good decisions. More to the point, though, the "core" group > must only act when *needed* -- most technical decisions should be > left to the community to hash out; it must not preempt the community > from developing better solutions. (This is how the "core" group > worked during most of the project's growth period.) Sure. I think it's reasonable to give high weight to Hannum's assessment of the failure of the core group, but less weight to his proposal for a replacement, because at the time, I don't believe he was in a good position to assess whether his (apparent) alternative would run into the same trouble. It's always tempting to blame the people rather than the system, but in this case, I strongly suspect that it was the system that was fundamentally flawed, therefore either promoting the wrong people or putting otherwise competent people into situations where they are no longer getting useful feedback. It would be great, and very convenient, if the only management we needed was getting out of the way, but I doubt very much that that is the case. 
Cheers, Matthew From njs at pobox.com Thu Aug 27 06:05:11 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 27 Aug 2015 03:05:11 -0700 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: References: <87y4gxllb4.fsf@berkeley.edu> Message-ID: On Thu, Aug 27, 2015 at 2:16 AM, Matthew Brett wrote: > So, I speculate, that a good governance model would have: > > * one 'president' who has to take final responsibility for all decisions; > * this president might well have a fixed term, maybe with limits on > the number of terms they can serve. > * the president would be chosen by community vote and explicitly on > the basis that they were good managers as well as coders; > * for the presidential election, the candidates should set out what > their vision for the project is, and how they plan to achieve that > vision; We actually discussed some variants on this kind of idea at the meeting, and I think the general sense of those present was we didn't want to go there (for whatever that's worth). At least personally, I have to admit that the idea of a governance model involving elections fills me with creeping horror. The reason is that whole point of having a governance model (IMHO) is to (a) minimize the rise of interpersonal drama, (b) when some amount of interpersonal drama does inevitably arise anyway, provide some regulated channel for it, hopefully one that leads to a drama sink. But elections are a huge massive drama source. No-one wants to spend time campaigning or wondering how some technical proposal will effect their re-election chances, we want to get this sorted out so that we can stop thinking about it and go back to solving actually interesting problems... As for evidence... there are obviously projects that have had serious problems with some variant of core team model, but there are also many many successful projects that are also using variants of this model, and the document I sent around attempts to incorporate the lessons that have been learned in the process. OTOH after wracking my brain I think the only project I'm familiar with that has elections at all like this is Fedora, which elects... a "core team" (FESCo). Given that we don't have the problem of trying to manage thousands of contributors, I'm not sure their experience is really relevant. Or I guess Debian's use of General Resolutions as a decision-making procedure of last resort is kinda relevant, but... pretty different. (They also elect the project leader, which is more similar to what you're describing, but the project leader has no technical authority; in Debian the final authority short of a GR is the CTTE, which is explicitly designed as a classic beholden-to-nobody institution -- and even overriding the CTTE requires a supermajority.) I kinda feel like... as a rule of thumb, if your description of your governance model starts with the words "I speculate that...", then NumPy is probably not a good project to use for your experiment? -n -- Nathaniel J. Smith -- http://vorpus.org From njs at pobox.com Thu Aug 27 06:10:32 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 27 Aug 2015 03:10:32 -0700 Subject: [Numpy-discussion] [DRAFT] numpy governance document In-Reply-To: References: Message-ID: On Thu, Aug 27, 2015 at 1:44 AM, Matthew Brett wrote: > Hi, > > On Thu, Aug 27, 2015 at 7:33 AM, Nathaniel Smith wrote: >> Hi all, >> >> Here's a first draft of a governance document for NumPy. > > Thanks for this. 
> > I wasn't sure from your email whether you were asking for feedback as > to whether this was the right governance model? > > I mean that - for code - I think the usual procedure would be to > discuss various potential solutions on the mailing list, and then > follow up with something like a NEP that lays out the various > alternatives with their pros and cons. But I have the impression here > that you consider the general form to be set, and that you are asking > for comments on the detail. Is that right? I believe that the draft I sent around does reflect the consensus of those who were present at the dev meeting, but (as the document itself emphasizes!) of course it's helpful to hear critiques and concerns and ideas for how to do better... -- Nathaniel J. Smith -- http://vorpus.org From matthew.brett at gmail.com Thu Aug 27 06:15:44 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 27 Aug 2015 11:15:44 +0100 Subject: [Numpy-discussion] [DRAFT] numpy governance document In-Reply-To: References: Message-ID: Hi, On Thu, Aug 27, 2015 at 11:10 AM, Nathaniel Smith wrote: > On Thu, Aug 27, 2015 at 1:44 AM, Matthew Brett wrote: >> Hi, >> >> On Thu, Aug 27, 2015 at 7:33 AM, Nathaniel Smith wrote: >>> Hi all, >>> >>> Here's a first draft of a governance document for NumPy. >> >> Thanks for this. >> >> I wasn't sure from your email whether you were asking for feedback as >> to whether this was the right governance model? >> >> I mean that - for code - I think the usual procedure would be to >> discuss various potential solutions on the mailing list, and then >> follow up with something like a NEP that lays out the various >> alternatives with their pros and cons. But I have the impression here >> that you consider the general form to be set, and that you are asking >> for comments on the detail. Is that right? > > I believe that the draft I sent around does reflect the consensus of > those who were present at the dev meeting, but (as the document itself > emphasizes!) of course it's helpful to hear critiques and concerns and > ideas for how to do better... I imagine this document was designed to be uncontroversial, in the sense that it more or less formalizes the status quo? I think it would be useful to set out alternatives that were or could be considered. It would be a shame to drift into the wrong governance model for lack of considering others. Cheers, Matthew From matthew.brett at gmail.com Thu Aug 27 06:21:54 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 27 Aug 2015 11:21:54 +0100 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: References: <87y4gxllb4.fsf@berkeley.edu> Message-ID: Hi, On Thu, Aug 27, 2015 at 11:05 AM, Nathaniel Smith wrote: > On Thu, Aug 27, 2015 at 2:16 AM, Matthew Brett wrote: >> So, I speculate, that a good governance model would have: >> >> * one 'president' who has to take final responsibility for all decisions; >> * this president might well have a fixed term, maybe with limits on >> the number of terms they can serve. 
>> * the president would be chosen by community vote and explicitly on >> the basis that they were good managers as well as coders; >> * for the presidential election, the candidates should set out what >> their vision for the project is, and how they plan to achieve that >> vision; > > We actually discussed some variants on this kind of idea at the > meeting, and I think the general sense of those present was we didn't > want to go there (for whatever that's worth). > > At least personally, I have to admit that the idea of a governance > model involving elections fills me with creeping horror. The reason is > that whole point of having a governance model (IMHO) is to (a) > minimize the rise of interpersonal drama Right - I think this is key to the problem in this model. It is designed not to cause any trouble, and to keep things running as they are without controversy. It works OK on average as long as 'no change' is the desired outcome. In general the core group know each other fairly well, and feel a sense of shared loyalty to the group. This loyalty is exercised when some outside or inside force challenges the direction of the project. This was what made it so hard for the XFree86 core group to pull back from the course they had set. The question is - is avoiding the potential controversy important enough to force us into a model that has (in my opinion) a high risk of tending to conservatism and stagnation? > As for evidence... there are obviously projects that have had serious > problems with some variant of core team model, but there are also many > many successful projects that are also using variants of this model, > and the document I sent around attempts to incorporate the lessons > that have been learned in the process. OTOH after wracking my brain I > think the only project I'm familiar with that has elections at all > like this is Fedora, which elects... a "core team" (FESCo). Given that > we don't have the problem of trying to manage thousands of > contributors, I'm not sure their experience is really relevant. Or I > guess Debian's use of General Resolutions as a decision-making > procedure of last resort is kinda relevant, but... pretty different. > (They also elect the project leader, which is more similar to what > you're describing, but the project leader has no technical authority; > in Debian the final authority short of a GR is the CTTE, which is > explicitly designed as a classic beholden-to-nobody institution -- and > even overriding the CTTE requires a supermajority.) > > I kinda feel like... as a rule of thumb, if your description of your > governance model starts with the words "I speculate that...", then > NumPy is probably not a good project to use for your experiment? So my argument would be that our current data on success (lots of projects use this model and many of them are OK) is much less useful than the data from successful forks. I suppose my question ends up being - do you agree that the core model does have these risks? Do they worry you? What do you think we can do to guard against them? 
Cheers, Matthew From solipsis at pitrou.net Thu Aug 27 06:44:33 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 27 Aug 2015 12:44:33 +0200 Subject: [Numpy-discussion] 1.10.0rc1 References: <20150826151141.17db3046@fsol> Message-ID: <20150827124433.51dde6b6@fsol> Hi again, The change seems to have possibly unforeseen consequences because some ufuncs don't declare all possible types, e.g.: >>> a = np.arange(10, dtype=np.int32) >>> out = np.zeros_like(a) >>> np.fabs(a, out=out) Traceback (most recent call last): File "", line 1, in TypeError: ufunc 'fabs' output (typecode 'd') could not be coerced to provided output parameter (typecode 'i') according to the casting rule ''same_kind'' >>> np.fabs.types ['e->e', 'f->f', 'd->d', 'g->g', 'O->O'] (while fabs() wouldn't necessarily make sense on complex numbers, it does make sense on integers... and, ah, I've just noticed that np.abs() also exists with more input types, which is confusing...) Regards Antoine. On Wed, 26 Aug 2015 07:52:09 -0600 Charles R Harris wrote: > On Wed, Aug 26, 2015 at 7:32 AM, Charles R Harris > wrote: > > > > > > > On Wed, Aug 26, 2015 at 7:31 AM, Charles R Harris < > > charlesr.harris at gmail.com> wrote: > > > >> > >> > >> On Wed, Aug 26, 2015 at 7:11 AM, Antoine Pitrou > >> wrote: > >> > >>> On Tue, 25 Aug 2015 10:26:02 -0600 > >>> Charles R Harris wrote: > >>> > Hi All, > >>> > > >>> > The silence after the 1.10 beta has been eerie. Consequently, I'm > >>> thinking > >>> > of making a first release candidate this weekend. If you haven't yet > >>> tested > >>> > the beta, please do so. It would be good to discover as many problems > >>> as we > >>> > can before the first release. > >>> > >>> Has typing of ufunc parameters become much stricter? I can't find > >>> anything in the release notes, but see (1.10b1): > >>> > >>> >>> arr = np.linspace(0, 5, 10) > >>> >>> out = np.empty_like(arr, dtype=np.intp) > >>> >>> np.round(arr, out=out) > >>> Traceback (most recent call last): > >>> File "", line 1, in > >>> File > >>> "/home/antoine/np110/lib/python3.4/site-packages/numpy/core/fromnumeric.py", > >>> line 2778, in round_ > >>> return round(decimals, out) > >>> TypeError: ufunc 'rint' output (typecode 'd') could not be coerced to > >>> provided output parameter (typecode 'l') according to the casting rule > >>> ''same_kind'' > >>> > >>> > >>> It used to work (1.9): > >>> > >>> >>> arr = np.linspace(0, 5, 10) > >>> >>> out = np.empty_like(arr, dtype=np.intp) > >>> >>> np.round(arr, out=out) > >>> array([0, 1, 1, 2, 2, 3, 3, 4, 4, 5]) > >>> >>> out > >>> array([0, 1, 1, 2, 2, 3, 3, 4, 4, 5]) > >>> > >> > >> The default casting mode has been changed. I think this has been raising > >> a warning since 1.7 and was mentioned as a future change in 1.10, but you > >> are right, it needs to be mentioned in the 1.10 release notes. > >> > > > > Make that warned of in the 1.9.0 release notes. > > > > > Here it is in 1.9.0 with deprecation warning made visible. > ``` > In [3]: import warnings > > In [4]: warnings.simplefilter('always') > > In [5]: arr = np.linspace(0, 5, 10) > > In [6]: out = np.empty_like(arr, dtype=np.intp) > > In [7]: np.round(arr, out=out) > /home/charris/.local/lib/python2.7/site-packages/numpy/core/fromnumeric.py:2640: > DeprecationWarning: Implicitly casting between incompatible kinds. In a > future numpy release, this will raise an error. Use casting="unsafe" if > this is intentional. 
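# --- illustrative aside (not part of the quoted session above) ---
# Assuming the 1.10 default ufunc casting rule of 'same_kind' discussed in this
# thread, a float-valued ufunc can no longer write implicitly into an integer
# out= array.  Asking the underlying ufunc for 'unsafe' casting restores the
# old behaviour explicitly:
import numpy as np
arr = np.linspace(0, 5, 10)
out = np.empty_like(arr, dtype=np.intp)
np.rint(arr, out=out, casting='unsafe')   # np.round itself takes no casting argument
# Or round in floating point and cast afterwards:
rounded = np.round(arr).astype(np.intp)
# For integer input to fabs, np.abs provides integer loops, so no cast is needed:
a = np.arange(10, dtype=np.int32)
result = np.abs(a)
# --- end of aside ---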
> return round(decimals, out) > Out[7]: array([0, 1, 1, 2, 2, 3, 3, 4, 4, 5]) > ``` > > Chuck > From sebastian at sipsolutions.net Thu Aug 27 07:11:13 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 27 Aug 2015 13:11:13 +0200 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: References: <87y4gxllb4.fsf@berkeley.edu> Message-ID: <1440673873.1694.33.camel@sipsolutions.net> On Do, 2015-08-27 at 10:45 +0100, Matthew Brett wrote: > Hi, > > On Thu, Aug 27, 2015 at 10:35 AM, Bryan Van de Ven wrote: > > > >> On Aug 27, 2015, at 10:22 AM, Matthew Brett wrote: > >> > >> In the case of the 'core' model, we have some compelling testimony > >> from someone with a great deal of experience: > >> > >> """ > >> Much of this early structure (CVS, web site, cabal ["core" group], > >> etc.) was copied verbatim by other open source (this term not being in > >> wide use yet) projects -- even the form of the project name and the > >> term "core". This later became a kind of standard template for > >> starting up an open source project. [...] I'm sorry to say that I > >> helped create this problem, and that most of the projects which > >> modeled themselves after NetBSD (probably due to its high popularity > >> in 1993 and 1994) have suffered similar problems. FreeBSD and XFree86, > >> for example, have both forked successor projects (Dragonfly and X.org) > >> for very similar reasons. > >> """ > > > > Who goes on to propose: > > > > 7) The "core" group must be replaced with people who are actually > > competent and dedicated enough to review proposals, accept feedback, > > and make good decisions. More to the point, though, the "core" group > > must only act when *needed* -- most technical decisions should be > > left to the community to hash out; it must not preempt the community > > from developing better solutions. (This is how the "core" group > > worked during most of the project's growth period.) > > Sure. I think it's reasonable to give high weight to Hannum's > assessment of the failure of the core group, but less weight to his > proposal for a replacement, because at the time, I don't believe he > was in a good position to assess whether his (apparent) alternative > would run into the same trouble. > > It's always tempting to blame the people rather than the system, but > in this case, I strongly suspect that it was the system that was > fundamentally flawed, therefore either promoting the wrong people or > putting otherwise competent people into situations where they are no > longer getting useful feedback. Maybe so. I do not know much at all about these models, but I am not sure how much applies here to numpy. Isn't at least FreeBSD a magnitude larger then numpy? We do need to have some formality about how to give out commit rights, and do final decision when all else fails. One thing I do not know is how a "community vote" could work at all, considering I do not even know how to count its members. Votes and presidents make sense to me for large projects with hundrets of developers on different corners (think of the gnome foundation, debian probably) [1]. One thing I could imagine adding is that the community should be encouraged to ask for/propose new members for the "core" team. Nobody is particularly in love with this model, but maybe out of our own ignorance, we do not see many alternatives after ruling out a BDFL. Yes, it is a lot fixing a status quo, but we have to fixate something. 
Any alternative suggestions are welcome and would be even after deciding on this. Though maybe that takes away some momentum. - Sebastian [1] If we were to have a central governance for the SciPy stack the story would seem very different to me. > > It would be great, and very convenient, if the only management we > needed was getting out of the way, but I doubt very much that that is > the case. > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From ndbecker2 at gmail.com Thu Aug 27 07:23:25 2015 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 27 Aug 2015 07:23:25 -0400 Subject: [Numpy-discussion] Defining a white noise process using numpy References: Message-ID: Daniel Bliss wrote: > Hi all, > > Can anyone give me some advice for translating this equation into code > using numpy? > > eta(t) = lim(dt -> 0) N(0, 1/sqrt(dt)), > > where N(a, b) is a Gaussian random variable of mean a and variance b**2. > > This is a heuristic definition of a white noise process. > > Thanks, > Dan You want noise with infinite variance? That doesn't make sense. From archibald at astron.nl Thu Aug 27 07:37:54 2015 From: archibald at astron.nl (Anne Archibald) Date: Thu, 27 Aug 2015 11:37:54 +0000 Subject: [Numpy-discussion] Defining a white noise process using numpy In-Reply-To: References: Message-ID: On Thu, Aug 27, 2015 at 12:51 AM Daniel Bliss wrote: Can anyone give me some advice for translating this equation into code > using numpy? > > eta(t) = lim(dt -> 0) N(0, 1/sqrt(dt)), > > where N(a, b) is a Gaussian random variable of mean a and variance b**2. > > This is a heuristic definition of a white noise process. > This is an abstract definition. How to express it in numpy will depend on what you want to do with it. The easiest and most likely thing you could want would be a time series, with N time steps dt, in which sample i is the average value of the white noise process from i*dt to (i+1)*dt. This is very easy to write in numpy: 1/np.sqrt(dt) * np.random.randn(N) Anne -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Thu Aug 27 07:44:24 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 27 Aug 2015 13:44:24 +0200 Subject: [Numpy-discussion] 1.10.0rc1 References: <20150826151141.17db3046@fsol> Message-ID: <20150827134424.38ec9eac@fsol> The change also seems to have made datetime64 computations stricter: >>> np.datetime64('2010') - np.datetime64('2000-01-01') numpy.timedelta64(3653,'D') >>> np.datetime64('2010') - np.datetime64('2000-01-01T00:00:00Z') Traceback (most recent call last): File "", line 1, in TypeError: Cannot cast ufunc subtract input from dtype(' wrote: > On Wed, Aug 26, 2015 at 7:32 AM, Charles R Harris > wrote: > > > > > > > On Wed, Aug 26, 2015 at 7:31 AM, Charles R Harris < > > charlesr.harris at gmail.com> wrote: > > > >> > >> > >> On Wed, Aug 26, 2015 at 7:11 AM, Antoine Pitrou > >> wrote: > >> > >>> On Tue, 25 Aug 2015 10:26:02 -0600 > >>> Charles R Harris wrote: > >>> > Hi All, > >>> > > >>> > The silence after the 1.10 beta has been eerie. Consequently, I'm > >>> thinking > >>> > of making a first release candidate this weekend. 
If you haven't yet > >>> tested > >>> > the beta, please do so. It would be good to discover as many problems > >>> as we > >>> > can before the first release. > >>> > >>> Has typing of ufunc parameters become much stricter? I can't find > >>> anything in the release notes, but see (1.10b1): > >>> > >>> >>> arr = np.linspace(0, 5, 10) > >>> >>> out = np.empty_like(arr, dtype=np.intp) > >>> >>> np.round(arr, out=out) > >>> Traceback (most recent call last): > >>> File "", line 1, in > >>> File > >>> "/home/antoine/np110/lib/python3.4/site-packages/numpy/core/fromnumeric.py", > >>> line 2778, in round_ > >>> return round(decimals, out) > >>> TypeError: ufunc 'rint' output (typecode 'd') could not be coerced to > >>> provided output parameter (typecode 'l') according to the casting rule > >>> ''same_kind'' > >>> > >>> > >>> It used to work (1.9): > >>> > >>> >>> arr = np.linspace(0, 5, 10) > >>> >>> out = np.empty_like(arr, dtype=np.intp) > >>> >>> np.round(arr, out=out) > >>> array([0, 1, 1, 2, 2, 3, 3, 4, 4, 5]) > >>> >>> out > >>> array([0, 1, 1, 2, 2, 3, 3, 4, 4, 5]) > >>> > >> > >> The default casting mode has been changed. I think this has been raising > >> a warning since 1.7 and was mentioned as a future change in 1.10, but you > >> are right, it needs to be mentioned in the 1.10 release notes. > >> > > > > Make that warned of in the 1.9.0 release notes. > > > > > Here it is in 1.9.0 with deprecation warning made visible. > ``` > In [3]: import warnings > > In [4]: warnings.simplefilter('always') > > In [5]: arr = np.linspace(0, 5, 10) > > In [6]: out = np.empty_like(arr, dtype=np.intp) > > In [7]: np.round(arr, out=out) > /home/charris/.local/lib/python2.7/site-packages/numpy/core/fromnumeric.py:2640: > DeprecationWarning: Implicitly casting between incompatible kinds. In a > future numpy release, this will raise an error. Use casting="unsafe" if > this is intentional. > return round(decimals, out) > Out[7]: array([0, 1, 1, 2, 2, 3, 3, 4, 4, 5]) > ``` > > Chuck > From matthew.brett at gmail.com Thu Aug 27 08:57:46 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 27 Aug 2015 13:57:46 +0100 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: <1440673873.1694.33.camel@sipsolutions.net> References: <87y4gxllb4.fsf@berkeley.edu> <1440673873.1694.33.camel@sipsolutions.net> Message-ID: Hi, On Thu, Aug 27, 2015 at 12:11 PM, Sebastian Berg wrote: > On Do, 2015-08-27 at 10:45 +0100, Matthew Brett wrote: >> Hi, >> >> On Thu, Aug 27, 2015 at 10:35 AM, Bryan Van de Ven wrote: >> > >> >> On Aug 27, 2015, at 10:22 AM, Matthew Brett wrote: >> >> >> >> In the case of the 'core' model, we have some compelling testimony >> >> from someone with a great deal of experience: >> >> >> >> """ >> >> Much of this early structure (CVS, web site, cabal ["core" group], >> >> etc.) was copied verbatim by other open source (this term not being in >> >> wide use yet) projects -- even the form of the project name and the >> >> term "core". This later became a kind of standard template for >> >> starting up an open source project. [...] I'm sorry to say that I >> >> helped create this problem, and that most of the projects which >> >> modeled themselves after NetBSD (probably due to its high popularity >> >> in 1993 and 1994) have suffered similar problems. FreeBSD and XFree86, >> >> for example, have both forked successor projects (Dragonfly and X.org) >> >> for very similar reasons. 
>> >> """ >> > >> > Who goes on to propose: >> > >> > 7) The "core" group must be replaced with people who are actually >> > competent and dedicated enough to review proposals, accept feedback, >> > and make good decisions. More to the point, though, the "core" group >> > must only act when *needed* -- most technical decisions should be >> > left to the community to hash out; it must not preempt the community >> > from developing better solutions. (This is how the "core" group >> > worked during most of the project's growth period.) >> >> Sure. I think it's reasonable to give high weight to Hannum's >> assessment of the failure of the core group, but less weight to his >> proposal for a replacement, because at the time, I don't believe he >> was in a good position to assess whether his (apparent) alternative >> would run into the same trouble. >> >> It's always tempting to blame the people rather than the system, but >> in this case, I strongly suspect that it was the system that was >> fundamentally flawed, therefore either promoting the wrong people or >> putting otherwise competent people into situations where they are no >> longer getting useful feedback. > > Maybe so. I do not know much at all about these models, but I am not > sure how much applies here to numpy. Isn't at least FreeBSD a magnitude > larger then numpy? It seems to me that numpy suffers from the same risks of poor accountability, stagnation and conservatism that larger projects do. Is there a reason that would not be the case? > We do need to have some formality about how to give out commit rights, > and do final decision when all else fails. Yes, sure, something formal is probably but not certainly better than nothing, depending on what the 'something formal' is. > One thing I do not know is how a "community vote" could work at all, > considering I do not even know how to count its members. Votes and > presidents make sense to me for large projects with hundrets of > developers on different corners (think of the gnome foundation, debian > probably) [1]. The 'president' idea is to get at the problem of lack of accountability, along with selection for leadership skill rather than coding ability. It's trying to get at the advantages of the BDFL model in our situation where there is no obvious BDFL. For the me the problem is that, at the moment, if the formal or informal governing body makes a bad decision, then no member will feel responsible for that decision or its consequences. That tends to lead to an atmosphere of - "oh well, what could we do, X wouldn't agree to A and Y wouldn't agree to B so we're stuck". It seems to me we need a system such that whoever is in charge feels so strongly that it is their job to make numpy as good as possible, that they will take whatever difficult or sensitive decisions are necessary to make that happen. On the other hand the 'core' system seems to function on a model of mutual deference and personal loyalty that I believe is destructive of good management. Cheers, Matthew From ben.v.root at gmail.com Thu Aug 27 09:52:31 2015 From: ben.v.root at gmail.com (Benjamin Root) Date: Thu, 27 Aug 2015 09:52:31 -0400 Subject: [Numpy-discussion] 1.10.0rc1 In-Reply-To: <20150827134424.38ec9eac@fsol> References: <20150826151141.17db3046@fsol> <20150827134424.38ec9eac@fsol> Message-ID: Ok, I tested matplotlib master against numpy master, and there were no errors. 
I did get a bunch of new deprecation warnings though such as: "/nas/home/broot/centos6/lib/python2.7/site-packages/matplotlib-1.5.dev1-py2.7-linux-x86_64.egg/matplotlib/colorbar.py:539: VisibleDeprecationWarning: boolean index did not match indexed array along dimension 0; dimension is 5 but corresponding boolean dimension is 3 colors = np.asarray(colors)[igood]" The message isn't exactly clear. I suspect the problem is a shape mismatch, like colors is 5x3, and igood is just 3 for some reason. Could somebody shine some light on this, please? Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Aug 27 10:04:51 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 27 Aug 2015 08:04:51 -0600 Subject: [Numpy-discussion] 1.10.0rc1 In-Reply-To: References: <20150826151141.17db3046@fsol> <20150827134424.38ec9eac@fsol> Message-ID: On Thu, Aug 27, 2015 at 7:52 AM, Benjamin Root wrote: > > Ok, I tested matplotlib master against numpy master, and there were no > errors. I did get a bunch of new deprecation warnings though such as: > > "/nas/home/broot/centos6/lib/python2.7/site-packages/matplotlib-1.5.dev1-py2.7-linux-x86_64.egg/matplotlib/colorbar.py:539: > VisibleDeprecationWarning: boolean index did not match indexed array along > dimension 0; dimension is 5 but corresponding boolean dimension is 3 > colors = np.asarray(colors)[igood]" > > The message isn't exactly clear. I suspect the problem is a shape > mismatch, like colors is 5x3, and igood is just 3 for some reason. Could > somebody shine some light on this, please? > IIRC, Boolean indexing would fill out the dimension, i.e., len 3 would be expanded to len 5 in this case. That behavior is deprecated. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Aug 27 10:34:24 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 27 Aug 2015 10:34:24 -0400 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: References: <87y4gxllb4.fsf@berkeley.edu> <1440673873.1694.33.camel@sipsolutions.net> Message-ID: On Thu, Aug 27, 2015 at 8:57 AM, Matthew Brett wrote: > Hi, > > On Thu, Aug 27, 2015 at 12:11 PM, Sebastian Berg > wrote: > > On Do, 2015-08-27 at 10:45 +0100, Matthew Brett wrote: > >> Hi, > >> > >> On Thu, Aug 27, 2015 at 10:35 AM, Bryan Van de Ven > wrote: > >> > > >> >> On Aug 27, 2015, at 10:22 AM, Matthew Brett > wrote: > >> >> > >> >> In the case of the 'core' model, we have some compelling testimony > >> >> from someone with a great deal of experience: > >> >> > >> >> """ > >> >> Much of this early structure (CVS, web site, cabal ["core" group], > >> >> etc.) was copied verbatim by other open source (this term not being > in > >> >> wide use yet) projects -- even the form of the project name and the > >> >> term "core". This later became a kind of standard template for > >> >> starting up an open source project. [...] I'm sorry to say that I > >> >> helped create this problem, and that most of the projects which > >> >> modeled themselves after NetBSD (probably due to its high popularity > >> >> in 1993 and 1994) have suffered similar problems. FreeBSD and > XFree86, > >> >> for example, have both forked successor projects (Dragonfly and > X.org) > >> >> for very similar reasons. 
> >> >> """ > >> > > >> > Who goes on to propose: > >> > > >> > 7) The "core" group must be replaced with people who are actually > >> > competent and dedicated enough to review proposals, accept > feedback, > >> > and make good decisions. More to the point, though, the "core" > group > >> > must only act when *needed* -- most technical decisions should be > >> > left to the community to hash out; it must not preempt the > community > >> > from developing better solutions. (This is how the "core" group > >> > worked during most of the project's growth period.) > >> > >> Sure. I think it's reasonable to give high weight to Hannum's > >> assessment of the failure of the core group, but less weight to his > >> proposal for a replacement, because at the time, I don't believe he > >> was in a good position to assess whether his (apparent) alternative > >> would run into the same trouble. > >> > >> It's always tempting to blame the people rather than the system, but > >> in this case, I strongly suspect that it was the system that was > >> fundamentally flawed, therefore either promoting the wrong people or > >> putting otherwise competent people into situations where they are no > >> longer getting useful feedback. > > > > Maybe so. I do not know much at all about these models, but I am not > > sure how much applies here to numpy. Isn't at least FreeBSD a magnitude > > larger then numpy? > > It seems to me that numpy suffers from the same risks of poor > accountability, stagnation and conservatism that larger projects do. > Is there a reason that would not be the case? > > > We do need to have some formality about how to give out commit rights, > > and do final decision when all else fails. > > Yes, sure, something formal is probably but not certainly better than > nothing, depending on what the 'something formal' is. > > > One thing I do not know is how a "community vote" could work at all, > > considering I do not even know how to count its members. Votes and > > presidents make sense to me for large projects with hundrets of > > developers on different corners (think of the gnome foundation, debian > > probably) [1]. > > The 'president' idea is to get at the problem of lack of > accountability, along with selection for leadership skill rather than > coding ability. It's trying to get at the advantages of the BDFL > model in our situation where there is no obvious BDFL. For the me > the problem is that, at the moment, if the formal or informal > governing body makes a bad decision, then no member will feel > responsible for that decision or its consequences. That tends to lead > to an atmosphere of - "oh well, what could we do, X wouldn't agree to > A and Y wouldn't agree to B so we're stuck". It seems to me we need > a system such that whoever is in charge feels so strongly that it is > their job to make numpy as good as possible, that they will take > whatever difficult or sensitive decisions are necessary to make that > happen. On the other hand the 'core' system seems to function on a > model of mutual deference and personal loyalty that I believe is > destructive of good management. > I don't really see a problem with "codifying" the status quo. It might become necessary to have something like an administrative director if numpy becomes a more formal organization with funding, but for the development of the project I don't see any need for a president. If there is no obvious BDFL, then I guess there is also no obvious president. 
(I would vote for Ralf as president of everything, but I don't think he's available.) As the current debate shows, it's possible to have a public discussion about the direction of the project without having to delegate providing a vision to a president. Given the current pattern, all critical issues end up in a public debate on the mailing list. What numpy (and scipy) need is to have someone as a tie breaker to make any final decisions if there is no clear consensus; if there is no BDFL for it, then having the "core" group make those decisions looks appropriate to me. > "On the other hand the 'core' system seems to function on a model of mutual deference and personal loyalty that I believe is destructive of good management." That sounds actually like a good basis for teamwork to me. Also, that has "mutual" in it instead of just deferring and being loyal to a president. Since I know scipy development much better: scipy has made huge progress in the last 5 or 6 years since I've been following it, both in terms of code, in terms of development workflow, and in the number of developers. (When I started, I was essentially alone in scipy.stats; now there are 3 to 5 "core" developers that at least partially work on it, everything goes through PRs with public discussion and with critical issues additionally raised on the mailing list.) Ralf and Pauli are the de facto BDFLs for scipy overall, but all decisions in recent years have been made without a fight, though not without lots of arguments, and, given the size and breadth of scipy, there are field experts (although not enough of those) to help in the discussion. There are still stalled PRs, blocked proposals, and decisions that I didn't like, but I think that's unavoidable. One feature of the current "core" system is that it is relatively open, almost all discussions are public, and it is relatively (?) easy to get into it for new developers. I currently don't worry that a closed clique is taking over the numpy or scipy "core" group. Josef > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From bryanv at continuum.io Thu Aug 27 10:43:28 2015 From: bryanv at continuum.io (Bryan Van de Ven) Date: Thu, 27 Aug 2015 15:43:28 +0100 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: References: <87y4gxllb4.fsf@berkeley.edu> <1440673873.1694.33.camel@sipsolutions.net> Message-ID: > On Aug 27, 2015, at 1:57 PM, Matthew Brett wrote: > > The 'president' idea ...seems to be predicated on a steady stream of people who: actually want the job, don't mind campaigning, are willing to accept any and all blame, and have the technical experience to make "final decisions". As others have pointed out, the active developer community for NumPy is not measured in the hundreds (or even the tens, really). So: what is your proposed recourse if you hold an election and no-one shows up to run?
Bryan From sebastian at sipsolutions.net Thu Aug 27 10:44:34 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 27 Aug 2015 16:44:34 +0200 Subject: [Numpy-discussion] 1.10.0rc1 In-Reply-To: References: <20150826151141.17db3046@fsol> <20150827134424.38ec9eac@fsol> Message-ID: <1440686674.11529.5.camel@sipsolutions.net> On Do, 2015-08-27 at 08:04 -0600, Charles R Harris wrote: > > > On Thu, Aug 27, 2015 at 7:52 AM, Benjamin Root > wrote: > > > Ok, I tested matplotlib master against numpy master, and there > were no errors. I did get a bunch of new deprecation warnings > though such as: > > "/nas/home/broot/centos6/lib/python2.7/site-packages/matplotlib-1.5.dev1-py2.7-linux-x86_64.egg/matplotlib/colorbar.py:539: VisibleDeprecationWarning: boolean index did not match indexed array along dimension 0; dimension is 5 but corresponding boolean dimension is 3 > colors = np.asarray(colors)[igood]" > > > The message isn't exactly clear. I suspect the problem is a > shape mismatch, like colors is 5x3, and igood is just 3 for > some reason. Could somebody shine some light on this, please? > > > > IIRC, Boolean indexing would fill out the dimension, i.e., len 3 would > be expanded to len 5 in this case. That behavior is deprecated. > Yes, this is exactly the case, you have something like: arr = np.zeros((5, 3)) ind = np.array([True, False, False]) arr[ind, :] and numpy nowadays thinks that such code is likely a bug (when the ind is shorter than arr it is somewhat OK, the other way around gets more creepy). If you have an idea of how to make the error message clearer, or objections to the change, I am happy to hear it! - Sebastian > > Chuck > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From josef.pktd at gmail.com Thu Aug 27 11:03:46 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 27 Aug 2015 11:03:46 -0400 Subject: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015 In-Reply-To: References: Message-ID: On Wed, Aug 26, 2015 at 10:06 AM, Travis Oliphant wrote: > > > On Wed, Aug 26, 2015 at 1:41 AM, Nathaniel Smith wrote: > >> Hi Travis, >> >> Thanks for taking the time to write up your thoughts! >> >> I have many thoughts in return, but I will try to restrict myself to two >> main ones :-). >> >> 1) On the question of whether work should be directed towards improving >> NumPy-as-it-is or instead towards a compatibility-breaking replacement: >> There's plenty of room for debate about whether it's better engineering >> practice to try and evolve an existing system in place versus starting >> over, and I guess we have some fundamental disagreements there, but I >> actually think this debate is a distraction -- we can agree to disagree, >> because in fact we have to try both. >> > > Yes, on this we agree. I think NumPy can improve *and* we can have new > innovative array objects. I don't disagree about that. > > >> >> At a practical level: NumPy *is* going to continue to evolve, because it >> has users and people interested in evolving it; similarly, dynd and other >> alternatives libraries will also continue to evolve, because they also have >> people interested in doing it. And at a normative level, this is a good >> thing! 
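For anyone hitting the same VisibleDeprecationWarning that Ben reports earlier in this thread, here is a minimal, self-contained sketch of the pattern Sebastian describes above; the array and mask names are invented for illustration (not taken from matplotlib), and the fix shown is simply to build a boolean mask of the full axis length.

```python
import numpy as np

colors = np.zeros((5, 3))               # e.g. 5 RGB rows
igood = np.array([True, False, True])   # mask of length 3, but axis 0 has length 5

# Old behaviour: the short mask was silently padded with False to length 5,
# so this selected rows 0 and 2.  NumPy 1.10 warns about the mismatch, and
# newer releases raise an IndexError instead.
# colors[igood]

# Explicit version: make the mask as long as the axis it indexes.
mask = np.zeros(len(colors), dtype=bool)
mask[:len(igood)] = igood
selected = colors[mask]                 # shape (2, 3), no warning
```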
If NumPy and dynd both get better, than that's awesome: the worst >> case is that NumPy adds the new features that we talked about at the >> meeting, and dynd simultaneously becomes so awesome that everyone wants to >> switch to it, and the result of this would be... that those NumPy features >> are exactly the ones that will make the transition to dynd easier. Or if >> some part of that plan goes wrong, then well, NumPy will still be there as >> a fallback, and in the mean time we've actually fixed the major pain points >> our users are begging us to fix. >> >> You seem to be urging us all to make a double-or-nothing wager that your >> extremely ambitious plans will all work out, with the entire numerical >> Python ecosystem as the stakes. I think this ambition is awesome, but maybe >> it'd be wise to hedge our bets a bit? >> > > You are mis-characterizing my view. I think NumPy can evolve (though I > would personally rather see a bigger change to the underlying system like I > outlined before). But, I don't believe it can even evolve easily in the > direction needed without breaking ABI and that insisting on not breaking it > or even putting too much effort into not breaking it will continue to > create less-optimal solutions that are harder to maintain and do not take > advantage of knowledge this community now has. > > I'm also very concerned that 'evolving' NumPy will create a situation > where there are regular semantic and subtle API changes that will cause > NumPy to be less stable for it's user-base. I've watched this happen. > This at a time that people are already looking around for new and different > approaches anyway. > > >> >> 2) You really emphasize this idea of an ABI-breaking (but not >> API-breaking) release, and I think this must indicate some basic gap in how >> we're looking at things. Where I'm getting stuck here is that... I actually >> can't think of anything important that we can't do now, but could if we >> were allowed to break ABI compatibility. The kinds of things that break ABI >> but keep API are like... rearranging what order the fields in a struct fall >> in, or changing the numeric value of opaque constants like >> NPY_ARRAY_WRITEABLE. The biggest win I can think of is that we could save a >> few bytes per array by arranging the fields inside the ndarray struct more >> optimally, but that's hardly a feature to hang a 2.0 on. You seem to have a >> vision of this ABI-breaking release as being something very different from >> that, and I'm not clear on what this vision is. >> >> > We already broke the ABI with date-time changes --- it's still broken for > a certain percentage of users last I checked. So, part of my > disagreement is that we've tried this and it didn't work --- even though > smart people thought it would. I've had to deal with this personally and > I'm not enthusiastic about having to deal with this for the next 5 years > because of even more attempts to make changes while not breaking the ABI. > I think the group is more careful now --- but I still think the API is > broad enough and uses of NumPy deep enough that the effort involved in > trying not to break the ABI is just not worth the effort (because it's a > non-feature today). Adding new dtypes without breaking the ABI is tricky > (and to do it without breaking the ABI is ugly). 
I also continue to > believe that putting out a new ABI-breaking NumPy will allow re-compiling > *once* (with some porting changes needed) and not subtle breakages > requiring code-changes every time a release is made. If subtle changes > aren't made, then the new features won't come. Right now, I'd rather have > stability from NumPy than new features. New features can come from other > libraries. > > One specific change that could easily be made in NumPy 2.0 (the current > code but with an ABI change) is that Dtypes should become true type objects > and array-scalars (which are the current type-objects) should become > instances of those dtypes. That is the biggest clean-up needed, I think on > the array-front. There should not be *both* array-scalars and dtype > objects. They are the same thing fundamentally. It was a mistake to > have both of them. I don't see how to make that change without breaking > the ABI. Perhaps it could be done in a creative way --- but why put the > effort into that and end up with an even more hacky code-base. > > NumPy's ABI was influenced by and evolved from Numeric and Numarray. It > was not "designed" to last 30 years. > > I think the dtype "types" should potentially have different > member-structures. The ufunc sub-system needs an overhaul --- it's > member structures need upgrades. With generalized ufuncs and the > iteration protocols of Mark Wiebe we know a whole lot more about ufuncs > now. Ufuncs are the same 1995 structure that Jim Hugunin wrote. I > suppose you *could* just tack new functions on the end of structure and > keep growing the list (while leaving old, unused structures as unused or > deprecated) --- or you can take the opportunity to tidy up a bit. The > longer you leave everything the same, the harder you make the code-base and > the more costly maintenance becomes. I just don't see the value there > --- and I see a lot of pain. > > Regarding the ufunc subsystem. We've argued before about the lack of > mulit-methods in NumPy. Continuing to add dunder-methods to try and get > around it will continue to make the system harder to maintain and more > brittle. > > You mention making NumPy an interface to multiple things along with many > other ideas. I don't believe you can get there without real changes that > break things (at the very least semantic changes). I'm not excited about > those changes causing instability (which they will cause ---- to me the > burden of proof that they won't is on you who wants to make the change and > not on me to say how they will). I also think it will take much > longer to get there incrementally (if at all) than just creating something > on top of newer ideas. > > > >> The main reason I personally am against having a big ABI-breaking release >> is not that I hate ABI breakage a priori, it's that all the big features >> that I care about and the are users are asking for seem to be ones that... >> don't actually require doing that. At most they seem to get a mild benefit >> from breaking some obscure corner cases. So the cost/benefits don't make >> any sense to me. >> >> So: can you give a concrete example of a change you have in mind where >> breaking ABI would be the key enabler? >> >> (I guess you might also be thinking of a separate issue that you sort of >> allude to: Perhaps we will try to make changes which we think don't involve >> breaking the ABI, but discover too late that we have failed to fully >> understand the implications and have broken it by mistake. 
IIUC this is >> what happened in the 1.4 timeframe when datetime64 was merged and >> accidentally renumbered some of the NPY_* constants. >> > > Yes, this is what I'm mainly worried about. But, more than that, I'm > concerned about general *semantic* and API changes at a rapid pace for a > community that is just looking for stability and bug-fixes from NumPy > itself --- with innovation happening elsewhere. > > >> Partially I am less worried about this because I have a fair amount of >> confidence that our review and QA process has improved these days to the >> point that we would not let a change like that slip through by accident -- >> we have a lot more active reviewers, people are sensitized to the issues, >> we've successfully landed intrusive changes like Sebastian's indexing >> rewrite, ... though this is very much second-hand impressions on my part, >> and I'd welcome input from folks like Chuck who have a clearer view on how >> things have changed from then to now. >> >> But more importantly, even if this is true, then I can't see how your >> proposal helps. If we aren't good enough at our jobs to predict when we'll >> break ABI, then by assumption it makes no sense to pick one release and >> decide that this is the one time that we'll break ABI.) >> > > I don't understand your point. Picking a release to break the ABI allows > you to actually do things like change macros to functions and move > structures around to be more consistent with a new design that is easier to > maintain and allows more growth. It has nothing to do with "whether you > are good at your job". Everyone has strengths and weaknesses. > > This kind of clean-up may be needed regularly --- every 3 years would not > be a crazy pattern, but it could also be every 5 years if you wanted more > discipline. I already knew we needed to break the ABI "soonish" when I > released NumPy 1.0. The fact that we haven't officially done it yet (but > have done it unofficially) is a great injustice to "what could be" and has > slowed development of NumPy tremendously. > > We've gone back and forth on this. I'm fine if we disagree, but I just > hope the disagreement doesn't lead to lack of cooperation as we both have > the same ultimate interests in seeing array-computing in Python improve. > I just don't support *major* changes without breaking the ABI without a > whole lot of proof that it is possible (without hackiness). You have > mentioned on your roadmap a lot of what I would consider *major* changes. > Some of it you describe how to get there. The most important change > (improving the dtype system) you don't. > > Part of my point is that we now *know* how to improve the dtype system. > Let's do it. Let's not try "yet again" to do it differently inside an old > system designed by a scientist who didn't understand type-theory or type > systems (that was me by the way). Look at data-shape in the blaze > project. Take that and build a Python type-system that also outputs > struct-string syntax for memory-views. That's the data-description system > that NumPy should be using --- not trying to hack on a mixed array-scalar, > dtype-object system that may never support everything we now know is > needed. > > Trying to incrementing from where we are now will only lead to a > sub-optimal outcome and unfortunate instability when we already know what > to do differently. I doubt I will convince you --- certainly not via > email. 
I apologize in advance that I likely won't be able to respond in > depth to any more questions that are really just "prove to me that I can't" > kind of questions. Of course I can't prove that. All I'm saying is that > to me the evidence and my experience leads me to not be able to support > major changes like you have proposed without also intentionally breaking > the ABI (and thus calling it NumPy 2.0). > > If I find time to write, I will try to use it to outline more specifically > what I think is a better approach to array- and table-computing in Python > that keeps the stability of NumPy and adds new features using different > approaches. > > -Travis > > From my perspective the incremental evolutionary approach in numpy (and scipy) in the last few years has worked quite well, and I'm optimistic that it will work in future if the developers can pull it off. The main changes that I remember that needed adjustment in scipy (as observer) or statsmodels (as maintainer) came from becoming more strict in several cases. This mainly affects corner cases or cases where the downstream code wasn't "clean". Some API breaking (with deprecation) and some semantic changes are still needed independent of any big changes that may or may not be arriving anytime soon. This way we get improvements in a core library with the requirement that every once in a while we need to adjust our code. (And with the occasional unintended side effect where test coverage is not enough.) The advantage is that we are getting the improvements with the regular release cycles, and they keep numpy alive and competitive for another 10 years or more. In the meantime, other packages like pandas can cater and expand to other use cases, or other packages can develop generic arrays and out of core and distributed arrays. I'm partially following some of the Julia mailing lists. Starting something from scratch is a lot of work, and my guess is that similar approaches in python will take some time to become mainstream. In the meantime we can build something on an improving numpy. --- The only thing I'm not so happy about in the last years is the proliferation of object arrays, both in numpy code and in pandas. And I hope that the (dtype) proposals help to get rid of some of those object arrays. Josef > > > > >> >> On Tue, Aug 25, 2015 at 12:00 PM, Travis Oliphant >> wrote: >> >>> Thanks for the write-up Nathaniel. There is a lot of great detail and >>> interesting ideas here. >>> >>> I am very eager to understand how to help NumPy and the wider >>> community move forward however I can (my passions on this have not changed >>> since 1999, though what I myself spend time on has changed). >>> >>> There are a lot of ways to think about approaching this, though. It's >>> hard to get all the ideas on the table, and it was unfortunate we couldn't >>> get everybody who are core NumPy devs together in person to have this >>> discussion as there are still a lot of questions unanswered and a lot of >>> thought that has gone into other approaches that was not brought up or >>> represented in the meeting (how does Numba fit into this, what about >>> data-shape, dynd, memory-views and Python type system, etc.). If NumPy >>> becomes just an interface-specification, then why don't we just do that >>> *outside* NumPy itself in a way that doesn't jeopardize the stability of >>> NumPy today. These are some of the real questions I have. I will try >>> to write up my thoughts in more depth soon, but I won't be able to respond >>> in-depth right now. 
I just wanted to comment because Nathaniel said I >>> disagree which is only partly true. >>> >>> The three most important things for me are 1) let's make sure we have >>> representation from as wide of the community as possible (this is really >>> hard), 2) let's look around at the broader community and the prior art that >>> is happening in this space right now and 3) let's not pretend we are going >>> to be able to make all this happen without breaking ABI compatibility. >>> Let's just break ABI compatibility with NumPy 2.0 *and* have as much >>> fidelity with the API and semantics of current NumPy as possible (though >>> there will be some changes necessary long-term). >>> >>> I don't think we should intentionally break ABI if we can avoid it, but >>> I also don't think we should spend in-ordinate amounts of time trying to >>> pretend that we won't break ABI (for at least some people), and most >>> importantly we should not pretend *not* to break the ABI when we actually >>> do. We did this once before with the roll-out of date-time, and it was >>> really un-necessary. When I released NumPy 1.0, there were several >>> things that I knew should be fixed very soon (NumPy was never designed to >>> not break ABI). Those problems are still there. Now, that we have >>> quite a bit better understanding of what NumPy *should* be (there have been >>> tremendous strides in understanding and community size over the past 10 >>> years), let's actually make the infrastructure we think will last for the >>> next 20 years (instead of trying to shoe-horn new ideas into a 20-year old >>> code-base that wasn't designed for it). >>> >>> NumPy is a hard code-base. It has been since Numeric days in 1995. >>> I could be wrong, but my guess is that we will be passed by as a community >>> if we don't seize the opportunity to build something better than we can >>> build if we are forced to use a 20 year old code-base. >>> >>> It is more important to not break people's code and to be clear when a >>> re-compile is necessary for dependencies. Those to me are the most >>> important constraints. There are a lot of great ideas that we all have >>> about what we want NumPy to be able to do. Some of this are pretty >>> transformational (and the more exciting they are, the harder I think they >>> are going to be to implement without breaking at least the ABI). There >>> is probably some CAP-like theorem around >>> Stability-Features-Speed-of-Development (pick 2) when it comes to Open >>> Source Software development and making feature-progress with NumPy *is >>> going* to create in-stability which concerns me. >>> >>> I would like to see a little-bit-of-pain one time with a NumPy 2.0, >>> rather than a constant pain because of constant churn over many years >>> approach that Nathaniel seems to advocate. To me NumPy 2.0 is an >>> ABI-breaking release that is as API-compatible as possible and whose >>> semantics are not dramatically different. >>> >>> There are at least 3 areas of compatibility (ABI, API, and semantic). >>> ABI-compatibility is a non-feature in today's world. There are so many >>> distributions of the NumPy stack (and conda makes it trivial for anyone to >>> build their own or for you to build one yourself). Making less-optimal >>> software-engineering choices because of fear of breaking the ABI is not >>> something I'm supportive of at all. We should not break ABI every >>> release, but a release every 3 years that breaks ABI is not a problem. 
>>> >>> API compatibility should be much more sacrosanct, but it is also >>> something that can also be managed. Any NumPy 2.0 should definitely >>> support the full NumPy API (though there could be deprecated swaths). I >>> think the community has done well in using deprecation and limiting the >>> public API to make this more manageable and I would love to see a NumPy 2.0 >>> that solidifies a future-oriented API along with a backward-compatible API >>> that is also available. >>> >>> Semantic compatibility is the hardest. We have already broken this on >>> multiple occasions throughout the 1.x NumPy releases. Every time you >>> change the code, this can change. This is what I fear causing deep >>> instability over the course of many years. These are things like the >>> casting rule details, the effect of indexing changes, any change to the >>> calculations approaches. It is and has been the most at risk during any >>> code-changes. My view is that a NumPy 2.0 (with a new low-level >>> architecture) minimizes these changes to a single release rather than >>> unavoidably spreading them out over many, many releases. >>> >>> I think that summarizes my main concerns. I will write up more forward >>> thinking ideas for what else is possible in the coming weeks. In the mean >>> time, thanks for keeping the discussion going. It is extremely exciting to >>> see the help people have continued to provide to maintain and improve >>> NumPy. It will be exciting to see what the next few years bring as well. >>> >>> >>> Best, >>> >>> -Travis >>> >>> >>> >>> >>> >>> >>> On Tue, Aug 25, 2015 at 5:03 AM, Nathaniel Smith wrote: >>> >>>> Hi all, >>>> >>>> These are the notes from the NumPy dev meeting held July 7, 2015, at >>>> the SciPy conference in Austin, presented here so the list can keep up >>>> with what happens, and so you can give feedback. Please do give >>>> feedback, none of this is final! >>>> >>>> (Also, if anyone who was there notices anything I left out or >>>> mischaracterized, please speak up -- these are a lot of notes I'm >>>> trying to gather together, so I could easily have missed something!) >>>> >>>> Thanks to Jill Cowan and the rest of the SciPy organizers for donating >>>> space and organizing logistics for us, and to the Berkeley Institute >>>> for Data Science for funding travel for Jaime, Nathaniel, and >>>> Sebastian. >>>> >>>> >>>> Attendees >>>> ========= >>>> >>>> Present in the room for all or part: Daniel Allan, Chris Barker, >>>> Sebastian Berg, Thomas Caswell, Jeff Reback, Jaime Fernández del >>>> Río, Chuck Harris, Nathaniel Smith, Stéfan van der Walt. (Note: I'm >>>> pretty sure this list is incomplete) >>>> >>>> Joining remotely for all or part: Stephan Hoyer, Julian Taylor. >>>> >>>> >>>> Formalizing our governance/decision making >>>> ========================================== >>>> >>>> This was a major focus of discussion. At a high level, the consensus >>>> was to steal IPython's governance document ("IPEP 29") and modify it >>>> to remove its use of a BDFL as a "backstop" to normal community >>>> consensus-based decision, and replace it with a new "backstop" based >>>> on Apache-project-style consensus voting amongst the core team. >>>> >>>> I'll send out a proper draft of this shortly for further discussion. >>>> >>>> >>>> Development roadmap >>>> =================== >>>> >>>> General consensus: >>>> >>>> Let's assume NumPy is going to remain important indefinitely, and >>>> try to make it better, instead of waiting for something better to >>>> come along. 
(This is unlikely to be wasted effort even if something >>>> better does come along, and it's hardly a sure thing that that will >>>> happen anyway.) >>>> >>>> Let's focus on evolving numpy as far as we can without major >>>> break-the-world changes (no "numpy 2.0", at least in the foreseeable >>>> future). >>>> >>>> And, as a target for that evolution, let's change our focus from >>>> numpy as "NumPy is the library that gives you the np.ndarray object >>>> (plus some attached infrastructure)", to "NumPy provides the >>>> standard framework for working with arrays and array-like objects in >>>> Python" >>>> >>>> This means, creating defined interfaces between array-like objects / >>>> ufunc objects / dtype objects, so that it becomes possible for third >>>> parties to add their own and mix-and-match. Right now ufuncs are >>>> pretty good at this, but if you want a new array class or dtype then >>>> in most cases you pretty much have to modify numpy itself. >>>> >>>> Vision: instead of everyone who wants a new container type having to >>>> reimplement all of numpy, Alice can implement an array class using >>>> (sparse / distributed / compressed / tiled / gpu / out-of-core / >>>> delayed / ...) storage, pass it to code that was written using >>>> direct calls to np.* functions, and it just works. (Instead of >>>> np.sin being "the way you calculate the sine of an ndarray", it's >>>> "the way you calculate the sine of any array-like container >>>> object".) >>>> >>>> Vision: Darryl can implement a new dtype for (categorical data / >>>> astronomical dates / integers-with-missing-values / ...) without >>>> having to touch the numpy core. >>>> >>>> Vision: Chandni can then come along and combine them by doing >>>> >>>> a = alice_array([...], dtype=darryl_dtype) >>>> >>>> and it just works. >>>> >>>> Vision: no-one is tempted to subclass ndarray, because anything you >>>> can do with an ndarray subclass you can also easily do by defining >>>> your own new class that implements the "array protocol". >>>> >>>> >>>> Supporting third-party array types >>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>>> >>>> Sub-goals: >>>> - Get __numpy_ufunc__ done, which will cover a good chunk of numpy's >>>> API right there. >>>> - Go through the rest of the stuff in numpy, and figure out some >>>> story for how to let it handle third-party array classes: >>>> - ufunc ALL the things: Some things can be converted directly into >>>> (g)ufuncs and then use __numpy_ufunc__ (e.g., np.std); some >>>> things could be converted into (g)ufuncs if we extended the >>>> (g)ufunc interface a bit (e.g. np.sort, np.matmul). >>>> - Some things probably need their own __numpy_ufunc__-like >>>> extensions (__numpy_concatenate__?) >>>> - Provide tools to make it easier to implement the more complicated >>>> parts of an array object (e.g. the bazillion different methods, >>>> many of which are ufuncs in disguise, or indexing) >>>> - Longer-run interesting research project: __numpy_ufunc__ requires >>>> that one or the other object have explicit knowledge of how to >>>> handle the other, so to handle binary ufuncs with N array types >>>> you need something like N**2 __numpy_ufunc__ code paths. As an >>>> alternative, if there were some interface that an object could >>>> export that provided the operations nditer needs to efficiently >>>> iterate over (chunks of) it, then you would only need N >>>> implementations of this interface to handle all N**2 operations. 
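As a rough illustration of the dispatch hook being discussed just above: the exact name and signature of __numpy_ufunc__ were still being worked out at the time, so the class and helper names below are illustrative assumptions rather than a settled API. The idea is that a container implements one hook and every ufunc routes through it:

import numpy as np

class CompressedArray(object):
    # Hypothetical array-like container -- just enough to show the idea.
    def __init__(self, data):
        self._data = np.asarray(data)

    def __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs):
        # Unwrap any CompressedArray arguments, run the ufunc on plain
        # ndarrays, and wrap the result back up.  One hook per container
        # type replaces one code path per pair of container types.
        unwrapped = [x._data if isinstance(x, CompressedArray) else x
                     for x in inputs]
        result = getattr(ufunc, method)(*unwrapped, **kwargs)
        return CompressedArray(result)

With a hook like this, np.sin(CompressedArray([0.0, 1.0])) could be routed through the container's own code once the protocol is actually enabled; released numpy versions do not dispatch this way yet, so treat this as a sketch of the proposal, not working code against current numpy.
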
>>>> >>>> This would solve a lot of problems for projects like: >>>> - blosc >>>> - dask >>>> - distarray >>>> - numpy.ma >>>> - pandas >>>> - scipy.sparse >>>> - xray >>>> >>>> >>>> Supporting third-party dtypes >>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>>> >>>> We already have something like a C level "dtype >>>> protocol". Conceptually, the way you define a new dtype is by >>>> defining a new class whose instances have data attributes defining >>>> the parameters of the dtype (what fields are in *this* record dtype, >>>> how many characters are in *this* string dtype, what units are used >>>> for *this* datetime64, etc.), and you define a bunch of methods to >>>> do things like convert an object from a Python object to your dtype >>>> or vice-versa, to copy an array of your dtype from one place to >>>> another, to cast to and from your new dtype, etc. This part is >>>> great. >>>> >>>> The problem is, in the current implementation, we don't actually use >>>> the Python object system to define these classes / attributes / >>>> methods. Instead, all possible dtypes are jammed into a single >>>> Python-level class, whose struct has fields for the union of all >>>> possible dtype's attributes, and instead of Python-style method >>>> slots there's just a big table of function pointers attached to each >>>> object. >>>> >>>> So the main proposal is that we keep the basic design, but switch it >>>> so that the float64 dtype, the int64 dtype, etc. actually literally >>>> are subclasses of np.dtype, each implementing their own fields and >>>> Python-style methods. >>>> >>>> Some of the pieces involved in doing this: >>>> >>>> - The current dtype methods should be cleaned up -- e.g. 'dot' and >>>> 'less_than' are both dtype methods, when conceptually they're much >>>> more like ufuncs. >>>> >>>> - The ufunc inner-loop interface currently does not get a reference >>>> to the dtype object, so they can't see its attributes and this is >>>> a big obstacle to many interesting dtypes (e.g., it's hard to >>>> implement np.equal for categoricals if you don't know what >>>> categories each has). So we need to add new arguments to the core >>>> ufunc loop signature. (Fortunately this can be done in a >>>> backwards-compatible way.) >>>> >>>> - We need to figure out what exactly the dtype methods should be, >>>> and add them to the dtype class (possibly with backwards >>>> compatibility shims for anyone who is accessing PyArray_ArrFuncs >>>> directly). >>>> >>>> - Casting will be possibly the trickiest thing to work out, though >>>> the basic idea of using dunder-dispatch-like __cast__ and >>>> __rcast__ methods seems workable. (Encouragingly, this is also >>>> exactly what dynd also does, though unfortunately dynd does not >>>> yet support user-defined dtypes even to the extent that numpy >>>> does, so there isn't much else we can steal from them.) >>>> - We may also want to rethink the casting rules while we're at it, >>>> since they have some very weird corners right now (e.g. 
see >>>> [https://github.com/numpy/numpy/issues/6240]) >>>> >>>> - We need to migrate the current dtypes over to the new system, >>>> which can be done in stages: >>>> >>>> - First stick them all in a single "legacy dtype" class whose >>>> methods just dispatch to the PyArray_ArrFuncs per-object "method >>>> table" >>>> >>>> - Then move each of them into their own classes >>>> >>>> - We should provide a Python-level wrapper for the protocol, so that >>>> you can call dtype methods from Python >>>> >>>> - And vice-versa, it should be possible to subclass dtype at the >>>> Python level >>>> >>>> - etc. >>>> >>>> Fortunately, AFAICT pretty much all of this can be done while >>>> maintaining backwards compatibility (though we may want to break >>>> some obscure cases to avoid expending *too* much effort with weird >>>> backcompat contortions that will only help a vanishingly small >>>> proportion of the userbase), and a lot of the above changes can be >>>> done as semi-independent mini-projects, so there's no need for some >>>> branch to go off and spend a year rewriting the world. >>>> >>>> Obviously there are still a lot of details to work out, though. But >>>> overall, there was widespread agreement that this is one of the #1 >>>> pain points for our users (e.g. it's the single main request from >>>> pandas), and fixing it is very high priority. >>>> >>>> Some features that would become straightforward to implement >>>> (e.g. even in third-party libraries) if this were fixed: >>>> - missing value support >>>> - physical unit tracking (meters / seconds -> array of velocity; >>>> meters + seconds -> error) >>>> - better and more diverse datetime representations (e.g. datetimes >>>> with attached timezones, or using funky geophysical or >>>> astronomical calendars) >>>> - categorical data >>>> - variable length strings >>>> - strings-with-encodings (e.g. latin1) >>>> - forward mode automatic differentiation (write a function that >>>> computes f(x) where x is an array of float64; pass that function >>>> an array with a special dtype and get out both f(x) and f'(x)) >>>> - probably others I'm forgetting right now >>>> >>>> I should also note that there was one substantial objection to this >>>> plan, from Travis Oliphant (in discussions later in the >>>> conference). I'm not confident I understand his objections well >>>> enough to reproduce them here, though -- perhaps he'll elaborate. >>>> >>>> >>>> Money >>>> ===== >>>> >>>> There was an extensive discussion on the topic of: "if we had money, >>>> what would we do with it?" >>>> >>>> This is partially motivated by the realization that there are a >>>> number of sources that we could probably get money from, if we had a >>>> good story for what we wanted to do, so it's not just an idle >>>> question. >>>> >>>> Points of general agreement: >>>> >>>> - Doing the in-person meeting was a good thing. We should plan do >>>> that again, at least once a year. So one thing to spend money on >>>> is travel subsidies to make sure that happens and is productive. >>>> >>>> - While it's tempting to imagine hiring junior people for the more >>>> frustrating/boring work like maintaining buildbots, release >>>> infrastructure, updating docs, etc., this seems difficult to do >>>> realistically with our current resources -- how do we hire for >>>> this, who would manage them, etc.? 
>>>> >>>> - On the other hand, the general feeling was that if we found the >>>> money to hire a few more senior people who could take care of >>>> themselves more, then that would be good and we could >>>> realistically absorb that extra work without totally unbalancing >>>> the project. >>>> >>>> - A major open question is how we would recruit someone for a >>>> position like this, since apparently all the obvious candidates >>>> who are already active on the NumPy team already have other >>>> things going on. [For calibration on how hard this can be: NYU >>>> has apparently had an open position for a year with the job >>>> description of "come work at NYU full-time with a >>>> private-industry-competitive-salary on whatever your personal >>>> open-source scientific project is" (!) and still is having an >>>> extremely difficult time filling it: >>>> [http://cds.nyu.edu/research-engineer/]] >>>> >>>> - General consensus though was that there isn't much to be done >>>> about this though, except try it and see. >>>> >>>> - (By the way, if you're someone who's reading this and >>>> potentially interested in like a postdoc or better working on >>>> numpy, then let's talk...) >>>> >>>> >>>> More specific changes to numpy that had general consensus, but don't >>>> really fit into a high-level roadmap >>>> >>>> ========================================================================================================= >>>> >>>> - Resolved: we should merge multiarray.so and umath.so into a single >>>> extension module, so that they can share utility code without the >>>> current awkward contortions. >>>> >>>> - Resolved: we should start hiding new fields in the ufunc and dtype >>>> structs as soon as possible going forward. (I.e. they would not be >>>> present in the version of the structs that are exposed through the >>>> C API, but internally we would use a more detailed struct.) >>>> - Mayyyyyybe we should even go ahead and hide the subset of the >>>> existing fields that are really internal details that no-one >>>> should be using. If we did this without changing anything else >>>> then it would preserve ABI (the fields would still be where >>>> existing compiled extensions expect them to be, if any such >>>> extensions exist) while breaking API (trying to compile such >>>> extensions would give a clear error), so would be a smoother >>>> ramp if we think we need to eventually break those fields for >>>> real. (As discussed above, there are a bunch of fields in the >>>> dtype base class that only make sense for specific dtype >>>> subclasses, e.g. only record dtypes need a list of field names, >>>> but right now all dtypes have one anyway. So it would be nice to >>>> remove these from the base class entirely, but that is >>>> potentially ABI-breaking.) >>>> >>>> - Resolved: np.array should never return an object array unless >>>> explicitly requested (e.g. with dtype=object); it just causes too >>>> many surprising problems. >>>> - First step: add a deprecation warning >>>> - Eventually: make it an error. >>>> >>>> - The matrix class >>>> - Resolved: We won't add warnings yet, but we will prominently >>>> document that it is deprecated and should be avoided wherever >>>> possible. >>>> - Stéfan van der Walt volunteers to do this. >>>> - We'd all like to deprecate it properly, but the feeling was that >>>> the precondition for this is for scipy.sparse to provide sparse >>>> "arrays" that don't return np.matrix objects on ordinary >>>> operations. 
Until that happens we can't reasonably tell people >>>> that using np.matrix is a bug. >>>> >>>> - Resolved: we should add a similar prominent note to the >>>> "subclassing ndarray" documentation, warning people that this is >>>> painful and barely works and please don't do it if you have any >>>> alternatives. >>>> >>>> - Resolved: we want more, smaller releases -- every 6 months at >>>> least, aiming to go even faster (every 4 months?) >>>> >>>> - On the question of using Cython inside numpy core: >>>> - Everyone agrees that there are places where this would be an >>>> improvement (e.g., Python<->C interfaces, and places "when you >>>> want to do computer science", e.g. complicated algorithmic stuff >>>> like graph traversals) >>>> - Chuck wanted it to be clear though that he doesn't think it >>>> would be a good goal to try and rewrite all of numpy in Cython >>>> -- there also exist places where Cython ends up being "an uglier >>>> version of C". No-one disagreed. >>>> >>>> - Our text reader is apparently not very functional on Python 3, and >>>> generally slow and hard to work with. >>>> - Resolved: We should extract Pandas's awesome text reader/parser >>>> and convert it into its own package, that could then become a >>>> new backend for both pandas and numpy.loadtxt. >>>> - Jeff thinks this is a great idea >>>> - Thomas Caswell volunteers to do the extraction. >>>> >>>> - We should work on improving our tools for evolving the ABI, so >>>> that we will eventually be less constrained by decisions made >>>> decades ago. >>>> - One idea that had a lot of support was to switch from our >>>> current append-only C-API to a "sliding window" API based on >>>> explicit versions. So a downstream package might say >>>> >>>> #define NUMPY_API_VERSION 4 >>>> >>>> and they'd get the functions and behaviour provided in "version >>>> 4" of the numpy C api. If they wanted to get access to new stuff >>>> that was added in version 5, then they'd need to switch that >>>> #define, and at the same time clean up any usage of stuff that >>>> was removed or changed in version 5. And to provide a smooth >>>> migration path, one version of numpy would support multiple >>>> versions at once, gradually deprecating and dropping old >>>> versions. >>>> >>>> - If anyone wants to help bring pip up to scratch WRT tracking ABI >>>> dependencies (e.g., 'pip install numpy==' >>>> -> triggers rebuild of scipy against the new ABI), then that >>>> would be an extremely useful thing. >>>> >>>> >>>> Policies that should be documented >>>> ================================== >>>> >>>> ...together with some notes about what the contents of the document >>>> should be: >>>> >>>> >>>> How we manage bugs in the bug tracker. >>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>>> >>>> - Github "milestones" should *only* be assigned to release-blocker >>>> bugs (which mostly means "regression from the last release"). >>>> >>>> In particular, if you're tempted to push a bug forward to the next >>>> release... then it's clearly not a blocker, so don't set it to the >>>> next release's milestone, just remove the milestone entirely. >>>> >>>> (Obvious exception to this: deprecation followup bugs where we >>>> decide that we want to keep the deprecation around a bit longer >>>> are a case where a bug actually does switch from being a blocker >>>> to release 1.x to being a blocker for release 1.(x+1).) >>>> >>>> - Don't hesitate to close an issue if there's no way forward -- >>>> e.g. a PR where the author has disappeared. 
Just post a link to >>>> this policy and close, with a polite note that we need to keep our >>>> tracker useful as a todo list, but they're welcome to re-open if >>>> things change. >>>> >>>> >>>> Deprecations and breakage policy: >>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>>> >>>> - How long do we need to keep DeprecationWarnings around before we >>>> break things? This is tricky because on the one hand an aggressive >>>> (short) deprecation period lets us deliver new features and >>>> important cleanups more quickly, but on the other hand a >>>> too-aggressive deprecation period is difficult for our more >>>> conservative downstream users. >>>> >>>> - Idea that had the most support: pick a somewhat-aggressive >>>> warning period as our default, and make a rule that if someone >>>> asks for an extension during the beta cycle for the release that >>>> removes it, then we put it back for another release or two worth >>>> of grace period. (While also possibly upgrading the warning to >>>> be more visible during the grace period.) This gives us >>>> deprecation periods that are more adaptive on a case-by-case >>>> basis. >>>> >>>> - Lament: it would be really nice if we could get more people to >>>> test our beta releases, because in practice right now 1.x.0 ends >>>> up being where we actually discover all the bugs, and 1.x.1 is >>>> where it actually becomes usable. Which sucks, and makes it >>>> difficult to have a solid policy about what counts as a >>>> regression, etc. Is there anything we can do about this? >>>> >>>> - ABI breakage: we distinguish between an ABI break that breaks >>>> everything (e.g., "import scipy" segfaults), versus an ABI break >>>> that breaks an occasional rare case (e.g., only apps that poke >>>> around in some obscure corner of some struct are affected). >>>> >>>> - The "break-the-world" type remains off-limits for now: the pain >>>> is still too large (conda helps, but there are lots of people >>>> who don't use conda!), and there aren't really any compelling >>>> improvements that this would enable anyway. >>>> >>>> - For the "break-0.1%-of-users" type, it is *not* ruled out by >>>> fiat, though we remain conservative: we should treat it like >>>> other API breaks in principle, and do a careful case-by-case >>>> analysis of the details of the situation, taking into account >>>> what kind of code would be broken, how common these cases are, >>>> how important the benefits are, whether there are any specific >>>> mitigation strategies we can use, etc. -- with this process of >>>> course taking into account that a segfault is nastier than a >>>> Python exception. >>>> >>>> >>>> Other points that were discussed >>>> ================================ >>>> >>>> - There was inconclusive discussion of what we should do with dot() >>>> in the places where it disagrees with the PEP 465 matmul semantics >>>> (specifically this is when both arguments have ndim >= 3, or one >>>> argument has ndim == 0). >>>> - The concern is that the current behavior is not very useful, and >>>> as far as we can tell no-one is using it; but, as people get >>>> used to the more-useful PEP 465 behavior, they will increasingly >>>> try to use it on the assumption that np.dot will work the same >>>> way, and this will create pain for lots of people. So Nathaniel >>>> argued that we should start at least issuing a visible warning >>>> when people invoke the corner-case behavior. 
>>>> - But OTOH, np.dot is such a core piece of infrastructure, and >>>> there's such a large landscape of code out there using numpy >>>> that we can't see, that others were reasonably wary of making >>>> any change. >>>> - For now: document prominently, but no change in behavior. >>>> >>>> >>>> Links to raw notes >>>> ================== >>>> >>>> Main page: >>>> [https://github.com/numpy/numpy/wiki/SciPy-2015-developer-meeting] >>>> >>>> Notes from the meeting proper: >>>> [ >>>> https://docs.google.com/document/d/1IJcYdsHtk8MVAM4AZqFDBSf_nVG-mrB4Tv2bh9u1g4Y/edit?usp=sharing >>>> ] >>>> >>>> Slides from the followup BoF: >>>> [ >>>> https://gist.github.com/njsmith/eb42762054c88e810786/raw/b74f978ce10a972831c582485c80fb5b8e68183b/future-of-numpy-bof.odp >>>> ] >>>> >>>> Notes from the followup BoF: >>>> [ >>>> https://docs.google.com/document/d/11AuTPms5dIPo04JaBOWEoebXfk-tUzEZ-CvFnLIt33w/edit >>>> ] >>>> >>>> -n >>>> >>>> -- >>>> Nathaniel J. Smith -- http://vorpus.org >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>> >>> >>> >>> -- >>> >>> *Travis Oliphant* >>> *Co-founder and CEO* >>> >>> >>> @teoliphant >>> 512-222-5440 >>> http://www.continuum.io >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> >> -- >> Nathaniel J. Smith -- http://vorpus.org >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > -- > > *Travis Oliphant* > *Co-founder and CEO* > > > @teoliphant > 512-222-5440 > http://www.continuum.io > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Thu Aug 27 11:04:48 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 27 Aug 2015 16:04:48 +0100 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: References: <87y4gxllb4.fsf@berkeley.edu> <1440673873.1694.33.camel@sipsolutions.net> Message-ID: Hi, On Thu, Aug 27, 2015 at 3:34 PM, wrote: [snip] > I don't really see a problem with "codifying" the status quo. That's an excellent point. If we believe that the current situation is the best possible, both now and in the future, then codifying the status quo is an excellent idea. So, we should probably first start by asking ourselves: * what numpy is doing well; * what numpy could do better; and then ask, is there some way we could make it more likely we will improve over time. [snip] > As the current debate shows it's possible to have a public discussion about > the direction of the project without having to delegate providing a vision > to a president. The idea of a president that I had in mind, was not someone who makes all decisions, but the person who holds themselves responsible for the performance of the project. If the project has a coherent vision already, the president has no need to provide one, but it's the president's job to worry about whether we have vision or not, and do what they need to, to make sure we don't lose track of that. 
If you don't know it already, I highly recommend Jim Collins' work on 'level 5 leadership' [1] > Given the current pattern all critical issues end up in a public debate on > the mailing list. What numpy (and scipy) need is to have someone as a tie > breaker to make any final decisions if there is no clear consensus, if there > is no BDFL for it, then having the "core" group making those decisions looks > appropriate to me. > >> "On the other hand the 'core' system seems to function on a > model of mutual deference and personal loyalty that I believe is > destructive of good management." > > That sounds actually like a good basis for team work to me. Also that has > "mutual" in it instead of just deferring and being loyal to a president. > > Since I know scipy development much better: > > scipy has made a huge progress in the last 5 or 6 years since I've been > following it, both in terms of code, in terms of development workflow, and > in the number of developers. (When I started, I was essentially alone in > scipy.stats, now there are 3 to 5 "core" developers that at least partially > work on it, everything goes through PRs with public discussion and with > critical issues additionally raised on the mailing list.) > > Ralf and Pauli are the defacto BDFLs for scipy overall, but all decisions in > recent years have been without a fight, but not without lots of arguments, > and, given the size and breadth of scipy, there are field experts (although > not enough of those) to help in the discussion. I agree entirely, I think scipy is a good example where stability and clarity of leadership has made a huge difference to the health of the project. Cheers, Matthew [1] https://hbr.org/2005/07/level-5-leadership-the-triumph-of-humility-and-fierce-resolve From matthew.brett at gmail.com Thu Aug 27 11:12:28 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 27 Aug 2015 16:12:28 +0100 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: References: <87y4gxllb4.fsf@berkeley.edu> <1440673873.1694.33.camel@sipsolutions.net> Message-ID: Hi, On Thu, Aug 27, 2015 at 3:43 PM, Bryan Van de Ven wrote: > >> On Aug 27, 2015, at 1:57 PM, Matthew Brett wrote: >> >> The 'president' idea > > ...seems to be predicated on a steady stream of people who: actually want job, don't mind campaigning, are willing to accept any and all blame, and have the technical experience to make "final decisions". As others have pointed out the active developer community for NumPy is not measured in the hundreds (or even the tens, really). So: what is your proposed recourse if you hold an election and no-one shows up to run? > That seems to me a soluble problem, if there's agreement that the president idea is a sensible one. One very simple idea would be to revert to a 'core' system for the term for which there were no candidates. On the other hand, I suspect that there are people who care enough about numpy that they are prepared to step up and take the blame if things go wrong. Cheers, Matthew From ben.v.root at gmail.com Thu Aug 27 11:15:40 2015 From: ben.v.root at gmail.com (Benjamin Root) Date: Thu, 27 Aug 2015 11:15:40 -0400 Subject: [Numpy-discussion] 1.10.0rc1 In-Reply-To: <1440686674.11529.5.camel@sipsolutions.net> References: <20150826151141.17db3046@fsol> <20150827134424.38ec9eac@fsol> <1440686674.11529.5.camel@sipsolutions.net> Message-ID: Ok, I just wanted to make sure I understood the issue before going bug hunting. 
Chances are, it has been a bug on our end for a while now. Just to make sure, is the following valid? arr = np.zeros((5, 3)) ind = np.array([True, True, True, False, True]) arr[ind] # gives a 4x3 result Running that at the REPL doesn't produce a warning, so i am guessing that it is valid. Ben Root On Thu, Aug 27, 2015 at 10:44 AM, Sebastian Berg wrote: > On Do, 2015-08-27 at 08:04 -0600, Charles R Harris wrote: > > > > > > On Thu, Aug 27, 2015 at 7:52 AM, Benjamin Root > > wrote: > > > > > > Ok, I tested matplotlib master against numpy master, and there > > were no errors. I did get a bunch of new deprecation warnings > > though such as: > > > > > "/nas/home/broot/centos6/lib/python2.7/site-packages/matplotlib-1.5.dev1-py2.7-linux-x86_64.egg/matplotlib/colorbar.py:539: > VisibleDeprecationWarning: boolean index did not match indexed array along > dimension 0; dimension is 5 but corresponding boolean dimension is 3 > > colors = np.asarray(colors)[igood]" > > > > > > The message isn't exactly clear. I suspect the problem is a > > shape mismatch, like colors is 5x3, and igood is just 3 for > > some reason. Could somebody shine some light on this, please? > > > > > > > > IIRC, Boolean indexing would fill out the dimension, i.e., len 3 would > > be expanded to len 5 in this case. That behavior is deprecated. > > > > Yes, this is exactly the case, you have something like: > > arr = np.zeros((5, 3)) > ind = np.array([True, False, False]) > arr[ind, :] > > and numpy nowadays thinks that such code is likely a bug (when the ind > is shorter than arr it is somewhat OK, the other way around gets more > creepy). If you have an idea of how to make the error message clearer, > or objections to the change, I am happy to hear it! > > - Sebastian > > > > > > Chuck > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Thu Aug 27 11:33:16 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 27 Aug 2015 17:33:16 +0200 Subject: [Numpy-discussion] 1.10.0rc1 In-Reply-To: References: <20150826151141.17db3046@fsol> <20150827134424.38ec9eac@fsol> <1440686674.11529.5.camel@sipsolutions.net> Message-ID: <1440689596.11529.7.camel@sipsolutions.net> On Do, 2015-08-27 at 11:15 -0400, Benjamin Root wrote: > Ok, I just wanted to make sure I understood the issue before going bug > hunting. Chances are, it has been a bug on our end for a while now. > Just to make sure, is the following valid? > > > arr = np.zeros((5, 3)) > > ind = np.array([True, True, True, False, True]) > > arr[ind] # gives a 4x3 result > > > Running that at the REPL doesn't produce a warning, so i am guessing > that it is valid. > Sure, that is perfect (you can add the slice and write `arr[ind, :]` to make it a bit more clear if you like I guess). - Sebastian > > Ben Root > > > On Thu, Aug 27, 2015 at 10:44 AM, Sebastian Berg > wrote: > On Do, 2015-08-27 at 08:04 -0600, Charles R Harris wrote: > > > > > > On Thu, Aug 27, 2015 at 7:52 AM, Benjamin Root > > > wrote: > > > > > > Ok, I tested matplotlib master against numpy master, > and there > > were no errors. 
I did get a bunch of new deprecation > warnings > > though such as: > > > > > "/nas/home/broot/centos6/lib/python2.7/site-packages/matplotlib-1.5.dev1-py2.7-linux-x86_64.egg/matplotlib/colorbar.py:539: VisibleDeprecationWarning: boolean index did not match indexed array along dimension 0; dimension is 5 but corresponding boolean dimension is 3 > > colors = np.asarray(colors)[igood]" > > > > > > The message isn't exactly clear. I suspect the > problem is a > > shape mismatch, like colors is 5x3, and igood is > just 3 for > > some reason. Could somebody shine some light on > this, please? > > > > > > > > IIRC, Boolean indexing would fill out the dimension, i.e., > len 3 would > > be expanded to len 5 in this case. That behavior is > deprecated. > > > > Yes, this is exactly the case, you have something like: > > arr = np.zeros((5, 3)) > ind = np.array([True, False, False]) > arr[ind, :] > > and numpy nowadays thinks that such code is likely a bug (when > the ind > is shorter than arr it is somewhat OK, the other way around > gets more > creepy). If you have an idea of how to make the error message > clearer, > or objections to the change, I am happy to hear it! > > - Sebastian > > > > > > Chuck > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From ben.v.root at gmail.com Thu Aug 27 11:49:22 2015 From: ben.v.root at gmail.com (Benjamin Root) Date: Thu, 27 Aug 2015 11:49:22 -0400 Subject: [Numpy-discussion] 1.10.0rc1 In-Reply-To: <1440689596.11529.7.camel@sipsolutions.net> References: <20150826151141.17db3046@fsol> <20150827134424.38ec9eac@fsol> <1440686674.11529.5.camel@sipsolutions.net> <1440689596.11529.7.camel@sipsolutions.net> Message-ID: The reason why we don't have that extra slice is because we may not know ahead of time that we are dealing with a 2D array. It could be a 1D array. I guess we could use ellipses, but I wanted to make sure that the numpy devs consider the above to be perfectly valid semantics because it is entrenched in our codebase. Ben Root On Thu, Aug 27, 2015 at 11:33 AM, Sebastian Berg wrote: > On Do, 2015-08-27 at 11:15 -0400, Benjamin Root wrote: > > Ok, I just wanted to make sure I understood the issue before going bug > > hunting. Chances are, it has been a bug on our end for a while now. > > Just to make sure, is the following valid? > > > > > > arr = np.zeros((5, 3)) > > > > ind = np.array([True, True, True, False, True]) > > > > arr[ind] # gives a 4x3 result > > > > > > Running that at the REPL doesn't produce a warning, so i am guessing > > that it is valid. > > > > Sure, that is perfect (you can add the slice and write `arr[ind, :]` to > make it a bit more clear if you like I guess). 
> > - Sebastian > > > > > > Ben Root > > > > > > On Thu, Aug 27, 2015 at 10:44 AM, Sebastian Berg > > wrote: > > On Do, 2015-08-27 at 08:04 -0600, Charles R Harris wrote: > > > > > > > > > On Thu, Aug 27, 2015 at 7:52 AM, Benjamin Root > > > > > wrote: > > > > > > > > > Ok, I tested matplotlib master against numpy master, > > and there > > > were no errors. I did get a bunch of new deprecation > > warnings > > > though such as: > > > > > > > > > "/nas/home/broot/centos6/lib/python2.7/site-packages/matplotlib-1.5.dev1-py2.7-linux-x86_64.egg/matplotlib/colorbar.py:539: > VisibleDeprecationWarning: boolean index did not match indexed array along > dimension 0; dimension is 5 but corresponding boolean dimension is 3 > > > colors = np.asarray(colors)[igood]" > > > > > > > > > The message isn't exactly clear. I suspect the > > problem is a > > > shape mismatch, like colors is 5x3, and igood is > > just 3 for > > > some reason. Could somebody shine some light on > > this, please? > > > > > > > > > > > > IIRC, Boolean indexing would fill out the dimension, i.e., > > len 3 would > > > be expanded to len 5 in this case. That behavior is > > deprecated. > > > > > > > Yes, this is exactly the case, you have something like: > > > > arr = np.zeros((5, 3)) > > ind = np.array([True, False, False]) > > arr[ind, :] > > > > and numpy nowadays thinks that such code is likely a bug (when > > the ind > > is shorter than arr it is somewhat OK, the other way around > > gets more > > creepy). If you have an idea of how to make the error message > > clearer, > > or objections to the change, I am happy to hear it! > > > > - Sebastian > > > > > > > > > > Chuck > > > > > > > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at scipy.org > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Aug 27 12:11:56 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 27 Aug 2015 12:11:56 -0400 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: References: <87y4gxllb4.fsf@berkeley.edu> <1440673873.1694.33.camel@sipsolutions.net> Message-ID: On Thu, Aug 27, 2015 at 11:04 AM, Matthew Brett wrote: > Hi, > > On Thu, Aug 27, 2015 at 3:34 PM, wrote: > [snip] > > I don't really see a problem with "codifying" the status quo. > > That's an excellent point. If we believe that the current situation > is the best possible, both now and in the future, then codifying the > status quo is an excellent idea. > > So, we should probably first start by asking ourselves: > > * what numpy is doing well; > * what numpy could do better; > > and then ask, is there some way we could make it more likely we will > improve over time. 
> > [snip] > > > As the current debate shows it's possible to have a public discussion > about > > the direction of the project without having to delegate providing a > vision > > to a president. > > The idea of a president that I had in mind, was not someone who makes > all decisions, but the person who holds themselves responsible for the > performance of the project. If the project has a coherent vision > already, the president has no need to provide one, but it's the > president's job to worry about whether we have vision or not, and do > what they need to, to make sure we don't lose track of that. If you > don't know it already, I highly recommend Jim Collins' work on 'level > 5 leadership' [1] > Still doesn't sound like the need for a president to me " the person who holds themselves responsible for the performance of the project" sounds more like the role of the "core" group (adding plural to persons) to me, and cannot be pushed of to an official president. Nathaniel to push and organize the discussion, Chuck for continuity, and several core developers for detailed ideas and implementation, and a large number of contributors. (stylized roles) and noisy mailing list for feedback and discussion. Given the size of the numpy development group, numpy needs individuals for the vision and to push things not a president, vice-presidents and assistant vice-presidents, IMO. (Given the importance of numpy itself, there should be enough remedies if the "core" group ever gets `out of touch` with the very large user base.) Josef > > > Given the current pattern all critical issues end up in a public debate > on > > the mailing list. What numpy (and scipy) need is to have someone as a tie > > breaker to make any final decisions if there is no clear consensus, if > there > > is no BDFL for it, then having the "core" group making those decisions > looks > > appropriate to me. > > > >> "On the other hand the 'core' system seems to function on a > > model of mutual deference and personal loyalty that I believe is > > destructive of good management." > > > > That sounds actually like a good basis for team work to me. Also that has > > "mutual" in it instead of just deferring and being loyal to a president. > > > > Since I know scipy development much better: > > > > scipy has made a huge progress in the last 5 or 6 years since I've been > > following it, both in terms of code, in terms of development workflow, > and > > in the number of developers. (When I started, I was essentially alone in > > scipy.stats, now there are 3 to 5 "core" developers that at least > partially > > work on it, everything goes through PRs with public discussion and with > > critical issues additionally raised on the mailing list.) > > > > Ralf and Pauli are the defacto BDFLs for scipy overall, but all > decisions in > > recent years have been without a fight, but not without lots of > arguments, > > and, given the size and breadth of scipy, there are field experts > (although > > not enough of those) to help in the discussion. > > I agree entirely, I think scipy is a good example where stability and > clarity of leadership has made a huge difference to the health of the > project. 
> > Cheers, > > Matthew > > [1] > https://hbr.org/2005/07/level-5-leadership-the-triumph-of-humility-and-fierce-resolve > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Thu Aug 27 12:22:21 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 27 Aug 2015 17:22:21 +0100 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: References: <87y4gxllb4.fsf@berkeley.edu> <1440673873.1694.33.camel@sipsolutions.net> Message-ID: Hi On Thu, Aug 27, 2015 at 5:11 PM, wrote: > > > On Thu, Aug 27, 2015 at 11:04 AM, Matthew Brett > wrote: >> >> Hi, >> >> On Thu, Aug 27, 2015 at 3:34 PM, wrote: >> [snip] >> > I don't really see a problem with "codifying" the status quo. >> >> That's an excellent point. If we believe that the current situation >> is the best possible, both now and in the future, then codifying the >> status quo is an excellent idea. >> >> So, we should probably first start by asking ourselves: >> >> * what numpy is doing well; >> * what numpy could do better; >> >> and then ask, is there some way we could make it more likely we will >> improve over time. >> >> [snip] >> >> > As the current debate shows it's possible to have a public discussion >> > about >> > the direction of the project without having to delegate providing a >> > vision >> > to a president. >> >> The idea of a president that I had in mind, was not someone who makes >> all decisions, but the person who holds themselves responsible for the >> performance of the project. If the project has a coherent vision >> already, the president has no need to provide one, but it's the >> president's job to worry about whether we have vision or not, and do >> what they need to, to make sure we don't lose track of that. If you >> don't know it already, I highly recommend Jim Collins' work on 'level >> 5 leadership' [1] > > > Still doesn't sound like the need for a president to me > > " the person who holds themselves responsible for the > performance of the project" > > sounds more like the role of the "core" group (adding plural to persons) to > me, and cannot be pushed of to an official president. Except that, in the past, having multiple people taking decisions has led to the situation where no-one feels themselves accountable for the result, hence this situation tends to lead to stagnation. > Nathaniel to push and organize the discussion, Chuck for continuity, and > several core developers for detailed ideas and implementation, and a large > number of contributors. (stylized roles) > and noisy mailing list for feedback and discussion. > > Given the size of the numpy development group, numpy needs individuals for > the vision and to push things not a president, vice-presidents and assistant > vice-presidents, IMO. Yes, if the roles were honorary and administrative, they would not be useful. 
Cheers, Matthew From josef.pktd at gmail.com Thu Aug 27 13:23:53 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 27 Aug 2015 13:23:53 -0400 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: References: <87y4gxllb4.fsf@berkeley.edu> <1440673873.1694.33.camel@sipsolutions.net> Message-ID: On Thu, Aug 27, 2015 at 12:22 PM, Matthew Brett wrote: > Hi > > On Thu, Aug 27, 2015 at 5:11 PM, wrote: > > > > > > On Thu, Aug 27, 2015 at 11:04 AM, Matthew Brett > > > wrote: > >> > >> Hi, > >> > >> On Thu, Aug 27, 2015 at 3:34 PM, wrote: > >> [snip] > >> > I don't really see a problem with "codifying" the status quo. > >> > >> That's an excellent point. If we believe that the current situation > >> is the best possible, both now and in the future, then codifying the > >> status quo is an excellent idea. > >> > >> So, we should probably first start by asking ourselves: > >> > >> * what numpy is doing well; > >> * what numpy could do better; > >> > >> and then ask, is there some way we could make it more likely we will > >> improve over time. > >> > >> [snip] > >> > >> > As the current debate shows it's possible to have a public discussion > >> > about > >> > the direction of the project without having to delegate providing a > >> > vision > >> > to a president. > >> > >> The idea of a president that I had in mind, was not someone who makes > >> all decisions, but the person who holds themselves responsible for the > >> performance of the project. If the project has a coherent vision > >> already, the president has no need to provide one, but it's the > >> president's job to worry about whether we have vision or not, and do > >> what they need to, to make sure we don't lose track of that. If you > >> don't know it already, I highly recommend Jim Collins' work on 'level > >> 5 leadership' [1] > > > > > > Still doesn't sound like the need for a president to me > > > > " the person who holds themselves responsible for the > > performance of the project" > > > > sounds more like the role of the "core" group (adding plural to persons) > to > > me, and cannot be pushed of to an official president. > > Except that, in the past, having multiple people taking decisions has > led to the situation where no-one feels themselves accountable for the > result, hence this situation tends to lead to stagnation. > Is there any evidence for this? First, it's several individuals taking joint decisions, jointly agree or not object (LGTM) to merging PRs, it's still a joint decision and accountability is not exclusive. The PR review process makes decisions much more into a joint decision process than it was with isolated SVN commits. (*) Second, if there are separated decisions, then it could also lead to excess change. All these enthusiastic new developers bringing in whatever they (and the local chief) like, and nobody to stop them. In either case, the developer, or local chief, has to deal with the consequences. You merged this PR, now fix it. or Why are you holding up this PR? I can merge it. (*) Even though I'm not a scipy developer anymore, I still feel partially responsible for it, I'm still reviewing some PRs and comment on them, sometimes as cheerleader in favor of something or sometimes pointing out problems, or just checking that it makes sense, and always with an eye on what downstream impact it might have. 
Another thought: Having an accountable president might actually reduce the feeling of accountability and responsibility of individual developers, so the neteffect is negative. "The president is responsible for this (even though he doesn't have enough time), so I can skip part of this review." Josef > > > Nathaniel to push and organize the discussion, Chuck for continuity, and > > several core developers for detailed ideas and implementation, and a > large > > number of contributors. (stylized roles) > > and noisy mailing list for feedback and discussion. > > > > Given the size of the numpy development group, numpy needs individuals > for > > the vision and to push things not a president, vice-presidents and > assistant > > vice-presidents, IMO. > > Yes, if the roles were honorary and administrative, they would not be > useful. > I'm not sure what you mean here. Given that it's all volunteer work, any president wouldn't have any hard tools. Josef > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Thu Aug 27 13:41:19 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 27 Aug 2015 19:41:19 +0200 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: References: <87y4gxllb4.fsf@berkeley.edu> <1440673873.1694.33.camel@sipsolutions.net> Message-ID: <1440697279.11529.64.camel@sipsolutions.net> On Do, 2015-08-27 at 17:22 +0100, Matthew Brett wrote: > Hi > > On Thu, Aug 27, 2015 at 5:11 PM, wrote: > > > > > > On Thu, Aug 27, 2015 at 11:04 AM, Matthew Brett > > wrote: > >> > >> Hi, > >> > >> On Thu, Aug 27, 2015 at 3:34 PM, wrote: > >> [snip] > >> > I don't really see a problem with "codifying" the status quo. > >> > >> That's an excellent point. If we believe that the current situation > >> is the best possible, both now and in the future, then codifying the > >> status quo is an excellent idea. > >> > >> So, we should probably first start by asking ourselves: > >> > >> * what numpy is doing well; > >> * what numpy could do better; > >> > >> and then ask, is there some way we could make it more likely we will > >> improve over time. > >> > >> [snip] > >> > >> > As the current debate shows it's possible to have a public discussion > >> > about > >> > the direction of the project without having to delegate providing a > >> > vision > >> > to a president. > >> > >> The idea of a president that I had in mind, was not someone who makes > >> all decisions, but the person who holds themselves responsible for the > >> performance of the project. If the project has a coherent vision > >> already, the president has no need to provide one, but it's the > >> president's job to worry about whether we have vision or not, and do > >> what they need to, to make sure we don't lose track of that. If you > >> don't know it already, I highly recommend Jim Collins' work on 'level > >> 5 leadership' [1] > > > > > > Still doesn't sound like the need for a president to me > > > > " the person who holds themselves responsible for the > > performance of the project" > > > > sounds more like the role of the "core" group (adding plural to persons) to > > me, and cannot be pushed of to an official president. 
> > Except that, in the past, having multiple people taking decisions has > led to the situation where no-one feels themselves accountable for the > result, hence this situation tends to lead to stagnation. Frankly, I am failing to see the direction of these arguments. One thing to remember is that a "core" group is much like a BDFL/president with multiple personalities ;), and a "core" group is not a fixed oligarchy. Anyone able and willing should be in it, and the governance document is clear about that I think (of course nothing is perfect, but we can try). Then there is the question of "how". I simply fail to see how the president can even be defined considering the size of the numpy development team (say 10, most of whom are busy with other things most of the time). Also, I fail to see how the president would be any more useful than the agreement of some tasks being handled by some people who are enthusiastic about them (note those do not even have to be in the "core" group for starters, though they should become part of it quickly). This is a community effort and I am starting to feel that the ideas you are giving are from a different management/company context. The goal of the governance is to show how, and hopefully make it easy for, *anyone* to provide vision. We do not need a manager who decides how to focus and allocate resources; instead we must tell everyone that we are happy about any help we can get, and that anyone can pick up a topic they are enthusiastic about and drive numpy ahead. And considering accountability, that help may well amount to saying: "Do NOT do this." A "president" willing to run for such an election should have a specific vision? Why should they be special to implement it? Note this is also the case in BDFL organizations. If you have a vision to improve Python, it does not really matter if you happen to be Guido. You write a PEP and, if people like it, it will be implemented. At the same time we *must* have a well-defined form of governance also for organizational things. Right now we cannot even decide on putting someone in charge of overseeing our NumFOCUS donations. NumPy could not even spend its own money! Sorry, getting way too long :(.... - Sebastian > > Nathaniel to push and organize the discussion, Chuck for continuity, and > > several core developers for detailed ideas and implementation, and a large > > number of contributors. (stylized roles) > > and noisy mailing list for feedback and discussion. > > > > Given the size of the numpy development group, numpy needs individuals for > > the vision and to push things not a president, vice-presidents and assistant > > vice-presidents, IMO. > > Yes, if the roles were honorary and administrative, they would not be useful. > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From matthew.brett at gmail.com Thu Aug 27 14:06:10 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 27 Aug 2015 19:06:10 +0100 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: References: <87y4gxllb4.fsf@berkeley.edu> <1440673873.1694.33.camel@sipsolutions.net> Message-ID: Hi, On Thu, Aug 27, 2015 at 6:23 PM, wrote: > > > On Thu, Aug 27, 2015 at 12:22 PM, Matthew Brett > wrote: >> >> Hi >> >> On Thu, Aug 27, 2015 at 5:11 PM, wrote: >> > >> > >> > On Thu, Aug 27, 2015 at 11:04 AM, Matthew Brett >> > >> > wrote: >> >> >> >> Hi, >> >> >> >> On Thu, Aug 27, 2015 at 3:34 PM, wrote: >> >> [snip] >> >> > I don't really see a problem with "codifying" the status quo. >> >> >> >> That's an excellent point. If we believe that the current situation >> >> is the best possible, both now and in the future, then codifying the >> >> status quo is an excellent idea. >> >> >> >> So, we should probably first start by asking ourselves: >> >> >> >> * what numpy is doing well; >> >> * what numpy could do better; >> >> >> >> and then ask, is there some way we could make it more likely we will >> >> improve over time. >> >> >> >> [snip] >> >> >> >> > As the current debate shows it's possible to have a public discussion >> >> > about >> >> > the direction of the project without having to delegate providing a >> >> > vision >> >> > to a president. >> >> >> >> The idea of a president that I had in mind, was not someone who makes >> >> all decisions, but the person who holds themselves responsible for the >> >> performance of the project. If the project has a coherent vision >> >> already, the president has no need to provide one, but it's the >> >> president's job to worry about whether we have vision or not, and do >> >> what they need to, to make sure we don't lose track of that. If you >> >> don't know it already, I highly recommend Jim Collins' work on 'level >> >> 5 leadership' [1] >> > >> > >> > Still doesn't sound like the need for a president to me >> > >> > " the person who holds themselves responsible for the >> > performance of the project" >> > >> > sounds more like the role of the "core" group (adding plural to persons) >> > to >> > me, and cannot be pushed of to an official president. >> >> Except that, in the past, having multiple people taking decisions has >> led to the situation where no-one feels themselves accountable for the >> result, hence this situation tends to lead to stagnation. > > > Is there any evidence for this? Oh - dear - that's the key point, but I'm obviously not making it clearly enough. Yes there is, and that was the evidence I was pointing to before. But anyway - Sebastian is right - this discussion isn't going anywhere useful. So - let's step back. In thinking about governance, we first need to ask what we want to achieve. This includes considering the risks ahead for the project. So, in the spirit of fruitful discussion, can I ask what y'all consider to be the current problems with working on numpy (other than the technical ones). What is numpy doing well, and what is it doing badly? What risks do we have to plan for in the future? 
Cheers, Matthew From stefanv at berkeley.edu Thu Aug 27 15:34:45 2015 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Thu, 27 Aug 2015 12:34:45 -0700 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: References: <87y4gxllb4.fsf@berkeley.edu> <1440673873.1694.33.camel@sipsolutions.net> Message-ID: <87lhcwle3u.fsf@berkeley.edu> On 2015-08-27 11:06:10, Matthew Brett wrote: > So, in the spirit of fruitful discussion, can I ask what y'all > consider to be the current problems with working on numpy (other > than the technical ones). What is numpy doing well, and what > is it doing badly? What risks do we have to plan for in the > future? It looks to me as though the team is doing an excellent job of maintaining NumPy. The growth of the project has stagnated somewhat for numerous reasons---and a lack of ideas on the table is not one of them, rather whether / how to take them forward. The question, and I think what you also highlighted in the earlier part of this discussion, is: how to decide on which vision to adopt, and who takes responsibility for making that happen? Are the two models proposed thus far so different, or can they be merged in a way that makes sense? E.g., can we work as a community to rally behind a vision as set out by one person, and then repeat that process to focus on another a year later? Think of it as the iterative development equivalent of governance. This may just be another way of phrasing a precedency, but with a strong emphasis on its temporary nature, as well as a focus on a group-decided outcome. Alternatively, see it as a community governance model with a strong emphasis on responsibility. St?fan From josef.pktd at gmail.com Thu Aug 27 15:35:07 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 27 Aug 2015 15:35:07 -0400 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: References: <87y4gxllb4.fsf@berkeley.edu> <1440673873.1694.33.camel@sipsolutions.net> Message-ID: On Thu, Aug 27, 2015 at 2:06 PM, Matthew Brett wrote: > Hi, > > On Thu, Aug 27, 2015 at 6:23 PM, wrote: > > > > > > On Thu, Aug 27, 2015 at 12:22 PM, Matthew Brett > > > wrote: > >> > >> Hi > >> > >> On Thu, Aug 27, 2015 at 5:11 PM, wrote: > >> > > >> > > >> > On Thu, Aug 27, 2015 at 11:04 AM, Matthew Brett > >> > > >> > wrote: > >> >> > >> >> Hi, > >> >> > >> >> On Thu, Aug 27, 2015 at 3:34 PM, wrote: > >> >> [snip] > >> >> > I don't really see a problem with "codifying" the status quo. > >> >> > >> >> That's an excellent point. If we believe that the current > situation > >> >> is the best possible, both now and in the future, then codifying the > >> >> status quo is an excellent idea. > >> >> > >> >> So, we should probably first start by asking ourselves: > >> >> > >> >> * what numpy is doing well; > >> >> * what numpy could do better; > >> >> > >> >> and then ask, is there some way we could make it more likely we will > >> >> improve over time. > >> >> > >> >> [snip] > >> >> > >> >> > As the current debate shows it's possible to have a public > discussion > >> >> > about > >> >> > the direction of the project without having to delegate providing a > >> >> > vision > >> >> > to a president. > >> >> > >> >> The idea of a president that I had in mind, was not someone who makes > >> >> all decisions, but the person who holds themselves responsible for > the > >> >> performance of the project. 
If the project has a coherent vision > >> >> already, the president has no need to provide one, but it's the > >> >> president's job to worry about whether we have vision or not, and do > >> >> what they need to, to make sure we don't lose track of that. If you > >> >> don't know it already, I highly recommend Jim Collins' work on 'level > >> >> 5 leadership' [1] > >> > > >> > > >> > Still doesn't sound like the need for a president to me > >> > > >> > " the person who holds themselves responsible for the > >> > performance of the project" > >> > > >> > sounds more like the role of the "core" group (adding plural to > persons) > >> > to > >> > me, and cannot be pushed of to an official president. > >> > >> Except that, in the past, having multiple people taking decisions has > >> led to the situation where no-one feels themselves accountable for the > >> result, hence this situation tends to lead to stagnation. > > > > > > Is there any evidence for this? > > Oh - dear - that's the key point, but I'm obviously not making it > clearly enough. Yes there is, and that was the evidence I was > pointing to before. > If you mean the XFree and NetBSD cases, then I don't see any similarity to the numpy or scipy development pattern. If I were to draw any conclusion, it would be that NetBSD had too much formal governance structure and not enough informal governance. It would be difficult to take over the government if there is no government. Just one aside, on "No desire to recruit new users": we are on a mission to take over the world (*). And forks of numpy like pandas turn out to be mostly complementary and increase the userbase of numpy. (R and Python are in friendly, or sometimes unfriendly, competition, but, AFAICS, we are both gaining users because of the others' presence. It's not a zero-sum game in this case.) (*) But that's not in the "mission statement". > > But anyway - Sebastian is right - this discussion isn't going anywhere > useful. > > So - let's step back. > > In thinking about governance, we first need to ask what we want to > achieve. This includes considering the risks ahead for the project. > > So, in the spirit of fruitful discussion, can I ask what y'all > consider to be the current problems with working on numpy (other than > the technical ones). What is numpy doing well, and what is it doing > badly? What risks do we have to plan for in the future? > I thought that was implicit or explicit in the other thread. Josef > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Thu Aug 27 17:45:50 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 27 Aug 2015 23:45:50 +0200 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: <87lhcwle3u.fsf@berkeley.edu> References: <87y4gxllb4.fsf@berkeley.edu> <1440673873.1694.33.camel@sipsolutions.net> <87lhcwle3u.fsf@berkeley.edu> Message-ID: <1440711950.11529.95.camel@sipsolutions.net> On Do, 2015-08-27 at 12:34 -0700, Stefan van der Walt wrote: > On 2015-08-27 11:06:10, Matthew Brett > wrote: > > So, in the spirit of fruitful discussion, can I ask what y'all > > consider to be the current problems with working on numpy (other > > than the technical ones).
What is numpy doing well, and what > > is it doing badly? What risks do we have to plan for in the > > future? > > It looks to me as though the team is doing an excellent job of > maintaining NumPy. The growth of the project has stagnated > somewhat for numerous reasons---and a lack of ideas on the table > is not one of them, rather whether / how to take them forward. > > The question, and I think what you also highlighted in the earlier > part of this discussion, is: how to decide on which vision to > adopt, and who takes responsibility for making that happen? > > Are the two models proposed thus far so different, or can they be > merged in a way that makes sense? E.g., can we work as a > community to rally behind a vision as set out by one person, and > then repeat that process to focus on another a year later? Think > of it as the iterative development equivalent of governance. > > This may just be another way of phrasing a precedency, but with a > strong emphasis on its temporary nature, as well as a focus on a > group-decided outcome. Alternatively, see it as a community > governance model with a strong emphasis on responsibility. > Agreed. Are not PEP's/NEP's just that (and could possibly be formalized more, not sure how much they are in the current proposal) in some sense? Since they have a sponsor/author who can be said to be assigned to it/responsible once accepted. I will add one more thing which I think is important: The governance has to be create as little hassle as possible and it should be simple/short enough to quickly understand. - Sebastian > St?fan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From stefanv at berkeley.edu Thu Aug 27 19:47:36 2015 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Thu, 27 Aug 2015 16:47:36 -0700 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: <1440711950.11529.95.camel@sipsolutions.net> References: <87y4gxllb4.fsf@berkeley.edu> <1440673873.1694.33.camel@sipsolutions.net> <87lhcwle3u.fsf@berkeley.edu> <1440711950.11529.9 5.camel@sipsolutions.net> Message-ID: <87zj1cjntz.fsf@berkeley.edu> Hi Sebastian On 2015-08-27 14:45:50, Sebastian Berg wrote: > Agreed. Are not PEP's/NEP's just that (and could possibly be > formalized more, not sure how much they are in the current > proposal) in some sense? Since they have a sponsor/author who > can be said to be assigned to it/responsible once accepted. I would consider a collection of NEPs for the following year to be such a thing. When implementing a bigger plan, it does help to have one person who owns the entire vision at the helm, pushing it forward. I think of it like a symphony orchestra: we agree to play the same music, and then let the director oversee the execution of the piece as a whole. > The governance has to be create as little hassle as possible and > it should be simple/short enough to quickly understand. I completely agree; and I think bikeshedding (everyone gets to argue their point of view, regardless of the stakes) can be the antithesis to productive focus. 
St?fan From jaime.frio at gmail.com Fri Aug 28 00:59:35 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Thu, 27 Aug 2015 21:59:35 -0700 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: References: <87y4gxllb4.fsf@berkeley.edu> <1440673873.1694.33.camel@sipsolutions.net> Message-ID: On Thu, Aug 27, 2015 at 11:06 AM, Matthew Brett wrote: > Hi, > > On Thu, Aug 27, 2015 at 6:23 PM, wrote: > > > > > > On Thu, Aug 27, 2015 at 12:22 PM, Matthew Brett > > > wrote: > >> > >> Hi > >> > >> On Thu, Aug 27, 2015 at 5:11 PM, wrote: > >> > > >> > > >> > On Thu, Aug 27, 2015 at 11:04 AM, Matthew Brett > >> > > >> > wrote: > >> >> > >> >> Hi, > >> >> > >> >> On Thu, Aug 27, 2015 at 3:34 PM, wrote: > >> >> [snip] > >> >> > I don't really see a problem with "codifying" the status quo. > >> >> > >> >> That's an excellent point. If we believe that the current > situation > >> >> is the best possible, both now and in the future, then codifying the > >> >> status quo is an excellent idea. > >> >> > >> >> So, we should probably first start by asking ourselves: > >> >> > >> >> * what numpy is doing well; > >> >> * what numpy could do better; > >> >> > >> >> and then ask, is there some way we could make it more likely we will > >> >> improve over time. > >> >> > >> >> [snip] > >> >> > >> >> > As the current debate shows it's possible to have a public > discussion > >> >> > about > >> >> > the direction of the project without having to delegate providing a > >> >> > vision > >> >> > to a president. > >> >> > >> >> The idea of a president that I had in mind, was not someone who makes > >> >> all decisions, but the person who holds themselves responsible for > the > >> >> performance of the project. If the project has a coherent vision > >> >> already, the president has no need to provide one, but it's the > >> >> president's job to worry about whether we have vision or not, and do > >> >> what they need to, to make sure we don't lose track of that. If you > >> >> don't know it already, I highly recommend Jim Collins' work on 'level > >> >> 5 leadership' [1] > >> > > >> > > >> > Still doesn't sound like the need for a president to me > >> > > >> > " the person who holds themselves responsible for the > >> > performance of the project" > >> > > >> > sounds more like the role of the "core" group (adding plural to > persons) > >> > to > >> > me, and cannot be pushed of to an official president. > >> > >> Except that, in the past, having multiple people taking decisions has > >> led to the situation where no-one feels themselves accountable for the > >> result, hence this situation tends to lead to stagnation. > > > > > > Is there any evidence for this? > > Oh - dear - that's the key point, but I'm obviously not making it > clearly enough. Yes there is, and that was the evidence I was > pointing to before. > > But anyway - Sebastian is right - this discussion isn't going anywhere > useful. > > So - let's step back. > > In thinking about governance, we first need to ask what we want to > achieve. This includes considering the risks ahead for the project. > > So, in the spirit of fruitful discussion, can I ask what y'all > consider to be the current problems with working on numpy (other than > the technical ones). What is numpy doing well, and what is it doing > badly? What risks do we have to plan for in the future? 
> Are you trying to prove the point that consensus doesn't work by making it impossible to reach a consensus on this? ;-) One thing we are doing very badly is leveraging resources outside of contributions of work and time from individuals. Getting sponsors to finance work on what is the cornerstone of just about any Python package that has to add two numbers together shouldn't be too hard, especially seeing success stories like Jupyter's, who I believe has several paid developers working full time. That requires formalizing governance, because apparently sponsors are a little wary of giving money to "people on the internet". ;-) Fernando P?rez was extremely emphatic about the size of the opportunity NumPy was letting slip by not formalizing *any* governance model. And it is a necessary first step so that e.g. we have the money to, say a year from now, get the right people together for a couple of days to figure out a better governance model. I'd argue that money would be better spent financing a talented developer to advance e.g. Nathaniel's new dtype system to end all dtype systems, but that's a different story. Largely because of the above, even if Nathaniel's document involved tossing a coin to resolve disputes, I'd rather have that now than something much better never. Because there really is no alternative to Nathaniel's write-up of the status quo, other than the status quo without a write-up: it has taken him two months to put this draft together, **after** we agreed over several hours of face to face discussion on what the model should be. And I'm sure he has hated every minute he has had to put into it. So if we keep going around this in circles, after a few days we will all grow tired and go back to fighting over whether indexing should transpose subspaces or not, and all that other cool stuff we really enjoy. And a year from now we will be in the same place we are now, only a year older and deeper in (technical) debt. Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at gmail.com Fri Aug 28 04:24:30 2015 From: faltet at gmail.com (Francesc Alted) Date: Fri, 28 Aug 2015 10:24:30 +0200 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: References: <87y4gxllb4.fsf@berkeley.edu> <1440673873.1694.33.camel@sipsolutions.net> Message-ID: +10 Very well written down ideas Jaime. 2015-08-28 6:59 GMT+02:00 Jaime Fern?ndez del R?o : > On Thu, Aug 27, 2015 at 11:06 AM, Matthew Brett > wrote: > >> Hi, >> >> On Thu, Aug 27, 2015 at 6:23 PM, wrote: >> > >> > >> > On Thu, Aug 27, 2015 at 12:22 PM, Matthew Brett < >> matthew.brett at gmail.com> >> > wrote: >> >> >> >> Hi >> >> >> >> On Thu, Aug 27, 2015 at 5:11 PM, wrote: >> >> > >> >> > >> >> > On Thu, Aug 27, 2015 at 11:04 AM, Matthew Brett >> >> > >> >> > wrote: >> >> >> >> >> >> Hi, >> >> >> >> >> >> On Thu, Aug 27, 2015 at 3:34 PM, wrote: >> >> >> [snip] >> >> >> > I don't really see a problem with "codifying" the status quo. >> >> >> >> >> >> That's an excellent point. If we believe that the current >> situation >> >> >> is the best possible, both now and in the future, then codifying the >> >> >> status quo is an excellent idea. 
>> >> >> >> >> >> So, we should probably first start by asking ourselves: >> >> >> >> >> >> * what numpy is doing well; >> >> >> * what numpy could do better; >> >> >> >> >> >> and then ask, is there some way we could make it more likely we will >> >> >> improve over time. >> >> >> >> >> >> [snip] >> >> >> >> >> >> > As the current debate shows it's possible to have a public >> discussion >> >> >> > about >> >> >> > the direction of the project without having to delegate providing >> a >> >> >> > vision >> >> >> > to a president. >> >> >> >> >> >> The idea of a president that I had in mind, was not someone who >> makes >> >> >> all decisions, but the person who holds themselves responsible for >> the >> >> >> performance of the project. If the project has a coherent vision >> >> >> already, the president has no need to provide one, but it's the >> >> >> president's job to worry about whether we have vision or not, and do >> >> >> what they need to, to make sure we don't lose track of that. If >> you >> >> >> don't know it already, I highly recommend Jim Collins' work on >> 'level >> >> >> 5 leadership' [1] >> >> > >> >> > >> >> > Still doesn't sound like the need for a president to me >> >> > >> >> > " the person who holds themselves responsible for the >> >> > performance of the project" >> >> > >> >> > sounds more like the role of the "core" group (adding plural to >> persons) >> >> > to >> >> > me, and cannot be pushed of to an official president. >> >> >> >> Except that, in the past, having multiple people taking decisions has >> >> led to the situation where no-one feels themselves accountable for the >> >> result, hence this situation tends to lead to stagnation. >> > >> > >> > Is there any evidence for this? >> >> Oh - dear - that's the key point, but I'm obviously not making it >> clearly enough. Yes there is, and that was the evidence I was >> pointing to before. >> >> But anyway - Sebastian is right - this discussion isn't going anywhere >> useful. >> >> So - let's step back. >> >> In thinking about governance, we first need to ask what we want to >> achieve. This includes considering the risks ahead for the project. >> >> So, in the spirit of fruitful discussion, can I ask what y'all >> consider to be the current problems with working on numpy (other than >> the technical ones). What is numpy doing well, and what is it doing >> badly? What risks do we have to plan for in the future? >> > > > Are you trying to prove the point that consensus doesn't work by making it > impossible to reach a consensus on this? ;-) > > > One thing we are doing very badly is leveraging resources outside of > contributions of work and time from individuals. Getting sponsors to > finance work on what is the cornerstone of just about any Python package > that has to add two numbers together shouldn't be too hard, especially > seeing success stories like Jupyter's, who I believe has several paid > developers working full time. That requires formalizing governance, > because apparently sponsors are a little wary of giving money to "people on > the internet". ;-) Fernando P?rez was extremely emphatic about the size of > the opportunity NumPy was letting slip by not formalizing *any* governance > model. And it is a necessary first step so that e.g. we have the money to, > say a year from now, get the right people together for a couple of days to > figure out a better governance model. I'd argue that money would be better > spent financing a talented developer to advance e.g. 
Nathaniel's new dtype > system to end all dtype systems, but that's a different story. > > Largely because of the above, even if Nathaniel's document involved > tossing a coin to resolve disputes, I'd rather have that now than something > much better never. Because there really is no alternative to Nathaniel's > write-up of the status quo, other than the status quo without a write-up: > it has taken him two months to put this draft together, **after** we agreed > over several hours of face to face discussion on what the model should be. > And I'm sure he has hated every minute he has had to put into it. So if we > keep going around this in circles, after a few days we will all grow tired > and go back to fighting over whether indexing should transpose subspaces or > not, and all that other cool stuff we really enjoy. And a year from now we > will be in the same place we are now, only a year older and deeper in > (technical) debt. > > Jaime > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes > de dominaci?n mundial. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Francesc Alted -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Fri Aug 28 04:46:23 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 28 Aug 2015 09:46:23 +0100 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: References: <87y4gxllb4.fsf@berkeley.edu> <1440673873.1694.33.camel@sipsolutions.net> Message-ID: Hi, On Fri, Aug 28, 2015 at 5:59 AM, Jaime Fern?ndez del R?o wrote: > On Thu, Aug 27, 2015 at 11:06 AM, Matthew Brett > wrote: >> >> Hi, >> >> On Thu, Aug 27, 2015 at 6:23 PM, wrote: >> > >> > >> > On Thu, Aug 27, 2015 at 12:22 PM, Matthew Brett >> > >> > wrote: >> >> >> >> Hi >> >> >> >> On Thu, Aug 27, 2015 at 5:11 PM, wrote: >> >> > >> >> > >> >> > On Thu, Aug 27, 2015 at 11:04 AM, Matthew Brett >> >> > >> >> > wrote: >> >> >> >> >> >> Hi, >> >> >> >> >> >> On Thu, Aug 27, 2015 at 3:34 PM, wrote: >> >> >> [snip] >> >> >> > I don't really see a problem with "codifying" the status quo. >> >> >> >> >> >> That's an excellent point. If we believe that the current >> >> >> situation >> >> >> is the best possible, both now and in the future, then codifying the >> >> >> status quo is an excellent idea. >> >> >> >> >> >> So, we should probably first start by asking ourselves: >> >> >> >> >> >> * what numpy is doing well; >> >> >> * what numpy could do better; >> >> >> >> >> >> and then ask, is there some way we could make it more likely we will >> >> >> improve over time. >> >> >> >> >> >> [snip] >> >> >> >> >> >> > As the current debate shows it's possible to have a public >> >> >> > discussion >> >> >> > about >> >> >> > the direction of the project without having to delegate providing >> >> >> > a >> >> >> > vision >> >> >> > to a president. >> >> >> >> >> >> The idea of a president that I had in mind, was not someone who >> >> >> makes >> >> >> all decisions, but the person who holds themselves responsible for >> >> >> the >> >> >> performance of the project. 
If the project has a coherent vision >> >> >> already, the president has no need to provide one, but it's the >> >> >> president's job to worry about whether we have vision or not, and do >> >> >> what they need to, to make sure we don't lose track of that. If >> >> >> you >> >> >> don't know it already, I highly recommend Jim Collins' work on >> >> >> 'level >> >> >> 5 leadership' [1] >> >> > >> >> > >> >> > Still doesn't sound like the need for a president to me >> >> > >> >> > " the person who holds themselves responsible for the >> >> > performance of the project" >> >> > >> >> > sounds more like the role of the "core" group (adding plural to >> >> > persons) >> >> > to >> >> > me, and cannot be pushed of to an official president. >> >> >> >> Except that, in the past, having multiple people taking decisions has >> >> led to the situation where no-one feels themselves accountable for the >> >> result, hence this situation tends to lead to stagnation. >> > >> > >> > Is there any evidence for this? >> >> Oh - dear - that's the key point, but I'm obviously not making it >> clearly enough. Yes there is, and that was the evidence I was >> pointing to before. >> >> But anyway - Sebastian is right - this discussion isn't going anywhere >> useful. >> >> So - let's step back. >> >> In thinking about governance, we first need to ask what we want to >> achieve. This includes considering the risks ahead for the project. >> >> So, in the spirit of fruitful discussion, can I ask what y'all >> consider to be the current problems with working on numpy (other than >> the technical ones). What is numpy doing well, and what is it doing >> badly? What risks do we have to plan for in the future? > > > Are you trying to prove the point that consensus doesn't work by making it > impossible to reach a consensus on this? ;-) > Forgive me if I use this joke to see if I can get us any further. If this was code, I think this joke would not be funny, because we wouldn't expect to reach consensus without considering all the options, and discussing their pros and cons. Why would that not be useful in the case of forms of governance? One reason might be that the specific form of governance can have no influence on the long-term health of the project. I am convinced that that is wrong - that the form of governance has a large influence on the long-term health of a project. If there is some possibility that this is true, then it seems to me that we would be foolish not to try and come to some reasoned choice about the form of governance. 
Cheers, Matthew From sebastian at sipsolutions.net Fri Aug 28 05:40:05 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 28 Aug 2015 11:40:05 +0200 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: References: <87y4gxllb4.fsf@berkeley.edu> <1440673873.1694.33.camel@sipsolutions.net> Message-ID: <1440754805.11529.109.camel@sipsolutions.net> On Fr, 2015-08-28 at 09:46 +0100, Matthew Brett wrote: > Hi, > > On Fri, Aug 28, 2015 at 5:59 AM, Jaime Fern?ndez del R?o > wrote: > > On Thu, Aug 27, 2015 at 11:06 AM, Matthew Brett > > wrote: > >> > >> Hi, > >> > >> On Thu, Aug 27, 2015 at 6:23 PM, wrote: > >> > > >> > > >> > On Thu, Aug 27, 2015 at 12:22 PM, Matthew Brett > >> > > >> > wrote: > >> >> > >> >> Hi > >> >> > >> >> On Thu, Aug 27, 2015 at 5:11 PM, wrote: > >> >> > > >> >> > > >> >> > On Thu, Aug 27, 2015 at 11:04 AM, Matthew Brett > >> >> > > >> >> > wrote: > >> >> >> > >> >> >> Hi, > >> >> >> > >> >> >> On Thu, Aug 27, 2015 at 3:34 PM, wrote: > >> >> >> [snip] > >> >> >> > I don't really see a problem with "codifying" the status quo. > >> >> >> > >> >> >> That's an excellent point. If we believe that the current > >> >> >> situation > >> >> >> is the best possible, both now and in the future, then codifying the > >> >> >> status quo is an excellent idea. > >> >> >> > >> >> >> So, we should probably first start by asking ourselves: > >> >> >> > >> >> >> * what numpy is doing well; > >> >> >> * what numpy could do better; > >> >> >> > >> >> >> and then ask, is there some way we could make it more likely we will > >> >> >> improve over time. > >> >> >> > >> >> >> [snip] > >> >> >> > >> >> >> > As the current debate shows it's possible to have a public > >> >> >> > discussion > >> >> >> > about > >> >> >> > the direction of the project without having to delegate providing > >> >> >> > a > >> >> >> > vision > >> >> >> > to a president. > >> >> >> > >> >> >> The idea of a president that I had in mind, was not someone who > >> >> >> makes > >> >> >> all decisions, but the person who holds themselves responsible for > >> >> >> the > >> >> >> performance of the project. If the project has a coherent vision > >> >> >> already, the president has no need to provide one, but it's the > >> >> >> president's job to worry about whether we have vision or not, and do > >> >> >> what they need to, to make sure we don't lose track of that. If > >> >> >> you > >> >> >> don't know it already, I highly recommend Jim Collins' work on > >> >> >> 'level > >> >> >> 5 leadership' [1] > >> >> > > >> >> > > >> >> > Still doesn't sound like the need for a president to me > >> >> > > >> >> > " the person who holds themselves responsible for the > >> >> > performance of the project" > >> >> > > >> >> > sounds more like the role of the "core" group (adding plural to > >> >> > persons) > >> >> > to > >> >> > me, and cannot be pushed of to an official president. > >> >> > >> >> Except that, in the past, having multiple people taking decisions has > >> >> led to the situation where no-one feels themselves accountable for the > >> >> result, hence this situation tends to lead to stagnation. > >> > > >> > > >> > Is there any evidence for this? > >> > >> Oh - dear - that's the key point, but I'm obviously not making it > >> clearly enough. Yes there is, and that was the evidence I was > >> pointing to before. > >> > >> But anyway - Sebastian is right - this discussion isn't going anywhere > >> useful. > >> > >> So - let's step back. 
> >> > >> In thinking about governance, we first need to ask what we want to > >> achieve. This includes considering the risks ahead for the project. > >> > >> So, in the spirit of fruitful discussion, can I ask what y'all > >> consider to be the current problems with working on numpy (other than > >> the technical ones). What is numpy doing well, and what is it doing > >> badly? What risks do we have to plan for in the future? > > > > > > Are you trying to prove the point that consensus doesn't work by making it > > impossible to reach a consensus on this? ;-) > > > > Forgive me if I use this joke to see if I can get us any further. > > If this was code, I think this joke would not be funny, because we > wouldn't expect to reach consensus without considering all the > options, and discussing their pros and cons. > > Why would that not be useful in the case of forms of governance? > Oh, it is true. I think we (those in the room in Austin) just have thought about it a bit already, so now we have to be a bit patient with everyone who just saw the plans the first time. But I hope we can agree that we should decide on some form of governance in the next few weeks, even if it may not be perfect. My personal problem with your ideas is not that I do not care for the warnings, but having already spend some time trying to put together this (and this is nothing weird, this is very common practice in open source), I personally do not want to spend time inventing something completely new. We must discuss improvements to the document, and even whole different approaches. But for me at least, I need something a little more specific. Maybe I am daft, but I hear "this is a bad idea" without also providing another approach (that seems doable). And I do not buy that it is *that* bad, it is a very common governance structure for open source. The presidency suggestions may be another approach and certainly something we can pick up ideas from, but to me it is so vague that I cannot even start comprehending what it would mean for the actual governance structure specifically for numpy (considering the size of the project, etc.). But by all means, I like proposals/learning from your ideas (i.e. maybe you can propose changes to the NEP sections), I personally would just like to see a bit more clearly where it goes. - Sebastian > One reason might be that the specific form of governance can have no > influence on the long-term health of the project. > > I am convinced that that is wrong - that the form of governance has a > large influence on the long-term health of a project. > > If there is some possibility that this is true, then it seems to me > that we would be foolish not to try and come to some reasoned choice > about the form of governance. > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From jdmc80 at hotmail.com Fri Aug 28 14:02:53 2015 From: jdmc80 at hotmail.com (Joseph Codadeen) Date: Fri, 28 Aug 2015 18:02:53 +0000 Subject: [Numpy-discussion] Numpty FFT.FFT slow with certain samples In-Reply-To: References: Message-ID: Hi, I am a numpy newbie. I have two wav files, one that numpy takes a long time to process the FFT. 
They were created within Audacity using white noise and silence for gaps.

1. my_1_minute_noise_with_gaps.wav
2. my_1_minute_noise_with_gaps_truncated.wav

The files are very similar in the following way:

1. is white noise with silence gaps on every 15 second interval.
2. is 1. but slightly shorter, i.e. I trimmed some ms off the end but it still has the last gap at 60s.

The code I am using processes the file like this:

framerate, data = scipy.io.wavfile.read(filepath)
right = data[:, 0]
# Align it to be efficient.
if len(right) % 2 != 0:
    right = right[range(len(right) - 1)]
noframes = len(right)
fftout = np.fft.fft(right) / noframes  # <<< I am timing this cmd

Using timeit...

my_1_minute_noise_with_gaps_truncated took 30.75620985s to process.
my_1_minute_noise_with_gaps took 22307.13917s to process.

Could someone tell me why this behaviour is happening please? Sorry I can't attach the files as this email gets bounced but you could easily create the files yourself. E.g. my last gap width is 59.9995 - 1:00.0005, I repeat this every 15 seconds. My truncated file is 1:00.0015s long, non-truncated is 1:00.0833s long. Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hodge at stsci.edu Fri Aug 28 14:28:49 2015 From: hodge at stsci.edu (Phil Hodge) Date: Fri, 28 Aug 2015 14:28:49 -0400 Subject: Re: [Numpy-discussion] Numpy FFT.FFT slow with certain samples In-Reply-To: References: Message-ID: <55E0A861.6020001@stsci.edu> On 08/28/2015 02:02 PM, Joseph Codadeen wrote: > > * my_1_minute_noise_with_gaps_truncated took *30.75620985s* to process. > * my_1_minute_noise_with_gaps took *22307.13917s* to process. > You didn't say how long those arrays were, but I can make a good guess that the truncated one had a length that could be factored into small prime numbers, while the non-truncated one had a length that was either prime or could only be factored into large primes. Phil From jaime.frio at gmail.com Fri Aug 28 14:46:36 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Fri, 28 Aug 2015 11:46:36 -0700 Subject: [Numpy-discussion] Numpty FFT.FFT slow with certain samples In-Reply-To: References: Message-ID: On Fri, Aug 28, 2015 at 11:02 AM, Joseph Codadeen wrote: > Hi, > > I am a numpy newbie. > > I have two wav files, one that numpy takes a long time to process the FFT. > They were created within Audacity using white noise and silence for gaps. > > > 1. my_1_minute_noise_with_gaps.wav > 2. my_1_minute_noise_with_gaps_truncated.wav > > > The files are very similar in the following way; > > > - 1. is white noise with silence gaps on every 15 second interval. > - 2. is 1. but slightly shorter, i.e. I trimmed some ms off the end > but it still has the last gap at 60s. > > > The code I am using processes the file like this; > > framerate, data = scipy.io.wavfile.read(filepath) > right = data[:, 0] > # Align it to be efficient. > if len(right) % 2 != 0: > right = right[range(len(right) - 1)] > noframes = len(right) > fftout = np.fft.fft(right) / noframes # <<< I am timing this cmd > > Using timeit... > > > - my_1_minute_noise_with_gaps_truncated took *30.75620985s* to process. > - my_1_minute_noise_with_gaps took *22307.13917s* to process. > > > Could someone tell me why this behaviour is happening please? > > Sorry I can't attach the files as this email gets bounced but you could > easily create the files yourself. > E.g my last gap width is 59.9995 - 1:00.0005, I repeat this every 15 > seconds.
> My truncated file is 1:00.0015s long, non-truncated is 1:00.0833s long > It is almost certainly caused by the number of samples in your signals, i.e. look at what noframes is in one case and the other. You will get best performance when noframes is a power of two, or has a factorization that includes many small integers (2, 3, 5, perhaps also 7), and the worst if the size is a prime number. Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jdmc80 at hotmail.com Fri Aug 28 14:51:47 2015 From: jdmc80 at hotmail.com (Joseph Codadeen) Date: Fri, 28 Aug 2015 18:51:47 +0000 Subject: [Numpy-discussion] Numpy FFT.FFT slow with certain samples In-Reply-To: <55E0A861.6020001@stsci.edu> References: , , <55E0A861.6020001@stsci.edu> Message-ID: my_1_minute_noise_with_gaps_truncated - Array len is 2646070my_1_minute_noise_with_gaps - Array len is 2649674 > Date: Fri, 28 Aug 2015 14:28:49 -0400 > From: hodge at stsci.edu > To: numpy-discussion at scipy.org > Subject: Re: [Numpy-discussion] Numpy FFT.FFT slow with certain samples > > On 08/28/2015 02:02 PM, Joseph Codadeen wrote: > > > > * my_1_minute_noise_with_gaps_truncated took***30.75620985s* to process. > > * my_1_minute_noise_with_gaps took *22307.13917s*to process. > > > > You didn't say how long those arrays were, but I can make a good guess > that the truncated one had a length that could be factored into small, > prime numbers, while the non-truncated one had a length that was either > prime or could only be factored into large primes. > > Phil > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanv at berkeley.edu Fri Aug 28 15:03:52 2015 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Fri, 28 Aug 2015 12:03:52 -0700 Subject: [Numpy-discussion] Numpy FFT.FFT slow with certain samples In-Reply-To: References: <55E0A861.6020001@stsci.edu> Message-ID: <87y4gvi6av.fsf@berkeley.edu> On 2015-08-28 11:51:47, Joseph Codadeen wrote: > my_1_minute_noise_with_gaps_truncated - Array len is > 2646070my_1_minute_noise_with_gaps - Array len is 2649674 In [6]: from sympy import factorint In [7]: max(factorint(2646070)) Out[7]: 367 In [8]: max(factorint(2649674)) Out[8]: 1324837 Those numbers give you some indication of how long the FFT will take to compute. St?fan From jdmc80 at hotmail.com Fri Aug 28 15:13:55 2015 From: jdmc80 at hotmail.com (Joseph Codadeen) Date: Fri, 28 Aug 2015 19:13:55 +0000 Subject: [Numpy-discussion] Numpy FFT.FFT slow with certain samples In-Reply-To: <87y4gvi6av.fsf@berkeley.edu> References: , <55E0A861.6020001@stsci.edu>, , <87y4gvi6av.fsf@berkeley.edu> Message-ID: Great, thanks Stefan and everyone. 
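To make the factorization advice above concrete, here is a minimal sketch (illustrative only, not part of the original thread; the helper name and the example length are assumptions) of zero-padding a signal to the next power-of-two length before calling np.fft.fft, which avoids the slow large-prime-factor case:

import numpy as np

def next_power_of_two(n):
    # Smallest power of two >= n; highly composite lengths (powers of two
    # being the classic choice) are the fast case for FFT implementations.
    return 1 << (n - 1).bit_length()

# Hypothetical signal whose length has a large prime factor (the slow case
# discussed above): 10007 is prime, so 2 * 10007 = 20014 factors badly.
signal = np.random.randn(2 * 10007)
n = len(signal)

nfast = next_power_of_two(n)                    # 32768 for n = 20014
padded = np.concatenate([signal, np.zeros(nfast - n)])

spectrum = np.fft.fft(padded) / n               # transform at the fast length

# Caveat: zero-padding interpolates the spectrum onto a finer frequency grid,
# so its bins do not coincide with those of the unpadded length-n transform.
print(n, nfast, spectrum.shape)

Trimming a few samples down to a nearby fast length, which is effectively what the truncated file does, is the other common workaround when the exact bin spacing matters.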
> From: stefanv at berkeley.edu > To: numpy-discussion at scipy.org > Date: Fri, 28 Aug 2015 12:03:52 -0700 > Subject: Re: [Numpy-discussion] Numpy FFT.FFT slow with certain samples > > > On 2015-08-28 11:51:47, Joseph Codadeen > wrote: > > my_1_minute_noise_with_gaps_truncated - Array len is > > 2646070my_1_minute_noise_with_gaps - Array len is 2649674 > > In [6]: from sympy import factorint In [7]: > max(factorint(2646070)) Out[7]: 367 In [8]: > max(factorint(2649674)) Out[8]: 1324837 > > Those numbers give you some indication of how long the FFT will > take to compute. > > St?fan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Fri Aug 28 16:36:36 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 28 Aug 2015 22:36:36 +0200 Subject: [Numpy-discussion] Numpy FFT.FFT slow with certain samples In-Reply-To: References: , <55E0A861.6020001@stsci.edu>, , <87y4gvi6av.fsf@berkeley.edu> Message-ID: <1440794196.22052.4.camel@sipsolutions.net> If you don't mind the extra dependency or licensing and this is an issue for you, you can try pyfftw (there are likely other similar projects) which wraps fftw and does not have this problem as far as I know. It exposes a numpy-like interface. - sebastian On Fr, 2015-08-28 at 19:13 +0000, Joseph Codadeen wrote: > Great, thanks Stefan and everyone. > > > From: stefanv at berkeley.edu > > To: numpy-discussion at scipy.org > > Date: Fri, 28 Aug 2015 12:03:52 -0700 > > Subject: Re: [Numpy-discussion] Numpy FFT.FFT slow with certain > samples > > > > > > On 2015-08-28 11:51:47, Joseph Codadeen > > wrote: > > > my_1_minute_noise_with_gaps_truncated - Array len is > > > 2646070my_1_minute_noise_with_gaps - Array len is 2649674 > > > > In [6]: from sympy import factorint In [7]: > > max(factorint(2646070)) Out[7]: 367 In [8]: > > max(factorint(2649674)) Out[8]: 1324837 > > > > Those numbers give you some indication of how long the FFT will > > take to compute. > > > > St?fan > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From efiring at hawaii.edu Fri Aug 28 17:31:10 2015 From: efiring at hawaii.edu (Eric Firing) Date: Fri, 28 Aug 2015 11:31:10 -1000 Subject: [Numpy-discussion] Numpy FFT.FFT slow with certain samples In-Reply-To: <1440794196.22052.4.camel@sipsolutions.net> References: <55E0A861.6020001@stsci.edu> <87y4gvi6av.fsf@berkeley.edu> <1440794196.22052.4.camel@sipsolutions.net> Message-ID: <55E0D31E.2020201@hawaii.edu> On 2015/08/28 10:36 AM, Sebastian Berg wrote: > If you don't mind the extra dependency or licensing and this is an issue > for you, you can try pyfftw (there are likely other similar projects) > which wraps fftw and does not have this problem as far as I know. It > exposes a numpy-like interface. Sort of; that interface returns a function, not the result. 
fftw is still an fft algorithm, so it is still subject to a huge difference in run time depending on how the input array can be factored. Furthermore, it gets its speed by figuring out how to optimize a calculation for a given size of input array. That initial optimization can be very slow. The overall speed gain is realized only when one saves the result of that optimization, and applies it to many calculations on arrays of the same size. Eric > > - sebastian > > > On Fr, 2015-08-28 at 19:13 +0000, Joseph Codadeen wrote: >> Great, thanks Stefan and everyone. >> >>> From: stefanv at berkeley.edu >>> To: numpy-discussion at scipy.org >>> Date: Fri, 28 Aug 2015 12:03:52 -0700 >>> Subject: Re: [Numpy-discussion] Numpy FFT.FFT slow with certain >> samples >>> >>> >>> On 2015-08-28 11:51:47, Joseph Codadeen >>> wrote: >>>> my_1_minute_noise_with_gaps_truncated - Array len is >>>> 2646070my_1_minute_noise_with_gaps - Array len is 2649674 >>> >>> In [6]: from sympy import factorint In [7]: >>> max(factorint(2646070)) Out[7]: 367 In [8]: >>> max(factorint(2649674)) Out[8]: 1324837 >>> >>> Those numbers give you some indication of how long the FFT will >>> take to compute. >>> >>> St?fan >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From jaime.frio at gmail.com Fri Aug 28 17:36:28 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Fri, 28 Aug 2015 14:36:28 -0700 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: <1440754805.11529.109.camel@sipsolutions.net> References: <87y4gxllb4.fsf@berkeley.edu> <1440673873.1694.33.camel@sipsolutions.net> <1440754805.11529.109.camel@sipsolutions.net> Message-ID: On Fri, Aug 28, 2015 at 2:40 AM, Sebastian Berg wrote: > On Fr, 2015-08-28 at 09:46 +0100, Matthew Brett wrote: > > Hi, > > > > On Fri, Aug 28, 2015 at 5:59 AM, Jaime Fern?ndez del R?o > > wrote: > > > On Thu, Aug 27, 2015 at 11:06 AM, Matthew Brett < > matthew.brett at gmail.com> > > > wrote: > > >> > > >> Hi, > > >> > > >> On Thu, Aug 27, 2015 at 6:23 PM, wrote: > > >> > > > >> > > > >> > On Thu, Aug 27, 2015 at 12:22 PM, Matthew Brett > > >> > > > >> > wrote: > > >> >> > > >> >> Hi > > >> >> > > >> >> On Thu, Aug 27, 2015 at 5:11 PM, wrote: > > >> >> > > > >> >> > > > >> >> > On Thu, Aug 27, 2015 at 11:04 AM, Matthew Brett > > >> >> > > > >> >> > wrote: > > >> >> >> > > >> >> >> Hi, > > >> >> >> > > >> >> >> On Thu, Aug 27, 2015 at 3:34 PM, wrote: > > >> >> >> [snip] > > >> >> >> > I don't really see a problem with "codifying" the status quo. > > >> >> >> > > >> >> >> That's an excellent point. If we believe that the current > > >> >> >> situation > > >> >> >> is the best possible, both now and in the future, then > codifying the > > >> >> >> status quo is an excellent idea. 
> > >> >> >> > > >> >> >> So, we should probably first start by asking ourselves: > > >> >> >> > > >> >> >> * what numpy is doing well; > > >> >> >> * what numpy could do better; > > >> >> >> > > >> >> >> and then ask, is there some way we could make it more likely we > will > > >> >> >> improve over time. > > >> >> >> > > >> >> >> [snip] > > >> >> >> > > >> >> >> > As the current debate shows it's possible to have a public > > >> >> >> > discussion > > >> >> >> > about > > >> >> >> > the direction of the project without having to delegate > providing > > >> >> >> > a > > >> >> >> > vision > > >> >> >> > to a president. > > >> >> >> > > >> >> >> The idea of a president that I had in mind, was not someone who > > >> >> >> makes > > >> >> >> all decisions, but the person who holds themselves responsible > for > > >> >> >> the > > >> >> >> performance of the project. If the project has a coherent > vision > > >> >> >> already, the president has no need to provide one, but it's the > > >> >> >> president's job to worry about whether we have vision or not, > and do > > >> >> >> what they need to, to make sure we don't lose track of that. > If > > >> >> >> you > > >> >> >> don't know it already, I highly recommend Jim Collins' work on > > >> >> >> 'level > > >> >> >> 5 leadership' [1] > > >> >> > > > >> >> > > > >> >> > Still doesn't sound like the need for a president to me > > >> >> > > > >> >> > " the person who holds themselves responsible for the > > >> >> > performance of the project" > > >> >> > > > >> >> > sounds more like the role of the "core" group (adding plural to > > >> >> > persons) > > >> >> > to > > >> >> > me, and cannot be pushed of to an official president. > > >> >> > > >> >> Except that, in the past, having multiple people taking decisions > has > > >> >> led to the situation where no-one feels themselves accountable for > the > > >> >> result, hence this situation tends to lead to stagnation. > > >> > > > >> > > > >> > Is there any evidence for this? > > >> > > >> Oh - dear - that's the key point, but I'm obviously not making it > > >> clearly enough. Yes there is, and that was the evidence I was > > >> pointing to before. > > >> > > >> But anyway - Sebastian is right - this discussion isn't going anywhere > > >> useful. > > >> > > >> So - let's step back. > > >> > > >> In thinking about governance, we first need to ask what we want to > > >> achieve. This includes considering the risks ahead for the project. > > >> > > >> So, in the spirit of fruitful discussion, can I ask what y'all > > >> consider to be the current problems with working on numpy (other than > > >> the technical ones). What is numpy doing well, and what is it doing > > >> badly? What risks do we have to plan for in the future? > > > > > > > > > Are you trying to prove the point that consensus doesn't work by > making it > > > impossible to reach a consensus on this? ;-) > > > > > > > Forgive me if I use this joke to see if I can get us any further. > > > > If this was code, I think this joke would not be funny, because we > > wouldn't expect to reach consensus without considering all the > > options, and discussing their pros and cons. > > > > Why would that not be useful in the case of forms of governance? > > > > Oh, it is true. I think we (those in the room in Austin) just have > thought about it a bit already, so now we have to be a bit patient with > everyone who just saw the plans the first time. 
But I hope we can agree > that we should decide on some form of governance in the next few weeks, > even if it may not be perfect. > > My personal problem with your ideas is not that I do not care for the > warnings, but having already spend some time trying to put together this > (and this is nothing weird, this is very common practice in open > source), I personally do not want to spend time inventing something > completely new. > > We must discuss improvements to the document, and even whole different > approaches. But for me at least, I need something a little more > specific. Maybe I am daft, but I hear "this is a bad idea" without also > providing another approach (that seems doable). > And I do not buy that it is *that* bad, it is a very common governance > structure for open source. The presidency suggestions may be another > approach and certainly something we can pick up ideas from, but to me it > is so vague that I cannot even start comprehending what it would mean > for the actual governance structure specifically for numpy (considering > the size of the project, etc.). > > But by all means, I like proposals/learning from your ideas (i.e. maybe > you can propose changes to the NEP sections), I personally would just > like to see a bit more clearly where it goes. > Perhaps we could add a paragraph to the document, stating that we understand the risks and will keep an eye open for the dilution of responsibility and lack of direction and ownership that may come from consensus based decision making. And make it part of our governance model that we will review the model yearly, to identify and correct issues. That wouldn't require any substantial change right now, but wouldn't crystallize a potentially harmful organization either. Jaime P.S. At some point during the discussion in Austin, the idea going around was that the NUMFocus committee, which at the time was going to have three members only, would also be vested with ultimate decision power. Just imagine, we could have had a proper triumvirate: Chuck, Nathaniel and Ralf, wearing togas and feasting around a triclinium while they decided the fate of NumPy! -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From noel.pierre.andre at gmail.com Fri Aug 28 19:20:33 2015 From: noel.pierre.andre at gmail.com (Pierre-Andre Noel) Date: Fri, 28 Aug 2015 16:20:33 -0700 Subject: [Numpy-discussion] Numpy FFT.FFT slow with certain samples In-Reply-To: <87y4gvi6av.fsf@berkeley.edu> References: <55E0A861.6020001@stsci.edu> <87y4gvi6av.fsf@berkeley.edu> Message-ID: <55E0ECC1.8010705@gmail.com> If your sequence is not meant to be periodic (i.e., if after one minute there is no reason why the signal should start back at the beginning right away), then you should do zero-padding. And while you zero-pad, you can zero-pad to a sequence that is a power of two, thus preventing awkward factorizations. from numpy.fft import fft from numpy.random import rand from math import log, ceil seq_A = rand(2649674) seq_B = rand(2646070) fft_A = fft(seq_A) #Long fft_B = fft(seq_B) zeropadded_fft_A = fft(seq_A, n=2**(ceil(log(len(seq_A),2))+1)) zeropadded_fft_B = fft(seq_B, n=2**(ceil(log(len(seq_B),2))+1)) You could remove the "+1" above to get faster results, but then that may lead to unwanted frequencies (coming from the fact that fft assumes periodic signals, read online about zero-padding). Have a nice day, Pierre-Andr? 
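P.S. One small caveat with the snippet above: on Python 2, math.ceil returns a float, so the n passed to fft ends up as a float, and some NumPy versions will refuse a non-integer n. Casting is safer, e.g. (sketch):

n_A = 2 ** (int(ceil(log(len(seq_A), 2))) + 1)
zeropadded_fft_A = fft(seq_A, n=n_A)

or, staying in integer arithmetic, n_A = 2 ** ((len(seq_A) - 1).bit_length() + 1).
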
On 08/28/2015 12:03 PM, Stefan van der Walt wrote: > > On 2015-08-28 11:51:47, Joseph Codadeen wrote: >> my_1_minute_noise_with_gaps_truncated - Array len is >> 2646070my_1_minute_noise_with_gaps - Array len is 2649674 > > In [6]: from sympy import factorint In [7]: max(factorint(2646070)) > Out[7]: 367 In [8]: max(factorint(2649674)) Out[8]: 1324837 > Those numbers give you some indication of how long the FFT will take > to compute. > > St?fan > From stefanv at berkeley.edu Fri Aug 28 19:42:51 2015 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Fri, 28 Aug 2015 16:42:51 -0700 Subject: [Numpy-discussion] Numpy FFT.FFT slow with certain samples In-Reply-To: <55E0ECC1.8010705@gmail.com> References: <55E0A861.6020001@stsci.edu> <87y4gvi6av.fsf@berkeley.edu> <55E0ECC1.8010705@gmail.com> Message-ID: <87r3mnhtdw.fsf@berkeley.edu> On 2015-08-28 16:20:33, Pierre-Andre Noel wrote: > If your sequence is not meant to be periodic (i.e., if after one > minute there is no reason why the signal should start back at > the beginning right away), then you should do zero-padding. And > while you zero-pad, you can zero-pad to a sequence that is a > power of two, thus preventing awkward factorizations. Zero-padding won't help with the non-periodicity, will it? For that you may want to window instead. St?fan From charlesr.harris at gmail.com Fri Aug 28 20:09:37 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 28 Aug 2015 18:09:37 -0600 Subject: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015) In-Reply-To: References: <87y4gxllb4.fsf@berkeley.edu> <1440673873.1694.33.camel@sipsolutions.net> <1440754805.11529.109.camel@sipsolutions.net> Message-ID: On Fri, Aug 28, 2015 at 3:36 PM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > On Fri, Aug 28, 2015 at 2:40 AM, Sebastian Berg < > sebastian at sipsolutions.net> wrote: > >> On Fr, 2015-08-28 at 09:46 +0100, Matthew Brett wrote: >> > Hi, >> > >> > On Fri, Aug 28, 2015 at 5:59 AM, Jaime Fern?ndez del R?o >> > wrote: >> > > On Thu, Aug 27, 2015 at 11:06 AM, Matthew Brett < >> matthew.brett at gmail.com> >> > > wrote: >> > >> >> > >> Hi, >> > >> >> > >> On Thu, Aug 27, 2015 at 6:23 PM, wrote: >> > >> > >> > >> > >> > >> > On Thu, Aug 27, 2015 at 12:22 PM, Matthew Brett >> > >> > >> > >> > wrote: >> > >> >> >> > >> >> Hi >> > >> >> >> > >> >> On Thu, Aug 27, 2015 at 5:11 PM, wrote: >> > >> >> > >> > >> >> > >> > >> >> > On Thu, Aug 27, 2015 at 11:04 AM, Matthew Brett >> > >> >> > >> > >> >> > wrote: >> > >> >> >> >> > >> >> >> Hi, >> > >> >> >> >> > >> >> >> On Thu, Aug 27, 2015 at 3:34 PM, >> wrote: >> > >> >> >> [snip] >> > >> >> >> > I don't really see a problem with "codifying" the status quo. >> > >> >> >> >> > >> >> >> That's an excellent point. If we believe that the current >> > >> >> >> situation >> > >> >> >> is the best possible, both now and in the future, then >> codifying the >> > >> >> >> status quo is an excellent idea. >> > >> >> >> >> > >> >> >> So, we should probably first start by asking ourselves: >> > >> >> >> >> > >> >> >> * what numpy is doing well; >> > >> >> >> * what numpy could do better; >> > >> >> >> >> > >> >> >> and then ask, is there some way we could make it more likely >> we will >> > >> >> >> improve over time. 
>> > >> >> >> >> > >> >> >> [snip] >> > >> >> >> >> > >> >> >> > As the current debate shows it's possible to have a public >> > >> >> >> > discussion >> > >> >> >> > about >> > >> >> >> > the direction of the project without having to delegate >> providing >> > >> >> >> > a >> > >> >> >> > vision >> > >> >> >> > to a president. >> > >> >> >> >> > >> >> >> The idea of a president that I had in mind, was not someone who >> > >> >> >> makes >> > >> >> >> all decisions, but the person who holds themselves responsible >> for >> > >> >> >> the >> > >> >> >> performance of the project. If the project has a coherent >> vision >> > >> >> >> already, the president has no need to provide one, but it's the >> > >> >> >> president's job to worry about whether we have vision or not, >> and do >> > >> >> >> what they need to, to make sure we don't lose track of that. >> If >> > >> >> >> you >> > >> >> >> don't know it already, I highly recommend Jim Collins' work on >> > >> >> >> 'level >> > >> >> >> 5 leadership' [1] >> > >> >> > >> > >> >> > >> > >> >> > Still doesn't sound like the need for a president to me >> > >> >> > >> > >> >> > " the person who holds themselves responsible for the >> > >> >> > performance of the project" >> > >> >> > >> > >> >> > sounds more like the role of the "core" group (adding plural to >> > >> >> > persons) >> > >> >> > to >> > >> >> > me, and cannot be pushed of to an official president. >> > >> >> >> > >> >> Except that, in the past, having multiple people taking decisions >> has >> > >> >> led to the situation where no-one feels themselves accountable >> for the >> > >> >> result, hence this situation tends to lead to stagnation. >> > >> > >> > >> > >> > >> > Is there any evidence for this? >> > >> >> > >> Oh - dear - that's the key point, but I'm obviously not making it >> > >> clearly enough. Yes there is, and that was the evidence I was >> > >> pointing to before. >> > >> >> > >> But anyway - Sebastian is right - this discussion isn't going >> anywhere >> > >> useful. >> > >> >> > >> So - let's step back. >> > >> >> > >> In thinking about governance, we first need to ask what we want to >> > >> achieve. This includes considering the risks ahead for the project. >> > >> >> > >> So, in the spirit of fruitful discussion, can I ask what y'all >> > >> consider to be the current problems with working on numpy (other than >> > >> the technical ones). What is numpy doing well, and what is it doing >> > >> badly? What risks do we have to plan for in the future? >> > > >> > > >> > > Are you trying to prove the point that consensus doesn't work by >> making it >> > > impossible to reach a consensus on this? ;-) >> > > >> > >> > Forgive me if I use this joke to see if I can get us any further. >> > >> > If this was code, I think this joke would not be funny, because we >> > wouldn't expect to reach consensus without considering all the >> > options, and discussing their pros and cons. >> > >> > Why would that not be useful in the case of forms of governance? >> > >> >> Oh, it is true. I think we (those in the room in Austin) just have >> thought about it a bit already, so now we have to be a bit patient with >> everyone who just saw the plans the first time. But I hope we can agree >> that we should decide on some form of governance in the next few weeks, >> even if it may not be perfect. 
>> >> My personal problem with your ideas is not that I do not care for the >> warnings, but having already spend some time trying to put together this >> (and this is nothing weird, this is very common practice in open >> source), I personally do not want to spend time inventing something >> completely new. >> >> We must discuss improvements to the document, and even whole different >> approaches. But for me at least, I need something a little more >> specific. Maybe I am daft, but I hear "this is a bad idea" without also >> providing another approach (that seems doable). >> And I do not buy that it is *that* bad, it is a very common governance >> structure for open source. The presidency suggestions may be another >> approach and certainly something we can pick up ideas from, but to me it >> is so vague that I cannot even start comprehending what it would mean >> for the actual governance structure specifically for numpy (considering >> the size of the project, etc.). >> >> But by all means, I like proposals/learning from your ideas (i.e. maybe >> you can propose changes to the NEP sections), I personally would just >> like to see a bit more clearly where it goes. >> > > Perhaps we could add a paragraph to the document, stating that we > understand the risks and will keep an eye open for the dilution of > responsibility and lack of direction and ownership that may come from > consensus based decision making. And make it part of our governance model > that we will review the model yearly, to identify and correct issues. That > wouldn't require any substantial change right now, but wouldn't crystallize > a potentially harmful organization either. > > Jaime > > P.S. At some point during the discussion in Austin, the idea going around > was that the NUMFocus committee, which at the time was going to have three > members only, would also be vested with ultimate decision power. Just > imagine, we could have had a proper triumvirate: Chuck, Nathaniel and Ralf, > wearing togas and feasting around a triclinium while they decided the fate > of NumPy! > The idea is appealing, but I don't think anyone should have to see me in a toga. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From noel.pierre.andre at gmail.com Fri Aug 28 20:12:25 2015 From: noel.pierre.andre at gmail.com (Pierre-Andre Noel) Date: Fri, 28 Aug 2015 17:12:25 -0700 Subject: [Numpy-discussion] Numpy FFT.FFT slow with certain samples In-Reply-To: <55E0ECC1.8010705@gmail.com> References: <55E0A861.6020001@stsci.edu> <87y4gvi6av.fsf@berkeley.edu> <55E0ECC1.8010705@gmail.com> Message-ID: <55E0F8E9.4020607@gmail.com> > Zero-padding won't help with the non-periodicity, will it? For > that you may want to window instead. Umh, it depends what you use the FFT for. You are right St?fan when saying that Joseph should probably also use a window to get rid of the high frequencies that will come from the sharp steps at the beginning and end of his signal. I had in mind the use of FFT to do convolutions ( https://en.wikipedia.org/wiki/Convolution_theorem ). If you do not zero-pad properly, then the end of the signal may "bleed" on the beginning, and vice versa. 
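A tiny sketch of what I mean, with made-up sizes (the padded length here is just len(a) + len(b) - 1; in practice you would round it up to a fast size and slice the result afterwards):

import numpy as np

a = np.random.rand(1000)
b = np.random.rand(200)

# no padding: this is a *circular* convolution, the tail wraps around
circular = np.fft.ifft(np.fft.fft(a, 1000) * np.fft.fft(b, 1000)).real

# padded to len(a) + len(b) - 1: matches ordinary linear convolution
n = len(a) + len(b) - 1
linear = np.fft.ifft(np.fft.fft(a, n) * np.fft.fft(b, n)).real
print(np.allclose(linear, np.convolve(a, b)))   # True
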
From stefanv at berkeley.edu Fri Aug 28 20:26:32 2015 From: stefanv at berkeley.edu (=?UTF-8?Q?St=C3=A9fan_van_der_Walt?=) Date: Fri, 28 Aug 2015 17:26:32 -0700 Subject: [Numpy-discussion] Numpy FFT.FFT slow with certain samples In-Reply-To: <55E0F8E9.4020607@gmail.com> References: <55E0A861.6020001@stsci.edu> <87y4gvi6av.fsf@berkeley.edu> <55E0ECC1.8010705@gmail.com> <55E0F8E9.4020607@gmail.com> Message-ID: On Aug 28, 2015 5:17 PM, "Pierre-Andre Noel" wrote: > > I had in mind the use of FFT to do convolutions ( > https://en.wikipedia.org/wiki/Convolution_theorem ). If you do not > zero-pad properly, then the end of the signal may "bleed" on the > beginning, and vice versa. Ah, gotcha! All these things should also be handled nicely in scipy.signal.fftconvolve. St?fan -------------- next part -------------- An HTML attachment was scrubbed... URL: From pelson.pub at gmail.com Sat Aug 29 03:55:41 2015 From: pelson.pub at gmail.com (Phil Elson) Date: Sat, 29 Aug 2015 08:55:41 +0100 Subject: [Numpy-discussion] Numpy helper function for __getitem__? In-Reply-To: References: <1440353282711.d9fa3274@Nodemailer> <1440404602.2051.14.camel@sipsolutions.net> Message-ID: Biggus also has such a function: https://github.com/SciTools/biggus/blob/master/biggus/__init__.py#L2878 It handles newaxis outside of that function in: https://github.com/SciTools/biggus/blob/master/biggus/__init__.py#L537. Again, it only aims to deal with orthogonal array indexing, not numpy fancy indexing. I'd be surprised if Dask.array didn't have a similar function too. HTH On 26 August 2015 at 18:59, Stephan Hoyer wrote: > Indeed, the helper function I wrote for xray was not designed to handle > None/np.newaxis or non-1d Boolean indexers, because those are not valid > indexers for xray objects. I think it could be straightforwardly extended > to handle None simply by not counting them towards the total number of > dimensions. > > On Tue, Aug 25, 2015 at 8:41 AM, Fabien wrote: > >> I think that Stephan's function for xray is very useful. A possible >> improvement (probably at a certain performance cost) would be to be able >> to provide a shape instead of a number of dimensions. The output would >> then be slices with valid start and ends. >> >> Current behavior: >> In[9]: expanded_indexer(slice(None), 2) >> Out[9]: (slice(None, None, None), slice(None, None, None)) >> >> With shape: >> In[9]: expanded_indexer(slice(None), (3, 4)) >> Out[9]: (slice(0, 4, 1), slice(0, 5, 1)) >> >> But if nobody needed something like this before me, I think that I might >> have a design problem in my code (still quite new to python). >> > > Glad you found it helpful! > > Python's slice object has the indices method which implements this logic, > e.g., > > In [15]: s = slice(None, 10) > > In [16]: s.indices(100) > Out[16]: (0, 10, 1) > > Cheers, > Stephan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Sun Aug 30 17:44:39 2015 From: cournape at gmail.com (David Cournapeau) Date: Sun, 30 Aug 2015 22:44:39 +0100 Subject: [Numpy-discussion] Cythonizing some of NumPy Message-ID: Hi there, Reading Nathaniel summary from the numpy dev meeting, it looks like there is a consensus on using cython in numpy for the Python-C interfaces. 
This has been on my radar for a long time: that was one of my rationale for splitting multiarray into multiple "independent" .c files half a decade ago. I took the opportunity of EuroScipy sprints to look back into this, but before looking more into it, I'd like to make sure I am not going astray: 1. The transition has to be gradual 2. The obvious way I can think of allowing cython in multiarray is modifying multiarray such as cython "owns" the PyMODINIT_FUNC and the module PyModuleDef table. 3. We start using cython for the parts that are mostly menial refcount work. Things like functions in calculation.c are obvious candidates. Step 2 should not be disruptive, and does not look like a lot of work: there are < 60 methods in the table, and most of them should be fairly straightforward to cythonize. At worse, we could just keep them as is outside cython and just "export" them in cython. Does that sound like an acceptable plan ? If so, I will start working on a PR to work on 2. David -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Mon Aug 31 00:09:20 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Sun, 30 Aug 2015 21:09:20 -0700 Subject: [Numpy-discussion] np.sign and object comparisons Message-ID: There's been some work going on recently on Py2 vs Py3 object comparisons. If you want all the background, see gh-6265 and follow the links there. There is a half baked PR in the works, gh-6269 , that tries to unify behavior and fix some bugs along the way, by replacing all 2.x uses of PyObject_Compare with several calls to PyObject_RichCompareBool, which is available on 2.6, the oldest Python version we support. The poster child for this example is computing np.sign on an object array that has an np.nan entry. 2.x will just make up an answer for us: >>> cmp(np.nan, 0) -1 even though none of the relevant compares succeeds: >>> np.nan < 0 False >>> np.nan > 0 False >>> np.nan == 0 False The current 3.x is buggy, so the fact that it produces the same made up result as in 2.x is accidental: >>> np.sign(np.array([np.nan], 'O')) array([-1], dtype=object) Looking at the code, it seems that the original intention was for the answer to be `0`, which is equally made up but perhaps makes a little more sense. There are three ways of fixing this that I see: 1. Arbitrarily choose a value to set the return to. This is equivalent to choosing a default return for `cmp` for comparisons. This preserves behavior, but feels wrong. 2. Similarly to how np.sign of a floating point array with nans returns nan for those values, return e,g, None for these cases. This is my preferred option. 3. Raise an error, along the lines of the TypeError: unorderable types that 3.x produces for some comparisons. Thoughts anyone? Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Mon Aug 31 00:12:46 2015 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Mon, 31 Aug 2015 00:12:46 -0400 Subject: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015 In-Reply-To: References: Message-ID: Hi Nathaniel, others, I read the discussion of plans with interest. 
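(On the np.sign question above: in pure Python terms, option 2 would look roughly like the sketch below -- only rich comparisons, and None when none of them succeeds, as happens for nan. This is just an illustration, not the actual ufunc inner loop, and objects that raise on comparison, like unorderable types on 3.x, would still raise, which is option 3.)

def object_sign(x):
    # mirrors PyObject_RichCompareBool(x, 0, Py_LT / Py_GT / Py_EQ)
    if x < 0:
        return -1
    if x > 0:
        return 1
    if x == 0:
        return 0
    return None            # e.g. float('nan'): all three comparisons are False

print(object_sign(-3))            # -1
print(object_sign(float('nan')))  # None
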
One item that struck me is that while there are great plans to have a proper extensible and presumably subclassable dtype, it is discouraged to subclass ndarray itself (rather, it is encouraged to use a broader array interface). From my experience with astropy in both Quantity (an ndarray subclass), Time (a separate class containing high precision times using two ndarray float64), and Table (initially holding structured arrays, but now sets of Columns, which themselves are ndarray subclasses), I'm not convinced the broader, new containers approach is that much preferable. Rather, it leads to a lot of boiler-plate code to reimplement things ndarray does already (since one is effectively just calling the methods on the underlying arrays). I also think the idea that a dtype becomes something that also contains a unit is a bit odd. Shouldn't dtype just be about how data is stored? Why include meta-data such as units? Instead, I think a quantity is most logically seen as numbers with a unit, just like masked arrays are numbers with masks, and variables numbers with uncertainties. Each of these cases adds extra information in different forms, and all are quite easily thought of as subclasses of ndarray where all operations do the normal operation, plus some extra work to keep the extra information up to date. Anyway, my suggestion would be to *encourage* rather than discourage ndarray subclassing, and help this by making ndarray (even) better. All the best, Marten On Thu, Aug 27, 2015 at 11:03 AM, wrote: > > > On Wed, Aug 26, 2015 at 10:06 AM, Travis Oliphant > wrote: > >> >> >> On Wed, Aug 26, 2015 at 1:41 AM, Nathaniel Smith wrote: >> >>> Hi Travis, >>> >>> Thanks for taking the time to write up your thoughts! >>> >>> I have many thoughts in return, but I will try to restrict myself to two >>> main ones :-). >>> >>> 1) On the question of whether work should be directed towards improving >>> NumPy-as-it-is or instead towards a compatibility-breaking replacement: >>> There's plenty of room for debate about whether it's better engineering >>> practice to try and evolve an existing system in place versus starting >>> over, and I guess we have some fundamental disagreements there, but I >>> actually think this debate is a distraction -- we can agree to disagree, >>> because in fact we have to try both. >>> >> >> Yes, on this we agree. I think NumPy can improve *and* we can have new >> innovative array objects. I don't disagree about that. >> >> >>> >>> At a practical level: NumPy *is* going to continue to evolve, because it >>> has users and people interested in evolving it; similarly, dynd and other >>> alternatives libraries will also continue to evolve, because they also have >>> people interested in doing it. And at a normative level, this is a good >>> thing! If NumPy and dynd both get better, than that's awesome: the worst >>> case is that NumPy adds the new features that we talked about at the >>> meeting, and dynd simultaneously becomes so awesome that everyone wants to >>> switch to it, and the result of this would be... that those NumPy features >>> are exactly the ones that will make the transition to dynd easier. Or if >>> some part of that plan goes wrong, then well, NumPy will still be there as >>> a fallback, and in the mean time we've actually fixed the major pain points >>> our users are begging us to fix. 
>>> >>> You seem to be urging us all to make a double-or-nothing wager that your >>> extremely ambitious plans will all work out, with the entire numerical >>> Python ecosystem as the stakes. I think this ambition is awesome, but maybe >>> it'd be wise to hedge our bets a bit? >>> >> >> You are mis-characterizing my view. I think NumPy can evolve (though I >> would personally rather see a bigger change to the underlying system like I >> outlined before). But, I don't believe it can even evolve easily in the >> direction needed without breaking ABI and that insisting on not breaking it >> or even putting too much effort into not breaking it will continue to >> create less-optimal solutions that are harder to maintain and do not take >> advantage of knowledge this community now has. >> >> I'm also very concerned that 'evolving' NumPy will create a situation >> where there are regular semantic and subtle API changes that will cause >> NumPy to be less stable for it's user-base. I've watched this happen. >> This at a time that people are already looking around for new and different >> approaches anyway. >> >> >>> >>> 2) You really emphasize this idea of an ABI-breaking (but not >>> API-breaking) release, and I think this must indicate some basic gap in how >>> we're looking at things. Where I'm getting stuck here is that... I actually >>> can't think of anything important that we can't do now, but could if we >>> were allowed to break ABI compatibility. The kinds of things that break ABI >>> but keep API are like... rearranging what order the fields in a struct fall >>> in, or changing the numeric value of opaque constants like >>> NPY_ARRAY_WRITEABLE. The biggest win I can think of is that we could save a >>> few bytes per array by arranging the fields inside the ndarray struct more >>> optimally, but that's hardly a feature to hang a 2.0 on. You seem to have a >>> vision of this ABI-breaking release as being something very different from >>> that, and I'm not clear on what this vision is. >>> >>> >> We already broke the ABI with date-time changes --- it's still broken for >> a certain percentage of users last I checked. So, part of my >> disagreement is that we've tried this and it didn't work --- even though >> smart people thought it would. I've had to deal with this personally and >> I'm not enthusiastic about having to deal with this for the next 5 years >> because of even more attempts to make changes while not breaking the ABI. >> I think the group is more careful now --- but I still think the API is >> broad enough and uses of NumPy deep enough that the effort involved in >> trying not to break the ABI is just not worth the effort (because it's a >> non-feature today). Adding new dtypes without breaking the ABI is tricky >> (and to do it without breaking the ABI is ugly). I also continue to >> believe that putting out a new ABI-breaking NumPy will allow re-compiling >> *once* (with some porting changes needed) and not subtle breakages >> requiring code-changes every time a release is made. If subtle changes >> aren't made, then the new features won't come. Right now, I'd rather have >> stability from NumPy than new features. New features can come from other >> libraries. >> >> One specific change that could easily be made in NumPy 2.0 (the current >> code but with an ABI change) is that Dtypes should become true type objects >> and array-scalars (which are the current type-objects) should become >> instances of those dtypes. 
That is the biggest clean-up needed, I think on >> the array-front. There should not be *both* array-scalars and dtype >> objects. They are the same thing fundamentally. It was a mistake to >> have both of them. I don't see how to make that change without breaking >> the ABI. Perhaps it could be done in a creative way --- but why put the >> effort into that and end up with an even more hacky code-base. >> >> NumPy's ABI was influenced by and evolved from Numeric and Numarray. It >> was not "designed" to last 30 years. >> >> I think the dtype "types" should potentially have different >> member-structures. The ufunc sub-system needs an overhaul --- it's >> member structures need upgrades. With generalized ufuncs and the >> iteration protocols of Mark Wiebe we know a whole lot more about ufuncs >> now. Ufuncs are the same 1995 structure that Jim Hugunin wrote. I >> suppose you *could* just tack new functions on the end of structure and >> keep growing the list (while leaving old, unused structures as unused or >> deprecated) --- or you can take the opportunity to tidy up a bit. The >> longer you leave everything the same, the harder you make the code-base and >> the more costly maintenance becomes. I just don't see the value there >> --- and I see a lot of pain. >> >> Regarding the ufunc subsystem. We've argued before about the lack of >> mulit-methods in NumPy. Continuing to add dunder-methods to try and get >> around it will continue to make the system harder to maintain and more >> brittle. >> >> You mention making NumPy an interface to multiple things along with many >> other ideas. I don't believe you can get there without real changes that >> break things (at the very least semantic changes). I'm not excited about >> those changes causing instability (which they will cause ---- to me the >> burden of proof that they won't is on you who wants to make the change and >> not on me to say how they will). I also think it will take much >> longer to get there incrementally (if at all) than just creating something >> on top of newer ideas. >> >> >> >>> The main reason I personally am against having a big ABI-breaking >>> release is not that I hate ABI breakage a priori, it's that all the big >>> features that I care about and the are users are asking for seem to be ones >>> that... don't actually require doing that. At most they seem to get a mild >>> benefit from breaking some obscure corner cases. So the cost/benefits don't >>> make any sense to me. >>> >>> So: can you give a concrete example of a change you have in mind where >>> breaking ABI would be the key enabler? >>> >>> (I guess you might also be thinking of a separate issue that you sort of >>> allude to: Perhaps we will try to make changes which we think don't involve >>> breaking the ABI, but discover too late that we have failed to fully >>> understand the implications and have broken it by mistake. IIUC this is >>> what happened in the 1.4 timeframe when datetime64 was merged and >>> accidentally renumbered some of the NPY_* constants. >>> >> >> Yes, this is what I'm mainly worried about. But, more than that, I'm >> concerned about general *semantic* and API changes at a rapid pace for a >> community that is just looking for stability and bug-fixes from NumPy >> itself --- with innovation happening elsewhere. 
>> >> >>> Partially I am less worried about this because I have a fair amount of >>> confidence that our review and QA process has improved these days to the >>> point that we would not let a change like that slip through by accident -- >>> we have a lot more active reviewers, people are sensitized to the issues, >>> we've successfully landed intrusive changes like Sebastian's indexing >>> rewrite, ... though this is very much second-hand impressions on my part, >>> and I'd welcome input from folks like Chuck who have a clearer view on how >>> things have changed from then to now. >>> >>> But more importantly, even if this is true, then I can't see how your >>> proposal helps. If we aren't good enough at our jobs to predict when we'll >>> break ABI, then by assumption it makes no sense to pick one release and >>> decide that this is the one time that we'll break ABI.) >>> >> >> I don't understand your point. Picking a release to break the ABI >> allows you to actually do things like change macros to functions and move >> structures around to be more consistent with a new design that is easier to >> maintain and allows more growth. It has nothing to do with "whether you >> are good at your job". Everyone has strengths and weaknesses. >> >> This kind of clean-up may be needed regularly --- every 3 years would not >> be a crazy pattern, but it could also be every 5 years if you wanted more >> discipline. I already knew we needed to break the ABI "soonish" when I >> released NumPy 1.0. The fact that we haven't officially done it yet (but >> have done it unofficially) is a great injustice to "what could be" and has >> slowed development of NumPy tremendously. >> >> We've gone back and forth on this. I'm fine if we disagree, but I just >> hope the disagreement doesn't lead to lack of cooperation as we both have >> the same ultimate interests in seeing array-computing in Python improve. >> I just don't support *major* changes without breaking the ABI without a >> whole lot of proof that it is possible (without hackiness). You have >> mentioned on your roadmap a lot of what I would consider *major* changes. >> Some of it you describe how to get there. The most important change >> (improving the dtype system) you don't. >> >> Part of my point is that we now *know* how to improve the dtype system. >> Let's do it. Let's not try "yet again" to do it differently inside an old >> system designed by a scientist who didn't understand type-theory or type >> systems (that was me by the way). Look at data-shape in the blaze >> project. Take that and build a Python type-system that also outputs >> struct-string syntax for memory-views. That's the data-description system >> that NumPy should be using --- not trying to hack on a mixed array-scalar, >> dtype-object system that may never support everything we now know is >> needed. >> >> Trying to incrementing from where we are now will only lead to a >> sub-optimal outcome and unfortunate instability when we already know what >> to do differently. I doubt I will convince you --- certainly not via >> email. I apologize in advance that I likely won't be able to respond in >> depth to any more questions that are really just "prove to me that I can't" >> kind of questions. Of course I can't prove that. All I'm saying is that >> to me the evidence and my experience leads me to not be able to support >> major changes like you have proposed without also intentionally breaking >> the ABI (and thus calling it NumPy 2.0). 
>> >> If I find time to write, I will try to use it to outline more >> specifically what I think is a better approach to array- and >> table-computing in Python that keeps the stability of NumPy and adds new >> features using different approaches. >> >> -Travis >> >> > > From my perspective the incremental evolutionary approach in numpy (and > scipy) in the last few years has worked quite well, and I'm optimistic that > it will work in future if the developers can pull it off. > > The main changes that I remember that needed adjustment in scipy (as > observer) or statsmodels (as maintainer) came from becoming more strict in > several cases. This mainly affects corner cases or cases where the > downstream code wasn't "clean". Some API breaking (with deprecation) and > some semantic changes are still needed independent of any big changes that > may or may not be arriving anytime soon. > > This way we get improvements in a core library with the requirement that > every once in a while we need to adjust our code. (And with the occasional > unintended side effect where test coverage is not enough.) > The advantage is that we are getting the improvements with the regular > release cycles, and they keep numpy alive and competitive for another 10 > years or more. In the meantime, other packages like pandas can cater and > expand to other use cases, or other packages can develop generic arrays and > out of core and distributed arrays. > > I'm partially following some of the Julia mailing lists. Starting > something from scratch is a lot of work, and my guess is that similar > approaches in python will take some time to become mainstream. In the > meantime we can build something on an improving numpy. > > --- > The only thing I'm not so happy about in the last years is the > proliferation of object arrays, both in numpy code and in pandas. And I > hope that the (dtype) proposals help to get rid of some of those object > arrays. > > > Josef > > >> >> >> >> >>> >>> On Tue, Aug 25, 2015 at 12:00 PM, Travis Oliphant >>> wrote: >>> >>>> Thanks for the write-up Nathaniel. There is a lot of great detail and >>>> interesting ideas here. >>>> >>>> I've am very eager to understand how to help NumPy and the wider >>>> community move forward however I can (my passions on this have not changed >>>> since 1999, though what I myself spend time on has changed). >>>> >>>> There are a lot of ways to think about approaching this, though. It's >>>> hard to get all the ideas on the table, and it was unfortunate we couldn't >>>> get everybody wyho are core NumPy devs together in person to have this >>>> discussion as there are still a lot of questions unanswered and a lot of >>>> thought that has gone into other approaches that was not brought up or >>>> represented in the meeting (how does Numba fit into this, what about >>>> data-shape, dynd, memory-views and Python type system, etc.). If NumPy >>>> becomes just an interface-specification, then why don't we just do that >>>> *outside* NumPy itself in a way that doesn't jeopardize the stability of >>>> NumPy today. These are some of the real questions I have. I will try >>>> to write up my thoughts in more depth soon, but I won't be able to respond >>>> in-depth right now. I just wanted to comment because Nathaniel said I >>>> disagree which is only partly true. 
>>>> >>>> The three most important things for me are 1) let's make sure we have >>>> representation from as wide of the community as possible (this is really >>>> hard), 2) let's look around at the broader community and the prior art that >>>> is happening in this space right now and 3) let's not pretend we are going >>>> to be able to make all this happen without breaking ABI compatibility. >>>> Let's just break ABI compatibility with NumPy 2.0 *and* have as much >>>> fidelity with the API and semantics of current NumPy as possible (though >>>> there will be some changes necessary long-term). >>>> >>>> I don't think we should intentionally break ABI if we can avoid it, but >>>> I also don't think we should spend in-ordinate amounts of time trying to >>>> pretend that we won't break ABI (for at least some people), and most >>>> importantly we should not pretend *not* to break the ABI when we actually >>>> do. We did this once before with the roll-out of date-time, and it was >>>> really un-necessary. When I released NumPy 1.0, there were several >>>> things that I knew should be fixed very soon (NumPy was never designed to >>>> not break ABI). Those problems are still there. Now, that we have >>>> quite a bit better understanding of what NumPy *should* be (there have been >>>> tremendous strides in understanding and community size over the past 10 >>>> years), let's actually make the infrastructure we think will last for the >>>> next 20 years (instead of trying to shoe-horn new ideas into a 20-year old >>>> code-base that wasn't designed for it). >>>> >>>> NumPy is a hard code-base. It has been since Numeric days in 1995. >>>> I could be wrong, but my guess is that we will be passed by as a community >>>> if we don't seize the opportunity to build something better than we can >>>> build if we are forced to use a 20 year old code-base. >>>> >>>> It is more important to not break people's code and to be clear when a >>>> re-compile is necessary for dependencies. Those to me are the most >>>> important constraints. There are a lot of great ideas that we all have >>>> about what we want NumPy to be able to do. Some of this are pretty >>>> transformational (and the more exciting they are, the harder I think they >>>> are going to be to implement without breaking at least the ABI). There >>>> is probably some CAP-like theorem around >>>> Stability-Features-Speed-of-Development (pick 2) when it comes to Open >>>> Source Software development and making feature-progress with NumPy *is >>>> going* to create in-stability which concerns me. >>>> >>>> I would like to see a little-bit-of-pain one time with a NumPy 2.0, >>>> rather than a constant pain because of constant churn over many years >>>> approach that Nathaniel seems to advocate. To me NumPy 2.0 is an >>>> ABI-breaking release that is as API-compatible as possible and whose >>>> semantics are not dramatically different. >>>> >>>> There are at least 3 areas of compatibility (ABI, API, and semantic). >>>> ABI-compatibility is a non-feature in today's world. There are so many >>>> distributions of the NumPy stack (and conda makes it trivial for anyone to >>>> build their own or for you to build one yourself). Making less-optimal >>>> software-engineering choices because of fear of breaking the ABI is not >>>> something I'm supportive of at all. We should not break ABI every >>>> release, but a release every 3 years that breaks ABI is not a problem. 
>>>> >>>> API compatibility should be much more sacrosanct, but it is also >>>> something that can also be managed. Any NumPy 2.0 should definitely >>>> support the full NumPy API (though there could be deprecated swaths). I >>>> think the community has done well in using deprecation and limiting the >>>> public API to make this more manageable and I would love to see a NumPy 2.0 >>>> that solidifies a future-oriented API along with a back-ward compatible API >>>> that is also available. >>>> >>>> Semantic compatibility is the hardest. We have already broken this on >>>> multiple occasions throughout the 1.x NumPy releases. Every time you >>>> change the code, this can change. This is what I fear causing deep >>>> instability over the course of many years. These are things like the >>>> casting rule details, the effect of indexing changes, any change to the >>>> calculations approaches. It is and has been the most at risk during any >>>> code-changes. My view is that a NumPy 2.0 (with a new low-level >>>> architecture) minimizes these changes to a single release rather than >>>> unavoidably spreading them out over many, many releases. >>>> >>>> I think that summarizes my main concerns. I will write-up more forward >>>> thinking ideas for what else is possible in the coming weeks. In the mean >>>> time, thanks for keeping the discussion going. It is extremely exciting to >>>> see the help people have continued to provide to maintain and improve >>>> NumPy. It will be exciting to see what the next few years bring as well. >>>> >>>> >>>> Best, >>>> >>>> -Travis >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Tue, Aug 25, 2015 at 5:03 AM, Nathaniel Smith wrote: >>>> >>>>> Hi all, >>>>> >>>>> These are the notes from the NumPy dev meeting held July 7, 2015, at >>>>> the SciPy conference in Austin, presented here so the list can keep up >>>>> with what happens, and so you can give feedback. Please do give >>>>> feedback, none of this is final! >>>>> >>>>> (Also, if anyone who was there notices anything I left out or >>>>> mischaracterized, please speak up -- these are a lot of notes I'm >>>>> trying to gather together, so I could easily have missed something!) >>>>> >>>>> Thanks to Jill Cowan and the rest of the SciPy organizers for donating >>>>> space and organizing logistics for us, and to the Berkeley Institute >>>>> for Data Science for funding travel for Jaime, Nathaniel, and >>>>> Sebastian. >>>>> >>>>> >>>>> Attendees >>>>> ========= >>>>> >>>>> Present in the room for all or part: Daniel Allan, Chris Barker, >>>>> Sebastian Berg, Thomas Caswell, Jeff Reback, Jaime Fern?ndez del >>>>> R?o, Chuck Harris, Nathaniel Smith, St?fan van der Walt. (Note: I'm >>>>> pretty sure this list is incomplete) >>>>> >>>>> Joining remotely for all or part: Stephan Hoyer, Julian Taylor. >>>>> >>>>> >>>>> Formalizing our governance/decision making >>>>> ========================================== >>>>> >>>>> This was a major focus of discussion. At a high level, the consensus >>>>> was to steal IPython's governance document ("IPEP 29") and modify it >>>>> to remove its use of a BDFL as a "backstop" to normal community >>>>> consensus-based decision, and replace it with a new "backstop" based >>>>> on Apache-project-style consensus voting amongst the core team. >>>>> >>>>> I'll send out a proper draft of this shortly for further discussion. 
>>>>> >>>>> >>>>> Development roadmap >>>>> =================== >>>>> >>>>> General consensus: >>>>> >>>>> Let's assume NumPy is going to remain important indefinitely, and >>>>> try to make it better, instead of waiting for something better to >>>>> come along. (This is unlikely to be wasted effort even if something >>>>> better does come along, and it's hardly a sure thing that that will >>>>> happen anyway.) >>>>> >>>>> Let's focus on evolving numpy as far as we can without major >>>>> break-the-world changes (no "numpy 2.0", at least in the foreseeable >>>>> future). >>>>> >>>>> And, as a target for that evolution, let's change our focus from >>>>> numpy as "NumPy is the library that gives you the np.ndarray object >>>>> (plus some attached infrastructure)", to "NumPy provides the >>>>> standard framework for working with arrays and array-like objects in >>>>> Python" >>>>> >>>>> This means, creating defined interfaces between array-like objects / >>>>> ufunc objects / dtype objects, so that it becomes possible for third >>>>> parties to add their own and mix-and-match. Right now ufuncs are >>>>> pretty good at this, but if you want a new array class or dtype then >>>>> in most cases you pretty much have to modify numpy itself. >>>>> >>>>> Vision: instead of everyone who wants a new container type having to >>>>> reimplement all of numpy, Alice can implement an array class using >>>>> (sparse / distributed / compressed / tiled / gpu / out-of-core / >>>>> delayed / ...) storage, pass it to code that was written using >>>>> direct calls to np.* functions, and it just works. (Instead of >>>>> np.sin being "the way you calculate the sine of an ndarray", it's >>>>> "the way you calculate the sine of any array-like container >>>>> object".) >>>>> >>>>> Vision: Darryl can implement a new dtype for (categorical data / >>>>> astronomical dates / integers-with-missing-values / ...) without >>>>> having to touch the numpy core. >>>>> >>>>> Vision: Chandni can then come along and combine them by doing >>>>> >>>>> a = alice_array([...], dtype=darryl_dtype) >>>>> >>>>> and it just works. >>>>> >>>>> Vision: no-one is tempted to subclass ndarray, because anything you >>>>> can do with an ndarray subclass you can also easily do by defining >>>>> your own new class that implements the "array protocol". >>>>> >>>>> >>>>> Supporting third-party array types >>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>>>> >>>>> Sub-goals: >>>>> - Get __numpy_ufunc__ done, which will cover a good chunk of numpy's >>>>> API right there. >>>>> - Go through the rest of the stuff in numpy, and figure out some >>>>> story for how to let it handle third-party array classes: >>>>> - ufunc ALL the things: Some things can be converted directly into >>>>> (g)ufuncs and then use __numpy_ufunc__ (e.g., np.std); some >>>>> things could be converted into (g)ufuncs if we extended the >>>>> (g)ufunc interface a bit (e.g. np.sort, np.matmul). >>>>> - Some things probably need their own __numpy_ufunc__-like >>>>> extensions (__numpy_concatenate__?) >>>>> - Provide tools to make it easier to implement the more complicated >>>>> parts of an array object (e.g. the bazillion different methods, >>>>> many of which are ufuncs in disguise, or indexing) >>>>> - Longer-run interesting research project: __numpy_ufunc__ requires >>>>> that one or the other object have explicit knowledge of how to >>>>> handle the other, so to handle binary ufuncs with N array types >>>>> you need something like N**2 __numpy_ufunc__ code paths. 
As an >>>>> alternative, if there were some interface that an object could >>>>> export that provided the operations nditer needs to efficiently >>>>> iterate over (chunks of) it, then you would only need N >>>>> implementations of this interface to handle all N**2 operations. >>>>> >>>>> This would solve a lot of problems for projects like: >>>>> - blosc >>>>> - dask >>>>> - distarray >>>>> - numpy.ma >>>>> - pandas >>>>> - scipy.sparse >>>>> - xray >>>>> >>>>> >>>>> Supporting third-party dtypes >>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>>>> >>>>> We already have something like a C level "dtype >>>>> protocol". Conceptually, the way you define a new dtype is by >>>>> defining a new class whose instances have data attributes defining >>>>> the parameters of the dtype (what fields are in *this* record dtype, >>>>> how many characters are in *this* string dtype, what units are used >>>>> for *this* datetime64, etc.), and you define a bunch of methods to >>>>> do things like convert an object from a Python object to your dtype >>>>> or vice-versa, to copy an array of your dtype from one place to >>>>> another, to cast to and from your new dtype, etc. This part is >>>>> great. >>>>> >>>>> The problem is, in the current implementation, we don't actually use >>>>> the Python object system to define these classes / attributes / >>>>> methods. Instead, all possible dtypes are jammed into a single >>>>> Python-level class, whose struct has fields for the union of all >>>>> possible dtype's attributes, and instead of Python-style method >>>>> slots there's just a big table of function pointers attached to each >>>>> object. >>>>> >>>>> So the main proposal is that we keep the basic design, but switch it >>>>> so that the float64 dtype, the int64 dtype, etc. actually literally >>>>> are subclasses of np.dtype, each implementing their own fields and >>>>> Python-style methods. >>>>> >>>>> Some of the pieces involved in doing this: >>>>> >>>>> - The current dtype methods should be cleaned up -- e.g. 'dot' and >>>>> 'less_than' are both dtype methods, when conceptually they're much >>>>> more like ufuncs. >>>>> >>>>> - The ufunc inner-loop interface currently does not get a reference >>>>> to the dtype object, so they can't see its attributes and this is >>>>> a big obstacle to many interesting dtypes (e.g., it's hard to >>>>> implement np.equal for categoricals if you don't know what >>>>> categories each has). So we need to add new arguments to the core >>>>> ufunc loop signature. (Fortunately this can be done in a >>>>> backwards-compatible way.) >>>>> >>>>> - We need to figure out what exactly the dtype methods should be, >>>>> and add them to the dtype class (possibly with backwards >>>>> compatibility shims for anyone who is accessing PyArray_ArrFuncs >>>>> directly). >>>>> >>>>> - Casting will be possibly the trickiest thing to work out, though >>>>> the basic idea of using dunder-dispatch-like __cast__ and >>>>> __rcast__ methods seems workable. (Encouragingly, this is also >>>>> exactly what dynd also does, though unfortunately dynd does not >>>>> yet support user-defined dtypes even to the extent that numpy >>>>> does, so there isn't much else we can steal from them.) >>>>> - We may also want to rethink the casting rules while we're at it, >>>>> since they have some very weird corners right now (e.g. 
see >>>>> [https://github.com/numpy/numpy/issues/6240]) >>>>> >>>>> - We need to migrate the current dtypes over to the new system, >>>>> which can be done in stages: >>>>> >>>>> - First stick them all in a single "legacy dtype" class whose >>>>> methods just dispatch to the PyArray_ArrFuncs per-object "method >>>>> table" >>>>> >>>>> - Then move each of them into their own classes >>>>> >>>>> - We should provide a Python-level wrapper for the protocol, so that >>>>> you can call dtype methods from Python >>>>> >>>>> - And vice-versa, it should be possible to subclass dtype at the >>>>> Python level >>>>> >>>>> - etc. >>>>> >>>>> Fortunately, AFAICT pretty much all of this can be done while >>>>> maintaining backwards compatibility (though we may want to break >>>>> some obscure cases to avoid expending *too* much effort with weird >>>>> backcompat contortions that will only help a vanishingly small >>>>> proportion of the userbase), and a lot of the above changes can be >>>>> done as semi-independent mini-projects, so there's no need for some >>>>> branch to go off and spend a year rewriting the world. >>>>> >>>>> Obviously there are still a lot of details to work out, though. But >>>>> overall, there was widespread agreement that this is one of the #1 >>>>> pain points for our users (e.g. it's the single main request from >>>>> pandas), and fixing it is very high priority. >>>>> >>>>> Some features that would become straightforward to implement >>>>> (e.g. even in third-party libraries) if this were fixed: >>>>> - missing value support >>>>> - physical unit tracking (meters / seconds -> array of velocity; >>>>> meters + seconds -> error) >>>>> - better and more diverse datetime representations (e.g. datetimes >>>>> with attached timezones, or using funky geophysical or >>>>> astronomical calendars) >>>>> - categorical data >>>>> - variable length strings >>>>> - strings-with-encodings (e.g. latin1) >>>>> - forward mode automatic differentiation (write a function that >>>>> computes f(x) where x is an array of float64; pass that function >>>>> an array with a special dtype and get out both f(x) and f'(x)) >>>>> - probably others I'm forgetting right now >>>>> >>>>> I should also note that there was one substantial objection to this >>>>> plan, from Travis Oliphant (in discussions later in the >>>>> conference). I'm not confident I understand his objections well >>>>> enough to reproduce them here, though -- perhaps he'll elaborate. >>>>> >>>>> >>>>> Money >>>>> ===== >>>>> >>>>> There was an extensive discussion on the topic of: "if we had money, >>>>> what would we do with it?" >>>>> >>>>> This is partially motivated by the realization that there are a >>>>> number of sources that we could probably get money from, if we had a >>>>> good story for what we wanted to do, so it's not just an idle >>>>> question. >>>>> >>>>> Points of general agreement: >>>>> >>>>> - Doing the in-person meeting was a good thing. We should plan do >>>>> that again, at least once a year. So one thing to spend money on >>>>> is travel subsidies to make sure that happens and is productive. >>>>> >>>>> - While it's tempting to imagine hiring junior people for the more >>>>> frustrating/boring work like maintaining buildbots, release >>>>> infrastructure, updating docs, etc., this seems difficult to do >>>>> realistically with our current resources -- how do we hire for >>>>> this, who would manage them, etc.? 
>>>>> >>>>> - On the other hand, the general feeling was that if we found the >>>>> money to hire a few more senior people who could take care of >>>>> themselves more, then that would be good and we could >>>>> realistically absorb that extra work without totally unbalancing >>>>> the project. >>>>> >>>>> - A major open question is how we would recruit someone for a >>>>> position like this, since apparently all the obvious candidates >>>>> who are already active on the NumPy team already have other >>>>> things going on. [For calibration on how hard this can be: NYU >>>>> has apparently had an open position for a year with the job >>>>> description of "come work at NYU full-time with a >>>>> private-industry-competitive-salary on whatever your personal >>>>> open-source scientific project is" (!) and still is having an >>>>> extremely difficult time filling it: >>>>> [http://cds.nyu.edu/research-engineer/]] >>>>> >>>>> - General consensus though was that there isn't much to be done >>>>> about this, except try it and see. >>>>> >>>>> - (By the way, if you're someone who's reading this and >>>>> potentially interested in like a postdoc or better working on >>>>> numpy, then let's talk...) >>>>> >>>>> >>>>> More specific changes to numpy that had general consensus, but don't >>>>> really fit into a high-level roadmap >>>>> >>>>> ========================================================================================================= >>>>> >>>>> - Resolved: we should merge multiarray.so and umath.so into a single >>>>> extension module, so that they can share utility code without the >>>>> current awkward contortions. >>>>> >>>>> - Resolved: we should start hiding new fields in the ufunc and dtype >>>>> structs as soon as possible going forward. (I.e. they would not be >>>>> present in the version of the structs that are exposed through the >>>>> C API, but internally we would use a more detailed struct.) >>>>> - Mayyyyyybe we should even go ahead and hide the subset of the >>>>> existing fields that are really internal details that no-one >>>>> should be using. If we did this without changing anything else >>>>> then it would preserve ABI (the fields would still be where >>>>> existing compiled extensions expect them to be, if any such >>>>> extensions exist) while breaking API (trying to compile such >>>>> extensions would give a clear error), so would be a smoother >>>>> ramp if we think we need to eventually break those fields for >>>>> real. (As discussed above, there are a bunch of fields in the >>>>> dtype base class that only make sense for specific dtype >>>>> subclasses, e.g. only record dtypes need a list of field names, >>>>> but right now all dtypes have one anyway. So it would be nice to >>>>> remove these from the base class entirely, but that is >>>>> potentially ABI-breaking.) >>>>> >>>>> - Resolved: np.array should never return an object array unless >>>>> explicitly requested (e.g. with dtype=object); it just causes too >>>>> many surprising problems. >>>>> - First step: add a deprecation warning >>>>> - Eventually: make it an error. >>>>> >>>>> - The matrix class >>>>> - Resolved: We won't add warnings yet, but we will prominently >>>>> document that it is deprecated and should be avoided wherever >>>>> possible. >>>>> - Stéfan van der Walt volunteers to do this.
>>>>> - We'd all like to deprecate it properly, but the feeling was that >>>>> the precondition for this is for scipy.sparse to provide sparse >>>>> "arrays" that don't return np.matrix objects on ordinary >>>>> operations. Until that happens we can't reasonably tell people >>>>> that using np.matrix is a bug. >>>>> >>>>> - Resolved: we should add a similar prominent note to the >>>>> "subclassing ndarray" documentation, warning people that this is >>>>> painful and barely works and please don't do it if you have any >>>>> alternatives. >>>>> >>>>> - Resolved: we want more, smaller releases -- every 6 months at >>>>> least, aiming to go even faster (every 4 months?) >>>>> >>>>> - On the question of using Cython inside numpy core: >>>>> - Everyone agrees that there are places where this would be an >>>>> improvement (e.g., Python<->C interfaces, and places "when you >>>>> want to do computer science", e.g. complicated algorithmic stuff >>>>> like graph traversals) >>>>> - Chuck wanted it to be clear though that he doesn't think it >>>>> would be a good goal to try and rewrite all of numpy in Cython >>>>> -- there also exist places where Cython ends up being "an uglier >>>>> version of C". No-one disagreed. >>>>> >>>>> - Our text reader is apparently not very functional on Python 3, and >>>>> generally slow and hard to work with. >>>>> - Resolved: We should extract Pandas's awesome text reader/parser >>>>> and convert it into its own package, that could then become a >>>>> new backend for both pandas and numpy.loadtxt. >>>>> - Jeff thinks this is a great idea >>>>> - Thomas Caswell volunteers to do the extraction. >>>>> >>>>> - We should work on improving our tools for evolving the ABI, so >>>>> that we will eventually be less constrained by decisions made >>>>> decades ago. >>>>> - One idea that had a lot of support was to switch from our >>>>> current append-only C-API to a "sliding window" API based on >>>>> explicit versions. So a downstream package might say >>>>> >>>>> #define NUMPY_API_VERSION 4 >>>>> >>>>> and they'd get the functions and behaviour provided in "version >>>>> 4" of the numpy C api. If they wanted to get access to new stuff >>>>> that was added in version 5, then they'd need to switch that >>>>> #define, and at the same time clean up any usage of stuff that >>>>> was removed or changed in version 5. And to provide a smooth >>>>> migration path, one version of numpy would support multiple >>>>> versions at once, gradually deprecating and dropping old >>>>> versions. >>>>> >>>>> - If anyone wants to help bring pip up to scratch WRT tracking ABI >>>>> dependencies (e.g., 'pip install numpy==' >>>>> -> triggers rebuild of scipy against the new ABI), then that >>>>> would be an extremely useful thing. >>>>> >>>>> >>>>> Policies that should be documented >>>>> ================================== >>>>> >>>>> ...together with some notes about what the contents of the document >>>>> should be: >>>>> >>>>> >>>>> How we manage bugs in the bug tracker. >>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>>>> >>>>> - Github "milestones" should *only* be assigned to release-blocker >>>>> bugs (which mostly means "regression from the last release"). >>>>> >>>>> In particular, if you're tempted to push a bug forward to the next >>>>> release... then it's clearly not a blocker, so don't set it to the >>>>> next release's milestone, just remove the milestone entirely.
>>>>> >>>>> (Obvious exception to this: deprecation followup bugs where we >>>>> decide that we want to keep the deprecation around a bit longer >>>>> are a case where a bug actually does switch from being a blocker >>>>> for release 1.x to being a blocker for release 1.(x+1).) >>>>> >>>>> - Don't hesitate to close an issue if there's no way forward -- >>>>> e.g. a PR where the author has disappeared. Just post a link to >>>>> this policy and close, with a polite note that we need to keep our >>>>> tracker useful as a todo list, but they're welcome to re-open if >>>>> things change. >>>>> >>>>> >>>>> Deprecations and breakage policy: >>>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>>>> >>>>> - How long do we need to keep DeprecationWarnings around before we >>>>> break things? This is tricky because on the one hand an aggressive >>>>> (short) deprecation period lets us deliver new features and >>>>> important cleanups more quickly, but on the other hand a >>>>> too-aggressive deprecation period is difficult for our more >>>>> conservative downstream users. >>>>> >>>>> - Idea that had the most support: pick a somewhat-aggressive >>>>> warning period as our default, and make a rule that if someone >>>>> asks for an extension during the beta cycle for the release that >>>>> removes it, then we put it back for another release or two worth >>>>> of grace period. (While also possibly upgrading the warning to >>>>> be more visible during the grace period.) This gives us >>>>> deprecation periods that are more adaptive on a case-by-case >>>>> basis. >>>>> >>>>> - Lament: it would be really nice if we could get more people to >>>>> test our beta releases, because in practice right now 1.x.0 ends >>>>> up being where we actually discover all the bugs, and 1.x.1 is >>>>> where it actually becomes usable. Which sucks, and makes it >>>>> difficult to have a solid policy about what counts as a >>>>> regression, etc. Is there anything we can do about this? >>>>> >>>>> - ABI breakage: we distinguish between an ABI break that breaks >>>>> everything (e.g., "import scipy" segfaults), versus an ABI break >>>>> that breaks an occasional rare case (e.g., only apps that poke >>>>> around in some obscure corner of some struct are affected). >>>>> >>>>> - The "break-the-world" type remains off-limits for now: the pain >>>>> is still too large (conda helps, but there are lots of people >>>>> who don't use conda!), and there aren't really any compelling >>>>> improvements that this would enable anyway. >>>>> >>>>> - For the "break-0.1%-of-users" type, it is *not* ruled out by >>>>> fiat, though we remain conservative: we should treat it like >>>>> other API breaks in principle, and do a careful case-by-case >>>>> analysis of the details of the situation, taking into account >>>>> what kind of code would be broken, how common these cases are, >>>>> how important the benefits are, whether there are any specific >>>>> mitigation strategies we can use, etc. -- with this process of >>>>> course taking into account that a segfault is nastier than a >>>>> Python exception. >>>>> >>>>> >>>>> Other points that were discussed >>>>> ================================ >>>>> >>>>> - There was inconclusive discussion of what we should do with dot() >>>>> in the places where it disagrees with the PEP 465 matmul semantics >>>>> (specifically this is when both arguments have ndim >= 3, or one >>>>> argument has ndim == 0).
>>>>> - The concern is that the current behavior is not very useful, and >>>>> as far as we can tell no-one is using it; but, as people get >>>>> used to the more-useful PEP 465 behavior, they will increasingly >>>>> try to use it on the assumption that np.dot will work the same >>>>> way, and this will create pain for lots of people. So Nathaniel >>>>> argued that we should start at least issuing a visible warning >>>>> when people invoke the corner-case behavior. >>>>> - But OTOH, np.dot is such a core piece of infrastructure, and >>>>> there's such a large landscape of code out there using numpy >>>>> that we can't see, that others were reasonably wary of making >>>>> any change. >>>>> - For now: document prominently, but no change in behavior. >>>>> >>>>> >>>>> Links to raw notes >>>>> ================== >>>>> >>>>> Main page: >>>>> [https://github.com/numpy/numpy/wiki/SciPy-2015-developer-meeting] >>>>> >>>>> Notes from the meeting proper: >>>>> [ >>>>> https://docs.google.com/document/d/1IJcYdsHtk8MVAM4AZqFDBSf_nVG-mrB4Tv2bh9u1g4Y/edit?usp=sharing >>>>> ] >>>>> >>>>> Slides from the followup BoF: >>>>> [ >>>>> https://gist.github.com/njsmith/eb42762054c88e810786/raw/b74f978ce10a972831c582485c80fb5b8e68183b/future-of-numpy-bof.odp >>>>> ] >>>>> >>>>> Notes from the followup BoF: >>>>> [ >>>>> https://docs.google.com/document/d/11AuTPms5dIPo04JaBOWEoebXfk-tUzEZ-CvFnLIt33w/edit >>>>> ] >>>>> >>>>> -n >>>>> >>>>> -- >>>>> Nathaniel J. Smith -- http://vorpus.org >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion From sebastian at sipsolutions.net Mon Aug 31 04:23:15 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 31 Aug 2015 10:23:15 +0200 Subject: [Numpy-discussion] np.sign and object comparisons In-Reply-To: References: Message-ID: <1441009395.1716.14.camel@sipsolutions.net> On So, 2015-08-30 at 21:09 -0700, Jaime Fernández del Río wrote: > There's been some work going on recently on Py2 vs Py3 object > comparisons. If you want all the background, see gh-6265 and follow > the links there.
> > > There is a half-baked PR in the works, gh-6269, that tries to unify > behavior and fix some bugs along the way, by replacing all 2.x uses of > PyObject_Compare with several calls to PyObject_RichCompareBool, which > is available on 2.6, the oldest Python version we support. > > > The poster child for this example is computing np.sign on an object > array that has an np.nan entry. 2.x will just make up an answer for > us: > > > >>> cmp(np.nan, 0) > -1 > > > even though none of the relevant compares succeeds: > > > >>> np.nan < 0 > False > >>> np.nan > 0 > > False > >>> np.nan == 0 > > False > > > The current 3.x is buggy, so the fact that it produces the same made > up result as in 2.x is accidental: > > > >>> np.sign(np.array([np.nan], 'O')) > array([-1], dtype=object) > > > Looking at the code, it seems that the original intention was for the > answer to be `0`, which is equally made up but perhaps makes a little > more sense. > > > There are three ways of fixing this that I see: > 1. Arbitrarily choose a value to set the return to. This is > equivalent to choosing a default return for `cmp` for > comparisons. This preserves behavior, but feels wrong. > 2. Similarly to how np.sign of a floating point array with nans > returns nan for those values, return e.g. None for these > cases. This is my preferred option. That would be my gut feeling as well. Returning `NaN` could also make sense, but I guess we run into problems since we do not know the input type. So `None` seems like the only option here I can think of right now. - Sebastian > 3. Raise an error, along the lines of the TypeError: unorderable > types that 3.x produces for some comparisons. > Thoughts anyone? > > > Jaime > -- > > (\__/) > ( O.o) > ( > <) This is Conejo. Copy Conejo into your signature and help him in > his plans for world domination. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From shoyer at gmail.com Mon Aug 31 13:23:10 2015 From: shoyer at gmail.com (Stephan Hoyer) Date: Mon, 31 Aug 2015 10:23:10 -0700 Subject: [Numpy-discussion] np.sign and object comparisons In-Reply-To: <1441009395.1716.14.camel@sipsolutions.net> References: <1441009395.1716.14.camel@sipsolutions.net> Message-ID: On Mon, Aug 31, 2015 at 1:23 AM, Sebastian Berg wrote: > That would be my gut feeling as well. Returning `NaN` could also make > sense, but I guess we run into problems since we do not know the input > type. So `None` seems like the only option here I can think of right > now. > My inclination is that returning NaN would be the appropriate choice. It's certainly consistent with the behavior for float dtypes -- my expectation for object dtype behavior is that it works exactly like applying the np.sign ufunc to each element of the array individually. On the other hand, I suppose there are other ways in which an object can fail all those comparisons (e.g., NaT?), so I suppose we could return None. But it would still be a weird outcome for the most common case. Ideally, I suppose, np.sign would return an array with int-NA dtype, but that's a whole different can of worms... Stephan
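To make the trade-off concrete, here is a minimal pure-Python sketch of option 2 above -- use only rich comparisons and fall back to None when all of them fail -- where the helper name object_sign and the driver loop are illustrative only, not the proposed C inner loop:

    import numpy as np

    def object_sign(x):
        # Mirror the PyObject_RichCompareBool route: only <, > and == are
        # consulted, never cmp(). Anything that fails all three checks
        # (e.g. float('nan')) falls through to None.
        if x > 0:
            return 1
        if x < 0:
            return -1
        if x == 0:
            return 0
        return None

    arr = np.array([-2.0, 0.0, 3.5, np.nan], dtype=object)
    print([object_sign(v) for v in arr])   # [-1, 0, 1, None]

Under the same sketch, option 3 would replace the final "return None" with raising a TypeError, and option 1 would replace it with some fixed made-up value.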
From solipsis at pitrou.net Mon Aug 31 13:31:06 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 31 Aug 2015 19:31:06 +0200 Subject: [Numpy-discussion] np.sign and object comparisons References: <1441009395.1716.14.camel@sipsolutions.net> Message-ID: <20150831193106.6504744f@fsol> On Mon, 31 Aug 2015 10:23:10 -0700 Stephan Hoyer wrote: > > My inclination is that returning NaN would be the appropriate choice. It's > certainly consistent with the behavior for float dtypes -- my expectation > for object dtype behavior is that it works exactly like applying the > np.sign ufunc to each element of the array individually. > > On the other hand, I suppose there are other ways in which an object can > fail all those comparisons (e.g., NaT?), so I suppose we could return None. Currently: >>> np.sign(np.timedelta64('nat')) numpy.timedelta64(-1) ... probably because NaT is -2**63 under the hood. But in this case returning NaT would sound better. Regards Antoine. From sebastian at sipsolutions.net Mon Aug 31 14:06:12 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 31 Aug 2015 20:06:12 +0200 Subject: [Numpy-discussion] np.sign and object comparisons In-Reply-To: References: <1441009395.1716.14.camel@sipsolutions.net> Message-ID: <1441044372.1716.23.camel@sipsolutions.net> On Mo, 2015-08-31 at 10:23 -0700, Stephan Hoyer wrote: > On Mon, Aug 31, 2015 at 1:23 AM, Sebastian Berg > wrote: > That would be my gut feeling as well. Returning `NaN` could > also make > > sense, but I guess we run into problems since we do not know > the input > type. So `None` seems like the only option here I can think of > right > now. > > > My inclination is that returning NaN would be the appropriate choice. > It's certainly consistent with the behavior for float dtypes -- my > expectation for object dtype behavior is that it works exactly like > applying the np.sign ufunc to each element of the array individually. > I was wondering a bit if returning the original object could make sense. It would work for NaN (and also decimal versions of NaN, etc.). But I am not sure in general. - Sebastian > > On the other hand, I suppose there are other ways in which an object > can fail all those comparisons (e.g., NaT?), so I suppose we could > return None. But it would still be a weird outcome for the most common > case. Ideally, I suppose, np.sign would return an array with int-NA > dtype, but that's a whole different can of worms... > > > Stephan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion
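Sebastian's pass-through idea can be sketched in the same style; the name object_sign_passthrough is again just for illustration, and, as Antoine's example suggests, NaT would probably still need separate handling for as long as it compares like -2**63:

    def object_sign_passthrough(x):
        # Same rich-comparison logic as before, but an object that fails
        # all three comparisons is returned unchanged instead of being
        # mapped to None, so NaN-like inputs propagate through np.sign.
        if x > 0:
            return 1
        if x < 0:
            return -1
        if x == 0:
            return 0
        return x

    print(object_sign_passthrough(float('nan')))   # nan
    print(object_sign_passthrough(-3))             # -1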