From irvin.probst at ensta-bretagne.fr Tue Mar 1 06:11:05 2016 From: irvin.probst at ensta-bretagne.fr (Irvin Probst) Date: Tue, 1 Mar 2016 12:11:05 +0100 Subject: [Numpy-discussion] About inv/lstsq Message-ID: <56D578C9.70000@ensta-bretagne.fr> Hi, I'm not sure if I should send this here or to scipy-user, feel free to redirect me there if I'm off topic. So, there is something I don't understand using inv and lstsq in numpy. I've built *on purpose* an ill-conditioned system to fit a quadric a*x**2+b*y**2+c*x*y+d*x+e*y+f; the data points are taken on a narrow stripe four times longer than it is wide. My goal is obviously to find (a,b,c,d,e,f), so I built the following matrix: A[:,0] = data[:,0]**2 A[:,1] = data[:,1]**2 A[:,2] = data[:,1]*data[:,0] A[:,3] = data[:,0] A[:,4] = data[:,1] A[:,5] = 1 The condition number of A is around 2e5, but I can make it much bigger if needed by scaling the data along an axis. I then tried to find the best estimate of X in order to minimize the norm of A*X - B, with B being my data points and X the vector (a,b,c,d,e,f). That's a very basic use of least squares and it works fine with lstsq despite the bad condition number. However, I was expecting inv(A.T.dot(A)).dot(A.T).dot(B) to fail to solve it properly, but in fact, as I scaled up the condition number, lstsq began to give obviously wrong results (that's expected) whereas inv consistently gave "visually good" results. I have no residuals to show, but lstsq was just plain wrong (again, that is expected when cond(A) rises) while inv "worked". I was expecting to see inv fail well before lstsq. Interestingly, the same dataset fails in Matlab using inv without any scaling of the condition number, while it works using \ (mldivide, i.e. least squares). On Octave it works fine using both methods with the original dataset; I did not try to scale up the condition number. So my question is very simple: what's going on here? It looks like Matlab, NumPy and Octave all use the same LAPACK functions for inv and lstsq. As they don't use the same version of LAPACK, I can understand that they do not exhibit the same behavior, but how can it be possible to have lstsq failing before inv(A.T.dot(A)) when I scale up the condition number of A? I feel like I'm missing something obvious but I can't find it. Thanks.

From alex.rogozhnikov at yandex.ru Tue Mar 1 18:03:45 2016 From: alex.rogozhnikov at yandex.ru (Alex Rogozhnikov) Date: Wed, 2 Mar 2016 02:03:45 +0300 Subject: [Numpy-discussion] Weighted percentile / quantile Message-ID: <4394BFEC-297E-458D-B440-C143156EE074@yandex.ru> Hi, I know the topic was already raised long ago: https://mail.scipy.org/pipermail/numpy-discussion/2010-July/051851.html There are also several questions on SO: http://stackoverflow.com/questions/20601872/numpy-or-scipy-to-calculate-weighted-median http://stackoverflow.com/questions/13546146/percentile-calculation-with-weighted-data http://stackoverflow.com/questions/26102867/python-weighted-median-algorithm-with-pandas The only working solution with numpy: http://stackoverflow.com/questions/21844024/weighted-percentile-using-numpy uses sorting. Are there better options at the moment (numpy/scipy/pandas)? Cheers, Alex.
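A note on Irvin's inv/lstsq question above: forming the normal equations squares the condition number (cond(A.T A) = cond(A)**2 for full-rank A), which is why inv(A.T.dot(A)) is expected to break down before lstsq does. A minimal sketch of the comparison, with synthetic stripe data standing in for Irvin's actual dataset:

```
import numpy as np

rng = np.random.RandomState(0)
x = rng.uniform(0, 4, 1000)   # stripe four times longer...
y = rng.uniform(0, 1, 1000)   # ...than it is wide
true_coef = np.array([1.0, -2.0, 0.5, 3.0, -1.0, 2.0])

A = np.column_stack([x**2, y**2, x*y, x, y, np.ones_like(x)])
B = A.dot(true_coef) + 1e-3 * rng.randn(x.size)

print("cond(A)     =", np.linalg.cond(A))
print("cond(A.T A) =", np.linalg.cond(A.T.dot(A)))  # roughly cond(A)**2

X_lstsq = np.linalg.lstsq(A, B)[0]
X_normal = np.linalg.inv(A.T.dot(A)).dot(A.T).dot(B)
print("lstsq           :", X_lstsq)
print("normal equations:", X_normal)
```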
From jfoxrabinovitz at gmail.com Tue Mar 1 22:27:05 2016 From: jfoxrabinovitz at gmail.com (Joseph Fox-Rabinovitz) Date: Tue, 1 Mar 2016 22:27:05 -0500 Subject: [Numpy-discussion] Weighted percentile / quantile In-Reply-To: <4394BFEC-297E-458D-B440-C143156EE074@yandex.ru> References: <4394BFEC-297E-458D-B440-C143156EE074@yandex.ru> Message-ID: Alex, At the moment, there does not appear to be anything in numpy. However, I am working (slowly) on upgrading the C code for partitioning with arbitrary arrays of real weights. That will get `partition`, `median`, `percentile` to work with weights, as well as enabling weights for the automated bin estimators of `histogram`. `mean` already has an implementation of weights via `average`. You may be interested in my original post to the mailing list here: https://mail.scipy.org/pipermail/numpy-discussion/2016-February/075000.html. Josef P. mentioned in one of his responses that statsmodels has a weighted quantile computation available as of PR 2707: https://github.com/statsmodels/statsmodels/pull/2707. That should effectively serve your purpose. -Joe On Tue, Mar 1, 2016 at 6:03 PM, Alex Rogozhnikov wrote: > Hi, > I know the topic was already raised long ago: > https://mail.scipy.org/pipermail/numpy-discussion/2010-July/051851.html > > There are also several questions on SO: > http://stackoverflow.com/questions/20601872/numpy-or-scipy-to-calculate-weighted-median > http://stackoverflow.com/questions/13546146/percentile-calculation-with-weighted-data > http://stackoverflow.com/questions/26102867/python-weighted-median-algorithm-with-pandas > > The only working solution with numpy: > http://stackoverflow.com/questions/21844024/weighted-percentile-using-numpy > uses sorting. > > Are there better options at the moment (numpy/scipy/pandas)? > > Cheers, > Alex. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion >

From alex.rogozhnikov at yandex.ru Wed Mar 2 07:20:36 2016 From: alex.rogozhnikov at yandex.ru (Alex Rogozhnikov) Date: Wed, 2 Mar 2016 15:20:36 +0300 Subject: [Numpy-discussion] Weighted percentile / quantile In-Reply-To: References: <4394BFEC-297E-458D-B440-C143156EE074@yandex.ru> Message-ID: <9965E1D2-3E41-45B3-850F-FCFFF5CEA6A7@yandex.ru> Hi, Joe, > I am working (slowly) on upgrading the C code for partitioning with > arbitrary arrays of real weights really good to know there is some work in this direction. On 2 Mar 2016, at 06:27, Joseph Fox-Rabinovitz wrote: > Alex, > > At the moment, there does not appear to be anything in numpy. However, > I am working (slowly) on upgrading the C code for partitioning with > arbitrary arrays of real weights. That will get `partition`, `median`, > `percentile` to work with weights, as well as enabling weights for the > automated bin estimators of `histogram`. `mean` already has an > implementation of weights via `average`. > > You may be interested in my original post to the mailing list here: > https://mail.scipy.org/pipermail/numpy-discussion/2016-February/075000.html. > Josef P. mentioned in one of his responses that statsmodels has a > weighted quantile computation available as of PR 2707: > https://github.com/statsmodels/statsmodels/pull/2707. That should > effectively serve your purpose. It's the same sort+cumsum approach, and even worse because it relies on aggregation.
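For reference, the sorting-based recipe from the SO answer is roughly the following (a sketch; the helper name is mine, and the midpoint interpolation convention is only one of several, so it does not match np.percentile exactly even for uniform weights):

```
import numpy as np

def weighted_percentile(values, weights, q):
    # q is a percentile in [0, 100]; weights are positive
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    cw = np.cumsum(weights)
    # place each sorted sample at the midpoint of its weight interval
    p = 100.0 * (cw - 0.5 * weights) / cw[-1]
    return np.interp(q, p, values)

# weighted_percentile([1, 2, 3], [1, 1, 10], 50) -> ~2.8
# (most of the mass sits on the value 3)
```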
Thanks for letting me know, but I'll definitely prefer the implementation from SO (until numpy supports weights). Cheers, Alex > > -Joe > > > On Tue, Mar 1, 2016 at 6:03 PM, Alex Rogozhnikov > wrote: >> Hi, >> I know the topic was already raised long ago: >> https://mail.scipy.org/pipermail/numpy-discussion/2010-July/051851.html >> >> There are also several questions on SO: >> http://stackoverflow.com/questions/20601872/numpy-or-scipy-to-calculate-weighted-median >> http://stackoverflow.com/questions/13546146/percentile-calculation-with-weighted-data >> http://stackoverflow.com/questions/26102867/python-weighted-median-algorithm-with-pandas >> >> The only working solution with numpy: >> http://stackoverflow.com/questions/21844024/weighted-percentile-using-numpy >> uses sorting. >> >> Are there better options at the moment (numpy/scipy/pandas)? >> >> Cheers, >> Alex. >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion

From evgeny.burovskiy at gmail.com Thu Mar 3 09:38:42 2016 From: evgeny.burovskiy at gmail.com (Evgeni Burovski) Date: Thu, 3 Mar 2016 14:38:42 +0000 Subject: [Numpy-discussion] GSoC? In-Reply-To: References: Message-ID: > On Wed, Feb 10, 2016 at 11:55 PM, Ralf Gommers > wrote: >> This is last year's page: >> https://github.com/scipy/scipy/wiki/GSoC-2015-project-ideas >> >> Some ideas have been worked on, others are still relevant. Let's copy this >> page to -2016- and start editing it and adding new ideas. I'll start right >> now actually. > > > OK first version: > https://github.com/scipy/scipy/wiki/GSoC-2016-project-ideas > I kept some of the ideas from last year, but removed all potential mentors > as the same people may not be available this year - please re-add yourselves > where needed. Thanks, Ralf, for doing it! Just a quick note on de-listed projects. While I do not disagree with removing the projects on splines and special functions, this IMO does not mean we won't consider proposals on either of these topics if someone wants to write one. For instance, if Josh or Ted want to frame their work on hypergeometric functions as a GSoC project, I'm sure we're going to at least consider them. Evgeni

From matthew.brett at gmail.com Thu Mar 3 23:42:42 2016 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 3 Mar 2016 20:42:42 -0800 Subject: [Numpy-discussion] Windows wheels, built, but should we deploy? Message-ID: Hi, Summary: I propose that we upload Windows wheels to pypi. The wheels are likely to be stable and relatively easy to maintain, but will have slower performance than other versions of numpy linked against faster BLAS / LAPACK libraries. Background: There's a long discussion going on at github issue #5479 [1], where the old problem of Windows wheels for numpy came up. For those of you not following this issue, the current situation for community-built numpy Windows binaries is dire: * We have not so far provided Windows wheels on pypi, so `pip install numpy` on Windows will bring you a world of pain; * Until recently we did provide .exe "superpack" installers on sourceforge, but these became increasingly difficult to build and we gave up building them as of the latest (1.10.4) release. Despite this, popularity of Windows wheels on pypi is high.
A few weeks ago, Donald Stufft ran a query for the binary wheels most often downloaded from pypi, for any platform [2]. The top five most downloaded were (n_downloads, name): 6646, numpy-1.10.4-cp27-none-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl 5445, cryptography-1.2.1-cp27-none-win_amd64.whl 5243, matplotlib-1.4.0-cp34-none-win32.whl 5241, scikit_learn-0.15.1-cp34-none-win32.whl 4573, pandas-0.17.1-cp27-none-win_amd64.whl So a) the OSX numpy wheel is very popular and b) despite the fact that we don't provide a numpy wheel for Windows, matplotlib, scikit_learn and pandas, which depend on numpy, are the 3rd, 4th and 5th most downloaded wheels as of a few weeks ago. So, there seems to be a large appetite for numpy wheels. Current proposal: I have now built numpy wheels, using the ATLAS blas / lapack library - the build is automatic and reproducible [3]. I chose ATLAS to build against, rather than, say, OpenBLAS, because we've had some significant worries in the past about the reliability of OpenBLAS, and I thought it better to err on the side of correctness. However, these builds are relatively slow for matrix multiply and other linear algebra routines compared to numpy built against OpenBLAS or MKL (which we cannot use because of its license) [4]. In my very crude array test of a dot product and matrix inversion, the ATLAS wheels were 2-3 times slower than MKL. Other benchmarks on Julia found about the same result for ATLAS vs OpenBLAS on 32-bit, but a much bigger difference on 64-bit (for an earlier version of ATLAS than we are currently using) [5]. So, our numpy wheels are likely to be stable and give correct results, but will be somewhat slow for linear algebra. I propose that we upload these ATLAS wheels to pypi. The upside is that this gives our Windows users a much better experience with pip, and allows other developers to build Windows wheels that depend on numpy. The downside is that these will not be optimized for performance on modern processors. In order to signal that, I propose adding the following text to the numpy pypi front page: ``` All numpy wheels distributed from pypi are BSD licensed. Windows wheels are linked against the ATLAS BLAS / LAPACK library, restricted to SSE2 instructions, so may not give optimal linear algebra performance for your machine. See http://docs.scipy.org/doc/numpy/user/install.html for alternatives. ``` In a way this is very similar to our previous situation, in that the superpack installers also used ATLAS - in fact an older version of ATLAS. Once we are up and running with numpy wheels, we can consider whether we should switch to other BLAS libraries, such as OpenBLAS or BLIS (see [6]). I'm posting here hoping for your feedback... Cheers, Matthew [1] https://github.com/numpy/numpy/issues/5479 [2] https://gist.github.com/dstufft/1dda9a9f87ee7121e0ee [3] https://ci.appveyor.com/project/matthew-brett/np-wheel-builder [4] http://mingwpy.github.io/blas_lapack.html#intel-math-kernel-library [5] https://github.com/numpy/numpy/issues/5479#issuecomment-185033668 [6] https://github.com/numpy/numpy/issues/7372

From cournape at gmail.com Fri Mar 4 03:29:27 2016 From: cournape at gmail.com (David Cournapeau) Date: Fri, 4 Mar 2016 08:29:27 +0000 Subject: [Numpy-discussion] Windows wheels, built, but should we deploy? In-Reply-To: References: Message-ID: On Fri, Mar 4, 2016 at 4:42 AM, Matthew Brett wrote: > Hi, > > Summary: > > I propose that we upload Windows wheels to pypi.
The wheels are > likely to be stable and relatively easy to maintain, but will have > slower performance than other versions of numpy linked against faster > BLAS / LAPACK libraries. > > Background: > > There's a long discussion going on at github issue #5479 [1], where > the old problem of Windows wheels for numpy came up. > > For those of you not following this issue, the current situation for > community-built numpy Windows binaries is dire: > > * We have not so far provided Windows wheels on pypi, so `pip install > numpy` on Windows will bring you a world of pain; > * Until recently we did provide .exe "superpack" installers on > sourceforge, but these became increasingly difficult to build and we > gave up building them as of the latest (1.10.4) release. > > Despite this, popularity of Windows wheels on pypi is high. A few > weeks ago, Donald Stufft ran a query for the binary wheels most often > downloaded from pypi, for any platform [2]. The top five most > downloaded were (n_downloads, name): > > 6646, > numpy-1.10.4-cp27-none-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl > 5445, cryptography-1.2.1-cp27-none-win_amd64.whl > 5243, matplotlib-1.4.0-cp34-none-win32.whl > 5241, scikit_learn-0.15.1-cp34-none-win32.whl > 4573, pandas-0.17.1-cp27-none-win_amd64.whl > > So a) the OSX numpy wheel is very popular and b) despite the fact that > we don't provide a numpy wheel for Windows, matplotlib, scikit_learn > and pandas, which depend on numpy, are the 3rd, 4th and 5th most > downloaded wheels as of a few weeks ago. > > So, there seems to be a large appetite for numpy wheels. > > Current proposal: > > I have now built numpy wheels, using the ATLAS blas / lapack library - > the build is automatic and reproducible [3]. > > I chose ATLAS to build against, rather than, say, OpenBLAS, because > we've had some significant worries in the past about the reliability > of OpenBLAS, and I thought it better to err on the side of > correctness. > > However, these builds are relatively slow for matrix multiply and > other linear algebra routines compared to numpy built against OpenBLAS or > MKL (which we cannot use because of its license) [4]. In my very > crude array test of a dot product and matrix inversion, the ATLAS > wheels were 2-3 times slower than MKL. Other benchmarks on Julia > found about the same result for ATLAS vs OpenBLAS on 32-bit, but a > much bigger difference on 64-bit (for an earlier version of ATLAS than > we are currently using) [5]. > > So, our numpy wheels are likely to be stable and give correct results, but > will be somewhat slow for linear algebra. > I would not worry too much about this: at worst, this gives us back the situation we were in with the so-called superpacks, which were successful in the past at spreading numpy use on Windows. My main worry is whether this locks us into ATLAS for a long time because of packages depending on numpy blas/lapack (scipy, scikit-learn). I am not sure how much this is the case. David > > I propose that we upload these ATLAS wheels to pypi. The upside is > that this gives our Windows users a much better experience with pip, > and allows other developers to build Windows wheels that depend on > numpy. The downside is that these will not be optimized for > performance on modern processors. In order to signal that, I propose > adding the following text to the numpy pypi front page: > > ``` > All numpy wheels distributed from pypi are BSD licensed.
> > Windows wheels are linked against the ATLAS BLAS / LAPACK library, > restricted to SSE2 instructions, so may not give optimal linear > algebra performance for your machine. See > http://docs.scipy.org/doc/numpy/user/install.html for alternatives. > ``` > > In a way this is very similar to our previous situation, in that the > superpack installers also used ATLAS - in fact an older version of > ATLAS. > > Once we are up and running with numpy wheels, we can consider whether > we should switch to other BLAS libraries, such as OpenBLAS or BLIS > (see [6]). > > I'm posting here hoping for your feedback... > > Cheers, > > Matthew > > > [1] https://github.com/numpy/numpy/issues/5479 > [2] https://gist.github.com/dstufft/1dda9a9f87ee7121e0ee > [3] https://ci.appveyor.com/project/matthew-brett/np-wheel-builder > [4] http://mingwpy.github.io/blas_lapack.html#intel-math-kernel-library > [5] https://github.com/numpy/numpy/issues/5479#issuecomment-185033668 > [6] https://github.com/numpy/numpy/issues/7372 > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion >

From charlesr.harris at gmail.com Fri Mar 4 10:31:11 2016 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 4 Mar 2016 08:31:11 -0700 Subject: [Numpy-discussion] Ufunc identity for bitwise reduction of object arrays. Message-ID: Hi All, There is currently some discussion on whether or not object arrays should have an identity for bitwise reductions. Currently, they do not use the identity for non-empty arrays, so this would only affect reductions on empty arrays. Currently bitwise_or, bitwise_xor, and bitwise_and will return (bool_) 0, (bool_) 0, and (int) -1 respectively in that case. Note that non-object arrays work as they should; the question is only about object arrays. Thoughts? Chuck

From matthew.brett at gmail.com Fri Mar 4 13:38:22 2016 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 4 Mar 2016 10:38:22 -0800 Subject: [Numpy-discussion] Windows wheels, built, but should we deploy? In-Reply-To: References: Message-ID: On Fri, Mar 4, 2016 at 12:29 AM, David Cournapeau wrote: > > > On Fri, Mar 4, 2016 at 4:42 AM, Matthew Brett > wrote: >> >> Hi, >> >> Summary: >> >> I propose that we upload Windows wheels to pypi. The wheels are >> likely to be stable and relatively easy to maintain, but will have >> slower performance than other versions of numpy linked against faster >> BLAS / LAPACK libraries. >> >> Background: >> >> There's a long discussion going on at github issue #5479 [1], where >> the old problem of Windows wheels for numpy came up. >> >> For those of you not following this issue, the current situation for >> community-built numpy Windows binaries is dire: >> >> * We have not so far provided Windows wheels on pypi, so `pip install >> numpy` on Windows will bring you a world of pain; >> * Until recently we did provide .exe "superpack" installers on >> sourceforge, but these became increasingly difficult to build and we >> gave up building them as of the latest (1.10.4) release. >> >> Despite this, popularity of Windows wheels on pypi is high. A few >> weeks ago, Donald Stufft ran a query for the binary wheels most often >> downloaded from pypi, for any platform [2].
The top five most >> downloaded were (n_downloads, name): >> >> 6646, >> numpy-1.10.4-cp27-none-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl >> 5445, cryptography-1.2.1-cp27-none-win_amd64.whl >> 5243, matplotlib-1.4.0-cp34-none-win32.whl >> 5241, scikit_learn-0.15.1-cp34-none-win32.whl >> 4573, pandas-0.17.1-cp27-none-win_amd64.whl >> >> So a) the OSX numpy wheel is very popular and b) despite the fact that >> we don't provide a numpy wheel for Windows, matplotlib, scikit_learn >> and pandas, which depend on numpy, are the 3rd, 4th and 5th most >> downloaded wheels as of a few weeks ago. >> >> So, there seems to be a large appetite for numpy wheels. >> >> Current proposal: >> >> I have now built numpy wheels, using the ATLAS blas / lapack library - >> the build is automatic and reproducible [3]. >> >> I chose ATLAS to build against, rather than, say, OpenBLAS, because >> we've had some significant worries in the past about the reliability >> of OpenBLAS, and I thought it better to err on the side of >> correctness. >> >> However, these builds are relatively slow for matrix multiply and >> other linear algebra routines compared to numpy built against OpenBLAS or >> MKL (which we cannot use because of its license) [4]. In my very >> crude array test of a dot product and matrix inversion, the ATLAS >> wheels were 2-3 times slower than MKL. Other benchmarks on Julia >> found about the same result for ATLAS vs OpenBLAS on 32-bit, but a >> much bigger difference on 64-bit (for an earlier version of ATLAS than >> we are currently using) [5]. >> >> So, our numpy wheels are likely to be stable and give correct results, but >> will be somewhat slow for linear algebra. > > > I would not worry too much about this: at worst, this gives us back the > situation we were in with the so-called superpacks, which were successful > in the past at spreading numpy use on Windows. > > My main worry is whether this locks us into ATLAS for a long time because > of packages depending on numpy blas/lapack (scipy, scikit-learn). I am not > sure how much this is the case. You mean the situation where other packages try to find the BLAS / LAPACK library and link against that? My impression was that neither scipy nor scikit-learn does that at the moment, but I'm happy to be corrected. You'd know better than me about this, but my understanding is that BLAS / LAPACK has a standard interface that should allow code to run the same way, regardless of which BLAS / LAPACK library it is linking to. So, even if another package is trying to link against the numpy BLAS, swapping the numpy BLAS library shouldn't cause a problem (unless the package is trying to link to ATLAS-specific stuff, which seems a bit unlikely). Is that right? Cheers, Matthew

From chris.barker at noaa.gov Fri Mar 4 15:29:52 2016 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 4 Mar 2016 12:29:52 -0800 Subject: [Numpy-discussion] Windows wheels, built, but should we deploy? In-Reply-To: References: Message-ID: +1 -- thanks for doing all this work. There is a HUGE amount you can do with numpy that doesn't give a whit about how fast .dot() et al. are. If you really do need that to be as fast as possible, you can plug in a faster build later. This is great. Just as one example -- I teach a general python class every year -- I do only one session on numpy/scipy. If I can expect my students to be able to simply pip install the core scipy stack, this will be SO much easier.
-CHB On Thu, Mar 3, 2016 at 8:42 PM, Matthew Brett wrote: > Hi, > > Summary: > > I propose that we upload Windows wheels to pypi. The wheels are > likely to be stable and relatively easy to maintain, but will have > slower performance than other versions of numpy linked against faster > BLAS / LAPACK libraries. > > Background: > > There's a long discussion going on at issue github #5479 [1], where > the old problem of Windows wheels for numpy came up. > > For those of you not following this issue, the current situation for > community-built numpy Windows binaries is dire: > > * We have not so far provided windows wheels on pypi, so `pip install > numpy` on Windows will bring you a world of pain; > * Until recently we did provide .exe "superpack" installers on > sourceforge, but these became increasingly difficult to build and we > gave up building them as of the latest (1.10.4) release. > > Despite this, popularity of Windows wheels on pypi is high. A few > weeks ago, Donald Stufft ran a query for the binary wheels most often > downloaded from pypi, for any platform [2] . The top five most > downloaded were (n_downloads, name): > > 6646, > numpy-1.10.4-cp27-none-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl > 5445, cryptography-1.2.1-cp27-none-win_amd64.whl > 5243, matplotlib-1.4.0-cp34-none-win32.whl > 5241, scikit_learn-0.15.1-cp34-none-win32.whl > 4573, pandas-0.17.1-cp27-none-win_amd64.whl > > So a) the OSX numpy wheel is very popular and b) despite the fact that > we don't provide a numpy wheel for Windows, matplotlib, sckit_learn > and pandas, that depend on numpy, are the 3rd, 4th and 5th most > downloaded wheels as of a few weeks ago. > > So, there seems to be a large appetite for numpy wheels. > > Current proposal: > > I have now built numpy wheels, using the ATLAS blas / lapack library - > the build is automatic and reproducible [3]. > > I chose ATLAS to build against, rather than, say OpenBLAS, because > we've had some significant worries in the past about the reliability > of OpenBLAS, and I thought it better to err on the side of > correctness. > > However, these builds are relatively slow for matrix multiply and > other linear algebra routines compared numpy built against OpenBLAS or > MKL (which we cannot use because of its license) [4]. In my very > crude array test of a dot product and matrix inversion, the ATLAS > wheels were 2-3 times slower than MKL. Other benchmarks on Julia > found about the same result for ATLAS vs OpenBLAS on 32-bit bit, but a > much bigger difference on 64-bit (for an earlier version of ATLAS than > we are currently using) [5]. > > So, our numpy wheels likely to be stable and give correct results, but > will be somewhat slow for linear algebra. > > I propose that we upload these ATLAS wheels to pypi. The upside is > that this gives our Windows users a much better experience with pip, > and allows other developers to build Windows wheels that depend on > numpy. The downside is that these will not be optimized for > performance on modern processors. In order to signal that, I propose > adding the following text to the numpy pypi front page: > > ``` > All numpy wheels distributed from pypi are BSD licensed. > > Windows wheels are linked against the ATLAS BLAS / LAPACK library, > restricted to SSE2 instructions, so may not give optimal linear > algebra performance for your machine. See > http://docs.scipy.org/doc/numpy/user/install.html for alternatives. 
> ``` > > In a way this is very similar to our previous situation, in that the > superpack installers also used ATLAS - in fact an older version of > ATLAS. > > Once we are up and running with numpy wheels, we can consider whether > we should switch to other BLAS libraries, such as OpenBLAS or BLIS > (see [6]). > > I'm posting here hoping for your feedback... > > Cheers, > > Matthew > > > [1] https://github.com/numpy/numpy/issues/5479 > [2] https://gist.github.com/dstufft/1dda9a9f87ee7121e0ee > [3] https://ci.appveyor.com/project/matthew-brett/np-wheel-builder > [4] http://mingwpy.github.io/blas_lapack.html#intel-math-kernel-library > [5] https://github.com/numpy/numpy/issues/5479#issuecomment-185033668 > [6] https://github.com/numpy/numpy/issues/7372 > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Fri Mar 4 16:20:44 2016 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 4 Mar 2016 21:20:44 +0000 (UTC) Subject: [Numpy-discussion] GSoC? References: Message-ID: Thu, 11 Feb 2016 00:02:52 +0100, Ralf Gommers kirjoitti: [clip] > OK first version: > https://github.com/scipy/scipy/wiki/GSoC-2016-project-ideas I kept some > of the ideas from last year, but removed all potential mentors as the > same people may not be available this year - please re-add yourselves > where needed. > > And to everyone who has a good idea, and preferably is willing to mentor > for that idea: please add it to that page. I probably don't have bandwidth for mentoring, but as the Numpy suggestions seem to be mostly "hard" problems, we can add another one: ## Dealing with overlapping input/output data Numpy operations where output arrays overlap with input arrays can produce unexpected results. A simple example is ``` x = np.arange(100*100).reshape(100,100) x += x.T # <- undefined result! ``` The task is to change Numpy so that the results here become similar to as if the input arrays overlapping with output were separate (here: `x += x.T.copy()`). The challenge here lies in doing this without sacrificing too much performance or memory efficiency. Initial steps toward solving this problem were taken in https://github.com/numpy/numpy/pull/6166 where a simplest available algorithm for detecting if arrays overlap was added. However, this is not yet utilized in ufuncs. An initial attempt to sketch what should be done is at https://github.com/numpy/numpy/issues/6272 and issues referenced therein. From josef.pktd at gmail.com Fri Mar 4 22:30:31 2016 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 4 Mar 2016 22:30:31 -0500 Subject: [Numpy-discussion] Windows wheels, built, but should we deploy? In-Reply-To: References: Message-ID: On Fri, Mar 4, 2016 at 1:38 PM, Matthew Brett wrote: > On Fri, Mar 4, 2016 at 12:29 AM, David Cournapeau > wrote: > > > > > > On Fri, Mar 4, 2016 at 4:42 AM, Matthew Brett > > wrote: > >> > >> Hi, > >> > >> Summary: > >> > >> I propose that we upload Windows wheels to pypi. 
The wheels are > >> likely to be stable and relatively easy to maintain, but will have > >> slower performance than other versions of numpy linked against faster > >> BLAS / LAPACK libraries. > >> > >> Background: > >> > >> There's a long discussion going on at issue github #5479 [1], where > >> the old problem of Windows wheels for numpy came up. > >> > >> For those of you not following this issue, the current situation for > >> community-built numpy Windows binaries is dire: > >> > >> * We have not so far provided windows wheels on pypi, so `pip install > >> numpy` on Windows will bring you a world of pain; > >> * Until recently we did provide .exe "superpack" installers on > >> sourceforge, but these became increasingly difficult to build and we > >> gave up building them as of the latest (1.10.4) release. > >> > >> Despite this, popularity of Windows wheels on pypi is high. A few > >> weeks ago, Donald Stufft ran a query for the binary wheels most often > >> downloaded from pypi, for any platform [2] . The top five most > >> downloaded were (n_downloads, name): > >> > >> 6646, > >> > numpy-1.10.4-cp27-none-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl > >> 5445, cryptography-1.2.1-cp27-none-win_amd64.whl > >> 5243, matplotlib-1.4.0-cp34-none-win32.whl > >> 5241, scikit_learn-0.15.1-cp34-none-win32.whl > >> 4573, pandas-0.17.1-cp27-none-win_amd64.whl > >> > >> So a) the OSX numpy wheel is very popular and b) despite the fact that > >> we don't provide a numpy wheel for Windows, matplotlib, sckit_learn > >> and pandas, that depend on numpy, are the 3rd, 4th and 5th most > >> downloaded wheels as of a few weeks ago. > >> > >> So, there seems to be a large appetite for numpy wheels. > >> > >> Current proposal: > >> > >> I have now built numpy wheels, using the ATLAS blas / lapack library - > >> the build is automatic and reproducible [3]. > >> > >> I chose ATLAS to build against, rather than, say OpenBLAS, because > >> we've had some significant worries in the past about the reliability > >> of OpenBLAS, and I thought it better to err on the side of > >> correctness. > >> > >> However, these builds are relatively slow for matrix multiply and > >> other linear algebra routines compared numpy built against OpenBLAS or > >> MKL (which we cannot use because of its license) [4]. In my very > >> crude array test of a dot product and matrix inversion, the ATLAS > >> wheels were 2-3 times slower than MKL. Other benchmarks on Julia > >> found about the same result for ATLAS vs OpenBLAS on 32-bit bit, but a > >> much bigger difference on 64-bit (for an earlier version of ATLAS than > >> we are currently using) [5]. > >> > >> So, our numpy wheels likely to be stable and give correct results, but > >> will be somewhat slow for linear algebra. > > > > > > I would not worry too much about this: at worst, this gives us back the > > situation where we were w/ so-called superpack, which have been > successful > > in the past to spread numpy use on windows. > > > > My main worry is whether this locks us into ATLAS for a long time > because > > of package depending on numpy blas/lapack (scipy, scikit learn). I am not > > sure how much this is the case. > > You mean the situation where other packages try to find the BLAS / > LAPACK library and link against that? My impression was that neither > scipy or scikit-learn do that at the moment, but I'm happy to be > corrected. 
> > You'd know better than me about this, but my understanding is that > BLAS / LAPACK has a standard interface that should allow code to run > the same way, regardless of which BLAS / LAPACK library it is linking > to. So, even if another package is trying to link against the numpy > BLAS, swapping the numpy BLAS library shouldn't cause a problem > (unless the package is trying to link to ATLAS-specific stuff, which > seems a bit unlikely). > > Is that right? > AFAIK, numpy doesn't provide access to BLAS/LAPACK. scipy does. statsmodels is linking to the installed BLAS/LAPACK in cython code through scipy. So far we haven't seen problems with different versions. I think scipy development works very well to isolate linalg library version-specific parts from the user interface. AFAIU, the main problem will be linking to inconsistent Fortran libraries in downstream packages that use Fortran. E.g., AFAIU it won't work to pip install an ATLAS-based numpy and then install an MKL-based scipy from Gohlke. I don't know if there is a useful error message, or if this just results in puzzled users. Josef > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion >

From njs at pobox.com Fri Mar 4 23:40:29 2016 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 4 Mar 2016 20:40:29 -0800 Subject: [Numpy-discussion] Windows wheels, built, but should we deploy? In-Reply-To: References: Message-ID: On Fri, Mar 4, 2016 at 7:30 PM, wrote: [...] > AFAIK, numpy doesn't provide access to BLAS/LAPACK. scipy does. statsmodels > is linking to the installed BLAS/LAPACK in cython code through scipy. So far > we haven't seen problems with different versions. I think scipy development > works very well to isolate linalg library version-specific parts from the > user interface. Yeah, it should be invisible to users of both numpy and scipy which BLAS/LAPACK is in use under the hood. > > AFAIU, the main problem will be linking to inconsistent Fortran libraries in > downstream packages that use Fortran. > E.g., AFAIU it won't work to pip install an ATLAS-based numpy and then install > an MKL-based scipy from Gohlke. The specific scenario you describe will be a problem, but not for the reason you state -- the problem is that (IIUC) the Gohlke scipy build has some specific hacks where it "knows" that it can find a copy of MKL buried at a particular location inside the numpy package (and the Gohlke numpy build has a specific hack to put a copy of MKL there). So the Gohlke scipy requires the Gohlke numpy, but this is due to patches that Christoph applies to his builds. AFAIK, outside of downstream packages that poke around the inside of numpy like this, there should be no way for downstream packages to know or care which BLAS/LAPACK implementation numpy is using (except for speed, bugs, etc.). -n -- Nathaniel J. Smith -- https://vorpus.org

From matthew.brett at gmail.com Sat Mar 5 13:40:37 2016 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 5 Mar 2016 10:40:37 -0800 Subject: [Numpy-discussion] Windows wheels, built, but should we deploy? In-Reply-To: References: Message-ID: Hi, On Fri, Mar 4, 2016 at 8:40 PM, Nathaniel Smith wrote: > On Fri, Mar 4, 2016 at 7:30 PM, wrote: > [...] >> AFAIK, numpy doesn't provide access to BLAS/LAPACK. scipy does.
statsmodels >> is linking to the installed BLAS/LAPACK in cython code through scipy. So far >> we haven't seen problems with different versions. I think scipy development >> works very well to isolate linalg library version specific parts from the >> user interface. > > Yeah, it should be invisible to users of both numpy and scipy which > BLAS/LAPACK is in use under the hood. My impression is that the general mood here is positive, so I plan to deploy these wheels to pypi on Monday, with the change to the pypi text. Please do let me know if there are any strong objections. Cheers, Matthew From sebastian at sipsolutions.net Sun Mar 6 07:16:50 2016 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sun, 06 Mar 2016 13:16:50 +0100 Subject: [Numpy-discussion] GSoC? In-Reply-To: References: Message-ID: <1457266610.10047.1.camel@sipsolutions.net> On Fr, 2016-03-04 at 21:20 +0000, Pauli Virtanen wrote: > Thu, 11 Feb 2016 00:02:52 +0100, Ralf Gommers kirjoitti: > [clip] > > OK first version: > > https://github.com/scipy/scipy/wiki/GSoC-2016-project-ideas I kept > > some > > of the ideas from last year, but removed all potential mentors as > > the > > same people may not be available this year - please re-add > > yourselves > > where needed. > > > > And to everyone who has a good idea, and preferably is willing to > > mentor > > for that idea: please add it to that page. > > I probably don't have bandwidth for mentoring, but as the Numpy > suggestions seem to be mostly "hard" problems, we can add another > one: > > ## Dealing with overlapping input/output data > > Numpy operations where output arrays overlap with > input arrays can produce unexpected results. > A simple example is > ``` > x = np.arange(100*100).reshape(100,100) > x += x.T # <- undefined result! > ``` > The task is to change Numpy so that the results > here become similar to as if the input arrays > overlapping with output were separate (here: `x += x.T.copy()`). > The challenge here lies in doing this without sacrificing > too much performance or memory efficiency. > > Initial steps toward solving this problem were taken in > https://github.com/numpy/numpy/pull/6166 > where a simplest available algorithm for detecting > if arrays overlap was added. However, this is not yet > utilized in ufuncs. An initial attempt to sketch what > should be done is at https://github.com/numpy/numpy/issues/6272 > and issues referenced therein. > Since I like the idea, I copy pasted it into the GSoC project ideas wiki. - Sebastian > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From cmkleffner at gmail.com Sun Mar 6 09:52:31 2016 From: cmkleffner at gmail.com (Carl Kleffner) Date: Sun, 6 Mar 2016 15:52:31 +0100 Subject: [Numpy-discussion] Windows wheels, built, but should we deploy? In-Reply-To: References: Message-ID: +1 from me. I could prepare scipy builds based on these numpy builds. Carl 2016-03-05 19:40 GMT+01:00 Matthew Brett : > Hi, > > On Fri, Mar 4, 2016 at 8:40 PM, Nathaniel Smith wrote: > > On Fri, Mar 4, 2016 at 7:30 PM, wrote: > > [...] > >> AFAIK, numpy doesn't provide access to BLAS/LAPACK. scipy does. > statsmodels > >> is linking to the installed BLAS/LAPACK in cython code through scipy. 
> So far >> we haven't seen problems with different versions. I think scipy > development >> works very well to isolate linalg library version specific parts from > the >> user interface. > > > > Yeah, it should be invisible to users of both numpy and scipy which > > BLAS/LAPACK is in use under the hood. > > My impression is that the general mood here is positive, so I plan to > deploy these wheels to pypi on Monday, with the change to the pypi > text. Please do let me know if there are any strong objections. > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion >

From matthew.brett at gmail.com Mon Mar 7 15:29:45 2016 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 7 Mar 2016 12:29:45 -0800 Subject: [Numpy-discussion] Windows wheels, built, but should we deploy? In-Reply-To: References: Message-ID: On Sat, Mar 5, 2016 at 10:40 AM, Matthew Brett wrote: > Hi, > > On Fri, Mar 4, 2016 at 8:40 PM, Nathaniel Smith wrote: >> On Fri, Mar 4, 2016 at 7:30 PM, wrote: >> [...] >>> AFAIK, numpy doesn't provide access to BLAS/LAPACK. scipy does. statsmodels >>> is linking to the installed BLAS/LAPACK in cython code through scipy. So far >>> we haven't seen problems with different versions. I think scipy development >>> works very well to isolate linalg library version specific parts from the >>> user interface. >> >> Yeah, it should be invisible to users of both numpy and scipy which >> BLAS/LAPACK is in use under the hood. > > My impression is that the general mood here is positive, so I plan to > deploy these wheels to pypi on Monday, with the change to the pypi > text. Please do let me know if there are any strong objections. Done: (py35) PS C:\tmp> pip install numpy Collecting numpy Downloading numpy-1.10.4-cp35-none-win32.whl (6.6MB) 100% |################################| 6.6MB 34kB/s Installing collected packages: numpy Cheers, Matthew

From faltet at gmail.com Tue Mar 8 08:27:44 2016 From: faltet at gmail.com (Francesc Alted) Date: Tue, 8 Mar 2016 14:27:44 +0100 Subject: [Numpy-discussion] [ANN] bcolz 1.0.0 RC1 released Message-ID: ========================== Announcing bcolz 1.0.0 RC1 ========================== What's new ========== Yeah, 1.0.0 is finally here. We are not introducing any exciting new features (just some optimizations and bug fixes), but bcolz is already 6 years old and implements most of the capabilities it was designed for, so I decided to release 1.0.0, meaning that the format is declared stable and that people can be assured that future bcolz releases will be able to read bcolz 1.0 data files (and probably much earlier ones too) for a long while. The format is fully described at: https://github.com/Blosc/bcolz/blob/master/DISK_FORMAT_v1.rst Also, a 1.0.0 release means that the bcolz 1.x series will be based on the C-Blosc 1.x series (https://github.com/Blosc/c-blosc). Once C-Blosc 2.x (https://github.com/Blosc/c-blosc2) is out, a new bcolz 2.x is expected, taking advantage of shiny new features of C-Blosc2 (more compressors, more filters, native variable-length support and the concept of super-chunks), which should be very beneficial for the next bcolz generation. Important: this is a Release Candidate, so please test it as much as you can. If no issues appear in a week or so, I will proceed to tag and release 1.0.0 final.
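For a quick taste of the carray/ctable API, a minimal sketch (assuming the 1.0 interface matches the documented 0.x one):

```
import numpy as np
import bcolz

# compressed, chunked containers; pass rootdir='...' for on-disk storage
x = bcolz.carray(np.arange(1000000))
y = bcolz.carray(np.linspace(0.0, 1.0, 1000000))
t = bcolz.ctable((x, y), names=['x', 'y'])

# numexpr-backed query; iterates without decompressing whole columns
hits = [row.y for row in t.where('(x > 10) & (x < 15)')]
```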
Enjoy! For a more detailed change log, see: https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst What it is ========== *bcolz* provides columnar and compressed data containers that can live either on-disk or in-memory. Column storage allows for efficiently querying tables with a large number of columns. It also allows for cheap addition and removal of columns. In addition, bcolz objects are compressed by default to reduce memory/disk I/O needs. The compression process is carried out internally by Blosc, an extremely fast meta-compressor that is optimized for binary data. Lastly, high-performance iterators (like ``iter()``, ``where()``) for querying the objects are provided. bcolz can use numexpr internally so as to accelerate many vector and query operations (although it can use pure NumPy for doing so too). numexpr optimizes the memory usage and uses several cores for doing the computations, so it is blazing fast. Moreover, since the carray/ctable containers can be disk-based, it is possible to use them for seamlessly performing out-of-memory computations. bcolz has minimal dependencies (NumPy), comes with an exhaustive test suite and fully supports both 32-bit and 64-bit platforms. Also, it is typically tested on both UNIX and Windows operating systems. Together, bcolz and the Blosc compressor are finally fulfilling the promise of accelerating memory I/O, at least for some real scenarios: http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/), the Blaze project (http://blaze.pydata.org/), Quantopian (https://www.quantopian.com/) and Scikit-Allel (https://github.com/cggh/scikit-allel), which you can read more about by pointing your browser at the links below. * Visualfabriq: * *bquery*, A query and aggregation framework for Bcolz: * https://github.com/visualfabriq/bquery * Blaze: * Notebooks showing Blaze + Pandas + BColz interaction: * http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-csv.ipynb * http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-bcolz.ipynb * Quantopian: * Using compressed data containers for faster backtesting at scale: * https://quantopian.github.io/talks/NeedForSpeed/slides.html * Scikit-Allel * Provides an alternative backend to work with compressed arrays * https://scikit-allel.readthedocs.org/en/latest/model/bcolz.html Installing ========== bcolz is in the PyPI repository, so installing it is easy:: $ pip install -U bcolz Resources ========= Visit the main bcolz site repository at: http://github.com/Blosc/bcolz Manual: http://bcolz.blosc.org Home of Blosc compressor: http://blosc.org User's mail list: bcolz at googlegroups.com http://groups.google.com/group/bcolz License is the new BSD: https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt Release notes can be found in the Git repository: https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst ---- **Enjoy data!** -- Francesc Alted

From ndbecker2 at gmail.com Tue Mar 8 09:26:13 2016 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 08 Mar 2016 09:26:13 -0500 Subject: [Numpy-discussion] tracemalloc + numpy? Message-ID: I'm trying tracemalloc to find memory usage. Will numpy array memory usage be counted by tracemalloc?
(Doesn't seem to) From solipsis at pitrou.net Tue Mar 8 09:31:17 2016 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 8 Mar 2016 15:31:17 +0100 Subject: [Numpy-discussion] tracemalloc + numpy? References: Message-ID: <20160308153117.15c12d12@fsol> On Tue, 08 Mar 2016 09:26:13 -0500 Neal Becker wrote: > I'm trying tracemalloc to find memory usage. Will numpy array memory usage > be counted by tracemalloc? (Doesn't seem to) No, but together with something like https://github.com/numpy/numpy/pull/5470 it could. Regards Antoine. From Nicolas.Rougier at inria.fr Tue Mar 8 15:18:42 2016 From: Nicolas.Rougier at inria.fr (Nicolas P. Rougier) Date: Tue, 8 Mar 2016 21:18:42 +0100 Subject: [Numpy-discussion] 100 numpy exercises (80/100) Message-ID: <26180CFC-E041-4864-B8D8-E890E35A230E@inria.fr> Hi all, I've just added some exercises to the collection at https://github.com/rougier/numpy-100 (and in the process, I've discovered np.argpartition... nice!) If you have some ideas/comments/corrections... Still 20 to go... Nicolas From olivier.grisel at ensta.org Tue Mar 8 16:11:33 2016 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Tue, 8 Mar 2016 22:11:33 +0100 Subject: [Numpy-discussion] Windows wheels, built, but should we deploy? In-Reply-To: References: Message-ID: Thanks Matthew! I just installed it and ran the tests and it all works (except for test_system_info.py that fails because I am missing a vcvarsall.bat on that system but this is expected). -- Olivier From jni.soma at gmail.com Tue Mar 8 20:08:39 2016 From: jni.soma at gmail.com (Juan Nunez-Iglesias) Date: Wed, 9 Mar 2016 12:08:39 +1100 Subject: [Numpy-discussion] 100 numpy exercises (80/100) In-Reply-To: <26180CFC-E041-4864-B8D8-E890E35A230E@inria.fr> References: <26180CFC-E041-4864-B8D8-E890E35A230E@inria.fr> Message-ID: Thanks for this fantastic resource, Nicolas! I also had never heard of argpartition and immediately know of many places in my code where I can use it. I also learned that axis= can take a tuple as an argument. On Wed, Mar 9, 2016 at 7:18 AM, Nicolas P. Rougier wrote: > > Hi all, > > I've just added some exercises to the collection at > https://github.com/rougier/numpy-100 > (and in the process, I've discovered np.argpartition... nice!) > > If you have some ideas/comments/corrections... Still 20 to go... > > > > Nicolas > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeffreback at gmail.com Wed Mar 9 10:05:39 2016 From: jeffreback at gmail.com (Jeff Reback) Date: Wed, 9 Mar 2016 10:05:39 -0500 Subject: [Numpy-discussion] ANN: pandas v0.18.0rc2 - RELEASE CANDIDATE Message-ID: Hi, I'm pleased to announce the availability of the second release candidate of Pandas 0.18.0. Please try this RC and report any issues here: Pandas Issues . Compared to RC1, we have added updated read_sas and fixed float indexing. We will be releasing officially very shortly. THIS IS NOT A PRODUCTION RELEASE This is a major release from 0.17.1 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version. 
Highlights include: - pandas >= 0.18.0 will no longer support compatibility with Python version 2.6 GH7718 or version 3.3 GH11273 - Moving and expanding window functions are now methods on Series and DataFrame, similar to .groupby-like objects, see here. - Adding support for a RangeIndex as a specialized form of the Int64Index for memory savings, see here. - API-breaking .resample changes to make it more .groupby-like, see here - Removal of support for positional indexing with floats, which was deprecated since 0.14.0. This will now raise a TypeError, see here - The .to_xarray() function has been added for compatibility with the xarray package, see here. - The read_sas() function has been enhanced to read sas7bdat files, see here - Addition of the .str.extractall() method, and API changes to the .str.extract() method and the .str.cat() method - pd.test() top-level nose test runner is available GH4327 See the Whatsnew for much more information. Best way to get this is to install via conda from our development channel. Builds for osx-64, linux-64, win-64 for Python 2.7 and Python 3.5 are all available. conda install pandas=v0.18.0rc2 -c pandas Thanks to all who made this release happen. It is a very large release! Jeff

From hemla21 at gmail.com Thu Mar 10 12:00:53 2016 From: hemla21 at gmail.com (Hedieh Ebrahimi) Date: Thu, 10 Mar 2016 18:00:53 +0100 Subject: [Numpy-discussion] Fwd: Looping and searching in numpy array In-Reply-To: References: Message-ID: Dear all, I need to loop over a numpy array and then do the following search. The following is taking almost 60 s for arrays (npArray1 and npArray2 in the example below) with around 300K values. In other words, I am looking for the index of the first occurrence in npArray2 for every value of npArray1: for id in np.nditer(npArray1): newId = (np.where(npArray2 == id))[0][0] Is there any way I can make the above faster using numpy? I need to run the script above on much bigger arrays (50M). Please note that my two numpy arrays in the lines above, npArray1 and npArray2, are not necessarily the same size, but they are both 1d. Thanks a lot for your help,

From jeffreback at gmail.com Sat Mar 12 11:13:49 2016 From: jeffreback at gmail.com (Jeff Reback) Date: Sat, 12 Mar 2016 11:13:49 -0500 Subject: [Numpy-discussion] ANN: pandas v0.18.0 Final released Message-ID: Hi, This is a major release from 0.17.1 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version. This was a release of 3.5 months with 381 commits by 100 authors encompassing 465 issues and 290 pull-requests. *What is it:* *pandas* is a Python package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language.
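Coming back to Hedieh's looping/searching question above: each np.where call scans all of npArray2, so the loop costs O(len(npArray1) * len(npArray2)). One stable sort plus np.searchsorted answers all the lookups at once. A sketch, reusing the names from the post and assuming, as the original loop does, that every value of npArray1 occurs in npArray2:

```
import numpy as np

npArray1 = np.array([3, 1, 3, 7])
npArray2 = np.array([7, 3, 1, 3, 1])

order = np.argsort(npArray2, kind='mergesort')  # stable: keeps first occurrences first
sorted2 = npArray2[order]
pos = np.searchsorted(sorted2, npArray1, side='left')
newIds = order[pos]  # index of the first occurrence of each value
# newIds -> array([1, 2, 1, 0])
```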
*Highlights*: - pandas >= 0.18.0 will no longer support compatibility with Python version 2.6 GH7718 or version 3.3 GH11273 - Moving and expanding window functions are now methods on Series and DataFrame, similar to .groupby-like objects, see here. - Adding support for a RangeIndex as a specialized form of the Int64Index for memory savings, see here. - API-breaking .resample changes to make it more .groupby-like, see here - Removal of support for positional indexing with floats, which was deprecated since 0.14.0. This will now raise a TypeError, see here - The .to_xarray() function has been added for compatibility with the xarray package, see here. - The read_sas() function has been enhanced to read sas7bdat files, see here - Addition of the .str.extractall() method, and API changes to the .str.extract() method and the .str.cat() method - pd.test() top-level nose test runner is available GH4327 See the Whatsnew for much more information and the full Documentation link. *How to get it:* Source tarballs, windows wheels, and macosx wheels are available on PyPI. Installation via conda is: - conda install pandas Windows wheels are courtesy of Christoph Gohlke and are built on Numpy 1.10; macosx wheels are courtesy of Matthew Brett. *Issues:* Please report any issues on our issue tracker. Jeff *Thanks to all of the contributors* - ARF - Alex Alekseyev - Andrew McPherson - Andrew Rosenfeld - Anthonios Partheniou - Anton I. Sipos - Ben - Ben North - Bran Yang - Chris - Chris Carroux - Christopher C. Aycock - Christopher Scanlin - Cody - Da Wang - Daniel Grady - Dorozhko Anton - Dr-Irv - Erik M. Bray - Evan Wright - Francis T. O'Donovan - Frank Cleary - Gianluca Rossi - Graham Jeffries - Guillaume Horel - Henry Hammond - Isaac Schwabacher - Jean-Mathieu Deschenes - Jeff Reback - Joe Jevnik - John Freeman - John Fremlin - Jonas Hoersch - Joris Van den Bossche - Joris Vankerschaver - Justin Lecher - Justin Lin - Ka Wo Chen - Keming Zhang - Kerby Shedden - Kyle - Marco Farrugia - MasonGallo - MattRijk - Matthew Lurie - Maximilian Roos - Mayank Asthana - Mortada Mehyar - Moussa Taifi - Navreet Gill - Nicolas Bonnotte - Paul Reiners - Philip Gura - Pietro Battiston - RahulHP - Randy Carnevale - Rinoc Johnson - Rishipuri - Sangmin Park - Scott E Lasley - Sereger13 - Shannon Wang - Skipper Seabold - Thierry Moisan - Thomas A Caswell - Toby Dylan Hocking - Tom Augspurger - Travis - Trent Hauck - Tux1 - Varun - Wes McKinney - Will Thompson - Yoav Ram - Yoong Kang Lim - Yoshiki Vázquez Baeza - Young Joong Kim - Younggun Kim - Yuval Langer - alex argunov - behzad nouri - boombard - ian-pantano - chromy - daniel - dgram0 - gfyoung - hack-c - hcontrast - jfoo - kaustuv deolal - llllllllll - ranarag - rockg - scls19fr - seales - sinhrks - srib - surveymedia.ca - tworec

From njs at pobox.com Tue Mar 15 19:33:38 2016 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 15 Mar 2016 16:33:38 -0700 Subject: [Numpy-discussion] linux wheels coming soon Message-ID: Hi all, Just a heads-up that we're planning to upload Linux wheels for numpy to PyPI soon. Unless there's some objection, these will be using ATLAS, just like the current Windows wheels, for the same reasons -- moving to something faster like OpenBLAS would be good, but given the concerns about OpenBLAS's reliability we want to get something working first and then worry about making it fast.
(Plus it doesn't make sense to ship different BLAS libraries on Windows versus Linux -- that just multiplies our support burden for no reason.) -n -- Nathaniel J. Smith -- https://vorpus.org From charlesr.harris at gmail.com Tue Mar 15 20:54:38 2016 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 15 Mar 2016 18:54:38 -0600 Subject: [Numpy-discussion] linux wheels coming soon In-Reply-To: References: Message-ID: On Tue, Mar 15, 2016 at 5:33 PM, Nathaniel Smith wrote: > Hi all, > > Just a heads-up that we're planning to upload Linux wheels for numpy > to PyPI soon. Unless there's some objection, these will be using > ATLAS, just like the current Windows wheels, for the same reasons -- > moving to something faster like OpenBLAS would be good, but given the > concerns about OpenBLAS's reliability we want to get something working > first and then worry about making it fast. (Plus it doesn't make sense > to ship different BLAS libraries on Windows versus Linux -- that just > multiplies our support burden for no reason.) > Good news, thanks to all who have worked on this. Question: what to do with the prerelease uploads on pypi after they are outdated? I'm inclined to delete them, as there may be four or five of them per release and that seems unnecessary clutter. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Mar 15 21:10:13 2016 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 15 Mar 2016 18:10:13 -0700 Subject: [Numpy-discussion] linux wheels coming soon In-Reply-To: References: Message-ID: On Mar 15, 2016 5:54 PM, "Charles R Harris" wrote: > > > > On Tue, Mar 15, 2016 at 5:33 PM, Nathaniel Smith wrote: >> >> Hi all, >> >> Just a heads-up that we're planning to upload Linux wheels for numpy >> to PyPI soon. Unless there's some objection, these will be using >> ATLAS, just like the current Windows wheels, for the same reasons -- >> moving to something faster like OpenBLAS would be good, but given the >> concerns about OpenBLAS's reliability we want to get something working >> first and then worry about making it fast. (Plus it doesn't make sense >> to ship different BLAS libraries on Windows versus Linux -- that just >> multiplies our support burden for no reason.) > > > Good news, thanks to all who have worked on this. > > Question: what to do with the prerelease uploads on pypi after they are outdated? I'm inclined to delete them, as there may be four or five of them per release and that seems unnecessary clutter. I'd just leave them? Pypi doesn't care, and who knows, they might be useful for archival purposes to someone. Plus this is less work :-) -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Mar 15 21:36:29 2016 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 15 Mar 2016 19:36:29 -0600 Subject: [Numpy-discussion] linux wheels coming soon In-Reply-To: References: Message-ID: On Tue, Mar 15, 2016 at 7:10 PM, Nathaniel Smith wrote: > On Mar 15, 2016 5:54 PM, "Charles R Harris" > wrote: > > > > > > > > On Tue, Mar 15, 2016 at 5:33 PM, Nathaniel Smith wrote: > >> > >> Hi all, > >> > >> Just a heads-up that we're planning to upload Linux wheels for numpy > >> to PyPI soon.
Unless there's some objection, these will be using > >> ATLAS, just like the current Windows wheels, for the same reasons -- > >> moving to something faster like OpenBLAS would be good, but given the > >> concerns about OpenBLAS's reliability we want to get something working > >> first and then worry about making it fast. (Plus it doesn't make sense > >> to ship different BLAS libraries on Windows versus Linux -- that just > >> multiplies our support burden for no reason.) > > > > > > Good news, thanks to all who have worked on this. > > > > Question: what to do with the prerelease uploads on pypi after they are > outdated? I'm inclined to delete them, as there may be four or five of them > per release and that seems unnecessary clutter. > > I'd just leave them? Pypi doesn't care, and who knows, they might be > useful for archival purposes to someone. Plus this is less work :-) > Less work than hitting the delete button? Oh, my aching finger ;) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Wed Mar 16 12:52:08 2016 From: travis at continuum.io (Travis Oliphant) Date: Wed, 16 Mar 2016 11:52:08 -0500 Subject: [Numpy-discussion] Changes to generalized ufunc core dimension checking Message-ID: Hi everyone, Can you help me understand why the stricter changes to generalized ufunc argument checking now no longer allow scalars to be interpreted as 1-d arrays in the core-dimensions? Is there a way to specify in the core-signature that scalars should be allowed and interpreted in those cases as an array with all the elements the same? This seems like an important feature. Here's an example: myfunc with core-signature (t),(k),(k) -> (t) called with myfunc(arr1, arr2, scalar2). This used to work in 1.9 and before and scalar2 was interpreted as a 1-d array the same size as arr2. It no longer works with 1.10.0 but I don't see why that is an improvement. Thoughts? Is there a work-around that doesn't involve creating a 1-d array the same size as arr2 and filling it with scalar2? Thanks. -Travis -- *Travis Oliphant, PhD* *Co-founder and CEO* @teoliphant 512-222-5440 http://www.continuum.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Mar 16 13:55:26 2016 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 16 Mar 2016 10:55:26 -0700 Subject: [Numpy-discussion] Changes to generalized ufunc core dimension checking In-Reply-To: References: Message-ID: Hi Travis, On Mar 16, 2016 9:52 AM, "Travis Oliphant" wrote: > > Hi everyone, > > Can you help me understand why the stricter changes to generalized ufunc argument checking now no longer allow scalars to be interpreted as 1-d arrays in the core-dimensions? > > Is there a way to specify in the core-signature that scalars should be allowed and interpreted in those cases as an array with all the elements the same? This seems like an important feature. Can you share some example of when this is useful? The reasoning for the change was that broadcasting is really about aligning a set of core elements for parallel looping, and in the gufunc case with arbitrary core kernels that might or might not have any simple loop structure inside them, it's not at all obvious that it makes sense. (Of course we still use broadcasting to line up different instances of the core elements themselves, just not to manufacture the internal shape of the core elements.)
In fact there are examples where it clearly doesn't make sense, I don't think we were able to come up with any compelling examples where it did make sense (which is one reason why I'm interested to hear what yours is :-)), and there's not a single obvious way to reconcile broadcasting rules and gufunc's named axis matching, so, when in doubt refuse the temptation to guess. (Example of when it doesn't make sense: matrix_multiply with (n,k),(k,m)->(n,m) used to produce all kinds of different counterintuitive behaviors if one or both of the inputs were scalar or 1d. In this case making it an error is a clear improvement IMHO. And for something like inv that takes a single input with signature (n,n), if you get (1,n) do you broadcast that to (n,n)? If not, why not? For regular broadcasting the question makes no sense but once you have named axis matching then suddenly it's not obvious.) > Here's an example: > > myfunc with core-signature (t),(k),(k) -> (t) > > called with myfunc(arr1, arr2, scalar2). > > This used to work in 1.9 and before and scalar2 was interpreted as a 1-d array the same size as arr2. It no longer works with 1.10.0 but I don't see why that is an improvement. > > Thoughts? Is there a work-around that doesn't involve creating a 1-d array the same size as arr2 and filling it with scalar2? A better workaround would be to use one of the np.broadcast* functions to request exactly the broadcasting you want and make an arr2-sized view of the scalar. In this case where you presumably (?) want to allow the last two arguments to be broadcast against each other arbitrarily:

    arr2, arr3 = np.broadcast_arrays(arr2, scalar)
    myufunc(arr1, arr2, arr3)

A little wordier than implicit broadcasting, but not as bad as manually creating an array, and like implicit broadcasting the memory overhead is O(1) instead of O(size). -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Wed Mar 16 15:48:32 2016 From: travis at continuum.io (Travis Oliphant) Date: Wed, 16 Mar 2016 14:48:32 -0500 Subject: [Numpy-discussion] Changes to generalized ufunc core dimension checking In-Reply-To: References: Message-ID: On Wed, Mar 16, 2016 at 12:55 PM, Nathaniel Smith wrote: > Hi Travis, > > On Mar 16, 2016 9:52 AM, "Travis Oliphant" wrote: > > > > Hi everyone, > > > > Can you help me understand why the stricter changes to generalized ufunc > argument checking now no longer allow scalars to be interpreted as 1-d > arrays in the core-dimensions? > > > > Is there a way to specify in the core-signature that scalars should be > allowed and interpreted in those cases as an array with all the elements > the same? This seems like an important feature. > > Can you share some example of when this is useful? > Being able to implicitly broadcast scalars to arrays is the core-function of broadcasting. This is still very useful when you have a core-kernel and want to pass in a scalar for many of the arguments. It seems that at least in that case, automatic broadcasting should be allowed --- as it seems clear what is meant. While you can use the broadcast* features to get the same effect with the current code-base, this is not intuitive to a user who is used to having scalars interpreted as arrays in other NumPy operations. It used to automatically happen and a few people depended on it in several companies and so the 1.10 release broke their code.
I can appreciate that in the general case, allowing arbitrary broadcasting on the internal core dimensions can create confusion. But, scalar broadcasting still makes sense. A better workaround would be to use one of the np.broadcast* functions to > request exactly the broadcasting you want and make an arr2-sized view of > the scalar. In this case where you presumably (?) want to allow the last > two arguments to be broadcast against each other arbitrarily: > > arr2, arr3 = np.broadcast_arrays(arr2, scalar) > myufunc(arr1, arr2, arr3) > > A little wordier than implicit broadcasting, but not as bad as manually > creating an array, and like implicit broadcasting the memory overhead is > O(1) instead of O(size). > Thanks for the pointer (after I wrote the email this solution also occurred to me). I think adding back automatic broadcasting for the scalar case makes a lot of sense as well, however. What do people think of that? Also adding this example to the documentation as a work-around for people whose code breaks with the new changes. Thanks, -Travis > -n > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- *Travis Oliphant, PhD* *Co-founder and CEO* @teoliphant 512-222-5440 http://www.continuum.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Mar 16 16:07:38 2016 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 16 Mar 2016 14:07:38 -0600 Subject: [Numpy-discussion] Changes to generalized ufunc core dimension checking In-Reply-To: References: Message-ID: On Wed, Mar 16, 2016 at 1:48 PM, Travis Oliphant wrote: > > > On Wed, Mar 16, 2016 at 12:55 PM, Nathaniel Smith wrote: > >> Hi Travis, >> >> On Mar 16, 2016 9:52 AM, "Travis Oliphant" wrote: >> > >> > Hi everyone, >> > >> > Can you help me understand why the stricter changes to generalized >> ufunc argument checking now no longer allow scalars to be interpreted as >> 1-d arrays in the core-dimensions? >> > >> > Is there a way to specify in the core-signature that scalars should be >> allowed and interpreted in those cases as an array with all the elements >> the same? This seems like an important feature. >> >> Can you share some example of when this is useful? >> > > Being able to implicitly broadcast scalars to arrays is the core-function > of broadcasting. This is still very useful when you have a core-kernel > and want to pass in a scalar for many of the arguments. It seems that at > least in that case, automatic broadcasting should be allowed --- as it > seems clear what is meant. > > While you can use the broadcast* features to get the same effect with the > current code-base, this is not intuitive to a user who is used to having > scalars interpreted as arrays in other NumPy operations. > The `@` operator doesn't allow that. > > It used to automatically happen and a few people depended on it in several > companies and so the 1.10 release broke their code. > > I can appreciate that in the general case, allowing arbitrary broadcasting > on the internal core dimensions can create confusion. But, scalar > broadcasting still makes sense. > Mixing array multiplications with scalar broadcasting is looking for trouble. Array multiplication needs strict dimensions and having stacked arrays and vectors was one of the prime objectives of gufuncs.
Perhaps what we need is a more precise notation for broadcasting, maybe `*` or some such addition to the signatures to indicate that scalar broadcasting is acceptable. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Mar 16 18:28:14 2016 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 16 Mar 2016 15:28:14 -0700 Subject: [Numpy-discussion] Changes to generalized ufunc core dimension checking In-Reply-To: References: Message-ID: On Wed, Mar 16, 2016 at 12:48 PM, Travis Oliphant wrote: > > > On Wed, Mar 16, 2016 at 12:55 PM, Nathaniel Smith wrote: > >> Hi Travis, >> >> On Mar 16, 2016 9:52 AM, "Travis Oliphant" wrote: >> > >> > Hi everyone, >> > >> > Can you help me understand why the stricter changes to generalized >> ufunc argument checking now no longer allow scalars to be interpreted as >> 1-d arrays in the core-dimensions? >> > >> > Is there a way to specify in the core-signature that scalars should be >> allowed and interpreted in those cases as an array with all the elements >> the same? This seems like an important feature. >> >> Can you share some example of when this is useful? >> > > Being able to implicitly broadcast scalars to arrays is the core-function > of broadcasting. This is still very useful when you have a core-kernel > and want to pass in a scalar for many of the arguments. It seems that at > least in that case, automatic broadcasting should be allowed --- as it > seems clear what is meant. > It isn't at all obvious what matrix_multiply(some_arr, 3) should mean, and the behavior you're proposing is definitely not what most people would expect when seeing that call. (I guess most people reading this would expect it to be equivalent to a scalar multiplication like some_arr * 3, but in numpy 1.9 and earlier it did... I'm not quite sure what, actually, maybe np.sum(some_arr * 3, axis=1)?) Again, can you share some example(s) of a gufunc where this is what is wanted, so that we have a more concrete basis for discussion? > > While you can use the broadcast* features to get the same effect with the > current code-base, this is not intuitive to a user who is used to having > scalars interpreted as arrays in other NumPy operations. > > It used to automatically happen and a few people depended on it in several > companies and so the 1.10 release broke their code. > That's certainly unfortunate -- I think one of the reasons we elected to push the change through directly instead of going through a deprecation cycle was that as far as we could tell, this feature was both broken and unused and was only noticed recently because gufunc usage was only just starting to increase -- so we wanted to get rid of it quickly before people started inadvertently depending on it. (And clear the ground for any more-carefully-thought-through option that might arise in the future.) Sounds like a real deprecation cycle would have been better. For reference: Mailing list discussion: http://thread.gmane.org/gmane.comp.python.numeric.general/58824 Pull request: https://github.com/numpy/numpy/pull/5077 -n -- Nathaniel J. Smith -- https://vorpus.org -------------- next part -------------- An HTML attachment was scrubbed...
URL: From waterbug at pangalactic.us Wed Mar 16 18:45:41 2016 From: waterbug at pangalactic.us (Steve Waterbury) Date: Wed, 16 Mar 2016 18:45:41 -0400 Subject: [Numpy-discussion] Changes to generalized ufunc core dimension checking In-Reply-To: References: Message-ID: <56E9E215.7070506@pangalactic.us> On 03/16/2016 06:28 PM, Nathaniel Smith wrote: > ... Sounds like a real deprecation cycle would have been better. IMHO for a library as venerable and widely-used as Numpy, a deprecation cycle is almost always better ... consider this a lesson learned. > For reference: > Mailing list discussion: > http://thread.gmane.org/gmane.comp.python.numeric.general/58824 > Pull request: https://github.com/numpy/numpy/pull/5077 Also better if the "discussion" has a more descriptive subject than "Is this a bug?" ... My 2c. Steve From fperez.net at gmail.com Wed Mar 16 22:32:59 2016 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 16 Mar 2016 19:32:59 -0700 Subject: [Numpy-discussion] Changes to generalized ufunc core dimension checking In-Reply-To: <56E9E215.7070506@pangalactic.us> References: <56E9E215.7070506@pangalactic.us> Message-ID: On Wed, Mar 16, 2016 at 3:45 PM, Steve Waterbury wrote: > On 03/16/2016 06:28 PM, Nathaniel Smith wrote: > >> ... Sounds like a real deprecation cycle would have been better. >> > > IMHO for a library as venerable and widely-used as Numpy, a > deprecation cycle is almost always better ... consider this a > lesson learned. > Mandatory XKCD - https://xkcd.com/1172 We recently had a discussion about a similar "nobody we know uses nor should use this api" situation in IPython, and ultimately decided that xkcd 1172 would hit us, so opted in this case just for creating new cleaner APIs + possibly doing slow deprecation of the old stuff. For a widely used library, if the code exists then someone, somewhere depends on it, regardless of how broken or obscure you think the feature is. We just have to live with that. Cheers, f -- Fernando Perez (@fperez_org; http://fperez.org) fperez.net-at-gmail: mailing lists only (I ignore this when swamped!) fernando.perez-at-berkeley: contact me here for any direct mail -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Mar 17 01:02:47 2016 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 16 Mar 2016 22:02:47 -0700 Subject: [Numpy-discussion] Changes to generalized ufunc core dimension checking In-Reply-To: References: <56E9E215.7070506@pangalactic.us> Message-ID: On Wed, Mar 16, 2016 at 7:32 PM, Fernando Perez wrote: > On Wed, Mar 16, 2016 at 3:45 PM, Steve Waterbury > wrote: >> >> On 03/16/2016 06:28 PM, Nathaniel Smith wrote: >>> >>> ... Sounds like a real deprecation cycle would have been better. >> >> >> IMHO for a library as venerable and widely-used as Numpy, a >> deprecation cycle is almost always better ... consider this a >> lesson learned. > > > Mandatory XKCD - https://xkcd.com/1172 > > We recently had a discussion about a similar "nobody we know uses nor should > use this api" situation in IPython, and ultimately decided that xkcd 1172 > would hit us, so opted in this case just for creating new cleaner APIs + > possibly doing slow deprecation of the old stuff. > > For a widely used library, if the code exists then someone, somewhere > depends on it, regardless of how broken or obscure you think the feature is. > We just have to live with that. 
Sure, the point is well taken, and we've been working hard to make numpy's change policy more consistent, rigorous, and useful -- and this remains very much a work in progress. But engineering is fundamentally the art of making trade-offs, which are always messy and context-dependent. I actually rather like XKCD 1172 because it's a Rorschach test -- you can just as easily read it as saying that you should start by accepting that all changes will break something, and then move on to the more interesting discussion of which things you are going to break and what are the trade-offs. :-) Like, in this case: Our general policy is definitely to use a deprecation cycle. And if you look at the discussions back in September, we also considered options like deprecating-then-removing the broadcasting, or adding a new gufunc-registration-API that would enable the new behavior while preserving the old behavior for old users. Both sound appealing initially. But they both also have serious downsides: they mean preserving broken behavior, possibly indefinitely, which *also* breaks users' code. Notice that if we add a new API in 1.10, then most people won't actually switch until 1.10 becomes the *minimum* version they support, except they probably will forget then too. And we have to maintain both APIs indefinitely. And I'm dubious that the kind of anonymous corporate users who got broken by this would have noticed a deprecation warning -- during the 1.11 cycle we had to go around pointing out some year+ old deprecation warnings to all our really active and engaged downstreams, never mind the ones we only hear about months later through Travis :-/. In this particular case, as far as we could tell, every single existing user was actually *broken* by the current behavior, so it was a question of whether we should fix a bunch of real cases but risk breaking some rare/theoretical ones, or should we leave a bunch of real cases broken for a while in order to be gentler to the rare/theoretical ones. And would we have ever even learned about these cases that Travis's clients are running into if we hadn't broken things? And so forth. It's obviously true that we make mistakes and should try to learn from them to do better in the future, but I object to the idea that there are simple and neat answers available :-). (And sometimes in my more pessimistic moments I feel like a lot of the conventional rules for change management are really technology for shifting around blame :-/. "We had a deprecation period that you didn't notice; your code broke the same either way, but our use of a deprecation period means that now it's *your* fault". Or, in the opposite situation: "Sure, this API doesn't work in the way that anyone would actually expect or want, but if we fix it then it will be *our* fault when existing code breaks -- OTOH if we leave the brokenness there but document it, then it'll be *your* fault when you -- totally predictably -- fall into the trap we've left for you". IMO in both situations it's much healthier to skip the blame games and take responsibility for the actual end result whatever it is, and if that means that some kind of failure is inevitable, then oh well, let's think about how to optimize our process for minimum net failure given imperfect information and finite resources :-). Is there some way we can help downstreams notice deprecations earlier?
It's a lot easier to measure the cost of making a change than of not making a change -- is there some way we can gather more data to correct for this bias? IMO *these* are the really interesting questions, and they're ones that we've been actively working on.) -n -- Nathaniel J. Smith -- https://vorpus.org From waterbug at pangalactic.us Thu Mar 17 01:08:33 2016 From: waterbug at pangalactic.us (Steve Waterbury) Date: Thu, 17 Mar 2016 01:08:33 -0400 Subject: [Numpy-discussion] Changes to generalized ufunc core dimension checking In-Reply-To: References: <56E9E215.7070506@pangalactic.us> Message-ID: <56EA3BD1.8090102@pangalactic.us> On 03/16/2016 10:32 PM, Fernando Perez wrote: > On Wed, Mar 16, 2016 at 3:45 PM, Steve Waterbury > > wrote: > > On 03/16/2016 06:28 PM, Nathaniel Smith wrote: > > ... Sounds like a real deprecation cycle would have been better. > > > IMHO for a library as venerable and widely-used as Numpy, a > deprecation cycle is almost always better ... consider this a > lesson learned. > > > Mandatory XKCD - https://xkcd.com/1172 > > We recently had a discussion about a similar "nobody we know uses nor > should use this api" situation in IPython, and ultimately decided that > xkcd 1172 would hit us, so opted in this case just for creating new > cleaner APIs + possibly doing slow deprecation of the old stuff. > > For a widely used library, if the code exists then someone, somewhere > depends on it, regardless of how broken or obscure you think the feature > is. We just have to live with that. Ha, I love that xkcd! But not sure I agree that it applies here ... however, I do appreciate your sharing it. :D I mean, just change stuff and see who screams, right? ;) Steve From josef.pktd at gmail.com Thu Mar 17 01:20:53 2016 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 17 Mar 2016 01:20:53 -0400 Subject: [Numpy-discussion] Changes to generalized ufunc core dimension checking In-Reply-To: <56EA3BD1.8090102@pangalactic.us> References: <56E9E215.7070506@pangalactic.us> <56EA3BD1.8090102@pangalactic.us> Message-ID: On Thu, Mar 17, 2016 at 1:08 AM, Steve Waterbury wrote: > On 03/16/2016 10:32 PM, Fernando Perez wrote: > >> On Wed, Mar 16, 2016 at 3:45 PM, Steve Waterbury >> > wrote: >> >> On 03/16/2016 06:28 PM, Nathaniel Smith wrote: >> >> ... Sounds like a real deprecation cycle would have been better. >> >> >> IMHO for a library as venerable and widely-used as Numpy, a >> deprecation cycle is almost always better ... consider this a >> lesson learned. >> >> >> Mandatory XKCD - https://xkcd.com/1172 >> >> We recently had a discussion about a similar "nobody we know uses nor >> should use this api" situation in IPython, and ultimately decided that >> xkcd 1172 would hit us, so opted in this case just for creating new >> cleaner APIs + possibly doing slow deprecation of the old stuff. >> >> For a widely used library, if the code exists then someone, somewhere >> depends on it, regardless of how broken or obscure you think the feature >> is. We just have to live with that. >> > > Ha, I love that xkcd! But not sure I agree that it applies here ... > however, I do appreciate your sharing it. :D > > I mean, just change stuff and see who screams, right? ;) No, it's change stuff and listen to whether anybody screams. I'm sometimes late in (politely) screaming because deprecation warnings are either filtered out or I'm using ancient numpy in my development python, or for whatever other reason I don't see warnings. Josef > > > Steve > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL:
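On the warnings point, a minimal sketch of how a downstream test suite can make those warnings impossible to miss (DeprecationWarning is ignored by default in many setups); the same effect is available from the command line as python -W error::DeprecationWarning:

    import warnings

    # Promote deprecation warnings to hard errors while running tests, so a
    # deprecated numpy call fails loudly instead of being silently filtered.
    warnings.simplefilter("error", DeprecationWarning)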
From travis at continuum.io Thu Mar 17 04:04:11 2016 From: travis at continuum.io (Travis Oliphant) Date: Thu, 17 Mar 2016 03:04:11 -0500 Subject: [Numpy-discussion] Changes to generalized ufunc core dimension checking In-Reply-To: References: Message-ID: On Wed, Mar 16, 2016 at 3:07 PM, Charles R Harris wrote: > > > On Wed, Mar 16, 2016 at 1:48 PM, Travis Oliphant > wrote: > >> >> >> On Wed, Mar 16, 2016 at 12:55 PM, Nathaniel Smith wrote: >> >>> Hi Travis, >>> >>> On Mar 16, 2016 9:52 AM, "Travis Oliphant" wrote: >>> > >>> > Hi everyone, >>> > >>> > Can you help me understand why the stricter changes to generalized >>> ufunc argument checking now no longer allow scalars to be interpreted as >>> 1-d arrays in the core-dimensions? >>> > >>> > Is there a way to specify in the core-signature that scalars should be >>> allowed and interpreted in those cases as an array with all the elements >>> the same? This seems like an important feature. >>> >>> Can you share some example of when this is useful? >>> >> >> Being able to implicitly broadcast scalars to arrays is the core-function >> of broadcasting. This is still very useful when you have a core-kernel >> and want to pass in a scalar for many of the arguments. It seems that at >> least in that case, automatic broadcasting should be allowed --- as it >> seems clear what is meant. >> >> While you can use the broadcast* features to get the same effect with the >> current code-base, this is not intuitive to a user who is used to having >> scalars interpreted as arrays in other NumPy operations. >> > > The `@` operator doesn't allow that. > > >> >> It used to automatically happen and a few people depended on it in >> several companies and so the 1.10 release broke their code. >> >> I can appreciate that in the general case, allowing arbitrary >> broadcasting on the internal core dimensions can create confusion. But, >> scalar >> broadcasting still makes sense. >> > > Mixing array multiplications with scalar broadcasting is looking for > trouble. Array multiplication needs strict dimensions and having stacked > arrays and vectors was one of the prime objectives of gufuncs. Perhaps what > we need is a more precise notation for broadcasting, maybe `*` or some such > addition to the signatures to indicate that scalar broadcasting is > acceptable. > I think that is a good idea. Let the user decide if scalar broadcasting is acceptable for their function. Here is a simple concrete example where scalar broadcasting makes sense: A 1-d dot product (the core of np.inner) (k), (k) -> () A user would assume they could call this function with a scalar in either argument and have it broadcast to a 1-d array. Of course, if both arguments are scalars, then it doesn't make sense. Having a way for the user to allow scalar broadcasting seems sensible and a nice compromise. -Travis > > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- *Travis Oliphant, PhD* *Co-founder and CEO* @teoliphant 512-222-5440 http://www.continuum.io -------------- next part -------------- An HTML attachment was scrubbed... URL:
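A sketch of what that opt-in compromise could look like from the caller's side, written as a plain wrapper around inner1d from numpy.core.umath_tests (a (i),(i)->() test gufunc shipped with numpy); the wrapper name and its dispatch logic are illustrative assumptions, not an existing API:

    import numpy as np
    from numpy.core.umath_tests import inner1d  # gufunc with signature (i),(i)->()

    def inner1d_scalar_ok(a, b):
        # Hypothetical opt-in scalar broadcasting: a 0-d argument is
        # expanded to match the other argument's shape before the call.
        a, b = np.asarray(a), np.asarray(b)
        if a.ndim == 0 and b.ndim == 0:
            raise TypeError("at least one argument must supply the core dimension")
        if a.ndim == 0:
            a = np.broadcast_to(a, b.shape)  # O(1) strided view, no copy
        elif b.ndim == 0:
            b = np.broadcast_to(b, a.shape)
        return inner1d(a, b)

    # inner1d_scalar_ok(np.arange(3.0), 2.0) -> 6.0, i.e. inner1d([0, 1, 2], [2, 2, 2])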
From rainwoodman at gmail.com Thu Mar 17 04:22:03 2016 From: rainwoodman at gmail.com (Feng Yu) Date: Thu, 17 Mar 2016 02:22:03 -0600 Subject: [Numpy-discussion] Changes to generalized ufunc core dimension checking In-Reply-To: References: Message-ID: Hi, Here is another example. To write pix2ang (and similar functions) as a ufunc, one may want to have implicit scalar broadcast on the `nest` and `nside` arguments. The function is described here: http://healpy.readthedocs.org/en/latest/generated/healpy.pixelfunc.pix2ang.html#healpy.pixelfunc.pix2ang Yu On Thu, Mar 17, 2016 at 2:04 AM, Travis Oliphant wrote: > > > On Wed, Mar 16, 2016 at 3:07 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Wed, Mar 16, 2016 at 1:48 PM, Travis Oliphant >> wrote: >> >>> >>> >>> On Wed, Mar 16, 2016 at 12:55 PM, Nathaniel Smith wrote: >>> >>>> Hi Travis, >>>> >>>> On Mar 16, 2016 9:52 AM, "Travis Oliphant" wrote: >>>> > >>>> > Hi everyone, >>>> > >>>> > Can you help me understand why the stricter changes to generalized >>>> ufunc argument checking now no longer allow scalars to be interpreted as >>>> 1-d arrays in the core-dimensions? >>>> > >>>> > Is there a way to specify in the core-signature that scalars should >>>> be allowed and interpreted in those cases as an array with all the elements >>>> the same? This seems like an important feature. >>>> >>>> Can you share some example of when this is useful? >>>> >>> >>> Being able to implicitly broadcast scalars to arrays is the >>> core-function of broadcasting. This is still very useful when you have a >>> core-kernel and want to pass in a scalar for many of the arguments. It >>> seems that at least in that case, automatic broadcasting should be allowed >>> --- as it seems clear what is meant. >>> >>> While you can use the broadcast* features to get the same effect with >>> the current code-base, this is not intuitive to a user who is used to >>> having scalars interpreted as arrays in other NumPy operations. >>> >> >> The `@` operator doesn't allow that. >> >> >>> >>> It used to automatically happen and a few people depended on it in >>> several companies and so the 1.10 release broke their code. >>> >>> I can appreciate that in the general case, allowing arbitrary >>> broadcasting on the internal core dimensions can create confusion. But, >>> scalar broadcasting still makes sense. >>> >> >> Mixing array multiplications with scalar broadcasting is looking for >> trouble. Array multiplication needs strict dimensions and having stacked >> arrays and vectors was one of the prime objectives of gufuncs. Perhaps what >> we need is a more precise notation for broadcasting, maybe `*` or some such >> addition to the signatures to indicate that scalar broadcasting is >> acceptable. >> > > I think that is a good idea. Let the user decide if scalar broadcasting > is acceptable for their function. > > Here is a simple concrete example where scalar broadcasting makes sense: > > A 1-d dot product (the core of np.inner) (k), (k) -> () > > A user would assume they could call this function with a scalar in either > argument and have it broadcast to a 1-d array. Of course, if both > arguments are scalars, then it doesn't make sense. > > Having a way for the user to allow scalar broadcasting seems sensible and > a nice compromise.
> > -Travis > > > >> >> >> Chuck >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > -- > > *Travis Oliphant, PhD* > *Co-founder and CEO* > > > @teoliphant > 512-222-5440 > http://www.continuum.io > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Mar 17 10:03:13 2016 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 17 Mar 2016 07:03:13 -0700 Subject: [Numpy-discussion] Changes to generalized ufunc core dimension checking In-Reply-To: References: Message-ID: On Mar 17, 2016 1:22 AM, "Feng Yu" wrote: > > Hi, > > Here is another example. > > To write pix2ang (and similar functions) as a ufunc, one may want to have implicit scalar broadcast on the `nest` and `nside` arguments. > > The function is described here: > > http://healpy.readthedocs.org/en/latest/generated/healpy.pixelfunc.pix2ang.html#healpy.pixelfunc.pix2ang Sorry, can you elaborate on what that function does, maybe give an example, for those of us who haven't used healpy before? I can't quite understand from that page, but am interested... -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From jfoxrabinovitz at gmail.com Thu Mar 17 10:09:38 2016 From: jfoxrabinovitz at gmail.com (Joseph Fox-Rabinovitz) Date: Thu, 17 Mar 2016 10:09:38 -0400 Subject: [Numpy-discussion] Changes to generalized ufunc core dimension checking In-Reply-To: References: Message-ID: On Thu, Mar 17, 2016 at 10:03 AM, Nathaniel Smith wrote: > On Mar 17, 2016 1:22 AM, "Feng Yu" wrote: >> >> Hi, >> >> Here is another example. >> >> To write pix2ang (and similar functions) as a ufunc, one may want to have >> implicit scalar broadcast on the `nest` and `nside` arguments. >> >> The function is described here: >> >> >> http://healpy.readthedocs.org/en/latest/generated/healpy.pixelfunc.pix2ang.html#healpy.pixelfunc.pix2ang > > Sorry, can you elaborate on what that function does, maybe give an example, > for those of us who haven't used healpy before? I can't quite understand > from that page, but am interested... > > -n Likewise. I just took a look at the library and it looks fascinating. I might just use it for something fun to learn about it. -Joe From rainwoodman at gmail.com Thu Mar 17 17:04:25 2016 From: rainwoodman at gmail.com (Feng Yu) Date: Thu, 17 Mar 2016 15:04:25 -0600 Subject: [Numpy-discussion] Changes to generalized ufunc core dimension checking In-Reply-To: References: Message-ID: Hi, ang2pix is used in astronomy to pixelize coordinates given in the form (theta, phi). healpy is a binding of healpix (http://healpix.sourceforge.net/, introduction there too), plus a lot of extra features, or bloat (and I am not particularly fond of this aspect of healpy). It gets the work done. You can think of the function ang2pix as numpy.digitize for angular input. 'nside' and 'nest' control the number of pixels and the ordering of pixels (since it maps 2-d positions to a linear index). The important thing here is ang2pix is a pure function from (nside, nest, theta, phi) to pixelid, so in principle it can be written as a ufunc to extend the functionality to generate pixel ids for different nside and nest settings in the same function call.
There are probably functions in numpy that can benefit from this as well, but I can't immediately think of any. Yu On Thu, Mar 17, 2016 at 8:09 AM, Joseph Fox-Rabinovitz wrote: > On Thu, Mar 17, 2016 at 10:03 AM, Nathaniel Smith wrote: >> On Mar 17, 2016 1:22 AM, "Feng Yu" wrote: >>> >>> Hi, >>> >>> Here is another example. >>> >>> To write pix2ang (and similar functions) as a ufunc, one may want to have >>> implicit scalar broadcast on the `nest` and `nside` arguments. >>> >>> The function is described here: >>> >>> >>> http://healpy.readthedocs.org/en/latest/generated/healpy.pixelfunc.pix2ang.html#healpy.pixelfunc.pix2ang >> >> Sorry, can you elaborate on what that function does, maybe give an example, >> for those of us who haven't used healpy before? I can't quite understand >> from that page, but am interested... >> >> -n > > Likewise. I just took a look at the library and it looks fascinating. > I might just use it for something fun to learn about it. > > -Joe > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion From njs at pobox.com Thu Mar 17 17:21:21 2016 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 17 Mar 2016 14:21:21 -0700 Subject: [Numpy-discussion] Changes to generalized ufunc core dimension checking In-Reply-To: References: Message-ID: On Thu, Mar 17, 2016 at 2:04 PM, Feng Yu wrote: > Hi, > > ang2pix is used in astronomy to pixelize coordinates given in the form > (theta, phi). healpy is a binding of healpix > (http://healpix.sourceforge.net/, introduction there too), plus a lot > of extra features, or bloat (and I am not particularly fond of this > aspect of healpy). It gets the work done. > > You can think of the function ang2pix as numpy.digitize for angular input. > > 'nside' and 'nest' control the number of pixels and the ordering of > pixels (since it maps 2-d positions to a linear index). > > The important thing here is ang2pix is a pure function from (nside, > nest, theta, phi) to pixelid, so in principle it can be written as a > ufunc to extend the functionality to generate pixel ids for different > nside and nest settings in the same function call. Thanks for the details! From what you're saying, it sounds like ang2pix actually wouldn't care either way about the gufunc broadcasting changes we're talking about. When we talk about *g*eneralized ufuncs, we're referring to ufuncs where the "core" minimal operation that gets looped over is already intrinsically something that operates on arrays, not just scalars -- so operations like matrix multiply, sum, mean, mode, sort, etc., which you might want to apply simultaneously to a whole bunch of arrays, and the question is about how to handle these "inner" dimensions. In this case it sounds like (nside, nest, theta, phi) are 4 scalars, right? So this would just be a regular ufunc, and the whole issue doesn't arise. Broadcast all you like :-) -n -- Nathaniel J. Smith -- https://vorpus.org From shoyer at gmail.com Thu Mar 17 17:41:51 2016 From: shoyer at gmail.com (Stephan Hoyer) Date: Thu, 17 Mar 2016 14:41:51 -0700 Subject: [Numpy-discussion] Changes to generalized ufunc core dimension checking In-Reply-To: References: Message-ID: On Thu, Mar 17, 2016 at 1:04 AM, Travis Oliphant wrote: > I think that is a good idea. Let the user decide if scalar broadcasting > is acceptable for their function.
> > Here is a simple concrete example where scalar broadcasting makes sense: > > A 1-d dot product (the core of np.inner) (k), (k) -> () > > A user would assume they could call this function with a scalar in either > argument and have it broadcast to a 1-d array. Of course, if both > arguments are scalars, then it doesn't make sense. > > Having a way for the user to allow scalar broadcasting seems sensible and > a nice compromise. > > -Travis > To generalize a little bit, consider the entire family of weighted statistical functions (mean, std, median, etc.). For example, the gufunc version of np.average is basically equivalent to np.inner with a bit of preprocessing. Arguably, it *could* make sense to broadcast weights when given a scalar: np.average(values, weights=1.0 / len(values)) is pretty unambiguous. That said, adding an explicit "scalar broadcasting OK flag" seems like a hack that will need even more special logic (e.g., so we can error if both arguments to np.inner are scalars). Multiple dispatch for gufunc core signatures seems like the cleaner solution. If you want np.inner to handle scalars, you need to supply core signatures (k),()->() and (),(k)->() along with (k),(k)->(). This is similar to the vision of three core signatures for np.matmul: (i),(i,j)->(j), (i,j),(j)->(i) and (i,j),(j,k)->(i,k). Maybe someone will even eventually get around to adding an axis/axes argument so we can specify these core dimensions explicitly. Writing np.inner(a, b, axes=((-1,), ())) could trigger the (k),()->() signature even if the second argument is not a scalar (it should be broadcast against "a" instead). -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Thu Mar 17 17:49:18 2016 From: travis at continuum.io (Travis Oliphant) Date: Thu, 17 Mar 2016 16:49:18 -0500 Subject: [Numpy-discussion] Changes to generalized ufunc core dimension checking In-Reply-To: References: Message-ID: On Thu, Mar 17, 2016 at 4:41 PM, Stephan Hoyer wrote: > On Thu, Mar 17, 2016 at 1:04 AM, Travis Oliphant > wrote: > >> I think that is a good idea. Let the user decide if scalar >> broadcasting is acceptable for their function. >> >> Here is a simple concrete example where scalar broadcasting makes sense: >> >> >> A 1-d dot product (the core of np.inner) (k), (k) -> () >> >> A user would assume they could call this function with a scalar in either >> argument and have it broadcast to a 1-d array. Of course, if both >> arguments are scalars, then it doesn't make sense. >> >> Having a way for the user to allow scalar broadcasting seems sensible and >> a nice compromise. >> >> -Travis >> > > To generalize a little bit, consider the entire family of weighted > statistical functions (mean, std, median, etc.). For example, the gufunc > version of np.average is basically equivalent to np.inner with a bit of > preprocessing. > > Arguably, it *could* make sense to broadcast weights when given a scalar: > np.average(values, weights=1.0 / len(values)) is pretty unambiguous. > > That said, adding an explicit "scalar broadcasting OK flag" seems like a > hack that will need even more special logic (e.g., so we can error if both > arguments to np.inner are scalars). > > Multiple dispatch for gufunc core signatures seems like the cleaner > solution. If you want np.inner to handle scalars, you need to supply core > signatures (k),()->() and (),(k)->() along with (k),(k)->(). This is
similar to the > vision of three core signatures for np.matmul: (i),(i,j)->(j), > (i,j),(j)->(i) and (i,j),(j,k)->(i,k). > Maybe someone will even eventually get around to adding an axis/axes > argument so we can specify these core dimensions explicitly. Writing > np.inner(a, b, axes=((-1,), ())) could trigger the (k),()->() signature > even if the second argument is not a scalar (it should be broadcast against > "a" instead). > That's a great idea! Adding multiple-dispatch capability for this case could also solve a lot of issues that right now prevent generalized ufuncs from being the mechanism of implementation of *all* NumPy functions. -Travis > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- *Travis Oliphant, PhD* *Co-founder and CEO* @teoliphant 512-222-5440 http://www.continuum.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Thu Mar 17 18:28:11 2016 From: jaime.frio at gmail.com (Jaime Fernández del Río) Date: Thu, 17 Mar 2016 23:28:11 +0100 Subject: [Numpy-discussion] Changes to generalized ufunc core dimension checking In-Reply-To: References: Message-ID: On Thu, Mar 17, 2016 at 10:41 PM, Stephan Hoyer wrote: > On Thu, Mar 17, 2016 at 1:04 AM, Travis Oliphant > wrote: > >> I think that is a good idea. Let the user decide if scalar >> broadcasting is acceptable for their function. >> >> Here is a simple concrete example where scalar broadcasting makes sense: >> >> >> A 1-d dot product (the core of np.inner) (k), (k) -> () >> >> A user would assume they could call this function with a scalar in either >> argument and have it broadcast to a 1-d array. Of course, if both >> arguments are scalars, then it doesn't make sense. >> >> Having a way for the user to allow scalar broadcasting seems sensible and >> a nice compromise. >> >> -Travis >> > > To generalize a little bit, consider the entire family of weighted > statistical functions (mean, std, median, etc.). For example, the gufunc > version of np.average is basically equivalent to np.inner with a bit of > preprocessing. > > Arguably, it *could* make sense to broadcast weights when given a scalar: > np.average(values, weights=1.0 / len(values)) is pretty unambiguous. > > That said, adding an explicit "scalar broadcasting OK flag" seems like a > hack that will need even more special logic (e.g., so we can error if both > arguments to np.inner are scalars). > > Multiple dispatch for gufunc core signatures seems like the cleaner > solution. If you want np.inner to handle scalars, you need to supply core > signatures (k),()->() and (),(k)->() along with (k),(k)->(). This is > similar to the vision of three core signatures for np.matmul: (i),(i,j)->(j), > (i,j),(j)->(i) and (i,j),(j,k)->(i,k). > Would the logic for such a thing be consistent? E.g. how do you decide if the user is requesting (k),(k)->(), or (k),()->() with broadcasting over a non-core dimension of size k in the second argument? What if your signatures are (m, k),(k)->(m) and (k),(n,k)->(n) and your two inputs are (m,k) and (n,k), how do you decide which one to call? Or alternatively, how do you detect and forbid such ambiguous designs? Figuring out the dispatch rules for the general case seems like a non-trivial problem to me. Jaime -- (\__/) ( O.o) ( > <) This is Conejo. Copy Conejo into your signature and help him with his plans for world domination.
-------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Mar 17 19:11:16 2016 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 17 Mar 2016 16:11:16 -0700 Subject: [Numpy-discussion] new API: nancumsum and nancumprod Message-ID: Hi all, We have a pull request to add np.nancumsum and np.nancumprod: https://github.com/numpy/numpy/pull/7421 Seems pretty straightforward and uncontroversial to me, but our policy is to run new API by the mailing list, so speak up if you have some objection, or take a look at the PR if you have an interest in the detailed semantics. -n -- Nathaniel J. Smith -- https://vorpus.org From shoyer at gmail.com Fri Mar 18 13:21:14 2016 From: shoyer at gmail.com (Stephan Hoyer) Date: Fri, 18 Mar 2016 10:21:14 -0700 Subject: [Numpy-discussion] Changes to generalized ufunc core dimension checking In-Reply-To: References: Message-ID: On Thu, Mar 17, 2016 at 3:28 PM, Jaime Fernández del Río <jaime.frio at gmail.com> wrote: > Would the logic for such a thing be consistent? E.g. how do you decide if > the user is requesting (k),(k)->(), or (k),()->() with broadcasting over a > non-core dimension of size k in the second argument? What if your > signatures are (m, k),(k)->(m) and (k),(n,k)->(n) and your two inputs are > (m,k) and (n,k), how do you decide which one to call? Or alternatively, how > do you detect and forbid such ambiguous designs? Figuring out the dispatch > rules for the general case seems like a non-trivial problem to me. > I would require a priority order for the core signatures when the gufunc is created and only allow one implementation per argument dimension in the core signature (i.e., disallow multiple implementations like (k),(k)->() and (k),(m)->()). The rule would be to dispatch to the implementation with the first core signature with the right number of axes. The latter constraint ensures that (m,n) @ (k,n) errors if k != n, rather than attempting vectorized matrix-vector multiplication. For matmul/@, the priority order is pretty straightforward:

1. (m,n),(n,k)->(m,k)
2. (m,n),(n)->(m)
3. (m),(m,n)->(n)
4. (m),(m)->()

(2 and 3 could be safely interchanged.) For scenarios like "(k),(k)->(), or (k),()->()", the only reasonable choice would be to put (k),(k)->() first -- otherwise it never gets called. For the other ambiguous case, "(m, k),(k)->(m) and (k),(n,k)->(n)", the implementer would also need to pick an order. Most of the tricky cases for multiple dispatch arise from extensible systems (e.g., Matthew Rocklin's multipledispatch library), where you allow/encourage third party libraries to add their own implementations and need to be sure the combined result is still consistent. I wouldn't suggest such a system for NumPy -- I think it's fine to require every gufunc to have a single owner. There are other solutions for allowing extensibility to duck array types (namely, __numpy_ufunc__). -------------- next part -------------- An HTML attachment was scrubbed... URL:
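A toy sketch of that priority rule, with plain Python standing in for the per-signature gufunc loops (the function name and dispatch table are illustrative assumptions, not a real numpy API):

    import numpy as np

    _MATMUL_SIGS = [          # checked in priority order
        ((2, 2), "(m,n),(n,k)->(m,k)"),
        ((2, 1), "(m,n),(n)->(m)"),
        ((1, 2), "(m),(m,n)->(n)"),
        ((1, 1), "(m),(m)->()"),
    ]

    def pick_signature(a, b):
        # Dispatch to the first signature whose core ranks fit; any extra
        # leading dimensions are ordinary broadcast (stacking) dimensions.
        a, b = np.asarray(a), np.asarray(b)
        for (ra, rb), sig in _MATMUL_SIGS:
            if a.ndim >= ra and b.ndim >= rb:
                return sig  # a real gufunc would invoke this inner loop
        raise TypeError("no core signature matches the inputs")

    # pick_signature(np.ones((3, 4)), np.ones(4))          -> '(m,n),(n)->(m)'
    # pick_signature(np.ones(3), np.ones(3))               -> '(m),(m)->()'
    # pick_signature(np.ones((5, 3, 4)), np.ones((4, 2)))  -> '(m,n),(n,k)->(m,k)'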
> > -Travis > For future reference, there's already an issue on GitHub about adding an axis argument to gufuncs: https://github.com/numpy/numpy/issues/5197 (see also the referenced mailing list discussion from that page.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rainwoodman at gmail.com Fri Mar 18 13:25:44 2016 From: rainwoodman at gmail.com (Feng Yu) Date: Fri, 18 Mar 2016 11:25:44 -0600 Subject: [Numpy-discussion] Changes to generalized ufunc core dimension checking In-Reply-To: References: Message-ID: Thanks for the explanation. I see the point now. On Thu, Mar 17, 2016 at 3:21 PM, Nathaniel Smith wrote: > On Thu, Mar 17, 2016 at 2:04 PM, Feng Yu wrote: >> Hi, >> >> ang2pix is used in astronomy to pixelize coordinate in forms of >> (theta, phi). healpy is a binding of healpix >> (http://healpix.sourceforge.net/, introduction there too), plus a lot >> of more extra features or bloat (and I am not particular fond of this >> aspect of healpy). It gets the work done. >> >> You can think of the function ang2pix as nump.digitize for angular input. >> >> 'nside' and 'nest' controls the number of pixels and the ordering of >> pixels (since it is 2d to linear index). >> >> The important thing here is ang2pix is a pure function from (nside, >> nest, theta, phi) to pixelid, so in principle it can be written as a >> ufunc to extend the functionality to generate pixel ids for different >> nside and nest settings in the same function call. > > Thanks for the details! > > From what you're saying, it sounds like ang2pix actually wouldn't care > either way about the gufunc broadcasting changes we're talking about. > When we talk about *g*eneralized ufuncs, we're referring to ufuncs > where the "core" minimal operation that gets looped over is already > intrinsically something that operates on arrays, not just scalars -- > so operations like matrix multiply, sum, mean, mode, sort, etc., which > you might want to apply simultaneously to a whole bunch of arrays, and > the question is about how to handle these "inner" dimensions. In this > case it sounds like (nside, nest, theta, phi) are 4 scalars, right? So > this would just be a regular ufunc, and the whole issue doesn't arise. > Broadcast all you like :-) > > -n > > -- > Nathaniel J. Smith -- https://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Sat Mar 19 19:27:03 2016 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 19 Mar 2016 17:27:03 -0600 Subject: [Numpy-discussion] Numpy 1.11.0rc2 released. Message-ID: Hi All, Numpy 1.11.0rc2 has been released. There have been a few reversions and fixes since 1.11.0rc1, but by and large things have been pretty stable and unless something terrible happens, I will make the final next weekend. Source files and documentation can be found on Sourceforge , while source files and OS X wheels for Python 2.7, 3.3, 3.4, and 3.5 can be installed from Pypi. Please test and yell loudly if there is a problem. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From frederic.bastien at gmail.com Mon Mar 21 15:23:37 2016 From: frederic.bastien at gmail.com (Frédéric Bastien) Date: Mon, 21 Mar 2016 15:23:37 -0400 Subject: [Numpy-discussion] Announcing Theano 0.8.0 Message-ID:

========================
Announcing Theano 0.8.0
========================

This is a release for a major version, with lots of new features, bug fixes, and some interface changes (deprecated or potentially misleading features were removed). The upgrade is recommended for everybody. For those using the bleeding edge version in the git repository, we encourage you to update to the `rel-0.8.0` tag.

What's New
----------

Highlights:

- Python 2 and 3 support with the same code base
- Faster optimization
- Integration of CuDNN for better GPU performance
- Many Scan improvements (execution speed up, ...)
- optimizer=fast_compile moves computation to the GPU.
- Better convolution on CPU and GPU (CorrMM, cudnn, 3d conv, more parameters)
- Interactive visualization of graphs with d3viz
- cnmem (better memory management on GPU)
- BreakpointOp
- Multi-GPU for data parallelism via Platoon (https://github.com/mila-udem/platoon/)
- More pooling parameters supported
- Bilinear interpolation of images
- New GPU back-end:
  * Float16 new back-end (needs cuda 7.5)
  * Multi dtypes
  * Multi-GPU support in the same process

A total of 141 people contributed to this release, please see the end of NEWS.txt for the complete list. If you are among the authors and would like to update the information, please let us know.

Download and Install
--------------------

You can download Theano from http://pypi.python.org/pypi/Theano Installation instructions are available at http://deeplearning.net/software/theano/install.html

Description
-----------

Theano is a Python library that allows you to define, optimize, and efficiently evaluate mathematical expressions involving multi-dimensional arrays. It is built on top of NumPy. Theano features:

* tight integration with NumPy: a similar interface to NumPy's. numpy.ndarrays are also used internally in Theano-compiled functions.
* transparent use of a GPU: perform data-intensive computations up to 140x faster than on a CPU (support for float32 only).
* efficient symbolic differentiation: Theano can compute derivatives for functions of one or many inputs.
* speed and stability optimizations: avoid nasty bugs when computing expressions such as log(1 + exp(x)) for large values of x.
* dynamic C code generation: evaluate expressions faster.
* extensive unit-testing and self-verification: includes tools for detecting and diagnosing bugs and/or potential problems.

Theano has been powering large-scale computationally intensive scientific research since 2007, but it is also approachable enough to be used in deep learning classes. All questions/comments are always welcome on the Theano mailing-lists (http://deeplearning.net/software/theano/#community) Frédéric -------------- next part -------------- An HTML attachment was scrubbed... URL:
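For those unfamiliar with it, a minimal sketch of the symbolic workflow the description above refers to (a generic element-wise add; nothing here is specific to the 0.8.0 release):

    import numpy as np
    import theano
    import theano.tensor as T

    x = T.dmatrix('x')              # symbolic double-precision matrices
    y = T.dmatrix('y')
    z = x + y                       # build a symbolic expression
    f = theano.function([x, y], z)  # compile it (optimization + code generation)

    print(f(np.ones((2, 2)), np.arange(4.0).reshape(2, 2)))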
URL: From njs at pobox.com Tue Mar 22 02:45:23 2016 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 21 Mar 2016 23:45:23 -0700 Subject: [Numpy-discussion] [ANN/FYI] workshop on python compilers @ SciPy this year, with relevance to numpy Message-ID: Hi numpy-discussion, Wanted to pass on this workshop announcement here, since one of the motivations for this was the realization that with PyPy and Pyston both actively working on getting numpy working, we'll probably soon have *three* JIT platforms actively attempting to write their own version of numpy (see the website below for details), and it would be nice to avert that if we can :-). So it'd be great to have some representation from numpy (and projects like numpy, e.g. dynd or pandas). --- What: A two-day workshop bringing together folks working on JIT/AOT compilation in Python. When/where: July 11-12, in Austin, Texas. (This is co-located with SciPy 2016, at the same time as the tutorial sessions, just before the conference proper.) Website: https://python-compilers-workshop.github.io/ Note that I anticipate that we'll be able to get sponsorship funding to cover travel costs for folks who can't get their employers to foot the bill. Cheers, -n -- Nathaniel J. Smith -- https://vorpus.org From bobmerhebi at gmail.com Wed Mar 23 09:06:44 2016 From: bobmerhebi at gmail.com (Ibrahim EL MEREHBI) Date: Wed, 23 Mar 2016 14:06:44 +0100 Subject: [Numpy-discussion] Multi-dimensional array of splitted array Message-ID: <56F294E4.9080907@gmail.com> Hello, I have a multi-diensional array that I would like to split its columns. For example consider, dat = np.array([np.arange(10),np.arange(10,20), np.arange(20,30)]).T array([[ 0, 10, 20], [ 1, 11, 21], [ 2, 12, 22], [ 3, 13, 23], [ 4, 14, 24], [ 5, 15, 25], [ 6, 16, 26], [ 7, 17, 27], [ 8, 18, 28], [ 9, 19, 29]]) I already can split one column at a time: np.array_split(dat[:,0], [2,5,8]) [array([0, 1]), array([2, 3, 4]), array([5, 6, 7]), array([8, 9])] How can I extend this for all columns and (overwrite or) have a new multi-dimensional array? Thank you, Bob -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewm at redtetrahedron.org Wed Mar 23 09:17:35 2016 From: ewm at redtetrahedron.org (Eric Moore) Date: Wed, 23 Mar 2016 09:17:35 -0400 Subject: [Numpy-discussion] Multi-dimensional array of splitted array In-Reply-To: <56F294E4.9080907@gmail.com> References: <56F294E4.9080907@gmail.com> Message-ID: Try just calling np.array_split on the full 2D array. It splits along a particular axis, which is selected using the axis argument of np.array_split. The axis to split along defaults to the first so the two calls to np.array_split below are exactly equivalent. In [16]: a = np.c_[:10,10:20,20:30] In [17]: np.array_split(a, [2,5,8]) Out[17]: [array([[ 0, 10, 20], [ 1, 11, 21]]), array([[ 2, 12, 22], [ 3, 13, 23], [ 4, 14, 24]]), array([[ 5, 15, 25], [ 6, 16, 26], [ 7, 17, 27]]), array([[ 8, 18, 28], [ 9, 19, 29]])] In [18]: np.array_split(a, [2,5,8], 0) Out[18]: [array([[ 0, 10, 20], [ 1, 11, 21]]), array([[ 2, 12, 22], [ 3, 13, 23], [ 4, 14, 24]]), array([[ 5, 15, 25], [ 6, 16, 26], [ 7, 17, 27]]), array([[ 8, 18, 28], [ 9, 19, 29]])] Eric On Wed, Mar 23, 2016 at 9:06 AM, Ibrahim EL MEREHBI wrote: > Hello, > > I have a multi-diensional array that I would like to split its columns. 
> > For example consider, > > dat = np.array([np.arange(10),np.arange(10,20), np.arange(20,30)]).T > > array([[ 0, 10, 20], > [ 1, 11, 21], > [ 2, 12, 22], > [ 3, 13, 23], > [ 4, 14, 24], > [ 5, 15, 25], > [ 6, 16, 26], > [ 7, 17, 27], > [ 8, 18, 28], > [ 9, 19, 29]]) > > > I already can split one column at a time: > > np.array_split(dat[:,0], [2,5,8]) > > [array([0, 1]), array([2, 3, 4]), array([5, 6, 7]), array([8, 9])] > > > How can I extend this for all columns and (overwrite or) have a new > multi-dimensional array? > > Thank you, > Bob > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bobmerhebi at gmail.com Wed Mar 23 09:37:48 2016 From: bobmerhebi at gmail.com (Ibrahim EL MEREHBI) Date: Wed, 23 Mar 2016 14:37:48 +0100 Subject: [Numpy-discussion] Multi-dimensional array of splitted array In-Reply-To: References: <56F294E4.9080907@gmail.com> Message-ID: <56F29C2C.8040401@gmail.com> Thanks Eric. I already checked that. It's not what I want. I think I wasn't clear about what I wanted. I want to split each column but I want to do it for each column and end up with an array. Here's the result I wish to have: array([[[0], [1, 2, 3, 4], [5, 6, 7], [8, 9]], [[10], [11, 12, 13, 14], [15, 16, 17], [18, 19]], [[20], [21, 21, 23, 24], [25, 26, 27], [28, 29]]], dtype=object) Sincerely Yours, Bob On 23/03/2016 14:17, Eric Moore wrote: > Try just calling np.array_split on the full 2D array. It splits along > a particular axis, which is selected using the axis argument of > np.array_split. The axis to split along defaults to the first so the > two calls to np.array_split below are exactly equivalent. > > In [16]: a = np.c_[:10,10:20,20:30] > > > In [17]: np.array_split(a, [2,5,8]) > > Out[17]: > > [array([[ 0, 10, 20], > > [ 1, 11, 21]]), array([[ 2, 12, 22], > > [ 3, 13, 23], > > [ 4, 14, 24]]), array([[ 5, 15, 25], > > [ 6, 16, 26], > > [ 7, 17, 27]]), array([[ 8, 18, 28], > > [ 9, 19, 29]])] > > > In [18]: np.array_split(a, [2,5,8], 0) > > Out[18]: > > [array([[ 0, 10, 20], > > [ 1, 11, 21]]), array([[ 2, 12, 22], > > [ 3, 13, 23], > > [ 4, 14, 24]]), array([[ 5, 15, 25], > > [ 6, 16, 26], > > [ 7, 17, 27]]), array([[ 8, 18, 28], > > [ 9, 19, 29]])] > > > Eric > > > > On Wed, Mar 23, 2016 at 9:06 AM, Ibrahim EL MEREHBI > > wrote: > > Hello, > > I have a multi-diensional array that I would like to split its > columns. > > For example consider, > > dat = np.array([np.arange(10),np.arange(10,20), np.arange(20,30)]).T > > array([[ 0, 10, 20], > [ 1, 11, 21], > [ 2, 12, 22], > [ 3, 13, 23], > [ 4, 14, 24], > [ 5, 15, 25], > [ 6, 16, 26], > [ 7, 17, 27], > [ 8, 18, 28], > [ 9, 19, 29]]) > > > I already can split one column at a time: > > np.array_split(dat[:,0], [2,5,8]) > > [array([0, 1]), array([2, 3, 4]), array([5, 6, 7]), array([8, 9])] > > > How can I extend this for all columns and (overwrite or) have a > new multi-dimensional array? > > Thank you, > Bob > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jfoxrabinovitz at gmail.com Wed Mar 23 10:02:38 2016 From: jfoxrabinovitz at gmail.com (Joseph Fox-Rabinovitz) Date: Wed, 23 Mar 2016 10:02:38 -0400 Subject: [Numpy-discussion] Multi-dimensional array of splitted array In-Reply-To: <56F29C2C.8040401@gmail.com> References: <56F294E4.9080907@gmail.com> <56F29C2C.8040401@gmail.com> Message-ID: On Wed, Mar 23, 2016 at 9:37 AM, Ibrahim EL MEREHBI wrote: > Thanks Eric. I already checked that. It's not what I want. I think I wasn't > clear about what I wanted. > > I want to split each column but I want to do it for each column and end up > with an array. Here's the result I wish to have: > > array([[[0], [1, 2, 3, 4], [5, 6, 7], [8, 9]], > [[10], [11, 12, 13, 14], [15, 16, 17], [18, 19]], > [[20], [21, 21, 23, 24], [25, 26, 27], [28, 29]]], dtype=object) > Apply [`np.stack`](http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.stack.html#numpy.stack) to the result. It will merge the arrays the way you want. -Joe > > Sincerely Yours, > Bob > > > > On 23/03/2016 14:17, Eric Moore wrote: > > Try just calling np.array_split on the full 2D array. It splits along a > particular axis, which is selected using the axis argument of > np.array_split. The axis to split along defaults to the first so the two > calls to np.array_split below are exactly equivalent. > > In [16]: a = np.c_[:10,10:20,20:30] > > > In [17]: np.array_split(a, [2,5,8]) > > Out[17]: > > [array([[ 0, 10, 20], > > [ 1, 11, 21]]), array([[ 2, 12, 22], > > [ 3, 13, 23], > > [ 4, 14, 24]]), array([[ 5, 15, 25], > > [ 6, 16, 26], > > [ 7, 17, 27]]), array([[ 8, 18, 28], > > [ 9, 19, 29]])] > > > In [18]: np.array_split(a, [2,5,8], 0) > > Out[18]: > > [array([[ 0, 10, 20], > > [ 1, 11, 21]]), array([[ 2, 12, 22], > > [ 3, 13, 23], > > [ 4, 14, 24]]), array([[ 5, 15, 25], > > [ 6, 16, 26], > > [ 7, 17, 27]]), array([[ 8, 18, 28], > > [ 9, 19, 29]])] > > > Eric > > > > On Wed, Mar 23, 2016 at 9:06 AM, Ibrahim EL MEREHBI > wrote: >> >> Hello, >> >> I have a multi-diensional array that I would like to split its columns. >> >> For example consider, >> >> dat = np.array([np.arange(10),np.arange(10,20), np.arange(20,30)]).T >> >> array([[ 0, 10, 20], >> [ 1, 11, 21], >> [ 2, 12, 22], >> [ 3, 13, 23], >> [ 4, 14, 24], >> [ 5, 15, 25], >> [ 6, 16, 26], >> [ 7, 17, 27], >> [ 8, 18, 28], >> [ 9, 19, 29]]) >> >> >> I already can split one column at a time: >> >> np.array_split(dat[:,0], [2,5,8]) >> >> [array([0, 1]), array([2, 3, 4]), array([5, 6, 7]), array([8, 9])] >> >> >> How can I extend this for all columns and (overwrite or) have a new >> multi-dimensional array? 
>> >> Thank you, >> Bob >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > From sebastian at sipsolutions.net Wed Mar 23 10:21:57 2016 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 23 Mar 2016 15:21:57 +0100 Subject: [Numpy-discussion] Multi-dimensional array of splitted array In-Reply-To: References: <56F294E4.9080907@gmail.com> <56F29C2C.8040401@gmail.com> Message-ID: <1458742917.12837.13.camel@sipsolutions.net> On Mi, 2016-03-23 at 10:02 -0400, Joseph Fox-Rabinovitz wrote: > On Wed, Mar 23, 2016 at 9:37 AM, Ibrahim EL MEREHBI > wrote: > > Thanks Eric. I already checked that. It's not what I want. I think > > I wasn't > > clear about what I wanted. > > > > I want to split each column but I want to do it for each column and > > end up > > with an array. Here's the result I wish to have: > > > > array([[[0], [1, 2, 3, 4], [5, 6, 7], [8, 9]], > > [[10], [11, 12, 13, 14], [15, 16, 17], [18, 19]], > > [[20], [21, 21, 23, 24], [25, 26, 27], [28, 29]]], > > dtype=object) > > > > Apply [`np.stack`](http://docs.scipy.org/doc/numpy-1.10.0/reference/g > enerated/numpy.stack.html#numpy.stack) > to the result. It will merge the arrays the way you want. > It is simply impossible to stack those arrays like he wants, it is not a valid array. - Sebastian > -Joe > > > > > Sincerely Yours, > > Bob > > > > > > > > On 23/03/2016 14:17, Eric Moore wrote: > > > > Try just calling np.array_split on the full 2D array. It splits > > along a > > particular axis, which is selected using the axis argument of > > np.array_split. The axis to split along defaults to the first so > > the two > > calls to np.array_split below are exactly equivalent. > > > > In [16]: a = np.c_[:10,10:20,20:30] > > > > > > In [17]: np.array_split(a, [2,5,8]) > > > > Out[17]: > > > > [array([[ 0, 10, 20], > > > > [ 1, 11, 21]]), array([[ 2, 12, 22], > > > > [ 3, 13, 23], > > > > [ 4, 14, 24]]), array([[ 5, 15, 25], > > > > [ 6, 16, 26], > > > > [ 7, 17, 27]]), array([[ 8, 18, 28], > > > > [ 9, 19, 29]])] > > > > > > In [18]: np.array_split(a, [2,5,8], 0) > > > > Out[18]: > > > > [array([[ 0, 10, 20], > > > > [ 1, 11, 21]]), array([[ 2, 12, 22], > > > > [ 3, 13, 23], > > > > [ 4, 14, 24]]), array([[ 5, 15, 25], > > > > [ 6, 16, 26], > > > > [ 7, 17, 27]]), array([[ 8, 18, 28], > > > > [ 9, 19, 29]])] > > > > > > Eric > > > > > > > > On Wed, Mar 23, 2016 at 9:06 AM, Ibrahim EL MEREHBI < > > bobmerhebi at gmail.com> > > wrote: > > > > > > Hello, > > > > > > I have a multi-diensional array that I would like to split its > > > columns. 
> > > > > > For example consider, > > > > > > dat = np.array([np.arange(10),np.arange(10,20), > > > np.arange(20,30)]).T > > > > > > array([[ 0, 10, 20], > > > [ 1, 11, 21], > > > [ 2, 12, 22], > > > [ 3, 13, 23], > > > [ 4, 14, 24], > > > [ 5, 15, 25], > > > [ 6, 16, 26], > > > [ 7, 17, 27], > > > [ 8, 18, 28], > > > [ 9, 19, 29]]) > > > > > > > > > I already can split one column at a time: > > > > > > np.array_split(dat[:,0], [2,5,8]) > > > > > > [array([0, 1]), array([2, 3, 4]), array([5, 6, 7]), array([8, > > > 9])] > > > > > > > > > How can I extend this for all columns and (overwrite or) have a > > > new > > > multi-dimensional array? > > > > > > Thank you, > > > Bob > > > > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at scipy.org > > > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From sebastian at sipsolutions.net Wed Mar 23 10:22:39 2016 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 23 Mar 2016 15:22:39 +0100 Subject: [Numpy-discussion] Multi-dimensional array of splitted array In-Reply-To: References: <56F294E4.9080907@gmail.com> <56F29C2C.8040401@gmail.com> Message-ID: <1458742959.12837.14.camel@sipsolutions.net> On Mi, 2016-03-23 at 10:02 -0400, Joseph Fox-Rabinovitz wrote: > On Wed, Mar 23, 2016 at 9:37 AM, Ibrahim EL MEREHBI > wrote: > > Thanks Eric. I already checked that. It's not what I want. I think > > I wasn't > > clear about what I wanted. > > > > I want to split each column but I want to do it for each column and > > end up > > with an array. Here's the result I wish to have: > > > > array([[[0], [1, 2, 3, 4], [5, 6, 7], [8, 9]], > > [[10], [11, 12, 13, 14], [15, 16, 17], [18, 19]], > > [[20], [21, 21, 23, 24], [25, 26, 27], [28, 29]]], > > dtype=object) > > > > Apply [`np.stack`](http://docs.scipy.org/doc/numpy-1.10.0/reference/g > enerated/numpy.stack.html#numpy.stack) > to the result. It will merge the arrays the way you want. Oh sorry, nvm. As an object array, it works of course.... > -Joe > > > > > Sincerely Yours, > > Bob > > > > > > > > On 23/03/2016 14:17, Eric Moore wrote: > > > > Try just calling np.array_split on the full 2D array. It splits > > along a > > particular axis, which is selected using the axis argument of > > np.array_split. The axis to split along defaults to the first so > > the two > > calls to np.array_split below are exactly equivalent. 
> > > > In [16]: a = np.c_[:10,10:20,20:30] > > > > > > In [17]: np.array_split(a, [2,5,8]) > > > > Out[17]: > > > > [array([[ 0, 10, 20], > > > > [ 1, 11, 21]]), array([[ 2, 12, 22], > > > > [ 3, 13, 23], > > > > [ 4, 14, 24]]), array([[ 5, 15, 25], > > > > [ 6, 16, 26], > > > > [ 7, 17, 27]]), array([[ 8, 18, 28], > > > > [ 9, 19, 29]])] > > > > > > In [18]: np.array_split(a, [2,5,8], 0) > > > > Out[18]: > > > > [array([[ 0, 10, 20], > > > > [ 1, 11, 21]]), array([[ 2, 12, 22], > > > > [ 3, 13, 23], > > > > [ 4, 14, 24]]), array([[ 5, 15, 25], > > > > [ 6, 16, 26], > > > > [ 7, 17, 27]]), array([[ 8, 18, 28], > > > > [ 9, 19, 29]])] > > > > > > Eric > > > > > > > > On Wed, Mar 23, 2016 at 9:06 AM, Ibrahim EL MEREHBI < > > bobmerhebi at gmail.com> > > wrote: > > > > > > Hello, > > > > > > I have a multi-diensional array that I would like to split its > > > columns. > > > > > > For example consider, > > > > > > dat = np.array([np.arange(10),np.arange(10,20), > > > np.arange(20,30)]).T > > > > > > array([[ 0, 10, 20], > > > [ 1, 11, 21], > > > [ 2, 12, 22], > > > [ 3, 13, 23], > > > [ 4, 14, 24], > > > [ 5, 15, 25], > > > [ 6, 16, 26], > > > [ 7, 17, 27], > > > [ 8, 18, 28], > > > [ 9, 19, 29]]) > > > > > > > > > I already can split one column at a time: > > > > > > np.array_split(dat[:,0], [2,5,8]) > > > > > > [array([0, 1]), array([2, 3, 4]), array([5, 6, 7]), array([8, > > > 9])] > > > > > > > > > How can I extend this for all columns and (overwrite or) have a > > > new > > > multi-dimensional array? > > > > > > Thank you, > > > Bob > > > > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at scipy.org > > > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From bobmerhebi at gmail.com Wed Mar 23 10:38:53 2016 From: bobmerhebi at gmail.com (Ibrahim EL MEREHBI) Date: Wed, 23 Mar 2016 15:38:53 +0100 Subject: [Numpy-discussion] Multi-dimensional array of splitted array In-Reply-To: <1458742959.12837.14.camel@sipsolutions.net> References: <56F294E4.9080907@gmail.com> <56F29C2C.8040401@gmail.com> <1458742959.12837.14.camel@sipsolutions.net> Message-ID: <56F2AA7D.6070301@gmail.com> As an object, will it change how numpy operates? Sincerely Yours, Bob On 23/03/2016 15:22, Sebastian Berg wrote: > On Mi, 2016-03-23 at 10:02 -0400, Joseph Fox-Rabinovitz wrote: >> On Wed, Mar 23, 2016 at 9:37 AM, Ibrahim EL MEREHBI >> wrote: >>> Thanks Eric. I already checked that. It's not what I want. I think >>> I wasn't >>> clear about what I wanted. >>> >>> I want to split each column but I want to do it for each column and >>> end up >>> with an array. 
Here's the result I wish to have: >>> >>> array([[[0], [1, 2, 3, 4], [5, 6, 7], [8, 9]], >>> [[10], [11, 12, 13, 14], [15, 16, 17], [18, 19]], >>> [[20], [21, 21, 23, 24], [25, 26, 27], [28, 29]]], >>> dtype=object) >>> >> Apply [`np.stack`](http://docs.scipy.org/doc/numpy-1.10.0/reference/g >> enerated/numpy.stack.html#numpy.stack) >> to the result. It will merge the arrays the way you want. > Oh sorry, nvm. As an object array, it works of course.... > > >> -Joe >> >>> Sincerely Yours, >>> Bob >>> >>> >>> >>> On 23/03/2016 14:17, Eric Moore wrote: >>> >>> Try just calling np.array_split on the full 2D array. It splits >>> along a >>> particular axis, which is selected using the axis argument of >>> np.array_split. The axis to split along defaults to the first so >>> the two >>> calls to np.array_split below are exactly equivalent. >>> >>> In [16]: a = np.c_[:10,10:20,20:30] >>> >>> >>> In [17]: np.array_split(a, [2,5,8]) >>> >>> Out[17]: >>> >>> [array([[ 0, 10, 20], >>> >>> [ 1, 11, 21]]), array([[ 2, 12, 22], >>> >>> [ 3, 13, 23], >>> >>> [ 4, 14, 24]]), array([[ 5, 15, 25], >>> >>> [ 6, 16, 26], >>> >>> [ 7, 17, 27]]), array([[ 8, 18, 28], >>> >>> [ 9, 19, 29]])] >>> >>> >>> In [18]: np.array_split(a, [2,5,8], 0) >>> >>> Out[18]: >>> >>> [array([[ 0, 10, 20], >>> >>> [ 1, 11, 21]]), array([[ 2, 12, 22], >>> >>> [ 3, 13, 23], >>> >>> [ 4, 14, 24]]), array([[ 5, 15, 25], >>> >>> [ 6, 16, 26], >>> >>> [ 7, 17, 27]]), array([[ 8, 18, 28], >>> >>> [ 9, 19, 29]])] >>> >>> >>> Eric >>> >>> >>> >>> On Wed, Mar 23, 2016 at 9:06 AM, Ibrahim EL MEREHBI < >>> bobmerhebi at gmail.com> >>> wrote: >>>> Hello, >>>> >>>> I have a multi-diensional array that I would like to split its >>>> columns. >>>> >>>> For example consider, >>>> >>>> dat = np.array([np.arange(10),np.arange(10,20), >>>> np.arange(20,30)]).T >>>> >>>> array([[ 0, 10, 20], >>>> [ 1, 11, 21], >>>> [ 2, 12, 22], >>>> [ 3, 13, 23], >>>> [ 4, 14, 24], >>>> [ 5, 15, 25], >>>> [ 6, 16, 26], >>>> [ 7, 17, 27], >>>> [ 8, 18, 28], >>>> [ 9, 19, 29]]) >>>> >>>> >>>> I already can split one column at a time: >>>> >>>> np.array_split(dat[:,0], [2,5,8]) >>>> >>>> [array([0, 1]), array([2, 3, 4]), array([5, 6, 7]), array([8, >>>> 9])] >>>> >>>> >>>> How can I extend this for all columns and (overwrite or) have a >>>> new >>>> multi-dimensional array? >>>> >>>> Thank you, >>>> Bob >>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> https://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> https://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> https://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From p.j.a.cock at googlemail.com Thu Mar 24 11:04:23 2016 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 24 Mar 2016 15:04:23 +0000 Subject: [Numpy-discussion] linux wheels coming soon In-Reply-To: References: Message-ID: Hi Nathaniel, Will you be providing portable Linux wheels aka manylinux1? https://www.python.org/dev/peps/pep-0513/ Does this also open up the door to releasing wheels for SciPy too? While speeding up "pip install" would be of benefit in itself, I am particularly keen to see this for use within automated testing frameworks like TravisCI where currently having to install NumPy (and SciPy) from source is an unreasonable overhead. Many thanks to everyone working on this, Peter On Tue, Mar 15, 2016 at 11:33 PM, Nathaniel Smith wrote: > Hi all, > > Just a heads-up that we're planning to upload Linux wheels for numpy > to PyPI soon. Unless there's some objection, these will be using > ATLAS, just like the current Windows wheels, for the same reasons -- > moving to something faster like OpenBLAS would be good, but given the > concerns about OpenBLAS's reliability we want to get something working > first and then worry about making it fast. (Plus it doesn't make sense > to ship different BLAS libraries on Windows versus Linux -- that just > multiplies our support burden for no reason.) > > -n > > -- > Nathaniel J. Smith -- https://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion From ralf.gommers at gmail.com Thu Mar 24 13:35:40 2016 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 24 Mar 2016 18:35:40 +0100 Subject: [Numpy-discussion] linux wheels coming soon In-Reply-To: References: Message-ID: On Thu, Mar 24, 2016 at 4:04 PM, Peter Cock wrote: > Hi Nathaniel, > > Will you be providing portable Linux wheels aka manylinux1? > https://www.python.org/dev/peps/pep-0513/ > > Does this also open up the door to releasing wheels for SciPy > too? > That should work just fine. > While speeding up "pip install" would be of benefit in itself, > I am particularly keen to see this for use within automated > testing frameworks like TravisCI where currently having to > install NumPy (and SciPy) from source is an unreasonable > overhead. > There's already http://travis-dev-wheels.scipy.org/ (latest dev versions of numpy and scipy) and http://travis-wheels.scikit-image.org/ (releases, there are multiple sources for this one) for TravisCI setups to reuse. Ralf > Many thanks to everyone working on this, > > Peter > > On Tue, Mar 15, 2016 at 11:33 PM, Nathaniel Smith wrote: > > Hi all, > > > > Just a heads-up that we're planning to upload Linux wheels for numpy > > to PyPI soon. Unless there's some objection, these will be using > > ATLAS, just like the current Windows wheels, for the same reasons -- > > moving to something faster like OpenBLAS would be good, but given the > > concerns about OpenBLAS's reliability we want to get something working > > first and then worry about making it fast. (Plus it doesn't make sense > > to ship different BLAS libraries on Windows versus Linux -- that just > > multiplies our support burden for no reason.) > > > > -n > > > > -- > > Nathaniel J. 
Smith -- https://vorpus.org > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > https://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Mar 24 14:37:33 2016 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 24 Mar 2016 11:37:33 -0700 Subject: [Numpy-discussion] linux wheels coming soon In-Reply-To: References: Message-ID: On Mar 24, 2016 8:04 AM, "Peter Cock" wrote: > > Hi Nathaniel, > > Will you be providing portable Linux wheels aka manylinux1? > https://www.python.org/dev/peps/pep-0513/ Matthew Brett will (probably) do the actual work, but yeah, that's the idea exactly. Note the author list on that PEP ;-) -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.j.a.cock at googlemail.com Thu Mar 24 14:44:44 2016 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 24 Mar 2016 18:44:44 +0000 Subject: [Numpy-discussion] linux wheels coming soon In-Reply-To: References: Message-ID: On Thu, Mar 24, 2016 at 6:37 PM, Nathaniel Smith wrote: > On Mar 24, 2016 8:04 AM, "Peter Cock" wrote: >> >> Hi Nathaniel, >> >> Will you be providing portable Linux wheels aka manylinux1? >> https://www.python.org/dev/peps/pep-0513/ > > Matthew Brett will (probably) do the actual work, but yeah, that's the idea > exactly. Note the author list on that PEP ;-) > > -n Yep - I was partly double checking, but also aware many folk skim the NumPy list and might not be aware of PEP-513 and the standardisation efforts going on. Also in addition to http://travis-dev-wheels.scipy.org/ and http://travis-wheels.scikit-image.org/ mentioned by Ralf there is http://wheels.scipy.org/ which I presume will get the new Linux wheels once they go live. Is it possible to add a README to these listings explaining what they are intended to be used for? P.S. To save anyone else Googling, you can do things like this: pip install -r requirements.txt --timeout 60 --trusted-host travis-wheels.scikit-image.org -f http://travis-wheels.scikit-image.org/ Thanks, Peter From njs at pobox.com Thu Mar 24 22:46:07 2016 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 24 Mar 2016 19:46:07 -0700 Subject: [Numpy-discussion] linux wheels coming soon In-Reply-To: References: Message-ID: On Thu, Mar 24, 2016 at 11:44 AM, Peter Cock wrote: > On Thu, Mar 24, 2016 at 6:37 PM, Nathaniel Smith wrote: >> On Mar 24, 2016 8:04 AM, "Peter Cock" wrote: >>> >>> Hi Nathaniel, >>> >>> Will you be providing portable Linux wheels aka manylinux1? >>> https://www.python.org/dev/peps/pep-0513/ >> >> Matthew Brett will (probably) do the actual work, but yeah, that's the idea >> exactly. Note the author list on that PEP ;-) >> >> -n > > Yep - I was partly double checking, but also aware many folk > skim the NumPy list and might not be aware of PEP-513 and > the standardisation efforts going on. > > Also in addition to http://travis-dev-wheels.scipy.org/ and > http://travis-wheels.scikit-image.org/ mentioned by Ralf there > is http://wheels.scipy.org/ which I presume will get the new > Linux wheels once they go live. 
The new wheels will go up on pypi, and I guess once everyone has wheels on pypi then these ad-hoc wheel servers that existed only as a way to distribute Linux wheels will become obsolete. (travis-dev-wheels will remain useful, though, because its purpose is to hold up-to-the-minute builds of project master branches to allow downstream projects to get early warning of breaking changes -- we don't plan to upload to pypi after every commit :-).) -n -- Nathaniel J. Smith -- https://vorpus.org From rmcgibbo at gmail.com Thu Mar 24 23:02:03 2016 From: rmcgibbo at gmail.com (Robert T. McGibbon) Date: Thu, 24 Mar 2016 23:02:03 -0400 Subject: [Numpy-discussion] linux wheels coming soon In-Reply-To: References: Message-ID: I suspect that many of the maintainers of major scipy-ecosystem projects are aware of these (or other similar) travis wheel caches, but would guess that the pool of travis-ci python users who weren't aware of these wheel caches is much much larger. So there will still be a lot of travis-ci clock cycles saved by manylinux wheels. -Robert On Thu, Mar 24, 2016 at 10:46 PM, Nathaniel Smith wrote: > On Thu, Mar 24, 2016 at 11:44 AM, Peter Cock > wrote: > > On Thu, Mar 24, 2016 at 6:37 PM, Nathaniel Smith wrote: > >> On Mar 24, 2016 8:04 AM, "Peter Cock" > wrote: > >>> > >>> Hi Nathaniel, > >>> > >>> Will you be providing portable Linux wheels aka manylinux1? > >>> https://www.python.org/dev/peps/pep-0513/ > >> > >> Matthew Brett will (probably) do the actual work, but yeah, that's the > idea > >> exactly. Note the author list on that PEP ;-) > >> > >> -n > > > > Yep - I was partly double checking, but also aware many folk > > skim the NumPy list and might not be aware of PEP-513 and > > the standardisation efforts going on. > > > > Also in addition to http://travis-dev-wheels.scipy.org/ and > > http://travis-wheels.scikit-image.org/ mentioned by Ralf there > > is http://wheels.scipy.org/ which I presume will get the new > > Linux wheels once they go live. > > The new wheels will go up on pypi, and I guess once everyone has > wheels on pypi then these ad-hoc wheel servers that existed only as a > way to distribute Linux wheels will become obsolete. > > (travis-dev-wheels will remain useful, though, because its purpose is > to hold up-to-the-minute builds of project master branches to allow > downstream projects to get early warning of breaking changes -- we > don't plan to upload to pypi after every commit :-).) > > -n > > -- > Nathaniel J. Smith -- https://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -- -Robert -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.j.a.cock at googlemail.com Fri Mar 25 09:39:45 2016 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 25 Mar 2016 13:39:45 +0000 Subject: [Numpy-discussion] linux wheels coming soon In-Reply-To: References: Message-ID: On Fri, Mar 25, 2016 at 3:02 AM, Robert T. McGibbon wrote: > I suspect that many of the maintainers of major scipy-ecosystem projects are > aware of these (or other similar) travis wheel caches, but would guess that > the pool of travis-ci python users who weren't aware of these wheel caches > is much much larger. So there will still be a lot of travis-ci clock cycles > saved by manylinux wheels. > > -Robert Yes exactly. Availability of NumPy Linux wheels on PyPI is definitely something I would suggest adding to the release notes. 
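Once they are on PyPI, a current pip should pick them up automatically; as a quick sanity check (assuming pip >= 8.1, the first release that understands the manylinux1 platform tag), something like

    pip install --upgrade --only-binary :all: numpy

should then resolve to a wheel rather than falling back to an sdist build.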
Hopefully this will help trigger a general availability of wheels in the numpy-ecosystem :)

In the case of Travis CI, their VM images for Python already have a version of NumPy installed, but having the latest version of NumPy and SciPy etc. available as Linux wheels would be very nice.

Peter

P.S. As an aside, PyPI seems to be having trouble displaying the main NumPy page https://pypi.python.org/pypi/numpy at the moment (Error 404 page): https://bitbucket.org/pypa/pypi/issues/423/version-less-page-for-numpy-broken-error

From jaime.frio at gmail.com  Sat Mar 26 16:16:13 2016
From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=)
Date: Sat, 26 Mar 2016 21:16:13 +0100
Subject: [Numpy-discussion] Make np.bincount output same dtype as weights
Message-ID:

Hi all,

I have just submitted a PR (#7464) that fixes an enhancement request (#6854), making np.bincount return an array of the same type as the weights parameter. This is an important deviation from current behavior, which always casts weights to double, and always returns a double array, so I would like to hear what others think about the worthiness of this. Main discussion points:

   - np.bincount now works with complex weights (yay!), I guess this should be a pretty uncontroversial enhancement.
   - The return is of the same type as weights, which means that small integers are very likely to overflow. This is exactly what #6854 requested, but perhaps we should promote the output for integers to a long, as we do in np.sum?
   - Boolean arrays stay boolean, and OR, rather than sum, the weights. Is this what one would want? If we decide that integer promotion is the way to go, perhaps booleans should go in the same pack?
   - This new implementation currently supports all of the reasonable native types, but has no fallback for user defined types. I guess we should attempt to cast the array to double as before if no native loop can be found? It would be good to have a way of testing this though, any thoughts on how to go about this?
   - Does a behavior change like this require some deprecation period? What would that look like?
   - I have also added broadcasting of weights to the full size of list, so that one can do e.g. np.bincount([1, 2, 3], weights=2j) without having to tile the single weight to the size of the bins list.

Any other thoughts are very welcome as well!

Jaime

--
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From charlesr.harris at gmail.com  Sat Mar 26 16:28:40 2016
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 26 Mar 2016 14:28:40 -0600
Subject: [Numpy-discussion] Replacing paver with invoke
Message-ID:

Hi All,

I've been looking around for something easier to use than paver for running shell commands. Fabric looked like a possibility, but it doesn't support Python 3 and is currently in the midst of a rewrite, not least because it has become burdened with technical debt. Invoke splits off fabric's local functionality, supports Python 3, and looks to be in active, if not yet finished, development. It is Python, but with a make flavor. Indeed, I wonder if it might be possible to use it in place of make for the documentation. It also makes it easier to work with the output of shell commands, which would be helpful for making lists of authors and commits for releases.
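To make that last point concrete, here is a minimal, untested tasks.py sketch (assuming invoke's documented @task/run API; the tag name is only an example):

    from invoke import task, run

    @task
    def authors():
        # Shell out to git, then post-process the output in Python --
        # the part that is awkward to do in make or in paver.
        result = run("git log v1.10.0..HEAD --format=%aN", hide='out')
        names = sorted(set(result.stdout.splitlines()))
        print("A total of %d people contributed to this release." % len(names))
        for name in names:
            print("* " + name)

which would then be run as `invoke authors` from the shell.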
In any case, I thought it worth raising the topic on the list to see if others had suggestions or forceful disagreements.

Thoughts?

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jaime.frio at gmail.com  Sat Mar 26 16:48:20 2016
From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=)
Date: Sat, 26 Mar 2016 21:48:20 +0100
Subject: [Numpy-discussion] Some thoughts on ufunc reduction methods
Message-ID:

The following write-up was triggered by issue #7265, you may want to review that discussion for context:

In the docs, ufunc reduction operations are explained as "cumulatively applying the ufunc along an axis." This basically limits just about any reduction operation to ufuncs with two inputs and one output.

Another take on the same reduction idea is that, if the inputs are a and b and the output is c, c is the result of updating a with the value in b. With this idea in mind, in e.g. add(a, b) -> c = a + b, c would be thought of as the updated value of a, after applying b to it. One would expect that, for any registered loop suitable for this interpretation, the types of a and c would be identical, but b could be of some other type. As an example consider count(a, b) -> c = a + 1, where a and c would typically be intp, but b could have just about any type.

The power of this description of ufuncs suited for reduction is that it is very easy to generalize beyond the current "two inputs, one output" restriction. E.g. a "sum of squared differences" ufunc defined as: ssqd(sd2, x, y) -> sd2 += (x - y)**2, could be used to compute squared Euclidean distances between vectors doing: ssqd.reduce((x, y)), without the memory overhead of currently available approaches.

In general, a reduction of a ufunc with m inputs and n outputs, m > n, would produce n results out of m - n inputs. Such a ufunc would have to define a suitable identity to initialize each of those n outputs at the beginning of any reduction, rather than the single identity ufuncs now hold.

It can be argued that this generalization of reduction is just a redefinition of a subset of what gufuncs can do, e.g. a "sum of squared differences" gufunc with signature (n),(n)->() would do the same as the above reduction, probably faster. And it is not clear that getting accumulate and reduceat for free, or reduction over multiple axes, justifies the extra complexity. Especially since it is likely that something similarly powerful could be built on top of gufuncs.

Where an approach such as this would shine is if we had a reduction method that did not require a single strided memory segment to act on. Setting aside the generalization of reduction described above, a groupby or reduceby method would take an array of values and an array of groups, and compute a reduction value for each of the unique groups. With the above generalizations, one could compute, e.g. the variance over each group by doing something like:

    # how exactly does keepdims work here is not clear at all!
    counts = count.reduceby(values, groups, keepdims=True)
    mean = np.add.reduceby(values, groups, keepdims=True) / counts
    var = np.ssqd.reduceby((values, mean), groups) / counts

I am not fully sure of whether this is just useful for this particular little example of computing variance using a ufunc method that doesn't exist, or if it is valid in other settings too. Implementing this would add complexity to an already complex system, so it better be good for something!
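To make the mechanics concrete, here is a pure Python model of the reduction semantics described above (just a sketch for discussion, obviously not how the C inner loops would look, and ignoring dtypes, broadcasting and the groupby variant):

    import numpy as np

    def reduce_m_to_n(loop, identities, arrays):
        # Model of reducing a "ufunc" with m inputs and n outputs:
        # the n accumulators start at the n identities, and each step
        # feeds one item from each of the m - n data arrays.
        accs = tuple(identities)
        for items in zip(*arrays):
            accs = loop(accs, items)
        return accs

    # "sum of squared differences" as an m = 3, n = 1 loop:
    def ssqd_loop(accs, items):
        (sd2,), (x, y) = accs, items
        return (sd2 + (x - y)**2,)

    x = np.array([1.0, 2.0, 4.0])
    y = np.array([1.5, 2.5, 2.0])
    print(reduce_m_to_n(ssqd_loop, (0.0,), (x, y)))  # (4.5,) == (np.sum((x - y)**2),)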
One of the weakest points I see is the need to rely on a set of predefined identities to start the reduction. Think of trying to generalize reduction to gufuncs, another worthy effort, and take matrix multiplication with signature (i,j),(j,k)->(i,k). In the current paradigm you would impose that both inputs and the output parameters be interchangeable, which leads rather naturally to the condition i = j = k for this reduction to be doable. But with the new paradigm you only need that the first input and output be interchangeable, which only provides you with j = k as a condition. And while this is a more general behavior, the questions of what value to set i to in the output, and what to init the output to, when trying to do a reduction over a stack of square arrays, make it fairly unusable. So a backup to the "two inputs, one output, initialize to some item in the reduction array" behavior would have to be kept in place.

It is also important to note that expanding reduction to gufuncs would probably require that we impose some iteration order guarantees on ourselves, as the poster child for this (matrix multiplication) is not in general a commutative operation. Would we want to do this to ourselves? Would the features today justify the burden for ever after?

Any thoughts on where, if anywhere, to go with this, are very welcome!

Jaime

--
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From matthew.brett at gmail.com  Sat Mar 26 17:05:24 2016
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sat, 26 Mar 2016 14:05:24 -0700
Subject: [Numpy-discussion] ATLAS build errors
Message-ID:

Hi,

I'm working on building manylinux wheels for numpy, and I ran into unexpected problems with a numpy built against the ATLAS 3.8 binaries supplied by CentOS 5.

I'm working on the manylinux docker container [1]. To get ATLAS, I'm doing `yum install atlas-devel` which gets the default CentOS 5 ATLAS packages.

I then build numpy. Local tests work fine, but when I test on travis, I get these errors [2]:

======================================================================
ERROR: test_svd_build (test_regression.TestRegression)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/matthew-brett/manylinux-testing/venv/lib/python2.7/site-packages/numpy/linalg/tests/test_regression.py", line 56, in test_svd_build
    u, s, vh = linalg.svd(a)
  File "/home/travis/build/matthew-brett/manylinux-testing/venv/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 1359, in svd
    u, s, vt = gufunc(a, signature=signature, extobj=extobj)
ValueError: On entry to DGESDD parameter number 12 had an illegal value

======================================================================
FAIL: test_lapack (test_build.TestF77Mismatch)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/matthew-brett/manylinux-testing/venv/lib/python2.7/site-packages/numpy/testing/decorators.py", line 146, in skipper_func
    return f(*args, **kwargs)
  File "/home/travis/build/matthew-brett/manylinux-testing/venv/lib/python2.7/site-packages/numpy/linalg/tests/test_build.py", line 56, in test_lapack
    information.""")
AssertionError: Both g77 and gfortran runtimes linked in lapack_lite !
This is likely to cause random crashes and wrong results.
See numpy INSTALL.txt for more information.

Sure enough, scipy built the same way segfaults or fails to import (see [2]). I get no errors for an openblas build.

Does anyone recognize these? How should I modify the build to avoid them?

Cheers,

Matthew

[1] https://github.com/pypa/manylinux
[2] https://travis-ci.org/matthew-brett/manylinux-testing/jobs/118712090

From jni.soma at gmail.com  Sat Mar 26 18:10:00 2016
From: jni.soma at gmail.com (Juan Nunez-Iglesias)
Date: Sun, 27 Mar 2016 09:10:00 +1100
Subject: [Numpy-discussion] Make np.bincount output same dtype as weights
In-Reply-To: 
References: 
Message-ID: <2ddc59b6-ce4a-4c45-bf95-1ff5b8c24d2b@Spark>

Just to clarify, this will only affect weighted bincounts, right? I can't tell you in how many places my code depends on the return type being integer!!!

On 27 Mar 2016, 7:16 AM +1100, Jaime Fernández del Río <jaime.frio at gmail.com>, wrote:
> Hi all,
> I have just submitted a PR (#7464, https://github.com/numpy/numpy/pull/7464) that fixes an enhancement request (#6854, https://github.com/numpy/numpy/issues/6854), making np.bincount return an array of the same type as the weights parameter. This is an important deviation from current behavior, which always casts weights to double, and always returns a double array, so I would like to hear what others think about the worthiness of this. Main discussion points:
> - np.bincount now works with complex weights (yay!), I guess this should be a pretty uncontroversial enhancement.
> - The return is of the same type as weights, which means that small integers are very likely to overflow. This is exactly what #6854 requested, but perhaps we should promote the output for integers to a long, as we do in np.sum?
> - Boolean arrays stay boolean, and OR, rather than sum, the weights. Is this what one would want? If we decide that integer promotion is the way to go, perhaps booleans should go in the same pack?
> - This new implementation currently supports all of the reasonable native types, but has no fallback for user defined types. I guess we should attempt to cast the array to double as before if no native loop can be found? It would be good to have a way of testing this though, any thoughts on how to go about this?
> - Does a behavior change like this require some deprecation period? What would that look like?
> - I have also added broadcasting of weights to the full size of list, so that one can do e.g. np.bincount([1, 2, 3], weights=2j) without having to tile the single weight to the size of the bins list.
> Any other thoughts are very welcome as well!
>
> Jaime
>
> --
> (\__/)
> ( O.o)
> ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jaime.frio at gmail.com  Sat Mar 26 18:21:46 2016
From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=)
Date: Sat, 26 Mar 2016 23:21:46 +0100
Subject: [Numpy-discussion] Make np.bincount output same dtype as weights
In-Reply-To: <2ddc59b6-ce4a-4c45-bf95-1ff5b8c24d2b@Spark>
References: <2ddc59b6-ce4a-4c45-bf95-1ff5b8c24d2b@Spark>
Message-ID:

On Sat, Mar 26, 2016 at 11:10 PM, Juan Nunez-Iglesias wrote:
> Just to clarify, this will only affect weighted bincounts, right? I can't
> tell you in how many places my code depends on the return type being
> integer!!!

Indeed!
Unweighted bincounts still return, as all counting operations do, a np.intp array. Sorry for the noise!

Jaime

> On 27 Mar 2016, 7:16 AM +1100, Jaime Fernández del Río <jaime.frio at gmail.com>, wrote:
> > Hi all,
> > I have just submitted a PR (#7464) that fixes an enhancement request (#6854), making np.bincount return an array of the same type as the weights parameter. This is an important deviation from current behavior, which always casts weights to double, and always returns a double array, so I would like to hear what others think about the worthiness of this. Main discussion points:
> > - np.bincount now works with complex weights (yay!), I guess this should be a pretty uncontroversial enhancement.
> > - The return is of the same type as weights, which means that small integers are very likely to overflow. This is exactly what #6854 requested, but perhaps we should promote the output for integers to a long, as we do in np.sum?
> > - Boolean arrays stay boolean, and OR, rather than sum, the weights. Is this what one would want? If we decide that integer promotion is the way to go, perhaps booleans should go in the same pack?
> > - This new implementation currently supports all of the reasonable native types, but has no fallback for user defined types. I guess we should attempt to cast the array to double as before if no native loop can be found? It would be good to have a way of testing this though, any thoughts on how to go about this?
> > - Does a behavior change like this require some deprecation period? What would that look like?
> > - I have also added broadcasting of weights to the full size of list, so that one can do e.g. np.bincount([1, 2, 3], weights=2j) without having to tile the single weight to the size of the bins list.
> >
> > Any other thoughts are very welcome as well!
> >
> > Jaime
> >
> > --
> > (\__/)
> > ( O.o)
> > ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion

--
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jfoxrabinovitz at gmail.com  Sat Mar 26 21:54:21 2016
From: jfoxrabinovitz at gmail.com (=?utf-8?B?Sm9zZXBoIEZveC1SYWJpbm92aXR6?=)
Date: Sat, 26 Mar 2016 18:54:21 -0700 (PDT)
Subject: [Numpy-discussion] Make np.bincount output same dtype as weights
Message-ID: <000f4242.46d2f6b061c54796@gmail.com>

Would it make sense to just make the output type large enough to hold the cumulative sum of the weights?

- Joseph Fox-Rabinovitz

------ Original message------
From: Jaime Fernández del Río
Date: Sat, Mar 26, 2016 16:16
To: Discussion of Numerical Python;
Subject: [Numpy-discussion] Make np.bincount output same dtype as weights

Hi all,

I have just submitted a PR (#7464) that fixes an enhancement request (#6854), making np.bincount return an array of the same type as the weights parameter. This is an important deviation from current behavior, which always casts weights to double, and always returns a double array, so I would like to hear what others think about the worthiness of this.
Main discussion points:
- np.bincount now works with complex weights (yay!), I guess this should be a pretty uncontroversial enhancement.
- The return is of the same type as weights, which means that small integers are very likely to overflow. This is exactly what #6854 requested, but perhaps we should promote the output for integers to a long, as we do in np.sum?
- Boolean arrays stay boolean, and OR, rather than sum, the weights. Is this what one would want? If we decide that integer promotion is the way to go, perhaps booleans should go in the same pack?
- This new implementation currently supports all of the reasonable native types, but has no fallback for user defined types. I guess we should attempt to cast the array to double as before if no native loop can be found? It would be good to have a way of testing this though, any thoughts on how to go about this?
- Does a behavior change like this require some deprecation period? What would that look like?
- I have also added broadcasting of weights to the full size of list, so that one can do e.g. np.bincount([1, 2, 3], weights=2j) without having to tile the single weight to the size of the bins list.

Any other thoughts are very welcome as well!

Jaime

--
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From josef.pktd at gmail.com  Sat Mar 26 22:58:00 2016
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sat, 26 Mar 2016 22:58:00 -0400
Subject: [Numpy-discussion] Make np.bincount output same dtype as weights
In-Reply-To: <000f4242.46d2f6b061c54796@gmail.com>
References: <000f4242.46d2f6b061c54796@gmail.com>
Message-ID:

On Sat, Mar 26, 2016 at 9:54 PM, Joseph Fox-Rabinovitz wrote:
> Would it make sense to just make the output type large enough to hold the
> cumulative sum of the weights?
>
> - Joseph Fox-Rabinovitz
>
> ------ Original message------
> From: Jaime Fernández del Río
> Date: Sat, Mar 26, 2016 16:16
> To: Discussion of Numerical Python;
> Subject: [Numpy-discussion] Make np.bincount output same dtype as weights
>
> Hi all,
>
> I have just submitted a PR (#7464) that fixes an enhancement request
> (#6854), making np.bincount return an array of the same type as the weights
> parameter. This is an important deviation from current behavior, which
> always casts weights to double, and always returns a double array, so I
> would like to hear what others think about the worthiness of this. Main
> discussion points:
>
> np.bincount now works with complex weights (yay!), I guess this should be a
> pretty uncontroversial enhancement.
> The return is of the same type as weights, which means that small integers
> are very likely to overflow. This is exactly what #6854 requested, but
> perhaps we should promote the output for integers to a long, as we do in
> np.sum?

I always thought of bincount with weights just as a group-by sum. So it would be easier to remember and have fewer surprises if it matches the behavior of np.sum.

> Boolean arrays stay boolean, and OR, rather than sum, the weights. Is this
> what one would want? If we decide that integer promotion is the way to go,
> perhaps booleans should go in the same pack?

Isn't this calculating the sum, i.e. count of True by group, already? Based on a quick example with numpy 1.9.2, I don't think I ever used bool weights before.
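For reference, the group-by-sum reading with current numpy, where the bin labels act as the groups (a minimal example; today everything comes back as float64 as soon as weights are given):

    >>> import numpy as np
    >>> np.bincount([0, 1, 1, 2], weights=[1, 2, 3, 4])   # group-by sum
    array([ 1.,  5.,  4.])
    >>> np.bincount([0, 1, 1, 2])                         # plain counts stay integer
    array([1, 2, 1])

Matching np.sum's promotion rules would keep that correspondence intact.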
> This new implementation currently supports all of the reasonable native > types, but has no fallback for user defined types. I guess we should > attempt to cast the array to double as before if no native loop can be > found? It would be good to have a way of testing this though, any thoughts > on how to go about this? > Does a behavior change like this require some deprecation period? What would > that look like? > I have also added broadcasting of weights to the full size of list, so that > one can do e.g. np.bincount([1, 2, 3], weights=2j) without having to tile > the single weight to the size of the bins list. > > Any other thoughts are very welcome as well! (2-D weights ?) Josef > > Jaime > > -- > (__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de > dominaci?n mundial. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > From jni.soma at gmail.com Sun Mar 27 00:12:44 2016 From: jni.soma at gmail.com (Juan Nunez-Iglesias) Date: Sun, 27 Mar 2016 15:12:44 +1100 Subject: [Numpy-discussion] Make np.bincount output same dtype as weights In-Reply-To: References: <000f4242.46d2f6b061c54796@gmail.com> Message-ID: Thanks for clarifying, Jaime, and fwiw I agree with Josef: I would expect np.bincount to behave like np.sum with regards to promoting weights dtypes. Including bool. On Sun, Mar 27, 2016 at 1:58 PM, wrote: > On Sat, Mar 26, 2016 at 9:54 PM, Joseph Fox-Rabinovitz > wrote: > > Would it make sense to just make the output type large enough to hold the > > cumulative sum of the weights? > > > > > > - Joseph Fox-Rabinovitz > > > > ------ Original message------ > > > > From: Jaime Fern?ndez del R?o > > > > Date: Sat, Mar 26, 2016 16:16 > > > > To: Discussion of Numerical Python; > > > > Subject:[Numpy-discussion] Make np.bincount output same dtype as weights > > > > Hi all, > > > > I have just submitted a PR (#7464) that fixes an enhancement request > > (#6854), making np.bincount return an array of the same type as the > weights > > parameter. This is an important deviation from current behavior, which > > always casts weights to double, and always returns a double array, so I > > would like to hear what others think about the worthiness of this. Main > > discussion points: > > > > np.bincount now works with complex weights (yay!), I guess this should > be a > > pretty uncontroversial enhancement. > > The return is of the same type as weights, which means that small > integers > > are very likely to overflow. This is exactly what #6854 requested, but > > perhaps we should promote the output for integers to a long, as we do in > > np.sum? > > I always thought of bincount with weights just as a group-by sum. So > it would be easier to remember and have fewer surprises if it matches > the behavior of np.sum. > > > Boolean arrays stay boolean, and OR, rather than sum, the weights. Is > this > > what one would want? If we decide that integer promotion is the way to > go, > > perhaps booleans should go in the same pack? > > Isn't this calculating the sum, i.e. count of True by group, already? > Based on a quick example with numpy 1.9.2, I don't think I ever used > bool weights before. > > > > This new implementation currently supports all of the reasonable native > > types, but has no fallback for user defined types. I guess we should > > attempt to cast the array to double as before if no native loop can be > > found? 
> > It would be good to have a way of testing this though, any
> > thoughts on how to go about this?
> > Does a behavior change like this require some deprecation period?
> > What would that look like?
> > I have also added broadcasting of weights to the full size of list,
> > so that one can do e.g. np.bincount([1, 2, 3], weights=2j) without
> > having to tile the single weight to the size of the bins list.
> >
> > Any other thoughts are very welcome as well!
>
> (2-D weights ?)
>
> Josef
>
> >
> > Jaime
> >
> > --
> > (__/)
> > ( O.o)
> > ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus
> > planes de dominación mundial.
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> >

From jaime.frio at gmail.com  Sun Mar 27 04:36:21 2016
From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=)
Date: Sun, 27 Mar 2016 10:36:21 +0200
Subject: [Numpy-discussion] Behavior of .reduceat()
Message-ID: 

Two of the oldest issues in the tracker (#834 and #835) are about how
.reduceat() handles its indices parameter. I have been taking a look
at the source code, and it would be relatively easy to modify, the
hardest part being to figure out what the exact behavior should be.

Current behavior is that np.ufunc.reduceat(x, ind) returns

    [np.ufunc.reduce(x[ind[i]:ind[i+1]]) for i in range(len(ind))]

with a couple of caveats (illustrated in the short session after the
lists below):

1. if ind[i] >= ind[i+1], then x[ind[i]] is returned, rather than a
   reduction over an empty slice.
2. an index of len(x) is appended to the indices argument, to be used
   as the endpoint of the last slice to reduce over.
3. aside from this last case, the indices are required to be strictly
   inbounds, 0 <= index < len(x), or an error is raised

The proposed new behavior, with some optional behaviors, would be:

1. if ind[i] >= ind[i+1], then a reduction over an empty slice, i.e.
   the ufunc identity, is returned. This includes raising an error if
   the ufunc does not have an identity, e.g. np.minimum.
2. to fully support the "reduction over slices" idea, some form of
   out of bounds indices should be allowed. This could mean either
   that:
   1. only index = len(x) is allowed without raising an error, to
      allow computing the full reduction anywhere, not just as the
      last entry of the return, or
   2. allow any index in -len(x) <= index <= len(x), with the usual
      meaning given to negative values, or
   3. any index is allowed, with reduction results clipped to
      existing values (and the usual meaning for negative values).
3. Regarding the appending of that last index of len(x) to indices,
   we could:
   1. keep appending it, or
   2. never append it, since you can now request it without an error
      being raised, or
   3. only append it if the last index is smaller than len(x).

My thoughts on the options:

- The minimal, most conservative approach would go with 2.1 and 3.1.
  And of course 1; if we don't implement that, none of this makes
  sense.
- I kind of think 2.2 or even 2.3 are a nice enhancement that
  shouldn't break too much stuff.
- 3.2 I'm not sure about, probably hurts more than it helps at this
  point, although in a brand new design you probably would either not
  append the last index or also prepend a zero, as in np.split.
- And 3.3 seems too magical, probably not a good idea, only listed it
  for completeness.
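For reference, the current caveats in a quick session (outputs as in
current NumPy; under proposal 1 above, the second call would return
the additive identity 0 in place of x[3]):

    import numpy as np

    x = np.arange(5)            # [0, 1, 2, 3, 4]
    np.add.reduceat(x, [0, 3])  # array([3, 7]): sums x[0:3] and x[3:5]
    np.add.reduceat(x, [3, 0])  # array([ 3, 10]): ind[0] >= ind[1], so
                                # x[3] is returned, and the last slice
                                # runs to the appended endpoint len(x)
    np.add.reduceat(x, [0, 5])  # raises an out-of-bounds index error,
                                # since 5 fails 0 <= index < len(x)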
Any other thoughts or votes on what, if anything, we should implement,
and what the deprecation of current behavior should look like?

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus
planes de dominación mundial.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From pav at iki.fi  Sun Mar 27 13:15:12 2016
From: pav at iki.fi (Pauli Virtanen)
Date: Sun, 27 Mar 2016 17:15:12 +0000 (UTC)
Subject: [Numpy-discussion] ATLAS build errors
References: 
Message-ID: 

Sat, 26 Mar 2016 14:05:24 -0700, Matthew Brett kirjoitti:
> I'm working on building manylinux wheels for numpy, and I ran into
> unexpected problems with a numpy built against the ATLAS 3.8 binaries
> supplied by CentOS 5.
[clip]
> Does anyone recognize these? How should I modify the build to avoid
> them?

Maybe the ATLAS binaries supplied were compiled with g77 instead of
gfortran. If so, they should not be used with gfortran --- need to
recompile.

Also, in the past ATLAS binaries shipped by distributions had severe
bugs. However, 3.8.x may be a new enough version.

-- 
Pauli Virtanen

From ndarray at mac.com  Sun Mar 27 17:00:51 2016
From: ndarray at mac.com (Alexander Belopolsky)
Date: Sun, 27 Mar 2016 17:00:51 -0400
Subject: [Numpy-discussion] Why does asarray() create an intermediate memoryview?
Message-ID: 

In the following session a numpy array is created from a stdlib array:

In [1]: import array

In [2]: base = array.array('i', [1, 2])

In [3]: a = np.asarray(base)

In [4]: a.base
Out[4]: <memory at 0x...>

In [5]: a.base.obj
Out[5]: array('i', [1, 2])

In [6]: a.base.obj is base
Out[6]: True

Why can't a.base be base? What is the need for the intermediate
memoryview object?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From pav at iki.fi  Sun Mar 27 18:21:32 2016
From: pav at iki.fi (Pauli Virtanen)
Date: Sun, 27 Mar 2016 22:21:32 +0000 (UTC)
Subject: [Numpy-discussion] Why does asarray() create an intermediate memoryview?
References: 
Message-ID: 

Sun, 27 Mar 2016 17:00:51 -0400, Alexander Belopolsky kirjoitti:
[clip]
> Why can't a.base be base? What is the need for the intermediate
> memoryview object?

Implementation detail vs. life cycle management of buffer acquisitions.

The PEP3118 Py_buffer structure representing an acquired buffer is a C
struct that is not safe to copy (!), and needs to sit in an allocated
blob of memory whose life cycle has to be managed. The acquisition
also needs to be released after use.

Python's memoryview object happens to be a convenient way to babysit
this. Rather than adding a new entry to the ArrayObject struct for a
potential acquired buffer and inserting corresponding release calls, I
picked a more localized solution where the acquisition is managed by
the memoryview object rather than ndarray itself, and the life cycle
works out via the pre-existing ndarray.base refcounting.

-- 
Pauli Virtanen

From charlesr.harris at gmail.com  Sun Mar 27 19:32:46 2016
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sun, 27 Mar 2016 17:32:46 -0600
Subject: [Numpy-discussion] Numpy 1.11.0 released
Message-ID: 

Hi All,

I'm pleased to announce the release of Numpy 1.11.0.
This release is the result of six months of work, comprising 371
merged pull requests submitted by 112 authors, and contains many bug
fixes and improvements. Highlights are:

- The datetime64 type is now timezone naive.
- A dtype parameter has been added to ``randint``.
- Improved detection of two arrays possibly sharing memory.
- Automatic bin size estimation for ``np.histogram``.
- Speed optimization of A @ A.T and dot(A, A.T).
- New function ``np.moveaxis`` for reordering array axes.

Source files and documentation can be found on Sourceforge, while
source files and OS X wheels for Python 2.7, 3.3, 3.4, and 3.5 can be
installed from PyPI. Note that this is the last release to support
Python 2.6, 3.2, and 3.3.

Contributors are listed below in alphabetical order with the names of
new contributors starred.

Abdullah Alrasheed*
Aditya Panchal*
Alain*
Alex Griffing
Alex Rogozhnikov*
Alex Willmer
Alexander Heger*
Allan Haldane
Anatoly Techtonik*
Andrew Nelson
Anne Archibald
Antoine Pitrou
Antony Lee*
Behzad Nouri
Bertrand Lefebvre
Blake Griffith
Boxiang Sun*
Brigitta Sipocz*
Carl Kleffner
Charles Harris
Chris Hogan*
Christoph Gohlke
Colin Jermain*
Cong Ma*
Daniel
David Freese
David Sanders*
Dmitry Odzerikho*
Dmitry Zagorny*
Eric Larson*
Eric Moore
Ethan Kruse*
Eugene Krokhalev*
Evgeni Burovski*
Francis T. O'Donovan*
François Boulogne*
Gabi Davar*
Gerrit Holl
Gopal Singh Meena*
Greg Yang*
Greg Young*
Gregory R. Lee
Griffin Hosseinzadeh*
Hassan Kibirige*
Holger Kohr*
Ian Henriksen
Iceman9*
Jaime Fernandez
James Camel*
Jason King*
John Bjorn Nelson*
John Kirkham
Jonathan Helmus
Jonathan Underwood*
Joseph Fox-Rabinovitz*
Julian Taylor
Julien Dubois*
Julien Lhermitte*
Julien Schueller*
Jörn Hees*
Konstantinos Psychas*
Lars Buitinck
Luke Zoltan Kelley*
MaPePeR*
Mad Physicist*
Mark Wiebe
Marten van Kerkwijk
Matthew Brett
Matthias Geier*
Maximilian Trescher*
Michael K. Tran*
Michael Behrisch*
Michael Currie*
Michael Löffler*
Nathaniel Hellabyte*
Nathaniel J. Smith
Nick Papior*
Nicolas Calle*
Nicolás Della Penna*
Olivier Grisel
Pauli Virtanen
Peter Iannucci
Phaiax*
Ralf Gommers
Rehas Sachdeva*
Ronan Lamy
Ruediger Meier*
Ryan Grout*
Ryosuke Okuta*
Rémy Léone*
Sam Radhakrishnan*
Samuel St-Jean*
Sebastian Berg
Simon Conseil*
Stephan Hoyer
Stuart Archibald*
Stuart Berg
Sumith*
Tapasweni Pathak*
Thomas Robitaille
Tobias Megies*
Tushar Gautam*
Varun Nayyar*
Vincent Legoll*
Warren Weckesser
Wendell Smith
Yash Mehrotra*
Yifan Li*
endolith
floatingpointstack*
ldoddema*
yolanda15*

Thanks to all who worked on this release.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From perimosocordiae at gmail.com  Mon Mar 28 15:55:14 2016
From: perimosocordiae at gmail.com (CJ Carey)
Date: Mon, 28 Mar 2016 14:55:14 -0500
Subject: [Numpy-discussion] Make np.bincount output same dtype as weights
In-Reply-To: 
References: <000f4242.46d2f6b061c54796@gmail.com>
Message-ID: 

Another +1 for Josef's interpretation from me. Consistency with np.sum
seems like the best option.

On Sat, Mar 26, 2016 at 11:12 PM, Juan Nunez-Iglesias wrote:

> Thanks for clarifying, Jaime, and fwiw I agree with Josef: I would
> expect np.bincount to behave like np.sum with regards to promoting
> weights dtypes. Including bool.
>
> On Sun, Mar 27, 2016 at 1:58 PM, wrote:
>
>> On Sat, Mar 26, 2016 at 9:54 PM, Joseph Fox-Rabinovitz
>> wrote:
>> > Would it make sense to just make the output type large enough to
>> > hold the cumulative sum of the weights?
>> >
>> >
>> > - Joseph Fox-Rabinovitz
>> >
>> > ------ Original message------
>> >
>> > From: Jaime Fernández del Río
>> >
>> > Date: Sat, Mar 26, 2016 16:16
>> >
>> > To: Discussion of Numerical Python;
>> >
>> > Subject: [Numpy-discussion] Make np.bincount output same dtype as weights
>> >
>> > Hi all,
>> >
>> > I have just submitted a PR (#7464) that fixes an enhancement request
>> > (#6854), making np.bincount return an array of the same type as the
>> > weights parameter. This is an important deviation from current
>> > behavior, which always casts weights to double, and always returns
>> > a double array, so I would like to hear what others think about the
>> > worthiness of this. Main discussion points:
>> >
>> > np.bincount now works with complex weights (yay!), I guess this
>> > should be a pretty uncontroversial enhancement.
>> > The return is of the same type as weights, which means that small
>> > integers are very likely to overflow. This is exactly what #6854
>> > requested, but perhaps we should promote the output for integers to
>> > a long, as we do in np.sum?
>>
>> I always thought of bincount with weights just as a group-by sum. So
>> it would be easier to remember and have fewer surprises if it matches
>> the behavior of np.sum.
>>
>> > Boolean arrays stay boolean, and OR, rather than sum, the weights.
>> > Is this what one would want? If we decide that integer promotion is
>> > the way to go, perhaps booleans should go in the same pack?
>>
>> Isn't this calculating the sum, i.e. count of True by group, already?
>> Based on a quick example with numpy 1.9.2, I don't think I ever used
>> bool weights before.
>>
>> > This new implementation currently supports all of the reasonable
>> > native types, but has no fallback for user defined types. I guess
>> > we should attempt to cast the array to double as before if no
>> > native loop can be found? It would be good to have a way of testing
>> > this though, any thoughts on how to go about this?
>> > Does a behavior change like this require some deprecation period?
>> > What would that look like?
>> > I have also added broadcasting of weights to the full size of list,
>> > so that one can do e.g. np.bincount([1, 2, 3], weights=2j) without
>> > having to tile the single weight to the size of the bins list.
>> >
>> > Any other thoughts are very welcome as well!
>>
>> (2-D weights ?)
>>
>> Josef
>>
>> >
>> > Jaime
>> >
>> > --
>> > (__/)
>> > ( O.o)
>> > ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus
>> > planes de dominación mundial.
>> >
>> > _______________________________________________
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion at scipy.org
>> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bloring at lbl.gov  Mon Mar 28 16:44:29 2016
From: bloring at lbl.gov (Burlen Loring)
Date: Mon, 28 Mar 2016 13:44:29 -0700
Subject: [Numpy-discussion] numpy in python callback from threaded c++
Message-ID: <56F997AD.6060601@lbl.gov>

Hi All,

In my C++ code I've added Python bindings via swig.
One scenario is to pass a Python function to do some computational
work. The Python program runs serially in the main thread, but the
work is handled by a thread pool, and the callback is invoked from
another thread on unique data. Before a thread invokes the Python
callback it acquires Python's GIL. Also, I call PyEval_InitThreads
during module initialization, and have run swig with the -threads
flag. However, I'm experiencing frequent crashes when the thread pool
size is greater than 1, and valgrind is reporting errors from numpy
even in the case where the thread pool size is 1.

Here's the essence of the error reported by valgrind:

==10316== Invalid read of size 4
==10316==    at 0x4ED7D73: PyObject_Free (obmalloc.c:1013)
==10316==    by 0x10D540B0: NpyIter_Deallocate (nditer_constr.c:699)
....
==10316==  Address 0x20034020 is 3,856 bytes inside a block of size 4,097 free'd
==10316==    at 0x4C29E00: free (vg_replace_malloc.c:530)
==10316==    by 0x4F57B22: import_module_level (import.c:2278)
==10316==    by 0x4F57B22: PyImport_ImportModuleLevel (import.c:2292)
==10316==    by 0x4F36597: builtin___import__ (bltinmodule.c:49)
==10316==    by 0x4E89AC2: PyObject_Call (abstract.c:2546)
==10316==    by 0x4E89C1A: call_function_tail (abstract.c:2578)
==10316==    by 0x4E89C1A: PyObject_CallFunction (abstract.c:2602)
==10316==    by 0x4F58735: PyImport_Import (import.c:2890)
==10316==    by 0x4F588B9: PyImport_ImportModule (import.c:2133)
==10316==    by 0x10D334C2: get_forwarding_ndarray_method (methods.c:57)
==10316==    by 0x10D372C0: array_mean (methods.c:1932)
==10316==    by 0x4F40AC7: call_function (ceval.c:4350)

There are a few of these reported. I'll attach the full output. This
is from the simplest scenario, where the thread pool has a size of 1.
Although there are 2 threads, the program is serial, as the main
thread passes work tasks to the thread pool and waits for the work to
finish. Here is the work function where the above occurs:

    def execute(port, data_in, req):
        sys.stderr.write('descriptive_stats::execute MPI %d\n'%(rank))

        mesh = as_teca_cartesian_mesh(data_in[0])

        table = teca_table.New()
        table.declare_columns(['step','time'], ['ul','d'])
        table << mesh.get_time_step() << mesh.get_time()

        for var_name in var_names:

            table.declare_columns(['min '+var_name, 'avg '+var_name, \
                'max '+var_name, 'std '+var_name, 'low_q '+var_name, \
                'med '+var_name, 'up_q '+var_name], ['d']*7)

            var = mesh.get_point_arrays().get(var_name).as_array()

            table << float(np.min(var)) << float(np.average(var)) \
                << float(np.max(var)) << float(np.std(var)) \
                << map(float, np.percentile(var, [25.,50.,75.]))

        return table

Again, I'm acquiring the GIL, so this should be executed serially.

What am I doing wrong? Have I missed some key aspect of using numpy in
this scenario? Is there any documentation on using numpy in a scenario
like this? Any help is greatly appreciated!

Thanks
Burlen
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
==10316== Thread 2:
==10316== Invalid read of size 4
==10316==    at 0x4ED7D73: PyObject_Free (obmalloc.c:1013)
==10316==    by 0x10D540B0: NpyIter_Deallocate (nditer_constr.c:699)
==10316==    by 0x112B01EE: iterator_loop (ufunc_object.c:1511)
==10316==    by 0x112B01EE: execute_legacy_ufunc_loop (ufunc_object.c:1660)
==10316==    by 0x112B01EE: PyUFunc_GenericFunction (ufunc_object.c:2627)
==10316==    by 0x112B0D95: ufunc_generic_call (ufunc_object.c:4253)
==10316==    [... Python eval frames, teca_py_algorithm::execute_callback,
==10316==     teca_thread_pool worker, start_thread ...]
==10316==  Address 0x20034020 is 3,856 bytes inside a block of size 4,097 free'd
==10316==    at 0x4C29E00: free (vg_replace_malloc.c:530)
==10316==    by 0x4F57B22: import_module_level (import.c:2278)
==10316==    by 0x4F57B22: PyImport_ImportModuleLevel (import.c:2292)
==10316==    by 0x4F36597: builtin___import__ (bltinmodule.c:49)
==10316==    by 0x4F58735: PyImport_Import (import.c:2890)
==10316==    by 0x4F588B9: PyImport_ImportModule (import.c:2133)
==10316==    by 0x10D334C2: get_forwarding_ndarray_method (methods.c:57)
==10316==    by 0x10D372C0: array_mean (methods.c:1932)
==10316==    [...]
==10316==  Block was alloc'd at
==10316==    at 0x4C28D06: malloc (vg_replace_malloc.c:299)
==10316==    by 0x4F57800: import_module_level (import.c:2220)
==10316==    by 0x4F57800: PyImport_ImportModuleLevel (import.c:2292)
==10316==    [... same import path as above ...]
==10316==
==10316== Conditional jump or move depends on uninitialised value(s)
==10316==    at 0x4ED7D7C: PyObject_Free (obmalloc.c:1013)
==10316==    by 0x10D540B0: NpyIter_Deallocate (nditer_constr.c:699)
==10316==    [... same ufunc and thread-pool frames as above ...]
==10316==  Uninitialised value was created by a heap allocation
==10316==    at 0x4C29326: operator new(unsigned long) (vg_replace_malloc.c:334)
==10316==    [... std::vector growth inside ...]
==10316==    by 0x1E8E250C: teca_array_collection::declare (teca_array_collection.h:126)
==10316==    by 0x1E8C0729: teca_table::declare_column (teca_table.h:153)
==10316==    by 0x1E7C9506: teca_table_declare_columns (teca_py_alg.cxx:7704)
==10316==    [...]
==10316==
==10316== Use of uninitialised value of size 8
==10316==    at 0x4ED7D94: PyObject_Free (obmalloc.c:1013)
==10316==    by 0x10D540B0: NpyIter_Deallocate (nditer_constr.c:699)
==10316==    [... remaining frames repeat those above; the deep C++
==10316==     template frames were unrecoverable from the HTML-scrubbed
==10316==     attachment ...]
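Two notes on reading this log, offered as guesses rather than a
diagnosis. First, "Invalid read" and "uninitialised value" reports
pointing at PyObject_Free are a known artifact of CPython's pymalloc
small-object allocator, which valgrind flags unless the interpreter is
built --with-valgrind or run with CPython's Misc/valgrind-python.supp
suppressions, so they may be noise. Second, the freed block above sits
in Python's import machinery, reached from array_mean via
get_forwarding_ndarray_method: np.mean and friends forward to
numpy.core._methods, which is imported lazily on first use. A cheap
experiment, assuming the crash is related to that first import running
on a worker thread, is to trigger the lazy imports from the main
thread before the pool starts:

    import numpy as np

    def warm_up_numpy():
        # Exercise the reductions the callback uses so that any lazy
        # imports behind them (e.g. numpy.core._methods behind np.mean
        # and np.std) happen here, in the main thread, rather than in
        # a pool worker. Whether this avoids the crash is untested.
        d = np.zeros(2)
        np.min(d); np.max(d); np.average(d); np.std(d)
        np.percentile(d, [25., 50., 75.])

    warm_up_numpy()  # call before creating the thread pool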
==10316== by 0xFB022EF: std::_Bind_simple ()>::operator()() (functional:1520) ==10316== by 0xFB0200D: std::__future_base::_Task_setter >, std::__future_base::_Result_base::_Deleter>, std::_Bind_simple ()>, std::shared_ptr >::operator()() const (future:1319) ==10316== by 0xFB01CEE: std::_Function_handler (), std::__future_base::_Task_setter >, std::__future_base::_Result_base::_Deleter>, std::_Bind_simple ()>, std::shared_ptr > >::_M_invoke(std::_Any_data const&) (functional:1857) ==10316== by 0xFAF8FBC: std::function ()>::operator()() const (functional:2271) ==10316== by 0xFAF8374: std::__future_base::_State_baseV2::_M_do_set(std::function ()>*, bool*) (future:527) ==10316== by 0xFAFD50D: void std::_Mem_fn_base ()>*, bool*), true>::operator() ()>*, bool*, void>(std::__future_base::_State_baseV2*, std::function ()>*&&, bool*&&) const (functional:600) ==10316== by 0xFAFC4F4: void std::_Bind_simple ()>*, bool*)> (std::__future_base::_State_baseV2*, std::function ()>*, bool*)>::_M_invoke<0ul, 1ul, 2ul>(std::_Index_tuple<0ul, 1ul, 2ul>) (functional:1531) ==10316== by 0xFAFB102: std::_Bind_simple ()>*, bool*)> (std::__future_base::_State_baseV2*, std::function ()>*, bool*)>::operator()() (functional:1520) ==10316== by 0xFAF9CF9: void std::__once_call_impl ()>*, bool*)> (std::__future_base::_State_baseV2*, std::function ()>*, bool*)> >() (mutex:697) ==10316== by 0x52480E8: __pthread_once_slow (pthread_once.c:116) ==10316== by 0xFAF55D8: __gthread_once(int*, void (*)()) (gthr-default.h:699) ==10316== by 0xFAF8BB5: void std::call_once ()>*, bool*), std::__future_base::_State_baseV2*, std::function ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function ()>*&&, bool*&&) (mutex:729) ==10316== by 0xFAF7F10: std::__future_base::_State_baseV2::_M_set_result(std::function ()>, bool) (future:387) ==10316== by 0xFB00E53: std::__future_base::_Task_state, std::shared_ptr ()>::_M_run() (future:1403) ==10316== by 0xFAF95CC: std::packaged_task ()>::operator()() (future:1547) ==10316== by 0xFAF5926: teca_thread_pool::create_threads(unsigned int)::{lambda()#1}::operator()() const (teca_threaded_algorithm.cxx:123) ==10316== by 0xFAF76ED: void std::_Bind_simple::_M_invoke<>(std::_Index_tuple<>) (functional:1531) ==10316== by 0xFAF7656: std::_Bind_simple::operator()() (functional:1520) ==10316== by 0xFAF7593: std::thread::_Impl >::_M_run() (thread:115) ==10316== by 0x1030AF2F: ??? (in /usr/lib64/libstdc++.so.6.0.21) ==10316== by 0x5241609: start_thread (pthread_create.c:334) From matthew.brett at gmail.com Mon Mar 28 17:33:42 2016 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 28 Mar 2016 14:33:42 -0700 Subject: [Numpy-discussion] Using OpenBLAS for manylinux wheels Message-ID: Hi, Olivier Grisel and I are working on building and testing manylinux wheels for numpy and scipy. We first thought that we should use ATLAS BLAS, but Olivier found that my build of these could be very slow [1]. I set up a testing grid [2] which found test errors for numpy and scipy using ATLAS wheels. On the other hand, the same testing grid finds no errors or failures [3] using latest OpenBLAS (0.2.17) and running tests for: numpy scipy scikit-learn numexpr pandas statsmodels This is on the travis-ci ubuntu VMs. Please do test on your own machines with something like this script [4]: source test_manylinux.sh We have worried in the past about the reliability of OpenBLAS, but I find these tests reassuring. 
From jaime.frio at gmail.com  Mon Mar 28 18:04:52 2016
From: jaime.frio at gmail.com (Jaime Fernández del Río)
Date: Tue, 29 Mar 2016 00:04:52 +0200
Subject: [Numpy-discussion] Make np.bincount output same dtype as weights
In-Reply-To:
References: <000f4242.46d2f6b061c54796@gmail.com>
Message-ID:

Have modified the PR to do the "promote integers to at least long" we do
in np.sum.

Jaime

On Mon, Mar 28, 2016 at 9:55 PM, CJ Carey wrote:

> Another +1 for Josef's interpretation from me. Consistency with np.sum
> seems like the best option.
>
> On Sat, Mar 26, 2016 at 11:12 PM, Juan Nunez-Iglesias wrote:
>
>> Thanks for clarifying, Jaime, and fwiw I agree with Josef: I would expect
>> np.bincount to behave like np.sum with regards to promoting weights
>> dtypes. Including bool.
>>
>> On Sun, Mar 27, 2016 at 1:58 PM, wrote:
>>
>>> On Sat, Mar 26, 2016 at 9:54 PM, Joseph Fox-Rabinovitz wrote:
>>> > Would it make sense to just make the output type large enough to hold
>>> > the cumulative sum of the weights?
>>> >
>>> > - Joseph Fox-Rabinovitz
>>> >
>>> > ------ Original message------
>>> > From: Jaime Fernández del Río
>>> > Date: Sat, Mar 26, 2016 16:16
>>> > To: Discussion of Numerical Python;
>>> > Subject: [Numpy-discussion] Make np.bincount output same dtype as
>>> > weights
>>> >
>>> > Hi all,
>>> >
>>> > I have just submitted a PR (#7464) that fixes an enhancement request
>>> > (#6854), making np.bincount return an array of the same type as the
>>> > weights parameter. This is an important deviation from current
>>> > behavior, which always casts weights to double, and always returns a
>>> > double array, so I would like to hear what others think about the
>>> > worthiness of this. Main discussion points:
>>> >
>>> > np.bincount now works with complex weights (yay!), I guess this
>>> > should be a pretty uncontroversial enhancement.
>>> > The return is of the same type as weights, which means that small
>>> > integers are very likely to overflow. This is exactly what #6854
>>> > requested, but perhaps we should promote the output for integers to a
>>> > long, as we do in np.sum?
>>>
>>> I always thought of bincount with weights just as a group-by sum. So
>>> it would be easier to remember and have fewer surprises if it matches
>>> the behavior of np.sum.
>>>
>>> > Boolean arrays stay boolean, and OR, rather than sum, the weights. Is
>>> > this what one would want? If we decide that integer promotion is the
>>> > way to go, perhaps booleans should go in the same pack?
>>>
>>> Isn't this calculating the sum, i.e. count of True by group, already?
>>> Based on a quick example with numpy 1.9.2, I don't think I ever used
>>> bool weights before.
>>>
>>> > This new implementation currently supports all of the reasonable
>>> > native types, but has no fallback for user defined types. I guess we
>>> > should attempt to cast the array to double as before if no native
>>> > loop can be found? It would be good to have a way of testing this
>>> > though, any thoughts on how to go about this?
>>> > Does a behavior change like this require some deprecation period?
>>> > What would that look like?
>>> > I have also added broadcasting of weights to the full size of list,
>>> > so that one can do e.g. np.bincount([1, 2, 3], weights=2j) without
>>> > having to tile the single weight to the size of the bins list.
>>> >
>>> > Any other thoughts are very welcome as well!
>>>
>>> (2-D weights ?)
>>>
>>> Josef
>>>
>>> > Jaime
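To make the promotion question concrete, here is how the two functions
compare as released at the time, and what the "promote like np.sum"
resolution means. The last comment describes the intended post-PR
behavior, not anything released yet:

    import numpy as np

    x = [0, 1, 1, 2]
    w = np.array([1, 2, 3, 4], dtype=np.int8)

    # np.sum promotes small integer types to at least the platform long:
    print(np.sum(w).dtype)                   # int64 on most 64-bit systems

    # np.bincount (pre-PR) always casts weights to double:
    print(np.bincount(x, weights=w).dtype)   # float64

    # With the change discussed above, bincount would instead follow the
    # weights dtype, promoting small integers the same way np.sum does,
    # so int8 weights would give an integer result rather than float64.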
From insertinterestingnamehere at gmail.com  Mon Mar 28 18:52:31 2016
From: insertinterestingnamehere at gmail.com (Ian Henriksen)
Date: Mon, 28 Mar 2016 22:52:31 +0000
Subject: [Numpy-discussion] ATLAS build errors
In-Reply-To:
References:
Message-ID:

On Sat, Mar 26, 2016 at 3:06 PM Matthew Brett wrote:

> Hi,
>
> I'm working on building manylinux wheels for numpy, and I ran into
> unexpected problems with a numpy built against the ATLAS 3.8 binaries
> supplied by CentOS 5.
>
> I'm working on the manylinux docker container [1]
>
> To get ATLAS, I'm doing `yum install atlas-devel` which gets the
> default CentOS 5 ATLAS packages.
>
> I then build numpy.  Local tests work fine, but when I test on travis,
> I get these errors [2]:
>
> ======================================================================
> ERROR: test_svd_build (test_regression.TestRegression)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/travis/build/matthew-brett/manylinux-testing/venv/lib/python2.7/site-packages/numpy/linalg/tests/test_regression.py", line 56, in test_svd_build
>     u, s, vh = linalg.svd(a)
>   File "/home/travis/build/matthew-brett/manylinux-testing/venv/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 1359, in svd
>     u, s, vt = gufunc(a, signature=signature, extobj=extobj)
> ValueError: On entry to DGESDD parameter number 12 had an illegal value
>
> ======================================================================
> FAIL: test_lapack (test_build.TestF77Mismatch)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/travis/build/matthew-brett/manylinux-testing/venv/lib/python2.7/site-packages/numpy/testing/decorators.py", line 146, in skipper_func
>     return f(*args, **kwargs)
>   File "/home/travis/build/matthew-brett/manylinux-testing/venv/lib/python2.7/site-packages/numpy/linalg/tests/test_build.py", line 56, in test_lapack
>     information.""")
> AssertionError: Both g77 and gfortran runtimes linked in lapack_lite !
> This is likely to cause random crashes and wrong results. See numpy
> INSTALL.txt for more information.
>
> Sure enough, scipy built the same way segfaults or fails to import
> (see [2]).
>
> I get no errors for an openblas build.
>
> Does anyone recognize these?  How should I modify the build to avoid
> them?
>
> Cheers,
>
> Matthew
>
> [1] https://github.com/pypa/manylinux
> [2] https://travis-ci.org/matthew-brett/manylinux-testing/jobs/118712090

The error regarding parameter 12 of dgesdd sounds a lot like
https://github.com/scipy/scipy/issues/5039 where the issue was that the
LAPACK version was too old. CentOS 5 is pretty old, so I wouldn't be
surprised if that were the case here too. In general, you can't expect
Linux distros to have a uniform shared object interface for LAPACK, so
you don't gain much by using the version that ships with CentOS 5 beyond
not having to compile it all yourself. It might be better to use a newer
LAPACK built from source with the older toolchains already there.

Best,

-Ian Henriksen
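The failing regression test above is essentially an SVD smoke test. A
rough stand-in for checking a suspect build (the real test in numpy's
suite uses a specific matrix, so this is only an approximation of it):

    import numpy as np

    # On a numpy linked against a too-old or broken LAPACK, calling dgesdd
    # via np.linalg.svd is what raises "On entry to DGESDD parameter
    # number 12 had an illegal value"; on a healthy build this just
    # prints the factor shapes.
    a = np.random.rand(50, 30)
    u, s, vh = np.linalg.svd(a)
    print(u.shape, s.shape, vh.shape)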
From olivier.grisel at ensta.org  Tue Mar 29 08:13:29 2016
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Tue, 29 Mar 2016 14:13:29 +0200
Subject: [Numpy-discussion] Using OpenBLAS for manylinux wheels
In-Reply-To:
References:
Message-ID:

I just tested those new openblas-based wheels on the linux virtualbox
setup that I used to report the following segfault back in February:

https://mail.scipy.org/pipermail/numpy-discussion/2016-February/074866.html
https://mail.scipy.org/pipermail/numpy-discussion/2016-February/074870.html

I cannot reproduce this problem anymore with this new version of
OpenBLAS. All scikit-learn tests pass (I had originally encountered the
segfault when running the scikit-learn tests). I also ran the numpy and
scipy tests successfully on that machine.

What I find reassuring is that the upstream OpenBLAS project has set up a
buildbot-based CI to test OpenBLAS on many CPU architectures and is
running the scipy tests continuously to detect regressions early on:

https://github.com/xianyi/OpenBLAS/issues/785
http://build.openblas.net/waterfall
https://github.com/xianyi/OpenBLAS-CI/

--
Olivier Grisel

From ben.v.root at gmail.com  Tue Mar 29 13:46:45 2016
From: ben.v.root at gmail.com (Benjamin Root)
Date: Tue, 29 Mar 2016 13:46:45 -0400
Subject: [Numpy-discussion] Reflect array?
Message-ID:

Is there a quick-n-easy way to reflect an NxM array that represents a
quadrant into a 2Nx2M array? Essentially, I am trying to reduce the size
of an expensive calculation by taking advantage of the fact that the
first part of the calculation is just computing gaussian weights, which
is radially symmetric.

It doesn't seem like np.tile() could support this (yet?). Maybe we could
allow negative repetitions to mean "reflected"? But I was hoping there
was some existing function or stride trick that could accomplish what I
am trying.

x = np.linspace(-5, 5, 20)
y = np.linspace(-5, 5, 24)
z = np.hypot(x[None, :], y[:, None])
zz = np.hypot(x[None, :int(len(x)//2)], y[:int(len(y)//2), None])
zz = some_mirroring_trick(zz)
assert np.all(z == zz)

What can be my "some_mirroring_trick()"? I am hoping for something a
little better than using hstack()/vstack().

Thanks,
Ben Root

From jfoxrabinovitz at gmail.com  Tue Mar 29 13:58:44 2016
From: jfoxrabinovitz at gmail.com (Joseph Fox-Rabinovitz)
Date: Tue, 29 Mar 2016 13:58:44 -0400
Subject: [Numpy-discussion] Reflect array?
In-Reply-To:
References:
Message-ID:

On Tue, Mar 29, 2016 at 1:46 PM, Benjamin Root wrote:
> Is there a quick-n-easy way to reflect an NxM array that represents a
> quadrant into a 2Nx2M array?
> [...]

Are you looking for something like this (note the quadrant goes first in
each concatenation, since it covers the low ends of x and y):

zz = np.hypot.outer(y[:len(y)//2], x[:len(x)//2])
zz = np.concatenate((zz, zz[:, ::-1]), axis=1)
zz = np.concatenate((zz, zz[::-1, :]))

From ben.v.root at gmail.com  Tue Mar 29 14:02:07 2016
From: ben.v.root at gmail.com (Benjamin Root)
Date: Tue, 29 Mar 2016 14:02:07 -0400
Subject: [Numpy-discussion] Reflect array?
In-Reply-To:
References:
Message-ID:

Along those lines, yes, but you have to be careful of even/odd dimension
lengths. Would be nice if it was some sort of stride trick so that I
don't have to allocate a new array twice as we do in the concatenation
steps.

Cheers!
Ben Root

On Tue, Mar 29, 2016 at 1:58 PM, Joseph Fox-Rabinovitz wrote:
> Are you looking for something like this:
>
> zz = np.hypot.outer(y[:len(y)//2], x[:len(x)//2])
> zz = np.concatenate((zz, zz[:, ::-1]), axis=1)
> zz = np.concatenate((zz, zz[::-1, :]))
> [...]
From jfoxrabinovitz at gmail.com  Tue Mar 29 14:17:01 2016
From: jfoxrabinovitz at gmail.com (Joseph Fox-Rabinovitz)
Date: Tue, 29 Mar 2016 14:17:01 -0400
Subject: [Numpy-discussion] Reflect array?
In-Reply-To:
References:
Message-ID:

On Tue, Mar 29, 2016 at 2:02 PM, Benjamin Root wrote:
> Along those lines, yes, but you have to be careful of even/odd dimension
> lengths. Would be nice if it was some sort of stride trick so that I
> don't have to allocate a new array twice as we do in the concatenation
> steps.
> [...]

You can avoid the allocation with preallocation:

nx = len(x) // 2
ny = len(y) // 2
zz = np.zeros((len(y), len(x)))

zz[:ny, -nx:] = np.hypot.outer(y[:ny], x[:nx])
zz[:ny, :nx] = zz[:ny, :-nx-1:-1]
zz[-ny:, :] = zz[ny::-1, :]

if nx * 2 != len(x):
    zz[:ny, nx] = y[::-1]
    zz[-ny:, nx] = y
if ny * 2 != len(y):
    zz[ny, :nx] = x[::-1]
    zz[ny, -nx:] = x

All of the steps after the call to `hypot.outer` assign through views, so
nothing else is allocated. This is untested, so you may need to tweak the
indices a little.
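For readers who want to check the thread's approach end to end, the
mirrored construction can be verified against the direct computation on
Ben's example. allclose is used rather than ==, because the two halves of
np.linspace are not guaranteed to negate each other bitwise:

    import numpy as np

    x = np.linspace(-5, 5, 20)
    y = np.linspace(-5, 5, 24)
    z = np.hypot(x[None, :], y[:, None])

    # Compute only the lower-left quadrant, then mirror it out.
    zz = np.hypot.outer(y[:len(y) // 2], x[:len(x) // 2])
    zz = np.concatenate((zz, zz[:, ::-1]), axis=1)   # mirror across x
    zz = np.concatenate((zz, zz[::-1, :]), axis=0)   # mirror across y

    assert np.allclose(z, zz)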
From p.e.creasey.00 at googlemail.com  Tue Mar 29 14:48:02 2016
From: p.e.creasey.00 at googlemail.com (Peter Creasey)
Date: Tue, 29 Mar 2016 11:48:02 -0700
Subject: [Numpy-discussion] Reflect array?
Message-ID:

> You can avoid the allocation with preallocation:
> [...]
> All of the steps after the call to `hypot.outer` create views. This is
> untested, so you may need to tweak the indices a little.

A couple of months ago I wrote some C code with ctypes to do this sort of
mirroring trick on an (N,N,N) numpy array of fft weights (where you can
exploit the 48-fold symmetry of using interchangeable axes), which was
pretty useful since I had N^3 >> 1e9 and the weight function was quite
expensive. Obviously the (N,M) case doesn't allow quite so much
optimization, but if it could be interesting then PM me.

Best,
Peter

From travis at continuum.io  Tue Mar 29 16:28:58 2016
From: travis at continuum.io (Travis Oliphant)
Date: Tue, 29 Mar 2016 13:28:58 -0700
Subject: [Numpy-discussion] Blog-post that explains what Blaze actually is and where Pluribus project now lives.
Message-ID:

I have emailed this list in the past explaining what is driving my open
source efforts now. Here is a blog-post that may help some of you
understand a little bit of the history of Blaze, DyND, Numba, and other
related developments as they relate to scaling up and scaling out
array-computing in Python.

http://technicaldiscovery.blogspot.com/2016/03/anaconda-and-hadoop-story-of-journey.html

This post and these projects do not have anything to do with the future
of the NumPy and/or SciPy projects, which are now in great hands guiding
their community-driven development. The post is, however, a discussion of
additional projects that will hopefully benefit some of you as well, and
for which your feedback and assistance is welcome.

Best,

-Travis

--
Travis Oliphant, PhD
Co-founder and CEO
@teoliphant
512-222-5440
http://www.continuum.io
From jjhelmus at gmail.com  Tue Mar 29 17:09:25 2016
From: jjhelmus at gmail.com (Jonathan Helmus)
Date: Tue, 29 Mar 2016 16:09:25 -0500
Subject: [Numpy-discussion] Using OpenBLAS for manylinux wheels
In-Reply-To:
References:
Message-ID: <56FAEF05.1060801@gmail.com>

On 03/28/2016 04:33 PM, Matthew Brett wrote:
> Please do test on your own machines with something like this script [4]:

Matthew,

I ran the tests after installing the wheels on my machine running Ubuntu
14.04. Three numpy tests failed with the GFORTRAN_1.4 error you mentioned
in a post to the wheel-builds list recently. All other tests passed. I
can reproduce these failing tests in a Docker container if it is helpful.

# python -c 'import numpy; numpy.test("full")'
Running unit tests for numpy
NumPy version 1.11.0
NumPy relaxed strides checking option: False
NumPy is installed in /usr/local/lib/python2.7/dist-packages/numpy
Python version 2.7.6 (default, Jun 22 2015, 17:58:13) [GCC 4.8.2]
nose version 1.3.7
...
======================================================================
ERROR: test_kind.TestKind.test_all
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 381, in setUp
    try_run(self.inst, ('setup', 'setUp'))
  File "/usr/local/lib/python2.7/dist-packages/nose/util.py", line 471, in try_run
    return func()
  File "/usr/local/lib/python2.7/dist-packages/numpy/f2py/tests/util.py", line 358, in setUp
    module_name=self.module_name)
  File "/usr/local/lib/python2.7/dist-packages/numpy/f2py/tests/util.py", line 78, in wrapper
    memo[key] = func(*a, **kw)
  File "/usr/local/lib/python2.7/dist-packages/numpy/f2py/tests/util.py", line 149, in build_module
    __import__(module_name)
ImportError: /usr/local/lib/python2.7/dist-packages/numpy/core/../.libs/libgfortran.so.3: version `GFORTRAN_1.4' not found (required by /tmp/tmptYznnz/_test_ext_module_5405.so)

======================================================================
ERROR: test_mixed.TestMixed.test_all
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 381, in setUp
    try_run(self.inst, ('setup', 'setUp'))
  File "/usr/local/lib/python2.7/dist-packages/nose/util.py", line 471, in try_run
    return func()
  File "/usr/local/lib/python2.7/dist-packages/numpy/f2py/tests/util.py", line 358, in setUp
    module_name=self.module_name)
  File "/usr/local/lib/python2.7/dist-packages/numpy/f2py/tests/util.py", line 78, in wrapper
    memo[key] = func(*a, **kw)
  File "/usr/local/lib/python2.7/dist-packages/numpy/f2py/tests/util.py", line 149, in build_module
    __import__(module_name)
ImportError: /usr/local/lib/python2.7/dist-packages/numpy/core/../.libs/libgfortran.so.3: version `GFORTRAN_1.4' not found (required by /tmp/tmptYznnz/_test_ext_module_5405.so)

======================================================================
ERROR: test_mixed.TestMixed.test_docstring
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 381, in setUp
    try_run(self.inst, ('setup', 'setUp'))
  File "/usr/local/lib/python2.7/dist-packages/nose/util.py", line 471, in try_run
    return func()
  File "/usr/local/lib/python2.7/dist-packages/numpy/f2py/tests/util.py", line 358, in setUp
    module_name=self.module_name)
  File "/usr/local/lib/python2.7/dist-packages/numpy/f2py/tests/util.py", line 84, in wrapper
    raise ret
ImportError: /usr/local/lib/python2.7/dist-packages/numpy/core/../.libs/libgfortran.so.3: version `GFORTRAN_1.4' not found (required by /tmp/tmptYznnz/_test_ext_module_5405.so)

----------------------------------------------------------------------
Ran 6322 tests in 136.678s

FAILED (KNOWNFAIL=6, SKIP=11, errors=3)

Cheers,

    - Jonathan Helmus

From olivier.grisel at ensta.org  Wed Mar 30 07:32:21 2016
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Wed, 30 Mar 2016 13:32:21 +0200
Subject: [Numpy-discussion] Using OpenBLAS for manylinux wheels
In-Reply-To: <56FAEF05.1060801@gmail.com>
References: <56FAEF05.1060801@gmail.com>
Message-ID:

The problem with the gfortran failures will be tackled by renaming the
vendored libgfortran.so library, see:

https://github.com/pypa/auditwheel/issues/24

This is orthogonal to the ATLAS vs OpenBLAS decision though.

--
Olivier
From freddyrietdijk at fridh.nl  Wed Mar 30 14:26:17 2016
From: freddyrietdijk at fridh.nl (Freddy Rietdijk)
Date: Wed, 30 Mar 2016 20:26:17 +0200
Subject: [Numpy-discussion] Using OpenBLAS for manylinux wheels
In-Reply-To:
References: <56FAEF05.1060801@gmail.com>
Message-ID:

On Nix/NixOS we've been using OpenBLAS 0.2.14 for some time now because
we had some segmentation faults with 0.2.15 and scipy/scikit-learn. I've
tested the packages you listed, and more, with OpenBLAS 0.2.17 and
encountered no problems.

On Wed, Mar 30, 2016 at 1:32 PM, Olivier Grisel wrote:
> The problem with the gfortran failures will be tackled by renaming the
> vendored libgfortran.so library, see:
> [...]

From allanhaldane at gmail.com  Wed Mar 30 15:37:14 2016
From: allanhaldane at gmail.com (Allan Haldane)
Date: Wed, 30 Mar 2016 15:37:14 -0400
Subject: [Numpy-discussion] change to memmap subclass propagation
Message-ID: <56FC2AEA.2000100@gmail.com>

Hi all,

This is a warning for a planned change to np.memmap in
https://github.com/numpy/numpy/pull/7406.

The return values of ufuncs and fancy slices of a memmap will now be
plain ndarrays, since those return values don't point to mem-mapped
memory.

There is a possibility that if you are subclassing memmap using multiple
inheritance something may break. We don't think anyone will be affected,
but please reply if it does affect you.

Allan
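To make the planned behavior concrete, a small before/after sketch. The
"after" comments describe the behavior as proposed in the PR, as read
from the announcement above; everything else about memmap is unchanged:

    import numpy as np

    fp = np.memmap("scratch.dat", dtype=np.float64, mode="w+", shape=(4,))
    fp[:] = [1.0, 2.0, 3.0, 4.0]

    print(type(fp))          # numpy.memmap
    print(type(fp + 1))      # ufunc result: ndarray after the change
    print(type(fp[[0, 2]]))  # fancy slice: ndarray after the change
    print(type(fp[1:3]))     # basic slice still views the file: memmap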
> > Matthew > > [1] > https://github.com/matthew-brett/manylinux-builds/issues/4#issue-143530908 > [2] https://travis-ci.org/matthew-brett/manylinux-testing/builds/118780781 > [3] I disabled a few pandas tests which were failing for reasons not > related to BLAS. Some of the statsmodels test runs time out. > [4] https://gist.github.com/matthew-brett/2fd9d9a29e022c297634 > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cmkleffner at gmail.com Wed Mar 30 17:39:36 2016 From: cmkleffner at gmail.com (Carl Kleffner) Date: Wed, 30 Mar 2016 23:39:36 +0200 Subject: [Numpy-discussion] Using OpenBLAS for manylinux wheels In-Reply-To: References: Message-ID: I would like to see OpenBLAS support for numpy on windows. The latest OpenBLAS windows builds numpy support for are on https://bitbucket.org/carlkl/mingw-w64-for-python/downloads now. Scipy wheels should work regardless if numpy was build with MSVC or with mingwpy. It is only mandantory to agree about the BLAS implemenation used. Carl 2016-03-30 23:30 GMT+02:00 Nathaniel Smith : > If OpenBLAS is looking like the easiest to support solution, then no > objections here. (If 0.2.17 is genuinely working well, then maybe we want > to switch to it on Windows too. I know Xianyi disabled some of the > problematic kernels for us -- maybe that's enough. Mostly I just don't want > to end up in the situation where we're trying to support a bunch of > differently broken builds on different platforms.) > On Mar 28, 2016 2:34 PM, "Matthew Brett" wrote: > >> Hi, >> >> Olivier Grisel and I are working on building and testing manylinux >> wheels for numpy and scipy. >> >> We first thought that we should use ATLAS BLAS, but Olivier found that >> my build of these could be very slow [1]. I set up a testing grid [2] >> which found test errors for numpy and scipy using ATLAS wheels. >> >> On the other hand, the same testing grid finds no errors or failures >> [3] using latest OpenBLAS (0.2.17) and running tests for: >> >> numpy >> scipy >> scikit-learn >> numexpr >> pandas >> statsmodels >> >> This is on the travis-ci ubuntu VMs. >> >> Please do test on your own machines with something like this script [4]: >> >> source test_manylinux.sh >> >> We have worried in the past about the reliability of OpenBLAS, but I >> find these tests reassuring. >> >> Are there any other tests of OpenBLAS that we should run to assure >> ourselves that it is safe to use? >> >> Matthew >> >> [1] >> https://github.com/matthew-brett/manylinux-builds/issues/4#issue-143530908 >> [2] >> https://travis-ci.org/matthew-brett/manylinux-testing/builds/118780781 >> [3] I disabled a few pandas tests which were failing for reasons not >> related to BLAS. Some of the statsmodels test runs time out. >> [4] https://gist.github.com/matthew-brett/2fd9d9a29e022c297634 >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
From matthew.brett at gmail.com  Wed Mar 30 18:34:32 2016
From: matthew.brett at gmail.com (Matthew Brett)
Date: Wed, 30 Mar 2016 15:34:32 -0700
Subject: [Numpy-discussion] Using OpenBLAS for manylinux wheels
In-Reply-To:
References:
Message-ID:

On Wed, Mar 30, 2016 at 2:39 PM, Carl Kleffner wrote:
> I would like to see OpenBLAS support for numpy on Windows.
> [...]

How do the tests look for `numpy.test("full")` and `scipy.test("full")`
with OpenBLAS 0.2.17 on Windows? Sorry if you said before.

Thanks,

Matthew

From faltet at gmail.com  Thu Mar 31 03:45:01 2016
From: faltet at gmail.com (Francesc Alted)
Date: Thu, 31 Mar 2016 09:45:01 +0200
Subject: [Numpy-discussion] ANN: numexpr 2.5.1 released
Message-ID:

=========================
Announcing Numexpr 2.5.1
=========================

Numexpr is a fast numerical expression evaluator for NumPy. With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated and
use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's MKL
(Math Kernel Library), which allows an extremely fast evaluation of
transcendental functions (sin, cos, tan, exp, log...) while squeezing the
last drop of performance out of your multi-core processors. Look here for
some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use computational engine for projects that don't
want to adopt other solutions requiring more heavy dependencies.

What's new
==========

Fixed a critical bug that caused wrong evaluations of log10() and conj().
These produced wrong results when numexpr was compiled with Intel's MKL
(which is a popular build since Anaconda ships it by default) and
non-contiguous data. This is considered a *critical* bug and upgrading is
highly recommended. Thanks to Arne de Laat and Tom Kooij for reporting it
and providing a test unit.

In case you want to know more in detail what has changed in this version,
see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

Where can I find Numexpr?
=========================

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy data!

--
Francesc Alted
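For anyone who has not used numexpr, the core entry point is
numexpr.evaluate(); a minimal sketch of the kind of expression the
announcement is talking about (arrays are picked up from the calling
scope by name):

    import numpy as np
    import numexpr as ne

    a = np.random.rand(1000000)
    b = np.random.rand(1000000)

    # The whole expression is compiled and evaluated in blocks across
    # threads, without materializing the 3*a and 4*b temporaries.
    c = ne.evaluate("3*a + 4*b")
    assert np.allclose(c, 3*a + 4*b)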
From matt.p.conte at gmail.com  Thu Mar 31 11:18:01 2016
From: matt.p.conte at gmail.com (mpc)
Date: Thu, 31 Mar 2016 08:18:01 -0700 (MST)
Subject: [Numpy-discussion] C-API: multidimensional array indexing?
In-Reply-To:
References: <4E30852E.8080201@web.de>
Message-ID: <1459437481679-42693.post@n7.nabble.com>

Cool! But I'm having trouble implementing this; could you provide an
example of how exactly to do it? I'm not sure how to create the
appropriate tuple and how to use it with PyObject_GetItem given a
PyArrayObject, unless I've misunderstood.

Much appreciated,

Matthew
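At the C level the recipe being discussed is to build an index tuple
(for example with PyTuple_Pack) and hand it to PyObject_GetItem. Since
that call is exactly what the interpreter performs for obj[key], the
Python-level equivalent shows the shape of the answer; this is an
illustration only, not the C example mpc asked for:

    import numpy as np

    a = np.arange(12).reshape(3, 4)

    # PyObject_GetItem(a, key) is the C spelling of a[key]; for
    # multidimensional indexing, key is a tuple with one entry per axis.
    key = (1, 2)
    print(a[key])              # 6, same as a[1, 2]
    print(a.__getitem__(key))  # the slot PyObject_GetItem dispatches to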
From jaime.frio at gmail.com  Thu Mar 31 16:00:27 2016
From: jaime.frio at gmail.com (Jaime Fernández del Río)
Date: Thu, 31 Mar 2016 22:00:27 +0200
Subject: [Numpy-discussion] Starting work on ufunc rewrite
Message-ID:

I have started discussing with Nathaniel the implementation of the ufunc
ABI break that he proposed in a draft NEP a few months ago:

http://thread.gmane.org/gmane.comp.python.numeric.general/61270

His original proposal was to make the public portion of PyUFuncObject be:

typedef struct {
    PyObject_HEAD
    int nin, nout, nargs;
} PyUFuncObject;

Of course the idea is that internally we would use a much larger struct
that we could change at will, as long as its first few entries matched
those of PyUFuncObject. My problem with this, and I may very well be
missing something, is that in PyUFunc_Type we need to set tp_basicsize to
the size of the extended struct, so we would end up having to expose its
contents. This is somewhat similar to what now happens with
PyArrayObject: anyone can #include "ndarraytypes.h", cast PyArrayObject*
to PyArrayObjectFields*, and access the guts of the struct without using
the supplied API inline functions. Not the end of the world, but if you
want to make something private, you might as well make it truly private.

I think it would be better to have something similar to what NpyIter
does::

typedef struct {
    PyObject_HEAD
    NpyUFunc *ufunc;
} PyUFuncObject;

where NpyUFunc would, at this level, be an opaque type of which nothing
would be known. We could have some of the NpyUFunc attributes cached on
the PyUFuncObject struct for easier access, as is done in
NewNpyArrayIterObject. This would also give us more liberty in making
NpyUFunc be whatever we want it to be, including a variable-sized memory
chunk that we could use and access at will. NpyIter is again a good
example, where rather than storing pointers to strides and dimensions
arrays, these are made part of the NpyIter memory chunk, effectively
being equivalent to having variable-sized arrays as part of the struct.
And I think we will probably no longer trigger the Cython warnings about
size changes either.

Any thoughts on this approach? Is there anything fundamentally wrong with
what I'm proposing here?

Also, this is probably going to end up being a rewrite of a pretty large
and complex codebase. I am not sure that working on this on my own and
eventually sending a humongous PR is the best approach. Any thoughts on
how best to handle turning this into a collaborative, incremental effort?
Anyone who would like to join in the fun?

Jaime

--
(\__/)
( O.o)
( > <) This is Conejo. Copy Conejo into your signature and help him with
his plans for world domination.
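As an aside for readers, the layout Jaime sketches is the classic
opaque-pointer (pimpl, or d-pointer) pattern. A toy Python rendering of
the facade-plus-cached-attributes idea follows; it is purely illustrative
and is not the proposed C layout or any numpy API:

    class _UFuncImpl(object):
        # Private implementation: free to grow and reorder fields,
        # because only this module ever sees its layout.
        def __init__(self, nin, nout):
            self.nin = nin
            self.nout = nout
            self.loops = {}  # e.g. dtype -> inner loop

    class UFuncFacade(object):
        # Public face: a small fixed "struct" holding one opaque
        # reference, with hot attributes cached for cheap access
        # (compare NewNpyArrayIterObject caching NpyIter fields).
        def __init__(self, nin, nout):
            self._impl = _UFuncImpl(nin, nout)
            self.nin = self._impl.nin
            self.nout = self._impl.nout
            self.nargs = self.nin + self.nout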
From jfoxrabinovitz at gmail.com  Thu Mar 31 16:14:26 2016
From: jfoxrabinovitz at gmail.com (Joseph Fox-Rabinovitz)
Date: Thu, 31 Mar 2016 16:14:26 -0400
Subject: [Numpy-discussion] Starting work on ufunc rewrite
In-Reply-To:
References:
Message-ID:

There is certainly good precedent for the approach you suggest. Shortly
after Nathaniel mentioned the rewrite to me, I looked up d-pointers as a
possible technique: https://wiki.qt.io/D-Pointer.

If we allow arbitrary kwargs for the new functions, is that something you
would want to note in the public structure? I was thinking something
along the lines of adding a hook to process additional kwargs and return
a void * that would then be passed to the loop.

To do this incrementally, perhaps opening a special development branch on
the main repository is in order? I would love to join in the fun as time
permits. Unfortunately, it is not especially permissive right about now.
I will at least throw in some ideas that I have been mulling over.

-Joe

On Thu, Mar 31, 2016 at 4:00 PM, Jaime Fernández del Río wrote:
> I have started discussing with Nathaniel the implementation of the
> ufunc ABI break that he proposed in a draft NEP a few months ago:
> [...]

From faltet at gmail.com  Thu Mar 31 17:25:13 2016
From: faltet at gmail.com (Francesc Alted)
Date: Thu, 31 Mar 2016 23:25:13 +0200
Subject: [Numpy-discussion] ANN: bcolz 1.0.0 RC2 is out!
Message-ID:

==========================
Announcing bcolz 1.0.0 RC2
==========================

What's new
==========

Yeah, 1.0.0 is finally here. We are not introducing any exciting new
features (just some optimizations and bug fixes), but bcolz is already 6
years old and it implements most of the capabilities that it was designed
for, so I decided to release a 1.0.0, meaning that the format is declared
stable and that people can be assured that future bcolz releases will be
able to read bcolz 1.0 data files (and probably much earlier ones too)
for a long while. Such a format is fully described at:

https://github.com/Blosc/bcolz/blob/master/DISK_FORMAT_v1.rst

Also, a 1.0.0 release means that the bcolz 1.x series will be based on
the C-Blosc 1.x series (https://github.com/Blosc/c-blosc). After C-Blosc
2.x (https://github.com/Blosc/c-blosc2) is out, a new bcolz 2.x is
expected, taking advantage of shiny new features of C-Blosc2 (more
compressors, more filters, native variable length support and the concept
of super-chunks), which should be very beneficial for the next bcolz
generation.

Important: this is a Release Candidate, so please test it as much as you
can. If no issues appear in a week or so, I will proceed to tag and
release 1.0.0 final.  Enjoy!

For a more detailed change log, see:

https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst

What it is
==========

*bcolz* provides columnar and compressed data containers that can live
either on-disk or in-memory. Column storage allows for efficiently
querying tables with a large number of columns. It also allows for cheap
addition and removal of columns. In addition, bcolz objects are
compressed by default for reducing memory/disk I/O needs. The compression
process is carried out internally by Blosc, an extremely fast
meta-compressor that is optimized for binary data. Lastly,
high-performance iterators (like ``iter()``, ``where()``) for querying
the objects are provided.

bcolz can use numexpr internally so as to accelerate many vector and
query operations (although it can use pure NumPy for doing so too).
numexpr optimizes the memory usage and uses several cores for doing the
computations, so it is blazing fast. Moreover, the carray/ctable
containers can be disk-based, so it is possible to use them for
seamlessly performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test
suite and fully supports both 32-bit and 64-bit platforms. Also, it is
typically tested on both UNIX and Windows operating systems.

Together, bcolz and the Blosc compressor are finally fulfilling the
promise of accelerating memory I/O, at least for some real scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/), the
Blaze project (http://blaze.pydata.org/), Quantopian
(https://www.quantopian.com/) and scikit-allel
(https://github.com/cggh/scikit-allel), which you can read more about by
pointing your browser at the links below.

* Visualfabriq:
  * *bquery*, a query and aggregation framework for bcolz:
  * https://github.com/visualfabriq/bquery

* Blaze:
  * Notebooks showing Blaze + Pandas + bcolz interaction:
  * http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-csv.ipynb
  * http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-bcolz.ipynb

* Quantopian:
  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html

* scikit-allel:
  * Provides an alternative backend to work with compressed arrays:
  * https://scikit-allel.readthedocs.org/en/latest/model/bcolz.html

Installing
==========

bcolz is in the PyPI repository, so installing it is easy::

$ pip install -U bcolz

Resources
=========

Visit the main bcolz site repository at:
http://github.com/Blosc/bcolz

Manual:
http://bcolz.blosc.org

Home of Blosc compressor:
http://blosc.org

User's mail list:
bcolz at googlegroups.com
http://groups.google.com/group/bcolz

License is the new BSD:
https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository:
https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst

----

**Enjoy data!**

--
Francesc Alted
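A minimal taste of the containers the announcement describes, assuming
the 1.x API as documented at http://bcolz.blosc.org (illustrative sketch,
not from the announcement itself):

    import numpy as np
    import bcolz

    a = np.arange(1e7)

    # In-memory compressed container; pass rootdir="..." to persist on disk.
    ca = bcolz.carray(a)
    print(ca.nbytes, ca.cbytes)   # raw size vs compressed size

    # Columnar table, queryable with numexpr-style expressions.
    ct = bcolz.ctable(columns=[a, a * 2], names=["x", "y"])
    print(sum(1 for row in ct.where("y > 19999990")))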
From izaid at continuum.io  Thu Mar 31 18:09:40 2016
From: izaid at continuum.io (Irwin Zaid)
Date: Thu, 31 Mar 2016 17:09:40 -0500
Subject: [Numpy-discussion] Starting work on ufunc rewrite
Message-ID:

Hey guys,

I figured I'd just chime in here. Over in DyND-town, we've spent a lot of
time figuring out how to structure DyND callables, which are actually
more general than NumPy gufuncs. We've just recently got them to a place
where we are very happy, and are able to represent a wide range of
computations.

Our callables use a two-fold approach to evaluation. The first pass is a
resolution pass, where a callable can specialize what it is doing based
on the input types. It is able to deduce the return type, multidispatch,
or even perform some sort of recursive analysis in the form of
computations that call themselves. The second pass is construction of a
kernel object that is exactly specialized to the metadata (e.g., strides,
contiguity, ...) of the array.

The callable itself can store arbitrary data, as can each pass of the
evaluation. Either (or both) of these passes can be done ahead of time,
allowing one to have a callable exactly specialized for your array.

If NumPy is looking to change its ufunc design, we'd be happy to share
our experiences with this.

Irwin

On Thu, Mar 31, 2016 at 4:00 PM, Jaime Fernández del Río wrote:
> I have started discussing with Nathaniel the implementation of the
> ufunc ABI break that he proposed in a draft NEP a few months ago:
> [...]
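A rough Python sketch of the two-pass structure Irwin describes; the
names here are made up for illustration and are not DyND's actual API:

    import numpy as np

    class TwoPassCallable(object):
        # Pass 1: specialize on the input *types* only; deduce the
        # return type (this is also where multidispatch would happen).
        def resolve(self, in_dtypes):
            return np.result_type(*in_dtypes)

        # Pass 2: bind a kernel to the concrete array *metadata*
        # (strides, contiguity, ...); can be done ahead of time.
        def instantiate(self, arrays):
            if all(a.flags.c_contiguous for a in arrays):
                return lambda a, b: a + b        # contiguous fast path
            return lambda a, b: np.add(a, b)     # generic strided path

    c = TwoPassCallable()
    a, b = np.arange(6.0), np.ones(6)
    kernel = c.instantiate([a, b])
    print(c.resolve([a.dtype, b.dtype]), kernel(a, b))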
From cgodshall at enthought.com  Tue Mar 29 11:13:52 2016
From: cgodshall at enthought.com (Courtenay Godshall (Enthought))
Date: Tue, 29 Mar 2016 15:13:52 -0000
Subject: [Numpy-discussion] ANN: LAST CALL for SciPy (Scientific Python) 2016 Conference Talk / Poster Proposals - Due Friday 4/1
Message-ID: <008801d189cd$9108d070$b31a7150$@enthought.com>

**ANN: LAST CALL for SciPy 2016 Conference Talk / Poster Proposals
(Scientific Computing with Python) - Final Deadline - Friday April 1st**

SciPy 2016, the 15th annual Scientific Computing with Python conference,
will be held July 11-17, 2016 in Austin, Texas. SciPy is a community
dedicated to the advancement of scientific computing through open source
Python software for mathematics, science, and engineering. The annual
SciPy Conference brings together over 650 participants from industry,
academia, and government to showcase their latest projects, learn from
skilled users and developers, and collaborate on code development. The
full program will consist of 2 days of tutorials (July 11-12), 3 days of
talks (July 13-15), and 2 days of developer sprints (July 16-17). More
info is available on the conference website at http://scipy2016.scipy.org
(where you can sign up for the mailing list); or follow @scipyconf on
Twitter.

-------------------------------------------------------------------------
**SUBMIT A SCIPY 2016 TALK / POSTER PROPOSAL - DUE APRIL 1, 2016**
-------------------------------------------------------------------------

Submissions for talks and posters are welcome on our website
(http://scipy2016.scipy.org). In your abstract, please provide details on
what Python tools are being employed, and how. The talk and poster
submission deadline is April 1st, 2016.

SciPy 2016 will include 3 major topic tracks and 8 mini-symposia tracks.

Major topic tracks include:
- Scientific Computing in Python
- Python in Data Science (Big data and not so big data)
- High Performance Computing

Mini-symposia will include the applications of Python in:
- Earth and Space Science
- Engineering
- Medicine and Biology
- Social Sciences
- Special Purpose Databases
- Case Studies in Industry
- Education
- Reproducibility

If you have any questions or comments, feel free to contact us at:
scipy-organizers at scipy.org

-----------------------------------------------------------
**SCIPY 2016 REGISTRATION IS OPEN**
-----------------------------------------------------------

Please register early. SciPy early bird registration is open until May
22, 2016. Register at http://scipy2016.scipy.org. Plus, enter our t-shirt
design contest to win a free registration. (Send a vector art file to
scipy at enthought.com by March 31 to enter.)

Important dates:
April 1: Talk and Poster Proposals Due
Apr 22: Tutorials Announced
Apr 22: Financial Aid Submissions Due
May 4: Talks and Posters Announced
May 11: Plotting Contest Submissions Due
May 22: Early Bird Registration Deadline
Jul 11-12: SciPy 2016 Tutorials
Jul 13-15: SciPy 2016 General Conference
Jul 16-17: SciPy 2016 Sprints

We look forward to an exciting conference and hope to see you in Austin
in July!

The SciPy 2016 Committee
http://scipy2016.scipy.org/