From gregor.thalhammer at gmail.com Wed Jun 1 03:44:50 2016 From: gregor.thalhammer at gmail.com (Gregor Thalhammer) Date: Wed, 1 Jun 2016 09:44:50 +0200 Subject: [Numpy-discussion] Changing FFT cache to a bounded LRU cache In-Reply-To: <1393163039486422808.824659sturla.molden-gmail.com@news.gmane.org> References: <4f5279db-43d3-4ae2-7a8b-67d1ec7bd802@gmail.com> <1464459567.2690.12.camel@sipsolutions.net> <1393163039486422808.824659sturla.molden-gmail.com@news.gmane.org> Message-ID: <22276D10-788A-4A0C-A0EF-8E6F903AA717@gmail.com> > Am 31.05.2016 um 23:36 schrieb Sturla Molden : > > Joseph Martinot-Lagarde wrote: > >> The problem with FFTW is that its license is more restrictive (GPL), and >> because of this may not be suitable everywhere numpy.fft is. > > A lot of us use NumPy linked with MKL or Accelerate, both of which have > some really nifty FFTs. And the license issue is hardly any worse than > linking with them for BLAS and LAPACK, which we do anyway. We could extend > numpy.fft to use MKL or Accelerate when they are available. It seems the anaconda numpy binaries do already use MKL for fft: In [2]: np.fft.using_mklfft Out[2]: True Is this based on a proprietary patch of numpy? Gregor > > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion From lion.krischer at gmail.com Wed Jun 1 08:15:45 2016 From: lion.krischer at gmail.com (Lion Krischer) Date: Wed, 1 Jun 2016 14:15:45 +0200 Subject: [Numpy-discussion] Changing FFT cache to a bounded LRU cache In-Reply-To: <22276D10-788A-4A0C-A0EF-8E6F903AA717@gmail.com> References: <4f5279db-43d3-4ae2-7a8b-67d1ec7bd802@gmail.com> <1464459567.2690.12.camel@sipsolutions.net> <1393163039486422808.824659sturla.molden-gmail.com@news.gmane.org> <22276D10-788A-4A0C-A0EF-8E6F903AA717@gmail.com> Message-ID: <0f266962-9beb-86b0-f444-95df0561d2a6@gmail.com> Seems so. numpy/fft/__init__.py when installed with conda contains a thin optional wrapper around mklfft, e.g. this here: https://docs.continuum.io/accelerate/mkl_fft It is part of the accelerate package from continuum and thus not free. Cheers! Lion On 01/06/16 09:44, Gregor Thalhammer wrote: > >> Am 31.05.2016 um 23:36 schrieb Sturla Molden : >> >> Joseph Martinot-Lagarde wrote: >> >>> The problem with FFTW is that its license is more restrictive (GPL), and >>> because of this may not be suitable everywhere numpy.fft is. >> >> A lot of us use NumPy linked with MKL or Accelerate, both of which have >> some really nifty FFTs. And the license issue is hardly any worse than >> linking with them for BLAS and LAPACK, which we do anyway. We could extend >> numpy.fft to use MKL or Accelerate when they are available. > > It seems the anaconda numpy binaries do already use MKL for fft: > > In [2]: np.fft.using_mklfft > Out[2]: True > > Is this based on a proprietary patch of numpy? 
> 
> Gregor
> 
>> 
>> Sturla
>> 
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> 

From faltet at gmail.com  Wed Jun  1 08:30:16 2016
From: faltet at gmail.com (Francesc Alted)
Date: Wed, 1 Jun 2016 14:30:16 +0200
Subject: [Numpy-discussion] ANN: numexpr 2.6.0 released
Message-ID: 

=========================
 Announcing Numexpr 2.6.0
=========================

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated and use
less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's MKL
(Math Kernel Library), which allows an extremely fast evaluation of
transcendental functions (sin, cos, tan, exp, log...) while squeezing the
last drop of performance out of your multi-core processors.  Look here for
some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that don't
want to adopt other solutions requiring more heavy dependencies.

What's new
==========

This is a minor version bump because it introduces a new function.  Also
some minor fine tuning for recent CPUs has been done.  More specifically:

- Introduced a new re_evaluate() function for re-evaluating the previously
executed array expression without any check.  This is meant for accelerating
loops that are re-evaluating the same expression repeatedly without
changing anything other than the operands.  If unsure, use evaluate(),
which is safer.

- The BLOCK_SIZE1 and BLOCK_SIZE2 constants have been re-checked in order
to find a value maximizing most of the benchmarks in the bench/ directory.
The new values (8192 and 16 respectively) give somewhat better results
(~5%) overall.  The CPU used for fine tuning is a relatively new Haswell
processor (E3-1240 v3).

If you want to know in more detail what has changed in this version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

Where can I find Numexpr?
=========================

The project is hosted on GitHub at:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy data!

-- 
Francesc Alted
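A minimal sketch of the evaluate()/re_evaluate() pattern from the release
notes above; the array names and sizes are arbitrary, and numexpr 2.6.0 or
later is assumed:

```python
import numpy as np
import numexpr as ne

a = np.random.rand(1000000)
b = np.random.rand(1000000)

r = ne.evaluate("3*a + 4*b")   # first call: parse, validate, and run
for _ in range(10):
    a += 1                     # only the operands change, in place...
    r = ne.re_evaluate()       # ...so re-run the cached expression,
                               # skipping the validation checks
```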
From cournape at gmail.com  Wed Jun  1 19:47:07 2016
From: cournape at gmail.com (David Cournapeau)
Date: Thu, 2 Jun 2016 00:47:07 +0100
Subject: [Numpy-discussion] Changing FFT cache to a bounded LRU cache
In-Reply-To: <1393163039486422808.824659sturla.molden-gmail.com@news.gmane.org>
References: <4f5279db-43d3-4ae2-7a8b-67d1ec7bd802@gmail.com>
	<1464459567.2690.12.camel@sipsolutions.net>
	<1393163039486422808.824659sturla.molden-gmail.com@news.gmane.org>
Message-ID: 

On Tue, May 31, 2016 at 10:36 PM, Sturla Molden wrote:

> Joseph Martinot-Lagarde wrote:
>
> > The problem with FFTW is that its license is more restrictive (GPL), and
> > because of this may not be suitable everywhere numpy.fft is.
>
> A lot of us use NumPy linked with MKL or Accelerate, both of which have
> some really nifty FFTs. And the license issue is hardly any worse than
> linking with them for BLAS and LAPACK, which we do anyway. We could extend
> numpy.fft to use MKL or Accelerate when they are available.

That's what we used to do in scipy, but it was a PITA to maintain. Contrary
to blas/lapack, fft does not have a standard API, hence exposing a
consistent API in Python, including data layout, involved quite a bit of
work.

It is better to expose those through 3rd party APIs.

David

> Sturla
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion

From njs at pobox.com  Wed Jun  1 22:42:22 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Wed, 1 Jun 2016 19:42:22 -0700
Subject: [Numpy-discussion] Changing FFT cache to a bounded LRU cache
In-Reply-To: 
References: <4f5279db-43d3-4ae2-7a8b-67d1ec7bd802@gmail.com>
	<1464459567.2690.12.camel@sipsolutions.net>
	<1393163039486422808.824659sturla.molden-gmail.com@news.gmane.org>
Message-ID: 

On Jun 1, 2016 4:47 PM, "David Cournapeau" wrote:
>
> On Tue, May 31, 2016 at 10:36 PM, Sturla Molden wrote:
>>
>> Joseph Martinot-Lagarde wrote:
>>
>> > The problem with FFTW is that its license is more restrictive (GPL), and
>> > because of this may not be suitable everywhere numpy.fft is.
>>
>> A lot of us use NumPy linked with MKL or Accelerate, both of which have
>> some really nifty FFTs. And the license issue is hardly any worse than
>> linking with them for BLAS and LAPACK, which we do anyway. We could extend
>> numpy.fft to use MKL or Accelerate when they are available.
>
> That's what we used to do in scipy, but it was a PITA to maintain. Contrary
> to blas/lapack, fft does not have a standard API, hence exposing a
> consistent API in Python, including data layout, involved quite a bit of
> work.
>
> It is better to expose those through 3rd party APIs.

Fwiw Intel's new python distribution thing has numpy patched to use mkl
for fft, and they're interested in pushing the relevant changes upstream.

I have no idea how maintainable their patches are, since I haven't seen
them -- this is just from talking to people here at pycon.

-n

From travis at continuum.io  Thu Jun  2 00:52:08 2016
From: travis at continuum.io (Travis Oliphant)
Date: Wed, 1 Jun 2016 21:52:08 -0700
Subject: [Numpy-discussion] Changing FFT cache to a bounded LRU cache
In-Reply-To: 
References: <4f5279db-43d3-4ae2-7a8b-67d1ec7bd802@gmail.com>
	<1464459567.2690.12.camel@sipsolutions.net>
	<1393163039486422808.824659sturla.molden-gmail.com@news.gmane.org>
Message-ID: 

Hi all,

At Continuum we are trying to coordinate with Intel about releasing our
patches from Accelerate upstream as well, rather than having them redo
things we have already done but have just not been able to open source yet.

Accelerate also uses GPU accelerated FFTs and it would be nice if there
were a supported NumPy-way of plugging in these optimized approaches.
This is not a trivial thing to do, though, and there are a lot of design
choices.

We have been giving away Accelerate to academics since it was released but
have asked companies to pay for it as a means of generating money to
support open source.
Several things that used to be in Accelerate only are now already in
open-source (e.g. cuda.jit, guvectorize, target='cuda' and
target='parallel' in numba.vectorize).  I expect this trend will continue.
The FFT enhancements are another thing on the list of things to make open
source.

I, for one, welcome Intel's contributions and am enthusiastic about their
joining the Python development community.  In many cases it would be better
if they would just pay a company that already has built and tested this
capability to release it than develop things themselves yet again.  Any
encouragement we can give Intel to move in this direction would help.

Many companies are now supporting open-source.  Even those that sell some
software are still contributing overall to ensure that the total amount of
useful open-source software available is increasing.

Best,

-Travis


On Wed, Jun 1, 2016 at 7:42 PM, Nathaniel Smith wrote:

> On Jun 1, 2016 4:47 PM, "David Cournapeau" wrote:
> >
> > On Tue, May 31, 2016 at 10:36 PM, Sturla Molden
> wrote:
> >>
> >> Joseph Martinot-Lagarde wrote:
> >>
> >> > The problem with FFTW is that its license is more restrictive (GPL),
> and
> >> > because of this may not be suitable everywhere numpy.fft is.
> >>
> >> A lot of us use NumPy linked with MKL or Accelerate, both of which have
> >> some really nifty FFTs. And the license issue is hardly any worse than
> >> linking with them for BLAS and LAPACK, which we do anyway. We could
> extend
> >> numpy.fft to use MKL or Accelerate when they are available.
> >
> > That's what we used to do in scipy, but it was a PITA to maintain.
> Contrary to blas/lapack, fft does not have a standard API, hence exposing a
> consistent API in Python, including data layout, involved quite a bit of
> work.
> >
> > It is better to expose those through 3rd party APIs.
>
> Fwiw Intel's new python distribution thing has numpy patched to use mkl
> for fft, and they're interested in pushing the relevant changes upstream.
>
> I have no idea how maintainable their patches are, since I haven't seen
> them -- this is just from talking to people here at pycon.
>
> -n
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>

-- 
*Travis Oliphant, PhD*
*Co-founder and CEO*

@teoliphant
512-222-5440
http://www.continuum.io

From mhearne at usgs.gov  Thu Jun  2 17:30:12 2016
From: mhearne at usgs.gov (Hearne, Mike)
Date: Thu, 2 Jun 2016 14:30:12 -0700
Subject: [Numpy-discussion] SciPy 2016
Message-ID: 

I am one of the co-chairs of the Birds of a Feather (BOF) committee at
SciPy 2016, taking place this year from July 11-17.  We are actively
seeking moderators to propose BOF sessions on any number of topics.
If someone on this list (a numpy dev, perhaps?) is interested in
leading a numpy-focused session at SciPy this year, the submission
form can be found here:

http://scipy2016.scipy.org/ehome/146062/332970/

The BOF can take the form of a panel or open discussion.  We have one
submission already for a BOF focused on future plans for matplotlib
development, so a parallel session on numpy future development is sure
to be of general interest.

Thanks, and we'll see some of you in Austin next month!
--Mike Hearne From charlesr.harris at gmail.com Sat Jun 4 13:05:08 2016 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 4 Jun 2016 11:05:08 -0600 Subject: [Numpy-discussion] Integers to integer powers In-Reply-To: <201605242033.u4OKXRLf029822@blue-cove.com> References: <20160524133034.642b60b1@fsol> <7c0663b6-9d2a-2a0f-e0bf-71e19e30c5b4@gmail.com> <8f0576d6-53fc-0fbd-cd2e-3daa72076731@gmail.com> <020de0da-4ce2-0b0b-f2eb-bc77bbf589ab@gmail.com> <201605242033.u4OKXRLf029822@blue-cove.com> Message-ID: On Tue, May 24, 2016 at 2:33 PM, R Schumacher wrote: > At 01:15 PM 5/24/2016, you wrote: > > On 5/24/2016 3:57 PM, Eric Moore wrote: > > Changing np.arange(10)**3 to have a non-integer dtype seems like a big > change. > > > > What about np.arange(100)**5? > > > Interesting, one warning per instantiation (Py2.7): > > >>> import numpy > >>> a=numpy.arange(100)**5 > :1: RuntimeWarning: invalid value encountered in power > >>> a=numpy.arange(100)**5. > >>> b=numpy.arange(100.)**5 > >>> a==b > array([ True, True, True, True, True, True, True, True, True, > True, True, True, True, True, True, True, True, True, > True, True, True, True, True, True, True, True, True, > True, True, True, True, True, True, True, True, True, > True, True, True, True, True, True, True, True, True, > True, True, True, True, True, True, True, True, True, > True, True, True, True, True, True, True, True, True, > True, True, True, True, True, True, True, True, True, > True, True, True, True, True, True, True, True, True, > True, True, True, True, True, True, True, True, True, > True, True, True, True, True, True, True, True, True, > True], dtype=bool) > >>> numpy.arange(100)**5 > array([ 0, 1, 32, 243, 1024, > 3125, 7776, 16807, 32768, 59049, > 100000, 161051, 248832, 371293, 537824, > 759375, 1048576, 1419857, 1889568, 2476099, > 3200000, 4084101, 5153632, 6436343, 7962624, > 9765625, 11881376, 14348907, 17210368, 20511149, > 24300000, 28629151, 33554432, 39135393, 45435424, > 52521875, 60466176, 69343957, 79235168, 90224199, > 102400000, 115856201, 130691232, 147008443, 164916224, > 184528125, 205962976, 229345007, 254803968, 282475249, > 312500000, 345025251, 380204032, 418195493, 459165024, > 503284375, 550731776, 601692057, 656356768, 714924299, > 777600000, 844596301, 916132832, 992436543, 1073741824, > 1160290625, 1252332576, 1350125107, 1453933568, 1564031349, > 1680700000, 1804229351, 1934917632, 2073071593, -2147483648, > -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, > -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, > -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, > -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, > -2147483648, -2147483648, -2147483648, -2147483648, -2147483648]) > >>> > >>> numpy.arange(100, dtype=numpy.int64)**5 > array([ 0, 1, 32, 243, 1024, > 3125, 7776, 16807, 32768, 59049, > 100000, 161051, 248832, 371293, 537824, > 759375, 1048576, 1419857, 1889568, 2476099, > 3200000, 4084101, 5153632, 6436343, 7962624, > 9765625, 11881376, 14348907, 17210368, 20511149, > 24300000, 28629151, 33554432, 39135393, 45435424, > 52521875, 60466176, 69343957, 79235168, 90224199, > 102400000, 115856201, 130691232, 147008443, 164916224, > 184528125, 205962976, 229345007, 254803968, 282475249, > 312500000, 345025251, 380204032, 418195493, 459165024, > 503284375, 550731776, 601692057, 656356768, 714924299, > 777600000, 844596301, 916132832, 992436543, 1073741824, > 1160290625, 1252332576, 1350125107, 1453933568, 1564031349, > 
1680700000, 1804229351, 1934917632, 2073071593, 2219006624, > 2373046875, 2535525376, 2706784157, 2887174368, 3077056399, > 3276800000, 3486784401, 3707398432, 3939040643, 4182119424, > 4437053125, 4704270176, 4984209207, 5277319168, 5584059449, > 5904900000, 6240321451, 6590815232, 6956883693, 7339040224, > 7737809375, 8153726976, 8587340257, 9039207968, 9509900499], > dtype=int64) > That is the Python default. To always see warnings do `warnings.simplefilter('always')` before running. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Jun 4 13:22:52 2016 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 4 Jun 2016 11:22:52 -0600 Subject: [Numpy-discussion] Integers to integer powers, let's make a decision Message-ID: Hi All, I've made a new post so that we can make an explicit decision. AFAICT, the two proposals are 1. Integers to negative integer powers raise an error. 2. Integers to integer powers always results in floats. My own sense is that 1. would be closest to current behavior and using a float exponential when a float is wanted is an explicit way to indicate that desire. OTOH, 2. would be the most convenient default for everyday numerical computation, but I think would more likely break current code. I am going to come down on the side of 1., which I don't think should cause too many problems if we start with a {Future, Deprecation}Warning explaining the workaround. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Jun 4 13:45:24 2016 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 4 Jun 2016 10:45:24 -0700 Subject: [Numpy-discussion] Integers to integer powers, let's make a decision In-Reply-To: References: Message-ID: +1 On Sat, Jun 4, 2016 at 10:22 AM, Charles R Harris wrote: > Hi All, > > I've made a new post so that we can make an explicit decision. AFAICT, the > two proposals are > > Integers to negative integer powers raise an error. > Integers to integer powers always results in floats. > > My own sense is that 1. would be closest to current behavior and using a > float exponential when a float is wanted is an explicit way to indicate that > desire. OTOH, 2. would be the most convenient default for everyday numerical > computation, but I think would more likely break current code. I am going to > come down on the side of 1., which I don't think should cause too many > problems if we start with a {Future, Deprecation}Warning explaining the > workaround. > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Nathaniel J. Smith -- https://vorpus.org From matthew.brett at gmail.com Sat Jun 4 13:46:50 2016 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 4 Jun 2016 10:46:50 -0700 Subject: [Numpy-discussion] Integers to integer powers, let's make a decision In-Reply-To: References: Message-ID: On Sat, Jun 4, 2016 at 10:45 AM, Nathaniel Smith wrote: > +1 > > On Sat, Jun 4, 2016 at 10:22 AM, Charles R Harris > wrote: >> Hi All, >> >> I've made a new post so that we can make an explicit decision. AFAICT, the >> two proposals are >> >> Integers to negative integer powers raise an error. >> Integers to integer powers always results in floats. >> >> My own sense is that 1. 
would be closest to current behavior and using a
>> float exponential when a float is wanted is an explicit way to indicate that
>> desire. OTOH, 2. would be the most convenient default for everyday numerical
>> computation, but I think would more likely break current code. I am going to
>> come down on the side of 1., which I don't think should cause too many
>> problems if we start with a {Future, Deprecation}Warning explaining the
>> workaround.

I agree - error for negative integer powers seems like the safest option.

Cheers,

Matthew

From charlesr.harris at gmail.com  Sat Jun  4 15:43:42 2016
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 4 Jun 2016 13:43:42 -0600
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
In-Reply-To: 
References: 
Message-ID: 

On Sat, Jun 4, 2016 at 11:22 AM, Charles R Harris wrote:

> Hi All,
>
> I've made a new post so that we can make an explicit decision. AFAICT, the
> two proposals are
>
>
>    1. Integers to negative integer powers raise an error.
>    2. Integers to integer powers always results in floats.
>
> My own sense is that 1. would be closest to current behavior and using a
> float exponential when a float is wanted is an explicit way to indicate
> that desire. OTOH, 2. would be the most convenient default for everyday
> numerical computation, but I think would more likely break current code. I
> am going to come down on the side of 1., which I don't think should cause
> too many problems if we start with a {Future, Deprecation}Warning
> explaining the workaround.
>

Note that current behavior in 1.11 is such a mess
```
In [5]: array([0], dtype=int64) ** -1
Out[5]: array([-9223372036854775808])

In [6]: array([0], dtype=uint64) ** -1
Out[6]: array([ inf])
```
that the simplest approach might be to start by raising an error rather
than by trying to maintain current behavior and issuing a warning.

Chuck

From josef.pktd at gmail.com  Sat Jun  4 15:47:47 2016
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sat, 4 Jun 2016 15:47:47 -0400
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
In-Reply-To: 
References: 
Message-ID: 

On Sat, Jun 4, 2016 at 3:43 PM, Charles R Harris wrote:

>
> On Sat, Jun 4, 2016 at 11:22 AM, Charles R Harris <
> charlesr.harris at gmail.com> wrote:
>
>> Hi All,
>>
>> I've made a new post so that we can make an explicit decision. AFAICT,
>> the two proposals are
>>
>>
>>    1. Integers to negative integer powers raise an error.
>>    2. Integers to integer powers always results in floats.
>>
>> My own sense is that 1. would be closest to current behavior and using a
>> float exponential when a float is wanted is an explicit way to indicate
>> that desire. OTOH, 2. would be the most convenient default for everyday
>> numerical computation, but I think would more likely break current code. I
>> am going to come down on the side of 1., which I don't think should cause
>> too many problems if we start with a {Future, Deprecation}Warning
>> explaining the workaround.
>>
>

I'm in favor of 2., always float for `**`.
I don't see enough pure integer use cases to throw away a nice operator.
Josef

>
> Note that current behavior in 1.11 is such a mess
> ```
> In [5]: array([0], dtype=int64) ** -1
> Out[5]: array([-9223372036854775808])
>
> In [6]: array([0], dtype=uint64) ** -1
> Out[6]: array([ inf])
> ```
> that the simplest approach might be to start by raising an error rather
> than by trying to maintain current behavior and issuing a warning.
>
> Chuck
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>

From matthew.brett at gmail.com  Sat Jun  4 15:49:22 2016
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sat, 4 Jun 2016 12:49:22 -0700
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
In-Reply-To: 
References: 
Message-ID: 

On Sat, Jun 4, 2016 at 12:47 PM, wrote:
>
> On Sat, Jun 4, 2016 at 3:43 PM, Charles R Harris
> wrote:
>>
>> On Sat, Jun 4, 2016 at 11:22 AM, Charles R Harris
>> wrote:
>>>
>>> Hi All,
>>>
>>> I've made a new post so that we can make an explicit decision. AFAICT,
>>> the two proposals are
>>>
>>> Integers to negative integer powers raise an error.
>>> Integers to integer powers always results in floats.
>>>
>>> My own sense is that 1. would be closest to current behavior and using a
>>> float exponential when a float is wanted is an explicit way to indicate that
>>> desire. OTOH, 2. would be the most convenient default for everyday numerical
>>> computation, but I think would more likely break current code. I am going to
>>> come down on the side of 1., which I don't think should cause too many
>>> problems if we start with a {Future, Deprecation}Warning explaining the
>>> workaround.
>
> I'm in favor of 2., always float for `**`.
> I don't see enough pure integer use cases to throw away a nice operator.

I can't make sense of 'throw away a nice operator' - you still have
arr ** 2.0 if you want floats.

Matthew
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion

From josef.pktd at gmail.com  Sat Jun  4 16:16:21 2016
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sat, 4 Jun 2016 16:16:21 -0400
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
In-Reply-To: 
References: 
Message-ID: 

On Sat, Jun 4, 2016 at 3:49 PM, Matthew Brett wrote:

> On Sat, Jun 4, 2016 at 12:47 PM, wrote:
> >
> > On Sat, Jun 4, 2016 at 3:43 PM, Charles R Harris <
> charlesr.harris at gmail.com>
> > wrote:
> >>
> >> On Sat, Jun 4, 2016 at 11:22 AM, Charles R Harris
> >> wrote:
> >>>
> >>> Hi All,
> >>>
> >>> I've made a new post so that we can make an explicit decision. AFAICT,
> >>> the two proposals are
> >>>
> >>> Integers to negative integer powers raise an error.
> >>> Integers to integer powers always results in floats.
> >>>
> >>> My own sense is that 1. would be closest to current behavior and using
> a
> >>> float exponential when a float is wanted is an explicit way to
> indicate that
> >>> desire. OTOH, 2. would be the most convenient default for everyday
> numerical
> >>> computation, but I think would more likely break current code. I am
> going to
> >>> come down on the side of 1., which I don't think should cause too many
> >>> problems if we start with a {Future, Deprecation}Warning explaining the
> >>> workaround.
> >
> > I'm in favor of 2., always float for `**`.
> > I don't see enough pure integer use cases to throw away a nice operator.
>
> I can't make sense of 'throw away a nice operator' - you still have
> arr ** 2.0 if you want floats.
>

but if we have x**y, then we always need to check the dtype. If we don't,
we get RuntimeErrors or overflow, where we might have forgotten to include
the relevant cases in the unit tests.

numpy has got pickier with using only integers in some areas (index, ...).
Now we have to watch out that we convert back to floats for power.

Not a serious problem for a library with unit tests and enough users who
run into the dtype issues and report them. But I'm sure I will have to fix
any scripts or interactive work that I'm writing.

It's just another thing to watch out for, after we managed to get rid of
integer division 1/2=0.

Josef

>
> Matthew
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion

From p.e.creasey.00 at googlemail.com  Sat Jun  4 16:35:30 2016
From: p.e.creasey.00 at googlemail.com (Peter Creasey)
Date: Sat, 4 Jun 2016 13:35:30 -0700
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
Message-ID: 

>
> +1
>
> On Sat, Jun 4, 2016 at 10:22 AM, Charles R Harris
> wrote:
>> Hi All,
>>
>> I've made a new post so that we can make an explicit decision. AFAICT, the
>> two proposals are
>>
>> Integers to negative integer powers raise an error.
>> Integers to integer powers always results in floats.
>>
>> My own sense is that 1. would be closest to current behavior and using a
>> float exponential when a float is wanted is an explicit way to indicate that
>> desire. OTOH, 2. would be the most convenient default for everyday numerical
>> computation, but I think would more likely break current code. I am going to
>> come down on the side of 1., which I don't think should cause too many
>> problems if we start with a {Future, Deprecation}Warning explaining the
>> workaround.
>>
>> Chuck
>>

+1 (grudgingly)

My thoughts on this are:

(i) Intuitive APIs are better, and power(a,b) suggests to a lot of
(most?) readers that you are going to invoke a function like the C
pow(double x, double y) on every element. Doing positive integer powers
with the same function name suggests a correspondence that is in practice
not that helpful. With a time machine I'd suggest a separate function for
positive integer powers, however...

(ii) I think that ship has sailed, and particularly with e.g. a**3 the
numpy conventions are backed up by quite a bit of code, probably too much
to change without a lot of problems.

So I'd go with integer ^ negative integer is an error.

Peter

From matti.picus at gmail.com  Sat Jun  4 16:58:11 2016
From: matti.picus at gmail.com (Matti Picus)
Date: Sat, 4 Jun 2016 23:58:11 +0300
Subject: [Numpy-discussion] PyArray_Scalar should not use memcpy
Message-ID: <575340E3.1060904@gmail.com>

Hi. This is a heads up and RFC about a pull request I am preparing for
PyArray_Scalar, within the framework of getting NumPy working properly on
PyPy. For those who don't know, the numpy HEAD builds and runs on PyPy2.7
HEAD (otherwise known as nightly default). However there are a number of
test failures, some are caused by (ab)use of memcpy on c-level pointers
obtained from Py*_FromString().

I am currently rewriting PyArray_Scalar to not use memcpy, and wondering
how deep of a refactoring would be acceptable by the maintainers in a
single pull request?
Should I just stick to small changes to eliminate the two calls to memcpy,
or clean up and restructure the entire function around a more
switch(type_num) programming style?

Thanks,
Matti

From sole at esrf.fr  Sat Jun  4 17:07:28 2016
From: sole at esrf.fr (V. Armando Sole)
Date: Sat, 04 Jun 2016 23:07:28 +0200
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
In-Reply-To: 
References: 
Message-ID: <89daf976787556179fb11a9ea397183d@esrf.fr>

Also in favor of 2. Always return a float for '**'

On 04.06.2016 21:47, josef.pktd at gmail.com wrote:
> On Sat, Jun 4, 2016 at 3:43 PM, Charles R Harris
> wrote:
> 
>> On Sat, Jun 4, 2016 at 11:22 AM, Charles R Harris
>> wrote:
>> 
>>> Hi All,
>>> 
>>> I've made a new post so that we can make an explicit decision.
>>> AFAICT, the two proposals are
>>> 
>>> * Integers to negative integer powers raise an error.
>>> * Integers to integer powers always results in floats.
>>> 
>>> My own sense is that 1. would be closest to current behavior and
>>> using a float exponential when a float is wanted is an explicit
>>> way to indicate that desire. OTOH, 2. would be the most convenient
>>> default for everyday numerical computation, but I think would more
>>> likely break current code. I am going to come down on the side of
>>> 1., which I don't think should cause too many problems if we start
>>> with a {Future, Deprecation}Warning explaining the workaround.
> 
> I'm in favor of 2., always float for `**`.
> I don't see enough pure integer use cases to throw away a nice
> operator.

From njs at pobox.com  Sat Jun  4 18:10:28 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Sat, 4 Jun 2016 15:10:28 -0700
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
In-Reply-To: <89daf976787556179fb11a9ea397183d@esrf.fr>
References: <89daf976787556179fb11a9ea397183d@esrf.fr>
Message-ID: 

On Sat, Jun 4, 2016 at 2:07 PM, V. Armando Sole wrote:
> Also in favor of 2. Always return a float for '**'

Even if we did want to switch to this, it's such a major
backwards-incompatible change that I'm not sure how we could actually
make the transition without first making it an error for a while.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

From josef.pktd at gmail.com  Sat Jun  4 19:27:30 2016
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sat, 4 Jun 2016 19:27:30 -0400
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
In-Reply-To: 
References: <89daf976787556179fb11a9ea397183d@esrf.fr>
Message-ID: 

On Sat, Jun 4, 2016 at 6:10 PM, Nathaniel Smith wrote:

> On Sat, Jun 4, 2016 at 2:07 PM, V. Armando Sole wrote:
> > Also in favor of 2. Always return a float for '**'
>
> Even if we did want to switch to this, it's such a major
> backwards-incompatible change that I'm not sure how we could actually
> make the transition without first making it an error for a while.
>

AFAIU, only the dtype for int**int would change. So, what would be the
problem with FutureWarnings as with other dtype changes that were done in
recent releases?

Josef

> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
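For reference, the workaround that proposal 1 would point users to is just
an explicit float somewhere in the expression; a short illustration (the
array values here are arbitrary):

```python
import numpy as np

a = np.arange(1, 5)

print(a ** 2)                 # [ 1  4  9 16] -- integer in, integer out
print(a ** -1.0)              # [1.  0.5  0.333...  0.25] -- a float
                              # exponent opts in to float results
print(a.astype(float) ** -1)  # equivalent: cast first, keep the integer exponent
```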
From charlesr.harris at gmail.com  Sat Jun  4 20:07:22 2016
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 4 Jun 2016 18:07:22 -0600
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
In-Reply-To: 
References: <89daf976787556179fb11a9ea397183d@esrf.fr>
Message-ID: 

On Sat, Jun 4, 2016 at 5:27 PM, wrote:

>
> On Sat, Jun 4, 2016 at 6:10 PM, Nathaniel Smith wrote:
>
>> On Sat, Jun 4, 2016 at 2:07 PM, V. Armando Sole wrote:
>> > Also in favor of 2. Always return a float for '**'
>>
>> Even if we did want to switch to this, it's such a major
>> backwards-incompatible change that I'm not sure how we could actually
>> make the transition without first making it an error for a while.
>>
>
> AFAIU, only the dtype for int**int would change. So, what would be the
> problem with FutureWarnings as with other dtype changes that were done in
> recent releases?
>
>
The main problem I see with that is that numpy integers would behave
differently than Python integers, and the difference would be silent. With
option 1 it is possible to write code that behaves the same up to overflow
and the error message would supply a warning when the exponent should be
float. One could argue that numpy scalar integer types could be made to
behave like python integers, but then their behavior would differ from
numpy arrays and numpy scalar arrays.

Chuck

From josef.pktd at gmail.com  Sat Jun  4 20:17:59 2016
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sat, 4 Jun 2016 20:17:59 -0400
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
In-Reply-To: 
References: <89daf976787556179fb11a9ea397183d@esrf.fr>
Message-ID: 

On Sat, Jun 4, 2016 at 8:07 PM, Charles R Harris wrote:

>
> On Sat, Jun 4, 2016 at 5:27 PM, wrote:
>
>>
>> On Sat, Jun 4, 2016 at 6:10 PM, Nathaniel Smith wrote:
>>
>>> On Sat, Jun 4, 2016 at 2:07 PM, V. Armando Sole wrote:
>>> > Also in favor of 2. Always return a float for '**'
>>>
>>> Even if we did want to switch to this, it's such a major
>>> backwards-incompatible change that I'm not sure how we could actually
>>> make the transition without first making it an error for a while.
>>>
>>
>> AFAIU, only the dtype for int**int would change. So, what would be the
>> problem with FutureWarnings as with other dtype changes that were done in
>> recent releases?
>>
>>
> The main problem I see with that is that numpy integers would behave
> differently than Python integers, and the difference would be silent. With
> option 1 it is possible to write code that behaves the same up to overflow
> and the error message would supply a warning when the exponent should be
> float. One could argue that numpy scalar integer types could be made to
> behave like python integers, but then their behavior would differ from
> numpy arrays and numpy scalar arrays.
>

I'm not sure I understand.

Do you mean

np.arange(5)**2 would behave differently than np.arange(5)**np.int_(2)

or 2**2 would behave differently than np.int_(2)**np.int(2)

?


AFAICS, there are many cases where numpy scalars don't behave like python
scalars. Also, does different behavior mean different type/dtype or
different numbers. (The first I can live with, the second requires human
memory usage, which is a scarce resource.)
>>> 2**(-2) 0.25 Josef > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Jun 4 20:22:40 2016 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 4 Jun 2016 17:22:40 -0700 Subject: [Numpy-discussion] PyArray_Scalar should not use memcpy In-Reply-To: <575340E3.1060904@gmail.com> References: <575340E3.1060904@gmail.com> Message-ID: On Jun 4, 2016 13:58, "Matti Picus" wrote: > > Hi. This is a heads up and RFC about a pull request I am preparing for PyArray_Scalar, within the framework of getting NumPy working properly on PyPy. For those who don't know, the numpy HEAD builds and runs on PyPy2.7 HEAD (otherwise known as nightly default). However there are a number of test failures, some are caused by (ab)use of memcpy on c-level pointers obtained from Py*_FromString(). > > I am currently rewriting PyArray_Scalar to not use memcpy, and wondering how deep of a refactoring would be acceptable by the maintainers in a single pull request? Should I just stick to small changes to eliminate the two calls to memcpy, or clean up and restructure the entire function around a more switch(type_num) programming style? I don't think anyone is particularly attached to the current internal structure of the numpy scalars. Beyond that it's hard to say in the abstract... a small change will certainly be easier and quicker to merge than a big change, but if you have a good clean up then it'd certainly be welcome :-). You know better than us how easy it would be to split up the changes. Two things to watch out for in general are that numpy can be rather picky about abi compatibility and performance regressions. (The scalar code is definitely performance-sensitive.) -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Jun 4 21:16:39 2016 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 4 Jun 2016 19:16:39 -0600 Subject: [Numpy-discussion] Integers to integer powers, let's make a decision In-Reply-To: References: <89daf976787556179fb11a9ea397183d@esrf.fr> Message-ID: On Sat, Jun 4, 2016 at 6:17 PM, wrote: > > > On Sat, Jun 4, 2016 at 8:07 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Sat, Jun 4, 2016 at 5:27 PM, wrote: >> >>> >>> >>> On Sat, Jun 4, 2016 at 6:10 PM, Nathaniel Smith wrote: >>> >>>> On Sat, Jun 4, 2016 at 2:07 PM, V. Armando Sole wrote: >>>> > Also in favor of 2. Always return a float for '**' >>>> >>>> Even if we did want to switch to this, it's such a major >>>> backwards-incompatible change that I'm not sure how we could actually >>>> make the transition without first making it an error for a while. >>>> >>> >>> AFAIU, only the dtype for int**int would change. So, what would be the >>> problem with FutureWarnings as with other dtype changes that were done in >>> recent releases. >>> >>> >> The main problem I see with that is that numpy integers would behave >> differently than Python integers, and the difference would be silent. With >> option 1 it is possible to write code that behaves the same up to overflow >> and the error message would supply a warning when the exponent should be >> float. 
One could argue that numpy scalar integer types could be made to >> behave like python integers, but then their behavior would differ from >> numpy arrays and numpy scalar arrays. >> > > I'm not sure I understand. > > Do you mean > > np.arange(5)**2 would behave differently than np.arange(5)**np.int_(2) > > or 2**2 would behave differently than np.int_(2)**np.int(2) > The second case. Python returns ints for non-negative integer powers of ints. > > ? > > > AFAICS, there are many cases where numpy scalars don't behave like python > scalars. Also, does different behavior mean different type/dtype or > different numbers. (The first I can live with, the second requires human > memory usage, which is a scarce resource.) > > >>> 2**(-2) > 0.25 > > But we can't mix types in np.arrays and we can't depend on the element values of arrays in the exponent, but only on their type, so 2 ** array([1, -1]) must contain a single type and making that type float would surely break code. Scalar arrays, which are arrays, have the same problem. We can't do what Python does with ndarrays and numpy scalars, and it would be best to be consistent. Division was a simpler problem to deal with, as there were two operators, `//` and `/`. If there were two exponential operators life would be simpler. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat Jun 4 21:54:26 2016 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 4 Jun 2016 21:54:26 -0400 Subject: [Numpy-discussion] Integers to integer powers, let's make a decision In-Reply-To: References: <89daf976787556179fb11a9ea397183d@esrf.fr> Message-ID: On Sat, Jun 4, 2016 at 9:16 PM, Charles R Harris wrote: > > > On Sat, Jun 4, 2016 at 6:17 PM, wrote: > >> >> >> On Sat, Jun 4, 2016 at 8:07 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> >>> On Sat, Jun 4, 2016 at 5:27 PM, wrote: >>> >>>> >>>> >>>> On Sat, Jun 4, 2016 at 6:10 PM, Nathaniel Smith wrote: >>>> >>>>> On Sat, Jun 4, 2016 at 2:07 PM, V. Armando Sole wrote: >>>>> > Also in favor of 2. Always return a float for '**' >>>>> >>>>> Even if we did want to switch to this, it's such a major >>>>> backwards-incompatible change that I'm not sure how we could actually >>>>> make the transition without first making it an error for a while. >>>>> >>>> >>>> AFAIU, only the dtype for int**int would change. So, what would be the >>>> problem with FutureWarnings as with other dtype changes that were done in >>>> recent releases. >>>> >>>> >>> The main problem I see with that is that numpy integers would behave >>> differently than Python integers, and the difference would be silent. With >>> option 1 it is possible to write code that behaves the same up to overflow >>> and the error message would supply a warning when the exponent should be >>> float. One could argue that numpy scalar integer types could be made to >>> behave like python integers, but then their behavior would differ from >>> numpy arrays and numpy scalar arrays. >>> >> >> I'm not sure I understand. >> >> Do you mean >> >> np.arange(5)**2 would behave differently than np.arange(5)**np.int_(2) >> >> or 2**2 would behave differently than np.int_(2)**np.int(2) >> > > The second case. Python returns ints for non-negative integer powers of > ints. > > >> >> ? >> >> >> AFAICS, there are many cases where numpy scalars don't behave like python >> scalars. Also, does different behavior mean different type/dtype or >> different numbers. 
(The first I can live with, the second requires human >> memory usage, which is a scarce resource.) >> >> >>> 2**(-2) >> 0.25 >> >> > But we can't mix types in np.arrays and we can't depend on the element > values of arrays in the exponent, but only on their type, so 2 ** array([1, > -1]) must contain a single type and making that type float would surely > break code. Scalar arrays, which are arrays, have the same problem. We > can't do what Python does with ndarrays and numpy scalars, and it would be > best to be consistent. Division was a simpler problem to deal with, as > there were two operators, `//` and `/`. If there were two exponential > operators life would be simpler. > What bothers me with the entire argument is that you are putting higher priority on returning a dtype than on returning the correct numbers. Reverse the argument: Because we cannot make the return type value dependent we **have** to return float, in order to get the correct number. (It's an argument not what we really have to do.) Which code really breaks, code that gets a float instead of an int, and with some advance warning users that really need to watch their memory can use np.power. My argument before was that I think a simple operator like `**` should work for 90+% of the users and match their expectation, and the users that need to watch dtypes can as well use the function. (I can also live with the exception from case 1., but I really think this is like the python 2 integer division "surprise") Josef > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Jun 4 22:23:41 2016 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 4 Jun 2016 20:23:41 -0600 Subject: [Numpy-discussion] Integers to integer powers, let's make a decision In-Reply-To: References: <89daf976787556179fb11a9ea397183d@esrf.fr> Message-ID: On Sat, Jun 4, 2016 at 7:54 PM, wrote: > > > On Sat, Jun 4, 2016 at 9:16 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Sat, Jun 4, 2016 at 6:17 PM, wrote: >> >>> >>> >>> On Sat, Jun 4, 2016 at 8:07 PM, Charles R Harris < >>> charlesr.harris at gmail.com> wrote: >>> >>>> >>>> >>>> On Sat, Jun 4, 2016 at 5:27 PM, wrote: >>>> >>>>> >>>>> >>>>> On Sat, Jun 4, 2016 at 6:10 PM, Nathaniel Smith wrote: >>>>> >>>>>> On Sat, Jun 4, 2016 at 2:07 PM, V. Armando Sole wrote: >>>>>> > Also in favor of 2. Always return a float for '**' >>>>>> >>>>>> Even if we did want to switch to this, it's such a major >>>>>> backwards-incompatible change that I'm not sure how we could actually >>>>>> make the transition without first making it an error for a while. >>>>>> >>>>> >>>>> AFAIU, only the dtype for int**int would change. So, what would be the >>>>> problem with FutureWarnings as with other dtype changes that were done in >>>>> recent releases. >>>>> >>>>> >>>> The main problem I see with that is that numpy integers would behave >>>> differently than Python integers, and the difference would be silent. With >>>> option 1 it is possible to write code that behaves the same up to overflow >>>> and the error message would supply a warning when the exponent should be >>>> float. 
One could argue that numpy scalar integer types could be made to >>>> behave like python integers, but then their behavior would differ from >>>> numpy arrays and numpy scalar arrays. >>>> >>> >>> I'm not sure I understand. >>> >>> Do you mean >>> >>> np.arange(5)**2 would behave differently than np.arange(5)**np.int_(2) >>> >>> or 2**2 would behave differently than np.int_(2)**np.int(2) >>> >> >> The second case. Python returns ints for non-negative integer powers of >> ints. >> >> >>> >>> ? >>> >>> >>> AFAICS, there are many cases where numpy scalars don't behave like >>> python scalars. Also, does different behavior mean different type/dtype or >>> different numbers. (The first I can live with, the second requires human >>> memory usage, which is a scarce resource.) >>> >>> >>> 2**(-2) >>> 0.25 >>> >>> >> But we can't mix types in np.arrays and we can't depend on the element >> values of arrays in the exponent, but only on their type, so 2 ** array([1, >> -1]) must contain a single type and making that type float would surely >> break code. Scalar arrays, which are arrays, have the same problem. We >> can't do what Python does with ndarrays and numpy scalars, and it would be >> best to be consistent. Division was a simpler problem to deal with, as >> there were two operators, `//` and `/`. If there were two exponential >> operators life would be simpler. >> > > What bothers me with the entire argument is that you are putting higher > priority on returning a dtype than on returning the correct numbers. > Overflow in integer powers would be correct in modular arithmetic, at least for unsigned. Signed is a bit trickier. But overflow is a known property of numpy integer types. If we raise an exception for the negative exponents we at least aren't returning incorrect numbers. > > Reverse the argument: Because we cannot make the return type value > dependent we **have** to return float, in order to get the correct number. > (It's an argument not what we really have to do.) > >From my point of view, backwards compatibility is the main reason for choosing 1, otherwise I'd pick 2. If it weren't so easy to get floating point by using floating exponents I'd probably choose differently. > > > Which code really breaks, code that gets a float instead of an int, and > with some advance warning users that really need to watch their memory can > use np.power. > > My argument before was that I think a simple operator like `**` should > work for 90+% of the users and match their expectation, and the users that > need to watch dtypes can as well use the function. > > (I can also live with the exception from case 1., but I really think this > is like the python 2 integer division "surprise") > Well, that is why we would raise an exception, making it less surprising ;) We could always try the float option and see what breaks, but I expect there is a fair amount of code using small exponents like 2 or 3 where it is expected that the result is still integer. I would like more input from users than we have seen so far... Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Jun 4 23:26:15 2016 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 4 Jun 2016 20:26:15 -0700 Subject: [Numpy-discussion] Integers to integer powers, let's make a decision In-Reply-To: References: <89daf976787556179fb11a9ea397183d@esrf.fr> Message-ID: On Jun 4, 2016 7:23 PM, "Charles R Harris" wrote: > [...] 
> We could always try the float option and see what breaks, but I expect there is a fair amount of code using small exponents like 2 or 3 where it is expected that the result is still integer. I would like more input from users than we have seen so far... Just to highlight this, if anyone wants to strengthen the argument for switching to float then this is something you can literally do: tweak a local checkout of numpy to return float from int**int and array-of-int**array-of-int, and then try running the test suites of projects like scikit-learn, astropy, nipy, scikit-image, ... (The reason I'm phrasing this as something that people who like the float idea should do is that generally when proposing a risky compatibility-breaking change, the onus is on the ones proposing it to demonstrate that the risk is ok.) -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Jun 4 23:44:52 2016 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 4 Jun 2016 21:44:52 -0600 Subject: [Numpy-discussion] Integers to integer powers, let's make a decision In-Reply-To: References: <89daf976787556179fb11a9ea397183d@esrf.fr> Message-ID: On Sat, Jun 4, 2016 at 9:26 PM, Nathaniel Smith wrote: > On Jun 4, 2016 7:23 PM, "Charles R Harris" > wrote: > > > [...] > > We could always try the float option and see what breaks, but I expect > there is a fair amount of code using small exponents like 2 or 3 where it > is expected that the result is still integer. I would like more input from > users than we have seen so far... > > Just to highlight this, if anyone wants to strengthen the argument for > switching to float then this is something you can literally do: tweak a > local checkout of numpy to return float from int**int and > array-of-int**array-of-int, and then try running the test suites of > projects like scikit-learn, astropy, nipy, scikit-image, ... > > (The reason I'm phrasing this as something that people who like the float > idea should do is that generally when proposing a risky > compatibility-breaking change, the onus is on the ones proposing it to > demonstrate that the risk is ok.) > I was tempted for a bit, but I think the biggest compatibility problem is not current usage, but the fact that code written assuming float results will not work for earlier versions of numpy, and that would be a nasty situation. Given that integers raised to negative integer powers is already pretty much broken, making folks write around an exception will result in code compatible with previous numpy versions. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Sun Jun 5 09:05:32 2016 From: alan.isaac at gmail.com (Alan Isaac) Date: Sun, 5 Jun 2016 09:05:32 -0400 Subject: [Numpy-discussion] Integers to integer powers, let's make a decision In-Reply-To: References: <89daf976787556179fb11a9ea397183d@esrf.fr> Message-ID: On 6/4/2016 10:23 PM, Charles R Harris wrote: > From my point of view, backwards compatibility is the main reason for > choosing 1, otherwise I'd pick 2. If it weren't so easy to get > floating point by using floating exponents I'd probably choose > differently. As an interested user, I offer a summary of some things I believe are being claimed about the two proposals on the table (for int**int), which are are: 1. raise an error for negative powers 2. always return float Here is a first draft comparison (for int**int only) Proposal 1. 
effectively throws away an operator
- true in this: np.arange(10)**10 already overflows even for int32, much
less smaller sizes, and negative powers are now errors
- false in this: you can change an argument to float

Proposal 1. effectively behaves more like Python
- true in this: for a very small range of numbers, int**int will return
int in Python 2
- false in this: in Python, negative exponents produce floats, and
int**int does not overflow

Proposal 1 is more backwards compatible:
true, but this really only affects int**2 (larger arguments quickly
overflow)

Proposal 2 is a better match for other languages:
basically true (see e.g., C++'s overloaded `pow`)

Proposal 2 better satisfies the principle of least surprise:
probably true for most users, possibly false for some

Feel free to add, correct, modify. I think there is a strong argument to
always return float, and the real question is whether it is strong enough
to sacrifice backwards compatibility.

Hope this summary is of some use and not too tendentious,
Alan

From daoust.mj at gmail.com  Sun Jun  5 20:08:32 2016
From: daoust.mj at gmail.com (Mark Daoust)
Date: Sun, 5 Jun 2016 20:08:32 -0400
Subject: [Numpy-discussion] ENH: compute many inner products quickly
In-Reply-To: 
References: 
Message-ID: 

Here's the einsum version:

`es = np.einsum('Na,ab,Nb->N',X,A,X)`

But that's running ~45x slower than your version.

OT: anyone know why einsum is so bad for this one?

Mark Daoust

On Sat, May 28, 2016 at 11:53 PM, Scott Sievert wrote:

> I recently ran into an application where I had to compute many inner
> products quickly (roughly 50k inner products in less than a second). I
> wanted a vector of inner products over the 50k vectors, or `[x1.T @ A @ x1,
> ..., xn.T @ A @ xn]` with A.shape = (1k, 1k).
>
> My first instinct was to look for a NumPy function to quickly compute
> this, such as np.inner. However, it looks like np.inner has some other
> behavior and I couldn't get tensordot/einsum to work for me.
>
> Then a labmate pointed out that I can just do some slick matrix
> multiplication to compute the same quantity, `(X.T * (A @ X.T)).sum(axis=0)`.
> I opened [a PR] with this, and proposed that we define a new function
> called `inner_prods` for this.
>
> However, in the PR, @shoyer pointed out
>
> > The main challenge is to figure out how to transition the behavior of
> all these operations, while preserving backwards compatibility. Quite
> likely, we need to pick new names for these functions, though we should try
> to pick something that doesn't suggest that they are second class
> alternatives.
>
> Do we choose new function names? Do we add a keyword arg that changes what
> np.inner returns?
>
> [a PR]: https://github.com/numpy/numpy/pull/7690
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>

From shoyer at gmail.com  Sun Jun  5 20:41:52 2016
From: shoyer at gmail.com (Stephan Hoyer)
Date: Sun, 5 Jun 2016 17:41:52 -0700
Subject: [Numpy-discussion] ENH: compute many inner products quickly
In-Reply-To: 
References: 
Message-ID: 

If possible, I'd love to add new functions for "generalized ufunc" linear
algebra, and then deprecate (or at least discourage) using the older
versions with inferior broadcasting rules. Adding a new keyword arg means
we'll be stuck with an awkward API for a long time to come.
Feel free to add, correct, modify.

I think there is a strong argument to always return float, and the real
question is whether it is strong enough to sacrifice backwards
compatibility.

Hope this summary is of some use and not too tendentious,
Alan

From daoust.mj at gmail.com  Sun Jun 5 20:08:32 2016
From: daoust.mj at gmail.com (Mark Daoust)
Date: Sun, 5 Jun 2016 20:08:32 -0400
Subject: [Numpy-discussion] ENH: compute many inner products quickly
Message-ID:

Here's the einsum version:

`es = np.einsum('Na,ab,Nb->N',X,A,X)`

But that's running ~45x slower than your version.

OT: anyone know why einsum is so bad for this one?

Mark Daoust

On Sat, May 28, 2016 at 11:53 PM, Scott Sievert wrote:

> I recently ran into an application where I had to compute many inner
> products quickly (roughly 50k inner products in less than a second). I
> wanted a vector of inner products over the 50k vectors, or `[x1.T @ A @ x1,
> ..., xn.T @ A @ xn]` with A.shape = (1k, 1k).
>
> My first instinct was to look for a NumPy function to quickly compute
> this, such as np.inner. However, it looks like np.inner has some other
> behavior and I couldn't get tensordot/einsum to work for me.
>
> Then a labmate pointed out that I can just do some slick matrix
> multiplication to compute the same quantity, `(X.T * A @ X.T).sum(axis=0)`.
> I opened [a PR] with this, and proposed that we define a new function
> called `inner_prods` for this.
>
> However, in the PR, @shoyer pointed out
>
> > The main challenge is to figure out how to transition the behavior of
> all these operations, while preserving backwards compatibility. Quite
> likely, we need to pick new names for these functions, though we should try
> to pick something that doesn't suggest that they are second class
> alternatives.
>
> Do we choose new function names? Do we add a keyword arg that changes what
> np.inner returns?
>
> [a PR]: https://github.com/numpy/numpy/pull/7690
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From shoyer at gmail.com  Sun Jun 5 20:41:52 2016
From: shoyer at gmail.com (Stephan Hoyer)
Date: Sun, 5 Jun 2016 17:41:52 -0700
Subject: [Numpy-discussion] ENH: compute many inner products quickly
In-Reply-To:
References:
Message-ID:

If possible, I'd love to add new functions for "generalized ufunc" linear
algebra, and then deprecate (or at least discourage) using the older
versions with inferior broadcasting rules. Adding a new keyword arg means
we'll be stuck with an awkward API for a long time to come.

There are three types of matrix/vector products for which ufuncs would be
nice:
1. matrix-matrix product (covered by matmul)
2. matrix-vector product
3. vector-vector (inner) product

It's straightforward to implement either of the latter two options by
inserting dummy dimensions and then calling matmul (see the sketch below),
but that's a pretty awkward API, especially for inner products.
Unfortunately, we already use the two most obvious one word names for
vector inner products (inner and dot). But on the other hand, one word
names are not very descriptive, and the short name "dot" probably mostly
exists because of the lack of an infix operator.

So I'll start by throwing out some potential new names:

For matrix-vector products:
matvecmul (if it's worth making a new operator)

For inner products:
vecmul (similar to matmul, but probably too ambiguous)
dot_product
inner_prod
inner_product
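A minimal sketch of the dummy-dimensions trick, using the proposed
(hypothetical) names; untested:

import numpy as np

def matvecmul(A, x):
    # matrix-vector product: add a trailing axis to x, matmul, drop it again
    return np.matmul(A, x[..., np.newaxis])[..., 0]

def vecmul(x, y):
    # vector-vector inner product: row vector times column vector
    return np.matmul(x[..., np.newaxis, :], y[..., :, np.newaxis])[..., 0, 0]

Both broadcast over leading dimensions the way matmul does.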
On Sat, May 28, 2016 at 8:53 PM, Scott Sievert wrote:

> I recently ran into an application where I had to compute many inner
> products quickly (roughly 50k inner products in less than a second). I
> wanted a vector of inner products over the 50k vectors, or `[x1.T @ A @ x1,
> ..., xn.T @ A @ xn]` with A.shape = (1k, 1k).
>
> My first instinct was to look for a NumPy function to quickly compute
> this, such as np.inner. However, it looks like np.inner has some other
> behavior and I couldn't get tensordot/einsum to work for me.
>
> Then a labmate pointed out that I can just do some slick matrix
> multiplication to compute the same quantity, `(X.T * A @ X.T).sum(axis=0)`.
> I opened [a PR] with this, and proposed that we define a new function
> called `inner_prods` for this.
>
> However, in the PR, @shoyer pointed out
>
> > The main challenge is to figure out how to transition the behavior of
> all these operations, while preserving backwards compatibility. Quite
> likely, we need to pick new names for these functions, though we should try
> to pick something that doesn't suggest that they are second class
> alternatives.
>
> Do we choose new function names? Do we add a keyword arg that changes what
> np.inner returns?
>
> [a PR]: https://github.com/numpy/numpy/pull/7690
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From shoyer at gmail.com  Sun Jun 5 20:44:54 2016
From: shoyer at gmail.com (Stephan Hoyer)
Date: Sun, 5 Jun 2016 17:44:54 -0700
Subject: [Numpy-discussion] ENH: compute many inner products quickly
In-Reply-To:
References:
Message-ID:

On Sun, Jun 5, 2016 at 5:08 PM, Mark Daoust wrote:

> Here's the einsum version:
>
> `es = np.einsum('Na,ab,Nb->N',X,A,X)`
>
> But that's running ~45x slower than your version.
>
> OT: anyone know why einsum is so bad for this one?
>
I think einsum can create some large intermediate arrays. It certainly
doesn't always do multiplication in the optimal order:
https://github.com/numpy/numpy/pull/5488

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From josef.pktd at gmail.com  Sun Jun 5 20:50:44 2016
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sun, 5 Jun 2016 20:50:44 -0400
Subject: [Numpy-discussion] ENH: compute many inner products quickly
In-Reply-To:
References:
Message-ID:

On Sun, Jun 5, 2016 at 8:41 PM, Stephan Hoyer wrote:

> If possible, I'd love to add new functions for "generalized ufunc" linear
> algebra, and then deprecate (or at least discourage) using the older
> versions with inferior broadcasting rules. Adding a new keyword arg means
> we'll be stuck with an awkward API for a long time to come.
>
> There are three types of matrix/vector products for which ufuncs would be
> nice:
> 1. matrix-matrix product (covered by matmul)
> 2. matrix-vector product
> 3. vector-vector (inner) product
>
> It's straightforward to implement either of the latter two options by
> inserting dummy dimensions and then calling matmul, but that's a pretty
> awkward API, especially for inner products. Unfortunately, we already use
> the two most obvious one word names for vector inner products (inner and
> dot). But on the other hand, one word names are not very descriptive, and
> the short name "dot" probably mostly exists because of the lack of an
> infix operator.
>
> So I'll start by throwing out some potential new names:
>
> For matrix-vector products:
> matvecmul (if it's worth making a new operator)
>
> For inner products:
> vecmul (similar to matmul, but probably too ambiguous)
> dot_product
> inner_prod
> inner_product
>
how about names in plural, as in the PR? I thought the `s` in inner_prods
would better signal the broadcasting behavior.

dot_products ... "dots"? (I guess not)

Josef

> On Sat, May 28, 2016 at 8:53 PM, Scott Sievert wrote:
>
>> I recently ran into an application where I had to compute many inner
>> products quickly (roughly 50k inner products in less than a second). I
>> wanted a vector of inner products over the 50k vectors, or `[x1.T @ A @ x1,
>> ..., xn.T @ A @ xn]` with A.shape = (1k, 1k).
>>
>> My first instinct was to look for a NumPy function to quickly compute
>> this, such as np.inner. However, it looks like np.inner has some other
>> behavior and I couldn't get tensordot/einsum to work for me.
>>
>> Then a labmate pointed out that I can just do some slick matrix
>> multiplication to compute the same quantity, `(X.T * A @ X.T).sum(axis=0)`.
>> I opened [a PR] with this, and proposed that we define a new function
>> called `inner_prods` for this.
>>
>> However, in the PR, @shoyer pointed out
>>
>> > The main challenge is to figure out how to transition the behavior of
>> all these operations, while preserving backwards compatibility. Quite
>> likely, we need to pick new names for these functions, though we should try
>> to pick something that doesn't suggest that they are second class
>> alternatives.
>>
>> Do we choose new function names? Do we add a keyword arg that changes
>> what np.inner returns?
>>
>> [a PR]: https://github.com/numpy/numpy/pull/7690
>>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From perimosocordiae at gmail.com  Sun Jun 5 21:08:59 2016
From: perimosocordiae at gmail.com (CJ Carey)
Date: Sun, 5 Jun 2016 20:08:59 -0500
Subject: [Numpy-discussion] ENH: compute many inner products quickly
In-Reply-To:
References:
Message-ID:

A simple workaround gets the speed back:

In [11]: %timeit (X.T * A.dot(X.T)).sum(axis=0)
1 loop, best of 3: 612 ms per loop

In [12]: %timeit np.einsum('ij,ji->j', A.dot(X.T), X)
1 loop, best of 3: 414 ms per loop

If working as advertised, the code in gh-5488 will convert the
three-argument einsum call into my version automatically.

On Sun, Jun 5, 2016 at 7:44 PM, Stephan Hoyer wrote:

> On Sun, Jun 5, 2016 at 5:08 PM, Mark Daoust wrote:
>
>> Here's the einsum version:
>>
>> `es = np.einsum('Na,ab,Nb->N',X,A,X)`
>>
>> But that's running ~45x slower than your version.
>>
>> OT: anyone know why einsum is so bad for this one?
>>
> I think einsum can create some large intermediate arrays. It certainly
> doesn't always do multiplication in the optimal order:
> https://github.com/numpy/numpy/pull/5488
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From charlesr.harris at gmail.com  Sun Jun 5 21:20:20 2016
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sun, 5 Jun 2016 19:20:20 -0600
Subject: [Numpy-discussion] ENH: compute many inner products quickly
In-Reply-To:
References:
Message-ID:

On Sun, Jun 5, 2016 at 6:41 PM, Stephan Hoyer wrote:

> If possible, I'd love to add new functions for "generalized ufunc" linear
> algebra, and then deprecate (or at least discourage) using the older
> versions with inferior broadcasting rules. Adding a new keyword arg means
> we'll be stuck with an awkward API for a long time to come.
>
> There are three types of matrix/vector products for which ufuncs would be
> nice:
> 1. matrix-matrix product (covered by matmul)
> 2. matrix-vector product
> 3. vector-vector (inner) product
>
> It's straightforward to implement either of the latter two options by
> inserting dummy dimensions and then calling matmul, but that's a pretty
> awkward API, especially for inner products. Unfortunately, we already use
> the two most obvious one word names for vector inner products (inner and
> dot). But on the other hand, one word names are not very descriptive, and
> the short name "dot" probably mostly exists because of the lack of an
> infix operator.
>
> So I'll start by throwing out some potential new names:
>
> For matrix-vector products:
> matvecmul (if it's worth making a new operator)
>
> For inner products:
> vecmul (similar to matmul, but probably too ambiguous)
> dot_product
> inner_prod
> inner_product
>
I was using mulmatvec, mulvecmat, mulvecvec back when I was looking at
this. I suppose the mul could also go in the middle, or maybe change it
to x and put it in the middle: matxvec, vecxmat, vecxvec.

Chuck

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sturla.molden at gmail.com  Sun Jun 5 22:33:00 2016
From: sturla.molden at gmail.com (Sturla Molden)
Date: Mon, 6 Jun 2016 02:33:00 +0000 (UTC)
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
References:
Message-ID: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org>

Charles R Harris wrote:
> 1. Integers to negative integer powers raise an error.
> 2. Integers to integer powers always results in floats.

2

From sebastian at sipsolutions.net  Mon Jun 6 03:35:21 2016
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Mon, 06 Jun 2016 09:35:21 +0200
Subject: [Numpy-discussion] ENH: compute many inner products quickly
In-Reply-To:
References:
Message-ID: <1465198521.18293.0.camel@sipsolutions.net>

On So, 2016-06-05 at 19:20 -0600, Charles R Harris wrote:
>
>
> On Sun, Jun 5, 2016 at 6:41 PM, Stephan Hoyer
> wrote:
> > If possible, I'd love to add new functions for "generalized ufunc"
> > linear algebra, and then deprecate (or at least discourage) using
> > the older versions with inferior broadcasting rules. Adding a new
> > keyword arg means we'll be stuck with an awkward API for a long
> > time to come.
> >
> > There are three types of matrix/vector products for which ufuncs
> > would be nice:
> > 1. matrix-matrix product (covered by matmul)
> > 2. matrix-vector product
> > 3. vector-vector (inner) product
> >
> > It's straightforward to implement either of the latter two options
> > by inserting dummy dimensions and then calling matmul, but that's a
> > pretty awkward API, especially for inner products. Unfortunately,
> > we already use the two most obvious one word names for vector inner
> > products (inner and dot). But on the other hand, one word names are
> > not very descriptive, and the short name "dot" probably mostly
> > exists because of the lack of an infix operator.
> >
> > So I'll start by throwing out some potential new names:
> >
> > For matrix-vector products:
> > matvecmul (if it's worth making a new operator)
> >
> > For inner products:
> > vecmul (similar to matmul, but probably too ambiguous)
> > dot_product
> > inner_prod
> > inner_product
> >
> I was using mulmatvec, mulvecmat, mulvecvec back when I was looking
> at this. I suppose the mul could also go in the middle, or maybe
> change it to x and put it in the middle: matxvec, vecxmat, vecxvec.
>

Weren't some of these part of the gufunc linalg functions, and we just
removed them because we were not sure about the API? Not sure anymore,
but might be worth a look.

- Sebastian

> Chuck
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part
URL:

From m.h.vankerkwijk at gmail.com  Mon Jun 6 16:11:17 2016
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Mon, 6 Jun 2016 16:11:17 -0400
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
In-Reply-To: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org>
References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org>
Message-ID:

Hi Chuck,

I consider either proposal an improvement, but among the two I favour
returning float for `**`, because, like for `/`, it ensures one gets
closest to the (mathematically) true answer in most cases, and makes
duck-typing that much easier -- I'd like to be able to do x ** y without
having to worry whether x and y are python scalars or numpy arrays of
certain type.

I do agree with Nathaniel that it would be good to check what actually
breaks. Certainly, if anybody is up to making a PR that implements either
suggestion, I'd gladly check whether it breaks anything in astropy.
I should add that I have no idea how to assuage the fear that new code
would break with old versions of numpy, but on the other hand, I don't
know its validity either, as it seems one either develops larger projects
for multiple versions and tests, or writes more scripty things for
whatever the current versions are. Certainly, by this argument I had
better not start using the new `@` operator!

I do think the argument that for division it was easier because there was
`//` already available is a red herring: here one can use `np.power(a, b,
dtype=...)` if one really needs to.
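For instance, a sketch (untested):

>>> import numpy as np
>>> np.power(np.arange(1, 4), -2, dtype=np.float64)  # float output on request, even for integer inputs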
All the best,

Marten

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From charlesr.harris at gmail.com  Mon Jun 6 16:17:40 2016
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Mon, 6 Jun 2016 14:17:40 -0600
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
In-Reply-To:
References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org>
Message-ID:

On Mon, Jun 6, 2016 at 2:11 PM, Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

> Hi Chuck,
>
> I consider either proposal an improvement, but among the two I favour
> returning float for `**`, because, like for `/`, it ensures one gets
> closest to the (mathematically) true answer in most cases, and makes
> duck-typing that much easier -- I'd like to be able to do x ** y without
> having to worry whether x and y are python scalars or numpy arrays of
> certain type.
>
> I do agree with Nathaniel that it would be good to check what actually
> breaks. Certainly, if anybody is up to making a PR that implements either
> suggestion, I'd gladly check whether it breaks anything in astropy.
>
> I should add that I have no idea how to assuage the fear that new code
> would break with old versions of numpy, but on the other hand, I don't
> know its validity either, as it seems one either develops larger projects
> for multiple versions and tests, or writes more scripty things for
> whatever the current versions are. Certainly, by this argument I had
> better not start using the new `@` operator!
>
> I do think the argument that for division it was easier because there was
> `//` already available is a red herring: here one can use `np.power(a, b,
> dtype=...)` if one really needs to.
>
It looks to me like users want floats, while developers want the easy
path of raising an error. Darn those users, they just make life sooo
difficult...

Chuck

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From m.h.vankerkwijk at gmail.com  Mon Jun 6 16:42:19 2016
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Mon, 6 Jun 2016 16:42:19 -0400
Subject: [Numpy-discussion] ENH: compute many inner products quickly
In-Reply-To: <1465198521.18293.0.camel@sipsolutions.net>
References: <1465198521.18293.0.camel@sipsolutions.net>
Message-ID:

There I was thinking vector-vector inner product was in fact covered by
`np.inner`. Yikes, half inner, half outer.

As for names, I think `matvecmul` and `vecmul` do seem quite OK (probably
need `vecmatmul` as well, which does the same as `matmul` would for 1-D
first argument). But as other suggestions, keeping the `dot` one could
think of `vec_dot_vec` and `mat_dot_vec`, etc. More obscure but shorter
would be to use the equivalent `einsum` notation: `i_i`, `ij_j`, `i_ij`,
`ij_jk`.

-- Marten

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jaime.frio at gmail.com  Mon Jun 6 18:32:44 2016
From: jaime.frio at gmail.com (Jaime Fernández del Río)
Date: Tue, 7 Jun 2016 00:32:44 +0200
Subject: [Numpy-discussion] ENH: compute many inner products quickly
In-Reply-To: <1465198521.18293.0.camel@sipsolutions.net>
References: <1465198521.18293.0.camel@sipsolutions.net>
Message-ID:

On Mon, Jun 6, 2016 at 9:35 AM, Sebastian Berg wrote:

> On So, 2016-06-05 at 19:20 -0600, Charles R Harris wrote:
> >
> >
> > On Sun, Jun 5, 2016 at 6:41 PM, Stephan Hoyer
> > wrote:
> > > If possible, I'd love to add new functions for "generalized ufunc"
> > > linear algebra, and then deprecate (or at least discourage) using
> > > the older versions with inferior broadcasting rules. Adding a new
> > > keyword arg means we'll be stuck with an awkward API for a long
> > > time to come.
> > >
> > > There are three types of matrix/vector products for which ufuncs
> > > would be nice:
> > > 1. matrix-matrix product (covered by matmul)
> > > 2. matrix-vector product
> > > 3. vector-vector (inner) product
> > >
> > > It's straightforward to implement either of the latter two options
> > > by inserting dummy dimensions and then calling matmul, but that's a
> > > pretty awkward API, especially for inner products. Unfortunately,
> > > we already use the two most obvious one word names for vector inner
> > > products (inner and dot). But on the other hand, one word names are
> > > not very descriptive, and the short name "dot" probably mostly
> > > exists because of the lack of an infix operator.
> > >
> > > So I'll start by throwing out some potential new names:
> > >
> > > For matrix-vector products:
> > > matvecmul (if it's worth making a new operator)
> > >
> > > For inner products:
> > > vecmul (similar to matmul, but probably too ambiguous)
> > > dot_product
> > > inner_prod
> > > inner_product
> > >
> > I was using mulmatvec, mulvecmat, mulvecvec back when I was looking
> > at this. I suppose the mul could also go in the middle, or maybe
> > change it to x and put it in the middle: matxvec, vecxmat, vecxvec.
> >
>
> Weren't some of these part of the gufunc linalg functions, and we just
> removed them because we were not sure about the API? Not sure anymore,
> but might be worth a look.
>
We have

from numpy.core.umath_tests import inner1d

which does vectorized vector-vector multiplication, but it's
undocumented. There is also a matrix_multiply in that same module that
does the obvious thing.

And when gufuncs were introduced in linalg, there were a bunch of
functions doing all sorts of operations without intermediate storage,
e.g. sum3(a, b, c) -> a + b + c, that were removed before merging the PR.
Wasn't involved at the time, so not sure what the rationale was.

Since we are at it, should quadratic/bilinear forms get their own
function too? That is, after all, what the OP was asking for.
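For the curious, a quick sketch of what inner1d does (untested):

>>> import numpy as np
>>> from numpy.core.umath_tests import inner1d
>>> x = np.random.rand(50000, 1000)
>>> y = np.random.rand(50000, 1000)
>>> out = inner1d(x, y)  # one inner product per row, same as (x * y).sum(axis=-1)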
Jaime

--
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sebastian at sipsolutions.net  Mon Jun 6 18:41:15 2016
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 07 Jun 2016 00:41:15 +0200
Subject: [Numpy-discussion] ENH: compute many inner products quickly
In-Reply-To:
References: <1465198521.18293.0.camel@sipsolutions.net>
Message-ID: <1465252875.17700.2.camel@sipsolutions.net>

On Di, 2016-06-07 at 00:32 +0200, Jaime Fernández del Río wrote:
> On Mon, Jun 6, 2016 at 9:35 AM, Sebastian Berg
> <sebastian at sipsolutions.net> wrote:
> > On So, 2016-06-05 at 19:20 -0600, Charles R Harris wrote:
> > >
> > >
> > > On Sun, Jun 5, 2016 at 6:41 PM, Stephan Hoyer
> > > wrote:
> > > > If possible, I'd love to add new functions for "generalized ufunc"
> > > > linear algebra, and then deprecate (or at least discourage) using
> > > > the older versions with inferior broadcasting rules. Adding a new
> > > > keyword arg means we'll be stuck with an awkward API for a long
> > > > time to come.
> > > >
> > > > There are three types of matrix/vector products for which ufuncs
> > > > would be nice:
> > > > 1. matrix-matrix product (covered by matmul)
> > > > 2. matrix-vector product
> > > > 3. vector-vector (inner) product
> > > >
> > > > It's straightforward to implement either of the latter two options
> > > > by inserting dummy dimensions and then calling matmul, but that's
> > > > a pretty awkward API, especially for inner products. Unfortunately,
> > > > we already use the two most obvious one word names for vector
> > > > inner products (inner and dot). But on the other hand, one word
> > > > names are not very descriptive, and the short name "dot" probably
> > > > mostly exists because of the lack of an infix operator.
> > > >
> > > > So I'll start by throwing out some potential new names:
> > > >
> > > > For matrix-vector products:
> > > > matvecmul (if it's worth making a new operator)
> > > >
> > > > For inner products:
> > > > vecmul (similar to matmul, but probably too ambiguous)
> > > > dot_product
> > > > inner_prod
> > > > inner_product
> > > >
> > > I was using mulmatvec, mulvecmat, mulvecvec back when I was looking
> > > at this. I suppose the mul could also go in the middle, or maybe
> > > change it to x and put it in the middle: matxvec, vecxmat, vecxvec.
> > >
> >
> > Weren't some of these part of the gufunc linalg functions, and we just
> > removed them because we were not sure about the API? Not sure anymore,
> > but might be worth a look.
> >
> We have
> from numpy.core.umath_tests import inner1d
> which does vectorized vector-vector multiplication, but it's
> undocumented. There is also a matrix_multiply in that same module
> that does the obvious thing.
> And when gufuncs were introduced in linalg, there were a bunch of
> functions doing all sorts of operations without intermediate storage,
> e.g. sum3(a, b, c) -> a + b + c, that were removed before merging the
> PR. Wasn't involved at the time, so not sure what the rationale was.

I think it was probably just that the api was not thought out much.
Adding sum3 to linalg does seem a bit funny ;). I would not mind it in
numpy as such, I guess, if it is quite a bit faster anyway, but maybe in
its own submodule for these kinds of performance optimizations.

- Sebastian

> Since we are at it, should quadratic/bilinear forms get their own
> function too? That is, after all, what the OP was asking for.
> Jaime
> --
> (\__/)
> ( O.o)
> ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus
> planes de dominación mundial.
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part
URL:

From njs at pobox.com  Mon Jun 6 18:42:25 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Mon, 6 Jun 2016 15:42:25 -0700
Subject: [Numpy-discussion] ENH: compute many inner products quickly
In-Reply-To:
References:
Message-ID:

On Sun, Jun 5, 2016 at 5:41 PM, Stephan Hoyer wrote:
> If possible, I'd love to add new functions for "generalized ufunc" linear
> algebra, and then deprecate (or at least discourage) using the older
> versions with inferior broadcasting rules. Adding a new keyword arg means
> we'll be stuck with an awkward API for a long time to come.
>
> There are three types of matrix/vector products for which ufuncs would be
> nice:
> 1. matrix-matrix product (covered by matmul)
> 2. matrix-vector product
> 3. vector-vector (inner) product
>
> It's straightforward to implement either of the latter two options by
> inserting dummy dimensions and then calling matmul, but that's a pretty
> awkward API, especially for inner products. Unfortunately, we already use
> the two most obvious one word names for vector inner products (inner and
> dot). But on the other hand, one word names are not very descriptive, and
> the short name "dot" probably mostly exists because of the lack of an
> infix operator.
>
> So I'll start by throwing out some potential new names:
>
> For matrix-vector products:
> matvecmul (if it's worth making a new operator)
>
> For inner products:
> vecmul (similar to matmul, but probably too ambiguous)
> dot_product
> inner_prod
> inner_product

Given how core to linear algebra these are, and that this is a family of
somewhat expert-oriented functions, I think it'd even be fine to leave
the "product" part implicit, like:

np.linalg.matrix_matrix
np.linalg.matrix_vector
np.linalg.vector_matrix
np.linalg.vector_vector
np.linalg.vector_matrix_vector (for bilinear forms)

(or we could shorten matrix -> mat, vector -> vec if we must.)

-n

--
Nathaniel J. Smith -- https://vorpus.org

From shoyer at gmail.com  Mon Jun 6 19:35:34 2016
From: shoyer at gmail.com (Stephan Hoyer)
Date: Mon, 6 Jun 2016 16:35:34 -0700
Subject: [Numpy-discussion] ENH: compute many inner products quickly
In-Reply-To:
References: <1465198521.18293.0.camel@sipsolutions.net>
Message-ID:

On Mon, Jun 6, 2016 at 3:32 PM, Jaime Fernández del Río <
jaime.frio at gmail.com> wrote:

> Since we are at it, should quadratic/bilinear forms get their own
> function too? That is, after all, what the OP was asking for.
>
If we have matvecmul and vecmul, then how to implement bilinear forms
efficiently becomes pretty clear:

np.vecmul(b, np.matvecmul(A, b))

I'm not sure writing a dedicated function in numpy itself makes sense for
something this easy. I suppose there would be some performance gains from
not saving the intermediate result, but I suspect this would be premature
optimization in most cases.
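For reference, with existing tools that composition can be written in one
call as an einsum (a sketch, not benchmarked):

out = np.einsum('...i,ij,...j->...', b, A, b)  # b.T @ A @ b for each vector in a stack b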
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From gfyoung17 at gmail.com  Mon Jun 6 22:04:00 2016
From: gfyoung17 at gmail.com (G Young)
Date: Tue, 7 Jun 2016 03:04:00 +0100
Subject: [Numpy-discussion] broadcasting for randint
In-Reply-To:
References:
Message-ID:

Just wanted to ping the mailing list again in case this email (see below)
got lost in your inboxes. Would be great to get some feedback on this!
Thanks!

On Sun, May 22, 2016 at 2:15 AM, G Young wrote:

> Hi,
>
> I have had a PR open for quite
> some time now that allows arguments to broadcast in *randint*. While the
> functionality is fully in place and very robust, the obstacle at this
> point is the implementation.
>
> When the *dtype* parameter was added to *randint* (see here
> ), a big issue with the
> implementation was that it created so much duplicate code that it would
> be a huge maintenance nightmare. However, this was dismissed in the
> original PR message because it was believed that template-ing would be
> trivial, which seemed reasonable at the time.
>
> When I added broadcasting, I introduced a template system to the code
> that dramatically cut down on the duplication. However, the obstacle has
> been whether or not this template system is too *ad hoc* to be merged
> into the library. Implementing a template in Cython was not considered
> sufficient and is in fact very tricky to do, and unfortunately, I have
> not received any constructive suggestions from maintainers about how to
> proceed, so I'm opening this up to the mailing list to see whether or not
> there are better alternatives to what I did, whether this should be
> merged as is, or whether this should be tabled until a better template
> can be found.
>
> Thanks!
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From pierre.debuyl at chem.kuleuven.be  Tue Jun 7 09:59:13 2016
From: pierre.debuyl at chem.kuleuven.be (Pierre de Buyl)
Date: Tue, 7 Jun 2016 15:59:13 +0200
Subject: [Numpy-discussion] EuroSciPy 2016
In-Reply-To: <20160531130523.GN12938@pi-x230>
References: <20160531130523.GN12938@pi-x230>
Message-ID: <20160607135913.GR1738@pi-x230>

Dear NumPy and SciPy communities,

On Tue, May 31, 2016 at 03:05:23PM +0200, Pierre de Buyl wrote:
> EuroSciPy 2016 takes place in Erlangen, Germany, from the 23 to the 27 of August
> and consists of two days of tutorials (beginner and advanced tracks) and two
> days of conference representing many fields of science, with a focus on Python
> tools for science. A day of sprints follows (sprints TBA).
>
> The keynote speakers are Gaël Varoquaux and Abby Cabunoc Mayes and we can expect
> a rich tutorial and scientific program! Videos from previous years are available
> at https://www.youtube.com/playlist?list=PLYx7XA2nY5GeQCCugyvtnHMVLdhYlrRxH and
> https://www.youtube.com/playlist?list=PLYx7XA2nY5Gcpabmu61kKcToLz0FapmHu
>
> Visit us, register and submit an abstract on our website!
> https://www.euroscipy.org/2016/

EuroSciPy 2016 has extended the deadline for submitting contributions!
You have until the 19th of June to submit a talk/poster/tutorial at
https://www.euroscipy.org/2016/

SciPythonic regards,
The EuroSciPy 2016 team

From gfyoung17 at gmail.com  Tue Jun 7 13:23:59 2016
From: gfyoung17 at gmail.com (G Young)
Date: Tue, 7 Jun 2016 18:23:59 +0100
Subject: [Numpy-discussion] broadcasting for randint
In-Reply-To:
References:
Message-ID:

There seems to be a push in my PR now for using Tempita as a way to solve
this issue with the ad-hoc templating.
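For anyone unfamiliar with Tempita, the approach would look roughly like
this (an illustrative sketch only; the file name and helper function are
hypothetical):

# randint_helpers.pyx.in -- expanded by Tempita before Cython compiles it
{{py:
dtypes = ['int8', 'int16', 'int32', 'int64']
}}
{{for t in dtypes}}
def _randint_{{t}}(low, high, size, state):
    # dtype-specific bounded-integer sampling would go here
    ...
{{endfor}}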
However, before I go about attempting this, it would be great to receive
feedback from other developers on this, especially from some of the numpy
maintainers. Thanks!

On Tue, Jun 7, 2016 at 3:04 AM, G Young wrote:

> Just wanted to ping the mailing list again in case this email (see below)
> got lost in your inboxes. Would be great to get some feedback on this!
> Thanks!
>
> On Sun, May 22, 2016 at 2:15 AM, G Young wrote:
>
>> Hi,
>>
>> I have had a PR open for quite
>> some time now that allows arguments to broadcast in *randint*. While the
>> functionality is fully in place and very robust, the obstacle at this
>> point is the implementation.
>>
>> When the *dtype* parameter was added to *randint* (see here
>> ), a big issue with the
>> implementation was that it created so much duplicate code that it would
>> be a huge maintenance nightmare. However, this was dismissed in the
>> original PR message because it was believed that template-ing would be
>> trivial, which seemed reasonable at the time.
>>
>> When I added broadcasting, I introduced a template system to the code
>> that dramatically cut down on the duplication. However, the obstacle has
>> been whether or not this template system is too *ad hoc* to be merged
>> into the library. Implementing a template in Cython was not considered
>> sufficient and is in fact very tricky to do, and unfortunately, I have
>> not received any constructive suggestions from maintainers about how to
>> proceed, so I'm opening this up to the mailing list to see whether or
>> not there are better alternatives to what I did, whether this should be
>> merged as is, or whether this should be tabled until a better template
>> can be found.
>>
>> Thanks!
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From antony.lee at berkeley.edu  Thu Jun 9 03:36:36 2016
From: antony.lee at berkeley.edu (Antony Lee)
Date: Thu, 9 Jun 2016 00:36:36 -0700
Subject: [Numpy-discussion] Requesting a PR review for #5822
Message-ID:

https://github.com/numpy/numpy/pull/5822 is a year-old PR which allows
many random distributions to have a scale of exactly 0 (in which case a
stream of zeros, or whatever constant value is appropriate, is returned).
It passes all tests and has been sitting there for a while. Would a core
dev be kind enough to have a look at it? Thanks!

Antony

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mail at telenczuk.pl  Thu Jun 9 17:25:27 2016
From: mail at telenczuk.pl (mail at telenczuk.pl)
Date: Thu, 09 Jun 2016 23:25:27 +0200
Subject: [Numpy-discussion] NumPy lesson at EuroScipy2016?
Message-ID: <5759dec733e30_c0212276204b@Pct-EqAlain-Z30.notmuch>

Hi all,

Recently I taught an "Advanced NumPy" lesson at a Software Carpentry
workshop [1]. It covered a review of basic operations on numpy arrays and
also more advanced topics: indexing, broadcasting, dtypes and memory
layout.

I would greatly appreciate your feedback on the lesson materials, which
are available on github pages [2].

I am also thinking of proposing this lesson as a EuroScipy 2016 tutorial.
Is anyone already planning to teach NumPy there? If so, would you be
interested in teaming up for this lesson (as a co-instructor, helper or
mentor)?

I gratefully acknowledge inspiration, some examples and exercises from
the following materials:

- NumPy chapters of "SciPy lectures" by Emmanuelle Gouillart, Didrik
  Pinte, Gaël Varoquaux, and Pauli Virtanen [3]
- "Advanced NumPy patterns" by Juan Nunez-Iglesias [4]
- "The NumPy array. A structure for efficient numerical computation." by
  Stefan van der Walt [5]

Yours,
Bartosz

[1] http://telecom-python.telenczuk.pl
[2] https://paris-swc.github.io/advanced-numpy-lesson/
[3] http://www.scipy-lectures.org/
[4] https://github.com/jni/aspp2015/tree/delivered
[5] https://python.g-node.org/python-summerschool-2014/numpy.html

From njs at pobox.com  Fri Jun 10 02:42:47 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 9 Jun 2016 23:42:47 -0700
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
In-Reply-To:
References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org>
Message-ID:

On Mon, Jun 6, 2016 at 1:17 PM, Charles R Harris wrote:
>
> On Mon, Jun 6, 2016 at 2:11 PM, Marten van Kerkwijk wrote:
>>
>> Hi Chuck,
>>
>> I consider either proposal an improvement, but among the two I favour
>> returning float for `**`, because, like for `/`, it ensures one gets
>> closest to the (mathematically) true answer in most cases, and makes
>> duck-typing that much easier -- I'd like to be able to do x ** y without
>> having to worry whether x and y are python scalars or numpy arrays of
>> certain type.
>>
>> I do agree with Nathaniel that it would be good to check what actually
>> breaks. Certainly, if anybody is up to making a PR that implements
>> either suggestion, I'd gladly check whether it breaks anything in
>> astropy.
>>
>> I should add that I have no idea how to assuage the fear that new code
>> would break with old versions of numpy, but on the other hand, I don't
>> know its validity either, as it seems one either develops larger
>> projects for multiple versions and tests, or writes more scripty things
>> for whatever the current versions are. Certainly, by this argument I had
>> better not start using the new `@` operator!
>>
>> I do think the argument that for division it was easier because there
>> was `//` already available is a red herring: here one can use
>> `np.power(a, b, dtype=...)` if one really needs to.
>
> It looks to me like users want floats, while developers want the easy
> path of raising an error. Darn those users, they just make life sooo
> difficult...

I dunno, with my user hat on I'd be incredibly surprised / confused /
annoyed if an innocent-looking expression like

np.arange(10) ** 2

started returning floats... having exact ints is a really nice feature
of Python/numpy as compared to R/Javascript, and while it's true that
int64 can overflow, there are also large powers that can be more
precisely represented as int64 than float.

-n

--
Nathaniel J. Smith -- https://vorpus.org

From gawron at mail.sdsu.edu  Fri Jun 10 03:06:22 2016
From: gawron at mail.sdsu.edu (Mark Gawron)
Date: Fri, 10 Jun 2016 00:06:22 -0700
Subject: [Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling
Message-ID: <9F7D269B-E5C9-4BB5-B8A8-314E68667538@mail.sdsu.edu>

The scipy.stats.qqplot and scipy.stats.probplot functions plot expected
values versus actual data values for visualization of fit to a
distribution. First a one-D array of expected percentiles is generated
for a sample of size N; then that is passed to dist.ppf, the per cent
point function for the chosen distribution, to return an array of
expected values. The visualized data points are pairs of expected and
actual values, and a linear regression is done on these to produce the
line the data points should lie on if they follow the chosen
distribution.
Where x is the input data array and dist the chosen distribution we have:

> osr = np.sort(x)
> osm_uniform = _calc_uniform_order_statistic_medians(len(x))
> osm = dist.ppf(osm_uniform)
> slope, intercept, r, prob, sterrest = stats.linregress(osm, osr)

My question concerns the plot display.

> ax.plot(osm, osr, 'bo', osm, slope*osm + intercept, 'r-')

The x-axis of the resulting plot is labeled quantiles, but the xticks and
xticklabels produced by qqplot and probplot do not seem correct for their
intended interpretations. First, the numbers on the x-axis do not
represent quantiles; the intervals between them do not in general contain
equal numbers of points. For a normal distribution with sigma=1, they
represent standard deviations. Changing the label on the x-axis does not
seem like a very good solution, because the interpretation of the values
on the x-axis will be different for different distributions. Rather the
right solution seems to be to actually show quantiles on the x-axis. The
numbers on the x-axis can stay as they are, representing quantile
indexes, but they need to be spaced so as to show the actual division
points that carve the population up into groups of the same size. This
can be done in something like the following way.

> import numpy as np
> xt = np.arange(-3,3,dtype=int)
> # Find the 5 quantiles to divide the data into sixths
> percentiles = [x*.167 + .502 for x in xt]
> percentiles = np.array(percentiles + [.999])
> vals = dist.ppf(percentiles)
> ax.set_xticks(vals)
> xt = np.array(list(xt)+[3])
> ax.set_xticklabels(xt)
> ax.set_xlabel('Quantile')
> plt.show()

I've attached two images to show the difference between the current
visualization and the suggested one.

Mark Gawron

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: current_probplot.png
Type: image/png
Size: 42691 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: revised_probplot.png
Type: image/png
Size: 36922 bytes
Desc: not available
URL:

From p.j.a.cock at googlemail.com  Fri Jun 10 04:11:17 2016
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 10 Jun 2016 09:11:17 +0100
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
In-Reply-To:
References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org>
Message-ID:

On Fri, Jun 10, 2016 at 7:42 AM, Nathaniel Smith wrote:
> On Mon, Jun 6, 2016 at 1:17 PM, Charles R Harris
> wrote:
>>
>> ...
>>
>> It looks to me like users want floats, while developers want the
>> easy path of raising an error. Darn those users, they just make
>> life sooo difficult...
>
> I dunno, with my user hat on I'd be incredibly surprised / confused /
> annoyed if an innocent-looking expression like
>
> np.arange(10) ** 2
>
> started returning floats... having exact ints is a really nice feature
> of Python/numpy as compared to R/Javascript, and while it's true that
> int64 can overflow, there are also large powers that can be more
> precisely represented as int64 than float.
>
> -n

I was about to express a preference for (1), preserving integers on
output but treating negative powers as an error. However, I realised the
use case I had in mind does not apply: Where I've used integer matrices
as network topology adjacency matrices, to get connectivity by paths of
n steps you use A**n, by which I mean A x A x ... A using matrix
multiplication.
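In NumPy that repeated matrix product is spelled np.linalg.matrix_power --
a quick illustrative sketch (untested):

>>> import numpy as np
>>> A = np.array([[0, 1, 0], [1, 0, 1], [1, 1, 0]])
>>> np.linalg.matrix_power(A, 2)  # A @ A: entry [i, j] counts paths of length 2 from i to j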
But in NumPy A**n will do element-wise multiplication, so this example is
not helpful.

Charles R Harris wrote:
> 1. Integers to negative integer powers raise an error.
> 2. Integers to integer powers always results in floats.

As an aside, using boolean matrices can be helpful in the context of
connectivity matrices. How would the proposals here affect booleans,
where there is no risk of overflow? If we went with (2), using promotion
to floats here would be very odd:

>>> import numpy
>>> A = numpy.array([[False,True,False],[True,False,True],[True,True,False]], dtype=numpy.bool)
>>> A
array([[False,  True, False],
       [ True, False,  True],
       [ True,  True, False]], dtype=bool)
>>> A*A
array([[False,  True, False],
       [ True, False,  True],
       [ True,  True, False]], dtype=bool)
>>> A**2
array([[False,  True, False],
       [ True, False,  True],
       [ True,  True, False]], dtype=bool)
>>> numpy.dot(A,A)
array([[ True, False,  True],
       [ True,  True, False],
       [ True,  True,  True]], dtype=bool)
>>>

Regards,

Peter

From fabien.maussion at gmail.com  Fri Jun 10 07:15:29 2016
From: fabien.maussion at gmail.com (Fabien)
Date: Fri, 10 Jun 2016 13:15:29 +0200
Subject: [Numpy-discussion] Indexing with floats
Message-ID:

Hi,

I really tried to do my homework before asking this here, but I just
couldn't find the relevant information anywhere...

My question is about the rationale behind forbidding indexing with
floats, i.e.:

>>> x[2.]
__main__:1: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future

I don't find this very handy from a user's perspective, and I'd be
grateful for pointers on discussion threads and/or PRs where this has
been discussed, so that I can understand why it's important.

Maybe a short note on the indexing docpage
(http://docs.scipy.org/doc/numpy-1.11.0/user/basics.indexing.html) could
be useful also.

Thanks a lot!

Fabien

From robert.kern at gmail.com  Fri Jun 10 07:48:20 2016
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 10 Jun 2016 12:48:20 +0100
Subject: [Numpy-discussion] Indexing with floats
In-Reply-To:
References:
Message-ID:

On Fri, Jun 10, 2016 at 12:15 PM, Fabien wrote:
>
> Hi,
>
> I really tried to do my homework before asking this here, but I just
> couldn't find the relevant information anywhere...
>
> My question is about the rationale behind forbidding indexing with
> floats, i.e.:
>
> >>> x[2.]
> __main__:1: VisibleDeprecationWarning: using a non-integer number
> instead of an integer will result in an error in the future
>
> I don't find this very handy from a user's perspective, and I'd be
> grateful for pointers on discussion threads and/or PRs where this has
> been discussed, so that I can understand why it's important.

https://mail.scipy.org/pipermail/numpy-discussion/2012-December/064705.html
https://github.com/numpy/numpy/issues/2810
https://github.com/numpy/numpy/pull/2891
https://github.com/numpy/numpy/pull/3243
https://mail.scipy.org/pipermail/numpy-discussion/2015-July/073125.html

Note that the future is coming in the next numpy release:

https://github.com/numpy/numpy/pull/6271

--
Robert Kern

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From fabien.maussion at gmail.com  Fri Jun 10 08:02:21 2016
From: fabien.maussion at gmail.com (Fabien)
Date: Fri, 10 Jun 2016 14:02:21 +0200
Subject: [Numpy-discussion] Indexing with floats
In-Reply-To:
References:
Message-ID:

On 06/10/2016 01:48 PM, Robert Kern wrote:
> https://mail.scipy.org/pipermail/numpy-discussion/2012-December/064705.html
> https://github.com/numpy/numpy/issues/2810
> https://github.com/numpy/numpy/pull/2891
> https://github.com/numpy/numpy/pull/3243
> https://mail.scipy.org/pipermail/numpy-discussion/2015-July/073125.html
>
> Note that the future is coming in the next numpy release:
>
> https://github.com/numpy/numpy/pull/6271
>
> --
> Robert Kern

Thanks Robert!

From alan.isaac at gmail.com  Fri Jun 10 08:10:57 2016
From: alan.isaac at gmail.com (Alan Isaac)
Date: Fri, 10 Jun 2016 08:10:57 -0400
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
In-Reply-To:
References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org>
Message-ID:

On 6/10/2016 2:42 AM, Nathaniel Smith wrote:
> I dunno, with my user hat on I'd be incredibly surprised / confused /
> annoyed if an innocent-looking expression like
>
> np.arange(10) ** 2
>
> started returning floats... having exact ints is a really nice feature
> of Python/numpy as compared to R/Javascript, and while it's true that
> int64 can overflow, there are also large powers that can be more
> precisely represented as int64 than float.

Is np.arange(10)**10 also "innocent looking" to a Python user?

Also, I am confused by what "large powers" means in this context.
Is 2**40 a "large power"?

Finally, is np.arange(1,3)**-2 "innocent looking" to a Python user?

Cheers,
Alan

From jni.soma at gmail.com  Fri Jun 10 12:49:10 2016
From: jni.soma at gmail.com (Juan Nunez-Iglesias)
Date: Fri, 10 Jun 2016 09:49:10 -0700
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
In-Reply-To:
References:
Message-ID:

+1 to Alan's point. Having different type behaviour depending on the
values of x and y for np.arange(x) ** y would be awful, and it would also
be awful to have to worry about overflow here...

... Having said that, it would be equally annoying to not have a way to
define integer powers...

From: Alan Isaac
Reply: Discussion of Numerical Python
Date: 10 June 2016 at 5:10:57 AM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Integers to integer powers, let's make a decision

On 6/10/2016 2:42 AM, Nathaniel Smith wrote:
> > I dunno, with my user hat on I'd be incredibly surprised / confused /
> > annoyed if an innocent-looking expression like
> >
> > np.arange(10) ** 2
> >
> > started returning floats... having exact ints is a really nice feature
> > of Python/numpy as compared to R/Javascript, and while it's true that
> > int64 can overflow, there are also large powers that can be more
> > precisely represented as int64 than float.
>
> Is np.arange(10)**10 also "innocent looking" to a Python user?
>
> Also, I am confused by what "large powers" means in this context.
> Is 2**40 a "large power"?
>
> Finally, is np.arange(1,3)**-2 "innocent looking" to a Python user?
>
> Cheers,
> Alan
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From insertinterestingnamehere at gmail.com  Fri Jun 10 13:20:39 2016
From: insertinterestingnamehere at gmail.com (Ian Henriksen)
Date: Fri, 10 Jun 2016 17:20:39 +0000
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
In-Reply-To:
References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org>
Message-ID:

On Fri, Jun 10, 2016 at 12:42 AM Nathaniel Smith wrote:

> On Mon, Jun 6, 2016 at 1:17 PM, Charles R Harris
> wrote:
> >
> > On Mon, Jun 6, 2016 at 2:11 PM, Marten van Kerkwijk <
> m.h.vankerkwijk at gmail.com> wrote:
> >>
> >> Hi Chuck,
> >>
> >> I consider either proposal an improvement, but among the two I favour
> returning float for `**`, because, like for `/`, it ensures one gets
> closest to the (mathematically) true answer in most cases, and makes
> duck-typing that much easier -- I'd like to be able to do x ** y without
> having to worry whether x and y are python scalars or numpy arrays of
> certain type.
> >>
> >> I do agree with Nathaniel that it would be good to check what actually
> breaks. Certainly, if anybody is up to making a PR that implements either
> suggestion, I'd gladly check whether it breaks anything in astropy.
> >>
> >> I should add that I have no idea how to assuage the fear that new code
> would break with old versions of numpy, but on the other hand, I don't
> know its validity either, as it seems one either develops larger projects
> for multiple versions and tests, or writes more scripty things for
> whatever the current versions are. Certainly, by this argument I had
> better not start using the new `@` operator!
> >>
> >> I do think the argument that for division it was easier because there
> was `//` already available is a red herring: here one can use `np.power(a,
> b, dtype=...)` if one really needs to.
> >
> > It looks to me like users want floats, while developers want the easy
> path of raising an error. Darn those users, they just make life sooo
> difficult...
>
> I dunno, with my user hat on I'd be incredibly surprised / confused /
> annoyed if an innocent-looking expression like
>
> np.arange(10) ** 2
>
> started returning floats... having exact ints is a really nice feature
> of Python/numpy as compared to R/Javascript, and while it's true that
> int64 can overflow, there are also large powers that can be more
> precisely represented as int64 than float.
>
> -n
>
This is very much my line of thinking as well. Generally when I'm doing
operations with integers, I expect integer output, regardless of floor
division and overflow. There's a lot to both sides of the argument
though. Python's arbitrary precision integers alleviate overflow concerns
very nicely, but forcing float output for people who actually want
integers is not at all ideal either.

Best,

Ian Henriksen

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From allanhaldane at gmail.com  Fri Jun 10 13:20:50 2016
From: allanhaldane at gmail.com (Allan Haldane)
Date: Fri, 10 Jun 2016 13:20:50 -0400
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
In-Reply-To:
References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org>
Message-ID: <575AF6F2.60505@gmail.com>

On 06/10/2016 08:10 AM, Alan Isaac wrote:
> Is np.arange(10)**10 also "innocent looking" to a Python user?

This doesn't bother me much because numpy users have to be aware of
overflow issues in lots of other (simple) cases anyway, e.g. plain
addition and multiplication.
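For example, a sketch (the exact warning behaviour may vary):

>>> import numpy as np
>>> a = np.int32(65536)
>>> a * a  # 2**32 wraps around to 0 -- silently, or at most with a RuntimeWarning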
I'll add my +1 for integer powers returning an integer, and an error for
negative powers. Integer powers are a useful operation that I would bet a
lot of code currently depends on.

Allan

From alan.isaac at gmail.com  Fri Jun 10 13:28:32 2016
From: alan.isaac at gmail.com (Alan Isaac)
Date: Fri, 10 Jun 2016 13:28:32 -0400
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
In-Reply-To:
References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org>
Message-ID:

On 6/10/2016 1:20 PM, Ian Henriksen wrote:
> forcing float output for people who actually want integers is not at all ideal

Yes, there definitely should be a function supporting this.

Alan

From njs at pobox.com  Fri Jun 10 13:34:57 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Fri, 10 Jun 2016 10:34:57 -0700
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
In-Reply-To:
References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org>
Message-ID:

On Jun 10, 2016 05:11, "Alan Isaac" wrote:
>
> On 6/10/2016 2:42 AM, Nathaniel Smith wrote:
>>
>> I dunno, with my user hat on I'd be incredibly surprised / confused /
>> annoyed if an innocent-looking expression like
>>
>> np.arange(10) ** 2
>>
>> started returning floats... having exact ints is a really nice feature
>> of Python/numpy as compared to R/Javascript, and while it's true that
>> int64 can overflow, there are also large powers that can be more
>> precisely represented as int64 than float.
>
> Is np.arange(10)**10 also "innocent looking" to a Python user?

You keep pounding on this example. It's a fine example, but, c'mon. **2
is probably at least 100x more common in real source code. Maybe 1000x
more common. Why should we break the common case for your edge case?

> Also, I am confused by what "large powers" means in this context.
> Is 2**40 a "large power"?

I meant the range 2**53 -- 2**63 where integers have more precision than
floats. It's not a terribly important point, but it is true that there
are currently ** operations that return exact results, and that would
become impossible to do exactly if we switch ** to return floats.
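Concretely, in plain Python:

>>> float(2**53) == float(2**53 + 1)  # float64 cannot tell these apart
True
>>> 3**39 < 2**63                     # but 3**39 needs ~62 bits, so int64 holds it exactly
True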
> Finally, is np.arange(1,3)**-2 "innocent looking" to a Python user?

Maybe, maybe not. Historically your example here has always silently
returned nonsense, and no one seems to complain. Maybe this just means
everyone's become used to the pain, I dunno. But probably it also has
something to do with how uncommon this code is in practice.

OTOH I'm convinced that making **2 return floats is going to generate a
ton of complaints -- first because of all the code we broke, but then (I
predict, could be wrong) on an ongoing basis, as new users trip over the
unexpected **2 behavior. Because, again, **2 is something that orders of
magnitude more people will actually trip over, and in contexts where
they'll have no idea why it would return float.

(Remember that whole discussion we had at the beginning of the thread,
where very experienced numpy users started out thinking we should return
float for negative powers only, and then we had to carefully think
through how numpy's type system works to convince ourselves that this
wasn't possible? I don't really want to force every new user to
recapitulate that discussion just when they're learning what types even
are...)

-n

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From alan.isaac at gmail.com  Fri Jun 10 13:38:31 2016
From: alan.isaac at gmail.com (Alan Isaac)
Date: Fri, 10 Jun 2016 13:38:31 -0400
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
In-Reply-To: <575AF6F2.60505@gmail.com>
References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org> <575AF6F2.60505@gmail.com>
Message-ID: <4eb8cec9-e3f5-f11a-4182-89ca3403c203@gmail.com>

On 6/10/2016 1:20 PM, Allan Haldane wrote:
> numpy users have to be aware of
> overflow issues in lots of other (simple) cases anyway, e.g. plain
> addition and multiplication.

This is not comparable because *almost all* integer combinations
overflow for exponentiation. See the discussion at
https://wiki.haskell.org/Power_function
http://stackoverflow.com/questions/6400568/exponentiation-in-haskell

Alan

From alan.isaac at gmail.com  Fri Jun 10 13:50:47 2016
From: alan.isaac at gmail.com (Alan Isaac)
Date: Fri, 10 Jun 2016 13:50:47 -0400
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
In-Reply-To:
References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org>
Message-ID:

On 6/10/2016 1:34 PM, Nathaniel Smith wrote:
> You keep pounding on this example. It's a fine example, but, c'mon. **2
> is probably at least 100x more common in real source code. Maybe 1000x
> more common. Why should we break the common case for your edge case?

It is hardly an "edge case".
Again, **almost all** integer combinations overflow: that's the point.

If you were promoting to a Python long integer, that would change things.
But hobbling a whole operator so that people don't have to say `a*a`
seems absurdly wasteful. Additionally, returning floats provides a better
match to Python's behavior (i.e., it allows sensible handling of negative
powers). Users who really want int output and understand overflow should
be supported with a function.

Anyway, I've said my piece and will shut up now.

Cheers,
Alan

From njs at pobox.com  Fri Jun 10 14:00:48 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Fri, 10 Jun 2016 11:00:48 -0700
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
In-Reply-To:
References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org>
Message-ID:

On Jun 10, 2016 10:50, "Alan Isaac" wrote:
>
> On 6/10/2016 1:34 PM, Nathaniel Smith wrote:
>>
>> You keep pounding on this example. It's a fine example, but, c'mon. **2
>> is probably at least 100x more common in real source code. Maybe 1000x
>> more common. Why should we break the common case for your edge case?
>
> It is hardly an "edge case".
> Again, **almost all** integer combinations overflow: that's the point.

When you say "almost all", you're assuming inputs that are uniformly
sampled integers. I'm much more interested in what proportion of calls to
the ** operator involve inputs that can overflow, and in real life those
inputs are very heavily biased towards small numbers.

(I also think we should default to raising an error on overflow in
general, with a seterr switch to turn it off when desired. But that's
another discussion...)

-n

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From josef.pktd at gmail.com Fri Jun 10 15:01:00 2016 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 10 Jun 2016 15:01:00 -0400 Subject: [Numpy-discussion] Integers to integer powers, let's make a decision In-Reply-To: References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org> Message-ID: On Fri, Jun 10, 2016 at 2:00 PM, Nathaniel Smith wrote: > On Jun 10, 2016 10:50, "Alan Isaac" wrote: > > > > On 6/10/2016 1:34 PM, Nathaniel Smith wrote: > >> > >> You keep pounding on this example. It's a fine example, but, c'mon. **2 > is probably at least 100x more common in real source code. Maybe 1000x more > common. Why should we break the > >> common case for your edge case? > > > > > > > > It is hardly an "edge case". > > Again, **almost all** integer combinations overflow: that's the point. > > When you say "almost all", you're assuming inputs that are uniformly > sampled integers. I'm much more interested in what proportion of calls to > the ** operator involve inputs that can overflow, and in real life those > inputs are very heavily biased towards small numbers. > > (I also think we should default to raising an error on overflow in > general, with a seterr switch to turn it off when desired. But that's > another discussion...) > but x**2 is just x*x which some seem to recommend (I have no idea why), and then there are not so many "common" cases left. (However, I find integers pretty useless except in some very specific cases. When I started to clean up scipy.stats.distribution, I threw out integers for discrete distributions and replaced all or most `**` by np.power, IIRC mainly because of the old python behavior and better numpy behavior.) (I'd rather use robust calculations that provide correct numbers than chase individual edge cases to save a bit of memory in some common cases. scipy stats also doesn't use factorial in almost all cases, because special.gamma and variants are more robust ) Josef > -n > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Fri Jun 10 15:38:30 2016 From: alan.isaac at gmail.com (Alan Isaac) Date: Fri, 10 Jun 2016 15:38:30 -0400 Subject: [Numpy-discussion] Integers to integer powers, let's make a decision In-Reply-To: References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org> Message-ID: <1e673c3b-8bea-c7b7-3ccd-d35f417741ee@gmail.com> I guess I have one more question; sorry. Suppose we stipulate that `np.int_(9)**np.int_(10)` should just overflow, since that appears to be the clear intent of the (informed) user. When a Python 3 user writes `np.arange(10)**10`, how are we to infer the intended type of the output? (I specify Python 3, since it has a unified treatment of integers.) Of course: >>> np.find_common_type([np.int32],[int]) dtype('int32') If this were indeed an enforced numpy convention, I would better see the point of view on the integer exponentiation case. But how does that reconcile with: >>> np.find_common_type([np.int8],[np.int32]) dtype('int8') >>> (np.arange(10,dtype=np.int8)+np.int32(2**10)).dtype dtype('int16') And so on. If these other binary operators upcast based on the scalar value, why wouldn't exponentiation? I suppose the answer is: they upcast only insofar as necessary to fit the scalar value, which I see is a simple and enforceable rule.
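The same scalar-value upcast can be checked directly with plain Python ints; a quick sketch (the outputs are what I would expect under the current value-based rules, so treat them as illustrative rather than authoritative):

>>> (np.arange(10, dtype=np.int8) + 100).dtype   # 100 fits in int8, no upcast
dtype('int8')
>>> (np.arange(10, dtype=np.int8) + 1024).dtype  # 1024 needs int16
dtype('int16')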
However, that seems the wrong rule for exponentiation, and in fact it is not in play: >>> (np.int8(2)**2).dtype dtype('int32') OK, my question to those who have argued a**2 should produce an int32 when a is an int32: what if a is an int8? (Obviously the overflow problem is becoming extremely pressing ...) Thanks, Alan PS Where are these casting rules documented? From matthew.brett at gmail.com Fri Jun 10 15:51:31 2016 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 10 Jun 2016 12:51:31 -0700 Subject: [Numpy-discussion] Datarray 0.1.0 release Message-ID: Hi, I just released a new version of the Datarray package: https://pypi.python.org/pypi/datarray/0.1.0 https://github.com/BIDS/datarray It's a very lightweight implementation of arrays with labeled axes and ticks, that allows you to do stuff like: >>> narr = DataArray(np.zeros((1,2,3)), axes=('a','b','c')) >>> narr.axes.a Axis(name='a', index=0, labels=None) >>> narr.axes.a[0] DataArray(array([[ 0., 0., 0.], [ 0., 0., 0.]]), ('b', 'c')) It's still experimental, but we'd love to hear any feedback, in any form. Please feel free to make github issues for specific bugs / suggestions: https://github.com/BIDS/datarray/issues If you like the general idea, and you don't mind the pandas dependency, `xray` is a much better choice for production code right now, and will do the same stuff and more: https://pypi.python.org/pypi/xray/0.4.1 Cheers, Matthew From allanhaldane at gmail.com Fri Jun 10 15:59:08 2016 From: allanhaldane at gmail.com (Allan Haldane) Date: Fri, 10 Jun 2016 15:59:08 -0400 Subject: [Numpy-discussion] Integers to integer powers, let's make a decision In-Reply-To: References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org> Message-ID: <575B1C0C.5050904@gmail.com> On 06/10/2016 01:50 PM, Alan Isaac wrote: > Again, **almost all** integer combinations overflow: that's the point. Don't almost all integer combinations overflow for multiplication as well? I estimate that for unsigned 32 bit integers, only roughly 1 in 2e8 combinations don't overflow. The fraction is approximately (np.euler_gamma + 32*np.log(2))/2.0**32, if I didn't make a mistake. :) Allan From insertinterestingnamehere at gmail.com Fri Jun 10 16:16:06 2016 From: insertinterestingnamehere at gmail.com (Ian Henriksen) Date: Fri, 10 Jun 2016 20:16:06 +0000 Subject: [Numpy-discussion] Integers to integer powers, let's make a decision In-Reply-To: References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org> Message-ID: On Fri, Jun 10, 2016 at 12:01 PM Nathaniel Smith wrote: > On Jun 10, 2016 10:50, "Alan Isaac" wrote: > > > > On 6/10/2016 1:34 PM, Nathaniel Smith wrote: > >> > >> You keep pounding on this example. It's a fine example, but, c'mon. **2 > is probably at least 100x more common in real source code. Maybe 1000x more > common. Why should we break the > >> common case for your edge case? > > > > > > > > It is hardly an "edge case". > > Again, **almost all** integer combinations overflow: that's the point. > > When you say "almost all", you're assuming inputs that are uniformly > sampled integers. I'm much more interested in what proportion of calls to > the ** operator involve inputs that can overflow, and in real life those > inputs are very heavily biased towards small numbers. > > (I also think we should default to raising an error on overflow in > general, with a seterr switch to turn it off when desired. But that's > another discussion...) > > -n > Another thing that would need separate discussion... 
Making 64 bit integers default in more cases would help here. Currently arange gives 32 bit integers on 64 bit Windows, but 64 bit integers on 64 bit Linux/OSX. Using size_t (or even int64_t) as the default size would help with overflows in the more common use cases. It's a hefty backcompat break, but 64 bit systems are much more common now, and using 32 bit integers on 64 bit windows is a bit odd. Anyway, hopefully that's not too off-topic. Best, Ian Henriksen -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Fri Jun 10 16:19:52 2016 From: shoyer at gmail.com (Stephan Hoyer) Date: Fri, 10 Jun 2016 13:19:52 -0700 Subject: [Numpy-discussion] Datarray 0.1.0 release In-Reply-To: References: Message-ID: On Fri, Jun 10, 2016 at 12:51 PM, Matthew Brett wrote: > If you like the general idea, and you don't mind the pandas > dependency, `xray` is a much better choice for production code right > now, and will do the same stuff and more: > > https://pypi.python.org/pypi/xray/0.4.1 > > Hi Matthew, Congrats on the release! I just wanted to point out that "xray" is now known as "xarray": https://pypi.python.org/pypi/xarray/ Cheers, Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Fri Jun 10 16:20:53 2016 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 10 Jun 2016 13:20:53 -0700 Subject: [Numpy-discussion] Datarray 0.1.0 release In-Reply-To: References: Message-ID: On Fri, Jun 10, 2016 at 1:19 PM, Stephan Hoyer wrote: > On Fri, Jun 10, 2016 at 12:51 PM, Matthew Brett > wrote: >> >> If you like the general idea, and you don't mind the pandas >> dependency, `xray` is a much better choice for production code right >> now, and will do the same stuff and more: >> >> https://pypi.python.org/pypi/xray/0.4.1 >> > > > Hi Matthew, > > Congrats on the release! > > I just wanted to point out that "xray" is now known as "xarray": > https://pypi.python.org/pypi/xarray/ Ah - thank you - I'll update the docs... Cheers, Matthew From allanhaldane at gmail.com Fri Jun 10 20:28:30 2016 From: allanhaldane at gmail.com (Allan Haldane) Date: Fri, 10 Jun 2016 20:28:30 -0400 Subject: [Numpy-discussion] Integers to integer powers, let's make a decision In-Reply-To: <1e673c3b-8bea-c7b7-3ccd-d35f417741ee@gmail.com> References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org> <1e673c3b-8bea-c7b7-3ccd-d35f417741ee@gmail.com> Message-ID: <575B5B2E.5090709@gmail.com> On 06/10/2016 03:38 PM, Alan Isaac wrote: >>>> np.find_common_type([np.int8],[np.int32]) > dtype('int8') >>>> (np.arange(10,dtype=np.int8)+np.int32(2**10)).dtype > dtype('int16') > > And so on. If these other binary operators upcast based > on the scalar value, why wouldn't exponentiation? > I suppose the answer is: they upcast only insofar > as necessary to fit the scalar value, which I see is > a simple and enforceable rule. However, that seems the wrong > rule for exponentiation, and in fact it is not in play: > >>>> (np.int8(2)**2).dtype > dtype('int32') My understanding is that numpy never upcasts based on the values, it upcasts based on the datatype ranges. http://docs.scipy.org/doc/numpy-1.10.1/reference/ufuncs.html#casting-rules For arrays of different datatype, numpy finds the datatype which can store values in both dtype's ranges, *not* the type which is large enough to accurately store the result values. 
So for instance, >>> (np.arange(10, dtype=np.uint8) + np.uint32(2**32-1)).dtype dtype('uint32') Overflow has occurred, but numpy didn't upcast to uint64. This rule has some slightly strange consequences. For example, the ranges of np.int8 and np.uint64 don't match up, and numpy has decided that the only type covering both ranges is np.float64. So as an extra twist in this discussion, this means numpy actually *does* return a float value for an integer power in a few cases: >>> type( np.uint64(2) ** np.int8(3) ) numpy.float64 > OK, my question to those who have argued a**2 should > produce an int32 when a is an int32: what if a is an int8? > (Obviously the overflow problem is becoming extremely pressing ...) To me, whether it's int8 or int32, the user should just be aware of overflow. Also, I like to think of numpy as having quite C-like behavior, allowing you to play with the low-level bits and bytes. (I actually wish its casting behavior was more C-like). I suspect that people working with uint8 arrays might be doing byte-fiddling hacks and actually *want* overflow/wraparound to occur, at least when multiplying/adding. Allan PS I would concede that numpy's uint8 integer power currently doesn't wrap around like multiply does, but it would be cool if it did. (modulo arithmetic is associative, so it should, right?). >>> x = np.arange(256, dtype='uint8') >>> x**8 # returns all 0 >>> x*x*x*x*x*x*x*x # returns wrapped values From m.h.vankerkwijk at gmail.com Fri Jun 10 20:44:33 2016 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Fri, 10 Jun 2016 20:44:33 -0400 Subject: [Numpy-discussion] Integers to integer powers, let's make a decision In-Reply-To: <575B5B2E.5090709@gmail.com> References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org> <1e673c3b-8bea-c7b7-3ccd-d35f417741ee@gmail.com> <575B5B2E.5090709@gmail.com> Message-ID: I do think one of the main arguments for returning float remains the analogy with division. I don't know about the rest of you, but it has been such a relief not to have to tell students any more "you should add a ".", otherwise it does integer division". For most purposes, it simply shouldn't matter whether one types an integer or a float; if it does, then one has to think about it, and it seems fine for that relatively specialized case to have to use a specialized function. -- Marten -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Sat Jun 11 06:05:23 2016 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sat, 11 Jun 2016 12:05:23 +0200 Subject: [Numpy-discussion] Integers to integer powers, let's make a decision In-Reply-To: References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org> Message-ID: <1465639523.28428.1.camel@sipsolutions.net> On Fr, 2016-06-10 at 20:16 +0000, Ian Henriksen wrote: > On Fri, Jun 10, 2016 at 12:01 PM Nathaniel Smith > wrote: > > On Jun 10, 2016 10:50, "Alan Isaac" wrote: > > > > > > On 6/10/2016 1:34 PM, Nathaniel Smith wrote: > > >> > > >> You keep pounding on this example. It's a fine example, but, > > c'mon. **2 is probably at least 100x more common in real source > > code. Maybe 1000x more common. Why should we break the > > >> common case for your edge case? > > > > > > > > > > > > It is hardly an "edge case". > > > Again, **almost all** integer combinations overflow: that's the > > point. > > When you say "almost all", you're assuming inputs that are > > uniformly sampled integers.
I'm much more interested in what > > proportion of calls to the ** operator involve inputs that can > > overflow, and in real life those inputs are very heavily biased > > towards small numbers. > > (I also think we should default to raising an error on overflow in > > general, with a seterr switch to turn it off when desired. But > > that's another discussion...) > > -n > > > Another thing that would need separate discussion... > Making 64 bit integers default in more cases would help here. > Currently arange gives 32 bit integers on 64 bit Windows, but > 64 bit integers on 64 bit Linux/OSX. Using size_t (or even > int64_t) as the default size would help with overflows in > the more common use cases. It's a hefty backcompat > break, but 64 bit systems are much more common now, > and using 32 bit integers on 64 bit windows is a bit odd. > Anyway, hopefully that's not too off-topic. > Best, I agree, at least on python3 (the reason is that in python 3 the subclass thingy goes away, so it is less likely to break anything). I think we could have a shot at this, it is quirky, but the current inconsistency is pretty bad too (and probably has a lot of bugs out in the wild, because of tests on systems where long is 64 bits). A different issue, though I wouldn't mind if someone ponders this a bit more and maybe creates a pull request. - Sebastian > Ian Henriksen > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From ralf.gommers at gmail.com Sat Jun 11 08:53:13 2016 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 11 Jun 2016 14:53:13 +0200 Subject: [Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling In-Reply-To: <9F7D269B-E5C9-4BB5-B8A8-314E68667538@mail.sdsu.edu> References: <9F7D269B-E5C9-4BB5-B8A8-314E68667538@mail.sdsu.edu> Message-ID: Hi Mark, Note that the scipy-dev or scipy-user mailing list would have been more appropriate for this question. On Fri, Jun 10, 2016 at 9:06 AM, Mark Gawron wrote: > > > The scipy.stats.qqplot and scipy.stats.probplot functions plot expected > values versus actual data values for visualization of fit to a > distribution. First a one-D array of expected percentiles is generated for > a sample of size N; then that is passed to dist.ppf, the per cent point > function for the chosen distribution, to return an array of expected > values. The visualized data points are pairs of expected and actual > values, and a linear regression is done on these to produce the line data > points in this distribution should lie on. > > Where x is the input data array and dist the chosen distribution we have: > > osr = np.sort(x) > osm_uniform = _calc_uniform_order_statistic_medians(len(x)) > osm = dist.ppf(osm_uniform) > slope, intercept, r, prob, sterrest = stats.linregress(osm, osr) > > > My question concerns the plot display. > > ax.plot(osm, osr, 'bo', osm, slope*osm + intercept, 'r-') > > > The x-axis of the resulting plot is labeled quantiles, but the xticks and > xticklabels produced by qqplot and probplot do not seem correct > for their intended interpretations. First the numbers on the x-axis do > not represent quantiles; the intervals between them do not in general > contain equal numbers of points.
For a normal distribution with sigma=1, > they represent standard deviations. Changing the label on the x-axis does > not seem like a very good solution, because the interpretation of the > values on the x-axis will be different for different distributions. Rather > the right solution seems to be to actually show quantiles on the x-axis. > The numbers on the x-axis can stay as they are, representing quantile > indexes, but they need to be spaced so as to show the actual division > points that carve the population up into groups of the same size. This > can be done in something like the following way. > The ticks are correct I think, but they're theoretical quantiles and not sample quantiles. This was discussed in [1] and is consistent with R [2] and statsmodels [3]. I see that we just forgot to add "theoretical" to the x-axis label (mea culpa). Does adding that resolve your concern? [1] https://github.com/scipy/scipy/issues/1821 [2] http://data.library.virginia.edu/understanding-q-q-plots/ [3] http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot.html?highlight=qqplot#statsmodels.graphics.gofplots.qqplot Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Jun 11 09:51:08 2016 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 11 Jun 2016 15:51:08 +0200 Subject: [Numpy-discussion] NumPy lesson at EuroScipy2016? In-Reply-To: <5759dec733e30_c0212276204b@Pct-EqAlain-Z30.notmuch> References: <5759dec733e30_c0212276204b@Pct-EqAlain-Z30.notmuch> Message-ID: On Thu, Jun 9, 2016 at 11:25 PM, wrote: > Hi all, > > Recently I taught "Advanced NumPy" lesson at a Software Carpentry workshop > [1]. It covered a review of basic operations on numpy arrays and also more > advanced topics: indexing, broadcasting, dtypes and memory layout. I would > greatly appreciate your feedback on the lesson materials, which are > available on github pages [2]. > > I am also thinking of proposing this lesson as a EuroScipy 2016 tutorial. > Is anyone already planning to teach NumPy there? If so, would you be > interested to team up for this lesson (as a co-instructor, helper or > mentor)? > There's always a Numpy tutorial at EuroScipy. Emmanuelle (Cc'd) is the tutorial chair, she can tell you the plan and I'm sure she appreciates your offer of help. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat Jun 11 13:03:26 2016 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 11 Jun 2016 13:03:26 -0400 Subject: [Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling In-Reply-To: References: <9F7D269B-E5C9-4BB5-B8A8-314E68667538@mail.sdsu.edu> Message-ID: On Sat, Jun 11, 2016 at 8:53 AM, Ralf Gommers wrote: > Hi Mark, > > Note that the scipy-dev or scipy-user mailing list would have been more > appropriate for this question. > > > On Fri, Jun 10, 2016 at 9:06 AM, Mark Gawron wrote: > >> >> >> The scipy.stats.qqplot and scipy.stats.probplot functions plot expected >> values versus actual data values for visualization of fit to a >> distribution. First a one-D array of expected percentiles is generated for >> a sample of size N; then that is passed to dist.ppf, the per cent point >> function for the chosen distribution, to return an array of expected >> values. 
The visualized data points are pairs of expected and actual >> values, and a linear regression is done on these to produce the line data >> points in this distribution should lie on. >> >> Where x is the input data array and dist the chosen distribution we have: >> >> osr = np.sort(x) >> osm_uniform = _calc_uniform_order_statistic_medians(len(x)) >> osm = dist.ppf(osm_uniform) >> slope, intercept, r, prob, sterrest = stats.linregress(osm, osr) >> >> >> My question concerns the plot display. >> >> ax.plot(osm, osr, 'bo', osm, slope*osm + intercept, 'r-') >> >> >> The x-axis of the resulting plot is labeled quantiles, but the xticks and >> xticklabels produced by qqplot and probplot do not seem correct >> for their intended interpretations. First the numbers on the x-axis do >> not represent quantiles; the intervals between them do not in general >> contain equal numbers of points. For a normal distribution with sigma=1, >> they represent standard deviations. Changing the label on the x-axis does >> not seem like a very good solution, because the interpretation of the >> values on the x-axis will be different for different distributions. Rather >> the right solution seems to be to actually show quantiles on the x-axis. >> The numbers on the x-axis can stay as they are, representing quantile >> indexes, but they need to be spaced so as to show the actual division >> points that carve the population up into groups of the same size. This >> can be done in something like the following way. >> > > The ticks are correct I think, but they're theoretical quantiles and not > sample quantiles. This was discussed in [1] and is consistent with R [2] > and statsmodels [3]. I see that we just forgot to add "theoretical" to the > x-axis label (mea culpa). Does adding that resolve your concern? > > [1] https://github.com/scipy/scipy/issues/1821 > [2] http://data.library.virginia.edu/understanding-q-q-plots/ > [3] > http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot.html?highlight=qqplot#statsmodels.graphics.gofplots.qqplot > > Ralf > > as related link http://phobson.github.io/mpl-probscale/tutorial/closer_look_at_viz.html > > Paul Hobson has done a lot of work for getting different probability scales attached to pp-plots or generalized versions of probability plots. I think qqplots are less ambiguous because they are on the original or standardized scale. > > I haven't worked my way through the various interpretation of probability axis yet because I find it "not obvious". It might be easier for fields that have a tradition of using probability papers. > > It's planned to be added to the statsmodels probability plots so that there will be a large choice of axis labels and scales. > > Josef > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From gawron at mail.sdsu.edu Sat Jun 11 14:49:20 2016 From: gawron at mail.sdsu.edu (Mark Gawron) Date: Sat, 11 Jun 2016 11:49:20 -0700 Subject: [Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling In-Reply-To: References: <9F7D269B-E5C9-4BB5-B8A8-314E68667538@mail.sdsu.edu> Message-ID: <67ED9F06-FB81-48A2-8B76-2AEC68BC634D@mail.sdsu.edu> Thanks, Jozef. This is very helpful. And I will direct this to one of the other mailing lists, once I read the previous posts.
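To fix notation for my question below, here is a minimal sketch of the computation at issue (with simplified plotting positions rather than the exact formula probplot uses, so it is only illustrative):

import numpy as np
from scipy import stats

n = 7
osm_uniform = (np.arange(n) + 0.5) / n  # a uniform division of (0, 1)
osm = stats.norm.ppf(osm_uniform)       # theoretical quantiles: not uniformly spaced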
Regarding your remark: Maybe I'm having a terminology problem. It seems to me once you do >> osm = dist.ppf(osm_uniform) you're back in the value space for the particular distribution. So this gives you known probability intervals, but not UNIFORM probability intervals (the interval between 0 and 1 STD covers a bigger prob interval than the interval between 1 and 2). And the idea of a quantile is that it's a division point in a UNIFORM division of the probability axis. Mark On Jun 11, 2016, at 10:03 AM, josef.pktd at gmail.com wrote: > > > On Sat, Jun 11, 2016 at 8:53 AM, Ralf Gommers wrote: > Hi Mark, > > Note that the scipy-dev or scipy-user mailing list would have been more appropriate for this question. > > > On Fri, Jun 10, 2016 at 9:06 AM, Mark Gawron wrote: > > > The scipy.stats.qqplot and scipy.stats.probplot functions plot expected values versus actual data values for visualization of fit to a distribution. First a one-D array of expected percentiles is generated for a sample of size N; then that is passed to dist.ppf, the per cent point function for the chosen distribution, to return an array of expected values. The visualized data points are pairs of expected and actual values, and a linear regression is done on these to produce the line data points in this distribution should lie on. > > Where x is the input data array and dist the chosen distribution we have: > >> osr = np.sort(x) >> osm_uniform = _calc_uniform_order_statistic_medians(len(x)) >> osm = dist.ppf(osm_uniform) >> slope, intercept, r, prob, sterrest = stats.linregress(osm, osr) > > My question concerns the plot display. > >> ax.plot(osm, osr, 'bo', osm, slope*osm + intercept, 'r-') > > > The x-axis of the resulting plot is labeled quantiles, but the xticks and xticklabels produced by qqplot and probplot do not seem correct for their intended interpretations. First the numbers on the x-axis do not represent quantiles; the intervals between them do not in general contain equal numbers of points. For a normal distribution with sigma=1, they represent standard deviations. Changing the label on the x-axis does not seem like a very good solution, because the interpretation of the values on the x-axis will be different for different distributions. Rather the right solution seems to be to actually show quantiles on the x-axis. The numbers on the x-axis can stay as they are, representing quantile indexes, but they need to be spaced so as to show the actual division points that carve the population up into groups of the same size. This can be done in something like the following way. > > The ticks are correct I think, but they're theoretical quantiles and not sample quantiles. This was discussed in [1] and is consistent with R [2] and statsmodels [3]. I see that we just forgot to add "theoretical" to the x-axis label (mea culpa). Does adding that resolve your concern? > > [1] https://github.com/scipy/scipy/issues/1821 > [2] http://data.library.virginia.edu/understanding-q-q-plots/ > [3] http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot.html?highlight=qqplot#statsmodels.graphics.gofplots.qqplot > > Ralf > > > as related link http://phobson.github.io/mpl-probscale/tutorial/closer_look_at_viz.html > > Paul Hobson has done a lot of work for getting different probability scales attached to pp-plots or generalized versions of probability plots. I think qqplots are less ambiguous because they are on the original or standardized scale.
> > I haven't worked my way through the various interpretation of probability axis yet because I find it "not obvious". It might be easier for fields that have a tradition of using probability papers. > > It's planned to be added to the statsmodels probability plots so that there will be a large choice of axis labels and scales. > > Josef > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat Jun 11 15:24:03 2016 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 11 Jun 2016 15:24:03 -0400 Subject: [Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling In-Reply-To: <67ED9F06-FB81-48A2-8B76-2AEC68BC634D@mail.sdsu.edu> References: <9F7D269B-E5C9-4BB5-B8A8-314E68667538@mail.sdsu.edu> <67ED9F06-FB81-48A2-8B76-2AEC68BC634D@mail.sdsu.edu> Message-ID: On Sat, Jun 11, 2016 at 2:49 PM, Mark Gawron wrote: > Thanks, Jozef. This is very helpful. And I will direct this > to one of the other mailing lists, once I read the previous posts. > > Regarding your remark: Maybe I'm having a terminology problem. It seems > to me once you do > > osm = dist.ppf(osm_uniform) >>> >>> > you're back in the value space for the particular distribution. So this > gives you known probability intervals, but not UNIFORM probability > intervals (the interval between 0 and 1 STD covers a bigger prob interval > than the interval between 1 and 2). And the idea of a quantile is > that it's a division point in a UNIFORM division of the probability axis. > Yes and No, quantile, i.e. what you get from ppf, are units of the random variable. So it is on the scale of the random variable not on a probability scale. The axis labels are in units of the random variable. pp-plots have probabilities on the axis and are uniform scaled in probabilities but non-uniform in the values of the random variable. The difficult part to follow is if the plot is done uniform in one scale, but the axis are labeled non-uniform in the other scale. That's what Paul's probscale does and what you have in mind, AFAIU. Josef > > Mark > > On Jun 11, 2016, at 10:03 AM, josef.pktd at gmail.com wrote: > > > > On Sat, Jun 11, 2016 at 8:53 AM, Ralf Gommers > wrote: > >> Hi Mark, >> >> Note that the scipy-dev or scipy-user mailing list would have been more >> appropriate for this question. >> >> >> On Fri, Jun 10, 2016 at 9:06 AM, Mark Gawron >> wrote: >> >>> >>> >>> The scipy.stats.qqplot and scipy.stats.probplot functions plot expected >>> values versus actual data values for visualization of fit to a >>> distribution. First a one-D array of expected percentiles is generated for >>> a sample of size N; then that is passed to dist.ppf, the per cent point >>> function for the chosen distribution, to return an array of expected >>> values. The visualized data points are pairs of expected and actual >>> values, and a linear regression is done on these to produce the line data >>> points in this distribution should lie on.
>>> >>> Where x is the input data array and dist the chosen distribution we have: >>> >>> osr = np.sort(x) >>> osm_uniform = _calc_uniform_order_statistic_medians(len(x)) >>> osm = dist.ppf(osm_uniform) >>> slope, intercept, r, prob, sterrest = stats.linregress(osm, osr) >>> >>> >>> My question concerns the plot display. >>> >>> ax.plot(osm, osr, 'bo', osm, slope*osm + intercept, 'r-') >>> >>> >>> The x-axis of the resulting plot is labeled quantiles, but the xticks >>> and xticklabels produced produced by qqplot and problplot do not seem >>> correct for the their intended interpretations. First the numbers on the >>> x-axis do not represent quantiles; the intervals between them do not in >>> general contain equal numbers of points. For a normal distribution with >>> sigma=1, they represent standard deviations. Changing the label on the >>> x-axis does not seem like a very good solution, because the interpretation >>> of the values on the x-axis will be different for different distributions. >>> Rather the right solution seems to be to actually show quantiles on the >>> x-axis. The numbers on the x-axis can stay as they are, representing >>> quantile indexes, but they need to be spaced so as to show the actual >>> division points that carve the population up into groups of the same >>> size. This can be done in something like the following way. >>> >> >> The ticks are correct I think, but they're theoretical quantiles and not >> sample quantiles. This was discussed in [1] and is consistent with R [2] >> and statsmodels [3]. I see that we just forgot to add "theoretical" to the >> x-axis label (mea culpa). Does adding that resolve your concern? >> >> [1] https://github.com/scipy/scipy/issues/1821 >> [2] http://data.library.virginia.edu/understanding-q-q-plots/ >> [3] >> http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot.html?highlight=qqplot#statsmodels.graphics.gofplots.qqplot >> >> Ralf >> >> > as related link > http://phobson.github.io/mpl-probscale/tutorial/closer_look_at_viz.html > > Paul Hobson has done a lot of work for getting different probabitlity > scales attached to pp-plots or generalized versions of probability plots. I > think qqplots are less ambiguous because they are on the original or > standardized scale. > > I haven't worked my way through the various interpretation of probability > axis yet because I find it "not obvious". It might be easier for fields > that have a tradition of using probability papers. > > It's planned to be added to the statsmodels probability plots so that > there will be a large choice of axis labels and scales. > > Josef > > >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gawron at mail.sdsu.edu Sat Jun 11 15:31:11 2016 From: gawron at mail.sdsu.edu (Mark Gawron) Date: Sat, 11 Jun 2016 12:31:11 -0700 Subject: [Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling In-Reply-To: References: <9F7D269B-E5C9-4BB5-B8A8-314E68667538@mail.sdsu.edu> <67ED9F06-FB81-48A2-8B76-2AEC68BC634D@mail.sdsu.edu> Message-ID: <97FD2978-5179-430C-923B-D85B0A8621F7@mail.sdsu.edu> Ok, Our messages crossed. I understand now. Thanks. Mark On Jun 11, 2016, at 12:24 PM, josef.pktd at gmail.com wrote: > > > On Sat, Jun 11, 2016 at 2:49 PM, Mark Gawron wrote: > Thanks, Jozef. This is very helpful. And I will direct this > to one of the other mailing lists, once I read the previous posts. > > Regarding your remark: Maybe I'm having a terminology problem. It seems to me once you do > >>> osm = dist.ppf(osm_uniform) > > you're back in the value space for the particular distribution. So this > gives you known probability intervals, but not UNIFORM probability > intervals (the interval between 0 and 1 STD covers a bigger prob interval > than the interval between 1 and 2). And the idea of a quantile is > that it's a division point in a UNIFORM division of the probability axis. > > > Yes and No, quantile, i.e. what you get from ppf, are units of the random variable. So it is on the scale of the random variable not on a probability scale. The axis labels are in units of the random variable. > > pp-plots have probabilities on the axis and are uniform scaled in probabilities but non-uniform in the values of the random variable. > > The difficult part to follow is if the plot is done uniform in one scale, but the axis are labeled non-uniform in the other scale. That's what Paul's probscale does and what you have in mind, AFAIU. > > Josef > > > Mark > > On Jun 11, 2016, at 10:03 AM, josef.pktd at gmail.com wrote: > >> >> >> On Sat, Jun 11, 2016 at 8:53 AM, Ralf Gommers wrote: >> Hi Mark, >> >> Note that the scipy-dev or scipy-user mailing list would have been more appropriate for this question. >> >> >> On Fri, Jun 10, 2016 at 9:06 AM, Mark Gawron wrote: >> >> >> The scipy.stats.qqplot and scipy.stats.probplot functions plot expected values versus actual data values for visualization of fit to a distribution. First a one-D array of expected percentiles is generated for a sample of size N; then that is passed to dist.ppf, the per cent point function for the chosen distribution, to return an array of expected values. The visualized data points are pairs of expected and actual values, and a linear regression is done on these to produce the line data points in this distribution should lie on. >> >> Where x is the input data array and dist the chosen distribution we have: >> >>> osr = np.sort(x) >>> osm_uniform = _calc_uniform_order_statistic_medians(len(x)) >>> osm = dist.ppf(osm_uniform) >>> slope, intercept, r, prob, sterrest = stats.linregress(osm, osr) >> >> My question concerns the plot display. >> >>> ax.plot(osm, osr, 'bo', osm, slope*osm + intercept, 'r-') >> >> >> The x-axis of the resulting plot is labeled quantiles, but the xticks and xticklabels produced by qqplot and probplot do not seem correct for their intended interpretations. First the numbers on the x-axis do not represent quantiles; the intervals between them do not in general contain equal numbers of points. For a normal distribution with sigma=1, they represent standard deviations.
Changing the label on the x-axis does not seem like a very good solution, because the interpretation of the values on the x-axis will be different for different distributions. Rather the right solution seems to be to actually show quantiles on the x-axis. The numbers on the x-axis can stay as they are, representing quantile indexes, but they need to be spaced so as to show the actual division points that carve the population up into groups of the same size. This can be done in something like the following way. >> >> The ticks are correct I think, but they're theoretical quantiles and not sample quantiles. This was discussed in [1] and is consistent with R [2] and statsmodels [3]. I see that we just forgot to add "theoretical" to the x-axis label (mea culpa). Does adding that resolve your concern? >> >> [1] https://github.com/scipy/scipy/issues/1821 >> [2] http://data.library.virginia.edu/understanding-q-q-plots/ >> [3] http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot.html?highlight=qqplot#statsmodels.graphics.gofplots.qqplot >> >> Ralf >> >> >> as related link http://phobson.github.io/mpl-probscale/tutorial/closer_look_at_viz.html >> >> Paul Hobson has done a lot of work for getting different probabitlity scales attached to pp-plots or generalized versions of probability plots. I think qqplots are less ambiguous because they are on the original or standardized scale. >> >> I haven't worked my way through the various interpretation of probability axis yet because I find it "not obvious". It might be easier for fields that have a tradition of using probability papers. >> >> It's planned to be added to the statsmodels probability plots so that there will be a large choice of axis labels and scales. >> >> Josef >> >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From emmanuelle.gouillart at nsup.org Sat Jun 11 15:53:55 2016 From: emmanuelle.gouillart at nsup.org (Emmanuelle Gouillart) Date: Sat, 11 Jun 2016 21:53:55 +0200 Subject: [Numpy-discussion] NumPy lesson at EuroScipy2016? In-Reply-To: References: <5759dec733e30_c0212276204b@Pct-EqAlain-Z30.notmuch> Message-ID: <20160611195355.GD2879607@phare.normalesup.org> Dear Bartocz, thank you very much for proposing a tutorial on advanced NumPy for Euroscipy 2016! I think it's an awesome idea! Before the call for proposals, I did a survey about the subjects that people were interested in for the advanced tutorials, and advanced NumPy scored very high (see the poll on https://docs.google.com/forms/d/1H0vDPNgRVyESM1LYHSXXmunTgorNvVmu_psS56u9MOk/viewanalytics and my blog post on the results on http://emmanuelle.github.io/euroscipy-tutorials-results-from-the-opinion-poll.html). 
Therefore, I would be very grateful if you were willing to submit a proposal for a tutorial on advanced NumPy, in the advanced track. For the beginners track, there is already a tutorial on NumPy, which will be given by Gert Ingold (a contributor to the Scipy Lecture Notes). He's planning to cover the intro chapter of the scipy lecture notes about NumPy http://www.scipy-lectures.org/intro/numpy/index.html Since you mentioned the Scipy Lecture Notes in your e-mail, if you think that you would be interested in updating/improving the part on advanced NumPy of the lecture notes, that'd be really awesome! All the best, Emma On Sat, Jun 11, 2016 at 03:51:08PM +0200, Ralf Gommers wrote: > On Thu, Jun 9, 2016 at 11:25 PM, wrote: > Hi all, > Recently I taught "Advanced NumPy" lesson at a Software Carpentry workshop > [1]. It covered a review of basic operations on numpy arrays and also more > advanced topics: indexing, broadcasting, dtypes and memory layout. I would > greatly appreciate your feedback on the lesson materials, which are > available on github pages [2]. > I am also thinking of proposing this lesson as a EuroScipy 2016 tutorial. > Is anyone already planning to teach NumPy there? If so, would you be > interested to team up for this lesson (as a co-instructor, helper or > mentor)? > There's always a Numpy tutorial at EuroScipy. Emmanuelle (Cc'd) is the tutorial > chair, she can tell you the plan and I'm sure she appreciates your offer of > help. > Cheers, > Ralf From ralf.gommers at gmail.com Sun Jun 12 06:18:29 2016 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 12 Jun 2016 12:18:29 +0200 Subject: [Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling In-Reply-To: References: <9F7D269B-E5C9-4BB5-B8A8-314E68667538@mail.sdsu.edu> Message-ID: On Sat, Jun 11, 2016 at 2:53 PM, Ralf Gommers wrote: > Hi Mark, > > Note that the scipy-dev or scipy-user mailing list would have been more > appropriate for this question. > > > On Fri, Jun 10, 2016 at 9:06 AM, Mark Gawron wrote: > >> >> >> The scipy.stats.qqplot and scipy.stats.probplot functions plot expected >> values versus actual data values for visualization of fit to a >> distribution. First a one-D array of expected percentiles is generated for >> a sample of size N; then that is passed to dist.ppf, the per cent point >> function for the chosen distribution, to return an array of expected >> values. The visualized data points are pairs of expected and actual >> values, and a linear regression is done on these to produce the line data >> points in this distribution should lie on. >> >> Where x is the input data array and dist the chosen distribution we have: >> >> osr = np.sort(x) >> osm_uniform = _calc_uniform_order_statistic_medians(len(x)) >> osm = dist.ppf(osm_uniform) >> slope, intercept, r, prob, sterrest = stats.linregress(osm, osr) >> >> >> My question concerns the plot display. >> >> ax.plot(osm, osr, 'bo', osm, slope*osm + intercept, 'r-') >> >> >> The x-axis of the resulting plot is labeled quantiles, but the xticks and >> xticklabels produced produced by qqplot and problplot do not seem correct >> for the their intended interpretations. First the numbers on the x-axis do >> not represent quantiles; the intervals between them do not in general >> contain equal numbers of points. For a normal distribution with sigma=1, >> they represent standard deviations. 
Changing the label on the x-axis does >> not seem like a very good solution, because the interpretation of the >> values on the x-axis will be different for different distributions. Rather >> the right solution seems to be to actually show quantiles on the x-axis. >> The numbers on the x-axis can stay as they are, representing quantile >> indexes, but they need to be spaced so as to show the actual division >> points that carve the population up into groups of the same size. This >> can be done in something like the following way. >> > > The ticks are correct I think, but they're theoretical quantiles and not > sample quantiles. This was discussed in [1] and is consistent with R [2] > and statsmodels [3]. I see that we just forgot to add "theoretical" to the > x-axis label (mea culpa). Does adding that resolve your concern? > Sent a PR for this: https://github.com/scipy/scipy/pull/6249 Ralf > > [1] https://github.com/scipy/scipy/issues/1821 > [2] http://data.library.virginia.edu/understanding-q-q-plots/ > [3] > http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot.html?highlight=qqplot#statsmodels.graphics.gofplots.qqplot > > Ralf > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Mon Jun 13 04:47:08 2016 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 13 Jun 2016 10:47:08 +0200 Subject: [Numpy-discussion] Integers to integer powers, let's make a decision References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org> <1e673c3b-8bea-c7b7-3ccd-d35f417741ee@gmail.com> <575B5B2E.5090709@gmail.com> Message-ID: <20160613104708.5192668e@fsol> On Fri, 10 Jun 2016 20:28:30 -0400 Allan Haldane wrote: > > Also, I like to think of numpy as having quite C-like behavior, allowing > you to play with the low-level bits and bytes. (I actually wish its > casting behavior was more C-like). I suspect that people working with > uint8 arrays might be doing byte-fiddling hacks and actually *want* > overflow/wraparound to occur, at least when multiplying/adding. I agree. Currently, the choice is simple: if you want an int output, have an int input; if you want a float output, have a float input. This fidelity to the user's data type choice allows people to make informed decisions. Regards Antoine. From sole at esrf.fr Mon Jun 13 05:05:42 2016 From: sole at esrf.fr (=?UTF-8?Q?V._Armando_Sol=c3=a9?=) Date: Mon, 13 Jun 2016 11:05:42 +0200 Subject: [Numpy-discussion] Integers to integer powers, let's make a decision In-Reply-To: <575B5B2E.5090709@gmail.com> References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org> <1e673c3b-8bea-c7b7-3ccd-d35f417741ee@gmail.com> <575B5B2E.5090709@gmail.com> Message-ID: On 11/06/2016 02:28, Allan Haldane wrote: > > So as an extra twist in this discussion, this means numpy actually > *does* return a float value for an integer power in a few cases: > > >>> type( np.uint64(2) ** np.int8(3) ) > numpy.float64 > Shouldn't that example end the discussion? I find that behaviour for any integer power of an np.uint64. If something was to be broken, I guess it already is. We were given the choice between: 1 - Integers to negative integer powers raise an error. 2 - Integers to integer powers always result in floats. and we were never given the choice to adapt the returned type to the result. Assuming that option is not possible, option 2 is certainly better than 1 (why refuse to perform a clearly defined operation???)
*and* returning a float is already the behaviour for integer powers of np.uint64. Armando From alan.isaac at gmail.com Mon Jun 13 10:05:08 2016 From: alan.isaac at gmail.com (Alan Isaac) Date: Mon, 13 Jun 2016 10:05:08 -0400 Subject: [Numpy-discussion] Integers to integer powers, let's make a decision In-Reply-To: <20160613104708.5192668e@fsol> References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org> <1e673c3b-8bea-c7b7-3ccd-d35f417741ee@gmail.com> <575B5B2E.5090709@gmail.com> <20160613104708.5192668e@fsol> Message-ID: <7296d017-5a60-1ca3-3aad-a35bbf07bd61@gmail.com> On 6/13/2016 4:47 AM, Antoine Pitrou wrote: > Currently, the choice is simple: if you want an int output, > have an int input; if you want a float output, have a float input. That is a misunderstanding, which may be influencing the discussion. Examples of complications: >>> type(np.int8(2)**2) numpy.int32 >>> type(np.uint64(2)**np.int8(2)) numpy.float64 I don't think anyone has proposed first principles from which the desirable behavior could be deduced. I do think reference to the reasoning used by other languages in making this decision could be helpful. Alan Isaac (on Windows) From solipsis at pitrou.net Mon Jun 13 10:42:51 2016 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 13 Jun 2016 16:42:51 +0200 Subject: [Numpy-discussion] Integers to integer powers, let's make a decision References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org> <1e673c3b-8bea-c7b7-3ccd-d35f417741ee@gmail.com> <575B5B2E.5090709@gmail.com> <20160613104708.5192668e@fsol> <7296d017-5a60-1ca3-3aad-a35bbf07bd61@gmail.com> Message-ID: <20160613164251.48731ffa@fsol> On Mon, 13 Jun 2016 10:05:08 -0400 Alan Isaac wrote: > > That is a misunderstanding, which may be influencing the discussion. > Examples of complications: > > >>> type(np.int8(2)**2) > numpy.int32 > > >>> type(np.uint64(2)**np.int8(2)) > numpy.float64 > The `uint64 x int8 -> float64` is IMHO an aberration in Numpy's typing logic. Regardless, it's not specific to the power operator: >>> np.int64(2) + np.int32(3) 5 >>> np.uint64(2) + np.int32(3) 5.0 The other complications have to do with the type width, which are less annoying than changing the numeric kind altogether (as would be done by mandating int x int -> float in all cases). Regards Antoine. From josef.pktd at gmail.com Mon Jun 13 10:49:44 2016 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 13 Jun 2016 10:49:44 -0400 Subject: [Numpy-discussion] Integers to integer powers, let's make a decision In-Reply-To: <7296d017-5a60-1ca3-3aad-a35bbf07bd61@gmail.com> References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org> <1e673c3b-8bea-c7b7-3ccd-d35f417741ee@gmail.com> <575B5B2E.5090709@gmail.com> <20160613104708.5192668e@fsol> <7296d017-5a60-1ca3-3aad-a35bbf07bd61@gmail.com> Message-ID: On Mon, Jun 13, 2016 at 10:05 AM, Alan Isaac wrote: > On 6/13/2016 4:47 AM, Antoine Pitrou wrote: >> Currently, the choice is simple: if you want an int output, >> have an int input; if you want a float output, have a float input. >> > > That is a misunderstanding, which may be influencing the discussion. > Examples of complications: > > >>> type(np.int8(2)**2) > numpy.int32 > > >>> type(np.uint64(2)**np.int8(2)) > numpy.float64 > > I don't think anyone has proposed first principles > from which the desirable behavior could be deduced. > I do think reference to the reasoning used by other > languages in making this decision could be helpful. I think the main principle is whether an operator is a "float" operator.
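(A rough sketch of what I mean by a "float" operator, with the dtypes I would expect, so treat the outputs as illustrative:

>>> np.sqrt(np.arange(4)).dtype             # int input, float output
dtype('float64')
>>> np.true_divide(np.arange(4), 2).dtype   # py3-style division, same
dtype('float64')
)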
for example, I don't think anyone would expect sqrt(int) to return int, even if it would have exact results in a countably infinite number of cases (theoretically). Another case is division, which moved from a return-int to a return-float definition in the py2 - py3 move. My argument is that `**` is like integer division and sqrt where the domain where integer returns are the correct numbers is too small to avoid headaches by users. Josef > > Alan Isaac > (on Windows) > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Mon Jun 13 11:25:21 2016 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 13 Jun 2016 17:25:21 +0200 Subject: [Numpy-discussion] Integers to integer powers, let's make a decision References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org> <1e673c3b-8bea-c7b7-3ccd-d35f417741ee@gmail.com> <575B5B2E.5090709@gmail.com> <20160613104708.5192668e@fsol> <7296d017-5a60-1ca3-3aad-a35bbf07bd61@gmail.com> Message-ID: <20160613172521.1b2b4e11@fsol> On Mon, 13 Jun 2016 10:49:44 -0400 josef.pktd at gmail.com wrote: > > My argument is that `**` is like integer division and sqrt where the domain > where integer returns are the correct numbers is too small to avoid > headaches by users. float64 has less integer precision than int64: >>> math.pow(3, 39) == 3**39 False >>> np.int64(3)**39 == 3**39 True (as a sidenote, np.float64's equality operator seems to be slightly broken: >>> np.float64(3)**39 == 3**39 True >>> int(np.float64(3)**39) == 3**39 False >>> float(np.float64(3)**39) == 3**39 False ) Regards Antoine. From josef.pktd at gmail.com Mon Jun 13 11:51:07 2016 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 13 Jun 2016 11:51:07 -0400 Subject: [Numpy-discussion] Integers to integer powers, let's make a decision In-Reply-To: <20160613172521.1b2b4e11@fsol> References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org> <1e673c3b-8bea-c7b7-3ccd-d35f417741ee@gmail.com> <575B5B2E.5090709@gmail.com> <20160613104708.5192668e@fsol> <7296d017-5a60-1ca3-3aad-a35bbf07bd61@gmail.com> <20160613172521.1b2b4e11@fsol> Message-ID: On Mon, Jun 13, 2016 at 11:25 AM, Antoine Pitrou wrote: > On Mon, 13 Jun 2016 10:49:44 -0400 > josef.pktd at gmail.com wrote: > > > > My argument is that `**` is like integer division and sqrt where the > domain > > where integer returns are the correct numbers is too small to avoid > > headaches by users. > > float64 has less integer precision than int64: > > >>> math.pow(3, 39) == 3**39 > False > >>> np.int64(3)**39 == 3**39 > True > but if a user does this, then ??? (headaches or head scratching) >>> np.array([3])**39 RuntimeWarning: invalid value encountered in power array([-2147483648], dtype=int32) Josef > > > (as a sidenote, np.float64's equality operator seems to be slightly > broken: > > >>> np.float64(3)**39 == 3**39 > True > >>> int(np.float64(3)**39) == 3**39 > False > >>> float(np.float64(3)**39) == 3**39 > False > ) > > Regards > > Antoine. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From josef.pktd at gmail.com Mon Jun 13 12:07:11 2016 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 13 Jun 2016 12:07:11 -0400 Subject: [Numpy-discussion] Integers to integer powers, let's make a decision In-Reply-To: References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org> <1e673c3b-8bea-c7b7-3ccd-d35f417741ee@gmail.com> <575B5B2E.5090709@gmail.com> <20160613104708.5192668e@fsol> <7296d017-5a60-1ca3-3aad-a35bbf07bd61@gmail.com> <20160613172521.1b2b4e11@fsol> Message-ID: On Mon, Jun 13, 2016 at 11:51 AM, wrote: > > > On Mon, Jun 13, 2016 at 11:25 AM, Antoine Pitrou > wrote: > >> On Mon, 13 Jun 2016 10:49:44 -0400 >> josef.pktd at gmail.com wrote: >> > >> > My argument is that `**` is like integer division and sqrt where the >> domain >> > where integer return are the correct numbers is too small to avoid >> > headaches by users. >> >> float64 has less integer precision than int64: >> >> >>> math.pow(3, 39) == 3**39 >> False >> >>> np.int64(3)**39 == 3**39 >> True >> > > but if a user does this, then ??? (headaches or head scratching) > > >>> np.array([3])**39 > RuntimeWarning: invalid value encountered in power > > array([-2147483648], dtype=int32) > I forgot to add the real headaches start in the second call, when we don't get the RuntimeWarning anymore >>> np.array([4])**39 array([-2147483648], dtype=int32) ("Now, why do I owe so much money, when I made a huge profit all year." ) Josef > > Josef > > >> >> >> (as a sidenote, np.float64's equality operator seems to be slightly >> broken: >> >> >>> np.float64(3)**39 == 3**39 >> True >> >>> int(np.float64(3)**39) == 3**39 >> False >> >>> float(np.float64(3)**39) == 3**39 >> False >> ) >> >> Regards >> >> Antoine. >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Jun 13 12:15:40 2016 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 13 Jun 2016 12:15:40 -0400 Subject: [Numpy-discussion] Integers to integer powers, let's make a decision In-Reply-To: References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org> <1e673c3b-8bea-c7b7-3ccd-d35f417741ee@gmail.com> <575B5B2E.5090709@gmail.com> <20160613104708.5192668e@fsol> <7296d017-5a60-1ca3-3aad-a35bbf07bd61@gmail.com> <20160613172521.1b2b4e11@fsol> Message-ID: On Mon, Jun 13, 2016 at 12:07 PM, wrote: > > > On Mon, Jun 13, 2016 at 11:51 AM, wrote: > >> >> >> On Mon, Jun 13, 2016 at 11:25 AM, Antoine Pitrou >> wrote: >> >>> On Mon, 13 Jun 2016 10:49:44 -0400 >>> josef.pktd at gmail.com wrote: >>> > >>> > My argument is that `**` is like integer division and sqrt where the >>> domain >>> > where integer return are the correct numbers is too small to avoid >>> > headaches by users. >>> >>> float64 has less integer precision than int64: >>> >>> >>> math.pow(3, 39) == 3**39 >>> False >>> >>> np.int64(3)**39 == 3**39 >>> True >>> >> >> but if a user does this, then ??? (headaches or head scratching) >> >> >>> np.array([3])**39 >> RuntimeWarning: invalid value encountered in power >> >> array([-2147483648], dtype=int32) >> > > I forgot to add > > the real headaches start in the second call, when we don't get the > RuntimeWarning anymore > > >>> np.array([4])**39 > array([-2147483648], dtype=int32) > > > ("Now, why do I owe so much money, when I made a huge profit all year." 
> )
>

(grumpy off-topic complaint: The Canadian tax system is like this. They make
a mistake in transferring information to a new computerized system, and then
they send a bill for taxes based on reassessment of something that happened
5 years ago because their computerized record is wrong. )


>
> Josef
>
>
>>
>> Josef
>>
>>
>>>
>>>
>>> (as a sidenote, np.float64's equality operator seems to be slightly
>>> broken:
>>>
>>> >>> np.float64(3)**39 == 3**39
>>> True
>>> >>> int(np.float64(3)**39) == 3**39
>>> False
>>> >>> float(np.float64(3)**39) == 3**39
>>> False
>>> )
>>>
>>> Regards
>>>
>>> Antoine.
>>>
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at scipy.org
>>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From allanhaldane at gmail.com  Mon Jun 13 13:07:53 2016
From: allanhaldane at gmail.com (Allan Haldane)
Date: Mon, 13 Jun 2016 13:07:53 -0400
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
In-Reply-To: 
References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org>
 <1e673c3b-8bea-c7b7-3ccd-d35f417741ee@gmail.com>
 <575B5B2E.5090709@gmail.com>
Message-ID: <575EE869.9080401@gmail.com>

On 06/13/2016 05:05 AM, V. Armando Solé wrote:
> On 11/06/2016 02:28, Allan Haldane wrote:
>>
>> So as an extra twist in this discussion, this means numpy actually
>> *does* return a float value for an integer power in a few cases:
>>
>>     >>> type( np.uint64(2) ** np.int8(3) )
>>     numpy.float64
>>
>
> Shouldn't that example end the discussion? I find that behaviour for
> any integer power of an np.uint64. I guess if something was to be
> broken, it is already the case.
>
> We were given the choice between:
>
> 1 - Integers to negative integer powers raise an error.
> 2 - Integers to integer powers always result in floats.
>
> and we were never given the choice to adapt the returned type to the
> result. Assuming that option is not possible, option 2 is certainly
> better than option 1 (why refuse to perform a clearly defined
> operation???) *and* returning a float is already the behaviour for
> integer powers of np.uint64.

Not for all uints: "type( np.uint64(2) ** np.uint8(3) )" is uint64.

Although I brought it up, I think the mixed dtype case is a bit of a red
herring. The single-dtype case is better to think about for now, eg
"np.uint64(2) ** np.uint64(3)".

Allan

From m.h.vankerkwijk at gmail.com  Mon Jun 13 13:54:51 2016
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Mon, 13 Jun 2016 13:54:51 -0400
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
In-Reply-To: <575EE869.9080401@gmail.com>
References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org>
 <1e673c3b-8bea-c7b7-3ccd-d35f417741ee@gmail.com>
 <575B5B2E.5090709@gmail.com> <575EE869.9080401@gmail.com>
Message-ID: 

Hi All,

I think we're getting a little off the rails, perhaps because two questions
are being conflated:

1. What in principle is the best return type for int ** int (which Josef I
think most properly rephrased as whether `**` should be thought of as a
float operator, like `/` in python3 and `sqrt` etc.);

2. Whether one is willing to possibly break code by implementing this.
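(To make the trade-off in (1) concrete, here is a short sketch collecting
the examples reported earlier in this thread; it is illustrative only, and
the exact dtypes and warnings depend on platform and numpy version:)

```
import numpy as np

# float64 runs out of integer precision where int64 is still exact
# (Antoine's example):
np.int64(3)**39 == 3**39              # True
float(np.float64(3)**39) == 3**39     # False

# int array powers silently wrap around (Josef's example; the default
# int is 32-bit on Windows):
np.array([4])**39                     # array([-2147483648], dtype=int32)

# one mixed-dtype case already returns a float today (Allan's example):
type(np.uint64(2) ** np.int8(3))      # numpy.float64
```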
My sense is that most discussion is about (1), where a majority may well
agree the answer is float, instead of about (2), where it ends up boiling
down to a judgment call of "eternal small pain" or "possible short-time big
pain but consistency from now on".

Perhaps I can introduce an alternative (likely shot down immediately...).
For this, note that for division at least, numpy follows python closely, so
that one has the following in python2:
```
In [2]: np.arange(10) / 2
Out[2]: array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])

In [3]: from __future__ import division

In [4]: np.arange(10) / 2
Out[4]: array([ 0. ,  0.5,  1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ,  4.5])
```
Since negative exponents are really just 1 over the positive one, could we
use the same logic for **? I.e., let what type is returned by int1 ** int2
be the same as that returned by int1 / int2?

If we then also ensure that for integer output type, int1 ** -int2 returns
1 // (int1 ** int2), we have well-defined rules all around, so there would
be no need for raising a zero-division error.

All the best,

Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From njs at pobox.com  Mon Jun 13 14:50:25 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Mon, 13 Jun 2016 11:50:25 -0700
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
In-Reply-To: 
References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org>
 <1e673c3b-8bea-c7b7-3ccd-d35f417741ee@gmail.com>
 <575B5B2E.5090709@gmail.com> <575EE869.9080401@gmail.com>
Message-ID: 

On Jun 13, 2016 10:54 AM, "Marten van Kerkwijk" 
wrote:
>
> Hi All,
>
> I think we're getting a little off the rails, perhaps because two
questions are being conflated:
>
> 1. What in principle is the best return type for int ** int (which Josef
I think most properly rephrased as whether `**` should be thought of as a
float operator, like `/` in python3 and `sqrt` etc.);
>
> 2. Whether one is willing to possibly break code by implementing this.
>
> My sense is that most discussion is about (1), where a majority may well
agree the answer is float, instead of about (2), where it ends up boiling
down to a judgment call of "eternal small pain" or "possible short-time big
pain but consistency from now on".
>
> Perhaps I can introduce an alternative (likely shot down immediately...).
For this, note that for division at least, numpy follows python closely, so
that one has the following in python2:
> ```
> In [2]: np.arange(10) / 2
> Out[2]: array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])
>
> In [3]: from __future__ import division
>
> In [4]: np.arange(10) / 2
> Out[4]: array([ 0. ,  0.5,  1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ,  4.5])
> ```
> Since negative exponents are really just 1 over the positive one, could we
use the same logic for **? I.e., let what type is returned by int1 ** int2
be the same as that returned by int1 / int2?

There isn't any reasonable way for numpy's ** operator to check whether the
caller has future division enabled, so I think this proposal boils down to:
int ** int returning int on py2 and float on py3?

It has a certain Solomonic appeal, in that I think it would make everyone
equally unhappy :-). But probably now is not the time to be introducing new
py2/py3 incompatibilities...

> If we then also ensure that for integer output type, int1 ** -int2 returns
1 // (int1 ** int2), we have well-defined rules all around, so there would
be no need for raising a zero-division error.
Not sure what to make of this part -- converting int ** -int into
1 // (int ** int) will return zero in almost all cases, which is the
unfortunate behavior that kicked off this whole discussion. AFAICT everyone
agrees that we don't want *that*. And I don't think this gets around the
need to decide how to handle 0 ** -1, whether by raising ZeroDivisionError
or what.

-n
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From njs at pobox.com  Mon Jun 13 16:11:36 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Mon, 13 Jun 2016 13:11:36 -0700
Subject: [Numpy-discussion] Deprecating silent truncation of floats when
 assigned to int array
Message-ID: 

It was recently pointed out:

  https://github.com/numpy/numpy/issues/7730

that this code silently truncates floats:

In [1]: a = np.arange(10)

In [2]: a.dtype
Out[2]: dtype('int64')

In [3]: a[3] = 1.5

In [4]: a[3]
Out[4]: 1

The proposal is that we should deprecate this, and eventually turn it
into an error. Any objections?

We recently went through a similar deprecation cycle for in-place
operations, i.e., this used to silently truncate but now raises an
error:

In [1]: a = np.arange(10)

In [2]: a += 1.5
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
 in ()
----> 1 a += 1.5

TypeError: Cannot cast ufunc add output from dtype('float64') to
dtype('int64') with casting rule 'same_kind'

so the proposal here is to extend this to regular assignment.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

From insertinterestingnamehere at gmail.com  Mon Jun 13 18:23:06 2016
From: insertinterestingnamehere at gmail.com (Ian Henriksen)
Date: Mon, 13 Jun 2016 22:23:06 +0000
Subject: [Numpy-discussion] Deprecating silent truncation of floats when
 assigned to int array
In-Reply-To: 
References: 
Message-ID: 

Personally, I think this is a great idea. +1 to more informative errors.

Best,
Ian Henriksen

On Mon, Jun 13, 2016 at 2:11 PM Nathaniel Smith  wrote:

> It was recently pointed out:
>
>   https://github.com/numpy/numpy/issues/7730
>
> that this code silently truncates floats:
>
> In [1]: a = np.arange(10)
>
> In [2]: a.dtype
> Out[2]: dtype('int64')
>
> In [3]: a[3] = 1.5
>
> In [4]: a[3]
> Out[4]: 1
>
> The proposal is that we should deprecate this, and eventually turn it
> into an error. Any objections?
>
> We recently went through a similar deprecation cycle for in-place
> operations, i.e., this used to silently truncate but now raises an
> error:
>
> In [1]: a = np.arange(10)
>
> In [2]: a += 1.5
>
> ---------------------------------------------------------------------------
> TypeError                                 Traceback (most recent call
> last)
>  in ()
> ----> 1 a += 1.5
>
> TypeError: Cannot cast ufunc add output from dtype('float64') to
> dtype('int64') with casting rule 'same_kind'
>
> so the proposal here is to extend this to regular assignment.
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bloring at lbl.gov  Mon Jun 13 20:23:53 2016
From: bloring at lbl.gov (Burlen Loring)
Date: Mon, 13 Jun 2016 17:23:53 -0700
Subject: [Numpy-discussion] numpy threads crash when allocating arrays
Message-ID: 

Hi All,

I'm working on a threaded pipeline where we want the end user to be able to
code up Python functions to do numerical work.
Threading is all done in C++11 and in each thread we've acquired the GIL
before we invoke the user provided Python callback and release it only when
the callback returns. We've used SWIG to expose bindings to C++ objects in
Python.

When run with more than 1 thread I get inconsistent segv's in various numpy
routines, and occasionally see a *** Reference count error detected: an
attempt was made to deallocate 11 (f) ***.

To pass data from C++ to the Python callback as numpy array we have

    206 // ****************************************************************************
    207 template <typename NT>
    208 PyArrayObject *new_object(teca_variant_array_impl<NT> *varrt)
    209 {
    210     // allocate a buffer
    211     npy_intp n_elem = varrt->size();
    212     size_t n_bytes = n_elem*sizeof(NT);
    213     NT *mem = static_cast<NT*>(malloc(n_bytes));
    214     if (!mem)
    215     {
    216         PyErr_Format(PyExc_RuntimeError,
    217             "failed to allocate %lu bytes", n_bytes);
    218         return nullptr;
    219     }
    220
    221     // copy the data
    222     memcpy(mem, varrt->get(), n_bytes);
    223
    224     // put the buffer in to a new numpy object
    225     PyArrayObject *arr = reinterpret_cast<PyArrayObject*>(
    226         PyArray_SimpleNewFromData(1, &n_elem, numpy_tt<NT>::code, mem));
    227     PyArray_ENABLEFLAGS(arr, NPY_ARRAY_OWNDATA);
    228
    229     return arr;
    230 }

This is the only place we create numpy objects in the C++ side.

In my demo the Python callback is as follows:

    33 def get_execute(rank, var_names):
    34     def execute(port, data_in, req):
    35         sys.stderr.write('descriptive_stats::execute MPI %d\n'%(rank))
    36
    37         mesh = as_teca_cartesian_mesh(data_in[0])
    38
    39         table = teca_table.New()
    40         table.copy_metadata(mesh)
    41
    42         table.declare_columns(['step','time'], ['ul','d'])
    43         table << mesh.get_time_step() << mesh.get_time()
    44
    45         for var_name in var_names:
    46
    47             table.declare_columns(['min '+var_name, 'avg '+var_name, \
    48                 'max '+var_name, 'std '+var_name, 'low_q '+var_name, \
    49                 'med '+var_name, 'up_q '+var_name], ['d']*7)
    50
    51             var = mesh.get_point_arrays().get(var_name).as_array()
    52
    53             table << float(np.min(var)) << float(np.average(var)) \
    54                 << float(np.max(var)) << float(np.std(var)) \
    55                 << map(float, np.percentile(var, [25.,50.,75.]))
    56
    57         return table
    58     return execute

this callback is the only spot where numpy is used. the as_array call is
implemented by new_object template above.

Further, if I remove our use of PyArray_SimpleNewFromData, by replacing
line 51 in the Python code above with var = np.array(range(1, 1100), 'f'),
the problem disappears. It must have something to do with use of
PyArray_SimpleNewFromData.

I'm at a loss to see why things are going south. I'm using the GIL and I
thought that would serialize the Python code. I suspect that numpy is using
global or static variables somewhere internally and that it's inherently
thread unsafe. Can anyone confirm/deny? maybe point me in the right
direction?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From njs at pobox.com  Mon Jun 13 22:07:38 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Mon, 13 Jun 2016 19:07:38 -0700
Subject: [Numpy-discussion] numpy threads crash when allocating arrays
In-Reply-To: 
References: 
Message-ID: 

Hi Burlen,

On Jun 13, 2016 5:24 PM, "Burlen Loring"  wrote:
>
> Hi All,
>
> I'm working on a threaded pipeline where we want the end user to be able
to code up Python functions to do numerical work. Threading is all done in
C++11 and in each thread we've acquired the GIL before we invoke the user
provided Python callback and release it only when the callback returns.
We've used SWIG to expose bindings to C++ objects in Python.
>
> When run with more than 1 thread I get inconsistent segv's in various
numpy routines, and occasionally see a *** Reference count error detected:
an attempt was made to deallocate 11 (f) ***.
>
> To pass data from C++ to the Python callback as numpy array we have
>>
>> 206 // ****************************************************************************
>> 207 template <typename NT>
>> 208 PyArrayObject *new_object(teca_variant_array_impl<NT> *varrt)
>> 209 {
>> 210     // allocate a buffer
>> 211     npy_intp n_elem = varrt->size();
>> 212     size_t n_bytes = n_elem*sizeof(NT);
>> 213     NT *mem = static_cast<NT*>(malloc(n_bytes));
>> 214     if (!mem)
>> 215     {
>> 216         PyErr_Format(PyExc_RuntimeError,
>> 217             "failed to allocate %lu bytes", n_bytes);
>> 218         return nullptr;
>> 219     }
>> 220
>> 221     // copy the data
>> 222     memcpy(mem, varrt->get(), n_bytes);
>> 223
>> 224     // put the buffer in to a new numpy object
>> 225     PyArrayObject *arr = reinterpret_cast<PyArrayObject*>(
>> 226         PyArray_SimpleNewFromData(1, &n_elem, numpy_tt<NT>::code, mem));
>> 227     PyArray_ENABLEFLAGS(arr, NPY_ARRAY_OWNDATA);
>> 228
>> 229     return arr;
>> 230 }

This code would probably be much simpler if you let numpy allocate the
buffer with PyArray_SimpleNew and then did the memcpy. I doubt that's
your problem, though. Numpy should be assuming that "owned" data was
allocated using malloc(), and if it were using a different allocator
then I think you'd be seeing crashes much sooner.

> This is the only place we create numpy objects in the C++ side.
>
> In my demo the Python callback is as follows:
>>
>> 33 def get_execute(rank, var_names):
>> 34     def execute(port, data_in, req):
>> 35         sys.stderr.write('descriptive_stats::execute MPI %d\n'%(rank))
>> 36
>> 37         mesh = as_teca_cartesian_mesh(data_in[0])
>> 38
>> 39         table = teca_table.New()
>> 40         table.copy_metadata(mesh)
>> 41
>> 42         table.declare_columns(['step','time'], ['ul','d'])
>> 43         table << mesh.get_time_step() << mesh.get_time()
>> 44
>> 45         for var_name in var_names:
>> 46
>> 47             table.declare_columns(['min '+var_name, 'avg '+var_name, \
>> 48                 'max '+var_name, 'std '+var_name, 'low_q '+var_name, \
>> 49                 'med '+var_name, 'up_q '+var_name], ['d']*7)
>> 50
>> 51             var = mesh.get_point_arrays().get(var_name).as_array()
>> 52
>> 53             table << float(np.min(var)) << float(np.average(var)) \
>> 54                 << float(np.max(var)) << float(np.std(var)) \
>> 55                 << map(float, np.percentile(var, [25.,50.,75.]))
>> 56
>> 57         return table
>> 58     return execute
>
> this callback is the only spot where numpy is used. the as_array call is
implemented by new_object template above.
> Further, if I remove our use of PyArray_SimpleNewFromData, by replacing
line 51 in the Python code above with var = np.array(range(1, 1100), 'f'),
the problem disappears. It must have something to do with use of
PyArray_SimpleNewFromData.
>
> I'm at a loss to see why things are going south. I'm using the GIL and I
thought that would serialize the Python code. I suspect that numpy is using
global or static variables somewhere internally and that it's inherently
thread unsafe. Can anyone confirm/deny? maybe point me in the right
direction?

Numpy does use global/static variables, and it is unsafe to call into
numpy simultaneously from different threads. But that's ok, because
you're not allowed to call numpy functions simultaneously from
different threads -- you have to hold the GIL first, and that
serializes access to all of numpy's internal state.
Numpy is very commonly used in threaded code and most people aren't seeing
random segfaults, so the problem is most likely in your code. Sorry I can't
help much more than that... I guess I'd start by triple-checking that the
code really truly does hold the GIL every time that it calls into
numpy/python APIs. I'd also try running it under valgrind in case it's some
other random memory corruption that's just showing up in a weird way.

-n

From bloring at lbl.gov  Tue Jun 14 13:34:13 2016
From: bloring at lbl.gov (Burlen Loring)
Date: Tue, 14 Jun 2016 10:34:13 -0700
Subject: [Numpy-discussion] numpy threads crash when allocating arrays
In-Reply-To: 
References: 
Message-ID: <9c39a92c-f79e-ebf2-51bc-dd2469c1703a@lbl.gov>

Nathaniel,

Thanks for the feedback. Investigations where I acquire the GIL in spots
where I already hold it led me to suspect that the issue is caused by use
of Py_BEGIN_ALLOW_THREADS or another macro NPY_BEGIN_ALLOW_THREADS.

To put this point to rest: I have a high degree of confidence in our code.
I have run the code in valgrind, and it runs cleanly when I use Python's
valgrind suppression file. It is certainly the case that the error is
harder to reproduce with valgrind, and it runs so slowly under valgrind
that running in a bash loop (the usual way I reproduce) is not practical.
I have examined crashes with gdb and I have verified that they occur while
I hold the GIL.

However, based on your suggestion, I experimented with acquiring the GIL in
certain spots in my code where it should not be necessary, either because
those code pieces are invoked by the main thread when there is only one
thread running, or because the code in question is invoked while I'm
already holding the GIL. I also added trace output to stderr in those spots
as well. Acquiring the GIL caused deadlocks as expected, except when I used
it in the new_object template (the only spot we pass data to numpy) below.
Acquiring the GIL here fixes the issues!

    206 // ****************************************************************************
    207 template <typename NT>
    208 PyArrayObject *new_object(teca_variant_array_impl<NT> *varrt)
    209 {
    210     PyGILState_STATE gstate; // experimental, I already hold the GIL higher up in stack
    211     gstate = PyGILState_Ensure();
    212     TECA_STATUS("teca_py_array::new_object");
    213
    214     // allocate a buffer
    215     npy_intp n_elem = varrt->size();
    216     size_t n_bytes = n_elem*sizeof(NT);
    217     NT *mem = static_cast<NT*>(malloc(n_bytes));
    218     if (!mem)
    219     {
    220         PyErr_Format(PyExc_RuntimeError,
    221             "failed to allocate %lu bytes", n_bytes);
    222         return nullptr;
    223     }
    224
    225     // copy the data
    226     memcpy(mem, varrt->get(), n_bytes);
    227
    228     // put the buffer in to a new numpy object
    229     PyArrayObject *arr = reinterpret_cast<PyArrayObject*>(
    230         PyArray_SimpleNewFromData(1, &n_elem, numpy_tt<NT>::code, mem));
    231     PyArray_ENABLEFLAGS(arr, NPY_ARRAY_OWNDATA);
    232
    233     PyGILState_Release(gstate);
    234     return arr;
    235 }

now, this function is only used from within the user provided Python
callback, which is invoked only while I'm holding the GIL, a fact that I've
verified via gdb stack traces. This function should be running serially due
to the fact that I hold the GIL. However, it's running concurrently, as
evidenced from the random crashes when it's used, and the garbled stderr
output, both of which go away with the above addition. I don't think that I
should have to acquire the GIL here, but the evidence is against me!
In Python docs on the GIL, I noticed that there's a macro
Py_BEGIN_ALLOW_THREADS that Python uses internally around blocking I/O and
heavy computations. much to my surprise grep reveals Py_BEGIN_ALLOW_THREADS
is used in numpy. I think this can explain the issues I'm experiencing.
Where numpy uses Py_BEGIN_ALLOW_THREADS it would let my threads, which
utilize the numpy C-API via new_object template above, run concurrently
despite the fact that I hold the GIL. It seems to me that in order for
numpy to be thread safe it should not use this macro at all. At least this
is my theory; I haven't yet had a chance to modify the numpy build to
verify. It's possible that Py_BEGIN_ALLOW_THREADS may be used elsewhere.

here's my question: given Py_BEGIN_ALLOW_THREADS is used by numpy how
can numpy be thread safe? and how can someone using the C-API know where
it's necessary to acquire the GIL? Maybe someone can explain this?

$grep BEGIN_ALLOW_THREADS ./ -rIn
./doc/source/f2py/signature-file.rst:250:    Use ``Py_BEGIN_ALLOW_THREADS .. Py_END_ALLOW_THREADS`` block
./doc/source/reference/c-api.array.rst:3092: .. c:macro:: NPY_BEGIN_ALLOW_THREADS
./doc/source/reference/c-api.array.rst:3094:     Equivalent to :c:macro:`Py_BEGIN_ALLOW_THREADS` except it uses
./build/src.linux-x86_64-2.7/numpy/core/src/multiarray/scalartypes.c:2109:        Py_BEGIN_ALLOW_THREADS
./build/src.linux-x86_64-2.7/numpy/core/src/multiarray/scalartypes.c:2125:        Py_BEGIN_ALLOW_THREADS
./build/src.linux-x86_64-2.7/numpy/core/src/multiarray/scalartypes.c:2143:        Py_BEGIN_ALLOW_THREADS
./build/src.linux-x86_64-2.7/numpy/core/src/multiarray/scalartypes.c:2159:        Py_BEGIN_ALLOW_THREADS
./build/src.linux-x86_64-2.7/numpy/core/src/multiarray/scalartypes.c:2177:        Py_BEGIN_ALLOW_THREADS
./build/src.linux-x86_64-2.7/numpy/core/src/multiarray/scalartypes.c:2193:        Py_BEGIN_ALLOW_THREADS
./build/src.linux-x86_64-2.7/numpy/core/src/multiarray/scalartypes.c:2211:        Py_BEGIN_ALLOW_THREADS
./build/src.linux-x86_64-2.7/numpy/core/src/multiarray/scalartypes.c:2227:        Py_BEGIN_ALLOW_THREADS
./build/lib.linux-x86_64-2.7/numpy/f2py/rules.py:419:    {isthreadsafe: '\t\t\tPy_BEGIN_ALLOW_THREADS'},
./build/lib.linux-x86_64-2.7/numpy/f2py/rules.py:457:    {isthreadsafe: '\tPy_BEGIN_ALLOW_THREADS'},
./build/lib.linux-x86_64-2.7/numpy/f2py/rules.py:495:    {isthreadsafe: '\tPy_BEGIN_ALLOW_THREADS'},
./build/lib.linux-x86_64-2.7/numpy/f2py/rules.py:541:    {isthreadsafe: '\tPy_BEGIN_ALLOW_THREADS'},
./build/lib.linux-x86_64-2.7/numpy/f2py/rules.py:583:    {isthreadsafe: '\t\tPy_BEGIN_ALLOW_THREADS'},
./numpy/f2py/rules.py:419:    {isthreadsafe: '\t\t\tPy_BEGIN_ALLOW_THREADS'},
./numpy/f2py/rules.py:457:    {isthreadsafe: '\tPy_BEGIN_ALLOW_THREADS'},
./numpy/f2py/rules.py:495:    {isthreadsafe: '\tPy_BEGIN_ALLOW_THREADS'},
./numpy/f2py/rules.py:541:    {isthreadsafe: '\tPy_BEGIN_ALLOW_THREADS'},
./numpy/f2py/rules.py:583:    {isthreadsafe: '\t\tPy_BEGIN_ALLOW_THREADS'},
./numpy/fft/fftpack_litemodule.c:45:    Py_BEGIN_ALLOW_THREADS;
./numpy/fft/fftpack_litemodule.c:98:    Py_BEGIN_ALLOW_THREADS;
./numpy/fft/fftpack_litemodule.c:135:    Py_BEGIN_ALLOW_THREADS;
./numpy/fft/fftpack_litemodule.c:191:    Py_BEGIN_ALLOW_THREADS;
./numpy/fft/fftpack_litemodule.c:254:    Py_BEGIN_ALLOW_THREADS;
./numpy/fft/fftpack_litemodule.c:295:    Py_BEGIN_ALLOW_THREADS;
./numpy/core/src/multiarray/ctors.c:3237:        NPY_BEGIN_ALLOW_THREADS;
./numpy/core/src/multiarray/ctors.c:3277:        NPY_BEGIN_ALLOW_THREADS;
./numpy/core/src/multiarray/cblasfuncs.c:378:    NPY_BEGIN_ALLOW_THREADS;
./numpy/core/src/multiarray/cblasfuncs.c:525:    NPY_BEGIN_ALLOW_THREADS;
./numpy/core/src/multiarray/cblasfuncs.c:549:    NPY_BEGIN_ALLOW_THREADS
./numpy/core/src/multiarray/cblasfuncs.c:576:    NPY_BEGIN_ALLOW_THREADS
./numpy/core/src/multiarray/cblasfuncs.c:628:    NPY_BEGIN_ALLOW_THREADS;
./numpy/core/src/multiarray/cblasfuncs.c:761:    NPY_BEGIN_ALLOW_THREADS;
./numpy/core/src/multiarray/compiled_base.c:157:    NPY_BEGIN_ALLOW_THREADS;
./numpy/core/src/multiarray/compiled_base.c:181:    NPY_BEGIN_ALLOW_THREADS;
./numpy/core/src/multiarray/compiled_base.c:473:    NPY_BEGIN_ALLOW_THREADS;
./numpy/core/src/multiarray/compiled_base.c:810:    NPY_BEGIN_ALLOW_THREADS;
./numpy/core/src/multiarray/compiled_base.c:1014:    NPY_BEGIN_ALLOW_THREADS;
./numpy/core/src/multiarray/compiled_base.c:1049:    NPY_BEGIN_ALLOW_THREADS;
./numpy/core/src/multiarray/scalartypes.c.src:939:        Py_BEGIN_ALLOW_THREADS
./numpy/core/src/multiarray/scalartypes.c.src:955:        Py_BEGIN_ALLOW_THREADS
./numpy/core/src/multiarray/convert.c:98:            NPY_BEGIN_ALLOW_THREADS;
./numpy/core/src/multiarray/convert.c:210:        NPY_BEGIN_ALLOW_THREADS;
./numpy/core/src/multiarray/multiarraymodule.c:3981:    Py_BEGIN_ALLOW_THREADS;
./numpy/core/include/numpy/ndarraytypes.h:932:#define NPY_BEGIN_ALLOW_THREADS Py_BEGIN_ALLOW_THREADS
./numpy/core/include/numpy/ndarraytypes.h:952:#define NPY_BEGIN_ALLOW_THREADS

On 06/13/2016 07:07 PM, Nathaniel Smith wrote:
> Hi Burlen,
>
> On Jun 13, 2016 5:24 PM, "Burlen Loring"  wrote:
>> Hi All,
>>
>> I'm working on a threaded pipeline where we want the end user to be able
to code up Python functions to do numerical work. Threading is all done in
C++11 and in each thread we've acquired the GIL before we invoke the user
provided Python callback and release it only when the callback returns.
We've used SWIG to expose bindings to C++ objects in Python.
>>
>> When run with more than 1 thread I get inconsistent segv's in various
numpy routines, and occasionally see a *** Reference count error detected:
an attempt was made to deallocate 11 (f) ***.
>>
>> To pass data from C++ to the Python callback as numpy array we have
>>> 206 // ****************************************************************************
>>> 207 template <typename NT>
>>> 208 PyArrayObject *new_object(teca_variant_array_impl<NT> *varrt)
>>> 209 {
>>> 210     // allocate a buffer
>>> 211     npy_intp n_elem = varrt->size();
>>> 212     size_t n_bytes = n_elem*sizeof(NT);
>>> 213     NT *mem = static_cast<NT*>(malloc(n_bytes));
>>> 214     if (!mem)
>>> 215     {
>>> 216         PyErr_Format(PyExc_RuntimeError,
>>> 217             "failed to allocate %lu bytes", n_bytes);
>>> 218         return nullptr;
>>> 219     }
>>> 220
>>> 221     // copy the data
>>> 222     memcpy(mem, varrt->get(), n_bytes);
>>> 223
>>> 224     // put the buffer in to a new numpy object
>>> 225     PyArrayObject *arr = reinterpret_cast<PyArrayObject*>(
>>> 226         PyArray_SimpleNewFromData(1, &n_elem, numpy_tt<NT>::code, mem));
>>> 227     PyArray_ENABLEFLAGS(arr, NPY_ARRAY_OWNDATA);
>>> 228
>>> 229     return arr;
>>> 230 }
>
> This code would probably be much simpler if you let numpy allocate the
> buffer with PyArray_SimpleNew and then did the memcpy. I doubt that's
> your problem, though. Numpy should be assuming that "owned" data was
> allocated using malloc(), and if it were using a different allocator
> then I think you'd be seeing crashes much sooner.
>
>> This is the only place we create numpy objects in the C++ side.
>>
>> In my demo the Python callback is as follows:
>>> 33 def get_execute(rank, var_names):
>>> 34     def execute(port, data_in, req):
>>> 35         sys.stderr.write('descriptive_stats::execute MPI %d\n'%(rank))
>>> 36
>>> 37         mesh = as_teca_cartesian_mesh(data_in[0])
>>> 38
>>> 39         table = teca_table.New()
>>> 40         table.copy_metadata(mesh)
>>> 41
>>> 42         table.declare_columns(['step','time'], ['ul','d'])
>>> 43         table << mesh.get_time_step() << mesh.get_time()
>>> 44
>>> 45         for var_name in var_names:
>>> 46
>>> 47             table.declare_columns(['min '+var_name, 'avg '+var_name, \
>>> 48                 'max '+var_name, 'std '+var_name, 'low_q '+var_name, \
>>> 49                 'med '+var_name, 'up_q '+var_name], ['d']*7)
>>> 50
>>> 51             var = mesh.get_point_arrays().get(var_name).as_array()
>>> 52
>>> 53             table << float(np.min(var)) << float(np.average(var)) \
>>> 54                 << float(np.max(var)) << float(np.std(var)) \
>>> 55                 << map(float, np.percentile(var, [25.,50.,75.]))
>>> 56
>>> 57         return table
>>> 58     return execute
>> this callback is the only spot where numpy is used. the as_array call is
implemented by new_object template above.
>> Further, if I remove our use of PyArray_SimpleNewFromData, by replacing
line 51 in the Python code above with var = np.array(range(1, 1100), 'f'),
the problem disappears. It must have something to do with use of
PyArray_SimpleNewFromData.
>>
>> I'm at a loss to see why things are going south. I'm using the GIL and I
thought that would serialize the Python code. I suspect that numpy is using
global or static variables somewhere internally and that it's inherently
thread unsafe. Can anyone confirm/deny? maybe point me in the right
direction?
> Numpy does use global/static variables, and it is unsafe to call into
> numpy simultaneously from different threads. But that's ok, because
> you're not allowed to call numpy functions simultaneously from
> different threads -- you have to hold the GIL first, and that
> serializes access to all of numpy's internal state. Numpy is very
> commonly used in threaded code and most people aren't seeing random
> segfaults, so the problem is most likely in your code. Sorry I can't
> help much more than that... I guess I'd start by triple-checking that
> the code really truly does hold the GIL every time that it calls into
> numpy/python APIs. I'd also try running it under valgrind in case it's
> some other random memory corruption that's just showing up in a weird
> way.
>
> -n
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jtaylor.debian at googlemail.com  Tue Jun 14 15:28:42 2016
From: jtaylor.debian at googlemail.com (Julian Taylor)
Date: Tue, 14 Jun 2016 21:28:42 +0200
Subject: [Numpy-discussion] numpy threads crash when allocating arrays
In-Reply-To: <9c39a92c-f79e-ebf2-51bc-dd2469c1703a@lbl.gov>
References: <9c39a92c-f79e-ebf2-51bc-dd2469c1703a@lbl.gov>
Message-ID: <57605AEA.6000609@googlemail.com>

On 14.06.2016 19:34, Burlen Loring wrote:
>
> here's my question: given Py_BEGIN_ALLOW_THREADS is used by numpy how
> can numpy be thread safe? and how can someone using the C-API know where
> it's necessary to acquire the GIL? Maybe someone can explain this?
>

numpy only releases the GIL when it is not accessing any python objects
or other non-threadsafe structures anymore.
That is usually during computation loops and IO.
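As a minimal pure-Python sketch of why this is normally safe (an
illustration, not from the original mail): several Python-created threads
can call numpy concurrently, because each thread owns the GIL whenever it
executes Python bytecode, and numpy drops it only around self-contained
C loops:

```
import threading
import numpy as np

def worker(out, i):
    # the allocation and all Python-level calls happen while this thread
    # holds the GIL; numpy may release it inside the sqrt/sum loops,
    # which touch no Python objects
    a = np.arange(1000000, dtype=np.float64)
    out[i] = float(np.sqrt(a).sum())

out = [None] * 4
threads = [threading.Thread(target=worker, args=(out, i)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(out)  # four identical sums, no crashes
```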
Your problem is indeed a missing PyGILState_Ensure

I am assuming that the threads you are using are not created by python, so
you don't have a threadstate setup and no GIL.
You do set it up with that function, see
https://docs.python.org/2/c-api/init.html#non-python-created-threads

From bloring at lbl.gov  Tue Jun 14 15:38:32 2016
From: bloring at lbl.gov (Burlen Loring)
Date: Tue, 14 Jun 2016 12:38:32 -0700
Subject: [Numpy-discussion] numpy threads crash when allocating arrays
In-Reply-To: <57605AEA.6000609@googlemail.com>
References: <9c39a92c-f79e-ebf2-51bc-dd2469c1703a@lbl.gov>
 <57605AEA.6000609@googlemail.com>
Message-ID: 

On 06/14/2016 12:28 PM, Julian Taylor wrote:
> On 14.06.2016 19:34, Burlen Loring wrote:
>
>>
>> here's my question: given Py_BEGIN_ALLOW_THREADS is used by numpy how
>> can numpy be thread safe? and how can someone using the C-API know where
>> it's necessary to acquire the GIL? Maybe someone can explain this?
>>
>
> numpy only releases the GIL when it is not accessing any python
> objects or other non-threadsafe structures anymore.
> That is usually during computation loops and IO.
>
>
> Your problem is indeed a missing PyGILState_Ensure
>
> I am assuming that the threads you are using are not created by
> python, so you don't have a threadstate setup and no GIL.
> You do set it up with that function, see
> https://docs.python.org/2/c-api/init.html#non-python-created-threads

I already hold the GIL in each thread via the mechanism you pointed to, and
I have verified this with gdb; somehow the GIL is being released.
Re-acquiring the GIL solves the issue, but it technically should cause a
deadlock to acquire it twice in the same thread. I suspect Numpy's use of
Py_BEGIN_ALLOW_THREADS is the cause of the issue. It will take some work to
verify.

From njs at pobox.com  Tue Jun 14 16:05:16 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Tue, 14 Jun 2016 13:05:16 -0700
Subject: [Numpy-discussion] numpy threads crash when allocating arrays
In-Reply-To: 
References: <9c39a92c-f79e-ebf2-51bc-dd2469c1703a@lbl.gov>
 <57605AEA.6000609@googlemail.com>
Message-ID: 

On Jun 14, 2016 12:38 PM, "Burlen Loring"  wrote:
>
> On 06/14/2016 12:28 PM, Julian Taylor wrote:
>>
>> On 14.06.2016 19:34, Burlen Loring wrote:
>>
>>>
>>> here's my question: given Py_BEGIN_ALLOW_THREADS is used by numpy how
>>> can numpy be thread safe? and how can someone using the C-API know where
>>> it's necessary to acquire the GIL? Maybe someone can explain this?
>>>
>>
>> numpy only releases the GIL when it is not accessing any python objects
or other non-threadsafe structures anymore.
>> That is usually during computation loops and IO.
>>
>>
>> Your problem is indeed a missing PyGILState_Ensure
>>
>> I am assuming that the threads you are using are not created by python,
so you don't have a threadstate setup and no GIL.
>> You do set it up with that function, see
https://docs.python.org/2/c-api/init.html#non-python-created-threads
>
> I already hold the GIL in each thread via the mechanism you pointed to,
and I have verified this with gdb; somehow the GIL is being released.
Re-acquiring the GIL solves the issue, but it technically should cause a
deadlock to acquire it twice in the same thread. I suspect Numpy's use of
Py_BEGIN_ALLOW_THREADS is the cause of the issue. It will take some work to
verify.

It's legal to call PyGILState_Ensure when you already have the GIL; the
whole point of that function is that you can use it whether you have the
GIL or not.
However, if you already have the GIL, then it's a no-op, so it shouldn't
have fixed your problems. If it did help, then this strongly suggests that
you've missed something in your analysis of when you hold the GIL.

While bugs are always possible, it's unlikely that this has anything to do
with numpy using Py_BEGIN_ALLOW_THREADS. In theory numpy's use is safe,
because it always follows the pattern of dropping the GIL, doing a chunk of
work that is careful not to touch any globals or the python api, and then
reacquiring the GIL. In practice it's possible that the code does something
else in some edge case, but if so then it's a pretty subtle issue that's
being triggered by some unusual thing about how you call into numpy.

-n
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From martino.sorbaro at ed.ac.uk  Wed Jun 15 09:16:36 2016
From: martino.sorbaro at ed.ac.uk (Martino Sorbaro)
Date: Wed, 15 Jun 2016 14:16:36 +0100
Subject: [Numpy-discussion] Axis argument to np.unique
Message-ID: <57615534.6020002@ed.ac.uk>

Hi all,
I've opened a new pull request
(https://github.com/numpy/numpy/pull/7742) trying to revive a previous
one that was left abandoned (#3584, by another contributor), regarding
the possibility of adding an 'axis=' argument to numpy.unique.

There had been a debate
(http://numpy-discussion.10968.n7.nabble.com/Adding-an-axis-argument-to-numpy-unique-td34841.html)
about what the axis argument should mean. The current behaviour in the
code I propose (written by the previous contributor) looks for unique
rows if "axis=0" and unique columns if "axis=1", in other words:

[In] a = array([[0, 0, 0],
                [1, 1, 1],
                [0, 0, 0]])

[In] unique(a, axis=0)
[Out] array([[0, 0, 0],
             [1, 1, 1]])


So, I would just like to ask whether a conclusion can be reached about
that discussion.
Thanks!
Martino

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

From ben.v.root at gmail.com  Wed Jun 15 09:31:56 2016
From: ben.v.root at gmail.com (Benjamin Root)
Date: Wed, 15 Jun 2016 09:31:56 -0400
Subject: [Numpy-discussion] Axis argument to np.unique
In-Reply-To: <57615534.6020002@ed.ac.uk>
References: <57615534.6020002@ed.ac.uk>
Message-ID: 

That seems like the only reasonable behavior, but I will admit that my
initial desire is essentially a vectorized "unique" such that it returns
the unique values of the stated axis. But that isn't possible because there
can be a different number of unique values in the given axis, resulting in
a ragged array, which numpy does not support.

Ben Root

On Wed, Jun 15, 2016 at 9:16 AM, Martino Sorbaro  wrote:

> Hi all,
> I've opened a new pull request
> (https://github.com/numpy/numpy/pull/7742) trying to revive a previous
> one that was left abandoned (#3584, by another contributor), regarding
> the possibility of adding an 'axis=' argument to numpy.unique.
>
> There had been a debate
> (
> http://numpy-discussion.10968.n7.nabble.com/Adding-an-axis-argument-to-numpy-unique-td34841.html
> )
> about what the axis argument should mean. The current behaviour in the
> code I propose (written by the previous contributor) looks for unique
> rows if "axis=0" and unique columns if "axis=1", in other words:
>
> [In] a = array([[0, 0, 0],
>                 [1, 1, 1],
>                 [0, 0, 0]])
>
> [In] unique(a, axis=0)
> [Out] array([[0, 0, 0],
>              [1, 1, 1]])
>
>
> So, I would just like to ask whether a conclusion can be reached about
> that discussion.
> Thanks!
> Martino
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bloring at lbl.gov  Thu Jun 16 14:19:50 2016
From: bloring at lbl.gov (Burlen Loring)
Date: Thu, 16 Jun 2016 11:19:50 -0700
Subject: [Numpy-discussion] numpy threads crash when allocating arrays
In-Reply-To: 
References: <9c39a92c-f79e-ebf2-51bc-dd2469c1703a@lbl.gov>
 <57605AEA.6000609@googlemail.com>
Message-ID: <4fdcf483-8452-462b-3b63-a270dd3b5812@lbl.gov>

On 06/14/2016 01:05 PM, Nathaniel Smith wrote:
>
> On Jun 14, 2016 12:38 PM, "Burlen Loring"  wrote:
> >
> > On 06/14/2016 12:28 PM, Julian Taylor wrote:
> >>
> >> On 14.06.2016 19:34, Burlen Loring wrote:
> >>
> >>>
> >>> here's my question: given Py_BEGIN_ALLOW_THREADS is used by numpy how
> >>> can numpy be thread safe? and how can someone using the C-API know
> where
> >>> it's necessary to acquire the GIL? Maybe someone can explain this?
> >>>
> >>
> >> numpy only releases the GIL when it is not accessing any python
> objects or other non-threadsafe structures anymore.
> >> That is usually during computation loops and IO.
> >>
> >>
> >> Your problem is indeed a missing PyGILState_Ensure
> >>
> >> I am assuming that the threads you are using are not created by
> python, so you don't have a threadstate setup and no GIL.
> >> You do set it up with that function, see
> https://docs.python.org/2/c-api/init.html#non-python-created-threads
> >
> > I already hold the GIL in each thread via the mechanism you pointed
> to, and I have verified this with gdb; somehow the GIL is being
> released. Re-acquiring the GIL solves the issue, but it technically
> should cause a deadlock to acquire it twice in the same thread. I
> suspect Numpy's use of Py_BEGIN_ALLOW_THREADS is the cause of the
> issue. It will take some work to verify.
>
> It's legal to call PyGILState_Ensure when you already have the GIL;
> the whole point of that function is that you can use it whether you
> have the GIL or not. However, if you already have the GIL, then it's a
> no-op, so it shouldn't have fixed your problems. If it did help, then
> this strongly suggests that you've missed something in your analysis
> of when you hold the GIL.
>
> While bugs are always possible, it's unlikely that this has anything
> to do with numpy using Py_BEGIN_ALLOW_THREADS. In theory numpy's use
> is safe, because it always follows the pattern of dropping the GIL,
> doing a chunk of work that is careful not to touch any globals or the
> python api, and then reacquiring the GIL. In practice it's possible
> that the code does something else in some edge case, but if so then
> it's a pretty subtle issue that's being triggered by some unusual
> thing about how you call into numpy.
>

Thank you guys for the feedback and being a sounding board for my
explorations and ideas. I think I got to the bottom of it. I think you are
right it has nothing to do with numpy. Also, I am indeed acquiring the GIL
before invoking the callback and releasing it after, which is the right
thing to do. However, it turns out SWIG brackets wrapped C++ code with
Py_BEGIN/END_ALLOW_THREADS blocks, thus any calls through SWIG wrapped code
from within the callback release the GIL!
I guess this normally wouldn't be an issue, except that I have used %extend
directives and used Python and Numpy C-API's in a bunch of places to
provide a Python-specific interface to our data structures or do stuff more
seamlessly and/or beyond what's possible with typemaps. SWIG releases the
GIL prior to invoking my extensions, which hit the C-API; chaos
subsequently ensues. I think the solution is to acquire the GIL again in
all these extensions where I touch the Python C-API. It seems to have
solved the problem!

Thanks and regrets for all the discussion on the numpy list that probably
belongs on the swig list.

Burlen
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rxu823 at gmail.com  Thu Jun 16 18:21:29 2016
From: rxu823 at gmail.com (Roger Xu)
Date: Thu, 16 Jun 2016 18:21:29 -0400
Subject: [Numpy-discussion] how did numpy make mkl use multiple threads?
Message-ID: 

Hi. I have tried a lot of things to make mkl use multiple threads. I just
learned to write a C extension for Python yesterday. Can someone explain
how numpy makes mkl use multiple threads? Numpy can make mkl do that on the
machine I am working on now.

My best try is described at
https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/640387

Two earlier attempts are described here:
https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/639846

and also here:
http://stackoverflow.com/questions/37536106/directly-use-intel-mkl-library-on-scipy-sparse-matrix-to-calculate-a-dot-a-t-wit

Thank you.

- rxu
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From gfyoung17 at gmail.com  Fri Jun 17 00:02:43 2016
From: gfyoung17 at gmail.com (G Young)
Date: Fri, 17 Jun 2016 00:02:43 -0400
Subject: [Numpy-discussion] broadcasting for randint
In-Reply-To: 
References: 
Message-ID: 

Hello all,

Thank you to those who commented on this PR and for pushing it to a *much
better place* in terms of templating with Tempita. With that issue out of
the way, it seems the momentum has stalled a bit. However, it would be
great to receive any additional feedback, *especially from maintainers* so
as to help get this merged!

Thanks!

On Tue, Jun 7, 2016 at 1:23 PM, G Young  wrote:

> There seems to be a push in my PR now for using Tempita as a way to solve
> this issue with the ad-hoc templating. However, before I go about
> attempting this, it would be great to receive feedback from other
> developers on this, especially from some of the numpy maintainers. Thanks!
>
> On Tue, Jun 7, 2016 at 3:04 AM, G Young  wrote:
>
>> Just wanted to ping the mailing list again in case this email (see below)
>> got lost in your inboxes. Would be great to get some feedback on this!
>> Thanks!
>>
>> On Sun, May 22, 2016 at 2:15 AM, G Young  wrote:
>>
>>> Hi,
>>>
>>> I have had a PR open for
>>> quite some time now that allows arguments to broadcast in *randint*.
>>> While the functionality is fully in-place and very robust, the obstacle at
>>> this point is the implementation.
>>>
>>> When the *dtype* parameter was added to *randint* (see here
>>> ), a big issue with the
>>> implementation was that it created so much duplicate code that it would be
>>> a huge maintenance nightmare. However, this was dismissed in the original
>>> PR message because it was believed that template-ing would be trivial,
>>> which seemed reasonable at the time.
>>>
>>> When I added broadcasting, I introduced a template system to the code
>>> that dramatically cut down on the duplication.
However, the obstacle has
>>> been whether or not this template system is too *ad hoc* to be merged
>>> into the library. Implementing a template in Cython was not considered
>>> sufficient and is in fact very tricky to do, and unfortunately, I have not
>>> received any constructive suggestions from maintainers about how to
>>> proceed, so I'm opening this up to the mailing list to see whether or not
>>> there are better alternatives to what I did, whether this should be merged
>>> as is, or whether this should be tabled until a better template can be
>>> found.
>>>
>>> Thanks!
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From gfyoung17 at gmail.com  Fri Jun 17 00:04:21 2016
From: gfyoung17 at gmail.com (G Young)
Date: Fri, 17 Jun 2016 00:04:21 -0400
Subject: [Numpy-discussion] axis parameter for count_nonzero
In-Reply-To: 
References: 
Message-ID: 

Just wanted to ping the mailing list again about this PR. Just to recap,
some simplification has been done thanks to some suggestions by
*@rgommers*, though the question still remains whether or not leaving the
*axis* parameter in the Python API for now (given how complicated it is to
add in the C API) is acceptable. I will say that in response to the concern
of adding parameters such as "out" and "keepdims" (should they be
requested), we could avail ourselves of functions like median for help as
*@juliantaylor* pointed out. The *scipy* library has dealt with this
problem as well in its *sparse* modules, so that is also a useful resource.

Feedback on this issue would be much appreciated! Thanks!

On Sun, May 22, 2016 at 1:36 PM, G Young  wrote:

> After some discussion with *@rgommers*, I have simplified the code as
> follows:
>
> 1) the path to the original count_nonzero in the C API is essentially
> unchanged, save some small overhead with Python calling and the
> if-statement to check the *axis* parameter
>
> 2) All of the complicated validation of the *axis* parameter and
> acrobatics for getting the count is handled *only* after we cannot
> fast-track via a numerical, boolean, or string *dtype*.
>
> The question still remains whether or not leaving the *axis* parameter in
> the Python API for now (given how complicated it is to add in the C API) is
> acceptable. I will say that in response to the concern of adding
> parameters such as "out" and "keepdims" (should they be requested), we
> could avail ourselves of functions like median
> for
> help as *@juliantaylor* pointed out. The *scipy* library has dealt with
> this problem as well in its *sparse* modules, so that is also a useful
> resource.
>
> On Sun, May 22, 2016 at 1:35 PM, G Young  wrote:
>
>> 1) Correction: The PR was not written with small arrays in mind. I ran
>> some new timing tests, and it does perform worse on smaller arrays but
>> appears to scale better than the current implementation.
>>
>> 2) Let me put it out there that I am not opposed to moving it to C, but
>> right now, there seems to be a large technical brick wall up against such
>> an implementation. So suggestions about how to move the code into C would
>> be welcome too!
>>
>> On Sun, May 22, 2016 at 10:32 AM, Ralf Gommers 
>> wrote:
>>
>>>
>>>
>>> On Sun, May 22, 2016 at 3:05 AM, G Young  wrote:
>>>
>>>> Hi,
>>>>
>>>> I have had a PR open (first
>>>> draft can be found here ) for
>>>> quite some time now that adds an 'axis' parameter to *count_nonzero*.
>>>> While the functionality is fully in-place, very robust, and actually
>>>> higher-performing than the original *count_nonzero* function, the
>>>> obstacle at this point is the implementation, as most of the functionality
>>>> is now surfaced at the Python level instead of at the C level.
>>>>
>>>> I have made several attempts to move the code into C to no avail and
>>>> have not received much feedback from maintainers unfortunately to move this
>>>> forward, so I'm opening this up to the mailing list to see what you guys
>>>> think of the changes and whether or not it should be merged in as is or be
>>>> tabled until a more C-friendly solution can be found.
>>>>
>>>
>>> The discussion is spread over several PRs/issues, so maybe a summary is
>>> useful:
>>>
>>> - adding an axis parameter was a feature request that was generally
>>> approved of [1]
>>> - writing the axis selection/validation code in C, like the rest of
>>> count_nonzero, was preferred by several core devs
>>> - Writing that C code turns out to be tricky. Jaime had a PR for doing
>>> this for bincount [2], but closed it with final conclusion "the proper
>>> approach seems to me to build some intermediate layer over nditer that
>>> abstracts the complexity away".
>>> - Julian pointed out that this adds a ufunc-like param, so why not add
>>> other params like out/keepdims [3]
>>> - Stephan points out that the current PR has quite a few branches, would
>>> benefit from reusing a helper function (like _validate_axis, but that may
>>> not do exactly the right thing), and that he doesn't want to merge it as is
>>> without further input from other devs [4].
>>>
>>> Points previously not raised that I can think of:
>>> - count_nonzero is also in the C API [5], the axis parameter is now only
>>> added to the Python API.
>>> - Part of why the code in this PR is complex is to keep performance for
>>> small arrays OK, but there's no benchmarks added or result given for the
>>> existing benchmark [6]. A simple check with:
>>>     x = np.arange(100)
>>>     %timeit np.count_nonzero(x)
>>> shows that that gets about 30x slower (330 ns vs 10.5 us on my machine).
>>>
>>> It looks to me like performance is a concern, and if that can be
>>> resolved there's the broader discussion of whether it's a good idea to
>>> merge this PR at all. That's a trade-off of adding a useful feature vs.
>>> technical debt / maintenance burden plus divergence Python/C API. Also,
>>> what do we do when we merge this and then next week someone else sends a PR
>>> adding a keepdims or out keyword? For these kinds of additions it would
>>> feel better if we were sure that the new version is the final/desired one
>>> for the foreseeable future.
>>>
>>> Ralf
>>>
>>>
>>> [1] https://github.com/numpy/numpy/issues/391
>>> [2] https://github.com/numpy/numpy/pull/4330#issuecomment-77791250
>>> [3] https://github.com/numpy/numpy/pull/7138#issuecomment-177202894
>>> [4] https://github.com/numpy/numpy/pull/7177
>>> [5]
>>> http://docs.scipy.org/doc/numpy/reference/c-api.array.html#c.PyArray_CountNonzero
>>> [6]
>>> https://github.com/numpy/numpy/blob/master/benchmarks/benchmarks/bench_ufunc.py#L70
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at scipy.org
>>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From oleksandr.pavlyk at intel.com  Fri Jun 17 11:08:19 2016
From: oleksandr.pavlyk at intel.com (Pavlyk, Oleksandr)
Date: Fri, 17 Jun 2016 15:08:19 +0000
Subject: [Numpy-discussion] Design feedback solicitation
Message-ID: <4C9EDA7282E297428F3986994EB0FBD38342AA@ORSMSX110.amr.corp.intel.com>

Hi,

I am new to this list, so I will start with an introduction. My name is
Oleksandr Pavlyk. I now work at Intel Corp. on the Intel Distribution for
Python, and previously worked at Wolfram Research for 12 years. My latest
project was to write a mirror to numpy.random, named numpy.random_intel.
The module uses MKL to sample from different distributions for efficiency.
It provides support for different underlying algorithms for basic
pseudo-random number generation, i.e. in addition to MT19937, it also
provides SFMT19937, MT2203, etc.

I recently published a blog about it:

https://software.intel.com/en-us/blogs/2016/06/15/faster-random-number-generation-in-intel-distribution-for-python

I originally attempted to simply replace numpy.random in the Intel
Distribution for Python with the new module, but due to fixed seed
backwards incompatibility this results in numerous test failures in numpy,
scipy, pandas and other modules.

Unlike numpy.random, the new module generates a vector of random numbers
at a time, which can be done faster than repeatedly generating the same
number of variates one at a time.

The source code for the new module is not upstreamed yet, and this email
is meant to solicit early community feedback to allow for faster
acceptance of the proposed changes.

Thank you,
Oleksandr
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From robert.kern at gmail.com  Fri Jun 17 11:22:45 2016
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 17 Jun 2016 16:22:45 +0100
Subject: [Numpy-discussion] Design feedback solicitation
In-Reply-To: <4C9EDA7282E297428F3986994EB0FBD38342AA@ORSMSX110.amr.corp.intel.com>
References: <4C9EDA7282E297428F3986994EB0FBD38342AA@ORSMSX110.amr.corp.intel.com>
Message-ID: 

On Fri, Jun 17, 2016 at 4:08 PM, Pavlyk, Oleksandr <
oleksandr.pavlyk at intel.com> wrote:
>
> Hi,
>
> I am new to this list, so I will start with an introduction. My name is
Oleksandr Pavlyk. I now work at Intel Corp. on the Intel Distribution for
Python, and previously worked at Wolfram Research for 12 years. My latest
project was to write a mirror to numpy.random, named numpy.random_intel.
The module uses MKL to sample from different distributions for efficiency.
It provides support for different underlying algorithms for basic
pseudo-random number generation, i.e. in addition to MT19937, it also
provides SFMT19937, MT2203, etc.
>
> I recently published a blog about it:
>
> https://software.intel.com/en-us/blogs/2016/06/15/faster-random-number-generation-in-intel-distribution-for-python
>
> I originally attempted to simply replace numpy.random in the Intel
Distribution for Python with the new module, but due to fixed seed
backwards incompatibility this results in numerous test failures in numpy,
scipy, pandas and other modules.
>
> Unlike numpy.random, the new module generates a vector of random numbers
at a time, which can be done faster than repeatedly generating the same
number of variates one at a time.
>
> The source code for the new module is not upstreamed yet, and this email
is meant to solicit early community feedback to allow for faster acceptance
of the proposed changes.

Cool!
You can find pertinent discussion here:

  https://github.com/numpy/numpy/issues/6967

And the current effort for adding new core PRNGs here:

  https://github.com/bashtage/ng-numpy-randomstate

--
Robert Kern
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ralf.gommers at gmail.com  Fri Jun 17 14:30:34 2016
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Fri, 17 Jun 2016 20:30:34 +0200
Subject: [Numpy-discussion] Deprecating silent truncation of floats when
 assigned to int array
In-Reply-To: 
References: 
Message-ID: 

On Tue, Jun 14, 2016 at 12:23 AM, Ian Henriksen <
insertinterestingnamehere at gmail.com> wrote:

> Personally, I think this is a great idea. +1 to more informative errors.
>

+1 from me as well

Ralf


> Best,
> Ian Henriksen
>
> On Mon, Jun 13, 2016 at 2:11 PM Nathaniel Smith  wrote:
>
>> It was recently pointed out:
>>
>>   https://github.com/numpy/numpy/issues/7730
>>
>> that this code silently truncates floats:
>>
>> In [1]: a = np.arange(10)
>>
>> In [2]: a.dtype
>> Out[2]: dtype('int64')
>>
>> In [3]: a[3] = 1.5
>>
>> In [4]: a[3]
>> Out[4]: 1
>>
>> The proposal is that we should deprecate this, and eventually turn it
>> into an error. Any objections?
>>
>> We recently went through a similar deprecation cycle for in-place
>> operations, i.e., this used to silently truncate but now raises an
>> error:
>>
>> In [1]: a = np.arange(10)
>>
>> In [2]: a += 1.5
>>
>> ---------------------------------------------------------------------------
>> TypeError                                 Traceback (most recent call
>> last)
>>  in ()
>> ----> 1 a += 1.5
>>
>> TypeError: Cannot cast ufunc add output from dtype('float64') to
>> dtype('int64') with casting rule 'same_kind'
>>
>> so the proposal here is to extend this to regular assignment.
>>
>> -n
>>
>> --
>> Nathaniel J. Smith -- https://vorpus.org
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From charlesr.harris at gmail.com  Fri Jun 17 16:41:57 2016
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 17 Jun 2016 14:41:57 -0600
Subject: [Numpy-discussion] Design feedback solicitation
In-Reply-To: 
References: <4C9EDA7282E297428F3986994EB0FBD38342AA@ORSMSX110.amr.corp.intel.com>
 
Message-ID: 

On Fri, Jun 17, 2016 at 9:22 AM, Robert Kern  wrote:

> On Fri, Jun 17, 2016 at 4:08 PM, Pavlyk, Oleksandr <
> oleksandr.pavlyk at intel.com> wrote:
> >
> > Hi,
> >
> > I am new to this list, so I will start with an introduction. My name is
> Oleksandr Pavlyk. I now work at Intel Corp. on the Intel Distribution for
> Python, and previously worked at Wolfram Research for 12 years. My latest
> project was to write a mirror to numpy.random, named numpy.random_intel.
> The module uses MKL to sample from different distributions for efficiency.
> It provides support for different underlying algorithms for basic
> pseudo-random number generation, i.e. in addition to MT19937, it also
> provides SFMT19937, MT2203, etc.
> >
> > I recently published a blog about it:
> >
> > https://software.intel.com/en-us/blogs/2016/06/15/faster-random-number-generation-in-intel-distribution-for-python
> >
> > I originally attempted to simply replace numpy.random in the Intel
> Distribution for Python with the new module, but because the streams
> produced for a fixed seed are not backwards compatible, this results in
> numerous test failures in numpy, scipy, pandas and other modules.
> >
> > Unlike numpy.random, the new module generates a vector of random numbers
> at a time, which can be done faster than repeatedly generating the same
> number of variates one at a time.
> >
> > The source code for the new module is not upstreamed yet, and this email
> is meant to solicit early community feedback to allow for faster acceptance
> of the proposed changes.
>
> Cool! You can find pertinent discussion here:
>
> https://github.com/numpy/numpy/issues/6967
>
> And the current effort for adding new core PRNGs here:
>
> https://github.com/bashtage/ng-numpy-randomstate
>

I wonder if the easiest thing to do at this point might be to implement a
new redesigned random module and keep the old one around for backward
compatibility? Not that that would make everything easy, but at least folks
could choose to use the new functions for speed and versatility if they
needed them. The current random module is pretty stable so maintenance
should not be too onerous.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From evgeny.burovskiy at gmail.com  Mon Jun 20 09:31:44 2016
From: evgeny.burovskiy at gmail.com (Evgeni Burovski)
Date: Mon, 20 Jun 2016 14:31:44 +0100
Subject: [Numpy-discussion] scipy 0.18 release candidate 1
Message-ID:

Hi,

I'm pleased to announce the availability of the first release
candidate for scipy 0.18.0.
Please try this release and report any issues on the Github tracker,
https://github.com/scipy/scipy, or the scipy-dev mailing list.
Source tarballs and release notes are available from Github releases,
https://github.com/scipy/scipy/releases/tag/v0.18.0rc1

Please note that this is a source-only release. We do not provide
Windows binaries for this release. OS X and Linux wheels will be
provided for the final release.

The current release schedule is

27 June: rc2 (if necessary)
11 July: final release

Thanks to everyone who contributed to this release!

Cheers,

Evgeni


A part of the release notes follows:



==========================
SciPy 0.18.0 Release Notes
==========================

.. note:: Scipy 0.18.0 is not released yet!

.. contents::

SciPy 0.18.0 is the culmination of 6 months of hard work. It contains
many new features, numerous bug-fixes, improved test coverage and
better documentation. There have been a number of deprecations and
API changes in this release, which are documented below. All users
are encouraged to upgrade to this release, as there are a large number
of bug-fixes and optimizations. Moreover, our development attention
will now shift to bug-fix releases on the 0.18.x branch, and on adding
new features on the master branch.

This release requires Python 2.7 or 3.4-3.5 and NumPy 1.7.1 or greater.

Highlights of this release include:

- - A new ODE solver for two-point boundary value problems,
  `scipy.integrate.solve_bvp`.
- - A new class, `CubicSpline`, for cubic spline interpolation of data.
- - N-dimensional tensor product polynomials, `scipy.interpolate.NdPPoly`.
- - Spherical Voronoi diagrams, `scipy.spatial.SphericalVoronoi`.
- - Support for discrete-time linear systems, `scipy.signal.dlti`.
New features
============

`scipy.integrate` improvements
- ------------------------------

A solver of two-point boundary value problems for ODE systems has been
implemented in `scipy.integrate.solve_bvp`. The solver allows for
non-separated boundary conditions, unknown parameters and certain singular
terms. It finds a C1 continuous solution using a fourth-order collocation
algorithm.

`scipy.interpolate` improvements
- --------------------------------

Cubic spline interpolation is now available via
`scipy.interpolate.CubicSpline`. This class represents a piecewise cubic
polynomial passing through given points that is C2 continuous. It is
represented in the standard polynomial basis on each segment.

A representation of n-dimensional tensor product piecewise polynomials is
available as the `scipy.interpolate.NdPPoly` class.

Univariate piecewise polynomial classes, `PPoly` and `BPoly`, can now be
evaluated on periodic domains. Use the ``extrapolate="periodic"`` keyword
argument for this.

`scipy.fftpack` improvements
- ----------------------------

The `scipy.fftpack.next_fast_len` function computes the next "regular"
number for FFTPACK. Padding the input to this length can give a significant
performance increase for `scipy.fftpack.fft`.

`scipy.signal` improvements
- ---------------------------

Resampling using polyphase filtering has been implemented in the function
`scipy.signal.resample_poly`. This method upsamples a signal, applies a
zero-phase low-pass FIR filter, and downsamples using
`scipy.signal.upfirdn` (which is also new in 0.18.0). This method can be
faster than FFT-based filtering provided by `scipy.signal.resample` for
some signals.

`scipy.signal.firls`, which constructs FIR filters using least-squares
error minimization, was added.

`scipy.signal.sosfiltfilt`, which does forward-backward filtering like
`scipy.signal.filtfilt` but for second-order sections, was added.

Discrete-time linear systems
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`scipy.signal.dlti` provides an implementation of discrete-time linear
systems. Accordingly, the `StateSpace`, `TransferFunction` and
`ZerosPolesGain` classes have gained a new keyword, `dt`, which can be used
to create discrete-time instances of the corresponding system
representation.

`scipy.sparse` improvements
- ---------------------------

The functions `sum`, `max`, `mean`, `min`, `transpose`, and `reshape` in
`scipy.sparse` have had their signatures augmented with additional
arguments and functionality so as to improve compatibility with analogously
defined functions in `numpy`.

Sparse matrices now have a `count_nonzero` method, which counts the number
of nonzero elements in the matrix. Unlike the `getnnz()` method and the
``nnz`` property, which return the number of stored entries (the length of
the data attribute), this method counts the actual number of non-zero
entries in data.

`scipy.optimize` improvements
- -----------------------------

The implementation of Nelder-Mead minimization,
`scipy.optimize.minimize(..., method="Nelder-Mead")`, obtained a new
keyword, `initial_simplex`, which can be used to specify the initial
simplex for the optimization process (a short sketch follows at the end of
this section).

Initial step size selection in CG and BFGS minimizers has been improved. We
expect that this change will improve numeric stability of optimization in
some cases. See pull request gh-5536 for details.

Handling of infinite bounds in SLSQP optimization has been improved. We
expect that this change will improve numeric stability of optimization in
some cases. See pull request gh-6024 for details.
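A minimal sketch of the new ``initial_simplex`` keyword (the objective
function and the simplex vertices below are illustrative only)::

    import numpy as np
    from scipy.optimize import minimize

    rosen = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
    # N+1 vertices for N=2 parameters
    sim = np.array([[1.2, 1.2], [1.4, 1.2], [1.2, 1.4]])
    res = minimize(rosen, x0=sim[0], method="Nelder-Mead",
                   options={"initial_simplex": sim})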
A large suite of global optimization benchmarks has been added to
``scipy/benchmarks/go_benchmark_functions``. See pull request gh-4191 for
details.

Nelder-Mead and Powell minimization will now only set defaults for maximum
iterations or function evaluations if neither limit is set by the caller.
In some cases with a slowly converging function and only one limit set, the
minimization may continue for longer than with previous versions and so is
more likely to reach convergence. See issue gh-5966.

`scipy.stats` improvements
- --------------------------

Trapezoidal distribution has been implemented as `scipy.stats.trapz`. Skew
normal distribution has been implemented as `scipy.stats.skewnorm`. Burr
type XII distribution has been implemented as `scipy.stats.burr12`. Three-
and four-parameter kappa distributions have been implemented as
`scipy.stats.kappa3` and `scipy.stats.kappa4`, respectively.

The new `scipy.stats.iqr` function computes the interquartile range of a
distribution.

Random matrices
~~~~~~~~~~~~~~~

`scipy.stats.special_ortho_group` and `scipy.stats.ortho_group` provide
generators of random matrices in the SO(N) and O(N) groups, respectively.
They generate matrices in the Haar distribution, the only uniform
distribution on these group manifolds.

`scipy.stats.random_correlation` provides a generator for random
correlation matrices, given specified eigenvalues.

`scipy.linalg` improvements
- ---------------------------

`scipy.linalg.svd` gained a new keyword argument, ``lapack_driver``.
Available drivers are ``gesdd`` (default) and ``gesvd``.

`scipy.linalg.lapack.ilaver` returns the version of the LAPACK library
SciPy links to.

`scipy.spatial` improvements
- ----------------------------

Boolean distances, `scipy.spatial.pdist`, have been sped up. Improvements
vary by the function and the input size. In many cases, one can expect a
speed-up of x2--x10.

New class `scipy.spatial.SphericalVoronoi` constructs Voronoi diagrams on
the surface of a sphere. See pull request gh-5232 for details.

`scipy.cluster` improvements
- ----------------------------

A new clustering algorithm, the nearest neighbor chain algorithm, has been
implemented for `scipy.cluster.hierarchy.linkage`. As a result, one can
expect a significant algorithmic improvement (:math:`O(N^2)` instead of
:math:`O(N^3)`) for several linkage methods.

`scipy.special` improvements
- ----------------------------

The new function `scipy.special.loggamma` computes the principal branch of
the logarithm of the Gamma function. For real input, ``loggamma`` is
compatible with `scipy.special.gammaln`. For complex input, it has more
consistent behavior in the complex plane and should be preferred over
``gammaln``.

Vectorized forms of spherical Bessel functions have been implemented as
`scipy.special.spherical_jn`, `scipy.special.spherical_kn`,
`scipy.special.spherical_in` and `scipy.special.spherical_yn`. They are
recommended for use over the ``sph_*`` functions, which are now deprecated.

Several special functions have been extended to the complex domain and/or
have seen domain/stability improvements. This includes `spence`,
`digamma`, `log1p` and several others.

Deprecated features
===================

The cross-class properties of `lti` systems have been deprecated. The
following cross-class properties will raise a `DeprecationWarning` (the
first group warns on accessing or setting, the second only on setting):

* `StateSpace`: `num`, `den`, `gain` (access/set); `zeros`, `poles` (set)
* `TransferFunction`: `A`, `B`, `C`, `D`, `gain` (access/set); `zeros`,
  `poles` (set)
* `ZerosPolesGain`: `A`, `B`, `C`, `D`, `num`, `den` (access/set)
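A short sketch of the deprecated access pattern; the explicit conversion
shown as the replacement is an assumption of this sketch::

    from scipy.signal import TransferFunction

    sys = TransferFunction([1], [1, 1])
    A = sys.A          # cross-class property: now emits a DeprecationWarning
    A = sys.to_ss().A  # explicit conversion to state space instead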
Spherical Bessel functions, ``sph_in``, ``sph_jn``, ``sph_kn``, ``sph_yn``,
``sph_jnyn`` and ``sph_inkn``, have been deprecated in favor of
`scipy.special.spherical_jn`, ``spherical_yn``, ``spherical_kn`` and
``spherical_in``.

The following functions in `scipy.constants` are deprecated: ``C2K``,
``K2C``, ``C2F``, ``F2C``, ``F2K`` and ``K2F``. They are superseded by a
new function `scipy.constants.convert_temperature` that can perform all
those conversions plus to/from the Rankine temperature scale.

Backwards incompatible changes
==============================

`scipy.optimize`
- ----------------

The convergence criterion for ``optimize.bisect``, ``optimize.brentq``,
``optimize.brenth``, and ``optimize.ridder`` now works the same as
``numpy.allclose``.

`scipy.ndimage`
- ---------------

The offset in ``ndimage.interpolation.affine_transform`` is now
consistently added after the matrix is applied, independent of whether the
matrix is specified using a one-dimensional or a two-dimensional array.

`scipy.stats`
- -------------

``stats.ks_2samp`` used to return nonsensical values if the input was not
real or contained nans. It now raises an exception for such inputs.

Several deprecated methods of `scipy.stats` distributions have been
removed: ``est_loc_scale``, ``vecfunc``, ``veccdf`` and
``vec_generic_moment``.

Deprecated functions ``nanmean``, ``nanstd`` and ``nanmedian`` have been
removed from `scipy.stats`. These functions were deprecated in scipy 0.15.0
in favor of their `numpy` equivalents.

A bug in the ``rvs()`` method of the distributions in `scipy.stats` has
been fixed. When arguments to ``rvs()`` were given that were shaped for
broadcasting, in many cases the returned random samples were not random. A
simple example of the problem is ``stats.norm.rvs(loc=np.zeros(10))``.
Because of the bug, that call would return 10 identical values. The bug
only affected code that relied on the broadcasting of the shape, location
and scale parameters.

The ``rvs()`` method also accepted some arguments that it should not have.
There is a potential for backwards incompatibility in cases where ``rvs()``
accepted arguments that are not, in fact, compatible with broadcasting. An
example is

    stats.gamma.rvs([2, 5, 10, 15], size=(2,2))

The shape of the first argument is not compatible with the requested size,
but the function still returned an array with shape (2, 2). In scipy 0.18,
that call generates a ``ValueError``.

`scipy.io`
- ----------

`scipy.io.netcdf` masking now gives precedence to the ``_FillValue``
attribute over the ``missing_value`` attribute, if both are given. Also,
data are only treated as missing if they match one of these attributes
exactly: values that differ by roundoff from ``_FillValue`` or
``missing_value`` are no longer treated as missing values.

`scipy.interpolate`
- -------------------

The `scipy.interpolate.PiecewisePolynomial` class has been removed. It was
deprecated in scipy 0.14.0, and `scipy.interpolate.BPoly.from_derivatives`
serves as a drop-in replacement.

Other changes
=============

Scipy now uses ``setuptools`` for its builds instead of plain distutils.
This fixes usage of ``install_requires='scipy'`` in the ``setup.py`` files
of projects that depend on Scipy (see Numpy issue gh-6551 for details). It
potentially affects the way that build/install methods for Scipy itself
behave though. Please report any unexpected behavior on the Scipy issue
tracker.

Pull request gh-6240 changes the interpretation of the `maxfun` option in
`L-BFGS-B` based routines in the `scipy.optimize` module. An `L-BFGS-B`
search consists of multiple iterations, with each iteration consisting of
one or more function evaluations. Whereas the old search strategy
terminated immediately upon reaching `maxfun` function evaluations, the new
strategy allows the current iteration to finish despite reaching `maxfun`.

The bundled copy of Qhull in the `scipy.spatial` subpackage has been
upgraded to version 2015.2.

The bundled copy of ARPACK in the `scipy.sparse.linalg` subpackage has been
upgraded to arpack-ng 3.3.0.

The bundled copy of SuperLU in the `scipy.sparse` subpackage has been
upgraded to version 5.1.1.

Authors
=======

* @endolith
* @yanxun827 +
* @kleskjr +
* @MYheavyGo +
* @solarjoe +
* Gregory Allen +
* Gilles Aouizerate +
* Tom Augspurger +
* Henrik Bengtsson +
* Felix Berkenkamp
* Per Brodtkorb
* Lars Buitinck
* Daniel Bunting +
* Evgeni Burovski
* CJ Carey
* Tim Cera
* Grey Christoforo +
* Robert Cimrman
* Philip DeBoer +
* Yves Delley +
* Dávid Bodnár +
* Ion Elberdin +
* Gabriele Farina +
* Yu Feng
* Andrew Fowlie +
* Joseph Fox-Rabinovitz
* Simon Gibbons +
* Neil Girdhar +
* Kolja Glogowski +
* Christoph Gohlke
* Ralf Gommers
* Todd Goodall +
* Johnnie Gray +
* Alex Griffing
* Olivier Grisel
* Thomas Haslwanter +
* Michael Hirsch +
* Derek Homeier
* Golnaz Irannejad +
* Marek Jacob +
* InSuk Joung +
* Tetsuo Koyama +
* Eugene Krokhalev +
* Eric Larson
* Denis Laxalde
* Antony Lee
* Jerry Li +
* Henry Lin +
* Nelson Liu +
* Loïc Estève
* Lei Ma +
* Osvaldo Martin +
* Stefano Martina +
* Nikolay Mayorov
* Matthieu Melot +
* Sturla Molden
* Eric Moore
* Alistair Muldal +
* Maniteja Nandana
* Tavi Nathanson +
* Andrew Nelson
* Joel Nothman
* Behzad Nouri
* Nikolai Nowaczyk +
* Juan Nunez-Iglesias +
* Ted Pudlik
* Eric Quintero
* Yoav Ram
* Jonas Rauber +
* Tyler Reddy +
* Juha Remes
* Garrett Reynolds +
* Ariel Rokem +
* Fabian Rost +
* Bill Sacks +
* Jona Sassenhagen +
* Marcello Seri +
* Sourav Singh +
* Martin Spacek +
* Søren Fuglede Jørgensen +
* Bhavika Tekwani +
* Martin Thoma +
* Sam Tygier +
* Meet Udeshi +
* Utkarsh Upadhyay
* Bram Vandekerckhove +
* Sebastián Vanrell +
* Ze Vinicius +
* Pauli Virtanen
* Stefan van der Walt
* Warren Weckesser
* Jakub Wilk +
* Josh Wilson
* Phillip J. Wolfram +
* Nathan Woods
* Haochen Wu
* G Young +

A total of 99 people contributed to this release. People with a "+" by
their names contributed a patch for the first time. This list of names is
automatically generated, and may not be fully complete.

From alan.isaac at gmail.com  Mon Jun 20 16:31:04 2016
From: alan.isaac at gmail.com (Alan Isaac)
Date: Mon, 20 Jun 2016 16:31:04 -0400
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
In-Reply-To:
References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org> <1e673c3b-8bea-c7b7-3ccd-d35f417741ee@gmail.com> <575B5B2E.5090709@gmail.com> <575EE869.9080401@gmail.com>
Message-ID:

On 6/13/2016 1:54 PM, Marten van Kerkwijk wrote:
> 1.
What in principle is the best return type for int ** int (which > Josef I think most properly rephrased as whether `**` should be > thought of as a float operator, like `/` in python3 and `sqrt` etc.); Perhaps the question is somewhat different. Maybe it is: what type should a user expect when the exponent is a Python int? The obvious choices seem to be an object array of Python ints, or an array of floats. So far, nobody has proposed the former, and concerns have been expressed about the latter. More important, either would break the rule that the scalar type is not important in array operations, which seems like a good general rule (useful and easy to remember). How much commitment is there to such a rule? E.g., np.int64(2**7)*np.arange(5,dtype=np.int8) violates this. One thing that has come out of this discussion for me is that the actual rules in play are hard to keep track of. Are they all written down in one place? I suspect there is general support for the idea that if someone explicitly specifies the same dtype for the base and the exponent then the result should also have that dtype. I think this is already true for array exponentiation and for scalar exponentiation. One other thing that a user might expect, I believe, is that any type promotion rules for scalars and arrays will be the same. This is not currently the case, and that feels like an inconsistency. But is it an inconsistency? If the rule is that that array type dominates the scalar type, that may be understandable, but then it should be a firm rule. In this case, an exponent that is a Python int should not affect the dtype of the (array) result. In sum, as a user, I've come around to Chuck's original proposal: integers raised to negative integer powers raise an error. My reason for coming around is that I believe it meshes well with a general rule that in binary operations the scalar dtypes should not influence the dtype of an array result. Otoh, it is unclear to me how much commitment there is to that rule. Thanks in advance to anyone who can help me understand better the issues in play. Cheers, Alan Isaac From alan.isaac at gmail.com Mon Jun 20 17:11:54 2016 From: alan.isaac at gmail.com (Alan Isaac) Date: Mon, 20 Jun 2016 17:11:54 -0400 Subject: [Numpy-discussion] Integers to integer powers, let's make a decision In-Reply-To: <575B5B2E.5090709@gmail.com> References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org> <1e673c3b-8bea-c7b7-3ccd-d35f417741ee@gmail.com> <575B5B2E.5090709@gmail.com> Message-ID: <15d20753-ce82-9802-592e-a39bae3fcd23@gmail.com> On 6/10/2016 8:28 PM, Allan Haldane wrote: > My understanding is that numpy never upcasts based on the values, it > upcasts based on the datatype ranges. 
> > http://docs.scipy.org/doc/numpy-1.10.1/reference/ufuncs.html#casting-rules >>> (np.int64(2**6)*np.arange(5,dtype=np.int8)).dtype dtype('int8') >>> (np.int64(2**7)*np.arange(5,dtype=np.int8)).dtype dtype('int16') fwiw, Alan Isaac From njs at pobox.com Mon Jun 20 17:59:44 2016 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 20 Jun 2016 14:59:44 -0700 Subject: [Numpy-discussion] Integers to integer powers, let's make a decision In-Reply-To: <15d20753-ce82-9802-592e-a39bae3fcd23@gmail.com> References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org> <1e673c3b-8bea-c7b7-3ccd-d35f417741ee@gmail.com> <575B5B2E.5090709@gmail.com> <15d20753-ce82-9802-592e-a39bae3fcd23@gmail.com> Message-ID: On Mon, Jun 20, 2016 at 2:11 PM, Alan Isaac wrote: > On 6/10/2016 8:28 PM, Allan Haldane wrote: >> >> My understanding is that numpy never upcasts based on the values, it >> upcasts based on the datatype ranges. >> >> http://docs.scipy.org/doc/numpy-1.10.1/reference/ufuncs.html#casting-rules > > > > >>>> (np.int64(2**6)*np.arange(5,dtype=np.int8)).dtype > dtype('int8') >>>> (np.int64(2**7)*np.arange(5,dtype=np.int8)).dtype > dtype('int16') If you have the time to check for existing bug reports about this, and file a new bug if you don't find one, then it'd be appreciated. I suspect it's something that would be better handled as part of an overhaul of the casting rules in general rather than on its own, but it'd be good to at least have a record of it somewhere. -n -- Nathaniel J. Smith -- https://vorpus.org From josef.pktd at gmail.com Mon Jun 20 18:09:18 2016 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 20 Jun 2016 18:09:18 -0400 Subject: [Numpy-discussion] Integers to integer powers, let's make a decision In-Reply-To: References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org> <1e673c3b-8bea-c7b7-3ccd-d35f417741ee@gmail.com> <575B5B2E.5090709@gmail.com> <575EE869.9080401@gmail.com> Message-ID: On Mon, Jun 20, 2016 at 4:31 PM, Alan Isaac wrote: > On 6/13/2016 1:54 PM, Marten van Kerkwijk wrote: >> >> 1. What in principle is the best return type for int ** int (which >> Josef I think most properly rephrased as whether `**` should be >> thought of as a float operator, like `/` in python3 and `sqrt` etc.); > > > > Perhaps the question is somewhat different. Maybe it is: what type > should a user expect when the exponent is a Python int? The obvious > choices seem to be an object array of Python ints, or an array of > floats. So far, nobody has proposed the former, and concerns have > been expressed about the latter. More important, either would break > the rule that the scalar type is not important in array operations, > which seems like a good general rule (useful and easy to remember). > > How much commitment is there to such a rule? E.g., > np.int64(2**7)*np.arange(5,dtype=np.int8) > violates this. One thing that has come out of this > discussion for me is that the actual rules in play are > hard to keep track of. Are they all written down in > one place? > > I suspect there is general support for the idea that if someone > explicitly specifies the same dtype for the base and the > exponent then the result should also have that dtype. > I think this is already true for array exponentiation > and for scalar exponentiation. > > One other thing that a user might expect, I believe, is that > any type promotion rules for scalars and arrays will be the same. > This is not currently the case, and that feels like an > inconsistency. 
But is it an inconsistency? If the rule is that
> that array type dominates the scalar type, that may
> be understandable, but then it should be a firm rule.
> In this case, an exponent that is a Python int should not
> affect the dtype of the (array) result.
>
> In sum, as a user, I've come around to Chuck's original proposal:
> integers raised to negative integer powers raise an error.
> My reason for coming around is that I believe it meshes
> well with a general rule that in binary operations the
> scalar dtypes should not influence the dtype of an array result.
> Otoh, it is unclear to me how much commitment there is to that rule.
>
> Thanks in advance to anyone who can help me understand better
> the issues in play.

the main thing I get out of the discussion in this thread is that this
is way too complicated.

which ints do I have?

is it Python or one of the many numpy int types, or two different
(u)int types or maybe one is a scalar so it shouldn't count?


scalar dominates here

>>> (np.ones(5, np.int8) *1.0).dtype
dtype('float64')

otherwise a huge amount of code would be broken that uses the *1. trick

Josef

>
> Cheers,
> Alan Isaac
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion

From njs at pobox.com  Mon Jun 20 18:15:00 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Mon, 20 Jun 2016 15:15:00 -0700
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
In-Reply-To:
References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org> <1e673c3b-8bea-c7b7-3ccd-d35f417741ee@gmail.com> <575B5B2E.5090709@gmail.com> <575EE869.9080401@gmail.com>
Message-ID:

On Mon, Jun 20, 2016 at 3:09 PM, wrote:
> On Mon, Jun 20, 2016 at 4:31 PM, Alan Isaac wrote:
>> On 6/13/2016 1:54 PM, Marten van Kerkwijk wrote:
>>>
>>> 1. What in principle is the best return type for int ** int (which
>>> Josef I think most properly rephrased as whether `**` should be
>>> thought of as a float operator, like `/` in python3 and `sqrt` etc.);
>>
>>
>>
>> Perhaps the question is somewhat different. Maybe it is: what type
>> should a user expect when the exponent is a Python int? The obvious
>> choices seem to be an object array of Python ints, or an array of
>> floats. So far, nobody has proposed the former, and concerns have
>> been expressed about the latter. More important, either would break
>> the rule that the scalar type is not important in array operations,
>> which seems like a good general rule (useful and easy to remember).
>>
>> How much commitment is there to such a rule? E.g.,
>> np.int64(2**7)*np.arange(5,dtype=np.int8)
>> violates this. One thing that has come out of this
>> discussion for me is that the actual rules in play are
>> hard to keep track of. Are they all written down in
>> one place?
>>
>> I suspect there is general support for the idea that if someone
>> explicitly specifies the same dtype for the base and the
>> exponent then the result should also have that dtype.
>> I think this is already true for array exponentiation
>> and for scalar exponentiation.
>>
>> One other thing that a user might expect, I believe, is that
>> any type promotion rules for scalars and arrays will be the same.
>> This is not currently the case, and that feels like an
>> inconsistency. But is it an inconsistency? If the rule is that
>> that array type dominates the scalar type, that may
>> be understandable, but then it should be a firm rule.
>> In this case, an exponent that is a Python int should not
>> affect the dtype of the (array) result.
>>
>> In sum, as a user, I've come around to Chuck's original proposal:
>> integers raised to negative integer powers raise an error.
>> My reason for coming around is that I believe it meshes
>> well with a general rule that in binary operations the
>> scalar dtypes should not influence the dtype of an array result.
>> Otoh, it is unclear to me how much commitment there is to that rule.
>>
>> Thanks in advance to anyone who can help me understand better
>> the issues in play.
>
> the main thing I get out of the discussion in this thread is that this
> is way too complicated.
>
> which ints do I have?
>
> is it Python or one of the many numpy int types, or two different
> (u)int types or maybe one is a scalar so it shouldn't count?
>
>
> scalar dominates here
>
> >>> (np.ones(5, np.int8) *1.0).dtype
> dtype('float64')
>
> otherwise a huge amount of code would be broken that uses the *1. trick

I *think* the documented rule is that scalar *kind* matters (so we pay
attention to it being a float) but scalar *type* doesn't (we ignore
whether it's float64 versus float32) and scalar *value* doesn't (we
ignore whether it's 1.0 or 2.0**53). Obviously even this is not 100%
true, but I think it is the original intent.

My suspicion is that a better rule would be: *Python* types (int,
float, bool) are treated as having an unspecified width, but all numpy
types/dtypes are treated the same regardless of whether they're a
scalar or not. So np.int8(2) * 2 would return an int8, but np.int8(2)
* np.int64(2) would return an int64. But this is totally separate from
the issues around **, and would require a longer discussion and larger
overhaul of the typing system.

-n

--
Nathaniel J. Smith -- https://vorpus.org

From sebastian at sipsolutions.net  Mon Jun 20 18:22:56 2016
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 21 Jun 2016 00:22:56 +0200
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
In-Reply-To:
References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org> <1e673c3b-8bea-c7b7-3ccd-d35f417741ee@gmail.com> <575B5B2E.5090709@gmail.com> <575EE869.9080401@gmail.com>
Message-ID: <1466461376.25298.3.camel@sipsolutions.net>

On Mo, 2016-06-20 at 15:15 -0700, Nathaniel Smith wrote:
> On Mon, Jun 20, 2016 at 3:09 PM, wrote:
> >
> > On Mon, Jun 20, 2016 at 4:31 PM, Alan Isaac wrote:
> > >
> > > On 6/13/2016 1:54 PM, Marten van Kerkwijk wrote:
> > > >
> > > > 1. What in principle is the best return type for int ** int (which
> > > > Josef I think most properly rephrased as whether `**` should be
> > > > thought of as a float operator, like `/` in python3 and `sqrt`
> > > > etc.);
> > >
> > > Perhaps the question is somewhat different. Maybe it is: what type
> > > should a user expect when the exponent is a Python int? The obvious
> > > choices seem to be an object array of Python ints, or an array of
> > > floats. So far, nobody has proposed the former, and concerns have
> > > been expressed about the latter. More important, either would break
> > > the rule that the scalar type is not important in array operations,
> > > which seems like a good general rule (useful and easy to remember).
> > > How much commitment is there to such a rule? E.g.,
> > > np.int64(2**7)*np.arange(5,dtype=np.int8)
> > > violates this. One thing that has come out of this
> > > discussion for me is that the actual rules in play are
> > > hard to keep track of. Are they all written down in
> > > one place?
> > >
> > > I suspect there is general support for the idea that if someone
> > > explicitly specifies the same dtype for the base and the
> > > exponent then the result should also have that dtype.
> > > I think this is already true for array exponentiation
> > > and for scalar exponentiation.
> > >
> > > One other thing that a user might expect, I believe, is that
> > > any type promotion rules for scalars and arrays will be the same.
> > > This is not currently the case, and that feels like an
> > > inconsistency. But is it an inconsistency? If the rule is that
> > > that array type dominates the scalar type, that may
> > > be understandable, but then it should be a firm rule.
> > > In this case, an exponent that is a Python int should not
> > > affect the dtype of the (array) result.
> > >
> > > In sum, as a user, I've come around to Chuck's original proposal:
> > > integers raised to negative integer powers raise an error.
> > > My reason for coming around is that I believe it meshes
> > > well with a general rule that in binary operations the
> > > scalar dtypes should not influence the dtype of an array result.
> > > Otoh, it is unclear to me how much commitment there is to that rule.
> > >
> > > Thanks in advance to anyone who can help me understand better
> > > the issues in play.
> >
> > the main thing I get out of the discussion in this thread is that this
> > is way too complicated.
> >
> > which ints do I have?
> >
> > is it Python or one of the many numpy int types, or two different
> > (u)int types or maybe one is a scalar so it shouldn't count?
> >
> > scalar dominates here
> >
> > >>> (np.ones(5, np.int8) *1.0).dtype
> > dtype('float64')
> >
> > otherwise a huge amount of code would be broken that uses the *1. trick
>
> I *think* the documented rule is that scalar *kind* matters (so we pay
> attention to it being a float) but scalar *type* doesn't (we ignore
> whether it's float64 versus float32) and scalar *value* doesn't (we
> ignore whether it's 1.0 or 2.0**53). Obviously even this is not 100%
> true, but I think it is the original intent.
>

Except for int types, which force a result type large enough to hold
the input value.

> My suspicion is that a better rule would be: *Python* types (int,
> float, bool) are treated as having an unspecified width, but all numpy
> types/dtypes are treated the same regardless of whether they're a
> scalar or not. So np.int8(2) * 2 would return an int8, but np.int8(2)
> * np.int64(2) would return an int64. But this is totally separate from
> the issues around **, and would require a longer discussion and larger
> overhaul of the typing system.
>

I agree with that. The rule makes sense for python types, but somewhat
creates oddities for numpy types and could probably just be made more
array-like there.

- Sebastian

> -n
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part
URL: 
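Whichever rule wins, the workarounds mentioned in this thread remain
valid — a minimal sketch (assuming a float result is what is actually
wanted):

    >>> import numpy as np
    >>> a = np.arange(1, 5)
    >>> a ** -1.0              # float exponent: well-defined float result
    >>> a.astype(float) ** -1  # explicit cast, equivalent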
From matthew.brett at gmail.com  Mon Jun 20 23:27:21 2016
From: matthew.brett at gmail.com (Matthew Brett)
Date: Mon, 20 Jun 2016 20:27:21 -0700
Subject: [Numpy-discussion] scipy 0.18 release candidate 1
In-Reply-To:
References:
Message-ID:

Hi,

On Mon, Jun 20, 2016 at 6:31 AM, Evgeni Burovski wrote:
> Hi,
>
> I'm pleased to announce the availability of the first release
> candidate for scipy 0.18.0.
> Please try this release and report any issues on the Github tracker,
> https://github.com/scipy/scipy, or the scipy-dev mailing list.
> Source tarballs and release notes are available from Github releases,
> https://github.com/scipy/scipy/releases/tag/v0.18.0rc1
>
> Please note that this is a source-only release. We do not provide
> Windows binaries for this release. OS X and Linux wheels will be
> provided for the final release.
>
> The current release schedule is
>
> 27 June: rc2 (if necessary)
> 11 July: final release
>
> Thanks to everyone who contributed to this release!
>
> Cheers,
>
> Evgeni
>
>
> A part of the release notes follows:
>
>
>
> ==========================
> SciPy 0.18.0 Release Notes
> ==========================
>
> .. note:: Scipy 0.18.0 is not released yet!
>
> .. contents::
>
> SciPy 0.18.0 is the culmination of 6 months of hard work. It contains
> many new features, numerous bug-fixes, improved test coverage and
> better documentation. There have been a number of deprecations and
> API changes in this release, which are documented below. All users
> are encouraged to upgrade to this release, as there are a large number
> of bug-fixes and optimizations. Moreover, our development attention
> will now shift to bug-fix releases on the 0.18.x branch, and on adding
> new features on the master branch.
>
> This release requires Python 2.7 or 3.4-3.5 and NumPy 1.7.1 or greater.

Thanks a lot for taking on the release.

I put the manylinux1 and OSX wheel building into a single repo to test
64- and 32-bit linux wheels. There's a test run with the 0.18.0rc1
code here:

https://travis-ci.org/MacPython/scipy-wheels/builds/139084454

For Python 3 I am getting these errors:
https://github.com/scipy/scipy/issues/6292

For all 32-bit builds I am getting this error:
https://github.com/scipy/scipy/issues/6093

For the Python 3 32-bit builds I am also getting this error:
https://github.com/scipy/scipy/issues/6101

For the builds that succeeded without failure (all OSX and manylinux1
for 64 bit Python 2.7), you can test with:

python -m pip install -U pip
pip install --trusted-host wheels.scipy.org -f
https://wheels.scipy.org -U --pre scipy

Thanks again, sorry for the tiring news,

Matthew

From charlesr.harris at gmail.com  Tue Jun 21 10:58:33 2016
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 21 Jun 2016 08:58:33 -0600
Subject: [Numpy-discussion] block function
Message-ID:

Hi All,

I've updated Stefan Otte's block function enhancement at
https://github.com/numpy/numpy/pull/7768. Could folks interested in that
function review the proposed grammar for the creation of blocked arrays?

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From evgeny.burovskiy at gmail.com Tue Jun 21 13:29:22 2016 From: evgeny.burovskiy at gmail.com (Evgeni Burovski) Date: Tue, 21 Jun 2016 18:29:22 +0100 Subject: [Numpy-discussion] scipy 0.18 release candidate 1 In-Reply-To: References: Message-ID: On Tue, Jun 21, 2016 at 4:27 AM, Matthew Brett wrote: > Hi, > > On Mon, Jun 20, 2016 at 6:31 AM, Evgeni Burovski > wrote: >> Hi, >> >> I'm pleased to announce the availability of the first release >> candidate for scipy 0.18.0. >> Please try this release and report any issues on Github tracker, >> https://github.com/scipy/scipy, or scipy-dev mailing list. >> Source tarballs and release notes are available from Github releases, >> https://github.com/scipy/scipy/releases/tag/v0.18.0rc1 >> >> Please note that this is a source-only release. We do not provide >> Windows binaries for this release. OS X and Linux wheels will be >> provided for the final release. >> >> The current release schedule is >> >> 27 June: rc2 (if necessary) >> 11 July: final release >> >> Thanks to everyone who contributed to this release! >> >> Cheers, >> >> Evgeni >> >> >> A part of the release notes follows: >> >> >> >> ========================== >> SciPy 0.18.0 Release Notes >> ========================== >> >> .. note:: Scipy 0.18.0 is not released yet! >> >> .. contents:: >> >> SciPy 0.18.0 is the culmination of 6 months of hard work. It contains >> many new features, numerous bug-fixes, improved test coverage and >> better documentation. There have been a number of deprecations and >> API changes in this release, which are documented below. All users >> are encouraged to upgrade to this release, as there are a large number >> of bug-fixes and optimizations. Moreover, our development attention >> will now shift to bug-fix releases on the 0.19.x branch, and on adding >> new features on the master branch. >> >> This release requires Python 2.7 or 3.4-3.5 and NumPy 1.7.1 or greater. > > Thanks a lot for taking on the release. > > I put the manylinux1 and OSX wheel building into a single repo to test > 64- and 32-bit linux wheels. There's a test run with the 0.18.0rc1 > code here: > > https://travis-ci.org/MacPython/scipy-wheels/builds/139084454 > > For Python 3 I am getting these errors: > https://github.com/scipy/scipy/issues/6292 > > For all 32-bit builds I am getting this error: > https://github.com/scipy/scipy/issues/6093 > > For the Python 3 32-bit builds I am also getting this error: > https://github.com/scipy/scipy/issues/6101 > > For the builds that succeeded without failure (all OSX and manylinux1 > for 64 bit Python 2.7), you can test with: > > python -m pip install -U pip > pip install --trusted-host wheels.scipy.org -f > https://wheels.scipy.org -U --pre scipy > > Thanks again, sorry for the tiring news, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion Thanks Matthew for testing and reporting these! Two out of three: https://github.com/scipy/scipy/pull/6295 https://github.com/scipy/scipy/pull/6293 The Qhull failure is a bit more mysterious. 
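For anyone exercising the release candidate, the standard test hook is a
cheap smoke check of an installed build (``scipy.test('full')`` runs the
longer suite):

    python -c "import scipy; print(scipy.__version__); scipy.test()"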
From alan.isaac at gmail.com  Tue Jun 21 18:54:06 2016
From: alan.isaac at gmail.com (Alan Isaac)
Date: Tue, 21 Jun 2016 18:54:06 -0400
Subject: [Numpy-discussion] Integers to integer powers, let's make a decision
In-Reply-To:
References: <253516852486873130.010471sturla.molden-gmail.com@news.gmane.org> <1e673c3b-8bea-c7b7-3ccd-d35f417741ee@gmail.com> <575B5B2E.5090709@gmail.com> <15d20753-ce82-9802-592e-a39bae3fcd23@gmail.com>
Message-ID:

On 6/20/2016 5:59 PM, Nathaniel Smith wrote:
> If you have the time to check for existing bug reports about this, and
> file a new bug if you don't find one, then it'd be appreciated.

https://github.com/numpy/numpy/issues/7770

Alan

From jocjo at mail.dk  Tue Jun 21 20:38:56 2016
From: jocjo at mail.dk (Hans Larsen)
Date: Wed, 22 Jun 2016 02:38:56 +0200
Subject: [Numpy-discussion] Support of '@='?
Message-ID: <19e7e58d-f348-9ba6-69ac-467a26d9d120@mail.dk>

I have Python 3.5.1 and NumPy 1.11, Windows 64 bits!
When will 'M @= P' be supported, beside 'M = M @ P'? :-(

--
Hans Larsen Galgebakken Sønder 4-11A 2620 Albertslund Danmark/Danio

From sebastian at sipsolutions.net  Wed Jun 22 13:53:43 2016
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 22 Jun 2016 19:53:43 +0200
Subject: [Numpy-discussion] Support of '@='?
In-Reply-To: <19e7e58d-f348-9ba6-69ac-467a26d9d120@mail.dk> (sfid-20160622_024027_187610_CE0CE8EF)
References: <19e7e58d-f348-9ba6-69ac-467a26d9d120@mail.dk> (sfid-20160622_024027_187610_CE0CE8EF)
Message-ID: <1466618023.8746.3.camel@sipsolutions.net>

On Mi, 2016-06-22 at 02:38 +0200, Hans Larsen wrote:
> I have Python 3.5.1 and NumPy 1.11, Windows 64 bits!
> When will 'M @= P' be supported, beside 'M = M @ P'? :-(
>

When someone gets around to making it a well-defined operation? ;)

Just to be clear, `M @= P` is probably not quite `M = M @ P`, because
the result should probably be `temp = M @ P; M[...] = temp`. Now this
operation needs a copy back to the original array from a temporary
array (you can't do it truly in-place, because you would still need the
values in M after overwriting them).

Just if you are curious why it is an error at the moment. We can't have
Python expand it into the not-in-place version (the `M = M @ P`
meaning), but copying over the result is a bit annoying and nobody was
quite sure about it, so it was delayed.

- Sebastian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part
URL: 
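The point about the temporary, as a minimal sketch of what a hypothetical
`M @= P` would have to do internally:

    import numpy as np

    M = np.arange(4.0).reshape(2, 2)
    P = np.eye(2) * 2
    temp = M @ P    # full temporary: overwriting M element by element would
    M[...] = temp   # corrupt inputs still needed for later output elements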
From m.h.vankerkwijk at gmail.com  Wed Jun 22 14:32:54 2016
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Wed, 22 Jun 2016 14:32:54 -0400
Subject: [Numpy-discussion] Support of '@='?
In-Reply-To: <1466618023.8746.3.camel@sipsolutions.net>
References: <19e7e58d-f348-9ba6-69ac-467a26d9d120@mail.dk> <1466618023.8746.3.camel@sipsolutions.net>
Message-ID:

>
> Just if you are curious why it is an error at the moment. We can't have
> Python expand it into the not-in-place version (the `M = M @ P`
> meaning), but copying over the result is a bit annoying and nobody was
> quite sure about it, so it was delayed.

The problem with using out in-place is clear from trying
`np.matmul(a, a, out=a)`:
```
In [487]: a
array([[ 1.       ,  0.       ,  0.       ],
       [ 0.       ,  0.8660254,  0.5      ],
       [ 0.       , -0.5      ,  0.8660254]])

In [488]: np.matmul(a, a)
Out[488]:
array([[ 1.       ,  0.       ,  0.       ],
       [ 0.       ,  0.5      ,  0.8660254],
       [ 0.       , -0.8660254,  0.5      ]])

In [489]: np.matmul(a, a, out=a)
Out[489]:
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
```
It would seem hard to avoid doing the copying (though obviously one
should iterate over higher dimensions, i.e., temp.shape = M.shape[-2:]).
Not dissimilar from cumsum etc., which are also not true ufuncs (but
where things can be made to work by ensuring operations are done in the
right order).
-- Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From njs at pobox.com  Wed Jun 22 15:39:15 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Wed, 22 Jun 2016 12:39:15 -0700
Subject: [Numpy-discussion] Support of '@='?
In-Reply-To: <19e7e58d-f348-9ba6-69ac-467a26d9d120@mail.dk>
References: <19e7e58d-f348-9ba6-69ac-467a26d9d120@mail.dk>
Message-ID:

To repeat and (hopefully) clarify/summarize the other answers:

It's been left out on purpose so far.

Why was it left out? A few reasons:

- Usually in-place operations like "a += b" are preferred over the
out-of-place equivalents like "a[...] = a + b" because they avoid some
copies and potentially large temporary arrays. But for @= this is
impossible -- you have to make a temporary copy of the whole matrix,
because otherwise you find yourself writing output elements on top of
input elements that you're still using. So it's probably better style
to write this as "a[...] = a @ b": this makes it more clear to the
reader that a potentially large temporary array is being allocated.

- The one place where this doesn't apply, and where "a @= b" really
could be a performance win, is when working with higher dimensional
stacks of matrices. In this case we still have to make a temporary
copy of each matrix, but only of one matrix at a time, not the whole
stack together.

- But, not that many people are using matrix stacks yet, and in any
case "a @= b" is limited to cases where both matrices are square. And
making it efficient in the stacked case may require some non-trivial
surgery on the internals. So there hasn't been much urgency to fix this.

My guess is that eventually it will be supported because the stacked
matrix use case is somewhat compelling, but it will take a bit until
someone (maybe you!) decides they care enough and have the time/energy
to fix it.

-n

On Jun 21, 2016 17:39, "Hans Larsen" wrote:

> I have Python 3.5.1 and NumPy 1.11, Windows 64 bits!
> When will 'M @= P' be supported, beside 'M = M @ P'? :-(
>
> --
> Hans Larsen Galgebakken Sønder 4-11A 2620 Albertslund Danmark/Danio
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From jocjo at mail.dk  Fri Jun 24 02:40:58 2016
From: jocjo at mail.dk (Hans Larsen)
Date: Fri, 24 Jun 2016 08:40:58 +0200
Subject: [Numpy-discussion] Support of '@='?
In-Reply-To:
References: <19e7e58d-f348-9ba6-69ac-467a26d9d120@mail.dk>
Message-ID: <5ddbc45c-85c5-7450-4ede-4e35b59b6875@mail.dk>

I thank you for the answer!!! But this business with "temp" is also in
core Python behind the scenes! ;-)

On 22-06-2016 at 21:39, Nathaniel Smith wrote:
>
> To repeat and (hopefully) clarify/summarize the other answers:
>
> It's been left out on purpose so far.
>
> Why was it left out? A few reasons:
>
> - Usually in-place operations like "a += b" are preferred over the
> out-of-place equivalents like "a[...] = a + b" because they avoid some
> copies and potentially large temporary arrays. But for @= this is
> impossible -- you have to make a temporary copy of the whole matrix,
> because otherwise you find yourself writing output elements on top of
> input elements that you're still using. So it's probably better style
> to write this as "a[...] = a @ b": this makes it more clear to the
> reader that a potentially large temporary array is being allocated.
>
> - The one place where this doesn't apply, and where "a @= b" really
> could be a performance win, is when working with higher dimensional
> stacks of matrices. In this case we still have to make a temporary
> copy of each matrix, but only of one matrix at a time, not the whole
> stack together.
>
> - But, not that many people are using matrix stacks yet, and in any
> case "a @= b" is limited to cases where both matrices are square. And
> making it efficient in the stacked case may require some non-trivial
> surgery on the internals. So there hasn't been much urgency to fix this.
>
> My guess is that eventually it will be supported because the stacked
> matrix use case is somewhat compelling, but it will take a bit until
> someone (maybe you!) decides they care enough and have the time/energy
> to fix it.
>
> -n
>
> On Jun 21, 2016 17:39, "Hans Larsen" > wrote:
>
>     I have Python 3.5.1 and NumPy 1.11, Windows 64 bits!
>     When will 'M @= P' be supported, beside 'M = M @ P'? :-(
>
>     --
>     Hans Larsen Galgebakken Sønder 4-11A 2620 Albertslund Danmark/Danio
>     _______________________________________________
>     NumPy-Discussion mailing list
>     NumPy-Discussion at scipy.org
>     https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion

--
Hans Larsen Galgebakken Sønder 4-11A 2620 Albertslund Danmark/Danio
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From pav at iki.fi  Fri Jun 24 15:05:40 2016
From: pav at iki.fi (Pauli Virtanen)
Date: Fri, 24 Jun 2016 21:05:40 +0200
Subject: [Numpy-discussion] Benchmark regression feeds
Message-ID: <576D8484.7030801@iki.fi>

Hi,

In case someone is interested in getting notifications of performance
regressions in the Numpy and Scipy benchmarks, this is available as
Atom feeds at:

https://pv.github.io/numpy-bench/regressions.xml
https://pv.github.io/scipy-bench/regressions.xml

--
Pauli Virtanen

From matthew.brett at gmail.com  Fri Jun 24 18:25:38 2016
From: matthew.brett at gmail.com (Matthew Brett)
Date: Fri, 24 Jun 2016 15:25:38 -0700
Subject: [Numpy-discussion] Pip download stats for numpy
Message-ID:

Hi,

I just ran a query on pypi downloads [1] using the BigQuery interface
to pypi stats [2].
It lists the numpy files downloaded from pypi via a pip install, over the
last two weeks, ordered by the number of downloads:

 1  100595  numpy-1.11.0.tar.gz
 2   97754  numpy-1.11.0-cp27-cp27mu-manylinux1_x86_64.whl
 3   38471  numpy-1.8.1-cp27-cp27mu-manylinux1_x86_64.whl
 4   20874  numpy-1.11.0-cp27-none-win_amd64.whl
 5   20049  numpy-1.11.0-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
 6   17100  numpy-1.10.4-cp27-cp27mu-manylinux1_x86_64.whl
 7   15187  numpy-1.10.1.zip
 8   14277  numpy-1.11.0-cp35-cp35m-manylinux1_x86_64.whl
 9   11538  numpy-1.9.1.tar.gz
10   11272  numpy-1.11.0-cp27-none-win32.whl

Of course, it's difficult to know how many of these are from automated
builds, such as from travis-ci, but it does look as if manylinux
wheels are getting some traction.

Cheers,

Matthew

[1]
SELECT
  COUNT(*) AS downloads,
  file.filename
FROM
  TABLE_DATE_RANGE(
    [the-psf:pypi.downloads],
    TIMESTAMP("20160610"),
    CURRENT_TIMESTAMP()
  )
WHERE
  details.installer.name = 'pip'
  AND REGEXP_MATCH(file.filename, '^numpy-.*')
GROUP BY
  file.filename
ORDER BY
  downloads DESC
LIMIT 1000
[2] https://mail.python.org/pipermail/distutils-sig/2016-May/028986.html

From davidgshi at yahoo.co.uk  Sat Jun 25 17:50:57 2016
From: davidgshi at yahoo.co.uk (David Shi)
Date: Sat, 25 Jun 2016 21:50:57 +0000 (UTC)
Subject: [Numpy-discussion] How best to turn JSON into a CSV or Pandas data frame table?
In-Reply-To: <1827931572.2383452.1466889007036.JavaMail.yahoo@mail.yahoo.com>
References: <1827931572.2383452.1466889007036.JavaMail.yahoo.ref@mail.yahoo.com> <1827931572.2383452.1466889007036.JavaMail.yahoo@mail.yahoo.com>
Message-ID: <1171511771.2353445.1466891458011.JavaMail.yahoo@mail.yahoo.com>

Which are the best ways to turn a JSON object into a CSV or Pandas data
frame table?

Looking forward to hearing from you.

Regards.

David
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
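A common recipe for the question above — a sketch assuming flat,
record-like JSON (deeply nested objects need flattening first, for example
with pandas.io.json.json_normalize):

    import pandas as pd

    df = pd.read_json('records.json')      # list-of-records JSON -> DataFrame
    df.to_csv('records.csv', index=False)  # DataFrame -> CSV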
From charlesr.harris at gmail.com  Sun Jun 26 12:36:28 2016
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sun, 26 Jun 2016 10:36:28 -0600
Subject: [Numpy-discussion] Numpy 1.11.1 release
Message-ID:

Hi All,

I'm pleased to announce the release of Numpy 1.11.1. This release supports
Python 2.6 - 2.7 and 3.2 - 3.5 and fixes bugs and regressions found in
Numpy 1.11.0 as well as making several build related improvements. Wheels
for Linux, Windows, and OSX can be found on PyPI. Sources are available on
both PyPI and Sourceforge. Thanks to all who were involved in this
release, and a special thanks to Matthew Brett for his work on the Linux
and Windows wheel infrastructure.

The following pull requests have been merged:

- 7506 BUG: Make sure numpy imports on python 2.6 when nose is unavailable.
- 7530 BUG: Floating exception with invalid axis in np.lexsort.
- 7535 BUG: Extend glibc complex trig functions blacklist to glibc < 2.18.
- 7551 BUG: Allow graceful recovery for no compiler.
- 7558 BUG: Constant padding expected wrong type in constant_values.
- 7578 BUG: Fix OverflowError in Python 3.x. in swig interface.
- 7590 BLD: Fix configparser.InterpolationSyntaxError.
- 7597 BUG: Make np.ma.take work on scalars.
- 7608 BUG: linalg.norm(): Don't convert object arrays to float.
- 7638 BLD: Correct C compiler customization in system_info.py.
- 7654 BUG: ma.median of 1d array should return a scalar.
- 7656 BLD: Remove hardcoded Intel compiler flag -xSSE4.2.
- 7660 BUG: Temporary fix for str(mvoid) for object field types.
- 7665 BUG: Fix incorrect printing of 1D masked arrays.
- 7670 BUG: Correct initial index estimate in histogram.
- 7671 BUG: Boolean assignment no GIL release when transfer needs API.
- 7676 BUG: Fix handling of right edge of final histogram bin.
- 7680 BUG: Fix np.clip bug NaN handling for Visual Studio 2015.
- 7724 BUG: Fix segfaults in np.random.shuffle.
- 7731 MAINT: Change mkl_info.dir_env_var from MKL to MKLROOT.
- 7737 BUG: Fix issue on OS X with Python 3.x, npymath.ini not installed.

The following developers contributed to this release, developers marked
with a '+' are first time contributors.

- Allan Haldane
- Amit Aronovitch+
- Andrei Kucharavy+
- Charles Harris
- Eric Wieser+
- Evgeni Burovski
- Loïc Estève+
- Mathieu Lamarre+
- Matthew Brett
- Matthias Geier
- Nathaniel J. Smith
- Nikola Forró+
- Ralf Gommers
- Ray Donnelly+
- Robert Kern
- Sebastian Berg
- Simon Conseil
- Simon Gibbons
- Sorin Sbarnea+

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From ralf.gommers at gmail.com  Sun Jun 26 17:09:45 2016
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sun, 26 Jun 2016 23:09:45 +0200
Subject: [Numpy-discussion] Pip download stats for numpy
In-Reply-To:
References:
Message-ID:

On Sat, Jun 25, 2016 at 12:25 AM, Matthew Brett wrote:

> Hi,
>
> I just ran a query on pypi downloads [1] using the BigQuery interface
> to pypi stats [2]. It lists the numpy files downloaded from pypi via
> a pip install, over the last two weeks, ordered by the number of
> downloads:
>
>  1  100595  numpy-1.11.0.tar.gz
>  2   97754  numpy-1.11.0-cp27-cp27mu-manylinux1_x86_64.whl
>  3   38471  numpy-1.8.1-cp27-cp27mu-manylinux1_x86_64.whl
>  4   20874  numpy-1.11.0-cp27-none-win_amd64.whl
>  5   20049  numpy-1.11.0-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
>  6   17100  numpy-1.10.4-cp27-cp27mu-manylinux1_x86_64.whl
>  7   15187  numpy-1.10.1.zip
>  8   14277  numpy-1.11.0-cp35-cp35m-manylinux1_x86_64.whl
>  9   11538  numpy-1.9.1.tar.gz
> 10   11272  numpy-1.11.0-cp27-none-win32.whl
>

Thanks Matthew, interesting.


> Of course, it's difficult to know how many of these are from automated
> builds, such as from travis-ci, but it does look as if manylinux
> wheels are getting some traction.
>

Looks like the vast majority is from CI setups, but that's still a lot of
time not spent building numpy from source so also a good thing.

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From matthew.brett at gmail.com  Mon Jun 27 23:46:46 2016
From: matthew.brett at gmail.com (Matthew Brett)
Date: Mon, 27 Jun 2016 20:46:46 -0700
Subject: [Numpy-discussion] Accelerate or OpenBLAS for numpy / scipy wheels?
Message-ID:

Hi,

I just succeeded in getting an automated dual arch build of numpy and
scipy, using OpenBLAS. See the last three build jobs in these two
build matrices:

https://travis-ci.org/matthew-brett/numpy-wheels/builds/140388119
https://travis-ci.org/matthew-brett/scipy-wheels/builds/140684673

Tests are passing on 32 and 64-bit.

I didn't upload these to the usual Rackspace container at
wheels.scipy.org to avoid confusion.

So, I guess the question now is - should we switch to shipping
OpenBLAS wheels for the next release of numpy and scipy? Or should we
stick with the Accelerate framework that comes with OSX?

In favor of the Accelerate build : faster to build, it's what we've
been doing thus far.

In favor of OpenBLAS build : allows us to commit to one BLAS / LAPACK
library cross platform, when we have the Windows builds working.
Faster to fix bugs with good support from main developer. No
multiprocessing crashes for Python 2.7.

Any thoughts?

Cheers,

Matthew
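Relevant when comparing such builds: numpy records which BLAS/LAPACK it
was built against, so a given wheel can be checked directly:

    import numpy as np
    np.__config__.show()  # prints blas_opt_info / lapack_opt_info for this build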
From charlesr.harris at gmail.com  Tue Jun 28 08:25:44 2016
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 28 Jun 2016 06:25:44 -0600
Subject: [Numpy-discussion] Accelerate or OpenBLAS for numpy / scipy wheels?
In-Reply-To:
References:
Message-ID:

On Mon, Jun 27, 2016 at 9:46 PM, Matthew Brett wrote:

> Hi,
>
> I just succeeded in getting an automated dual arch build of numpy and
> scipy, using OpenBLAS. See the last three build jobs in these two
> build matrices:
>
> https://travis-ci.org/matthew-brett/numpy-wheels/builds/140388119
> https://travis-ci.org/matthew-brett/scipy-wheels/builds/140684673
>
> Tests are passing on 32 and 64-bit.
>
> I didn't upload these to the usual Rackspace container at
> wheels.scipy.org to avoid confusion.
>
> So, I guess the question now is - should we switch to shipping
> OpenBLAS wheels for the next release of numpy and scipy? Or should we
> stick with the Accelerate framework that comes with OSX?
>
> In favor of the Accelerate build : faster to build, it's what we've
> been doing thus far.
>
> In favor of OpenBLAS build : allows us to commit to one BLAS / LAPACK
> library cross platform, when we have the Windows builds working.
> Faster to fix bugs with good support from main developer. No
> multiprocessing crashes for Python 2.7.
>

I'm still a bit nervous about OpenBLAS, see
https://github.com/scipy/scipy/issues/6286. That was with version 0.2.18,
which is pretty recent.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From matthew.brett at gmail.com  Tue Jun 28 08:55:07 2016
From: matthew.brett at gmail.com (Matthew Brett)
Date: Tue, 28 Jun 2016 05:55:07 -0700
Subject: [Numpy-discussion] Accelerate or OpenBLAS for numpy / scipy wheels?
In-Reply-To:
References:
Message-ID:

Hi,

On Tue, Jun 28, 2016 at 5:25 AM, Charles R Harris wrote:
>
>
> On Mon, Jun 27, 2016 at 9:46 PM, Matthew Brett
> wrote:
>>
>> Hi,
>>
>> I just succeeded in getting an automated dual arch build of numpy and
>> scipy, using OpenBLAS. See the last three build jobs in these two
>> build matrices:
>>
>> https://travis-ci.org/matthew-brett/numpy-wheels/builds/140388119
>> https://travis-ci.org/matthew-brett/scipy-wheels/builds/140684673
>>
>> Tests are passing on 32 and 64-bit.
>>
>> I didn't upload these to the usual Rackspace container at
>> wheels.scipy.org to avoid confusion.
>>
>> So, I guess the question now is - should we switch to shipping
>> OpenBLAS wheels for the next release of numpy and scipy? Or should we
>> stick with the Accelerate framework that comes with OSX?
>>
>> In favor of the Accelerate build : faster to build, it's what we've
>> been doing thus far.
>>
>> In favor of OpenBLAS build : allows us to commit to one BLAS / LAPACK
>> library cross platform, when we have the Windows builds working.
>> Faster to fix bugs with good support from main developer. No
>> multiprocessing crashes for Python 2.7.
>
> I'm still a bit nervous about OpenBLAS, see
> https://github.com/scipy/scipy/issues/6286. That was with version 0.2.18,
> which is pretty recent.

Well - we are committed to OpenBLAS already for the Linux wheels, so
if that failure was due to an error in OpenBLAS, we'll have to report
it and get it fixed / fix it ourselves upstream.
Cheers,

Matthew

From ralf.gommers at gmail.com Tue Jun 28 10:33:33 2016
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Tue, 28 Jun 2016 16:33:33 +0200
Subject: [Numpy-discussion] Accelerate or OpenBLAS for numpy / scipy wheels?
In-Reply-To: References: Message-ID: 

On Tue, Jun 28, 2016 at 2:55 PM, Matthew Brett wrote:

> Hi,
>
> On Tue, Jun 28, 2016 at 5:25 AM, Charles R Harris wrote:
> >
> > On Mon, Jun 27, 2016 at 9:46 PM, Matthew Brett wrote:
> >>
> >> Hi,
> >>
> >> I just succeeded in getting an automated dual arch build of numpy and
> >> scipy, using OpenBLAS. See the last three build jobs in these two
> >> build matrices:
> >>
> >> https://travis-ci.org/matthew-brett/numpy-wheels/builds/140388119
> >> https://travis-ci.org/matthew-brett/scipy-wheels/builds/140684673
> >>
> >> Tests are passing on 32 and 64-bit.
> >>
> >> I didn't upload these to the usual Rackspace container at
> >> wheels.scipy.org to avoid confusion.
> >>
> >> So, I guess the question now is - should we switch to shipping
> >> OpenBLAS wheels for the next release of numpy and scipy? Or should we
> >> stick with the Accelerate framework that comes with OSX?
> >>
> >> In favor of the Accelerate build : faster to build, it's what we've
> >> been doing thus far.

Faster to build isn't really an argument right? Should be the same build time except for building OpenBLAS itself once per OpenBLAS version. And only applies to building wheels for releases - nothing changes for source builds done by users on OS X. If build time ever becomes a real issue, then dropping the dual arch stuff is probably the way to go - the 32-bit builds make very little sense these days.

What we've been doing thus far - that is the more important argument. There's a risk in switching, we may encounter new bugs or lose some performance in particular functions.

> >> In favor of OpenBLAS build : allows us to commit to one BLAS / LAPACK
> >> library cross platform,

This doesn't really matter too much imho, we have to support Accelerate either way.

> >> when we have the Windows builds working.
> >> Faster to fix bugs with good support from main developer. No
> >> multiprocessing crashes for Python 2.7.

This is probably the main reason to make the switch, if we decide to do that.

> > I'm still a bit nervous about OpenBLAS, see
> > https://github.com/scipy/scipy/issues/6286. That was with version
> > 0.2.18, which is pretty recent.
>
> Well - we are committed to OpenBLAS already for the Linux wheels, so
> if that failure was due to an error in OpenBLAS, we'll have to report
> it and get it fixed / fix it ourselves upstream.

Indeed. And those wheels have been downloaded a lot already, without any issues being reported.

I'm +0 on the proposal - the risk seems acceptable, but the reasons to make the switch are also not super compelling.

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From matthew.brett at gmail.com Tue Jun 28 11:15:12 2016
From: matthew.brett at gmail.com (Matthew Brett)
Date: Tue, 28 Jun 2016 08:15:12 -0700
Subject: [Numpy-discussion] Accelerate or OpenBLAS for numpy / scipy wheels?
In-Reply-To: References: Message-ID: 

Hi,

On Tue, Jun 28, 2016 at 7:33 AM, Ralf Gommers wrote:

> On Tue, Jun 28, 2016 at 2:55 PM, Matthew Brett wrote:
>>
>> Hi,
>>
>> On Tue, Jun 28, 2016 at 5:25 AM, Charles R Harris wrote:
>> >
>> > On Mon, Jun 27, 2016 at 9:46 PM, Matthew Brett wrote:
>> >>
>> >> Hi,
>> >>
>> >> I just succeeded in getting an automated dual arch build of numpy and
>> >> scipy, using OpenBLAS. See the last three build jobs in these two
>> >> build matrices:
>> >>
>> >> https://travis-ci.org/matthew-brett/numpy-wheels/builds/140388119
>> >> https://travis-ci.org/matthew-brett/scipy-wheels/builds/140684673
>> >>
>> >> Tests are passing on 32 and 64-bit.
>> >>
>> >> I didn't upload these to the usual Rackspace container at
>> >> wheels.scipy.org to avoid confusion.
>> >>
>> >> So, I guess the question now is - should we switch to shipping
>> >> OpenBLAS wheels for the next release of numpy and scipy? Or should we
>> >> stick with the Accelerate framework that comes with OSX?
>> >>
>> >> In favor of the Accelerate build : faster to build, it's what we've
>> >> been doing thus far.
>
> Faster to build isn't really an argument right? Should be the same build
> time except for building OpenBLAS itself once per OpenBLAS version. And only
> applies to building wheels for releases - nothing changes for source builds
> done by users on OS X. If build time ever becomes a real issue, then
> dropping the dual arch stuff is probably the way to go - the 32-bit builds
> make very little sense these days.

Yes, that's true, but as you know, the OSX system and Python.org Pythons are still dual arch, so technically a matching wheel should also be dual arch. I agree that we're near the point where there's near zero likelihood that the 32-bit arch will ever get exercised.

> What we've been doing thus far - that is the more important argument.
> There's a risk in switching, we may encounter new bugs or lose some
> performance in particular functions.
>
>> >> In favor of OpenBLAS build : allows us to commit to one BLAS / LAPACK
>> >> library cross platform,
>
> This doesn't really matter too much imho, we have to support Accelerate
> either way.
>
>> >> when we have the Windows builds working.
>> >> Faster to fix bugs with good support from main developer. No
>> >> multiprocessing crashes for Python 2.7.
>
> This is probably the main reason to make the switch, if we decide to do
> that.
>
>> > I'm still a bit nervous about OpenBLAS, see
>> > https://github.com/scipy/scipy/issues/6286. That was with version
>> > 0.2.18, which is pretty recent.
>>
>> Well - we are committed to OpenBLAS already for the Linux wheels, so
>> if that failure was due to an error in OpenBLAS, we'll have to report
>> it and get it fixed / fix it ourselves upstream.
>
> Indeed. And those wheels have been downloaded a lot already, without any
> issues being reported.
>
> I'm +0 on the proposal - the risk seems acceptable, but the reasons to make
> the switch are also not super compelling.

I guess I'm about +0.5 (multiprocessing, simplifying mainstream blas / lapack support) - I'm floating it now because I hadn't got the build machinery working before.

Cheers,

Matthew

From chris.barker at noaa.gov Tue Jun 28 11:50:39 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Tue, 28 Jun 2016 08:50:39 -0700
Subject: [Numpy-discussion] Accelerate or OpenBLAS for numpy / scipy wheels?
In-Reply-To: References: Message-ID: 

On Tue, Jun 28, 2016 at 8:15 AM, Matthew Brett wrote:

> > dropping the dual arch stuff is probably the way to go - the 32-bit builds
> > make very little sense these days.
>
> Yes, that's true, but as you know, the OSX system and Python.org
> Pythons are still dual arch, so technically a matching wheel should
> also be dual arch.

but as they say, practicality beats purity...

It's not clear yet whether 3.6 will be built dual arch at this point, but in any case, no one is going to go back and change the builds on 2.7 or 3.4 or 3.5 .... But that doesn't mean we necessarily need to support dual arch downstream. Personally, I'd drop it and see if anyone screams. Though it's actually a bit tricky, at least with my knowledge, to build a 64-bit-only extension against the dual-arch build. At least the only way I figured out was to hack the install. (I did this a while back when I needed a 32-bit-only build -- ironic?)

> This doesn't really matter too much imho, we have to support Accelerate
> either way.

do we? -- so if we go OpenBlas, and someone wants to do a simple build from source, what happens? Do they get accelerate? or would we ship OpenBlas source itself? or would they need to install OpenBlas some other way?

>> >> Faster to fix bugs with good support from main developer. No
>> >> multiprocessing crashes for Python 2.7.

this seems to be the compelling one.

How does the performance compare?

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ralf.gommers at gmail.com Tue Jun 28 13:50:39 2016
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Tue, 28 Jun 2016 19:50:39 +0200
Subject: [Numpy-discussion] Accelerate or OpenBLAS for numpy / scipy wheels?
In-Reply-To: References: Message-ID: 

On Tue, Jun 28, 2016 at 5:50 PM, Chris Barker wrote:

> > This doesn't really matter too much imho, we have to support Accelerate
> > either way.
>
> do we? -- so if we go OpenBlas, and someone wants to do a simple build from
> source, what happens? Do they get accelerate?

Indeed, unless they go through the effort of downloading a separate BLAS and LAPACK, and figuring out how to make that visible to numpy.distutils. Very few users will do that.

> or would we ship OpenBlas source itself?

Definitely don't want to do that.

> or would they need to install OpenBlas some other way?

Yes, or MKL, or ATLAS, or BLIS. We have support for all these, and that's a good thing. Making a uniform choice for our official binaries on various OSes doesn't reduce the need or effort for supporting those other options.

> >> >> Faster to fix bugs with good support from main developer. No
> >> >> multiprocessing crashes for Python 2.7.
>
> this seems to be the compelling one.
>
> How does the performance compare?

For most routines performance seems to be comparable, and both are much better than ATLAS. When there's a significant difference, I have the impression that OpenBLAS is more often the slower one (example: https://github.com/xianyi/OpenBLAS/issues/533).

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bryanv at continuum.io Tue Jun 28 14:32:38 2016
From: bryanv at continuum.io (Bryan Van de Ven)
Date: Tue, 28 Jun 2016 13:32:38 -0500
Subject: [Numpy-discussion] ANN: Bokeh 0.12 Released
Message-ID: <0337284C-B006-4315-879A-253571F6D67B@continuum.io>

Hi all,

On behalf of the Bokeh team, I am pleased to announce the release of version 0.12.0 of Bokeh!

This release was a major update, and was focused on areas of layout and styling, new JavaScript APIs for BokehJS, and improvements to the Bokeh Server. But there were many additional improvements in other areas as well. Rather than try to describe all the changes here, I encourage everyone to check out the new project blog:

https://bokeh.github.io/blog/2016/6/28/release-0-12/

which has details as well as live demonstrations. And as always, see the CHANGELOG and Release Notes for full details.

If you are using Anaconda/miniconda, you can install it with conda:

conda install bokeh

Alternatively, you can also install it with pip:

pip install bokeh

Full information including details about how to use and obtain BokehJS is at:

http://bokeh.pydata.org/en/0.12.0/docs/installation.html

Issues, enhancement requests, and pull requests can be made on the Bokeh Github page: https://github.com/bokeh/bokeh

Documentation is available at http://bokeh.pydata.org/en/0.12.0

Questions can be directed to the Bokeh mailing list: bokeh at continuum.io or the Gitter Chat room: https://gitter.im/bokeh/bokeh

Thanks,

Bryan Van de Ven
Continuum Analytics

From mward at cims.nyu.edu Tue Jun 28 16:36:26 2016
From: mward at cims.nyu.edu (Michael Ward)
Date: Tue, 28 Jun 2016 16:36:26 -0400
Subject: [Numpy-discussion] Is numpy.test() supposed to be multithreaded?
Message-ID: 

Heya, I'm not a numbers guy, but I maintain servers for scientists and researchers who are. Someone pointed out that our numpy installation on a particular server was only using one core. I'm unaware of who installed the previous version of numpy/OpenBLAS, or how, so I installed them from scratch, and confirmed that the user's test code now runs on multiple cores as expected, drastically improving performance.

Now the user is writing back to say, "my test code is fast now, but numpy.test() is still about three times slower than <another server we don't manage>". When I watch htop as numpy.test() executes, sure enough, it's using one core. Now I'm not sure if that's the expected behavior or not. Questions:

* if numpy.test() is supposed to be using multiple cores, why isn't it, when we've established with other test code that it's now using multiple cores?

* if numpy.test() is not supposed to be using multiple cores, what could be the reason that the performance is drastically slower than another server with a comparable CPU, when the user's test code performs comparably?

For what it's worth, the user's "test" code which does run on multiple cores is as simple as:

import numpy as np

size = 4000
a = np.random.random_sample((size, size))
b = np.random.random_sample((size, size))
x = np.dot(a, b)  # matrix multiply goes through the BLAS, so it can use multiple cores

Whereas this uses only one core:

numpy.test()

---------------------------

OpenBLAS 0.2.18 was basically just compiled with "make", nothing special to it. Numpy 1.11.0 was installed from source (python setup.py install), using a site.cfg file to point numpy to the new OpenBLAS.

Thanks,
Mike

From ralf.gommers at gmail.com Tue Jun 28 16:53:09 2016
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Tue, 28 Jun 2016 22:53:09 +0200
Subject: [Numpy-discussion] Is numpy.test() supposed to be multithreaded?
In-Reply-To: References: Message-ID: 

On Tue, Jun 28, 2016 at 10:36 PM, Michael Ward wrote:

> Heya, I'm not a numbers guy, but I maintain servers for scientists and
> researchers who are. Someone pointed out that our numpy installation on a
> particular server was only using one core. I'm unaware of who installed the
> previous version of numpy/OpenBLAS, or how, so I installed them from
> scratch, and confirmed that the user's test code now runs on multiple cores
> as expected, drastically improving performance.
>
> Now the user is writing back to say, "my test code is fast now, but
> numpy.test() is still about three times slower than <another server we
> don't manage>". When I watch htop as numpy.test() executes, sure enough,
> it's using one core. Now I'm not sure if that's the expected behavior or
> not. Questions:
>
> * if numpy.test() is supposed to be using multiple cores, why isn't it,
> when we've established with other test code that it's now using multiple
> cores?

Some numpy.linalg functions (like np.dot) will be using multiple cores, but np.linalg.test() takes only ~1% of the time of the full test suite. Everything else will be running single core. So your observations are not surprising.

Cheers,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From chris.barker at noaa.gov Tue Jun 28 21:27:21 2016
From: chris.barker at noaa.gov (Chris Barker - NOAA Federal)
Date: Tue, 28 Jun 2016 18:27:21 -0700
Subject: [Numpy-discussion] Is numpy.test() supposed to be multithreaded?
In-Reply-To: References: Message-ID: <9201610374898457984@unknownmsgid>

> Now the user is writing back to say, "my test code is fast now, but
> numpy.test() is still about three times slower than <another server we
> don't manage>". When I watch htop as numpy.test() executes, sure enough,
> it's using one core

> * if numpy.test() is supposed to be using multiple cores, why isn't it,
> when we've established with other test code that it's now using multiple
> cores?

> Some numpy.linalg functions (like np.dot) will be using multiple cores,
> but np.linalg.test() takes only ~1% of the time of the full test suite.
> Everything else will be running single core. So your observations are not
> surprising.

Though why it would run slower on one box than another comparable box is a mystery...

-CHB
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ralf.gommers at gmail.com Wed Jun 29 03:07:14 2016
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Wed, 29 Jun 2016 09:07:14 +0200
Subject: [Numpy-discussion] Is numpy.test() supposed to be multithreaded?
In-Reply-To: <9201610374898457984@unknownmsgid>
References: <9201610374898457984@unknownmsgid>
Message-ID: 

On Wed, Jun 29, 2016 at 3:27 AM, Chris Barker - NOAA Federal <chris.barker at noaa.gov> wrote:

>> Now the user is writing back to say, "my test code is fast now, but
>> numpy.test() is still about three times slower than <another server we
>> don't manage>". When I watch htop as numpy.test() executes, sure enough,
>> it's using one core
>>
>> * if numpy.test() is supposed to be using multiple cores, why isn't it,
>> when we've established with other test code that it's now using multiple
>> cores?
>
> Some numpy.linalg functions (like np.dot) will be using multiple cores,
> but np.linalg.test() takes only ~1% of the time of the full test suite.
> Everything else will be running single core. So your observations are not
> surprising.
>
> Though why it would run slower on one box than another comparable box is a
> mystery...
Maybe just hardware config? I see a similar difference between how long the test suite runs on TravisCI vs my linux desktop (the latter is slower, surprisingly).

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From njs at pobox.com Wed Jun 29 05:03:43 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Wed, 29 Jun 2016 02:03:43 -0700
Subject: [Numpy-discussion] Is numpy.test() supposed to be multithreaded?
In-Reply-To: References: <9201610374898457984@unknownmsgid> Message-ID: 

As a general rule I wouldn't worry too much about test speed. Speed is extremely dependent on exact workloads. And this is doubly so for test suites -- production workloads tend to do a small number of normal things over and over, while a good test suite never does the same thing twice and spends most of its time exercising weird edge conditions. So unless your actual workload is running the numpy test suite :-), it's probably not worth trying to track down.

And yeah, numpy does not in general do automatic multithreading -- the only automatic multithreading you should see is when using linear algebra functions (matrix multiply, eigenvalue calculations, etc.) that dispatch to the BLAS.

-n

On Wed, Jun 29, 2016 at 12:07 AM, Ralf Gommers wrote:
>
> On Wed, Jun 29, 2016 at 3:27 AM, Chris Barker - NOAA Federal wrote:
>>
>>> Now the user is writing back to say, "my test code is fast now, but
>>> numpy.test() is still about three times slower than <another server we
>>> don't manage>". When I watch htop as numpy.test() executes, sure enough,
>>> it's using one core
>>
>>> * if numpy.test() is supposed to be using multiple cores, why isn't it,
>>> when we've established with other test code that it's now using multiple
>>> cores?
>>
>> Some numpy.linalg functions (like np.dot) will be using multiple cores,
>> but np.linalg.test() takes only ~1% of the time of the full test suite.
>> Everything else will be running single core. So your observations are not
>> surprising.
>>
>> Though why it would run slower on one box than another comparable box is a
>> mystery...
>
> Maybe just hardware config? I see a similar difference between how long the
> test suite runs on TravisCI vs my linux desktop (the latter is slower,
> surprisingly).
>
> Ralf
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>

--
Nathaniel J. Smith -- https://vorpus.org

From a.h.jaffe at gmail.com Wed Jun 29 05:49:25 2016
From: a.h.jaffe at gmail.com (Andrew Jaffe)
Date: Wed, 29 Jun 2016 10:49:25 +0100
Subject: [Numpy-discussion] Accelerate or OpenBLAS for numpy / scipy wheels?
In-Reply-To: References: Message-ID: 

On 28/06/2016 18:50, Ralf Gommers wrote:
>
> On Tue, Jun 28, 2016 at 5:50 PM, Chris Barker wrote:
>
>> > This doesn't really matter too much imho, we have to support Accelerate
>> > either way.
>>
>> do we? -- so if we go OpenBlas, and someone wants to do a simple
>> build from source, what happens? Do they get accelerate?
>
> Indeed, unless they go through the effort of downloading a separate BLAS
> and LAPACK, and figuring out how to make that visible to
> numpy.distutils. Very few users will do that.
>
>> or would we ship OpenBlas source itself?
>
> Definitely don't want to do that.
>
>> or would they need to install OpenBlas some other way?
>
> Yes, or MKL, or ATLAS, or BLIS. We have support for all these, and
> that's a good thing.
> Making a uniform choice for our official binaries on various OSes doesn't
> reduce the need or effort for supporting those other options.
>
>> >> Faster to fix bugs with good support from main developer. No
>> >> multiprocessing crashes for Python 2.7.
>
>> this seems to be the compelling one.
>>
>> How does the performance compare?
>
> For most routines performance seems to be comparable, and both are much
> better than ATLAS. When there's a significant difference, I have the
> impression that OpenBLAS is more often the slower one (example:
> https://github.com/xianyi/OpenBLAS/issues/533).

In that case:

-1

(but this seems so obvious that I'm probably missing the point of the +1s)

From sebastian at sipsolutions.net Wed Jun 29 05:59:15 2016
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Wed, 29 Jun 2016 11:59:15 +0200
Subject: [Numpy-discussion] Is numpy.test() supposed to be multithreaded?
In-Reply-To: References: <9201610374898457984@unknownmsgid> Message-ID: <1467194355.5990.25.camel@sipsolutions.net>

On Mi, 2016-06-29 at 02:03 -0700, Nathaniel Smith wrote:
> As a general rule I wouldn't worry too much about test speed. Speed is
> extremely dependent on exact workloads. And this is doubly so for test
> suites -- production workloads tend to do a small number of normal
> things over and over, while a good test suite never does the same
> thing twice and spends most of its time exercising weird edge
> conditions. So unless your actual workload is running the numpy test
> suite :-), it's probably not worth trying to track down.

Agreed, the test suite, and likely also the few tests which might take most time in the end, could be arbitrarily weird and skewed. I could for example imagine IO speed being a big factor. Also depending on system configuration (or numpy version) a different number of tests may be run sometimes.
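For a quick box-to-box comparison, directly timing a couple of representative operations is probably more telling than the test suite. A rough sketch of what I mean (hypothetical; `np.dot` exercises the threaded BLAS path, the sort stays on a single core):

import time
import numpy as np

size = 4000
a = np.random.random_sample((size, size))
b = np.random.random_sample((size, size))

t0 = time.time()
np.dot(a, b)  # goes through the BLAS, so may use multiple cores
print("dot:  %.2f s" % (time.time() - t0))

t0 = time.time()
np.sort(np.random.random_sample(10**7))  # plain numpy, single core
print("sort: %.2f s" % (time.time() - t0))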
What might make somewhat more sense would be to compare some of the benchmarks `python runtests.py --bench` if you have airspeed velocity installed. While not extensive, a lot of those things at least do test more typical use cases. Though in any case I think the user should probably just test some other thing.

- Sebastian

> And yeah, numpy does not in general do automatic multithreading -- the
> only automatic multithreading you should see is when using linear
> algebra functions (matrix multiply, eigenvalue calculations, etc.)
> that dispatch to the BLAS.
>
> -n
>
> On Wed, Jun 29, 2016 at 12:07 AM, Ralf Gommers wrote:
> > On Wed, Jun 29, 2016 at 3:27 AM, Chris Barker - NOAA Federal wrote:
> > > > Now the user is writing back to say, "my test code is fast now, but
> > > > numpy.test() is still about three times slower than <another
> > > > server we don't manage>". When I watch htop as numpy.test() executes,
> > > > sure enough, it's using one core
> > > >
> > > > * if numpy.test() is supposed to be using multiple cores, why isn't it,
> > > > when we've established with other test code that it's now using
> > > > multiple cores?
> > >
> > > Some numpy.linalg functions (like np.dot) will be using multiple cores,
> > > but np.linalg.test() takes only ~1% of the time of the full test suite.
> > > Everything else will be running single core. So your observations
> > > are not surprising.
> > >
> > > Though why it would run slower on one box than another comparable
> > > box is a mystery...
> >
> > Maybe just hardware config? I see a similar difference between how
> > long the test suite runs on TravisCI vs my linux desktop (the latter is
> > slower, surprisingly).
> >
> > Ralf
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> >

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part
URL: 

From njs at pobox.com Wed Jun 29 15:55:11 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Wed, 29 Jun 2016 12:55:11 -0700
Subject: [Numpy-discussion] Accelerate or OpenBLAS for numpy / scipy wheels?
In-Reply-To: References: Message-ID: 

On Jun 29, 2016 2:49 AM, "Andrew Jaffe" wrote:
>
> On 28/06/2016 18:50, Ralf Gommers wrote:
>>
>> On Tue, Jun 28, 2016 at 5:50 PM, Chris Barker wrote:
>>
>> > This doesn't really matter too much imho, we have to support Accelerate
>> > either way.
>>
>> do we? -- so if we go OpenBlas, and someone wants to do a simple
>> build from source, what happens? Do they get accelerate?
>>
>> Indeed, unless they go through the effort of downloading a separate BLAS
>> and LAPACK, and figuring out how to make that visible to
>> numpy.distutils. Very few users will do that.
>>
>> or would we ship OpenBlas source itself?
>>
>> Definitely don't want to do that.
>>
>> or would they need to install OpenBlas some other way?
>>
>> Yes, or MKL, or ATLAS, or BLIS. We have support for all these, and
>> that's a good thing. Making a uniform choice for our official binaries
>> on various OSes doesn't reduce the need or effort for supporting those
>> other options.
>>
>> >> Faster to fix bugs with good support from main developer. No
>> >> multiprocessing crashes for Python 2.7.
>> this seems to be the compelling one.
>>
>> How does the performance compare?
>>
>> For most routines performance seems to be comparable, and both are much
>> better than ATLAS. When there's a significant difference, I have the
>> impression that OpenBLAS is more often the slower one (example:
>> https://github.com/xianyi/OpenBLAS/issues/533).
>
> In that case:
>
> -1
>
> (but this seems so obvious that I'm probably missing the point of the +1s)

Speed is important, but it's far from the only consideration, especially since differences between the top-tier libraries are usually rather small. (And note that even though that bug is still listed as open, it has a link to a commit that appears to have fixed it by implementing the missing kernels.)

The advantage of openblas is that it's open source, fixable, and we already focus energy on supporting it for Linux (and probably windows too soon). Accelerate is closed, so when we hit bugs then there's often nothing we can do except file a bug with apple and hope that it gets fixed within a year or two. This isn't hypothetical -- we've hit cases where accelerate gave wrong answers. Numpy actually carries some scary code right now to work around one of these bugs by monkeypatching (!) accelerate using dynamic linker trickiness. And, of course, there's the thing where accelerate totally breaks multiprocessing. Apple has said that they don't consider this a bug. Which is probably not much comfort to the new users who are getting obscure hangs when they try to use Python's most obvious and commonly recommended concurrency library.

If you sum across our user base, I'm 99% sure that this means accelerate is slower than openblas on net, because you need a *lot* of code to get 10% speedups before it cancels out one person spending 3 days trying to figure out why their code is silently hanging for no reason.

This probably makes me sound more negative about accelerate than I actually am -- it does work well most of the time, and obviously lots of people are using it successfully with numpy. But for our official binaries, my vote is we should switch to openblas, because these binaries are likely to be used by non-experts who are likely to hit the multiprocessing issue, and because when we're already struggling to do sufficient QA on our releases then it makes sense to focus our efforts on a single blas library.

-n
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From evgeny.burovskiy at gmail.com Wed Jun 29 16:42:35 2016
From: evgeny.burovskiy at gmail.com (Evgeni Burovski)
Date: Wed, 29 Jun 2016 21:42:35 +0100
Subject: [Numpy-discussion] Benchmark regression feeds
In-Reply-To: <576D8484.7030801@iki.fi>
References: <576D8484.7030801@iki.fi>
Message-ID: 

Thanks Pauli! Maybe it's worth adding these to the devdocs pages?

On Jun 24, 2016 10:05 PM, "Pauli Virtanen" wrote:

> Hi,
>
> In case someone is interested in getting notifications of performance
> regressions in the Numpy and Scipy benchmarks, this is available as Atom
> feeds at:
>
> https://pv.github.io/numpy-bench/regressions.xml
>
> https://pv.github.io/scipy-bench/regressions.xml
>
> --
> Pauli Virtanen
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sturla.molden at gmail.com Wed Jun 29 17:06:54 2016
From: sturla.molden at gmail.com (Sturla Molden)
Date: Wed, 29 Jun 2016 21:06:54 +0000 (UTC)
Subject: [Numpy-discussion] Accelerate or OpenBLAS for numpy / scipy wheels?
References: Message-ID: <374618154488923939.947496sturla.molden-gmail.com@news.gmane.org>

Ralf Gommers wrote:

> For most routines performance seems to be comparable, and both are much
> better than ATLAS. When there's a significant difference, I have the
> impression that OpenBLAS is more often the slower one (example:
> https://github.com/xianyi/OpenBLAS/issues/533).

Accelerate is in general better optimized for level-1 and level-2 BLAS than OpenBLAS. There are two reasons for this:

First, OpenBLAS does not use AVX for these kernels, but Accelerate does. This is the more important difference. It seems the OpenBLAS devs are now working on this.

Second, the thread pool in OpenBLAS is not as scalable on small tasks as the "Grand Central Dispatch" (GCD) used by Accelerate. The GCD thread-pool used by Accelerate is actually quite unique in having a very tiny overhead: it takes only 16 extra opcodes (IIRC) to run a task on the global parallel queue instead of the current thread. (Even if my memory is not perfect and it is not exactly 16 opcodes, it is within that order of magnitude.) GCD can do this because the global queues and threadpool are actually built into the kernel of the OS. On the other hand, OpenBLAS and MKL depend on thread pools managed in userspace, about which the scheduler in the OS has no special knowledge.

When you need fine-grained parallelism and synchronization, there is nothing like GCD. Even a user-space spinlock will have bigger overhead than a sequential queue in GCD. With a userspace threadpool all threads are scheduled on a round-robin basis, but with GCD the scheduler has special knowledge about the tasks put on the queues, and executes them as fast as possible. Accelerate therefore has a unique advantage when running level-1 and 2 BLAS routines, with which OpenBLAS or MKL can probably never properly compete.

Programming with GCD can actually often be counter-intuitive to someone used to dealing with OpenMP, MPI or pthreads. For example, it is often better to enqueue a lot of small tasks instead of splitting up the computation into large chunks of work. When parallelising a tight loop, a chunk size of 1 can be great on GCD but is likely to be horrible on OpenMP and anything else that has userspace threads.

Sturla
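To make the chunk-size point concrete for userspace pools, here is a minimal sketch (hypothetical: it assumes Python 3.5+ for the chunksize argument of Executor.map, and uses a process pool as a stand-in since the stdlib has no GCD bindings) that times the same trivial work dispatched one item at a time versus in four big chunks:

import time
from concurrent.futures import ProcessPoolExecutor

def work(x):
    return x * x  # trivially small task, like one iteration of a tight loop

if __name__ == '__main__':
    items = list(range(100000))
    with ProcessPoolExecutor(4) as pool:
        t0 = time.time()
        list(pool.map(work, items, chunksize=1))      # one task per item
        t1 = time.time()
        list(pool.map(work, items, chunksize=25000))  # four large chunks
        t2 = time.time()
    print("chunksize=1:     %.2f s" % (t1 - t0))
    print("chunksize=25000: %.2f s" % (t2 - t1))

With chunk size 1 the per-task dispatch overhead dominates completely; with large chunks it mostly disappears. On GCD the first pattern stays cheap, which is the point above.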