From charlesr.harris at gmail.com Sun Jan 1 19:04:08 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 1 Jan 2017 17:04:08 -0700 Subject: [Numpy-discussion] NumPy 1.12.0rc2 release. Message-ID: Hi All, I'm pleased to announce the NumPy 1.12.0rc2 New Year's release. This release supports Python 2.7 and 3.4-3.6. Wheels for all supported Python versions may be downloaded from PiPY , the tarball and zip files may be downloaded from Github . The release notes and files hashes may also be found at Github . NumPy 1.12.0rc 2 is the result of 413 pull requests submitted by 139 contributors and comprises a large number of fixes and improvements. Among the many improvements it is difficult to pick out just a few as standing above the others, but the following may be of particular interest or indicate areas likely to have future consequences. * Order of operations in ``np.einsum`` can now be optimized for large speed improvements. * New ``signature`` argument to ``np.vectorize`` for vectorizing with core dimensions. * The ``keepdims`` argument was added to many functions. * New context manager for testing warnings * Support for BLIS in numpy.distutils * Much improved support for PyPy (not yet finished) Enjoy, Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Jan 2 20:36:22 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 2 Jan 2017 18:36:22 -0700 Subject: [Numpy-discussion] Deprecating matrices. Message-ID: Hi All, Just throwing this click bait out for discussion. Now that the `@` operator is available and things seem to be moving towards Python 3, especially in the classroom, we should consider the real possibility of deprecating the matrix type and later removing it. No doubt there are old scripts that require them, but older versions of numpy are available for those who need to run old scripts. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Jan 2 21:00:56 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 3 Jan 2017 15:00:56 +1300 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: Message-ID: On Tue, Jan 3, 2017 at 2:36 PM, Charles R Harris wrote: > Hi All, > > Just throwing this click bait out for discussion. Now that the `@` > operator is available and things seem to be moving towards Python 3, > especially in the classroom, we should consider the real possibility of > deprecating the matrix type and later removing it. No doubt there are old > scripts that require them, but older versions of numpy are available for > those who need to run old scripts. > > Thoughts? > Clearly deprecate in the docs now, and warn only later imho. We can't warn before we have a good solution for scipy.sparse matrices, which have matrix semantics and return matrix instances. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Jan 2 21:26:32 2017 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 2 Jan 2017 21:26:32 -0500 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: Message-ID: On Mon, Jan 2, 2017 at 9:00 PM, Ralf Gommers wrote: > > > On Tue, Jan 3, 2017 at 2:36 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> Hi All, >> >> Just throwing this click bait out for discussion. 
Now that the `@` >> operator is available and things seem to be moving towards Python 3, >> especially in the classroom, we should consider the real possibility of >> deprecating the matrix type and later removing it. No doubt there are old >> scripts that require them, but older versions of numpy are available for >> those who need to run old scripts. >> >> Thoughts? >> > > Clearly deprecate in the docs now, and warn only later imho. We can't warn > before we have a good solution for scipy.sparse matrices, which have matrix > semantics and return matrix instances. > > Ralf > How about dropping python 2 support at the same time, then we can all be in a @ world. Josef > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Jan 2 21:27:09 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 2 Jan 2017 19:27:09 -0700 Subject: [Numpy-discussion] Default type for functions that accumulate integers Message-ID: Hi All, Currently functions like trace use the C long type as the default accumulator for integer types of lesser precision: dtype : dtype, optional > Determines the data-type of the returned array and of the accumulator > where the elements are summed. If dtype has the value None and `a` is > of integer type of precision less than the default integer > precision, then the default integer precision is used. Otherwise, > the precision is the same as that of `a`. > The problem with this is that the precision of long varies with the platform so that the result varies, see gh-8433 for a complaint about this. There are two possible alternatives that seem reasonable to me: 1. Use 32 bit accumulators on 32 bit platforms and 64 bit accumulators on 64 bit platforms. 2. Always use 64 bit accumulators. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Jan 2 21:46:08 2017 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 2 Jan 2017 18:46:08 -0800 Subject: [Numpy-discussion] Default type for functions that accumulate integers In-Reply-To: References: Message-ID: On Mon, Jan 2, 2017 at 6:27 PM, Charles R Harris wrote: > Hi All, > > Currently functions like trace use the C long type as the default > accumulator for integer types of lesser precision: > >> dtype : dtype, optional >> Determines the data-type of the returned array and of the accumulator >> where the elements are summed. If dtype has the value None and `a` is >> of integer type of precision less than the default integer >> precision, then the default integer precision is used. Otherwise, >> the precision is the same as that of `a`. > > > The problem with this is that the precision of long varies with the platform > so that the result varies, see gh-8433 for a complaint about this. There > are two possible alternatives that seem reasonable to me: > > Use 32 bit accumulators on 32 bit platforms and 64 bit accumulators on 64 > bit platforms. > Always use 64 bit accumulators. This is a special case of a more general question: right now we use the default integer precision (i.e., what you get from np.array([1]), or np.arange, or np.dtype(int)), and it turns out that the default integer precision itself varies in confusing ways, and this is a common source of bugs. 
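(A minimal sketch of the kind of silent surprise being described here, assuming a NumPy of this era where the default integer and the integer accumulators follow the C long; the commented results differ by platform:)

>>> import numpy as np
>>> np.dtype(int)            # int64 on 64-bit Linux/macOS, int32 on Windows and on 32-bit builds
>>> a = np.full(10, 2**30, dtype=np.int32)
>>> a.sum()                  # 10737418240 where the accumulator is 64-bit; wraps around where it is 32-bit
>>> a.sum(dtype=np.int64)    # forcing the accumulator dtype gives the same answer on every platform
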
Specifically: right now it's 32-bit on 32-bit builds, and 64-bit on 64-bit builds, except on Windows where it's always 32-bit. This matches the default precision of Python 2 'int'. So some options include: - make the default integer precision 64-bits everywhere - make the default integer precision 32-bits on 32-bit systems, and 64-bits on 64-bit systems (including Windows) - leave the default integer precision the same, but make accumulators 64-bits everywhere - leave the default integer precision the same, but make accumulators 64-bits on 64-bit systems (including Windows) - ... Given the prevalence of 64-bit systems these days, and the fact that the current setup makes it very easy to write code that seems to work when tested on a 64-bit system but that silently returns incorrect results on 32-bit systems, it sure would be nice if we could switch to a 64-bit default everywhere. (You could still get 32-bit integers, of course, you'd just have to ask for them explicitly.) Things we'd need to know more about before making a decision: - compatibility: if we flip this switch, how much code breaks? In general correct numpy-using code has to be prepared to handle np.dtype(int) being 64-bits, and in fact there might be more code that accidentally assumes that np.dtype(int) is always 64-bits than there is code that assumes it is always 32-bits. But that's theory; to know how bad this is we would need to try actually running some projects test suites and see whether they break or not. - speed: there's probably some cost to using 64-bit integers on 32-bit systems; how big is the penalty in practice? -n -- Nathaniel J. Smith -- https://vorpus.org From njs at pobox.com Mon Jan 2 22:11:01 2017 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 2 Jan 2017 19:11:01 -0800 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: Message-ID: On Mon, Jan 2, 2017 at 6:26 PM, wrote: > > > On Mon, Jan 2, 2017 at 9:00 PM, Ralf Gommers wrote: >> >> >> >> On Tue, Jan 3, 2017 at 2:36 PM, Charles R Harris >> wrote: >>> >>> Hi All, >>> >>> Just throwing this click bait out for discussion. Now that the `@` >>> operator is available and things seem to be moving towards Python 3, >>> especially in the classroom, we should consider the real possibility of >>> deprecating the matrix type and later removing it. No doubt there are old >>> scripts that require them, but older versions of numpy are available for >>> those who need to run old scripts. >>> >>> Thoughts? >> >> >> Clearly deprecate in the docs now, and warn only later imho. We can't warn >> before we have a good solution for scipy.sparse matrices, which have matrix >> semantics and return matrix instances. >> >> Ralf > > > How about dropping python 2 support at the same time, then we can all be in > a @ world. > > Josef Let's not yoke together two (mostly) unrelated controversial discussions? I doubt we'll be able to remove either Python 2 or matrix support before 2020 at the earliest, so the discussion now is just about how to communicate to users that they should not be using 'matrix'. -n -- Nathaniel J. Smith -- https://vorpus.org From charlesr.harris at gmail.com Mon Jan 2 22:12:00 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 2 Jan 2017 20:12:00 -0700 Subject: [Numpy-discussion] Deprecating matrices. 
In-Reply-To: References: Message-ID: On Mon, Jan 2, 2017 at 7:26 PM, wrote: > > > On Mon, Jan 2, 2017 at 9:00 PM, Ralf Gommers > wrote: > >> >> >> On Tue, Jan 3, 2017 at 2:36 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> Hi All, >>> >>> Just throwing this click bait out for discussion. Now that the `@` >>> operator is available and things seem to be moving towards Python 3, >>> especially in the classroom, we should consider the real possibility of >>> deprecating the matrix type and later removing it. No doubt there are old >>> scripts that require them, but older versions of numpy are available for >>> those who need to run old scripts. >>> >>> Thoughts? >>> >> >> Clearly deprecate in the docs now, and warn only later imho. We can't >> warn before we have a good solution for scipy.sparse matrices, which have >> matrix semantics and return matrix instances. >> >> Ralf >> > > How about dropping python 2 support at the same time, then we can all be > in a @ world. > > The "@" operator works with matrices already, what causes problems is the combination of matrices with 1-D arrays. That can be fixed, I think. The big problem is probably the lack of "@" in Python 2.7. I wonder if there is any chance of getting it backported to 2.7 before support is dropped in 2020? I expect it would be a fight, but I also suspect it would not be difficult to do if the proposal was accepted. Then at some future date sparse could simply start returning arrays. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Jan 2 22:15:43 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 2 Jan 2017 20:15:43 -0700 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: Message-ID: On Mon, Jan 2, 2017 at 8:12 PM, Charles R Harris wrote: > > > On Mon, Jan 2, 2017 at 7:26 PM, wrote: > >> >> >> On Mon, Jan 2, 2017 at 9:00 PM, Ralf Gommers >> wrote: >> >>> >>> >>> On Tue, Jan 3, 2017 at 2:36 PM, Charles R Harris < >>> charlesr.harris at gmail.com> wrote: >>> >>>> Hi All, >>>> >>>> Just throwing this click bait out for discussion. Now that the `@` >>>> operator is available and things seem to be moving towards Python 3, >>>> especially in the classroom, we should consider the real possibility of >>>> deprecating the matrix type and later removing it. No doubt there are old >>>> scripts that require them, but older versions of numpy are available for >>>> those who need to run old scripts. >>>> >>>> Thoughts? >>>> >>> >>> Clearly deprecate in the docs now, and warn only later imho. We can't >>> warn before we have a good solution for scipy.sparse matrices, which have >>> matrix semantics and return matrix instances. >>> >>> Ralf >>> >> >> How about dropping python 2 support at the same time, then we can all be >> in a @ world. >> >> > The "@" operator works with matrices already, what causes problems is the > combination of matrices with 1-D arrays. That can be fixed, I think. The > big problem is probably the lack of "@" in Python 2.7. I wonder if there is > any chance of getting it backported to 2.7 before support is dropped in > 2020? I expect it would be a fight, but I also suspect it would not be > difficult to do if the proposal was accepted. Then at some future date > sparse could simply start returning arrays. > Hmm, matrix-scalar multiplication will be a problem. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Mon Jan 2 22:29:09 2017 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 2 Jan 2017 19:29:09 -0800 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: Message-ID: On Mon, Jan 2, 2017 at 7:12 PM, Charles R Harris wrote: > > > On Mon, Jan 2, 2017 at 7:26 PM, wrote: [...] >> How about dropping python 2 support at the same time, then we can all be >> in a @ world. >> > > The "@" operator works with matrices already, what causes problems is the > combination of matrices with 1-D arrays. That can be fixed, I think. The > big problem is probably the lack of "@" in Python 2.7. I wonder if there is > any chance of getting it backported to 2.7 before support is dropped in > 2020? I expect it would be a fight, but I also suspect it would not be > difficult to do if the proposal was accepted. Then at some future date > sparse could simply start returning arrays. Unfortunately the chance of Python 2.7 adding support for "@" is best expressed as a denormal. -n -- Nathaniel J. Smith -- https://vorpus.org From charlesr.harris at gmail.com Mon Jan 2 22:54:19 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 2 Jan 2017 20:54:19 -0700 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: Message-ID: On Mon, Jan 2, 2017 at 8:29 PM, Nathaniel Smith wrote: > On Mon, Jan 2, 2017 at 7:12 PM, Charles R Harris > wrote: > > > > > > On Mon, Jan 2, 2017 at 7:26 PM, wrote: > [...] > >> How about dropping python 2 support at the same time, then we can all be > >> in a @ world. > >> > > > > The "@" operator works with matrices already, what causes problems is the > > combination of matrices with 1-D arrays. That can be fixed, I think. The > > big problem is probably the lack of "@" in Python 2.7. I wonder if there > is > > any chance of getting it backported to 2.7 before support is dropped in > > 2020? I expect it would be a fight, but I also suspect it would not be > > difficult to do if the proposal was accepted. Then at some future date > > sparse could simply start returning arrays. > > Unfortunately the chance of Python 2.7 adding support for "@" is best > expressed as a denormal. > That's what I figured ;) Hmm, matrices would work fine with the current combination of '*' (works for scalar muiltiplication) and '@' (works for matrices). So for Python3 code currently written for matrices can be reformed to be array compatible. But '@' for Python 2.7 would sure help... Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Jan 2 22:58:06 2017 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 2 Jan 2017 19:58:06 -0800 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: Message-ID: On Mon, Jan 2, 2017 at 7:54 PM, Charles R Harris wrote: > > > On Mon, Jan 2, 2017 at 8:29 PM, Nathaniel Smith wrote: >> >> On Mon, Jan 2, 2017 at 7:12 PM, Charles R Harris >> wrote: >> > >> > >> > On Mon, Jan 2, 2017 at 7:26 PM, wrote: >> [...] >> >> How about dropping python 2 support at the same time, then we can all >> >> be >> >> in a @ world. >> >> >> > >> > The "@" operator works with matrices already, what causes problems is >> > the >> > combination of matrices with 1-D arrays. That can be fixed, I think. >> > The >> > big problem is probably the lack of "@" in Python 2.7. I wonder if there >> > is >> > any chance of getting it backported to 2.7 before support is dropped in >> > 2020? 
I expect it would be a fight, but I also suspect it would not be >> > difficult to do if the proposal was accepted. Then at some future date >> > sparse could simply start returning arrays. >> >> Unfortunately the chance of Python 2.7 adding support for "@" is best >> expressed as a denormal. > > > That's what I figured ;) Hmm, matrices would work fine with the current > combination of '*' (works for scalar muiltiplication) and '@' (works for > matrices). So for Python3 code currently written for matrices can be > reformed to be array compatible. But '@' for Python 2.7 would sure help... I mean, it can just use arrays + dot(). It's not as elegant as '@', but given that almost everyone has already switched it's clearly not *that* bad... -n -- Nathaniel J. Smith -- https://vorpus.org From lists at onerussian.com Tue Jan 3 12:00:04 2017 From: lists at onerussian.com (Yaroslav Halchenko) Date: Tue, 3 Jan 2017 12:00:04 -0500 Subject: [Numpy-discussion] numpy vs algebra Was: Integers to negative integer powers... In-Reply-To: References: Message-ID: <20170103170004.GA7160@onerussian.com> On Tue, 11 Oct 2016, Peter Creasey wrote: > >> I agree with Sebastian and Nathaniel. I don't think we can deviating from > >> the existing behavior (int ** int -> int) without breaking lots of existing > >> code, and if we did, yes, we would need a new integer power function. > >> I think it's better to preserve the existing behavior when it gives > >> sensible results, and error when it doesn't. Adding another function > >> float_power for the case that is currently broken seems like the right way > >> to go. > I actually suspect that the amount of code broken by int**int->float > may be relatively small (though extremely annoying for those that it > happens to, and it would definitely be good to have statistics). I > mean, Numpy silently transitioned to int32+uint64->float64 not so long > ago which broke my code, but the world didn?t end. > If the primary argument against int**int->float seems to be the > difficulty of managing the transition, with int**int->Error being the > seen as the required yet *very* painful intermediate step for the > large fraction of the int**int users who didn?t care if it was int or > float (e.g. the output is likely to be cast to float in the next step > anyway), and fail loudly for those users who need int**int->int, then > if you are prepared to risk a less conservative transition (i.e. we > think that latter group is small enough) you could skip the error on > users and just throw a warning for a couple of releases, along the > lines of: > WARNING int**int -> int is going to be deprecated in favour of > int**int->float in Numpy 1.16. To avoid seeing this message, either > use ?from numpy import __future_float_power__? or explicitly set the > type of one of your inputs to float, or use the new ipower(x,y) > function for integer powers. Sorry for coming too late to the discussion and after PR "addressing" the issue by issuing an error was merged [1]. I got burnt by new behavior while trying to build fresh pandas release on Debian (we are freezing for release way too soon ;) ) -- some pandas tests failed since they rely on previous non-erroring behavior and we got numpy 1.12.0~b1 which included [1] in unstable/testing (candidate release) now. I quickly glanced over the discussion but I guess I have missed actual description of the problem being fixed here... what was it?? 
previous behavior, int**int->int made sense to me as it seemed to be consistent with casting Python's pow result to int, somewhat fulfilling desired promise for in-place operations and being inline with built-in pow results as far as I see it (up to casting). Current handling and error IMHO is going against rudimentary algebra, where numbers can be brought to negative power (integer or not). [1] https://github.com/numpy/numpy/pull/8231 -- Yaroslav O. Halchenko Center for Open Neuroscience http://centerforopenneuroscience.org Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik From sebastian at sipsolutions.net Tue Jan 3 12:08:41 2017 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 03 Jan 2017 18:08:41 +0100 Subject: [Numpy-discussion] Default type for functions that accumulate integers In-Reply-To: References: Message-ID: <1483463321.27223.46.camel@sipsolutions.net> On Mo, 2017-01-02 at 18:46 -0800, Nathaniel Smith wrote: > On Mon, Jan 2, 2017 at 6:27 PM, Charles R Harris > wrote: > > > > Hi All, > > > > Currently functions like trace use the C long type as the default > > accumulator for integer types of lesser precision: > > > > Things we'd need to know more about before making a decision: > - compatibility: if we flip this switch, how much code breaks? In > general correct numpy-using code has to be prepared to handle > np.dtype(int) being 64-bits, and in fact there might be more code > that > accidentally assumes that np.dtype(int) is always 64-bits than there > is code that assumes it is always 32-bits. But that's theory; to know > how bad this is we would need to try actually running some projects > test suites and see whether they break or not. > - speed: there's probably some cost to using 64-bit integers on 32- > bit > systems; how big is the penalty in practice? > I agree with trying to switch the default in general first, I don't like the idea of having two different "defaults". There are two issues, one is the change on Python 2 (no inheritance of Python int by default numpy type) and any issues due to increased precision (more RAM usage, code actually expects lower precision somehow, etc.). Cannot say I know for sure, but I would be extremely surprised if there is a speed difference between 32bit vs. 64bit architectures, except the general slowdown you get due to bus speeds, etc. when going to higher bit width. If the inheritance for some reason is a bigger issue, we might limit the change to Python 3. For other possible problems, I think we may have difficulties assessing how much is affected. The problem is, that the most affected thing should be projects only being used on windows, or so. Bigger projects should work fine already (they are more likely to get better due to not being tested as well on 32bit long platforms, especially 64bit windows). Of course limiting the change to python 3, could have the advantage of not affecting older projects which are possibly more likely to be specifically using the current behaviour. So, I would be open to trying the change, I think the idea of at least changing it in python 3 has been brought up a couple of times, including by Julian, so maybe it is time to give it a shot.... 
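(For reference, a small sketch of the two details mentioned above, namely the width of the default integer, which follows the C long, and, on Python 2 only, its inheritance from the builtin int:)

>>> import numpy as np
>>> np.dtype(np.int_).itemsize   # 8 on 64-bit Linux/macOS, 4 on Windows and on 32-bit builds
>>> issubclass(np.int_, int)     # True on Python 2, False on Python 3
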
It would be interesting to see if anyone knows projects that may be affected (for example because they are designed to only run on windows or limited hardware), and if avoiding to change anything in python 2 might mitigate problems here as well (additionally to avoiding the inheritance change)? Best, Sebastian > -n > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From shoyer at gmail.com Tue Jan 3 12:37:27 2017 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 3 Jan 2017 09:37:27 -0800 Subject: [Numpy-discussion] numpy vs algebra Was: Integers to negative integer powers... In-Reply-To: <20170103170004.GA7160@onerussian.com> References: <20170103170004.GA7160@onerussian.com> Message-ID: On Tue, Jan 3, 2017 at 9:00 AM, Yaroslav Halchenko wrote: > Sorry for coming too late to the discussion and after PR "addressing" > the issue by issuing an error was merged [1]. I got burnt by new > behavior while trying to build fresh pandas release on Debian (we are > freezing for release way too soon ;) ) -- some pandas tests failed since > they rely on previous non-erroring behavior and we got numpy 1.12.0~b1 > which included [1] in unstable/testing (candidate release) now. > > I quickly glanced over the discussion but I guess I have missed > actual description of the problem being fixed here... what was it?? > > previous behavior, int**int->int made sense to me as it seemed to be > consistent with casting Python's pow result to int, somewhat fulfilling > desired promise for in-place operations and being inline with built-in > pow results as far as I see it (up to casting). I believe this is exactly the behavior we preserved. Rather, we turned some cases that previously often gave wrong results (involving negative integer powers) into errors. The pandas test suite triggered this behavior, but not intentionally, and should be fixed in the next release: https://github.com/pandas-dev/pandas/pull/14498 -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Jan 3 13:15:14 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 3 Jan 2017 11:15:14 -0700 Subject: [Numpy-discussion] Default type for functions that accumulate integers In-Reply-To: <1483463321.27223.46.camel@sipsolutions.net> References: <1483463321.27223.46.camel@sipsolutions.net> Message-ID: On Tue, Jan 3, 2017 at 10:08 AM, Sebastian Berg wrote: > On Mo, 2017-01-02 at 18:46 -0800, Nathaniel Smith wrote: > > On Mon, Jan 2, 2017 at 6:27 PM, Charles R Harris > > wrote: > > > > > > Hi All, > > > > > > Currently functions like trace use the C long type as the default > > > accumulator for integer types of lesser precision: > > > > > > > > > > Things we'd need to know more about before making a decision: > > - compatibility: if we flip this switch, how much code breaks? In > > general correct numpy-using code has to be prepared to handle > > np.dtype(int) being 64-bits, and in fact there might be more code > > that > > accidentally assumes that np.dtype(int) is always 64-bits than there > > is code that assumes it is always 32-bits. But that's theory; to know > > how bad this is we would need to try actually running some projects > > test suites and see whether they break or not. > > - speed: there's probably some cost to using 64-bit integers on 32- > > bit > > systems; how big is the penalty in practice? 
> > > > I agree with trying to switch the default in general first, I don't > like the idea of having two different "defaults". > > There are two issues, one is the change on Python 2 (no inheritance of > Python int by default numpy type) and any issues due to increased > precision (more RAM usage, code actually expects lower precision > somehow, etc.). > Cannot say I know for sure, but I would be extremely surprised if there > is a speed difference between 32bit vs. 64bit architectures, except the > general slowdown you get due to bus speeds, etc. when going to higher > bit width. > > If the inheritance for some reason is a bigger issue, we might limit > the change to Python 3. For other possible problems, I think we may > have difficulties assessing how much is affected. The problem is, that > the most affected thing should be projects only being used on windows, > or so. Bigger projects should work fine already (they are more likely > to get better due to not being tested as well on 32bit long platforms, > especially 64bit windows). > > Of course limiting the change to python 3, could have the advantage of > not affecting older projects which are possibly more likely to be > specifically using the current behaviour. > > So, I would be open to trying the change, I think the idea of at least > changing it in python 3 has been brought up a couple of times, > including by Julian, so maybe it is time to give it a shot.... > > It would be interesting to see if anyone knows projects that may be > affected (for example because they are designed to only run on windows > or limited hardware), and if avoiding to change anything in python 2 > might mitigate problems here as well (additionally to avoiding the > inheritance change)? > There have been a number of reports of problems due to the inheritance stemming both from the changing precision and, IIRC, from differences in print format or some such. So I don't expect that there will be no problems, but they will probably not be difficult to fix. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Tue Jan 3 14:31:45 2017 From: toddrjen at gmail.com (Todd) Date: Tue, 3 Jan 2017 14:31:45 -0500 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: Message-ID: On Mon, Jan 2, 2017 at 8:36 PM, Charles R Harris wrote: > Hi All, > > Just throwing this click bait out for discussion. Now that the `@` > operator is available and things seem to be moving towards Python 3, > especially in the classroom, we should consider the real possibility of > deprecating the matrix type and later removing it. No doubt there are old > scripts that require them, but older versions of numpy are available for > those who need to run old scripts. > > Thoughts? > > Chuck > > What if the matrix class was split out into its own project, perhaps as a scikit. That way those who really need it can still use it. If there is sufficient desire for it, those who need it can maintain it. If not, it will hopefully it will take long enough for it to bitrot that everyone has transitioned. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.v.root at gmail.com Tue Jan 3 14:54:28 2017 From: ben.v.root at gmail.com (Benjamin Root) Date: Tue, 3 Jan 2017 14:54:28 -0500 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: Message-ID: That's not a bad idea. Matplotlib is currently considering something similar for its mlab module. 
It has been there since the beginning, but it is very outdated and very out-of-scope for matplotlib. However, there are still lots of code out there that depends on it. So, we are looking to split it off as its own package. The details still need to be worked out (should we initially depend on the package and simply alias its import with a DeprecationWarning, or should we go cold turkey and have a good message explaining the change). Ben Root On Tue, Jan 3, 2017 at 2:31 PM, Todd wrote: > On Mon, Jan 2, 2017 at 8:36 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> Hi All, >> >> Just throwing this click bait out for discussion. Now that the `@` >> operator is available and things seem to be moving towards Python 3, >> especially in the classroom, we should consider the real possibility of >> deprecating the matrix type and later removing it. No doubt there are old >> scripts that require them, but older versions of numpy are available for >> those who need to run old scripts. >> >> Thoughts? >> >> Chuck >> >> > What if the matrix class was split out into its own project, perhaps as a > scikit. That way those who really need it can still use it. If there is > sufficient desire for it, those who need it can maintain it. If not, it > will hopefully it will take long enough for it to bitrot that everyone has > transitioned. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Tue Jan 3 14:59:47 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 3 Jan 2017 20:59:47 +0100 Subject: [Numpy-discussion] Default type for functions that accumulate integers References: Message-ID: <20170103205947.69473d91@fsol> On Mon, 2 Jan 2017 18:46:08 -0800 Nathaniel Smith wrote: > > So some options include: > - make the default integer precision 64-bits everywhere > - make the default integer precision 32-bits on 32-bit systems, and > 64-bits on 64-bit systems (including Windows) Either of those two would be the best IMO. Intuitively, I think people would expect 32-bit ints in 32-bit processes by default, and 64-bit ints in 64-bit processes likewise. So I would slightly favour the latter option. > - leave the default integer precision the same, but make accumulators > 64-bits everywhere > - leave the default integer precision the same, but make accumulators > 64-bits on 64-bit systems (including Windows) Both of these options introduce a confusing discrepancy. > - speed: there's probably some cost to using 64-bit integers on 32-bit > systems; how big is the penalty in practice? Ok, I have fired up a Windows VM to compare 32-bit and 64-bit builds. Numpy version is 1.11.2, Python version is 3.5.2. Keep in mind those are Anaconda builds of Numpy, with MKL enabled for linear algebra; YMMV. For each benchmark, the first number is the result on the 32-bit build, the second number on the 64-bit build. 
Simple arithmetic ----------------- >>> v = np.ones(1024**2, dtype='int32') >>> %timeit v + v # 1.73 ms per loop | 1.78 ms per loop >>> %timeit v * v # 1.77 ms per loop | 1.79 ms per loop >>> %timeit v // v # 5.89 ms per loop | 5.39 ms per loop >>> v = np.ones(1024**2, dtype='int64') >>> %timeit v + v # 3.54 ms per loop | 3.54 ms per loop >>> %timeit v * v # 5.61 ms per loop | 3.52 ms per loop >>> %timeit v // v # 17.1 ms per loop | 13.9 ms per loop Linear algebra -------------- >>> m = np.ones((1024,1024), dtype='int32') >>> %timeit m @ m # 556 ms per loop | 569 ms per loop >>> m = np.ones((1024,1024), dtype='int64') >>> %timeit m @ m # 3.81 s per loop | 1.01 s per loop Sorting ------- >>> v = np.random.RandomState(42).randint(1000, size=1024**2).astype('int32') >>> %timeit np.sort(v) # 43.4 ms per loop | 44 ms per loop >>> v = np.random.RandomState(42).randint(1000, size=1024**2).astype('int64') >>> %timeit np.sort(v) # 61.5 ms per loop | 45.5 ms per loop Indexing -------- >>> v = np.ones(1024**2, dtype='int32') >>> %timeit v[v[::-1]] # 2.38 ms per loop | 4.63 ms per loop >>> v = np.ones(1024**2, dtype='int64') >>> %timeit v[v[::-1]] # 6.9 ms per loop | 3.63 ms per loop Quick summary: - for very simple operations, 32b and 64b builds can have the same perf on each given bitwidth (though speed is uniformly halved on 64-bit integers when the given operation is SIMD-vectorized) - for more sophisticated operations (such as element-wise multiplication or division, or quicksort, but much more so on the matrix product), 32b builds are competitive with 64b builds on 32-bit ints, but lag behind on 64-bit ints - for indexing, it's desirable to use a "native" width integer, regardless of whether that means 32- or 64-bit Of course the numbers will vary depend on the platform (read: compiler), but some aspects of this comparison will probably translate to other platforms. Regards Antoine. From bryanv at continuum.io Tue Jan 3 15:07:55 2017 From: bryanv at continuum.io (Bryan Van de Ven) Date: Tue, 3 Jan 2017 14:07:55 -0600 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: Message-ID: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> There's a good chance that bokeh.charts will be split off into a separately distributed package as well. Hopefully being a much smaller, pure Python project makes it a more accessible target for anyone interested in maintaining it, and if no one is interested in it anymore, well that fact becomes easier to judge. I think it would be a reasonable approach here for the same reasons. Bryan > On Jan 3, 2017, at 13:54, Benjamin Root wrote: > > That's not a bad idea. Matplotlib is currently considering something similar for its mlab module. It has been there since the beginning, but it is very outdated and very out-of-scope for matplotlib. However, there are still lots of code out there that depends on it. So, we are looking to split it off as its own package. The details still need to be worked out (should we initially depend on the package and simply alias its import with a DeprecationWarning, or should we go cold turkey and have a good message explaining the change). > > Ben Root > > >> On Tue, Jan 3, 2017 at 2:31 PM, Todd wrote: >>> On Mon, Jan 2, 2017 at 8:36 PM, Charles R Harris wrote: >>> Hi All, >>> >>> Just throwing this click bait out for discussion. 
Now that the `@` operator is available and things seem to be moving towards Python 3, especially in the classroom, we should consider the real possibility of deprecating the matrix type and later removing it. No doubt there are old scripts that require them, but older versions of numpy are available for those who need to run old scripts. >>> >>> Thoughts? >>> >>> Chuck >>> >> >> What if the matrix class was split out into its own project, perhaps as a scikit. That way those who really need it can still use it. If there is sufficient desire for it, those who need it can maintain it. If not, it will hopefully it will take long enough for it to bitrot that everyone has transitioned. >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at onerussian.com Tue Jan 3 16:46:59 2017 From: lists at onerussian.com (Yaroslav Halchenko) Date: Tue, 3 Jan 2017 16:46:59 -0500 Subject: [Numpy-discussion] numpy vs algebra Was: Integers to negative integer powers... In-Reply-To: References: <20170103170004.GA7160@onerussian.com> Message-ID: <20170103214659.GB7160@onerussian.com> On Tue, 03 Jan 2017, Stephan Hoyer wrote: > On Tue, Jan 3, 2017 at 9:00 AM, Yaroslav Halchenko > wrote: > Sorry for coming too late to the discussion and after PR "addressing" > the issue by issuing an error was merged [1].A I got burnt by new > behavior while trying to build fresh pandas release on Debian (we are > freezing for release way too soon ;) ) -- some pandas tests failed since > they rely on previous non-erroring behavior and we gotA numpy 1.12.0~b1 > which included [1] in unstable/testing (candidate release) now. > I quickly glanced over the discussion but I guess I have missed > actual description of the problem being fixed here...A what was it?? > previous behavior, int**int->int made sense to me as it seemed to be > consistent with casting Python's pow result to int, somewhat fulfilling > desired promise for in-place operations and being inline with built-in > pow results as far as I see it (up to casting). > I believe this is exactly the behavior we preserved. Rather, we turned > some cases that previously often gave wrong results (involving negative > integer powers) into errors. hm... testing on current master (first result is from python's pow) $> python -c "import numpy; print('numpy version: ', numpy.__version__); a=2; b=-2; print(pow(a,b)); print(pow(numpy.array(a), b))" ('numpy version: ', '1.13.0.dev0+02e2ea8') 0.25 Traceback (most recent call last): File "", line 1, in ValueError: Integers to negative integer powers are not allowed. testing on Debian's packaged beta $> python -c "import numpy; print('numpy version: ', numpy.__version__); a=2; b=-2; print(pow(a,b)); print(pow(numpy.array(a), b))" ('numpy version: ', '1.12.0b1') 0.25 Traceback (most recent call last): File "", line 1, in ValueError: Integers to negative integer powers are not allowed. testing on stable debian box with elderly numpy, where it does behave sensibly: $> python -c "import numpy; print('numpy version: ', numpy.__version__); a=2; b=-2; print(pow(a,b)); print(pow(numpy.array(a), b))" ('numpy version: ', '1.8.2') 0.25 0 what am I missing? 
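(A minimal illustration of the workarounds touched on in this thread, namely casting to float or using the float_power function added in NumPy 1.12, for code that needs negative integer exponents:)

>>> import numpy as np
>>> a = np.array(2)
>>> a.astype(float) ** -2    # 0.25, cast the base to float
>>> a ** -2.0                # 0.25, or use a float exponent
>>> np.float_power(a, -2)    # 0.25, always computes in float
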
> The pandas test suite triggered this behavior, but not intentionally, and > should be fixed in the next release: > https://github.com/pandas-dev/pandas/pull/14498 I don't think that was the full set of cases, e.g. (git)hopa/sid-i386:~exppsy/pandas[bf-i386] $> nosetests -s -v pandas/tests/test_expressions.py:TestExpressions.test_mixed_arithmetic_series test_mixed_arithmetic_series (pandas.tests.test_expressions.TestExpressions) ... ERROR ====================================================================== ERROR: test_mixed_arithmetic_series (pandas.tests.test_expressions.TestExpressions) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", line 223, in test_mixed_arithmetic_series self.run_series(self.mixed2[col], self.mixed2[col], binary_comp=4) File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", line 164, in run_series test_flex=False, **kwargs) File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", line 93, in run_arithmetic_test expected = op(df, other) File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line 715, in wrapper result = wrap_results(safe_na_op(lvalues, rvalues)) File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line 676, in safe_na_op return na_op(lvalues, rvalues) File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line 652, in na_op raise_on_error=True, **eval_kwargs) File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/computation/expressions.py", line 210, in evaluate **eval_kwargs) File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/computation/expressions.py", line 63, in _evaluate_standard return op(a, b) ValueError: Integers to negative integer powers are not allowed. and being paranoid, I have rebuilt exact current master of pandas with master numpy in PYTHONPATH: (git)hopa:~exppsy/pandas[master]git $> PYTHONPATH=/home/yoh/proj/numpy nosetests -s -v pandas/tests/test_expressions.py:TestExpressions.test_mixed_arithmetic_series test_mixed_arithmetic_series (pandas.tests.test_expressions.TestExpressions) ... 
ERROR ====================================================================== ERROR: test_mixed_arithmetic_series (pandas.tests.test_expressions.TestExpressions) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", line 223, in test_mixed_arithmetic_series self.run_series(self.mixed2[col], self.mixed2[col], binary_comp=4) File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", line 164, in run_series test_flex=False, **kwargs) File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", line 93, in run_arithmetic_test expected = op(df, other) File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line 715, in wrapper result = wrap_results(safe_na_op(lvalues, rvalues)) File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line 676, in safe_na_op return na_op(lvalues, rvalues) File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line 652, in na_op raise_on_error=True, **eval_kwargs) File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/computation/expressions.py", line 210, in evaluate **eval_kwargs) File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/computation/expressions.py", line 63, in _evaluate_standard return op(a, b) ValueError: Integers to negative integer powers are not allowed. ---------------------------------------------------------------------- Ran 1 test in 0.015s FAILED (errors=1) $> git describe --tags v0.19.0-303-gb957f6f $> PYTHONPATH=/home/yoh/proj/numpy python -c "import numpy; print('numpy version: ', numpy.__version__); a=2; b=-2; print(pow(a,b)); print(pow(numpy.array(a), b))" ('numpy version: ', '1.13.0.dev0+02e2ea8') 0.25 Traceback (most recent call last): File "", line 1, in ValueError: Integers to negative integer powers are not allowed. -- Yaroslav O. Halchenko Center for Open Neuroscience http://centerforopenneuroscience.org Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik From njs at pobox.com Tue Jan 3 18:05:09 2017 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 3 Jan 2017 15:05:09 -0800 Subject: [Numpy-discussion] numpy vs algebra Was: Integers to negative integer powers... In-Reply-To: <20170103214659.GB7160@onerussian.com> References: <20170103170004.GA7160@onerussian.com> <20170103214659.GB7160@onerussian.com> Message-ID: It's possible we should back off to just issuing a deprecation warning in 1.12? On Jan 3, 2017 1:47 PM, "Yaroslav Halchenko" wrote: > > On Tue, 03 Jan 2017, Stephan Hoyer wrote: > > > On Tue, Jan 3, 2017 at 9:00 AM, Yaroslav Halchenko < > lists at onerussian.com> > > wrote: > > > Sorry for coming too late to the discussion and after PR > "addressing" > > the issue by issuing an error was merged [1].A I got burnt by new > > behavior while trying to build fresh pandas release on Debian (we > are > > freezing for release way too soon ;) ) -- some pandas tests failed > since > > they rely on previous non-erroring behavior and we gotA numpy > 1.12.0~b1 > > which included [1] in unstable/testing (candidate release) now. > > > I quickly glanced over the discussion but I guess I have missed > > actual description of the problem being fixed here...A what was > it?? 
> > > previous behavior, int**int->int made sense to me as it seemed to be > > consistent with casting Python's pow result to int, somewhat > fulfilling > > desired promise for in-place operations and being inline with > built-in > > pow results as far as I see it (up to casting). > > > I believe this is exactly the behavior we preserved. Rather, we turned > > some cases that previously often gave wrong results (involving > negative > > integer powers) into errors. > > hm... testing on current master (first result is from python's pow) > > $> python -c "import numpy; print('numpy version: ', numpy.__version__); > a=2; b=-2; print(pow(a,b)); print(pow(numpy.array(a), b))" > ('numpy version: ', '1.13.0.dev0+02e2ea8') > 0.25 > Traceback (most recent call last): > File "", line 1, in > ValueError: Integers to negative integer powers are not allowed. > > > testing on Debian's packaged beta > > $> python -c "import numpy; print('numpy version: ', numpy.__version__); > a=2; b=-2; print(pow(a,b)); print(pow(numpy.array(a), b))" > ('numpy version: ', '1.12.0b1') > 0.25 > Traceback (most recent call last): > File "", line 1, in > ValueError: Integers to negative integer powers are not allowed. > > > testing on stable debian box with elderly numpy, where it does behave > sensibly: > > $> python -c "import numpy; print('numpy version: ', numpy.__version__); > a=2; b=-2; print(pow(a,b)); print(pow(numpy.array(a), b))" > ('numpy version: ', '1.8.2') > 0.25 > 0 > > what am I missing? > > > The pandas test suite triggered this behavior, but not intentionally, > and > > should be fixed in the next release: > > https://github.com/pandas-dev/pandas/pull/14498 > > I don't think that was the full set of cases, e.g. > > (git)hopa/sid-i386:~exppsy/pandas[bf-i386] > $> nosetests -s -v pandas/tests/test_expressions. > py:TestExpressions.test_mixed_arithmetic_series > test_mixed_arithmetic_series (pandas.tests.test_expressions.TestExpressions) > ... ERROR > > ====================================================================== > ERROR: test_mixed_arithmetic_series (pandas.tests.test_ > expressions.TestExpressions) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", > line 223, in test_mixed_arithmetic_series > self.run_series(self.mixed2[col], self.mixed2[col], binary_comp=4) > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", > line 164, in run_series > test_flex=False, **kwargs) > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", > line 93, in run_arithmetic_test > expected = op(df, other) > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line > 715, in wrapper > result = wrap_results(safe_na_op(lvalues, rvalues)) > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line > 676, in safe_na_op > return na_op(lvalues, rvalues) > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line > 652, in na_op > raise_on_error=True, **eval_kwargs) > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/computation/expressions.py", > line 210, in evaluate > **eval_kwargs) > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/computation/expressions.py", > line 63, in _evaluate_standard > return op(a, b) > ValueError: Integers to negative integer powers are not allowed. 
> > > and being paranoid, I have rebuilt exact current master of pandas with > master numpy in PYTHONPATH: > > (git)hopa:~exppsy/pandas[master]git > $> PYTHONPATH=/home/yoh/proj/numpy nosetests -s -v > pandas/tests/test_expressions.py:TestExpressions.test_mixed_ > arithmetic_series > test_mixed_arithmetic_series (pandas.tests.test_expressions.TestExpressions) > ... ERROR > > ====================================================================== > ERROR: test_mixed_arithmetic_series (pandas.tests.test_ > expressions.TestExpressions) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", > line 223, in test_mixed_arithmetic_series > self.run_series(self.mixed2[col], self.mixed2[col], binary_comp=4) > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", > line 164, in run_series > test_flex=False, **kwargs) > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", > line 93, in run_arithmetic_test > expected = op(df, other) > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line > 715, in wrapper > result = wrap_results(safe_na_op(lvalues, rvalues)) > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line > 676, in safe_na_op > return na_op(lvalues, rvalues) > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line > 652, in na_op > raise_on_error=True, **eval_kwargs) > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/computation/expressions.py", > line 210, in evaluate > **eval_kwargs) > File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/computation/expressions.py", > line 63, in _evaluate_standard > return op(a, b) > ValueError: Integers to negative integer powers are not allowed. > > ---------------------------------------------------------------------- > Ran 1 test in 0.015s > > FAILED (errors=1) > > $> git describe --tags > v0.19.0-303-gb957f6f > > $> PYTHONPATH=/home/yoh/proj/numpy python -c "import numpy; print('numpy > version: ', numpy.__version__); a=2; b=-2; print(pow(a,b)); > print(pow(numpy.array(a), b))" > > ('numpy version: ', '1.13.0.dev0+02e2ea8') > 0.25 > Traceback (most recent call last): > File "", line 1, in > ValueError: Integers to negative integer powers are not allowed. > > > -- > Yaroslav O. Halchenko > Center for Open Neuroscience http://centerforopenneuroscience.org > Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 > Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 > WWW: http://www.linkedin.com/in/yarik > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Tue Jan 3 19:09:55 2017 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 3 Jan 2017 16:09:55 -0800 Subject: [Numpy-discussion] numpy vs algebra Was: Integers to negative integer powers... In-Reply-To: References: <20170103170004.GA7160@onerussian.com> <20170103214659.GB7160@onerussian.com> Message-ID: On Tue, Jan 3, 2017 at 3:05 PM, Nathaniel Smith wrote: > It's possible we should back off to just issuing a deprecation warning in > 1.12? > > On Jan 3, 2017 1:47 PM, "Yaroslav Halchenko" wrote: > >> hm... 
testing on current master (first result is from python's pow) >> >> $> python -c "import numpy; print('numpy version: ', numpy.__version__); >> a=2; b=-2; print(pow(a,b)); print(pow(numpy.array(a), b))" >> ('numpy version: ', '1.13.0.dev0+02e2ea8') >> 0.25 >> Traceback (most recent call last): >> File "", line 1, in >> ValueError: Integers to negative integer powers are not allowed. >> >> >> testing on Debian's packaged beta >> >> $> python -c "import numpy; print('numpy version: ', numpy.__version__); >> a=2; b=-2; print(pow(a,b)); print(pow(numpy.array(a), b))" >> ('numpy version: ', '1.12.0b1') >> 0.25 >> Traceback (most recent call last): >> File "", line 1, in >> ValueError: Integers to negative integer powers are not allowed. >> >> >> testing on stable debian box with elderly numpy, where it does behave >> sensibly: >> >> $> python -c "import numpy; print('numpy version: ', numpy.__version__); >> a=2; b=-2; print(pow(a,b)); print(pow(numpy.array(a), b))" >> ('numpy version: ', '1.8.2') >> 0.25 >> 0 >> >> what am I missing? >> >> 2 ** -2 should be 0.25. On old versions of NumPy, you see the the incorrect answer 0. We are now preferring to give an error rather than the wrong answer. > > The pandas test suite triggered this behavior, but not intentionally, >> and >> > should be fixed in the next release: >> > https://github.com/pandas-dev/pandas/pull/14498 >> >> I don't think that was the full set of cases, e.g. >> >> (git)hopa/sid-i386:~exppsy/pandas[bf-i386] >> $> nosetests -s -v pandas/tests/test_expressions. >> py:TestExpressions.test_mixed_arithmetic_series >> test_mixed_arithmetic_series (pandas.tests.test_expressions.TestExpressions) >> ... ERROR >> >> ====================================================================== >> ERROR: test_mixed_arithmetic_series (pandas.tests.test_expressions >> .TestExpressions) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", >> line 223, in test_mixed_arithmetic_series >> self.run_series(self.mixed2[col], self.mixed2[col], binary_comp=4) >> File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", >> line 164, in run_series >> test_flex=False, **kwargs) >> File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/tests/test_expressions.py", >> line 93, in run_arithmetic_test >> expected = op(df, other) >> File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line >> 715, in wrapper >> result = wrap_results(safe_na_op(lvalues, rvalues)) >> File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line >> 676, in safe_na_op >> return na_op(lvalues, rvalues) >> File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/ops.py", line >> 652, in na_op >> raise_on_error=True, **eval_kwargs) >> File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/computation/expressions.py", >> line 210, in evaluate >> **eval_kwargs) >> File "/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/computation/expressions.py", >> line 63, in _evaluate_standard >> return op(a, b) >> ValueError: Integers to negative integer powers are not allowed. >> > Agreed, it looks like pandas still has this issue in the test suite. Nonetheless, I don't think this should be an issue for users -- pandas defines all handling of arithmetic to numpy. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From lists at onerussian.com Tue Jan 3 21:24:43 2017 From: lists at onerussian.com (Yaroslav Halchenko) Date: Tue, 3 Jan 2017 21:24:43 -0500 Subject: [Numpy-discussion] numpy vs algebra Was: Integers to negative integer powers... In-Reply-To: References: <20170103170004.GA7160@onerussian.com> <20170103214659.GB7160@onerussian.com> Message-ID: <20170104022443.GC7160@onerussian.com> On Tue, 03 Jan 2017, Stephan Hoyer wrote: > >> testing on stable debian box with elderly numpy, where it does behave > >> sensibly: > >> $> python -c "import numpy; print('numpy version: ', numpy.__version__); > >> a=2; b=-2; print(pow(a,b)); print(pow(numpy.array(a), b))" > >> ('numpy version: ', '1.8.2') > >> 0.25 > >> 0 > >> what am I missing? > 2 ** -2 should be 0.25. > On old versions of NumPy, you see the the incorrect answer 0. We are now > preferring to give an error rather than the wrong answer. it is correct up to casting/truncating to an int for the desire to maintain the int data type -- the same as >>> int(0.25) 0 >>> 1/4 0 or even >>> np.arange(5)/4 array([0, 0, 0, 0, 1]) so it is IMHO more of a documented feature and I don't see why pow needs to get all so special. Sure thing, in the bring future, unless in-place operation is demanded I would have voted for consistent float output. -- Yaroslav O. Halchenko Center for Open Neuroscience http://centerforopenneuroscience.org Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik From alex.rogozhnikov at yandex.ru Thu Jan 5 04:12:24 2017 From: alex.rogozhnikov at yandex.ru (Alex Rogozhnikov) Date: Thu, 5 Jan 2017 13:12:24 +0400 Subject: [Numpy-discussion] From Python to Numpy In-Reply-To: <62BD0BF9-534F-4A63-98E0-CE1C0137F805@inria.fr> References: <62BD0BF9-534F-4A63-98E0-CE1C0137F805@inria.fr> Message-ID: <31A0AC3C-8E06-4B51-A82C-44E52C5C7A58@yandex.ru> > 31 ???. 2016 ?., ? 2:09, Nicolas P. Rougier ???????(?): > >> >> On 30 Dec 2016, at 20:36, Alex Rogozhnikov wrote: >> >> Hi Nicolas, >> that's a very nice work! >> >>> Comments/questions/fixes/ideas are of course welcome. >> >> Boids example brought my attention too, some comments on it: >> - I find using complex numbers here very natural, this should speed up things and also shorten the code (rotating without einsum, etc.) >> - you probably can speed up things with going to sparse arrays >> - and you can go to really large numbers of 'birds' if you combine it with preliminary splitting of space into squares, thus analyze only birds from close squares >> >> Also I think worth adding some operations with HSV / HSL color spaces as those can be visualized easily e.g. on some photo. >> >> Thanks, >> Alex. > > > Thanks. > > I'm not sure to know how to use complex with this example. Could you elaborate ? Position and velocity are encoded by complex numbers. Rotation is multiplication by exp(i \phi), translating is adding a complex number. Distance = abs(x - y). I think, that's all operations you need, but maybe I miss something. > > For the preliminary splitting, a quadtree (scipy KDTree) could also help a lot but I wanted to stick to numpy only. > A simpler square splitting as you suggest could make thing faster but require some work. I'm not sure yet I see how to restrict analysis to close squares. > > Nicolas > > >> >> >> >>> 23 ???. 2016 ?., ? 12:14, Kiko ???????(?): >>> >>> >>> >>> 2016-12-22 17:44 GMT+01:00 Nicolas P. 
Rougier : >>> >>> Dear all, >>> >>> I've just put online a (kind of) book on Numpy and more specifically about vectorization methods. It's not yet finished, has not been reviewed and it's a bit rough around the edges. But I think there are some material that can be interesting. I'm specifically happy with the boids example that show a nice combination of numpy and matplotlib strengths. >>> >>> Book is online at: http://www.labri.fr/perso/nrougier/from-python-to-numpy/ >>> Sources are available at: https://github.com/rougier/from-python-to-numpy >>> >>> >>> Comments/questions/fixes/ideas are of course welcome. >>> >>> Wow!!! Beautiful. >>> >>> Thanks for sharing. >>> >>> >>> >>> Nicolas >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> https://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From loic.esteve at ymail.com Thu Jan 5 15:11:43 2017 From: loic.esteve at ymail.com (=?UTF-8?B?TG/Dr2MgRXN0w6h2ZQ==?=) Date: Thu, 5 Jan 2017 21:11:43 +0100 Subject: [Numpy-discussion] Proposed change in memmap offset attribute Message-ID: Dear all, I have a PR at https://github.com/numpy/numpy/pull/8443 that proposes to change the value of the offset attribute of memmap objects. At the moment it is not the offset into the memmap file (as the docstring would lead you to believe) but this modulo mmap.ALLOCATIONGRANULARITY. It was deemed best to double-check on the mailing list whether anyone could think of a good reason why this is the case and/or whether anyone was using this property of the offset attribute. If you have comments about this proposed change, it is probably best if you do it on the PR in order to keep the discussion all in the same place. Cheers, Lo?c From ralf.gommers at gmail.com Fri Jan 6 19:19:12 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 7 Jan 2017 13:19:12 +1300 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Wed, Jan 4, 2017 at 9:07 AM, Bryan Van de Ven wrote: > There's a good chance that bokeh.charts will be split off into a > separately distributed package as well. Hopefully being a much smaller, > pure Python project makes it a more accessible target for anyone interested > in maintaining it, and if no one is interested in it anymore, well that > fact becomes easier to judge. I think it would be a reasonable approach > here for the same reasons. > > Bryan > > On Jan 3, 2017, at 13:54, Benjamin Root wrote: > > That's not a bad idea. Matplotlib is currently considering something > similar for its mlab module. It has been there since the beginning, but it > is very outdated and very out-of-scope for matplotlib. However, there are > still lots of code out there that depends on it. So, we are looking to > split it off as its own package. 
The details still need to be worked out > (should we initially depend on the package and simply alias its import with > a DeprecationWarning, or should we go cold turkey and have a good message > explaining the change). > > Don't go cold turkey please, that still would break a lot of code. Even with a good message, breaking things isn't great. > > Ben Root > > > On Tue, Jan 3, 2017 at 2:31 PM, Todd wrote: > >> On Mon, Jan 2, 2017 at 8:36 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> Hi All, >>> >>> Just throwing this click bait out for discussion. Now that the `@` >>> operator is available and things seem to be moving towards Python 3, >>> especially in the classroom, we should consider the real possibility of >>> deprecating the matrix type and later removing it. No doubt there are old >>> scripts that require them, but older versions of numpy are available for >>> those who need to run old scripts. >>> >>> Thoughts? >>> >>> Chuck >>> >>> >> What if the matrix class was split out into its own project, perhaps as a >> scikit. >> > Something like "npmatrix" would be a better name, we'd like to keep scikit- for active well-maintained projects I'd think. > That way those who really need it can still use it. If there is >> sufficient desire for it, those who need it can maintain it. If not, it >> will hopefully it will take long enough for it to bitrot that everyone has >> transitioned. >> > This sounds like a reasonable idea. Timeline could be something like: 1. Now: create new package, deprecate np.matrix in docs. 2. In say 1.5 years: start issuing visible deprecation warnings in numpy 3. After 2020: remove matrix from numpy. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From perimosocordiae at gmail.com Fri Jan 6 20:21:36 2017 From: perimosocordiae at gmail.com (CJ Carey) Date: Fri, 6 Jan 2017 19:21:36 -0600 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Fri, Jan 6, 2017 at 6:19 PM, Ralf Gommers wrote: > This sounds like a reasonable idea. Timeline could be something like: > > 1. Now: create new package, deprecate np.matrix in docs. > 2. In say 1.5 years: start issuing visible deprecation warnings in numpy > 3. After 2020: remove matrix from numpy. > > Ralf > I think this sounds reasonable, and reminds me of the deliberate deprecation process taken for scipy.weave. I guess we'll see how successful it was when 0.19 is released. The major problem I have with removing numpy matrices is the effect on scipy.sparse, which mostly-consistently mimics numpy.matrix semantics and often produces numpy.matrix results when densifying. The two are coupled tightly enough that if numpy matrices go away, all of the existing sparse matrix classes will have to go at the same time. I don't think that would be the end of the world, but it's definitely something that should happen while scipy is still pre-1.0, if it's ever going to happen. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Fri Jan 6 20:28:48 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 7 Jan 2017 14:28:48 +1300 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Sat, Jan 7, 2017 at 2:21 PM, CJ Carey wrote: > > On Fri, Jan 6, 2017 at 6:19 PM, Ralf Gommers > wrote: > >> This sounds like a reasonable idea. 
Timeline could be something like: >> >> 1. Now: create new package, deprecate np.matrix in docs. >> 2. In say 1.5 years: start issuing visible deprecation warnings in numpy >> 3. After 2020: remove matrix from numpy. >> >> Ralf >> > > I think this sounds reasonable, and reminds me of the deliberate > deprecation process taken for scipy.weave. I guess we'll see how successful > it was when 0.19 is released. > > The major problem I have with removing numpy matrices is the effect on > scipy.sparse, which mostly-consistently mimics numpy.matrix semantics and > often produces numpy.matrix results when densifying. The two are coupled > tightly enough that if numpy matrices go away, all of the existing sparse > matrix classes will have to go at the same time. > > I don't think that would be the end of the world, > Not the end of the world literally, but the impact would be pretty major. I think we're stuck with scipy.sparse, and may at some point will add a new sparse *array* implementation next to it. For scipy we will have to add a dependency on the new npmatrix package or vendor it. Ralf > but it's definitely something that should happen while scipy is still > pre-1.0, if it's ever going to happen. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Jan 6 20:37:13 2017 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 6 Jan 2017 20:37:13 -0500 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Fri, Jan 6, 2017 at 8:28 PM, Ralf Gommers wrote: > > > On Sat, Jan 7, 2017 at 2:21 PM, CJ Carey > wrote: > >> >> On Fri, Jan 6, 2017 at 6:19 PM, Ralf Gommers >> wrote: >> >>> This sounds like a reasonable idea. Timeline could be something like: >>> >>> 1. Now: create new package, deprecate np.matrix in docs. >>> 2. In say 1.5 years: start issuing visible deprecation warnings in numpy >>> 3. After 2020: remove matrix from numpy. >>> >>> Ralf >>> >> >> I think this sounds reasonable, and reminds me of the deliberate >> deprecation process taken for scipy.weave. I guess we'll see how successful >> it was when 0.19 is released. >> >> The major problem I have with removing numpy matrices is the effect on >> scipy.sparse, which mostly-consistently mimics numpy.matrix semantics and >> often produces numpy.matrix results when densifying. The two are coupled >> tightly enough that if numpy matrices go away, all of the existing sparse >> matrix classes will have to go at the same time. >> >> I don't think that would be the end of the world, >> > > Not the end of the world literally, but the impact would be pretty major. > I think we're stuck with scipy.sparse, and may at some point will add a new > sparse *array* implementation next to it. For scipy we will have to add a > dependency on the new npmatrix package or vendor it. > That sounds to me like moving maintenance of numpy.matrix from numpy to scipy, if scipy.sparse is one of the main users and still depends on it. Josef > > Ralf > > > >> but it's definitely something that should happen while scipy is still >> pre-1.0, if it's ever going to happen. 
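For anyone skimming the thread, the change being argued for boils down to the following comparison. This is just current NumPy behaviour, not tied to any particular proposal:

    import numpy as np

    a = np.arange(4.).reshape(2, 2)
    b = np.eye(2)

    a @ b              # matrix multiplication on plain ndarrays (Python >= 3.5)
    a.dot(b)           # the same product, spelled so it also works on Python 2
    a * b              # element-wise product; this is where ndarray and np.matrix differ

    m = np.matrix(a)
    m * np.matrix(b)   # np.matrix overloads * to mean matrix multiplication

With @ available on ndarrays, the notational convenience that np.matrix offered in teaching is largely covered, which is the argument running through this thread.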
>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jan 6 20:52:59 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 6 Jan 2017 18:52:59 -0700 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Fri, Jan 6, 2017 at 6:37 PM, wrote: > > > > On Fri, Jan 6, 2017 at 8:28 PM, Ralf Gommers > wrote: > >> >> >> On Sat, Jan 7, 2017 at 2:21 PM, CJ Carey >> wrote: >> >>> >>> On Fri, Jan 6, 2017 at 6:19 PM, Ralf Gommers >>> wrote: >>> >>>> This sounds like a reasonable idea. Timeline could be something like: >>>> >>>> 1. Now: create new package, deprecate np.matrix in docs. >>>> 2. In say 1.5 years: start issuing visible deprecation warnings in numpy >>>> 3. After 2020: remove matrix from numpy. >>>> >>>> Ralf >>>> >>> >>> I think this sounds reasonable, and reminds me of the deliberate >>> deprecation process taken for scipy.weave. I guess we'll see how successful >>> it was when 0.19 is released. >>> >>> The major problem I have with removing numpy matrices is the effect on >>> scipy.sparse, which mostly-consistently mimics numpy.matrix semantics and >>> often produces numpy.matrix results when densifying. The two are coupled >>> tightly enough that if numpy matrices go away, all of the existing sparse >>> matrix classes will have to go at the same time. >>> >>> I don't think that would be the end of the world, >>> >> >> Not the end of the world literally, but the impact would be pretty major. >> I think we're stuck with scipy.sparse, and may at some point will add a new >> sparse *array* implementation next to it. For scipy we will have to add a >> dependency on the new npmatrix package or vendor it. >> > > That sounds to me like moving maintenance of numpy.matrix from numpy to > scipy, if scipy.sparse is one of the main users and still depends on it. > What I was thinking was encouraging folks to use `arr.dot(...)` or `@` instead of `*` for matrix multiplication, keeping `*` for scalar multiplication. If those operations were defined for matrices, then at some point sparse could go to arrays and it would not be noticeable except for the treatment of 1-D arrays -- which admittedly might be a bit tricky. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Jan 7 02:59:32 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 7 Jan 2017 20:59:32 +1300 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Sat, Jan 7, 2017 at 2:52 PM, Charles R Harris wrote: > > > On Fri, Jan 6, 2017 at 6:37 PM, wrote: > >> >> >> >> On Fri, Jan 6, 2017 at 8:28 PM, Ralf Gommers >> wrote: >> >>> >>> >>> On Sat, Jan 7, 2017 at 2:21 PM, CJ Carey >>> wrote: >>> >>>> >>>> On Fri, Jan 6, 2017 at 6:19 PM, Ralf Gommers >>>> wrote: >>>> >>>>> This sounds like a reasonable idea. Timeline could be something like: >>>>> >>>>> 1. Now: create new package, deprecate np.matrix in docs. >>>>> 2. 
In say 1.5 years: start issuing visible deprecation warnings in >>>>> numpy >>>>> 3. After 2020: remove matrix from numpy. >>>>> >>>>> Ralf >>>>> >>>> >>>> I think this sounds reasonable, and reminds me of the deliberate >>>> deprecation process taken for scipy.weave. I guess we'll see how successful >>>> it was when 0.19 is released. >>>> >>>> The major problem I have with removing numpy matrices is the effect on >>>> scipy.sparse, which mostly-consistently mimics numpy.matrix semantics and >>>> often produces numpy.matrix results when densifying. The two are coupled >>>> tightly enough that if numpy matrices go away, all of the existing sparse >>>> matrix classes will have to go at the same time. >>>> >>>> I don't think that would be the end of the world, >>>> >>> >>> Not the end of the world literally, but the impact would be pretty >>> major. I think we're stuck with scipy.sparse, and may at some point will >>> add a new sparse *array* implementation next to it. For scipy we will have >>> to add a dependency on the new npmatrix package or vendor it. >>> >> >> That sounds to me like moving maintenance of numpy.matrix from numpy to >> scipy, if scipy.sparse is one of the main users and still depends on it. >> > Maintenance costs are pretty low, and are partly still for numpy (it has to keep subclasses like np.matrix working. I'm not too worried about the effort. The purpose here is to remove np.matrix from numpy so beginners will never see it. Educating sparse matrix users is a lot easier, and there are a lot less such users. > What I was thinking was encouraging folks to use `arr.dot(...)` or `@` > instead of `*` for matrix multiplication, keeping `*` for scalar > multiplication. > I don't think that change in behavior of `*` is doable. > If those operations were defined for matrices, > Why if? They are defined, and work as expected as far as I can tell. > then at some point sparse could go to arrays and it would not be > noticeable except for the treatment of 1-D arrays -- which admittedly might > be a bit tricky. > I'd like that to be feasible, but especially given that any such change would not break code but rather silently change numerical values, it's likely not a healthy idea. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Jan 7 03:39:43 2017 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 7 Jan 2017 00:39:43 -0800 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Fri, Jan 6, 2017 at 11:59 PM, Ralf Gommers wrote: > > > On Sat, Jan 7, 2017 at 2:52 PM, Charles R Harris > wrote: >> >> >> >> On Fri, Jan 6, 2017 at 6:37 PM, wrote: >>> >>> >>> >>> >>> On Fri, Jan 6, 2017 at 8:28 PM, Ralf Gommers >>> wrote: >>>> >>>> >>>> >>>> On Sat, Jan 7, 2017 at 2:21 PM, CJ Carey >>>> wrote: >>>>> >>>>> >>>>> On Fri, Jan 6, 2017 at 6:19 PM, Ralf Gommers >>>>> wrote: >>>>>> >>>>>> This sounds like a reasonable idea. Timeline could be something like: >>>>>> >>>>>> 1. Now: create new package, deprecate np.matrix in docs. >>>>>> 2. In say 1.5 years: start issuing visible deprecation warnings in >>>>>> numpy >>>>>> 3. After 2020: remove matrix from numpy. >>>>>> >>>>>> Ralf >>>>> >>>>> >>>>> I think this sounds reasonable, and reminds me of the deliberate >>>>> deprecation process taken for scipy.weave. I guess we'll see how successful >>>>> it was when 0.19 is released. 
>>>>> >>>>> The major problem I have with removing numpy matrices is the effect on >>>>> scipy.sparse, which mostly-consistently mimics numpy.matrix semantics and >>>>> often produces numpy.matrix results when densifying. The two are coupled >>>>> tightly enough that if numpy matrices go away, all of the existing sparse >>>>> matrix classes will have to go at the same time. >>>>> >>>>> I don't think that would be the end of the world, >>>> >>>> >>>> Not the end of the world literally, but the impact would be pretty >>>> major. I think we're stuck with scipy.sparse, and may at some point will add >>>> a new sparse *array* implementation next to it. For scipy we will have to >>>> add a dependency on the new npmatrix package or vendor it. >>> >>> >>> That sounds to me like moving maintenance of numpy.matrix from numpy to >>> scipy, if scipy.sparse is one of the main users and still depends on it. > > > Maintenance costs are pretty low, and are partly still for numpy (it has to > keep subclasses like np.matrix working. I'm not too worried about the > effort. The purpose here is to remove np.matrix from numpy so beginners will > never see it. Educating sparse matrix users is a lot easier, and there are a > lot less such users. > >> >> What I was thinking was encouraging folks to use `arr.dot(...)` or `@` >> instead of `*` for matrix multiplication, keeping `*` for scalar >> multiplication. > > > I don't think that change in behavior of `*` is doable. I guess it would be technically possible to have matrix.__mul__ issue a deprecation warning before matrix.__init__ does, to try and encourage people to switch to using .dot and/or @, and thus make it easier to later port their code to regular arrays? I'm not immediately seeing how this would help much though, since there would still be this second porting step required. Especially since there's still lots of room for things to break at that second step due to matrix's insistence that everything be 2d always, and my impression is that users are more annoyed by two-step migrations than one-step migrations. -n -- Nathaniel J. Smith -- https://vorpus.org From ralf.gommers at gmail.com Sat Jan 7 03:52:19 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 7 Jan 2017 21:52:19 +1300 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Sat, Jan 7, 2017 at 9:39 PM, Nathaniel Smith wrote: > On Fri, Jan 6, 2017 at 11:59 PM, Ralf Gommers > wrote: > > > > > > On Sat, Jan 7, 2017 at 2:52 PM, Charles R Harris < > charlesr.harris at gmail.com> > > wrote: > >> > >> > >> > >> On Fri, Jan 6, 2017 at 6:37 PM, wrote: > >>> > >>> > >>> > >>> > >>> On Fri, Jan 6, 2017 at 8:28 PM, Ralf Gommers > >>> wrote: > >>>> > >>>> > >>>> > >>>> On Sat, Jan 7, 2017 at 2:21 PM, CJ Carey > >>>> wrote: > >>>>> > >>>>> > >>>>> On Fri, Jan 6, 2017 at 6:19 PM, Ralf Gommers > > >>>>> wrote: > >>>>>> > >>>>>> This sounds like a reasonable idea. Timeline could be something > like: > >>>>>> > >>>>>> 1. Now: create new package, deprecate np.matrix in docs. > >>>>>> 2. In say 1.5 years: start issuing visible deprecation warnings in > >>>>>> numpy > >>>>>> 3. After 2020: remove matrix from numpy. > >>>>>> > >>>>>> Ralf > >>>>> > >>>>> > >>>>> I think this sounds reasonable, and reminds me of the deliberate > >>>>> deprecation process taken for scipy.weave. I guess we'll see how > successful > >>>>> it was when 0.19 is released. 
> >>>>> > >>>>> The major problem I have with removing numpy matrices is the effect > on > >>>>> scipy.sparse, which mostly-consistently mimics numpy.matrix > semantics and > >>>>> often produces numpy.matrix results when densifying. The two are > coupled > >>>>> tightly enough that if numpy matrices go away, all of the existing > sparse > >>>>> matrix classes will have to go at the same time. > >>>>> > >>>>> I don't think that would be the end of the world, > >>>> > >>>> > >>>> Not the end of the world literally, but the impact would be pretty > >>>> major. I think we're stuck with scipy.sparse, and may at some point > will add > >>>> a new sparse *array* implementation next to it. For scipy we will > have to > >>>> add a dependency on the new npmatrix package or vendor it. > >>> > >>> > >>> That sounds to me like moving maintenance of numpy.matrix from numpy to > >>> scipy, if scipy.sparse is one of the main users and still depends on > it. > > > > > > Maintenance costs are pretty low, and are partly still for numpy (it has > to > > keep subclasses like np.matrix working. I'm not too worried about the > > effort. The purpose here is to remove np.matrix from numpy so beginners > will > > never see it. Educating sparse matrix users is a lot easier, and there > are a > > lot less such users. > > > >> > >> What I was thinking was encouraging folks to use `arr.dot(...)` or `@` > >> instead of `*` for matrix multiplication, keeping `*` for scalar > >> multiplication. > > > > > > I don't think that change in behavior of `*` is doable. > > I guess it would be technically possible to have matrix.__mul__ issue > a deprecation warning before matrix.__init__ does, to try and > encourage people to switch to using .dot and/or @, and thus make it > easier to later port their code to regular arrays? Yes, but that's not very relevant. I'm saying "not doable" since after the debacle with changing diag return to a view my understanding is we decided that it's a bad idea to make changes that don't break code but return different numerical results. There's no good way to work around that here. With something as widely used as np.matrix, you simply cannot rely on people porting code. You just need to phase out np.matrix in a way that breaks code but never changes behavior silently (even across multiple releases). Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From jni.soma at gmail.com Sat Jan 7 05:12:16 2017 From: jni.soma at gmail.com (Juan Nunez-Iglesias) Date: Sat, 7 Jan 2017 21:12:16 +1100 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: <2cd571f0-6391-46f3-92d2-1a0bc84e3466@Spark> Hi all! I've been lurking on this discussion, and don't have too much to add except to encourage a fast deprecation: I can't wait for sparse matrices to have an element-wise multiply operator. 
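For reference, the current spelling is the other way around: for scipy.sparse matrices * is matrix multiplication, element-wise multiplication goes through a method, and densifying hands back an np.matrix. A small illustration:

    import numpy as np
    import scipy.sparse as sp

    A = sp.csr_matrix(np.eye(3))
    B = sp.csr_matrix(np.arange(9.).reshape(3, 3))

    C = A * B           # matrix multiplication, matching np.matrix semantics
    D = A.multiply(B)   # element-wise product, the method spelling needed today
    M = A.todense()     # returns an np.matrix instance, not an ndarray

A sparse *array* class, as discussed elsewhere in the thread, would make * element-wise and thus consistent with ndarrays.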
On 7 Jan 2017, 7:52 PM +1100, Ralf Gommers , wrote: > > > On Sat, Jan 7, 2017 at 9:39 PM, Nathaniel Smith wrote: > > On Fri, Jan 6, 2017 at 11:59 PM, Ralf Gommers wrote: > > > > > > > > > On Sat, Jan 7, 2017 at 2:52 PM, Charles R Harris > > > wrote: > > >> > > >> > > >> > > >> On Fri, Jan 6, 2017 at 6:37 PM, wrote: > > >>> > > >>> > > >>> > > >>> > > >>> On Fri, Jan 6, 2017 at 8:28 PM, Ralf Gommers > > >>> wrote: > > >>>> > > >>>> > > >>>> > > >>>> On Sat, Jan 7, 2017 at 2:21 PM, CJ Carey > > >>>> wrote: > > >>>>> > > >>>>> > > >>>>> On Fri, Jan 6, 2017 at 6:19 PM, Ralf Gommers > > >>>>> wrote: > > >>>>>> > > >>>>>> This sounds like a reasonable idea. Timeline could be something like: > > >>>>>> > > >>>>>> 1. Now: create new package, deprecate np.matrix in docs. > > >>>>>> 2. In say 1.5 years: start issuing visible deprecation warnings in > > >>>>>> numpy > > >>>>>> 3. After 2020: remove matrix from numpy. > > >>>>>> > > >>>>>> Ralf > > >>>>> > > >>>>> > > >>>>> I think this sounds reasonable, and reminds me of the deliberate > > >>>>> deprecation process taken for scipy.weave. I guess we'll see how successful > > >>>>> it was when 0.19 is released. > > >>>>> > > >>>>> The major problem I have with removing numpy matrices is the effect on > > >>>>> scipy.sparse, which mostly-consistently mimics numpy.matrix semantics and > > >>>>> often produces numpy.matrix results when densifying. The two are coupled > > >>>>> tightly enough that if numpy matrices go away, all of the existing sparse > > >>>>> matrix classes will have to go at the same time. > > >>>>> > > >>>>> I don't think that would be the end of the world, > > >>>> > > >>>> > > >>>> Not the end of the world literally, but the impact would be pretty > > >>>> major. I think we're stuck with scipy.sparse, and may at some point will add > > >>>> a new sparse *array* implementation next to it. For scipy we will have to > > >>>> add a dependency on the new npmatrix package or vendor it. > > >>> > > >>> > > >>> That sounds to me like moving maintenance of numpy.matrix from numpy to > > >>> scipy, if scipy.sparse is one of the main users and still depends on it. > > > > > > > > > Maintenance costs are pretty low, and are partly still for numpy (it has to > > > keep subclasses like np.matrix working. I'm not too worried about the > > > effort. The purpose here is to remove np.matrix from numpy so beginners will > > > never see it. Educating sparse matrix users is a lot easier, and there are a > > > lot less such users. > > > > > >> > > >> What I was thinking was encouraging folks to use `arr.dot(...)` or `@` > > >> instead of `*` for matrix multiplication, keeping `*` for scalar > > >> multiplication. > > > > > > > > > I don't think that change in behavior of `*` is doable. > > > > I guess it would be technically possible to have matrix.__mul__ issue > > a deprecation warning before matrix.__init__ does, to try and > > encourage people to switch to using .dot and/or @, and thus make it > > easier to later port their code to regular arrays? > > Yes, but that's not very relevant. I'm saying "not doable" since after the debacle with changing diag return to a view my understanding is we decided that it's a bad idea to make changes that don't break code but return different numerical results. There's no good way to work around that here. > > With something as widely used as np.matrix, you simply cannot rely on people porting code. 
You just need to phase out np.matrix in a way that breaks code but never changes behavior silently (even across multiple releases). > > Ralf > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Sat Jan 7 14:33:08 2017 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Sat, 7 Jan 2017 14:33:08 -0500 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: <2cd571f0-6391-46f3-92d2-1a0bc84e3466@Spark> References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> <2cd571f0-6391-46f3-92d2-1a0bc84e3466@Spark> Message-ID: Hi All, It seems there are two steps that can be taken now and are needed no matter what: 1. Add numpy documentation describing the preferred way to handle matrices, extolling the virtues of @, and move np.matrix documentation to a deprecated section 2. Start on a new `sparse` class that is based on regular arrays (and uses `__array_func__` instead of prepare/wrap?). All the best, Marten From toddrjen at gmail.com Sat Jan 7 15:31:13 2017 From: toddrjen at gmail.com (Todd) Date: Sat, 7 Jan 2017 15:31:13 -0500 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Jan 6, 2017 20:28, "Ralf Gommers" wrote: On Sat, Jan 7, 2017 at 2:21 PM, CJ Carey wrote: > > On Fri, Jan 6, 2017 at 6:19 PM, Ralf Gommers > wrote: > >> This sounds like a reasonable idea. Timeline could be something like: >> >> 1. Now: create new package, deprecate np.matrix in docs. >> 2. In say 1.5 years: start issuing visible deprecation warnings in numpy >> 3. After 2020: remove matrix from numpy. >> >> Ralf >> > > I think this sounds reasonable, and reminds me of the deliberate > deprecation process taken for scipy.weave. I guess we'll see how successful > it was when 0.19 is released. > > The major problem I have with removing numpy matrices is the effect on > scipy.sparse, which mostly-consistently mimics numpy.matrix semantics and > often produces numpy.matrix results when densifying. The two are coupled > tightly enough that if numpy matrices go away, all of the existing sparse > matrix classes will have to go at the same time. > > I don't think that would be the end of the world, > Not the end of the world literally, but the impact would be pretty major. I think we're stuck with scipy.sparse, and may at some point will add a new sparse *array* implementation next to it. For scipy we will have to add a dependency on the new npmatrix package or vendor it. Ralf > but it's definitely something that should happen while scipy is still > pre-1.0, if it's ever going to happen. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion So what about this: 1. Create a sparse array class 2. (optional) Refactor the sparse matrix class to be based on the sparse array class (may not be feasible) 3. Copy the spare matrix class into the matrix package 4. Deprecate the scipy sparse matrix class 5. 
Remove the scipy sparse matrix class when the numpy matrix class I don't know about the timeline, but this would just need to be done by 2020. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Jan 7 16:22:48 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 8 Jan 2017 10:22:48 +1300 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> <2cd571f0-6391-46f3-92d2-1a0bc84e3466@Spark> Message-ID: On Sun, Jan 8, 2017 at 8:33 AM, Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > Hi All, > > It seems there are two steps that can be taken now and are needed no > matter what: > > 1. Add numpy documentation describing the preferred way to handle > matrices, extolling the virtues of @, and move np.matrix documentation > to a deprecated section > That would be good to do asap. Any volunteers? > > 2. Start on a new `sparse` class that is based on regular arrays There are two efforts that I know of in this direction: https://github.com/perimosocordiae/sparray https://github.com/ev-br/sparr (and > uses `__array_func__` instead of prepare/wrap?). > Getting __array_func__ finally into a released version of numpy will be a major improvement for sparse matrix behavior (like making np.dot(some_matrix) work) in itself. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Jan 7 16:29:15 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 8 Jan 2017 10:29:15 +1300 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Sun, Jan 8, 2017 at 9:31 AM, Todd wrote: > > > On Jan 6, 2017 20:28, "Ralf Gommers" wrote: > > > > On Sat, Jan 7, 2017 at 2:21 PM, CJ Carey > wrote: > >> >> On Fri, Jan 6, 2017 at 6:19 PM, Ralf Gommers >> wrote: >> >>> This sounds like a reasonable idea. Timeline could be something like: >>> >>> 1. Now: create new package, deprecate np.matrix in docs. >>> 2. In say 1.5 years: start issuing visible deprecation warnings in numpy >>> 3. After 2020: remove matrix from numpy. >>> >>> Ralf >>> >> >> I think this sounds reasonable, and reminds me of the deliberate >> deprecation process taken for scipy.weave. I guess we'll see how successful >> it was when 0.19 is released. >> >> The major problem I have with removing numpy matrices is the effect on >> scipy.sparse, which mostly-consistently mimics numpy.matrix semantics and >> often produces numpy.matrix results when densifying. The two are coupled >> tightly enough that if numpy matrices go away, all of the existing sparse >> matrix classes will have to go at the same time. >> >> I don't think that would be the end of the world, >> > > Not the end of the world literally, but the impact would be pretty major. > I think we're stuck with scipy.sparse, and may at some point will add a new > sparse *array* implementation next to it. For scipy we will have to add a > dependency on the new npmatrix package or vendor it. > > Ralf > > > >> but it's definitely something that should happen while scipy is still >> pre-1.0, if it's ever going to happen. 
>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > So what about this: > > 1. Create a sparse array class > 2. (optional) Refactor the sparse matrix class to be based on the sparse > array class (may not be feasible) > 3. Copy the spare matrix class into the matrix package > 4. Deprecate the scipy sparse matrix class > 5. Remove the scipy sparse matrix class when the numpy matrix class > It looks to me like we're getting a bit off track here. The sparse matrices in scipy are heavily used, and despite rough edges pretty good at what they do. Deprecating them is not a goal. The actual goal for the exercise that started this thread (at least as I see it) is to remove np.matrix from numpy itself so users (that don't know the difference) will only use ndarrays. And the few users that prefer np.matrix for teaching can now switch because of @, so their preference should have disappeared. To reach that goal, no deprecation or backwards incompatible changes to scipy.sparse are needed. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Jan 7 18:26:07 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 7 Jan 2017 16:26:07 -0700 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Sat, Jan 7, 2017 at 2:29 PM, Ralf Gommers wrote: > > > On Sun, Jan 8, 2017 at 9:31 AM, Todd wrote: > >> >> >> On Jan 6, 2017 20:28, "Ralf Gommers" wrote: >> >> >> >> On Sat, Jan 7, 2017 at 2:21 PM, CJ Carey >> wrote: >> >>> >>> On Fri, Jan 6, 2017 at 6:19 PM, Ralf Gommers >>> wrote: >>> >>>> This sounds like a reasonable idea. Timeline could be something like: >>>> >>>> 1. Now: create new package, deprecate np.matrix in docs. >>>> 2. In say 1.5 years: start issuing visible deprecation warnings in numpy >>>> 3. After 2020: remove matrix from numpy. >>>> >>>> Ralf >>>> >>> >>> I think this sounds reasonable, and reminds me of the deliberate >>> deprecation process taken for scipy.weave. I guess we'll see how successful >>> it was when 0.19 is released. >>> >>> The major problem I have with removing numpy matrices is the effect on >>> scipy.sparse, which mostly-consistently mimics numpy.matrix semantics and >>> often produces numpy.matrix results when densifying. The two are coupled >>> tightly enough that if numpy matrices go away, all of the existing sparse >>> matrix classes will have to go at the same time. >>> >>> I don't think that would be the end of the world, >>> >> >> Not the end of the world literally, but the impact would be pretty major. >> I think we're stuck with scipy.sparse, and may at some point will add a new >> sparse *array* implementation next to it. For scipy we will have to add a >> dependency on the new npmatrix package or vendor it. >> >> Ralf >> >> >> >>> but it's definitely something that should happen while scipy is still >>> pre-1.0, if it's ever going to happen. 
>>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> https://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> So what about this: >> >> 1. Create a sparse array class >> 2. (optional) Refactor the sparse matrix class to be based on the sparse >> array class (may not be feasible) >> 3. Copy the spare matrix class into the matrix package >> 4. Deprecate the scipy sparse matrix class >> 5. Remove the scipy sparse matrix class when the numpy matrix class >> > > It looks to me like we're getting a bit off track here. The sparse > matrices in scipy are heavily used, and despite rough edges pretty good at > what they do. Deprecating them is not a goal. > > The actual goal for the exercise that started this thread (at least as I > see it) is to remove np.matrix from numpy itself so users (that don't know > the difference) will only use ndarrays. And the few users that prefer > np.matrix for teaching can now switch because of @, so their preference > should have disappeared. > > To reach that goal, no deprecation or backwards incompatible changes to > scipy.sparse are needed. > What is the way forward with sparse? That looks like the biggest blocker on the road to a matrix free NumPy. I don't see moving the matrix package elsewhere as a solution for that. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Jan 7 18:35:32 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 8 Jan 2017 12:35:32 +1300 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Sun, Jan 8, 2017 at 12:26 PM, Charles R Harris wrote: > > > On Sat, Jan 7, 2017 at 2:29 PM, Ralf Gommers > wrote: > >> >> It looks to me like we're getting a bit off track here. The sparse >> matrices in scipy are heavily used, and despite rough edges pretty good at >> what they do. Deprecating them is not a goal. >> >> The actual goal for the exercise that started this thread (at least as I >> see it) is to remove np.matrix from numpy itself so users (that don't know >> the difference) will only use ndarrays. And the few users that prefer >> np.matrix for teaching can now switch because of @, so their preference >> should have disappeared. >> >> To reach that goal, no deprecation or backwards incompatible changes to >> scipy.sparse are needed. >> > > What is the way forward with sparse? That looks like the biggest blocker > on the road to a matrix free NumPy. I don't see moving the matrix package > elsewhere as a solution for that. > Why not? Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Jan 7 18:42:02 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 7 Jan 2017 16:42:02 -0700 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Sat, Jan 7, 2017 at 4:35 PM, Ralf Gommers wrote: > > > On Sun, Jan 8, 2017 at 12:26 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Sat, Jan 7, 2017 at 2:29 PM, Ralf Gommers >> wrote: >> >>> >>> It looks to me like we're getting a bit off track here. 
The sparse >>> matrices in scipy are heavily used, and despite rough edges pretty good at >>> what they do. Deprecating them is not a goal. >>> >>> The actual goal for the exercise that started this thread (at least as I >>> see it) is to remove np.matrix from numpy itself so users (that don't know >>> the difference) will only use ndarrays. And the few users that prefer >>> np.matrix for teaching can now switch because of @, so their preference >>> should have disappeared. >>> >>> To reach that goal, no deprecation or backwards incompatible changes to >>> scipy.sparse are needed. >>> >> >> What is the way forward with sparse? That looks like the biggest blocker >> on the road to a matrix free NumPy. I don't see moving the matrix package >> elsewhere as a solution for that. >> > > Why not? > > Because it doesn't get rid of matrices in SciPy, not does one gain a scalar multiplication operator for sparse. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Jan 7 18:51:19 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 8 Jan 2017 12:51:19 +1300 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Sun, Jan 8, 2017 at 12:42 PM, Charles R Harris wrote: > > > On Sat, Jan 7, 2017 at 4:35 PM, Ralf Gommers > wrote: > >> >> >> On Sun, Jan 8, 2017 at 12:26 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> >>> On Sat, Jan 7, 2017 at 2:29 PM, Ralf Gommers >>> wrote: >>> >>>> >>>> It looks to me like we're getting a bit off track here. The sparse >>>> matrices in scipy are heavily used, and despite rough edges pretty good at >>>> what they do. Deprecating them is not a goal. >>>> >>>> The actual goal for the exercise that started this thread (at least as >>>> I see it) is to remove np.matrix from numpy itself so users (that don't >>>> know the difference) will only use ndarrays. And the few users that prefer >>>> np.matrix for teaching can now switch because of @, so their preference >>>> should have disappeared. >>>> >>>> To reach that goal, no deprecation or backwards incompatible changes to >>>> scipy.sparse are needed. >>>> >>> >>> What is the way forward with sparse? That looks like the biggest blocker >>> on the road to a matrix free NumPy. I don't see moving the matrix package >>> elsewhere as a solution for that. >>> >> >> Why not? >> >> > Because it doesn't get rid of matrices in SciPy, not does one gain a > scalar multiplication operator for sparse. > That's a different goal though. You can reach the "get matrix out of numpy" goal fairly easily (docs and packaging work), but if you insist on coupling it to major changes to scipy.sparse (a lot more work + backwards compat break), then what will likely happen is: nothing. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Jan 7 19:24:06 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 7 Jan 2017 17:24:06 -0700 Subject: [Numpy-discussion] Deprecating matrices. 
In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Sat, Jan 7, 2017 at 4:51 PM, Ralf Gommers wrote: > > > On Sun, Jan 8, 2017 at 12:42 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Sat, Jan 7, 2017 at 4:35 PM, Ralf Gommers >> wrote: >> >>> >>> >>> On Sun, Jan 8, 2017 at 12:26 PM, Charles R Harris < >>> charlesr.harris at gmail.com> wrote: >>> >>>> >>>> >>>> On Sat, Jan 7, 2017 at 2:29 PM, Ralf Gommers >>>> wrote: >>>> >>>>> >>>>> It looks to me like we're getting a bit off track here. The sparse >>>>> matrices in scipy are heavily used, and despite rough edges pretty good at >>>>> what they do. Deprecating them is not a goal. >>>>> >>>>> The actual goal for the exercise that started this thread (at least as >>>>> I see it) is to remove np.matrix from numpy itself so users (that don't >>>>> know the difference) will only use ndarrays. And the few users that prefer >>>>> np.matrix for teaching can now switch because of @, so their preference >>>>> should have disappeared. >>>>> >>>>> To reach that goal, no deprecation or backwards incompatible changes >>>>> to scipy.sparse are needed. >>>>> >>>> >>>> What is the way forward with sparse? That looks like the biggest >>>> blocker on the road to a matrix free NumPy. I don't see moving the matrix >>>> package elsewhere as a solution for that. >>>> >>> >>> Why not? >>> >>> >> Because it doesn't get rid of matrices in SciPy, not does one gain a >> scalar multiplication operator for sparse. >> > > That's a different goal though. You can reach the "get matrix out of > numpy" goal fairly easily (docs and packaging work), but if you insist on > coupling it to major changes to scipy.sparse (a lot more work + backwards > compat break), then what will likely happen is: nothing. > Could always remove matrix from the top level namespace and make it private. It still needs to reside someplace as long as sparse uses it. Fixing sparse is more work, but we have three years and it won't be getting any easier as time goes on. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From perimosocordiae at gmail.com Sat Jan 7 19:31:27 2017 From: perimosocordiae at gmail.com (CJ Carey) Date: Sat, 7 Jan 2017 18:31:27 -0600 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: I agree with Ralf; coupling these changes to sparse is a bad idea. I think that scipy.sparse will be an important consideration during the deprecation process, though, perhaps as an indicator of how painful the transition might be for third party code. I'm +1 for splitting matrices out into a standalone package. On Jan 7, 2017 5:51 PM, "Ralf Gommers" wrote: On Sun, Jan 8, 2017 at 12:42 PM, Charles R Harris wrote: > > > On Sat, Jan 7, 2017 at 4:35 PM, Ralf Gommers > wrote: > >> >> >> On Sun, Jan 8, 2017 at 12:26 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> >>> On Sat, Jan 7, 2017 at 2:29 PM, Ralf Gommers >>> wrote: >>> >>>> >>>> It looks to me like we're getting a bit off track here. The sparse >>>> matrices in scipy are heavily used, and despite rough edges pretty good at >>>> what they do. Deprecating them is not a goal. >>>> >>>> The actual goal for the exercise that started this thread (at least as >>>> I see it) is to remove np.matrix from numpy itself so users (that don't >>>> know the difference) will only use ndarrays. 
And the few users that prefer >>>> np.matrix for teaching can now switch because of @, so their preference >>>> should have disappeared. >>>> >>>> To reach that goal, no deprecation or backwards incompatible changes to >>>> scipy.sparse are needed. >>>> >>> >>> What is the way forward with sparse? That looks like the biggest blocker >>> on the road to a matrix free NumPy. I don't see moving the matrix package >>> elsewhere as a solution for that. >>> >> >> Why not? >> >> > Because it doesn't get rid of matrices in SciPy, not does one gain a > scalar multiplication operator for sparse. > That's a different goal though. You can reach the "get matrix out of numpy" goal fairly easily (docs and packaging work), but if you insist on coupling it to major changes to scipy.sparse (a lot more work + backwards compat break), then what will likely happen is: nothing. Ralf _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Jan 7 20:09:03 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 7 Jan 2017 18:09:03 -0700 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Sat, Jan 7, 2017 at 5:31 PM, CJ Carey wrote: > I agree with Ralf; coupling these changes to sparse is a bad idea. > > I think that scipy.sparse will be an important consideration during the > deprecation process, though, perhaps as an indicator of how painful the > transition might be for third party code. > > I'm +1 for splitting matrices out into a standalone package. > Decoupled or not, sparse still needs to be dealt with. What is the plan? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Jan 7 20:47:51 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 8 Jan 2017 14:47:51 +1300 Subject: [Numpy-discussion] Deprecating matrices. In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: On Sun, Jan 8, 2017 at 2:09 PM, Charles R Harris wrote: > > > On Sat, Jan 7, 2017 at 5:31 PM, CJ Carey > wrote: > >> I agree with Ralf; coupling these changes to sparse is a bad idea. >> >> I think that scipy.sparse will be an important consideration during the >> deprecation process, though, perhaps as an indicator of how painful the >> transition might be for third party code. >> >> I'm +1 for splitting matrices out into a standalone package. >> > > Decoupled or not, sparse still needs to be dealt with. What is the plan? > My view would be: - keep current sparse matrices as is (with improvements, like __numpy_func__ and the various performance improvements that regularly get done) - once one of the sparse *array* implementations progresses far enough, merge that and encourage people to switch over - in the far future, once packages like scikit-learn have switched to the new sparse arrays, the sparse matrices could potentially also be split off as a separate package, in the same way as we did for weave and now can do for npmatrix. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From perimosocordiae at gmail.com Sat Jan 7 23:00:25 2017 From: perimosocordiae at gmail.com (CJ Carey) Date: Sat, 7 Jan 2017 22:00:25 -0600 Subject: [Numpy-discussion] Deprecating matrices. 
In-Reply-To: References: <8865B9EF-5A73-4784-8478-0A254333BA2B@continuum.io> Message-ID: > Decoupled or not, sparse still needs to be dealt with. What is the plan? > My view would be: - keep current sparse matrices as is (with improvements, like __numpy_func__ and the various performance improvements that regularly get done) - once one of the sparse *array* implementations progresses far enough, merge that and encourage people to switch over - in the far future, once packages like scikit-learn have switched to the new sparse arrays, the sparse matrices could potentially also be split off as a separate package, in the same way as we did for weave and now can do for npmatrix. I think that's the best way forward as well. This can happen independently of numpy matrix changes, and doesn't leave users with silently broken code. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilhanpolat at gmail.com Sun Jan 8 10:17:03 2017 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Sun, 8 Jan 2017 16:17:03 +0100 Subject: [Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve In-Reply-To: References: Message-ID: Hi everyone, I was stalking the deprecating the numpy.matrix discussion on the other thread and I wondered maybe the mailing list is a better place for the discussion about something I've been meaning to ask the dev members. I thought mailing lists are something we dumped using together with ICQ and geocities stuff but apparently not :-) Anyways, first thing is first: I have been in need of the ill-conditioned warning behavior of matlab (and possibly other software suites) for my own work. So I looked around in the numpy issues and found https://github.com/numpy/numpy/issues/3755 some time ago. Then I've learned from @rkern that there were C translations involved in the numpy source and frankly I couldn't even find the entry point of how the project is structured so I've switched to SciPy side where things are a bit more convenient. Next to teaching me more about f2py machinery, I have noticed that the linear algebra module is a bit less competitive than the usual level of scipy though it is definitely a personal opinion. So in order to get the ill-conditioning (or at least the condition number) I've wrapped up a PR using the expert routines of LAPACK (which is I think ready to merge) but still it is far from the contemporary software convenience that you generally get. https://github.com/scipy/scipy/pull/6775 The "assume_a" keyword introduced here is hopefully modular enough that should there be any need for more structures we can simply keep adding to the list without any backwards compatibility. It will be at least offering more options than what we have currently. The part that I would like to discuss requires a bit of intro so please bear with me. Let me copy/paste the part from the old PR: Around many places online, we can witness the rant about numpy/scipy not letting the users know about the conditioning for example Mike Croucher's blog and numpy/numpy#3755 Since we don't have any central backslash function that optimizes depending on the input data, should we create a function, let's name it with the matlab equivalent for the time being linsolve such that it automatically calls for the right solver? This way, we can avoid writing new functions for each LAPACK driver . As a reference here is a SO thread that summarizes the linsolve functionality. 
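To make the intent concrete, usage of the keyword from that PR would look roughly like the sketch below; the value names shown here ('pos' for a symmetric positive definite system, alongside 'gen', 'sym', 'her' for the other ?GE/SY/HE/POSVX drivers) follow the PR as I read it and may still change:

    import numpy as np
    from scipy import linalg

    rng = np.random.RandomState(0)
    A = rng.rand(5, 5)
    A = A + A.T + 10 * np.eye(5)    # symmetric positive definite by construction
    b = rng.rand(5)

    x = linalg.solve(A, b)                   # no hint given: the general solver is used
    x = linalg.solve(A, b, assume_a='pos')   # hint: dispatch to the ?POSVX expert driver,
                                             # which also returns rcond and so can warn
                                             # about ill-conditioning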
I'm sure you are aware, but just for completeness, the linear equation solvers are often built around the concept of polyalgorithm which is a fancy way of saying that the array is tested consecutively for certain structures and the checks are ordered in such a way that the simpler structure is tested the sooner. E.g. first check for diagonal matrix, then for upper/lower triangular then permuted triangular then symmetrical and so on. Here is also another example from AdvanPix http://www.advanpix.com/2016/ 10/07/architecture-of-linear-systems-solver/ Now, according to what I have coded and optimized as much as I can, a pure Python is not acceptable as an overhead during these checks. It would definitely be a noticeable slowdown if this was in place in the existing linalg.solve however I think this is certainly doable in the low-level C/FORTRAN level. CPython is certainly faster but I think only a straight C/FORTRAN implementation would cut it. Note that we only need the discovery of the structure then we can pass to the dedicated solver as is. Hence I'm not saying that we should recode the existing solve functionality. We already have the branching in place to ?GE/SY/HE/POSVX routines. ------- The second issue about the current linalg.solve function is when trying to solve for right inverse e.g. xA = b. Again with some copy/paste: The right inversion is currently a bit annoying, that is to say if we would like to compute, say, BA^{-1}, then the user has to explicitly transpose the explicitly transposed equation to avoid using an explicit inv(whose use should be discouraged anyways) x = scipy.linalg.solve(A.T, B.T).T. Since expert drivers come with a trans switch that can internally handle whether to solve the transposed or the regular equation, these routines avoid the A.T off-the-shelf. I am wondering what might be the best way to add a "r_inv" keyword such that the B.T is also handled at the FORTRAN level instead such that the user can simply write "solve(A,B, r_inv=True)". Because we don't have a backslash operation we could at least provide this much as convenience I guess. I would love to have go at it but I'm definitely not competent enough in C/FORTRAN at the production level so I was wondering whether I could get some help about this. Anyways, I hope I could make my point with a rather lengthy post. Please let me know if this is a plausible feature ilhan PS: In case gmail links won't be parsed, here are the inline links MC blog: http://www.walkingrandomly.com/?p=5092 SO thread : http://stackoverflow.com/questions/18553210/how-to- implement-matlabs-mldivide-a-k-a-the-backslash-operator linsolve/mldivide page : http://nl.mathworks.com/help/ matlab/ref/mldivide.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Jan 9 05:33:56 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 9 Jan 2017 23:33:56 +1300 Subject: [Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve In-Reply-To: References: Message-ID: On Mon, Jan 9, 2017 at 4:17 AM, Ilhan Polat wrote: > > Hi everyone, > > I was stalking the deprecating the numpy.matrix discussion on the other > thread and I wondered maybe the mailing list is a better place for the > discussion about something I've been meaning to ask the dev members. 
I > thought mailing lists are something we dumped using together with ICQ and > geocities stuff but apparently not :-) > > Anyways, first thing is first: I have been in need of the ill-conditioned > warning behavior of matlab (and possibly other software suites) for my own > work. So I looked around in the numpy issues and found > https://github.com/numpy/numpy/issues/3755 some time ago. Then I've > learned from @rkern that there were C translations involved in the numpy > source and frankly I couldn't even find the entry point of how the project > is structured so I've switched to SciPy side where things are a bit more > convenient. Next to teaching me more about f2py machinery, I have noticed > that the linear algebra module is a bit less competitive than the usual > level of scipy though it is definitely a personal opinion. > > So in order to get the ill-conditioning (or at least the condition number) > I've wrapped up a PR using the expert routines of LAPACK (which is I think > ready to merge) but still it is far from the contemporary software > convenience that you generally get. > > https://github.com/scipy/scipy/pull/6775 > > The "assume_a" keyword introduced here is hopefully modular enough that > should there be any need for more structures we can simply keep adding to > the list without any backwards compatibility. It will be at least offering > more options than what we have currently. The part that I would like to > discuss requires a bit of intro so please bear with me. Let me copy/paste > the part from the old PR: > > Around many places online, we can witness the rant about numpy/scipy not > letting the users know about the conditioning for example Mike Croucher's > blog and numpy/numpy#3755 > > > Since we don't have any central backslash function that optimizes > depending on the input data, should we create a function, let's name it > with the matlab equivalent for the time being linsolve such that it > automatically calls for the right solver? This way, we can avoid writing > new functions for each LAPACK driver . As a reference here is a SO thread > > that summarizes the linsolve > functionality. > Note that you're proposing a new scipy feature (right?) on the numpy list.... This sounds like a good idea to me. As a former heavy Matlab user I remember a lot of things to dislike, but "\" behavior was quite nice. > I'm sure you are aware, but just for completeness, the linear equation > solvers are often built around the concept of polyalgorithm which is a > fancy way of saying that the array is tested consecutively for certain > structures and the checks are ordered in such a way that the simpler > structure is tested the sooner. E.g. first check for diagonal matrix, then > for upper/lower triangular then permuted triangular then symmetrical and so > on. Here is also another example from AdvanPix > http://www.advanpix.com/2016/10/07/architecture-of-linear-systems-solver/ > > Now, according to what I have coded and optimized as much as I can, a pure > Python is not acceptable as an overhead during these checks. It would > definitely be a noticeable slowdown if this was in place in the existing > linalg.solve however I think this is certainly doable in the low-level > C/FORTRAN level. > How much is a noticeable slowdown? Note that we still have the current interfaces available for users that know what they need, so a nice convenience function that is say 5-10% slower would not be the end of the world. 
Ralf > CPython is certainly faster but I think only a straight C/FORTRAN > implementation would cut it. Note that we only need the discovery of the > structure then we can pass to the dedicated solver as is. Hence I'm not > saying that we should recode the existing solve functionality. We already > have the branching in place to ?GE/SY/HE/POSVX routines. > > ------- > > The second issue about the current linalg.solve function is when trying to > solve for right inverse e.g. xA = b. Again with some copy/paste: The right > inversion is currently a bit annoying, that is to say if we would like to > compute, say, BA^{-1}, then the user has to explicitly transpose the > explicitly transposed equation to avoid using an explicit inv(whose use > should be discouraged anyways) > x = scipy.linalg.solve(A.T, B.T).T. > > Since expert drivers come with a trans switch that can internally handle > whether to solve the transposed or the regular equation, these routines > avoid the A.T off-the-shelf. I am wondering what might be the best way to > add a "r_inv" keyword such that the B.T is also handled at the FORTRAN > level instead such that the user can simply write "solve(A,B, r_inv=True)". > Because we don't have a backslash operation we could at least provide this > much as convenience I guess. > > I would love to have go at it but I'm definitely not competent enough in > C/FORTRAN at the production level so I was wondering whether I could get > some help about this. Anyways, I hope I could make my point with a rather > lengthy post. Please let me know if this is a plausible feature > > ilhan > > PS: In case gmail links won't be parsed, here are the inline links > > MC blog: http://www.walkingrandomly.com/?p=5092 > SO thread : http://stackoverflow.com/questions/18553210/how-to-implement > -matlabs-mldivide-a-k-a-the-backslash-operator > linsolve/mldivide page : http://nl.mathworks.com/help/m > atlab/ref/mldivide.html > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilhanpolat at gmail.com Mon Jan 9 06:27:57 2017 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Mon, 9 Jan 2017 12:27:57 +0100 Subject: [Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve In-Reply-To: References: Message-ID: > Note that you're proposing a new scipy feature (right?) on the numpy list.... > This sounds like a good idea to me. As a former heavy Matlab user I remember a lot of things to dislike, but "\" behavior was quite nice. Correct, I am not sure where this might go in. It seemed like a NumPy array operation (touching array elements rapidly etc. can also be added for similar functionalities other than solve) hence the NumPy list. But of course it can be pushed as an exclusive SciPy feature. I'm not sure what the outlook on np.linalg.solve is. > How much is a noticeable slowdown? Note that we still have the current interfaces available for users that know what they need, so a nice convenience function that is say 5-10% slower would not be the end of the world. the fastest case was around 150-400% slower but of course it might be the case that I'm not using the fastest methods. It was mostly shuffling things around and using np.any on them in the pure python3 case. I will cook up something again for the baseline as soon as I have time. 
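For reference, the kind of pure-Python structure checks being benchmarked can be sketched as below; this is only an illustration of the approach (and of where the extra passes over the data come from), not the actual benchmark code:

import numpy as np

def detect_structure(A):
    """Cheap ordered checks, simplest structure first (illustrative only)."""
    lower_empty = not np.any(np.tril(A, k=-1))  # strictly lower triangle all zero?
    upper_empty = not np.any(np.triu(A, k=1))   # strictly upper triangle all zero?
    if lower_empty and upper_empty:
        return 'diagonal'
    if lower_empty:
        return 'upper triangular'
    if upper_empty:
        return 'lower triangular'
    if np.array_equal(A, A.conj().T):
        return 'hermitian/symmetric'
    return 'general'

Each check is itself vectorized, but np.tril/np.triu allocate full-size temporaries and every test is a separate pass over the array, which is where the Python-level overhead shows up.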
-------------- next part -------------- An HTML attachment was scrubbed... URL: From bryanv at continuum.io Mon Jan 9 14:08:28 2017 From: bryanv at continuum.io (Bryan Van de Ven) Date: Mon, 9 Jan 2017 13:08:28 -0600 Subject: [Numpy-discussion] ANN: Bokeh 0.12.4 Released Message-ID: Hi all, On behalf of the Bokeh team, I am pleased to announce the release of version 0.12.4 of Bokeh! Please see the announcement post at: https://bokeh.github.io/blog/2017/1/6/release-0-12-4/ which has more information as well as live demonstrations. If you are using Anaconda/miniconda, you can install it with conda: conda install -c bokeh bokeh Alternatively, you can also install it with pip: pip install bokeh Full information including details about how to use and obtain BokehJS are at: http://bokeh.pydata.org/en/0.12.4/docs/installation.html Issues, enhancement requests, and pull requests can be made on the Bokeh Github page: https://github.com/bokeh/bokeh Documentation is available at http://bokeh.pydata.org/en/0.12.4 There are over 200 total contributors to Bokeh and their time and effort help make Bokeh such an amazing project and community. Thank you again for your contributions. Finally (as always), for questions, technical assistance or if you're interested in contributing, questions can be directed to the Bokeh mailing list: bokeh at continuum.io or the Gitter Chat room: https://gitter.im/bokeh/bokeh Thanks, Bryan Van de Ven From josef.pktd at gmail.com Mon Jan 9 14:30:20 2017 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 9 Jan 2017 14:30:20 -0500 Subject: [Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve In-Reply-To: References: Message-ID: On Mon, Jan 9, 2017 at 6:27 AM, Ilhan Polat wrote: > > Note that you're proposing a new scipy feature (right?) on the numpy > list.... > > > This sounds like a good idea to me. As a former heavy Matlab user I > remember a lot of things to dislike, but "\" behavior was quite nice. > > Correct, I am not sure where this might go in. It seemed like a NumPy > array operation (touching array elements rapidly etc. can also be added for > similar functionalities other than solve) hence the NumPy list. But of > course it can be pushed as an exclusive SciPy feature. I'm not sure what > the outlook on np.linalg.solve is. > > > > How much is a noticeable slowdown? Note that we still have the current > interfaces available for users that know what they need, so a nice > convenience function that is say 5-10% slower would not be the end of the > world. > > the fastest case was around 150-400% slower but of course it might be the > case that I'm not using the fastest methods. It was mostly shuffling things > around and using np.any on them in the pure python3 case. I will cook up > something again for the baseline as soon as I have time. > > > All this checks sound a bit expensive, if we have almost always completely unstructured arrays that don't satisfy any special matrix pattern. In analogy to the type proliferation in Julia to handle those cases: Is there a way to attach information to numpy arrays that for example signals that a 2d array is hermitian, banded or diagonal or ...? (After second thought: maybe completely unstructured is not too expensive to detect if the checks are short-circuited, one off diagonal element nonzero - not diagonal, two opposite diagonal different - not symmetric, ...) 
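One way to read the "attach information" question above: a small ndarray subclass could carry a structure tag, so a solver front end trusts the tag when present and only falls back to the short-circuited element scans otherwise. This is just a sketch of one possible pattern, not an existing NumPy or SciPy feature:

import numpy as np

class TaggedArray(np.ndarray):
    """ndarray subclass carrying an optional 'structure' tag (hypothetical)."""

    def __new__(cls, data, structure=None):
        obj = np.asarray(data).view(cls)
        obj.structure = structure   # e.g. 'diagonal', 'triangular', 'hermitian'
        return obj

    def __array_finalize__(self, obj):
        # propagate the tag through views and slices
        self.structure = getattr(obj, 'structure', None)

A = TaggedArray(np.diag([1.0, 2.0, 3.0]), structure='diagonal')
print(A.structure)   # 'diagonal' -- a dispatcher could skip the scans entirely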
Josef > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilhanpolat at gmail.com Mon Jan 9 20:09:48 2017 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Tue, 10 Jan 2017 02:09:48 +0100 Subject: [Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve In-Reply-To: References: Message-ID: Indeed, generic is the cheapest discovery including the worst case that only the last off-diagonal element is nonzero, a pseudo code is first remove the diagonals check the remaining parts for nonzero, then check the upper triangle then lower, then morally triangularness from zero structure if any then bandedness and so on. If you have access to matlab, then you can set the sparse monitor to verbose mode " spparms('spumoni', 1) " and perform a backslash operation on sparse matrices. It will spit out what it does during the checks. A = sparse([0 2 0 1 0; 4 -1 -1 0 0; 0 0 0 3 -6; -2 0 0 0 2; 0 0 4 2 0]); B = sparse([8; -1; -18; 8; 20]); spparms('spumoni',1) x = A\B So every test in the polyalgorithm is cheaper than the next one. I'm not exactly sure what might be the best strategy yet hence the question. It's really interesting that LAPACK doesn't have this type of fast checks. On Mon, Jan 9, 2017 at 8:30 PM, wrote: > > > On Mon, Jan 9, 2017 at 6:27 AM, Ilhan Polat wrote: > >> > Note that you're proposing a new scipy feature (right?) on the numpy >> list.... >> >> > This sounds like a good idea to me. As a former heavy Matlab user I >> remember a lot of things to dislike, but "\" behavior was quite nice. >> >> Correct, I am not sure where this might go in. It seemed like a NumPy >> array operation (touching array elements rapidly etc. can also be added for >> similar functionalities other than solve) hence the NumPy list. But of >> course it can be pushed as an exclusive SciPy feature. I'm not sure what >> the outlook on np.linalg.solve is. >> >> >> > How much is a noticeable slowdown? Note that we still have the current >> interfaces available for users that know what they need, so a nice >> convenience function that is say 5-10% slower would not be the end of the >> world. >> >> the fastest case was around 150-400% slower but of course it might be the >> case that I'm not using the fastest methods. It was mostly shuffling things >> around and using np.any on them in the pure python3 case. I will cook up >> something again for the baseline as soon as I have time. >> >> >> > All this checks sound a bit expensive, if we have almost always completely > unstructured arrays that don't satisfy any special matrix pattern. > > In analogy to the type proliferation in Julia to handle those cases: Is > there a way to attach information to numpy arrays that for example signals > that a 2d array is hermitian, banded or diagonal or ...? > > (After second thought: maybe completely unstructured is not too expensive > to detect if the checks are short-circuited, one off diagonal element > nonzero - not diagonal, two opposite diagonal different - not symmetric, > ...) 
> > Josef > > > > >> >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Jan 9 20:29:25 2017 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 9 Jan 2017 17:29:25 -0800 Subject: [Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve In-Reply-To: References: Message-ID: On Mon, Jan 9, 2017 at 5:09 PM, Ilhan Polat wrote: > So every test in the polyalgorithm is cheaper than the next one. I'm not exactly sure what might be the best strategy yet hence the question. It's really interesting that LAPACK doesn't have this type of fast checks. In Fortran LAPACK, if you have a special structured matrix, you usually explicitly use packed storage and call the appropriate function type on it. It's only when you go to a system that only has a generic, unstructured dense matrix data type that it makes sense to do those kinds of checks. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From harrigan.matthew at gmail.com Mon Jan 9 20:55:03 2017 From: harrigan.matthew at gmail.com (Matthew Harrigan) Date: Mon, 9 Jan 2017 20:55:03 -0500 Subject: [Numpy-discussion] From Python to Numpy In-Reply-To: <4C818029-F4B8-4893-8E3E-42C24221EC49@inria.fr> References: <4C818029-F4B8-4893-8E3E-42C24221EC49@inria.fr> Message-ID: I also have been stalking this email thread. First, excellent book! Regarding the vectorization example mentioned above, one thing to note is that it increases the order of the algorithm relative to the pure python. The vectorized approach uses correlate, which requires ~(len(seq) * len(sub)) FLOPs. In the case where the first element in sub is not equal to the vast majority of elements in seq, the basic approach requires ~len(seq) comparisons. Note that is the case in the SO answer. One fairly common thing I have seen in vectorized approaches is that the memory or operations required scales worse than strictly required. It may or may not be an issue, largely depends on the specifics of how its used, but it usually indicates a better approach exists. That may be worth mentioning here. Given that, I tried to come up with an "ideal" approach. stride_tricks can be used to convert seq to a 2D array, and then ideally each row could be compared to sub. However I can't think of how to do that with numpy function calls other than compare each element in the 2D array, requiring O(n_sub*n_seq) operations again. array_equal is an example of that. Python list equality scales better, for instance if x = [0]*n and y = [1]*n, x == y is very fast and the time is independent of the value of n. It seems a generalized ufunc "all_equal" with signature (i),(i)->() and short circuit logic once the first non equal element is encountered would be an important performance improvement. In the ideal case it is dramatically faster, and even if every element must be compared then its still probably meaningfully faster since the boolean intermediate array isn't created. Even better would be to get the axis argument in place for generalized ufuncs. Then this problem could be vectorized in one line with far better performance. 
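To make the stride_tricks idea above concrete, here is a small sketch of the windowed comparison (my own illustration, not the code from the book or the proposed gufunc); it also shows why the cost stays at roughly len(seq)*len(sub) comparisons without a short-circuiting all_equal:

import numpy as np
from numpy.lib.stride_tricks import as_strided

def find_sub(seq, sub):
    seq = np.ascontiguousarray(seq)
    sub = np.asarray(sub)
    n = seq.size - sub.size + 1
    stride = seq.strides[0]
    # windows[i] is a view of seq[i:i+len(sub)] -- no data is copied
    windows = as_strided(seq, shape=(n, sub.size), strides=(stride, stride))
    # (windows == sub) still performs n*len(sub) comparisons and builds a
    # boolean temporary; a short-circuiting all_equal gufunc would stop each
    # row at the first mismatch and skip the temporary entirely.
    return np.flatnonzero((windows == sub).all(axis=1))

# find_sub([1, 8, 2, 3, 1, 2, 3], [1, 2, 3]) -> array([4])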
If others think this is a good idea I will post an issue and attempt a solution. On Sat, Dec 31, 2016 at 5:23 AM, Nicolas P. Rougier < Nicolas.Rougier at inria.fr> wrote: > > > I?ve seen vectorisation taken to the extreme, with negative consequences > in terms of both speed and readability, in both Python and MATLAB > codebases, so I would suggest some discussion / wisdom about when not to > vectorise. > > > I agree and there is actually a warning in the introduction about > readability vs speed with an example showing a clever optimization (by > Jaime Fern?ndez del R?o) that is hardly readable for the non-experts > (including myself). > > > Nicolas > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilhanpolat at gmail.com Mon Jan 9 22:10:13 2017 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Tue, 10 Jan 2017 04:10:13 +0100 Subject: [Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve In-Reply-To: References: Message-ID: Yes, that's precisely the case but when we know the structure we can just choose the appropriate solver anyhow with a little bit of overhead. What I mean is that, to my knowledge, FORTRAN routines for checking for triangularness etc. are absent. On Tue, Jan 10, 2017 at 2:29 AM, Robert Kern wrote: > On Mon, Jan 9, 2017 at 5:09 PM, Ilhan Polat wrote: > > > So every test in the polyalgorithm is cheaper than the next one. I'm not > exactly sure what might be the best strategy yet hence the question. It's > really interesting that LAPACK doesn't have this type of fast checks. > > In Fortran LAPACK, if you have a special structured matrix, you usually > explicitly use packed storage and call the appropriate function type on it. > It's only when you go to a system that only has a generic, unstructured > dense matrix data type that it makes sense to do those kinds of checks. > > -- > Robert Kern > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Jan 9 22:16:33 2017 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 9 Jan 2017 19:16:33 -0800 Subject: [Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve In-Reply-To: References: Message-ID: On Mon, Jan 9, 2017 at 7:10 PM, Ilhan Polat wrote: > > Yes, that's precisely the case but when we know the structure we can just choose the appropriate solver anyhow with a little bit of overhead. What I mean is that, to my knowledge, FORTRAN routines for checking for triangularness etc. are absent. I'm responding to that. The reason that they don't have those FORTRAN routines for testing for structure inside of a generic dense matrix is that in FORTRAN it's more natural (and efficient) to just use the explicit packed structure and associated routines instead. You would only use a generic dense matrix if you know that there isn't structure in the matrix. So there are no routines for detecting that structure in generic dense matrices. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ilhanpolat at gmail.com Tue Jan 10 05:58:17 2017 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Tue, 10 Jan 2017 11:58:17 +0100 Subject: [Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve In-Reply-To: References: Message-ID: I've done some benchmarking and it seems that the packed storage comes with a runtime penalty which agrees with a few links I've found online https://blog.debroglie.net/2013/09/01/lapack-and-packed-storage/ http://stackoverflow.com/questions/8941678/lapack-are-operations-on-packed-storage-matrices-faster The access of individual elements in packed stored matrices is expected to be more costly than in full storage, because of the more complicated indexing necessary. Hence, I am not sure if this justifies the absence just by having a dedicated solver for a prescribed structure. Existence of these polyalgorithms in matlab and not having in lapack should not imply FORTRAN users always know the structure in their matrices. I will also ask in LAPACK message board about this for some context. But thanks tough. As usual there is more to it than meets the eye probably, ilhan On Tue, Jan 10, 2017 at 4:16 AM, Robert Kern wrote: > On Mon, Jan 9, 2017 at 7:10 PM, Ilhan Polat wrote: > > > > Yes, that's precisely the case but when we know the structure we can > just choose the appropriate solver anyhow with a little bit of overhead. > What I mean is that, to my knowledge, FORTRAN routines for checking for > triangularness etc. are absent. > > I'm responding to that. The reason that they don't have those FORTRAN > routines for testing for structure inside of a generic dense matrix is that > in FORTRAN it's more natural (and efficient) to just use the explicit > packed structure and associated routines instead. You would only use a > generic dense matrix if you know that there isn't structure in the matrix. > So there are no routines for detecting that structure in generic dense > matrices. > > -- > Robert Kern > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From perimosocordiae at gmail.com Tue Jan 10 12:26:35 2017 From: perimosocordiae at gmail.com (CJ Carey) Date: Tue, 10 Jan 2017 11:26:35 -0600 Subject: [Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve In-Reply-To: References: Message-ID: I agree that this seems more like a scipy feature than a numpy feature. Users with structured matrices often use a sparse matrix format, though the API for using them in solvers could use some work. (I have a work-in-progress PR along those lines here: https://github.com/scipy/scipy/pull/6331) Perhaps this polyalgorithm approach could be used to dispatch sparse matrices to the appropriate solver, while optionally checking dense matrices for structure before dispatching them as well. Usage might look like: # if A is sparse, use scipy.sparse.linalg.solve, otherwise use scipy.linalg.solve scipy.linalg.generic_solve(A, b) # converts A to banded representation and calls scipy.linalg.solveh_banded, regardless if A is sparse or dense scipy.linalg.generic_solve(A, b, symmetric=True, banded=(-5, 5)) # runs possibly-expensive checks, then dispatches to the appropriate solver scipy.linalg.generic_solve(A, b, detect_structure=True) (I'm not advocating for "generic_solve" as the final name, I just needed a placeholder.) 
On Tue, Jan 10, 2017 at 4:58 AM, Ilhan Polat wrote: > I've done some benchmarking and it seems that the packed storage comes > with a runtime penalty which agrees with a few links I've found online > > https://blog.debroglie.net/2013/09/01/lapack-and-packed-storage/ > http://stackoverflow.com/questions/8941678/lapack-are- > operations-on-packed-storage-matrices-faster > > The access of individual elements in packed stored matrices is expected to > be more costly than in full storage, because of the more complicated > indexing necessary. Hence, I am not sure if this justifies the absence just > by having a dedicated solver for a prescribed structure. > > Existence of these polyalgorithms in matlab and not having in lapack > should not imply FORTRAN users always know the structure in their matrices. > I will also ask in LAPACK message board about this for some context. > > But thanks tough. As usual there is more to it than meets the eye > probably, > ilhan > > > > > On Tue, Jan 10, 2017 at 4:16 AM, Robert Kern > wrote: > >> On Mon, Jan 9, 2017 at 7:10 PM, Ilhan Polat wrote: >> > >> > Yes, that's precisely the case but when we know the structure we can >> just choose the appropriate solver anyhow with a little bit of overhead. >> What I mean is that, to my knowledge, FORTRAN routines for checking for >> triangularness etc. are absent. >> >> I'm responding to that. The reason that they don't have those FORTRAN >> routines for testing for structure inside of a generic dense matrix is that >> in FORTRAN it's more natural (and efficient) to just use the explicit >> packed structure and associated routines instead. You would only use a >> generic dense matrix if you know that there isn't structure in the matrix. >> So there are no routines for detecting that structure in generic dense >> matrices. >> >> -- >> Robert Kern >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alebarde at gmail.com Tue Jan 10 19:28:27 2017 From: alebarde at gmail.com (alebarde at gmail.com) Date: Wed, 11 Jan 2017 01:28:27 +0100 Subject: [Numpy-discussion] making np.gradient support unevenly spaced data Message-ID: Hi all, I have implemented a proposed enhancement for the np.gradient function that allows to compute the gradient on non uniform grids. (PR: https://github.com/numpy/numpy/pull/8446) The proposed implementation has a behaviour/signature is similar to that of Matlab/Octave. As argument it can take: 1. A single scalar to specify a sample distance for all dimensions. 2. N scalars to specify a constant sample distance for each dimension. i.e. `dx`, `dy`, `dz`, ... 3. N arrays to specify the coordinates of the values along each dimension of F. The length of the array must match the size of the corresponding dimension 4. Any combination of N scalars/arrays with the meaning of 2. and 3. e.g., you can do the following: >>> f = np.array([[1, 2, 6], [3, 4, 5]], dtype=np.float) >>> dx = 2. >>> y = [1., 1.5, 3.5] >>> np.gradient(f, dx, y) [array([[ 1. , 1. , -0.5], [ 1. , 1. , -0.5]]), array([[ 2. , 2. , 2. ], [ 2. , 1.7, 0.5]])] It should not break any existing code since as of 1.12 only scalars or list of scalars are allowed. 
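For readers who want to check the numbers above, a rough pure-NumPy version of the interior-point formula for a non-uniform 1-D grid is sketched below (simple first-order one-sided differences at the ends; the PR's actual edge handling may differ):

import numpy as np

def gradient_nonuniform_1d(f, x):
    f = np.asarray(f, dtype=float)
    x = np.asarray(x, dtype=float)
    h = np.diff(x)                 # spacings, h[i] = x[i+1] - x[i]
    hd, hs = h[:-1], h[1:]         # backward/forward spacing at interior points
    out = np.empty_like(f)
    # second-order accurate 3-point formula on a non-uniform grid
    out[1:-1] = (-hs / (hd * (hd + hs)) * f[:-2]
                 + (hs - hd) / (hd * hs) * f[1:-1]
                 + hd / (hs * (hd + hs)) * f[2:])
    out[0] = (f[1] - f[0]) / h[0]
    out[-1] = (f[-1] - f[-2]) / h[-1]
    return out

# Applied row-wise with x = [1., 1.5, 3.5], this reproduces the second array
# in the example above: [2., 2., 2.] and [2., 1.7, 0.5].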
A possible alternative API could be pass arrays of sampling steps instead of the coordinates. On the one hand, this would have the advantage of having "differences" both in the scalar case and in the array case. On the other hand, if you are dealing with non uniformly-spaced data (e.g, data is mapped on a grid or it is a time-series), in most cases you already have the coordinates/timestamps. Therefore, in the case of difference as argument, you would almost always have a call np.diff before np.gradient. In the end, I would rather prefer the coordinates option since IMHO it is more handy, I don't think that would be too much "surprising" and it is what Matlab already does. Also, it could not easily lead to "silly" mistakes since the length have to match the size of the corresponding dimension. What do you think? Thanks Alessandro -- -------------------------------------------------------------------------- NOTICE: Dlgs 196/2003 this e-mail and any attachments thereto may contain confidential information and are intended for the sole use of the recipient(s) named above. If you are not the intended recipient of this message you are hereby notified that any dissemination or copying of this message is strictly prohibited. If you have received this e-mail in error, please notify the sender either by telephone or by e-mail and delete the material from any computer. Thank you. -------------------------------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue Jan 10 20:27:07 2017 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Tue, 10 Jan 2017 17:27:07 -0800 Subject: [Numpy-discussion] From Python to Numpy In-Reply-To: References: <4C818029-F4B8-4893-8E3E-42C24221EC49@inria.fr> Message-ID: <4004103797063557632@unknownmsgid> > It seems a generalized ufunc "all_equal" with signature (i),(i)->() and short circuit logic once the first non equal element is encountered would be an important performance improvement. How does array_equal() perform? -CHB From harrigan.matthew at gmail.com Fri Jan 13 11:02:47 2017 From: harrigan.matthew at gmail.com (Matthew Harrigan) Date: Fri, 13 Jan 2017 11:02:47 -0500 Subject: [Numpy-discussion] From Python to Numpy In-Reply-To: <4004103797063557632@unknownmsgid> References: <4C818029-F4B8-4893-8E3E-42C24221EC49@inria.fr> <4004103797063557632@unknownmsgid> Message-ID: I coded up an all_equal gufunc here . Benchmark results are also in that repo. For the specific problem in the book which started this, its 40x faster than the optimized code in the book. For large arrays which have any early non equal element, its dramatically faster (1000x) than the current alternative. For large arrays which are all equal, its ~10% faster due to eliminating the intermediate boolean array. For tiny arrays its much faster due to a single function call instead of at least two, but its debatable how relevant speed is for tiny problems. Disclaimer: this is my first ufunc I have every written. On Tue, Jan 10, 2017 at 8:27 PM, Chris Barker - NOAA Federal < chris.barker at noaa.gov> wrote: > > It seems a generalized ufunc "all_equal" with signature (i),(i)->() and > short circuit logic once the first non equal element is encountered would > be an important performance improvement. > > How does array_equal() perform? 
> > -CHB > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Jan 15 18:43:41 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 15 Jan 2017 16:43:41 -0700 Subject: [Numpy-discussion] NumPy 1.12.0 release Message-ID: Hi All, I'm pleased to announce the NumPy 1.12.0 release. This release supports Python 2.7 and 3.4-3.6. Wheels for all supported Python versions may be downloaded from PiPY , the tarball and zip files may be downloaded from Github . The release notes and files hashes may also be found at Github . NumPy 1.12.0rc 2 is the result of 418 pull requests submitted by 139 contributors and comprises a large number of fixes and improvements. Among the many improvements it is difficult to pick out just a few as standing above the others, but the following may be of particular interest or indicate areas likely to have future consequences. * Order of operations in ``np.einsum`` can now be optimized for large speed improvements. * New ``signature`` argument to ``np.vectorize`` for vectorizing with core dimensions. * The ``keepdims`` argument was added to many functions. * New context manager for testing warnings * Support for BLIS in numpy.distutils * Much improved support for PyPy (not yet finished) Enjoy, Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From tcaswell at gmail.com Mon Jan 16 01:00:34 2017 From: tcaswell at gmail.com (Thomas Caswell) Date: Mon, 16 Jan 2017 06:00:34 +0000 Subject: [Numpy-discussion] question about long doubles on ppc64el Message-ID: Folks, Over at h5py we are trying to get a release out and have discovered (via debian) that on ppc64el there is an apparent disagreement between the size of a native long double according to hdf5 and numpy. For all of the gorey details see: https://github.com/h5py/h5py/issues/817 . In short, `np.longdouble` seems to be `np.float128` and according to the docs should map to the native 'long double'. However, hdf5 provides a `H5T_NATIVE_LDOUBLE` which should also refer to the native 'long double', but seems to be a 64 bit float. Anyone on this list have a ppc64el machine (or experience with) that can provide some guidance here? Tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From jenshnielsen at gmail.com Mon Jan 16 03:47:20 2017 From: jenshnielsen at gmail.com (Jens Nielsen) Date: Mon, 16 Jan 2017 08:47:20 +0000 Subject: [Numpy-discussion] question about long doubles on ppc64el In-Reply-To: References: Message-ID: According to https://docs.scipy.org/doc/numpy-dev/user/basics.types.html#extended-precision numpy long doubles are typically zero padded to 128 bits on 64 bit systems could that be the reason? On Mon, 16 Jan 2017 at 07:00 Thomas Caswell wrote: > Folks, > > Over at h5py we are trying to get a release out and have discovered (via > debian) that on ppc64el there is an apparent disagreement between the size > of a native long double according to hdf5 and numpy. > > For all of the gorey details see: https://github.com/h5py/h5py/issues/817 > . > > In short, `np.longdouble` seems to be `np.float128` and according to the > docs should map to the native 'long double'. However, hdf5 provides a > `H5T_NATIVE_LDOUBLE` which should also refer to the native 'long double', > but seems to be a 64 bit float. 
> > Anyone on this list have a ppc64el machine (or experience with) that can > provide some guidance here? > > Tom > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Jan 16 04:42:06 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 16 Jan 2017 22:42:06 +1300 Subject: [Numpy-discussion] NumPy 1.12.0 release In-Reply-To: References: Message-ID: On Mon, Jan 16, 2017 at 12:43 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > Hi All, > > I'm pleased to announce the NumPy 1.12.0 release. This release supports > Python 2.7 and 3.4-3.6. Wheels for all supported Python versions may be > downloaded from PiPY > , the tarball > and zip files may be downloaded from Github > . The release notes > and files hashes may also be found at Github > . > > NumPy 1.12.0rc 2 is the result of 418 pull requests submitted by 139 > contributors and comprises a large number of fixes and improvements. Among > the many improvements it is difficult to pick out just a few as standing > above the others, but the following may be of particular interest or > indicate areas likely to have future consequences. > > * Order of operations in ``np.einsum`` can now be optimized for large > speed improvements. > * New ``signature`` argument to ``np.vectorize`` for vectorizing with core > dimensions. > * The ``keepdims`` argument was added to many functions. > * New context manager for testing warnings > * Support for BLIS in numpy.distutils > * Much improved support for PyPy (not yet finished) > Thanks for all the heavy lifting on this one Chuck! Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From max_linke at gmx.de Mon Jan 16 08:38:11 2017 From: max_linke at gmx.de (Max Linke) Date: Mon, 16 Jan 2017 14:38:11 +0100 Subject: [Numpy-discussion] GSoC 2017: NumFocus will be an umbrella organization Message-ID: Hi Organizations can start submitting applications for Google Summer of Code 2017 on January 19 (and the deadline is February 9) https://developers.google.com/open-source/gsoc/timeline?hl=en NumFOCUS will be applying again this year. If you want to work with us please let me know and if you apply as an organization yourself or under a different umbrella organization please tell me as well. If you participate with us it would be great if you start to add possible projects to the ideas page on github soon. We some general information for mentors on github. https://github.com/numfocus/gsoc/blob/master/CONTRIBUTING-mentors.md We also have a template for ideas that might help. It lists the things Google likes to see. https://github.com/numfocus/gsoc/blob/master/2017/ideas-list-skeleton.md In case you participated in earlier years with NumFOCUS there are some small changes this year. Raniere won't be the admin this year. Instead I'm going to be the admin. We are also planning to include two explicit rules when a student should be failed, they have to communicate regularly and commit code into your development branch at the end of the summer. 
best, Max From charlesr.harris at gmail.com Mon Jan 16 10:47:19 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 16 Jan 2017 08:47:19 -0700 Subject: [Numpy-discussion] question about long doubles on ppc64el In-Reply-To: References: Message-ID: On Sun, Jan 15, 2017 at 11:00 PM, Thomas Caswell wrote: > Folks, > > Over at h5py we are trying to get a release out and have discovered (via > debian) that on ppc64el there is an apparent disagreement between the size > of a native long double according to hdf5 and numpy. > > For all of the gorey details see: https://github.com/h5py/h5py/issues/817 > . > > In short, `np.longdouble` seems to be `np.float128` and according to the > docs should map to the native 'long double'. However, hdf5 provides a > `H5T_NATIVE_LDOUBLE` which should also refer to the native 'long double', > but seems to be a 64 bit float. > > Anyone on this list have a ppc64el machine (or experience with) that can > provide some guidance here? > I believe the ppc64 long double is IBM double double, i.e., two doubles for 128 bits. It isn't IEEE compliant and probably not very portable. It is possible that different compilers could treat it differently or it may be flagged to be treated in some specific way. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon Jan 16 11:55:59 2017 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 16 Jan 2017 08:55:59 -0800 Subject: [Numpy-discussion] question about long doubles on ppc64el In-Reply-To: References: Message-ID: Hi, On Sun, Jan 15, 2017 at 10:00 PM, Thomas Caswell wrote: > Folks, > > Over at h5py we are trying to get a release out and have discovered (via > debian) that on ppc64el there is an apparent disagreement between the size > of a native long double according to hdf5 and numpy. > > For all of the gorey details see: https://github.com/h5py/h5py/issues/817 . > > In short, `np.longdouble` seems to be `np.float128` and according to the > docs should map to the native 'long double'. However, hdf5 provides a > `H5T_NATIVE_LDOUBLE` which should also refer to the native 'long double', > but seems to be a 64 bit float. > > Anyone on this list have a ppc64el machine (or experience with) that can > provide some guidance here? I know that long double on numpy for the PPC on Mac G4 (power64 arch) is the twin double, as expected, so I'd be surprised if that wasn't true for numpy on ppc64el . Do you want a login for the G4 running Jessie? If so, send me your public key off-list? Cheers, Matthew From nyh at scylladb.com Tue Jan 17 04:55:58 2017 From: nyh at scylladb.com (Nadav Har'El) Date: Tue, 17 Jan 2017 11:55:58 +0200 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties Message-ID: Hi, I'm looking for a way to find a random sample of C different items out of N items, with a some desired probabilty Pi for each item i. I saw that numpy has a function that supposedly does this, numpy.random.choice (with replace=False and a probabilities array), but looking at the algorithm actually implemented, I am wondering in what sense are the probabilities Pi actually obeyed... To me, the code doesn't seem to be doing the right thing... Let me explain: Consider a simple numerical example: We have 3 items, and need to pick 2 different ones randomly. Let's assume the desired probabilities for item 1, 2 and 3 are: 0.2, 0.4 and 0.4. 
Working out the equations there is exactly one solution here: The random outcome of numpy.random.choice in this case should be [1,2] at probability 0.2, [1,3] at probabilty 0.2, and [2,3] at probability 0.6. That is indeed a solution for the desired probabilities because it yields item 1 in [1,2]+[1,3] = 0.2 + 0.2 = 2*P1 of the trials, item 2 in [1,2]+[2,3] = 0.2+0.6 = 0.8 = 2*P2, etc. However, the algorithm in numpy.random.choice's replace=False generates, if I understand correctly, different probabilities for the outcomes: I believe in this case it generates [1,2] at probability 0.23333, [1,3] also 0.2333, and [2,3] at probability 0.53333. My question is how does this result fit the desired probabilities? If we get [1,2] at probability 0.23333 and [1,3] at probability 0.2333, then the expect number of "1" results we'll get per drawing is 0.23333 + 0.2333 = 0.46666, and similarly for "2" the expected number 0.7666, and for "3" 0.76666. As you can see, the proportions are off: Item 2 is NOT twice common than item 1 as we originally desired (we asked for probabilities 0.2, 0.4, 0.4 for the individual items!). -- Nadav Har'El nyh at scylladb.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Tue Jan 17 08:56:42 2017 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 17 Jan 2017 08:56:42 -0500 Subject: [Numpy-discussion] NumPy 1.12.0 release References: Message-ID: Charles R Harris wrote: > Hi All, > > I'm pleased to announce the NumPy 1.12.0 release. This release supports > Python 2.7 and 3.4-3.6. Wheels for all supported Python versions may be > downloaded from PiPY > , the tarball > and zip files may be downloaded from Github > . The release notes > and files hashes may also be found at Github > . > > NumPy 1.12.0rc 2 is the result of 418 pull requests submitted by 139 > contributors and comprises a large number of fixes and improvements. Among > the many improvements it is difficult to pick out just a few as standing > above the others, but the following may be of particular interest or > indicate areas likely to have future consequences. > > * Order of operations in ``np.einsum`` can now be optimized for large > speed improvements. > * New ``signature`` argument to ``np.vectorize`` for vectorizing with core > dimensions. > * The ``keepdims`` argument was added to many functions. > * New context manager for testing warnings > * Support for BLIS in numpy.distutils > * Much improved support for PyPy (not yet finished) > > Enjoy, > > Chuck I've installed via pip3 on linux x86_64, which gives me a wheel. My question is, am I loosing significant performance choosing this pre-built binary vs. compiling myself? For example, my processor might have some more features than the base version used to build wheels. From tcaswell at gmail.com Tue Jan 17 11:55:12 2017 From: tcaswell at gmail.com (Thomas Caswell) Date: Tue, 17 Jan 2017 16:55:12 +0000 Subject: [Numpy-discussion] [REL] matplotlib v2.0.0 Message-ID: Folks, We are happy to announce the release of (long delayed) matplotlib 2.0! This release completely overhauls the default style of the plots. The source tarball and wheels for Mac, Win, and manylinux for python 2.7, 3.4-3.6 are available on pypi pip install --upgrade matplotlib and conda packages for Mac, Win, linux for python 2.7, 3.4-3.6 are available from conda-forge conda install matplotlib -c conda-forge Highlights include: - 'viridis' is default color map instead of jet. - Modernized the default color cycle. 
- Many more functions respect the color cycle. - Line dash patterns scale with linewidth. - Change default font to DejaVu, now supports most Western alphabets (including Greek, Cyrillic and Latin with diacritics), math symbols and emoji out of the box. - Faster text rendering. - Improved auto-limits. - Ticks out and only on the right and bottom spines by default. - Improved auto-ticking, particularly for log scales and dates. - Improved image support (imshow respects scales and eliminated a class of artifacts). For a full list of the default changes (along with how to revert them) please see http://matplotlib.org/users/dflt_style_changes.html and http://matplotlib.org/users/whats_new.html#new-in-matplotlib-2-0. There were a number of small API changes documented at http://matplotlib.org/api/api_changes.html#api-changes-in-2-0-0 I would like to thank everyone who helped on this release in anyway. The people at 2015 scipy BOF where this got started, users who provided feedback and suggestions along the way, the beta-testers, Nathaniel, Stefan and Eric for the new color maps, and all of the documentation and code contributors. Please report any issues to matplotlib-users at python.org (will have to join to post un-moderated) or https://github.com/matplotlib/matplotlib/issues . Tom -------------- next part -------------- An HTML attachment was scrubbed... URL: From alebarde at gmail.com Tue Jan 17 12:18:59 2017 From: alebarde at gmail.com (alebarde at gmail.com) Date: Tue, 17 Jan 2017 18:18:59 +0100 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties Message-ID: Hi Nadav, I may be wrong, but I think that the result of the current implementation is actually the expected one. Using you example: probabilities for item 1, 2 and 3 are: 0.2, 0.4 and 0.4 P([1,2]) = P([2] | 1st=[1]) P([1]) + P([1] | 1st=[2]) P([2]) Now, P([1]) = 0.2 and P([2]) = 0.4. However: P([2] | 1st=[1]) = 0.5 (2 and 3 have the same sampling probability) P([1] | 1st=[2]) = 1/3 (1 and 3 have probability 0.2 and 0.4 that, once normalised, translate into 1/3 and 2/3 respectively) Therefore P([1,2]) = 0.7/3 = 0.23333 Similarly, P([1,3]) = 0.23333 and P([2,3]) = 1.6/3 = 0.533333 What am I missing? Alessandro 2017-01-17 13:00 GMT+01:00 : > Hi, I'm looking for a way to find a random sample of C different items out > of N items, with a some desired probabilty Pi for each item i. > > I saw that numpy has a function that supposedly does this, > numpy.random.choice (with replace=False and a probabilities array), but > looking at the algorithm actually implemented, I am wondering in what sense > are the probabilities Pi actually obeyed... > > To me, the code doesn't seem to be doing the right thing... Let me explain: > > Consider a simple numerical example: We have 3 items, and need to pick 2 > different ones randomly. Let's assume the desired probabilities for item 1, > 2 and 3 are: 0.2, 0.4 and 0.4. > > Working out the equations there is exactly one solution here: The random > outcome of numpy.random.choice in this case should be [1,2] at probability > 0.2, [1,3] at probabilty 0.2, and [2,3] at probability 0.6. That is indeed > a solution for the desired probabilities because it yields item 1 in > [1,2]+[1,3] = 0.2 + 0.2 = 2*P1 of the trials, item 2 in [1,2]+[2,3] = > 0.2+0.6 = 0.8 = 2*P2, etc. 
> > However, the algorithm in numpy.random.choice's replace=False generates, if > I understand correctly, different probabilities for the outcomes: I believe > in this case it generates [1,2] at probability 0.23333, [1,3] also 0.2333, > and [2,3] at probability 0.53333. > > My question is how does this result fit the desired probabilities? > > If we get [1,2] at probability 0.23333 and [1,3] at probability 0.2333, > then the expect number of "1" results we'll get per drawing is 0.23333 + > 0.2333 = 0.46666, and similarly for "2" the expected number 0.7666, and for > "3" 0.76666. As you can see, the proportions are off: Item 2 is NOT twice > common than item 1 as we originally desired (we asked for probabilities > 0.2, 0.4, 0.4 for the individual items!). > > > -- > Nadav Har'El > nyh at scylladb.com > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: attachments/20170117/d1f0a1db/attachment-0001.html> > > ------------------------------ > > Subject: Digest Footer > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > ------------------------------ > > End of NumPy-Discussion Digest, Vol 124, Issue 24 > ************************************************* > -- -------------------------------------------------------------------------- NOTICE: Dlgs 196/2003 this e-mail and any attachments thereto may contain confidential information and are intended for the sole use of the recipient(s) named above. If you are not the intended recipient of this message you are hereby notified that any dissemination or copying of this message is strictly prohibited. If you have received this e-mail in error, please notify the sender either by telephone or by e-mail and delete the material from any computer. Thank you. -------------------------------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Tue Jan 17 13:02:42 2017 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 17 Jan 2017 10:02:42 -0800 Subject: [Numpy-discussion] NumPy 1.12.0 release In-Reply-To: References: Message-ID: Hi, On Tue, Jan 17, 2017 at 5:56 AM, Neal Becker wrote: > Charles R Harris wrote: > >> Hi All, >> >> I'm pleased to announce the NumPy 1.12.0 release. This release supports >> Python 2.7 and 3.4-3.6. Wheels for all supported Python versions may be >> downloaded from PiPY >> , the tarball >> and zip files may be downloaded from Github >> . The release notes >> and files hashes may also be found at Github >> . >> >> NumPy 1.12.0rc 2 is the result of 418 pull requests submitted by 139 >> contributors and comprises a large number of fixes and improvements. Among >> the many improvements it is difficult to pick out just a few as standing >> above the others, but the following may be of particular interest or >> indicate areas likely to have future consequences. >> >> * Order of operations in ``np.einsum`` can now be optimized for large >> speed improvements. >> * New ``signature`` argument to ``np.vectorize`` for vectorizing with core >> dimensions. >> * The ``keepdims`` argument was added to many functions. >> * New context manager for testing warnings >> * Support for BLIS in numpy.distutils >> * Much improved support for PyPy (not yet finished) >> >> Enjoy, >> >> Chuck > > I've installed via pip3 on linux x86_64, which gives me a wheel. 
My > question is, am I loosing significant performance choosing this pre-built > binary vs. compiling myself? For example, my processor might have some more > features than the base version used to build wheels. I guess you are thinking about using this built wheel on some other machine? You'd have to be lucky for that to work; the wheel depends on the symbols it found at build time, which may not exist in the same places on your other machine. If it does work, the speed will primarily depend on your BLAS library. The pypi wheels should be pretty fast; they are built with OpenBLAS, which is at or near top of range for speed, across a range of platforms. Cheers, Matthew From nyh at scylladb.com Tue Jan 17 16:13:26 2017 From: nyh at scylladb.com (Nadav Har'El) Date: Tue, 17 Jan 2017 23:13:26 +0200 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Tue, Jan 17, 2017 at 7:18 PM, alebarde at gmail.com wrote: > Hi Nadav, > > I may be wrong, but I think that the result of the current implementation > is actually the expected one. > Using you example: probabilities for item 1, 2 and 3 are: 0.2, 0.4 and 0.4 > > P([1,2]) = P([2] | 1st=[1]) P([1]) + P([1] | 1st=[2]) P([2]) > Yes, this formula does fit well with the actual algorithm in the code. But, my question is *why* we want this formula to be correct: > Now, P([1]) = 0.2 and P([2]) = 0.4. However: > P([2] | 1st=[1]) = 0.5 (2 and 3 have the same sampling probability) > P([1] | 1st=[2]) = 1/3 (1 and 3 have probability 0.2 and 0.4 that, > once normalised, translate into 1/3 and 2/3 respectively) > Therefore P([1,2]) = 0.7/3 = 0.23333 > Similarly, P([1,3]) = 0.23333 and P([2,3]) = 1.6/3 = 0.533333 > Right, these are the numbers that the algorithm in the current code, and the formula above, produce: P([1,2]) = P([1,3]) = 0.23333 P([2,3]) = 0.53333 What I'm puzzled about is that these probabilities do not really fullfill the given probability vector 0.2, 0.4, 0.4... Let me try to explain explain: Why did the user choose the probabilities 0.2, 0.4, 0.4 for the three items in the first place? One reasonable interpretation is that the user wants in his random picks to see item 1 half the time of item 2 or 3. For example, maybe item 1 costs twice as much as item 2 or 3, so picking it half as often will result in an equal expenditure on each item. If the user randomly picks the items individually (a single item at a time), he indeed gets exactly this distribution: 0.2 of the time item 1, 0.4 of the time item 2, 0.4 of the time item 3. Now, what happens if he picks not individual items, but pairs of different items using numpy.random.choice with two items, replace=false? Suddenly, the distribution of the individual items in the results get skewed: If we look at the expected number of times we'll see each item in one draw of a random pair, we will get: E(1) = P([1,2]) + P([1,3]) = 0.46666 E(2) = P([1,2]) + P([2,3]) = 0.76666 E(3) = P([1,3]) + P([2,3]) = 0.76666 Or renormalizing by dividing by 2: P(1) = 0.233333 P(2) = 0.383333 P(3) = 0.383333 As you can see this is not quite the probabilities we wanted (which were 0.2, 0.4, 0.4)! In the random pairs we picked, item 1 was used a bit more often than we wanted, and item 2 and 3 were used a bit less often! So that brought my question of why we consider these numbers right. 
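A quick empirical check of these numbers (items labelled 0, 1, 2 instead of 1, 2, 3; the sample size is arbitrary):

import numpy as np
from collections import Counter

rng = np.random.RandomState(0)
p = [0.2, 0.4, 0.4]
n = 200000
pairs, items = Counter(), Counter()
for _ in range(n):
    draw = tuple(sorted(rng.choice(3, size=2, replace=False, p=p)))
    pairs[draw] += 1
    items.update(draw)

print({k: round(v / n, 3) for k, v in sorted(pairs.items())})
# ~ {(0, 1): 0.233, (0, 2): 0.233, (1, 2): 0.533}
print({k: round(v / (2 * n), 3) for k, v in sorted(items.items())})
# ~ {0: 0.233, 1: 0.383, 2: 0.383}   (per-item frequencies, not 0.2/0.4/0.4)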
In this example, it's actually possible to get the right item distribution, if we pick the pair outcomes with the following probabilties: P([1,2]) = 0.2 (not 0.233333 as above) P([1,3]) = 0.2 P([2,3]) = 0.6 (not 0.533333 as above) Then, we get exactly the right P(1), P(2), P(3): 0.2, 0.4, 0.4 Interestingly, fixing things like I suggest is not always possible. Consider a different probability-vector example for three items - 0.99, 0.005, 0.005. Now, no matter which algorithm we use for randomly picking pairs from these three items, *each* returned pair will inevitably contain one of the two very-low-probability items, so each of those items will appear in roughly half the pairs, instead of in a vanishingly small percentage as we hoped. But in other choices of probabilities (like the one in my original example), there is a solution. For 2-out-of-3 sampling we can actually show a system of three linear equations in three variables, so there is always one solution but if this solution has components not valid as probabilities (not in [0,1]) we end up with no solution - as happens in the 0.99, 0.005, 0.005 example. > What am I missing? > > Alessandro > > > 2017-01-17 13:00 GMT+01:00 : > >> Hi, I'm looking for a way to find a random sample of C different items out >> of N items, with a some desired probabilty Pi for each item i. >> >> I saw that numpy has a function that supposedly does this, >> numpy.random.choice (with replace=False and a probabilities array), but >> looking at the algorithm actually implemented, I am wondering in what >> sense >> are the probabilities Pi actually obeyed... >> >> To me, the code doesn't seem to be doing the right thing... Let me >> explain: >> >> Consider a simple numerical example: We have 3 items, and need to pick 2 >> different ones randomly. Let's assume the desired probabilities for item >> 1, >> 2 and 3 are: 0.2, 0.4 and 0.4. >> >> Working out the equations there is exactly one solution here: The random >> outcome of numpy.random.choice in this case should be [1,2] at probability >> 0.2, [1,3] at probabilty 0.2, and [2,3] at probability 0.6. That is indeed >> a solution for the desired probabilities because it yields item 1 in >> [1,2]+[1,3] = 0.2 + 0.2 = 2*P1 of the trials, item 2 in [1,2]+[2,3] = >> 0.2+0.6 = 0.8 = 2*P2, etc. >> >> However, the algorithm in numpy.random.choice's replace=False generates, >> if >> I understand correctly, different probabilities for the outcomes: I >> believe >> in this case it generates [1,2] at probability 0.23333, [1,3] also 0.2333, >> and [2,3] at probability 0.53333. >> >> My question is how does this result fit the desired probabilities? >> >> If we get [1,2] at probability 0.23333 and [1,3] at probability 0.2333, >> then the expect number of "1" results we'll get per drawing is 0.23333 + >> 0.2333 = 0.46666, and similarly for "2" the expected number 0.7666, and >> for >> "3" 0.76666. As you can see, the proportions are off: Item 2 is NOT twice >> common than item 1 as we originally desired (we asked for probabilities >> 0.2, 0.4, 0.4 for the individual items!). >> >> >> -- >> Nadav Har'El >> nyh at scylladb.com >> -------------- next part -------------- >> An HTML attachment was scrubbed... 
>> URL: > ts/20170117/d1f0a1db/attachment-0001.html> >> >> ------------------------------ >> >> Subject: Digest Footer >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> ------------------------------ >> >> End of NumPy-Discussion Digest, Vol 124, Issue 24 >> ************************************************* >> > > > > -- > -------------------------------------------------------------------------- > NOTICE: Dlgs 196/2003 this e-mail and any attachments thereto may contain > confidential information and are intended for the sole use of the > recipient(s) named above. If you are not the intended recipient of this > message you are hereby notified that any dissemination or copying of this > message is strictly prohibited. If you have received this e-mail in error, > please notify the sender either by telephone or by e-mail and delete the > material from any computer. Thank you. > -------------------------------------------------------------------------- > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Jan 17 17:25:39 2017 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 17 Jan 2017 17:25:39 -0500 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Tue, Jan 17, 2017 at 4:13 PM, Nadav Har'El wrote: > > > On Tue, Jan 17, 2017 at 7:18 PM, alebarde at gmail.com wrote: >> >> Hi Nadav, >> >> I may be wrong, but I think that the result of the current implementation is actually the expected one. >> Using you example: probabilities for item 1, 2 and 3 are: 0.2, 0.4 and 0.4 >> >> P([1,2]) = P([2] | 1st=[1]) P([1]) + P([1] | 1st=[2]) P([2]) > > > Yes, this formula does fit well with the actual algorithm in the code. But, my question is *why* we want this formula to be correct: > >> >> Now, P([1]) = 0.2 and P([2]) = 0.4. However: >> P([2] | 1st=[1]) = 0.5 (2 and 3 have the same sampling probability) >> P([1] | 1st=[2]) = 1/3 (1 and 3 have probability 0.2 and 0.4 that, once normalised, translate into 1/3 and 2/3 respectively) >> Therefore P([1,2]) = 0.7/3 = 0.23333 >> Similarly, P([1,3]) = 0.23333 and P([2,3]) = 1.6/3 = 0.533333 > > > Right, these are the numbers that the algorithm in the current code, and the formula above, produce: > > P([1,2]) = P([1,3]) = 0.23333 > P([2,3]) = 0.53333 > > What I'm puzzled about is that these probabilities do not really fullfill the given probability vector 0.2, 0.4, 0.4... > Let me try to explain explain: > > Why did the user choose the probabilities 0.2, 0.4, 0.4 for the three items in the first place? > > One reasonable interpretation is that the user wants in his random picks to see item 1 half the time of item 2 or 3. > For example, maybe item 1 costs twice as much as item 2 or 3, so picking it half as often will result in an equal expenditure on each item. > > If the user randomly picks the items individually (a single item at a time), he indeed gets exactly this distribution: 0.2 of the time item 1, 0.4 of the time item 2, 0.4 of the time item 3. > > Now, what happens if he picks not individual items, but pairs of different items using numpy.random.choice with two items, replace=false? 
> Suddenly, the distribution of the individual items in the results get skewed: If we look at the expected number of times we'll see each item in one draw of a random pair, we will get: > > E(1) = P([1,2]) + P([1,3]) = 0.46666 > E(2) = P([1,2]) + P([2,3]) = 0.76666 > E(3) = P([1,3]) + P([2,3]) = 0.76666 > > Or renormalizing by dividing by 2: > > P(1) = 0.233333 > P(2) = 0.383333 > P(3) = 0.383333 > > As you can see this is not quite the probabilities we wanted (which were 0.2, 0.4, 0.4)! In the random pairs we picked, item 1 was used a bit more often than we wanted, and item 2 and 3 were used a bit less often! > > So that brought my question of why we consider these numbers right. > > In this example, it's actually possible to get the right item distribution, if we pick the pair outcomes with the following probabilties: > > P([1,2]) = 0.2 (not 0.233333 as above) > P([1,3]) = 0.2 > P([2,3]) = 0.6 (not 0.533333 as above) > > Then, we get exactly the right P(1), P(2), P(3): 0.2, 0.4, 0.4 > > Interestingly, fixing things like I suggest is not always possible. Consider a different probability-vector example for three items - 0.99, 0.005, 0.005. Now, no matter which algorithm we use for randomly picking pairs from these three items, *each* returned pair will inevitably contain one of the two very-low-probability items, so each of those items will appear in roughly half the pairs, instead of in a vanishingly small percentage as we hoped. > > But in other choices of probabilities (like the one in my original example), there is a solution. For 2-out-of-3 sampling we can actually show a system of three linear equations in three variables, so there is always one solution but if this solution has components not valid as probabilities (not in [0,1]) we end up with no solution - as happens in the 0.99, 0.005, 0.005 example. I think the underlying problem is that in the sampling space the events (1, 2) (1, 3) (2, 3) are correlated and because of the discreteness an arbitrary marginal distribution on the individual events 1, 2, 3 is not possible. related aside: I'm not able (or willing to spend the time) on the math, but I just went through something similar for survey sampling in finite population (e.g. survey two out of 3 individuals, where 3 is the population), leading to the Horvitz?Thompson estimator. The books have chapters on different sampling schemes and derivation of the marginal and joint probability to be surveyed. (I gave up on sampling without replacement, and assume we have a large population where it doesn't make a difference.) In some of the sampling schemes they pick sequentially and adjust the probabilities for the remaining individuals. That seems to provide more flexibility to create a desired or optimal sampling scheme. Josef > > > >> >> What am I missing? >> >> Alessandro >> >> >> 2017-01-17 13:00 GMT+01:00 : >>> >>> Hi, I'm looking for a way to find a random sample of C different items out >>> of N items, with a some desired probabilty Pi for each item i. >>> >>> I saw that numpy has a function that supposedly does this, >>> numpy.random.choice (with replace=False and a probabilities array), but >>> looking at the algorithm actually implemented, I am wondering in what sense >>> are the probabilities Pi actually obeyed... >>> >>> To me, the code doesn't seem to be doing the right thing... Let me explain: >>> >>> Consider a simple numerical example: We have 3 items, and need to pick 2 >>> different ones randomly. 
Let's assume the desired probabilities for item 1, >>> 2 and 3 are: 0.2, 0.4 and 0.4. >>> >>> Working out the equations there is exactly one solution here: The random >>> outcome of numpy.random.choice in this case should be [1,2] at probability >>> 0.2, [1,3] at probabilty 0.2, and [2,3] at probability 0.6. That is indeed >>> a solution for the desired probabilities because it yields item 1 in >>> [1,2]+[1,3] = 0.2 + 0.2 = 2*P1 of the trials, item 2 in [1,2]+[2,3] = >>> 0.2+0.6 = 0.8 = 2*P2, etc. >>> >>> However, the algorithm in numpy.random.choice's replace=False generates, if >>> I understand correctly, different probabilities for the outcomes: I believe >>> in this case it generates [1,2] at probability 0.23333, [1,3] also 0.2333, >>> and [2,3] at probability 0.53333. >>> >>> My question is how does this result fit the desired probabilities? >>> >>> If we get [1,2] at probability 0.23333 and [1,3] at probability 0.2333, >>> then the expect number of "1" results we'll get per drawing is 0.23333 + >>> 0.2333 = 0.46666, and similarly for "2" the expected number 0.7666, and for >>> "3" 0.76666. As you can see, the proportions are off: Item 2 is NOT twice >>> common than item 1 as we originally desired (we asked for probabilities >>> 0.2, 0.4, 0.4 for the individual items!). >>> >>> >>> -- >>> Nadav Har'El >>> nyh at scylladb.com >>> -------------- next part -------------- >>> An HTML attachment was scrubbed... >>> URL: < https://mail.scipy.org/pipermail/numpy-discussion/attachments/20170117/d1f0a1db/attachment-0001.html > >>> >>> ------------------------------ >>> >>> Subject: Digest Footer >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> https://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >>> ------------------------------ >>> >>> End of NumPy-Discussion Digest, Vol 124, Issue 24 >>> ************************************************* >> >> >> >> >> -- >> -------------------------------------------------------------------------- >> NOTICE: Dlgs 196/2003 this e-mail and any attachments thereto may contain confidential information and are intended for the sole use of the recipient(s) named above. If you are not the intended recipient of this message you are hereby notified that any dissemination or copying of this message is strictly prohibited. If you have received this e-mail in error, please notify the sender either by telephone or by e-mail and delete the material from any computer. Thank you. >> -------------------------------------------------------------------------- >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alebarde at gmail.com Tue Jan 17 18:58:05 2017 From: alebarde at gmail.com (alebarde at gmail.com) Date: Wed, 18 Jan 2017 00:58:05 +0100 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: 2017-01-17 22:13 GMT+01:00 Nadav Har'El : > > On Tue, Jan 17, 2017 at 7:18 PM, alebarde at gmail.com > wrote: > >> Hi Nadav, >> >> I may be wrong, but I think that the result of the current implementation >> is actually the expected one. 
>> Using you example: probabilities for item 1, 2 and 3 are: 0.2, 0.4 and 0.4 >> >> P([1,2]) = P([2] | 1st=[1]) P([1]) + P([1] | 1st=[2]) P([2]) >> > > Yes, this formula does fit well with the actual algorithm in the code. > But, my question is *why* we want this formula to be correct: > > Just a note: this formula is correct and it is one of statistics fundamental law: https://en.wikipedia.org/wiki/Law_of_total_probability + https://en.wikipedia.org/wiki/Bayes%27_theorem Thus, the result we get from random.choice IMHO definitely makes sense. Of course, I think we could always discuss about implementing other sampling methods if they are useful to some application. > >> Now, P([1]) = 0.2 and P([2]) = 0.4. However: >> P([2] | 1st=[1]) = 0.5 (2 and 3 have the same sampling probability) >> P([1] | 1st=[2]) = 1/3 (1 and 3 have probability 0.2 and 0.4 that, >> once normalised, translate into 1/3 and 2/3 respectively) >> Therefore P([1,2]) = 0.7/3 = 0.23333 >> Similarly, P([1,3]) = 0.23333 and P([2,3]) = 1.6/3 = 0.533333 >> > > Right, these are the numbers that the algorithm in the current code, and > the formula above, produce: > > P([1,2]) = P([1,3]) = 0.23333 > P([2,3]) = 0.53333 > > What I'm puzzled about is that these probabilities do not really fullfill > the given probability vector 0.2, 0.4, 0.4... > Let me try to explain explain: > > Why did the user choose the probabilities 0.2, 0.4, 0.4 for the three > items in the first place? > > One reasonable interpretation is that the user wants in his random picks > to see item 1 half the time of item 2 or 3. > For example, maybe item 1 costs twice as much as item 2 or 3, so picking > it half as often will result in an equal expenditure on each item. > > If the user randomly picks the items individually (a single item at a > time), he indeed gets exactly this distribution: 0.2 of the time item 1, > 0.4 of the time item 2, 0.4 of the time item 3. > > Now, what happens if he picks not individual items, but pairs of different > items using numpy.random.choice with two items, replace=false? > Suddenly, the distribution of the individual items in the results get > skewed: If we look at the expected number of times we'll see each item in > one draw of a random pair, we will get: > > E(1) = P([1,2]) + P([1,3]) = 0.46666 > E(2) = P([1,2]) + P([2,3]) = 0.76666 > E(3) = P([1,3]) + P([2,3]) = 0.76666 > > Or renormalizing by dividing by 2: > > P(1) = 0.233333 > P(2) = 0.383333 > P(3) = 0.383333 > > As you can see this is not quite the probabilities we wanted (which were > 0.2, 0.4, 0.4)! In the random pairs we picked, item 1 was used a bit more > often than we wanted, and item 2 and 3 were used a bit less often! > p is not the probability of the output but the one of the source finite population. I think that if you want to preserve that distribution, as Josef pointed out, you have to make extractions independent, that is either sample with replacement or approximate an infinite population (that is basically the same thing). But of course in this case you will also end up with events [X,X]. > So that brought my question of why we consider these numbers right. > > In this example, it's actually possible to get the right item > distribution, if we pick the pair outcomes with the following probabilties: > > P([1,2]) = 0.2 (not 0.233333 as above) > P([1,3]) = 0.2 > P([2,3]) = 0.6 (not 0.533333 as above) > > Then, we get exactly the right P(1), P(2), P(3): 0.2, 0.4, 0.4 > > Interestingly, fixing things like I suggest is not always possible. 
> Consider a different probability-vector example for three items - 0.99, > 0.005, 0.005. Now, no matter which algorithm we use for randomly picking > pairs from these three items, *each* returned pair will inevitably contain > one of the two very-low-probability items, so each of those items will > appear in roughly half the pairs, instead of in a vanishingly small > percentage as we hoped. > > But in other choices of probabilities (like the one in my original > example), there is a solution. For 2-out-of-3 sampling we can actually show > a system of three linear equations in three variables, so there is always > one solution but if this solution has components not valid as probabilities > (not in [0,1]) we end up with no solution - as happens in the 0.99, 0.005, > 0.005 example. > > > >> What am I missing? >> >> Alessandro >> >> >> 2017-01-17 13:00 GMT+01:00 : >> >>> Hi, I'm looking for a way to find a random sample of C different items >>> out >>> of N items, with a some desired probabilty Pi for each item i. >>> >>> I saw that numpy has a function that supposedly does this, >>> numpy.random.choice (with replace=False and a probabilities array), but >>> looking at the algorithm actually implemented, I am wondering in what >>> sense >>> are the probabilities Pi actually obeyed... >>> >>> To me, the code doesn't seem to be doing the right thing... Let me >>> explain: >>> >>> Consider a simple numerical example: We have 3 items, and need to pick 2 >>> different ones randomly. Let's assume the desired probabilities for item >>> 1, >>> 2 and 3 are: 0.2, 0.4 and 0.4. >>> >>> Working out the equations there is exactly one solution here: The random >>> outcome of numpy.random.choice in this case should be [1,2] at >>> probability >>> 0.2, [1,3] at probabilty 0.2, and [2,3] at probability 0.6. That is >>> indeed >>> a solution for the desired probabilities because it yields item 1 in >>> [1,2]+[1,3] = 0.2 + 0.2 = 2*P1 of the trials, item 2 in [1,2]+[2,3] = >>> 0.2+0.6 = 0.8 = 2*P2, etc. >>> >>> However, the algorithm in numpy.random.choice's replace=False generates, >>> if >>> I understand correctly, different probabilities for the outcomes: I >>> believe >>> in this case it generates [1,2] at probability 0.23333, [1,3] also >>> 0.2333, >>> and [2,3] at probability 0.53333. >>> >>> My question is how does this result fit the desired probabilities? >>> >>> If we get [1,2] at probability 0.23333 and [1,3] at probability 0.2333, >>> then the expect number of "1" results we'll get per drawing is 0.23333 + >>> 0.2333 = 0.46666, and similarly for "2" the expected number 0.7666, and >>> for >>> "3" 0.76666. As you can see, the proportions are off: Item 2 is NOT twice >>> common than item 1 as we originally desired (we asked for probabilities >>> 0.2, 0.4, 0.4 for the individual items!). >>> >>> >>> -- >>> Nadav Har'El >>> nyh at scylladb.com >>> -------------- next part -------------- >>> An HTML attachment was scrubbed... 
>>> URL: >> ts/20170117/d1f0a1db/attachment-0001.html> >>> >>> ------------------------------ >>> >>> Subject: Digest Footer >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> https://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >>> ------------------------------ >>> >>> End of NumPy-Discussion Digest, Vol 124, Issue 24 >>> ************************************************* >>> >> >> >> >> -- >> ------------------------------------------------------------ >> -------------- >> NOTICE: Dlgs 196/2003 this e-mail and any attachments thereto may >> contain confidential information and are intended for the sole use of the >> recipient(s) named above. If you are not the intended recipient of this >> message you are hereby notified that any dissemination or copying of this >> message is strictly prohibited. If you have received this e-mail in error, >> please notify the sender either by telephone or by e-mail and delete the >> material from any computer. Thank you. >> ------------------------------------------------------------ >> -------------- >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- -------------------------------------------------------------------------- NOTICE: Dlgs 196/2003 this e-mail and any attachments thereto may contain confidential information and are intended for the sole use of the recipient(s) named above. If you are not the intended recipient of this message you are hereby notified that any dissemination or copying of this message is strictly prohibited. If you have received this e-mail in error, please notify the sender either by telephone or by e-mail and delete the material from any computer. Thank you. -------------------------------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Tue Jan 17 19:14:14 2017 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 17 Jan 2017 16:14:14 -0800 Subject: [Numpy-discussion] [SciPy-Dev] NumPy 1.12.0 release In-Reply-To: References: Message-ID: On Tue, Jan 17, 2017 at 3:47 PM, Neal Becker wrote: > Matthew Brett wrote: > >> Hi, >> >> On Tue, Jan 17, 2017 at 5:56 AM, Neal Becker wrote: >>> Charles R Harris wrote: >>> >>>> Hi All, >>>> >>>> I'm pleased to announce the NumPy 1.12.0 release. This release supports >>>> Python 2.7 and 3.4-3.6. Wheels for all supported Python versions may be >>>> downloaded from PiPY >>>> , the >>>> tarball and zip files may be downloaded from Github >>>> . The release notes >>>> and files hashes may also be found at Github >>>> . >>>> >>>> NumPy 1.12.0rc 2 is the result of 418 pull requests submitted by 139 >>>> contributors and comprises a large number of fixes and improvements. >>>> Among >>>> the many improvements it is difficult to pick out just a few as >>>> standing above the others, but the following may be of particular >>>> interest or indicate areas likely to have future consequences. >>>> >>>> * Order of operations in ``np.einsum`` can now be optimized for large >>>> speed improvements. >>>> * New ``signature`` argument to ``np.vectorize`` for vectorizing with >>>> core dimensions. 
>>>> * The ``keepdims`` argument was added to many functions. >>>> * New context manager for testing warnings >>>> * Support for BLIS in numpy.distutils >>>> * Much improved support for PyPy (not yet finished) >>>> >>>> Enjoy, >>>> >>>> Chuck >>> >>> I've installed via pip3 on linux x86_64, which gives me a wheel. My >>> question is, am I loosing significant performance choosing this pre-built >>> binary vs. compiling myself? For example, my processor might have some >>> more features than the base version used to build wheels. >> >> I guess you are thinking about using this built wheel on some other >> machine? You'd have to be lucky for that to work; the wheel depends >> on the symbols it found at build time, which may not exist in the same >> places on your other machine. >> >> If it does work, the speed will primarily depend on your BLAS library. >> >> The pypi wheels should be pretty fast; they are built with OpenBLAS, >> which is at or near top of range for speed, across a range of >> platforms. >> >> Cheers, >> >> Matthew > > I installed using pip3 install, and it installed a wheel package. I did not > build it - aren't wheels already compiled packages? So isn't it built for > the common denominator architecture, not necessarily as fast as one I built > myself on my own machine? My question is, on x86_64, is this potential > difference large enough to bother with not using precompiled wheel packages? Ah - my guess is that you'd be hard pressed to make a numpy that is as fast as the precompiled wheel. The OpenBLAS library included in numpy selects the routines for your CPU at run-time, so they will generally be fast on your CPU. You might be able to get equivalent or even better performance with a ATLAS BLAS library recompiled on your exact machine, but that's quite a serious investment of time to get working, and you'd have to benchmark to find if you were really doing any better. Cheers, Matthew From njs at pobox.com Tue Jan 17 19:20:12 2017 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 17 Jan 2017 16:20:12 -0800 Subject: [Numpy-discussion] [SciPy-Dev] NumPy 1.12.0 release In-Reply-To: References: Message-ID: On Tue, Jan 17, 2017 at 3:47 PM, Neal Becker wrote: > Matthew Brett wrote: > >> Hi, >> >> On Tue, Jan 17, 2017 at 5:56 AM, Neal Becker wrote: >>> Charles R Harris wrote: >>> >>>> Hi All, >>>> >>>> I'm pleased to announce the NumPy 1.12.0 release. This release supports >>>> Python 2.7 and 3.4-3.6. Wheels for all supported Python versions may be >>>> downloaded from PiPY >>>> , the >>>> tarball and zip files may be downloaded from Github >>>> . The release notes >>>> and files hashes may also be found at Github >>>> . >>>> >>>> NumPy 1.12.0rc 2 is the result of 418 pull requests submitted by 139 >>>> contributors and comprises a large number of fixes and improvements. >>>> Among >>>> the many improvements it is difficult to pick out just a few as >>>> standing above the others, but the following may be of particular >>>> interest or indicate areas likely to have future consequences. >>>> >>>> * Order of operations in ``np.einsum`` can now be optimized for large >>>> speed improvements. >>>> * New ``signature`` argument to ``np.vectorize`` for vectorizing with >>>> core dimensions. >>>> * The ``keepdims`` argument was added to many functions. 
>>>> * New context manager for testing warnings >>>> * Support for BLIS in numpy.distutils >>>> * Much improved support for PyPy (not yet finished) >>>> >>>> Enjoy, >>>> >>>> Chuck >>> >>> I've installed via pip3 on linux x86_64, which gives me a wheel. My >>> question is, am I loosing significant performance choosing this pre-built >>> binary vs. compiling myself? For example, my processor might have some >>> more features than the base version used to build wheels. >> >> I guess you are thinking about using this built wheel on some other >> machine? You'd have to be lucky for that to work; the wheel depends >> on the symbols it found at build time, which may not exist in the same >> places on your other machine. >> >> If it does work, the speed will primarily depend on your BLAS library. >> >> The pypi wheels should be pretty fast; they are built with OpenBLAS, >> which is at or near top of range for speed, across a range of >> platforms. >> >> Cheers, >> >> Matthew > > I installed using pip3 install, and it installed a wheel package. I did not > build it - aren't wheels already compiled packages? So isn't it built for > the common denominator architecture, not necessarily as fast as one I built > myself on my own machine? My question is, on x86_64, is this potential > difference large enough to bother with not using precompiled wheel packages? Ultimately, it's going to depend on all sorts of things, including most importantly your actual code. Like most speed questions, the only real way to know is to try it and measure the difference. The wheels do ship with a fast BLAS (OpenBLAS configured to automatically adapt to your CPU at runtime), so the performance will at least be reasonable. Possible improvements would include using a different and somehow better BLAS (MKL might be faster in some cases), tweaking your compiler options to take advantage of whatever SIMD ISAs your particular CPU supports (numpy's build system doesn't do this automatically but in principle you could do it by hand -- were you bothering before? does it even make a difference in practice? I dunno), and using a new compiler (the linux wheels use a somewhat ancient version of gcc for Reasons; newer compilers are better at optimizing -- how much does it matter? again I dunno). Basically: if you want to experiment and report back then I think we'd all be interested to hear; OTOH if you aren't feeling particularly curious/ambitious then I wouldn't worry about it :-). -n -- Nathaniel J. Smith -- https://vorpus.org From josef.pktd at gmail.com Tue Jan 17 19:51:25 2017 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 17 Jan 2017 19:51:25 -0500 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Tue, Jan 17, 2017 at 6:58 PM, alebarde at gmail.com wrote: > > > 2017-01-17 22:13 GMT+01:00 Nadav Har'El : > >> >> On Tue, Jan 17, 2017 at 7:18 PM, alebarde at gmail.com >> wrote: >> >>> Hi Nadav, >>> >>> I may be wrong, but I think that the result of the current >>> implementation is actually the expected one. >>> Using you example: probabilities for item 1, 2 and 3 are: 0.2, 0.4 and >>> 0.4 >>> >>> P([1,2]) = P([2] | 1st=[1]) P([1]) + P([1] | 1st=[2]) P([2]) >>> >> >> Yes, this formula does fit well with the actual algorithm in the code. 
>> But, my question is *why* we want this formula to be correct: >> >> Just a note: this formula is correct and it is one of statistics > fundamental law: https://en.wikipedia.org/wiki/Law_of_total_probability + > https://en.wikipedia.org/wiki/Bayes%27_theorem > Thus, the result we get from random.choice IMHO definitely makes sense. Of > course, I think we could always discuss about implementing other sampling > methods if they are useful to some application. > > >> >>> Now, P([1]) = 0.2 and P([2]) = 0.4. However: >>> P([2] | 1st=[1]) = 0.5 (2 and 3 have the same sampling probability) >>> P([1] | 1st=[2]) = 1/3 (1 and 3 have probability 0.2 and 0.4 that, >>> once normalised, translate into 1/3 and 2/3 respectively) >>> Therefore P([1,2]) = 0.7/3 = 0.23333 >>> Similarly, P([1,3]) = 0.23333 and P([2,3]) = 1.6/3 = 0.533333 >>> >> >> Right, these are the numbers that the algorithm in the current code, and >> the formula above, produce: >> >> P([1,2]) = P([1,3]) = 0.23333 >> P([2,3]) = 0.53333 >> >> What I'm puzzled about is that these probabilities do not really fullfill >> the given probability vector 0.2, 0.4, 0.4... >> Let me try to explain explain: >> >> Why did the user choose the probabilities 0.2, 0.4, 0.4 for the three >> items in the first place? >> >> One reasonable interpretation is that the user wants in his random picks >> to see item 1 half the time of item 2 or 3. >> For example, maybe item 1 costs twice as much as item 2 or 3, so picking >> it half as often will result in an equal expenditure on each item. >> >> If the user randomly picks the items individually (a single item at a >> time), he indeed gets exactly this distribution: 0.2 of the time item 1, >> 0.4 of the time item 2, 0.4 of the time item 3. >> >> Now, what happens if he picks not individual items, but pairs of >> different items using numpy.random.choice with two items, replace=false? >> Suddenly, the distribution of the individual items in the results get >> skewed: If we look at the expected number of times we'll see each item in >> one draw of a random pair, we will get: >> >> E(1) = P([1,2]) + P([1,3]) = 0.46666 >> E(2) = P([1,2]) + P([2,3]) = 0.76666 >> E(3) = P([1,3]) + P([2,3]) = 0.76666 >> >> Or renormalizing by dividing by 2: >> >> P(1) = 0.233333 >> P(2) = 0.383333 >> P(3) = 0.383333 >> >> As you can see this is not quite the probabilities we wanted (which were >> 0.2, 0.4, 0.4)! In the random pairs we picked, item 1 was used a bit more >> often than we wanted, and item 2 and 3 were used a bit less often! >> > > p is not the probability of the output but the one of the source finite > population. I think that if you want to preserve that distribution, as > Josef pointed out, you have to make extractions independent, that is either > sample with replacement or approximate an infinite population (that is > basically the same thing). But of course in this case you will also end up > with events [X,X]. > With replacement and keeping duplicates the results might also be similar in the pattern of the marginal probabilities https://onlinecourses.science.psu.edu/stat506/node/17 Another approach in survey sampling is also to drop duplicates in with replacement sampling, but then the sample size itself is random. (again I didn't try to understand the small print) (another related aside: The problem with discrete sample space in small samples shows up also in calculating hypothesis tests, e.g. fisher's exact or similar. 
Because, we only get a few discrete possibilities in the sample space, it is not possible to construct a test that has exactly the desired type 1 error.) Josef > > >> So that brought my question of why we consider these numbers right. >> >> In this example, it's actually possible to get the right item >> distribution, if we pick the pair outcomes with the following probabilties: >> >> P([1,2]) = 0.2 (not 0.233333 as above) >> P([1,3]) = 0.2 >> P([2,3]) = 0.6 (not 0.533333 as above) >> >> Then, we get exactly the right P(1), P(2), P(3): 0.2, 0.4, 0.4 >> >> Interestingly, fixing things like I suggest is not always possible. >> Consider a different probability-vector example for three items - 0.99, >> 0.005, 0.005. Now, no matter which algorithm we use for randomly picking >> pairs from these three items, *each* returned pair will inevitably contain >> one of the two very-low-probability items, so each of those items will >> appear in roughly half the pairs, instead of in a vanishingly small >> percentage as we hoped. >> >> But in other choices of probabilities (like the one in my original >> example), there is a solution. For 2-out-of-3 sampling we can actually show >> a system of three linear equations in three variables, so there is always >> one solution but if this solution has components not valid as probabilities >> (not in [0,1]) we end up with no solution - as happens in the 0.99, 0.005, >> 0.005 example. >> >> >> >>> What am I missing? >>> >>> Alessandro >>> >>> >>> 2017-01-17 13:00 GMT+01:00 : >>> >>>> Hi, I'm looking for a way to find a random sample of C different items >>>> out >>>> of N items, with a some desired probabilty Pi for each item i. >>>> >>>> I saw that numpy has a function that supposedly does this, >>>> numpy.random.choice (with replace=False and a probabilities array), but >>>> looking at the algorithm actually implemented, I am wondering in what >>>> sense >>>> are the probabilities Pi actually obeyed... >>>> >>>> To me, the code doesn't seem to be doing the right thing... Let me >>>> explain: >>>> >>>> Consider a simple numerical example: We have 3 items, and need to pick 2 >>>> different ones randomly. Let's assume the desired probabilities for >>>> item 1, >>>> 2 and 3 are: 0.2, 0.4 and 0.4. >>>> >>>> Working out the equations there is exactly one solution here: The random >>>> outcome of numpy.random.choice in this case should be [1,2] at >>>> probability >>>> 0.2, [1,3] at probabilty 0.2, and [2,3] at probability 0.6. That is >>>> indeed >>>> a solution for the desired probabilities because it yields item 1 in >>>> [1,2]+[1,3] = 0.2 + 0.2 = 2*P1 of the trials, item 2 in [1,2]+[2,3] = >>>> 0.2+0.6 = 0.8 = 2*P2, etc. >>>> >>>> However, the algorithm in numpy.random.choice's replace=False >>>> generates, if >>>> I understand correctly, different probabilities for the outcomes: I >>>> believe >>>> in this case it generates [1,2] at probability 0.23333, [1,3] also >>>> 0.2333, >>>> and [2,3] at probability 0.53333. >>>> >>>> My question is how does this result fit the desired probabilities? >>>> >>>> If we get [1,2] at probability 0.23333 and [1,3] at probability 0.2333, >>>> then the expect number of "1" results we'll get per drawing is 0.23333 + >>>> 0.2333 = 0.46666, and similarly for "2" the expected number 0.7666, and >>>> for >>>> "3" 0.76666. As you can see, the proportions are off: Item 2 is NOT >>>> twice >>>> common than item 1 as we originally desired (we asked for probabilities >>>> 0.2, 0.4, 0.4 for the individual items!). 
From Jerome.Kieffer at esrf.fr  Wed Jan 18 02:15:06 2017
From: Jerome.Kieffer at esrf.fr (Jerome Kieffer)
Date: Wed, 18 Jan 2017 08:15:06 +0100
Subject: [Numpy-discussion] NumPy 1.12.0 release
In-Reply-To:
References:
Message-ID: <20170118081506.4ccd1cee@lintaillefer.esrf.fr>

On Tue, 17 Jan 2017 08:56:42 -0500
Neal Becker wrote:

> I've installed via pip3 on linux x86_64, which gives me a wheel. My
> question is, am I loosing significant performance choosing this pre-built
> binary vs. compiling myself? For example, my processor might have some more
> features than the base version used to build wheels.

Hi,

I have done some benchmarking (%timeit) for my code running in a
jupyter-notebook within a venv installed with pip+manylinux wheels
versus ipython and debian packages (on the same computer).
I noticed the debian installation was ~20% faster.
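The check itself was nothing fancy; a stand-alone sketch of the same kind
of measurement is below (the two toy kernels are only illustrative
stand-ins -- the real timings were of my own application code, as said
above):

import numpy as np
import timeit

a = np.random.random((1000, 1000))

# run the identical script under both installations and compare the numbers
print("matmul :", timeit.timeit(lambda: a @ a, number=20))
print("ufuncs :", timeit.timeit(lambda: np.exp(a).sum(), number=200))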
I did not investigate further if those 20% came from the manylinux (I suspect) or from the notebook infrastructure. HTH, -- J?r?me Kieffer From nathan12343 at gmail.com Wed Jan 18 02:27:28 2017 From: nathan12343 at gmail.com (Nathan Goldbaum) Date: Wed, 18 Jan 2017 07:27:28 +0000 Subject: [Numpy-discussion] NumPy 1.12.0 release In-Reply-To: <20170118081506.4ccd1cee@lintaillefer.esrf.fr> References: <20170118081506.4ccd1cee@lintaillefer.esrf.fr> Message-ID: I've seen reports on the anaconda mailing list of people seeing similar speed ups when they compile e.g. Numpy with a recent gcc. Anaconda has the same issue as manylinux in that they need to use versions of GCC available on CentOS 5. Given the upcoming official EOL for CentOS5, it might make sense to think about making a pep for a CentOS 6-based manylinux2 docker image, which will allow compiling with a newer GCC. On Tue, Jan 17, 2017 at 9:15 PM Jerome Kieffer wrote: > On Tue, 17 Jan 2017 08:56:42 -0500 > > Neal Becker wrote: > > > > > I've installed via pip3 on linux x86_64, which gives me a wheel. My > > > question is, am I loosing significant performance choosing this pre-built > > > binary vs. compiling myself? For example, my processor might have some > more > > > features than the base version used to build wheels. > > > > Hi, > > > > I have done some benchmarking (%timeit) for my code running in a > > jupyter-notebook within a venv installed with pip+manylinux wheels > > versus ipython and debian packages (on the same computer). > > I noticed the debian installation was ~20% faster. > > > > I did not investigate further if those 20% came from the manylinux (I > > suspect) or from the notebook infrastructure. > > > > HTH, > > -- > > J?r?me Kieffer > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Wed Jan 18 03:28:43 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 18 Jan 2017 21:28:43 +1300 Subject: [Numpy-discussion] GSoC 2017: NumFocus will be an umbrella organization In-Reply-To: References: Message-ID: Hi Max, On Tue, Jan 17, 2017 at 2:38 AM, Max Linke wrote: > Hi > > Organizations can start submitting applications for Google Summer of Code > 2017 on January 19 (and the deadline is February 9) > > https://developers.google.com/open-source/gsoc/timeline?hl=en Thanks for bringing this up, and for organizing the NumFOCUS participation! > NumFOCUS will be applying again this year. If you want to work with us > please let me know and if you apply as an organization yourself or under a > different umbrella organization please tell me as well. I suspect we won't participate at all, but if we do then it's likely under the PSF umbrella as we have done previously. @all: in practice working on NumPy is just far too hard for most GSoC students. Previous years we've registered and generated ideas, but not gotten any students. We're also short on maintainer capacity. So I propose to not participate this year. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From nyh at scylladb.com Wed Jan 18 03:35:30 2017 From: nyh at scylladb.com (Nadav Har'El) Date: Wed, 18 Jan 2017 10:35:30 +0200 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Wed, Jan 18, 2017 at 1:58 AM, alebarde at gmail.com wrote: > > > 2017-01-17 22:13 GMT+01:00 Nadav Har'El : > >> >> On Tue, Jan 17, 2017 at 7:18 PM, alebarde at gmail.com >> wrote: >> >>> Hi Nadav, >>> >>> I may be wrong, but I think that the result of the current >>> implementation is actually the expected one. >>> Using you example: probabilities for item 1, 2 and 3 are: 0.2, 0.4 and >>> 0.4 >>> >>> P([1,2]) = P([2] | 1st=[1]) P([1]) + P([1] | 1st=[2]) P([2]) >>> >> >> Yes, this formula does fit well with the actual algorithm in the code. >> But, my question is *why* we want this formula to be correct: >> >> Just a note: this formula is correct and it is one of statistics > fundamental law: https://en.wikipedia.org/wiki/Law_of_total_probability + > https://en.wikipedia.org/wiki/Bayes%27_theorem > Hi, Yes, of course the formula is correct, but it doesn't mean we're not applying it in the wrong context. I'll be honest here: I came to numpy.random.choice after I actually coded a similar algorithm (with the same results) myself, because like you I thought this was the "obvious" and correct algorithm. Only then I realized that its output doesn't actually produce the desired probabilities specified by the user - even in the cases where that is possible. And I started wondering if existing libraries - like numpy - do this differently. And it turns out, numpy does it (basically) in the same way as my algorithm. > > Thus, the result we get from random.choice IMHO definitely makes sense. > Let's look at what the user asked this function, and what it returns: User asks: please give me random pairs of the three items, where item 1 has probability 0.2, item 2 has 0.4, and 3 has 0.4. Function returns: random pairs, where if you make many random returned results (as in the law of large numbers) and look at the items they contain, item 1 is 0.2333 of the items, item 2 is 0.38333, and item 3 is 0.38333. These are not (quite) the probabilities the user asked for... Can you explain a sense where the user's requested probabilities (0.2, 0.4, 0.4) are actually adhered in the results which random.choice returns? Thanks, Nadav Har'El. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alebarde at gmail.com Wed Jan 18 04:00:50 2017 From: alebarde at gmail.com (alebarde at gmail.com) Date: Wed, 18 Jan 2017 10:00:50 +0100 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: 2017-01-18 9:35 GMT+01:00 Nadav Har'El : > > On Wed, Jan 18, 2017 at 1:58 AM, alebarde at gmail.com > wrote: > >> >> >> 2017-01-17 22:13 GMT+01:00 Nadav Har'El : >> >>> >>> On Tue, Jan 17, 2017 at 7:18 PM, alebarde at gmail.com >>> wrote: >>> >>>> Hi Nadav, >>>> >>>> I may be wrong, but I think that the result of the current >>>> implementation is actually the expected one. >>>> Using you example: probabilities for item 1, 2 and 3 are: 0.2, 0.4 and >>>> 0.4 >>>> >>>> P([1,2]) = P([2] | 1st=[1]) P([1]) + P([1] | 1st=[2]) P([2]) >>>> >>> >>> Yes, this formula does fit well with the actual algorithm in the code. 
>>> But, my question is *why* we want this formula to be correct: >>> >>> Just a note: this formula is correct and it is one of statistics >> fundamental law: https://en.wikipedia.org/wiki/Law_of_total_probability >> + https://en.wikipedia.org/wiki/Bayes%27_theorem >> > > Hi, > > Yes, of course the formula is correct, but it doesn't mean we're not > applying it in the wrong context. > > I'll be honest here: I came to numpy.random.choice after I actually coded > a similar algorithm (with the same results) myself, because like you I > thought this was the "obvious" and correct algorithm. Only then I realized > that its output doesn't actually produce the desired probabilities > specified by the user - even in the cases where that is possible. And I > started wondering if existing libraries - like numpy - do this differently. > And it turns out, numpy does it (basically) in the same way as my algorithm. > > >> >> Thus, the result we get from random.choice IMHO definitely makes sense. >> > > Let's look at what the user asked this function, and what it returns: > > User asks: please give me random pairs of the three items, where item 1 > has probability 0.2, item 2 has 0.4, and 3 has 0.4. > > Function returns: random pairs, where if you make many random returned > results (as in the law of large numbers) and look at the items they > contain, item 1 is 0.2333 of the items, item 2 is 0.38333, and item 3 is > 0.38333. > These are not (quite) the probabilities the user asked for... > > Can you explain a sense where the user's requested probabilities (0.2, > 0.4, 0.4) are actually adhered in the results which random.choice returns? > I think that the question the user is asking by specifying p is a slightly different one: "please give me random pairs of the three items extracted from a population of 3 items where item 1 has probability of being extracted of 0.2, item 2 has 0.4, and 3 has 0.4. Also please remove extract items once extracted." > Thanks, > Nadav Har'El. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- -------------------------------------------------------------------------- NOTICE: Dlgs 196/2003 this e-mail and any attachments thereto may contain confidential information and are intended for the sole use of the recipient(s) named above. If you are not the intended recipient of this message you are hereby notified that any dissemination or copying of this message is strictly prohibited. If you have received this e-mail in error, please notify the sender either by telephone or by e-mail and delete the material from any computer. Thank you. -------------------------------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From nyh at scylladb.com Wed Jan 18 04:52:45 2017 From: nyh at scylladb.com (Nadav Har'El) Date: Wed, 18 Jan 2017 11:52:45 +0200 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Wed, Jan 18, 2017 at 11:00 AM, alebarde at gmail.com wrote: > Let's look at what the user asked this function, and what it returns: > >> >> User asks: please give me random pairs of the three items, where item 1 >> has probability 0.2, item 2 has 0.4, and 3 has 0.4. 
>> >> Function returns: random pairs, where if you make many random returned >> results (as in the law of large numbers) and look at the items they >> contain, item 1 is 0.2333 of the items, item 2 is 0.38333, and item 3 is >> 0.38333. >> These are not (quite) the probabilities the user asked for... >> >> Can you explain a sense where the user's requested probabilities (0.2, >> 0.4, 0.4) are actually adhered in the results which random.choice returns? >> > > I think that the question the user is asking by specifying p is a slightly > different one: > "please give me random pairs of the three items extracted from a > population of 3 items where item 1 has probability of being extracted of > 0.2, item 2 has 0.4, and 3 has 0.4. Also please remove extract items once > extracted." > You are right, if that is what the user wants, numpy.random.choice does the right thing. I'm just wondering whether this is actually what users want, and whether they understand this is what they are getting. As I said, I expected it to generate pairs with, empirically, the desired distribution of individual items. The documentation of numpy.random.choice seemed to me (wrongly) that it implis that that's what it does. So I was surprised to realize that it does not. Nadav. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Wed Jan 18 06:43:25 2017 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Wed, 18 Jan 2017 12:43:25 +0100 Subject: [Numpy-discussion] NumPy 1.12.0 release In-Reply-To: References: <20170118081506.4ccd1cee@lintaillefer.esrf.fr> Message-ID: <10e7c488-13bc-ae42-ced2-330ee9dd4c88@googlemail.com> The version of gcc used will make a large difference in some places. E.g. the AVX2 integer ufuncs require something around 4.5 to work and in general the optimization level of gcc has improved greatly since the clang competition showed up around that time. centos 5 has 4.1 which is really ancient. I though the wheels used newer gccs also on centos 5? On 18.01.2017 08:27, Nathan Goldbaum wrote: > I've seen reports on the anaconda mailing list of people seeing similar > speed ups when they compile e.g. Numpy with a recent gcc. Anaconda has > the same issue as manylinux in that they need to use versions of GCC > available on CentOS 5. > > Given the upcoming official EOL for CentOS5, it might make sense to > think about making a pep for a CentOS 6-based manylinux2 docker image, > which will allow compiling with a newer GCC. > > On Tue, Jan 17, 2017 at 9:15 PM Jerome Kieffer > wrote: > > On Tue, 17 Jan 2017 08:56:42 -0500 > > Neal Becker > wrote: > > > > > I've installed via pip3 on linux x86_64, which gives me a wheel. My > > > question is, am I loosing significant performance choosing this > pre-built > > > binary vs. compiling myself? For example, my processor might have > some more > > > features than the base version used to build wheels. > > > > Hi, > > > > I have done some benchmarking (%timeit) for my code running in a > > jupyter-notebook within a venv installed with pip+manylinux wheels > > versus ipython and debian packages (on the same computer). > > I noticed the debian installation was ~20% faster. > > > > I did not investigate further if those 20% came from the manylinux (I > > suspect) or from the notebook infrastructure. 
> > > > HTH, > > -- > > J?r?me Kieffer > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > From ndbecker2 at gmail.com Wed Jan 18 07:00:18 2017 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 18 Jan 2017 07:00:18 -0500 Subject: [Numpy-discussion] [SciPy-Dev] NumPy 1.12.0 release References: Message-ID: Nathaniel Smith wrote: > On Tue, Jan 17, 2017 at 3:47 PM, Neal Becker wrote: >> Matthew Brett wrote: >> >>> Hi, >>> >>> On Tue, Jan 17, 2017 at 5:56 AM, Neal Becker >>> wrote: >>>> Charles R Harris wrote: >>>> >>>>> Hi All, >>>>> >>>>> I'm pleased to announce the NumPy 1.12.0 release. This release >>>>> supports Python 2.7 and 3.4-3.6. Wheels for all supported Python >>>>> versions may be downloaded from PiPY >>>>> , the >>>>> tarball and zip files may be downloaded from Github >>>>> . The release >>>>> notes and files hashes may also be found at Github >>>>> . >>>>> >>>>> NumPy 1.12.0rc 2 is the result of 418 pull requests submitted by 139 >>>>> contributors and comprises a large number of fixes and improvements. >>>>> Among >>>>> the many improvements it is difficult to pick out just a few as >>>>> standing above the others, but the following may be of particular >>>>> interest or indicate areas likely to have future consequences. >>>>> >>>>> * Order of operations in ``np.einsum`` can now be optimized for large >>>>> speed improvements. >>>>> * New ``signature`` argument to ``np.vectorize`` for vectorizing with >>>>> core dimensions. >>>>> * The ``keepdims`` argument was added to many functions. >>>>> * New context manager for testing warnings >>>>> * Support for BLIS in numpy.distutils >>>>> * Much improved support for PyPy (not yet finished) >>>>> >>>>> Enjoy, >>>>> >>>>> Chuck >>>> >>>> I've installed via pip3 on linux x86_64, which gives me a wheel. My >>>> question is, am I loosing significant performance choosing this >>>> pre-built >>>> binary vs. compiling myself? For example, my processor might have some >>>> more features than the base version used to build wheels. >>> >>> I guess you are thinking about using this built wheel on some other >>> machine? You'd have to be lucky for that to work; the wheel depends >>> on the symbols it found at build time, which may not exist in the same >>> places on your other machine. >>> >>> If it does work, the speed will primarily depend on your BLAS library. >>> >>> The pypi wheels should be pretty fast; they are built with OpenBLAS, >>> which is at or near top of range for speed, across a range of >>> platforms. >>> >>> Cheers, >>> >>> Matthew >> >> I installed using pip3 install, and it installed a wheel package. I did >> not >> build it - aren't wheels already compiled packages? So isn't it built >> for the common denominator architecture, not necessarily as fast as one I >> built >> myself on my own machine? My question is, on x86_64, is this potential >> difference large enough to bother with not using precompiled wheel >> packages? > > Ultimately, it's going to depend on all sorts of things, including > most importantly your actual code. Like most speed questions, the only > real way to know is to try it and measure the difference. 
> > The wheels do ship with a fast BLAS (OpenBLAS configured to > automatically adapt to your CPU at runtime), so the performance will > at least be reasonable. Possible improvements would include using a > different and somehow better BLAS (MKL might be faster in some cases), > tweaking your compiler options to take advantage of whatever SIMD ISAs > your particular CPU supports (numpy's build system doesn't do this > automatically but in principle you could do it by hand -- were you > bothering before? does it even make a difference in practice? I > dunno), and using a new compiler (the linux wheels use a somewhat > ancient version of gcc for Reasons; newer compilers are better at > optimizing -- how much does it matter? again I dunno). > > Basically: if you want to experiment and report back then I think we'd > all be interested to hear; OTOH if you aren't feeling particularly > curious/ambitious then I wouldn't worry about it :-). > > -n > Yes, I always add -march=native, which should pickup whatever SIMD is available. So my question was primarily if I should bother. Thanks for the detailed answer. From ndbecker2 at gmail.com Wed Jan 18 07:02:01 2017 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 18 Jan 2017 07:02:01 -0500 Subject: [Numpy-discussion] NumPy 1.12.0 release References: Message-ID: Matthew Brett wrote: > On Tue, Jan 17, 2017 at 3:47 PM, Neal Becker wrote: >> Matthew Brett wrote: >> >>> Hi, >>> >>> On Tue, Jan 17, 2017 at 5:56 AM, Neal Becker >>> wrote: >>>> Charles R Harris wrote: >>>> >>>>> Hi All, >>>>> >>>>> I'm pleased to announce the NumPy 1.12.0 release. This release >>>>> supports Python 2.7 and 3.4-3.6. Wheels for all supported Python >>>>> versions may be downloaded from PiPY >>>>> , the >>>>> tarball and zip files may be downloaded from Github >>>>> . The release >>>>> notes and files hashes may also be found at Github >>>>> . >>>>> >>>>> NumPy 1.12.0rc 2 is the result of 418 pull requests submitted by 139 >>>>> contributors and comprises a large number of fixes and improvements. >>>>> Among >>>>> the many improvements it is difficult to pick out just a few as >>>>> standing above the others, but the following may be of particular >>>>> interest or indicate areas likely to have future consequences. >>>>> >>>>> * Order of operations in ``np.einsum`` can now be optimized for large >>>>> speed improvements. >>>>> * New ``signature`` argument to ``np.vectorize`` for vectorizing with >>>>> core dimensions. >>>>> * The ``keepdims`` argument was added to many functions. >>>>> * New context manager for testing warnings >>>>> * Support for BLIS in numpy.distutils >>>>> * Much improved support for PyPy (not yet finished) >>>>> >>>>> Enjoy, >>>>> >>>>> Chuck >>>> >>>> I've installed via pip3 on linux x86_64, which gives me a wheel. My >>>> question is, am I loosing significant performance choosing this >>>> pre-built >>>> binary vs. compiling myself? For example, my processor might have some >>>> more features than the base version used to build wheels. >>> >>> I guess you are thinking about using this built wheel on some other >>> machine? You'd have to be lucky for that to work; the wheel depends >>> on the symbols it found at build time, which may not exist in the same >>> places on your other machine. >>> >>> If it does work, the speed will primarily depend on your BLAS library. >>> >>> The pypi wheels should be pretty fast; they are built with OpenBLAS, >>> which is at or near top of range for speed, across a range of >>> platforms. 
>>> >>> Cheers, >>> >>> Matthew >> >> I installed using pip3 install, and it installed a wheel package. I did >> not >> build it - aren't wheels already compiled packages? So isn't it built >> for the common denominator architecture, not necessarily as fast as one I >> built >> myself on my own machine? My question is, on x86_64, is this potential >> difference large enough to bother with not using precompiled wheel >> packages? > > Ah - my guess is that you'd be hard pressed to make a numpy that is as > fast as the precompiled wheel. The OpenBLAS library included in > numpy selects the routines for your CPU at run-time, so they will > generally be fast on your CPU. You might be able to get equivalent > or even better performance with a ATLAS BLAS library recompiled on > your exact machine, but that's quite a serious investment of time to > get working, and you'd have to benchmark to find if you were really > doing any better. > > Cheers, > > Matthew OK, so at least for BLAS things should be pretty well optimized. From cournape at gmail.com Wed Jan 18 07:15:16 2017 From: cournape at gmail.com (David Cournapeau) Date: Wed, 18 Jan 2017 12:15:16 +0000 Subject: [Numpy-discussion] NumPy 1.12.0 release In-Reply-To: <10e7c488-13bc-ae42-ced2-330ee9dd4c88@googlemail.com> References: <20170118081506.4ccd1cee@lintaillefer.esrf.fr> <10e7c488-13bc-ae42-ced2-330ee9dd4c88@googlemail.com> Message-ID: On Wed, Jan 18, 2017 at 11:43 AM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > The version of gcc used will make a large difference in some places. > E.g. the AVX2 integer ufuncs require something around 4.5 to work and in > general the optimization level of gcc has improved greatly since the > clang competition showed up around that time. centos 5 has 4.1 which is > really ancient. > I though the wheels used newer gccs also on centos 5? > I don't know if it is mandatory for many wheels, but it is possilbe to build w/ gcc 4.8 at least, and still binary compatibility with centos 5.X and above, though I am not sure about the impact on speed. It has been quite some time already that building numpy/scipy with gcc 4.1 causes troubles with errors and even crashes anyway, so you definitely want to use a more recent compiler in any case. David > On 18.01.2017 08:27, Nathan Goldbaum wrote: > > I've seen reports on the anaconda mailing list of people seeing similar > > speed ups when they compile e.g. Numpy with a recent gcc. Anaconda has > > the same issue as manylinux in that they need to use versions of GCC > > available on CentOS 5. > > > > Given the upcoming official EOL for CentOS5, it might make sense to > > think about making a pep for a CentOS 6-based manylinux2 docker image, > > which will allow compiling with a newer GCC. > > > > On Tue, Jan 17, 2017 at 9:15 PM Jerome Kieffer > > wrote: > > > > On Tue, 17 Jan 2017 08:56:42 -0500 > > > > Neal Becker > > wrote: > > > > > > > > > I've installed via pip3 on linux x86_64, which gives me a wheel. > My > > > > > question is, am I loosing significant performance choosing this > > pre-built > > > > > binary vs. compiling myself? For example, my processor might have > > some more > > > > > features than the base version used to build wheels. > > > > > > > > Hi, > > > > > > > > I have done some benchmarking (%timeit) for my code running in a > > > > jupyter-notebook within a venv installed with pip+manylinux wheels > > > > versus ipython and debian packages (on the same computer). > > > > I noticed the debian installation was ~20% faster. 
> > > > > > > > I did not investigate further if those 20% came from the manylinux (I > > > > suspect) or from the notebook infrastructure. > > > > > > > > HTH, > > > > -- > > > > J?r?me Kieffer > > > > > > > > _______________________________________________ > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion at scipy.org > > > > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Jan 18 07:59:18 2017 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 18 Jan 2017 04:59:18 -0800 Subject: [Numpy-discussion] NumPy 1.12.0 release In-Reply-To: <10e7c488-13bc-ae42-ced2-330ee9dd4c88@googlemail.com> References: <20170118081506.4ccd1cee@lintaillefer.esrf.fr> <10e7c488-13bc-ae42-ced2-330ee9dd4c88@googlemail.com> Message-ID: On Wed, Jan 18, 2017 at 3:43 AM, Julian Taylor wrote: > The version of gcc used will make a large difference in some places. > E.g. the AVX2 integer ufuncs require something around 4.5 to work and in > general the optimization level of gcc has improved greatly since the > clang competition showed up around that time. centos 5 has 4.1 which is > really ancient. > I though the wheels used newer gccs also on centos 5? The wheels are built with gcc 4.8, which is the last version that you can get to build for centos 5. When we bump to centos 6 as the minimum supported, we'll be able to switch to gcc 5.3.1. -n -- Nathaniel J. Smith -- https://vorpus.org From josef.pktd at gmail.com Wed Jan 18 08:53:24 2017 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 18 Jan 2017 08:53:24 -0500 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Wed, Jan 18, 2017 at 4:52 AM, Nadav Har'El wrote: > > On Wed, Jan 18, 2017 at 11:00 AM, alebarde at gmail.com > wrote: > >> Let's look at what the user asked this function, and what it returns: >> >>> >>> User asks: please give me random pairs of the three items, where item 1 >>> has probability 0.2, item 2 has 0.4, and 3 has 0.4. >>> >>> Function returns: random pairs, where if you make many random returned >>> results (as in the law of large numbers) and look at the items they >>> contain, item 1 is 0.2333 of the items, item 2 is 0.38333, and item 3 is >>> 0.38333. >>> These are not (quite) the probabilities the user asked for... >>> >>> Can you explain a sense where the user's requested probabilities (0.2, >>> 0.4, 0.4) are actually adhered in the results which random.choice returns? >>> >> >> I think that the question the user is asking by specifying p is a >> slightly different one: >> "please give me random pairs of the three items extracted from a >> population of 3 items where item 1 has probability of being extracted of >> 0.2, item 2 has 0.4, and 3 has 0.4. Also please remove extract items once >> extracted." >> > > You are right, if that is what the user wants, numpy.random.choice does > the right thing. > > I'm just wondering whether this is actually what users want, and whether > they understand this is what they are getting. 
> > As I said, I expected it to generate pairs with, empirically, the desired > distribution of individual items. The documentation of numpy.random.choice > seemed to me (wrongly) that it implis that that's what it does. So I was > surprised to realize that it does not. > As Alessandro and you showed, the function returns something that makes sense. If the user wants something different, then they need to look for a different function, which is however difficult if it doesn't have a solution in general. Sounds to me a bit like a Monty Hall problem. Whether we like it or not, or find it counter intuitive, it is what it is given the sampling scheme. Having more sampling schemes would be useful, but it's not possible to implement sampling schemes with impossible properties Josef > > Nadav. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Jan 18 09:30:48 2017 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 18 Jan 2017 09:30:48 -0500 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Wed, Jan 18, 2017 at 8:53 AM, wrote: > > > On Wed, Jan 18, 2017 at 4:52 AM, Nadav Har'El wrote: > >> >> On Wed, Jan 18, 2017 at 11:00 AM, alebarde at gmail.com >> wrote: >> >>> Let's look at what the user asked this function, and what it returns: >>> >>>> >>>> User asks: please give me random pairs of the three items, where item 1 >>>> has probability 0.2, item 2 has 0.4, and 3 has 0.4. >>>> >>>> Function returns: random pairs, where if you make many random returned >>>> results (as in the law of large numbers) and look at the items they >>>> contain, item 1 is 0.2333 of the items, item 2 is 0.38333, and item 3 is >>>> 0.38333. >>>> These are not (quite) the probabilities the user asked for... >>>> >>>> Can you explain a sense where the user's requested probabilities (0.2, >>>> 0.4, 0.4) are actually adhered in the results which random.choice returns? >>>> >>> >>> I think that the question the user is asking by specifying p is a >>> slightly different one: >>> "please give me random pairs of the three items extracted from a >>> population of 3 items where item 1 has probability of being extracted of >>> 0.2, item 2 has 0.4, and 3 has 0.4. Also please remove extract items once >>> extracted." >>> >> >> You are right, if that is what the user wants, numpy.random.choice does >> the right thing. >> >> I'm just wondering whether this is actually what users want, and whether >> they understand this is what they are getting. >> >> As I said, I expected it to generate pairs with, empirically, the desired >> distribution of individual items. The documentation of numpy.random.choice >> seemed to me (wrongly) that it implis that that's what it does. So I was >> surprised to realize that it does not. >> > > As Alessandro and you showed, the function returns something that makes > sense. If the user wants something different, then they need to look for a > different function, which is however difficult if it doesn't have a > solution in general. > > Sounds to me a bit like a Monty Hall problem. Whether we like it or not, > or find it counter intuitive, it is what it is given the sampling scheme. 
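For the 3-item, k=2 example above there is a sampling scheme that does reproduce the requested per-item frequencies: put probabilities on the pairs themselves and solve for them so that each item's inclusion probability equals k*p[i]. A hedged sketch of that idea (this is not what numpy.random.choice implements, and for other n, k and p the system can have no non-negative solution):

import numpy as np
from itertools import combinations

p = np.array([0.2, 0.4, 0.4])               # requested per-item frequencies
n, k = 3, 2
pairs = list(combinations(range(n), k))     # [(0, 1), (0, 2), (1, 2)]

# A[i, j] = 1 if item i belongs to pair j; solve A q = k * p together with sum(q) = 1
A = np.array([[i in pair for pair in pairs] for i in range(n)], dtype=float)
rhs = np.append(k * p, 1.0)
q = np.linalg.lstsq(np.vstack([A, np.ones(len(pairs))]), rhs)[0]
q = np.clip(q, 0, None)                     # guard against floating-point round-off
q /= q.sum()
print(dict(zip(pairs, np.round(q, 3))))     # {(0, 1): 0.2, (0, 2): 0.2, (1, 2): 0.6}

# drawing whole pairs with these probabilities yields items 0, 1, 2
# at the requested 0.2 / 0.4 / 0.4 rates
rng = np.random.RandomState(0)
print(pairs[rng.choice(len(pairs), p=q)])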
> > Having more sampling schemes would be useful, but it's not possible to > implement sampling schemes with impossible properties. > BTW: sampling 3 out of 3 without replacement is even worse No matter what sampling scheme and what selection probabilities we use, we always have every element with probability 1 in the sample. (Which in survey statistics implies that the sampling error or standard deviation of any estimate of a population mean or total is zero. Which I found weird. How can you do statistics and get an estimate that doesn't have any uncertainty associated with it?) Josef > > Josef > > > >> >> Nadav. >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nyh at scylladb.com Wed Jan 18 10:12:57 2017 From: nyh at scylladb.com (Nadav Har'El) Date: Wed, 18 Jan 2017 17:12:57 +0200 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Wed, Jan 18, 2017 at 4:30 PM, wrote: > > > Having more sampling schemes would be useful, but it's not possible to >> implement sampling schemes with impossible properties. >> >> > > BTW: sampling 3 out of 3 without replacement is even worse > > No matter what sampling scheme and what selection probabilities we use, we > always have every element with probability 1 in the sample. > I agree. The random-sample function of the type I envisioned will be able to reproduce the desired probabilities in some cases (like the example I gave) but not in others. Because doing this correctly involves a set of n linear equations in comb(n,k) variables, it can have no solution, or many solutions, depending on the n and k, and the desired probabilities. A function of this sort could return an error if it can't achieve the desired probabilities. But in many cases (the 0.2, 0.4, 0.4 example I gave was just something random I tried) there will be a way to achieve exactly the desired distribution. I guess I'll need to write this new function myself :-) Because my use case definitely requires that the output of the random items produced matches the required probabilities (when possible). Thanks, Nadav. -------------- next part -------------- An HTML attachment was scrubbed... URL: From max_linke at gmx.de Wed Jan 18 10:18:24 2017 From: max_linke at gmx.de (Max Linke) Date: Wed, 18 Jan 2017 16:18:24 +0100 Subject: [Numpy-discussion] GSoC 2017: NumFocus will be an umbrella organization In-Reply-To: References: Message-ID: <1b0dc33a-d608-81b0-7211-71ee3fd5e37a@gmx.de> On 01/18/2017 09:28 AM, Ralf Gommers wrote: > Hi Max, > > On Tue, Jan 17, 2017 at 2:38 AM, Max Linke > wrote: > > Hi > > Organizations can start submitting applications for Google Summer of > Code 2017 on January 19 (and the deadline is February 9) > > https://developers.google.com/open-source/gsoc/timeline?hl=en > > > > Thanks for bringing this up, and for organizing the NumFOCUS > participation! > > > NumFOCUS will be applying again this year. If you want to work with > us please let me know and if you apply as an organization yourself > or under a different umbrella organization please tell me as well. > > > I suspect we won't participate at all, but if we do then it's likely > under the PSF umbrella as we have done previously. Thanks for letting me now. 
If you decide to participate with the PSF please write me a private mail so that I can update the NumFOCUS gsoc page accordingly. > > @all: in practice working on NumPy is just far too hard for most > GSoC students. Previous years we've registered and generated ideas, > but not gotten any students. We're also short on maintainer capacity. > So I propose to not participate this year. > > Ralf > > > > _______________________________________________ NumPy-Discussion > mailing list NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > From pierre.schnizer at helmholtz-berlin.de Sat Jan 21 03:23:43 2017 From: pierre.schnizer at helmholtz-berlin.de (Schnizer, Pierre) Date: Sat, 21 Jan 2017 08:23:43 +0000 Subject: [Numpy-discussion] Building external c modules with mingw64 / numpy Message-ID: <243DBD016692E54EB12F37B87C66E70E815DB8@didag1> Dear all, I built an external c-module (pygsl) using mingw 64 from msys2 mingw64-gcc compiler. This built required some changes to numpy.distutils to get the ?python setup.py config? and ?python setup.py build? working. In this process I replaced 2 files in numpy.distutils from numpy git repository: - numpy.dist_utils.misc_utils.py version ec0e046 on 14 Dec 2016 - numpy.dist_utils. mingw32ccompiler.py version ec0e046 on 14 Dec 2016 mingw32ccompiler.py required to be modified to get it work n preprocessor had to be defined as I am using setup.py config n specifying the runtime library search path to the linker n include path of the vcrtruntime I attached a patch reflecting the changes I had to make to file mingw32ccompile.py If this information is useful I am happy to answer questions Sincerely yours Pierre PS Version infos: Python: Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)] on win32 Numpy: >> help(numpy.version) Help on module numpy.version in numpy: DATA full_version = '1.12.0' git_revision = '561f1accf861ad8606ea2dd723d2be2b09a2dffa' release = True short_version = '1.12.0' version = '1.12.0' gcc.exe (Rev2, Built by MSYS2 project) 6.2.0 ________________________________ Helmholtz-Zentrum Berlin f?r Materialien und Energie GmbH Mitglied der Hermann von Helmholtz-Gemeinschaft Deutscher Forschungszentren e.V. Aufsichtsrat: Vorsitzender Dr. Karl Eugen Huthmacher, stv. Vorsitzende Dr. Jutta Koch-Unterseher Gesch?ftsf?hrung: Prof. Dr. Anke Rita Kaysser-Pyzalla, Thomas Frederking Sitz Berlin, AG Charlottenburg, 89 HRB 5583 Postadresse: Hahn-Meitner-Platz 1 D-14109 Berlin http://www.helmholtz-berlin.de -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: python_numpy_mingw_git.diff Type: application/octet-stream Size: 2448 bytes Desc: python_numpy_mingw_git.diff URL: From josef.pktd at gmail.com Sat Jan 21 10:10:53 2017 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 21 Jan 2017 10:10:53 -0500 Subject: [Numpy-discussion] offset in fill diagonal Message-ID: Is there a simple way to fill in diagonal elements in an array for other than main diagonal? As far as I can see, the diagxxx functions that have offset can only read and not inplace modify, and the functions for modifying don't have offset and only allow changing the main diagonal. Usecase: creating banded matrices (2-D arrays) similar to toeplitz. Josef -------------- next part -------------- An HTML attachment was scrubbed... 
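One way to get an offset is to slice the array before handing it to fill_diagonal, so that the k-th diagonal of the original array becomes the main diagonal of the view. A small sketch (fill_offset_diagonal is just an illustrative name; the replies that follow give this idea plus a boolean-mask alternative):

import numpy as np

def fill_offset_diagonal(a, val, k=0):
    # k > 0: k-th diagonal above the main one; k < 0: k-th diagonal below it
    if k >= 0:
        np.fill_diagonal(a[:, k:], val)
    else:
        np.fill_diagonal(a[-k:, :], val)

a = np.zeros((5, 5))
for k, v in zip((-1, 0, 1), (1.0, 2.0, 3.0)):
    fill_offset_diagonal(a, v, k)
# a is now tridiagonal (banded): 1 on the subdiagonal, 2 on the main
# diagonal, 3 on the superdiagonal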
URL: From jtaylor.debian at googlemail.com Sat Jan 21 10:23:33 2017 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Sat, 21 Jan 2017 16:23:33 +0100 Subject: [Numpy-discussion] offset in fill diagonal In-Reply-To: References: Message-ID: On 21.01.2017 16:10, josef.pktd at gmail.com wrote: > Is there a simple way to fill in diagonal elements in an array for other > than main diagonal? > > As far as I can see, the diagxxx functions that have offset can only > read and not inplace modify, and the functions for modifying don't have > offset and only allow changing the main diagonal. > > Usecase: creating banded matrices (2-D arrays) similar to toeplitz. > you can construct index arrays or boolean masks to index using the np.tri* functions. e.g. a = np.arange(5*5).reshape(5,5) band = np.tri(5, 5, 1, dtype=np.bool) & ~np.tri(5, 5, -2, dtype=np.bool) a[band] = -1 From insertinterestingnamehere at gmail.com Sat Jan 21 14:26:12 2017 From: insertinterestingnamehere at gmail.com (Ian Henriksen) Date: Sat, 21 Jan 2017 19:26:12 +0000 Subject: [Numpy-discussion] offset in fill diagonal In-Reply-To: References: Message-ID: On Sat, Jan 21, 2017 at 9:23 AM Julian Taylor wrote: > On 21.01.2017 16:10, josef.pktd at gmail.com wrote: > > Is there a simple way to fill in diagonal elements in an array for other > > than main diagonal? > > > > As far as I can see, the diagxxx functions that have offset can only > > read and not inplace modify, and the functions for modifying don't have > > offset and only allow changing the main diagonal. > > > > Usecase: creating banded matrices (2-D arrays) similar to toeplitz. > > > > you can construct index arrays or boolean masks to index using the > np.tri* functions. > e.g. > > a = np.arange(5*5).reshape(5,5) > band = np.tri(5, 5, 1, dtype=np.bool) & ~np.tri(5, 5, -2, dtype=np.bool) > a[band] = -1 > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion You can slice the array you're filling before passing it to fill_diagonal. For example: import numpy as np a = np.zeros((4, 4)) b = np.ones(3) np.fill_diagonal(a[1:], b) np.fill_diagonal(a[:,1:], -b) yields array([[ 0., -1., 0., 0.], [ 1., 0., -1., 0.], [ 0., 1., 0., -1.], [ 0., 0., 1., 0.]]) Hope this helps, Ian Henriksen -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Mon Jan 23 06:40:51 2017 From: cournape at gmail.com (David Cournapeau) Date: Mon, 23 Jan 2017 11:40:51 +0000 Subject: [Numpy-discussion] Numpy 1.11.3, scipy 0.18.1, MSVC 2015 and crashes in complex functions Message-ID: Hi there, While building the latest scipy on top of numpy 1.11.3, I have noticed crashes while running the scipy test suite, in scipy.special (e.g. in scipy.special hyp0f1 test).. This only happens on windows for python 3.5 (where we use MSVC 2015 compiler). Applying some violence to distutils, I re-built numpy/scipy with debug symbols, and the debugger claims that crashes happen inside scipy.special ufunc cython code, when calling clog or csqrt. I first suspected a compiler bug, but disabling those functions in numpy, to force using our own versions in npymath, made the problem go away. I am a bit suspicious about the whole thing as neither conda's or gholke's wheel crashed. Has anybody else encountered this ? David -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From evgeny.burovskiy at gmail.com Mon Jan 23 06:46:02 2017 From: evgeny.burovskiy at gmail.com (Evgeni Burovski) Date: Mon, 23 Jan 2017 14:46:02 +0300 Subject: [Numpy-discussion] Numpy 1.11.3, scipy 0.18.1, MSVC 2015 and crashes in complex functions In-Reply-To: References: Message-ID: Related to https://github.com/scipy/scipy/issues/6336? 23.01.2017 14:40 ???????????? "David Cournapeau" ???????: > Hi there, > > While building the latest scipy on top of numpy 1.11.3, I have noticed > crashes while running the scipy test suite, in scipy.special (e.g. in > scipy.special hyp0f1 test).. This only happens on windows for python 3.5 > (where we use MSVC 2015 compiler). > > Applying some violence to distutils, I re-built numpy/scipy with debug > symbols, and the debugger claims that crashes happen inside scipy.special > ufunc cython code, when calling clog or csqrt. I first suspected a compiler > bug, but disabling those functions in numpy, to force using our own > versions in npymath, made the problem go away. > > I am a bit suspicious about the whole thing as neither conda's or gholke's > wheel crashed. Has anybody else encountered this ? > > David > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Mon Jan 23 07:02:01 2017 From: cournape at gmail.com (David Cournapeau) Date: Mon, 23 Jan 2017 12:02:01 +0000 Subject: [Numpy-discussion] Numpy 1.11.3, scipy 0.18.1, MSVC 2015 and crashes in complex functions In-Reply-To: References: Message-ID: Indeed. I wrongly assumed that since gholke's wheels did not crash, they did not run into that issue. That sounds like an ABI issue, since I suspect intel math library supports C99 complex numbers. I will add info on that issue then, David On Mon, Jan 23, 2017 at 11:46 AM, Evgeni Burovski < evgeny.burovskiy at gmail.com> wrote: > Related to https://github.com/scipy/scipy/issues/6336? > 23.01.2017 14:40 ???????????? "David Cournapeau" > ???????: > >> Hi there, >> >> While building the latest scipy on top of numpy 1.11.3, I have noticed >> crashes while running the scipy test suite, in scipy.special (e.g. in >> scipy.special hyp0f1 test).. This only happens on windows for python 3.5 >> (where we use MSVC 2015 compiler). >> >> Applying some violence to distutils, I re-built numpy/scipy with debug >> symbols, and the debugger claims that crashes happen inside scipy.special >> ufunc cython code, when calling clog or csqrt. I first suspected a compiler >> bug, but disabling those functions in numpy, to force using our own >> versions in npymath, made the problem go away. >> >> I am a bit suspicious about the whole thing as neither conda's or >> gholke's wheel crashed. Has anybody else encountered this ? >> >> David >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From peridot.faceted at gmail.com Mon Jan 23 07:27:43 2017 From: peridot.faceted at gmail.com (Anne Archibald) Date: Mon, 23 Jan 2017 12:27:43 +0000 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Wed, Jan 18, 2017 at 4:13 PM Nadav Har'El wrote: > On Wed, Jan 18, 2017 at 4:30 PM, wrote: > > > > Having more sampling schemes would be useful, but it's not possible to > implement sampling schemes with impossible properties. > > > > BTW: sampling 3 out of 3 without replacement is even worse > > No matter what sampling scheme and what selection probabilities we use, we > always have every element with probability 1 in the sample. > > > I agree. The random-sample function of the type I envisioned will be able > to reproduce the desired probabilities in some cases (like the example I > gave) but not in others. Because doing this correctly involves a set of n > linear equations in comb(n,k) variables, it can have no solution, or many > solutions, depending on the n and k, and the desired probabilities. A > function of this sort could return an error if it can't achieve the desired > probabilities. > It seems to me that the basic problem here is that the numpy.random.choice docstring fails to explain what the function actually does when called with weights and without replacement. Clearly there are different expectations; I think numpy.random.choice chose one that is easy to explain and implement but not necessarily what everyone expects. So the docstring should be clarified. Perhaps a Notes section: When numpy.random.choice is called with replace=False and non-uniform probabilities, the resulting distribution of samples is not obvious. numpy.random.choice effectively follows the procedure: when choosing the kth element in a set, the probability of element i occurring is p[i] divided by the total probability of all not-yet-chosen (and therefore eligible) elements. This approach is always possible as long as the sample size is no larger than the population, but it means that the probability that element i occurs in the sample is not exactly p[i]. Anne > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Jan 23 09:33:57 2017 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 23 Jan 2017 08:33:57 -0600 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Mon, Jan 23, 2017 at 6:27 AM, Anne Archibald wrote: > > On Wed, Jan 18, 2017 at 4:13 PM Nadav Har'El wrote: >> >> On Wed, Jan 18, 2017 at 4:30 PM, wrote: >>> >>>> Having more sampling schemes would be useful, but it's not possible to implement sampling schemes with impossible properties. >>> >>> BTW: sampling 3 out of 3 without replacement is even worse >>> >>> No matter what sampling scheme and what selection probabilities we use, we always have every element with probability 1 in the sample. >> >> I agree. The random-sample function of the type I envisioned will be able to reproduce the desired probabilities in some cases (like the example I gave) but not in others. Because doing this correctly involves a set of n linear equations in comb(n,k) variables, it can have no solution, or many solutions, depending on the n and k, and the desired probabilities. A function of this sort could return an error if it can't achieve the desired probabilities. 
> > It seems to me that the basic problem here is that the numpy.random.choice docstring fails to explain what the function actually does when called with weights and without replacement. Clearly there are different expectations; I think numpy.random.choice chose one that is easy to explain and implement but not necessarily what everyone expects. So the docstring should be clarified. Perhaps a Notes section: > > When numpy.random.choice is called with replace=False and non-uniform probabilities, the resulting distribution of samples is not obvious. numpy.random.choice effectively follows the procedure: when choosing the kth element in a set, the probability of element i occurring is p[i] divided by the total probability of all not-yet-chosen (and therefore eligible) elements. This approach is always possible as long as the sample size is no larger than the population, but it means that the probability that element i occurs in the sample is not exactly p[i]. I don't object to some Notes, but I would probably phrase it more like we are providing the standard definition of the jargon term "sampling without replacement" in the case of non-uniform probabilities. To my mind (or more accurately, with my background), "replace=False" obviously picks out the implemented procedure, and I would have been incredibly surprised if it did anything else. If the option were named "unique=True", then I would have needed some more documentation to let me know exactly how it was implemented. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From alebarde at gmail.com Mon Jan 23 09:52:56 2017 From: alebarde at gmail.com (alebarde at gmail.com) Date: Mon, 23 Jan 2017 15:52:56 +0100 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: 2017-01-23 15:33 GMT+01:00 Robert Kern : > On Mon, Jan 23, 2017 at 6:27 AM, Anne Archibald > wrote: > > > > On Wed, Jan 18, 2017 at 4:13 PM Nadav Har'El wrote: > >> > >> On Wed, Jan 18, 2017 at 4:30 PM, wrote: > >>> > >>>> Having more sampling schemes would be useful, but it's not possible > to implement sampling schemes with impossible properties. > >>> > >>> BTW: sampling 3 out of 3 without replacement is even worse > >>> > >>> No matter what sampling scheme and what selection probabilities we > use, we always have every element with probability 1 in the sample. > >> > >> I agree. The random-sample function of the type I envisioned will be > able to reproduce the desired probabilities in some cases (like the example > I gave) but not in others. Because doing this correctly involves a set of n > linear equations in comb(n,k) variables, it can have no solution, or many > solutions, depending on the n and k, and the desired probabilities. A > function of this sort could return an error if it can't achieve the desired > probabilities. > > > > It seems to me that the basic problem here is that the > numpy.random.choice docstring fails to explain what the function actually > does when called with weights and without replacement. Clearly there are > different expectations; I think numpy.random.choice chose one that is easy > to explain and implement but not necessarily what everyone expects. So the > docstring should be clarified. Perhaps a Notes section: > > > > When numpy.random.choice is called with replace=False and non-uniform > probabilities, the resulting distribution of samples is not obvious. 
> numpy.random.choice effectively follows the procedure: when choosing the > kth element in a set, the probability of element i occurring is p[i] > divided by the total probability of all not-yet-chosen (and therefore > eligible) elements. This approach is always possible as long as the sample > size is no larger than the population, but it means that the probability > that element i occurs in the sample is not exactly p[i]. > > I don't object to some Notes, but I would probably phrase it more like we > are providing the standard definition of the jargon term "sampling without > replacement" in the case of non-uniform probabilities. To my mind (or more > accurately, with my background), "replace=False" obviously picks out the > implemented procedure, and I would have been incredibly surprised if it did > anything else. If the option were named "unique=True", then I would have > needed some more documentation to let me know exactly how it was > implemented. > > FWIW, I totally agree with Robert > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- -------------------------------------------------------------------------- NOTICE: Dlgs 196/2003 this e-mail and any attachments thereto may contain confidential information and are intended for the sole use of the recipient(s) named above. If you are not the intended recipient of this message you are hereby notified that any dissemination or copying of this message is strictly prohibited. If you have received this e-mail in error, please notify the sender either by telephone or by e-mail and delete the material from any computer. Thank you. -------------------------------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From peridot.faceted at gmail.com Mon Jan 23 10:22:42 2017 From: peridot.faceted at gmail.com (Anne Archibald) Date: Mon, 23 Jan 2017 15:22:42 +0000 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Mon, Jan 23, 2017 at 3:34 PM Robert Kern wrote: > I don't object to some Notes, but I would probably phrase it more like we > are providing the standard definition of the jargon term "sampling without > replacement" in the case of non-uniform probabilities. To my mind (or more > accurately, with my background), "replace=False" obviously picks out the > implemented procedure, and I would have been incredibly surprised if it did > anything else. If the option were named "unique=True", then I would have > needed some more documentation to let me know exactly how it was > implemented. > It is what I would have expected too, but we have a concrete example of a user who expected otherwise; where one user speaks up, there are probably more who didn't (some of whom probably have code that's not doing what they think it does). So for the cost of adding a Note, why not help some of them? As for the standardness of the definition: I don't know, have you a reference where it is defined? More natural to me would be to have a list of items with integer multiplicities (as in: "cat" 3 times, "dog" 1 time). I'm hesitant to claim ours is a standard definition unless it's in a textbook somewhere. But I don't insist on my phrasing. Anne -------------- next part -------------- An HTML attachment was scrubbed... 
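For reference, the procedure described above can be written out in a few lines of pure Python. This is only a sketch of the effective behaviour (choice_no_replace is an illustrative name, not numpy's actual implementation):

import numpy as np

def choice_no_replace(n, k, p, rng=np.random):
    # repeatedly renormalise p over the not-yet-chosen items, draw one, drop it
    remaining = list(range(n))
    p = np.asarray(p, dtype=float).copy()
    picked = []
    for _ in range(k):
        j = rng.choice(len(remaining), p=p / p.sum())
        picked.append(remaining.pop(j))
        p = np.delete(p, j)
    return picked

print(choice_no_replace(3, 2, [0.2, 0.4, 0.4]))   # e.g. [1, 2]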
URL: From nyh at scylladb.com Mon Jan 23 10:41:57 2017 From: nyh at scylladb.com (Nadav Har'El) Date: Mon, 23 Jan 2017 17:41:57 +0200 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Mon, Jan 23, 2017 at 4:52 PM, alebarde at gmail.com wrote: > > > 2017-01-23 15:33 GMT+01:00 Robert Kern : > >> >> I don't object to some Notes, but I would probably phrase it more like we >> are providing the standard definition of the jargon term "sampling without >> replacement" in the case of non-uniform probabilities. To my mind (or more >> accurately, with my background), "replace=False" obviously picks out the >> implemented procedure, and I would have been incredibly surprised if it did >> anything else. If the option were named "unique=True", then I would have >> needed some more documentation to let me know exactly how it was >> implemented. >> >> FWIW, I totally agree with Robert > With my own background (MSc. in Mathematics), I agree that this algorithm is indeed the most natural one. And as I said, when I wanted to implement something myself when I wanted to choose random combinations (k out of n items), I wrote exactly the same one. But when it didn't produce the desired probabilities (even in cases where I knew that doing this was possible), I wrongly assumed numpy would do things differently - only to realize it uses exactly the same algorithm. So clearly, the documentation didn't quite explain what it does or doesn't do. Also, Robert, I'm curious: beyond explaining why the existing algorithm is reasonable (which I agree), could you give me an example of where it is actually *useful* for sampling? Let me give you an illustrative counter-example: Let's imagine a country that a country has 3 races: 40% Lilliputians, 40% Blefuscans, an 20% Yahoos (immigrants from a different section of the book ;-)). Gulliver wants to take a poll, and needs to sample people from all these races with appropriate proportions. These races live in different parts of town, so to pick a random person he needs to first pick one of the races and then a random person from that part of town. If he picks one respondent at a time, he uses numpy.random.choice(3, size=1,p=[0.4,0.4,0.2])) to pick the part of town, and then a person from that part - he gets the desired 40% / 40% / 20% division of races. Now imagine that Gulliver can interview two respondents each day, so he needs to pick two people each time. If he picks 2 choices of part-of-town *with* replacement, numpy.random.choice(3, size=2,p=[0.4,0.4,0.2]), that's also fine: he may need to take two people from the same part of town, or two from two different parts of town, but in any case will still get the desired 40% / 40% / 20% division between the races of the people he interviews. But consider that we are told that if two people from the same race meet in Gulliver's interview room, the two start chatting between themselves, and waste Gulliver's time. So he prefers to interview two people of *different* races. That's sampling without replacement. So he uses numpy.random.choice(size=2,p=[0.4,0.4,0.2],replace=False) to pick two different parts of town, and one person from each. But then he looks at his logs, and discovers he actually interviewed the races at 38% / 38% / 23% proportions - not the 40%/40%/20% he wanted. So the opinions of the Yahoos were over-counted in this poll! 
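The 38% / 38% / 23% outcome is easy to confirm empirically; a small simulation sketch of the two-interviews-per-day scheme:

import numpy as np

rng = np.random.RandomState(42)
p = [0.4, 0.4, 0.2]                     # Lilliputians, Blefuscans, Yahoos
days = 100000
picks = np.concatenate([rng.choice(3, size=2, replace=False, p=p)
                        for _ in range(days)])
print(np.bincount(picks) / picks.size)  # roughly [0.383, 0.383, 0.233]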
I know that this is a silly example (made even sillier by the names of races I used), but I wonder if you could give me an example where the current behavior of replace=False is genuinely useful. Not that I'm saying that fixing this problem is easy (I'm still struggling with it myself in the general case of size < n-1). Nadav. -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Jan 23 10:47:54 2017 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 23 Jan 2017 09:47:54 -0600 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Mon, Jan 23, 2017 at 9:22 AM, Anne Archibald wrote: > > > On Mon, Jan 23, 2017 at 3:34 PM Robert Kern wrote: >> >> I don't object to some Notes, but I would probably phrase it more like we are providing the standard definition of the jargon term "sampling without replacement" in the case of non-uniform probabilities. To my mind (or more accurately, with my background), "replace=False" obviously picks out the implemented procedure, and I would have been incredibly surprised if it did anything else. If the option were named "unique=True", then I would have needed some more documentation to let me know exactly how it was implemented. > > > It is what I would have expected too, but we have a concrete example of a user who expected otherwise; where one user speaks up, there are probably more who didn't (some of whom probably have code that's not doing what they think it does). So for the cost of adding a Note, why not help some of them? That's why I said I'm fine with adding a Note. I'm just suggesting a re-wording so that the cautious language doesn't lead anyone who is familiar with the jargon to think we're doing something ad hoc while still providing the details for those who aren't so familiar. > As for the standardness of the definition: I don't know, have you a reference where it is defined? More natural to me would be to have a list of items with integer multiplicities (as in: "cat" 3 times, "dog" 1 time). I'm hesitant to claim ours is a standard definition unless it's in a textbook somewhere. But I don't insist on my phrasing. Textbook, I'm not so sure, but it is the *only* definition I've ever encountered in the literature: http://epubs.siam.org/doi/abs/10.1137/0209009 http://www.sciencedirect.com/science/article/pii/S002001900500298X -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From nyh at scylladb.com Mon Jan 23 11:08:18 2017 From: nyh at scylladb.com (Nadav Har'El) Date: Mon, 23 Jan 2017 18:08:18 +0200 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Mon, Jan 23, 2017 at 5:47 PM, Robert Kern wrote: > > > As for the standardness of the definition: I don't know, have you a > reference where it is defined? More natural to me would be to have a list > of items with integer multiplicities (as in: "cat" 3 times, "dog" 1 time). > I'm hesitant to claim ours is a standard definition unless it's in a > textbook somewhere. But I don't insist on my phrasing. > > Textbook, I'm not so sure, but it is the *only* definition I've ever > encountered in the literature: > > http://epubs.siam.org/doi/abs/10.1137/0209009 > Very interesting. 
This paper (PDF available if you search for its name in Google) explicitly mentions one of the uses of this algorithm is "multistage sampling", which appears to be exactly the same thing as in the hypothetical Gulliver example I gave in my earlier mail. And yet, I showed in my mail that this algorithm does NOT reproduce the desired frequency of the different sampling units... Moreover, this paper doesn't explain why you need the "without replacement" for this use case (everything seems easier, and the desired probabilities are reproduced, with replacement). In my story I gave a funny excuse why "without replacement" might be warrented, but if you're interested I can tell you a bit about my actual use case, with a more serious reason why I want without replacement. > http://www.sciencedirect.com/science/article/pii/S002001900500298X > > -- > Robert Kern > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Jan 23 11:08:29 2017 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 23 Jan 2017 10:08:29 -0600 Subject: [Numpy-discussion] Question about numpy.random.choice with probabilties In-Reply-To: References: Message-ID: On Mon, Jan 23, 2017 at 9:41 AM, Nadav Har'El wrote: > > On Mon, Jan 23, 2017 at 4:52 PM, alebarde at gmail.com wrote: >> >> 2017-01-23 15:33 GMT+01:00 Robert Kern : >>> >>> I don't object to some Notes, but I would probably phrase it more like we are providing the standard definition of the jargon term "sampling without replacement" in the case of non-uniform probabilities. To my mind (or more accurately, with my background), "replace=False" obviously picks out the implemented procedure, and I would have been incredibly surprised if it did anything else. If the option were named "unique=True", then I would have needed some more documentation to let me know exactly how it was implemented. >>> >> FWIW, I totally agree with Robert > > With my own background (MSc. in Mathematics), I agree that this algorithm is indeed the most natural one. And as I said, when I wanted to implement something myself when I wanted to choose random combinations (k out of n items), I wrote exactly the same one. But when it didn't produce the desired probabilities (even in cases where I knew that doing this was possible), I wrongly assumed numpy would do things differently - only to realize it uses exactly the same algorithm. So clearly, the documentation didn't quite explain what it does or doesn't do. In my experience, I have seen "without replacement" mean only one thing. If the docstring had said "returns unique items", I'd agree that it doesn't explain what it does or doesn't do. The only issue is that "without replacement" is jargon, and it is good to recapitulate the definitions of such terms for those who aren't familiar with them. > Also, Robert, I'm curious: beyond explaining why the existing algorithm is reasonable (which I agree), could you give me an example of where it is actually *useful* for sampling? The references I previously quoted list a few. One is called "multistage sampling proportional to size". The idea being that you draw (without replacement) from a larger units (say, congressional districts) before sampling within them. 
It is similar to the situation you outline, but it is probably more useful at a different scale, like lots of larger units (where your algorithm is likely to provide no solution) rather than a handful. It is probably less useful in terms of survey design, where you are trying to *design* a process to get a result, than it is in queueing theory and related fields, where you are trying to *describe* and simulate a process that is pre-defined. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From edwardlrichards at gmail.com Wed Jan 25 15:14:50 2017 From: edwardlrichards at gmail.com (Edward Richards) Date: Wed, 25 Jan 2017 12:14:50 -0800 Subject: [Numpy-discussion] Checking matrix condition number Message-ID: <5889073A.6050403@gmail.com> What is the best way to make sure that a matrix inversion makes any sense before preforming it? I am currently struggling to understand some results from matrix inversions in my work, and I would like to see if I am dealing with an ill-conditioned problem. It is probably user error, but I don't like having the possibility hanging over my head. I naively put a call to np.linalg.cond into my code; all of my cores went to 100% and a few minutes later I got a number. To be fair A is 6400 elements square, but this takes ~20x more time than the inversion. This is not really practical for what I am doing, is there a better way? This is partly in response to Ilhan Polat's post about introducing the A\b operator to numpy. I also couldn't check the Numpy mailing list archives to see if this has been asked before, the numpy-discussion gmane link isn't working for me at all. Thanks for your time, Ned From ilhanpolat at gmail.com Thu Jan 26 04:29:45 2017 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Thu, 26 Jan 2017 10:29:45 +0100 Subject: [Numpy-discussion] Checking matrix condition number In-Reply-To: <5889073A.6050403@gmail.com> References: <5889073A.6050403@gmail.com> Message-ID: I've indeed opened an issue for this : https://github.com/numpy/numpy/issues/8090 . Recently, I've included the LAPACK routines into SciPy dev version that will come with version 0.19. Then you can use ?GECON, ?POCON and other ?XXCON routines for yourself or wait a bit more until I have time to implement it on the SciPy side. @rkern told me that for NumPy, C translations are involved but I couldn't find an entrance point to contribute for yet. It's a bit above my abilities to fully grasp the way of working in NumPy. You can read more in https://github.com/numpy/numpy/issues/3755 Best, ilhan On Wed, Jan 25, 2017 at 9:14 PM, Edward Richards wrote: > What is the best way to make sure that a matrix inversion makes any sense > before preforming it? I am currently struggling to understand some results > from matrix inversions in my work, and I would like to see if I am dealing > with an ill-conditioned problem. It is probably user error, but I don't > like having the possibility hanging over my head. > > I naively put a call to np.linalg.cond into my code; all of my cores went > to 100% and a few minutes later I got a number. To be fair A is 6400 > elements square, but this takes ~20x more time than the inversion. This is > not really practical for what I am doing, is there a better way? > > This is partly in response to Ilhan Polat's post about introducing the A\b > operator to numpy. I also couldn't check the Numpy mailing list archives to > see if this has been asked before, the numpy-discussion gmane link isn't > working for me at all. 
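For completeness, a hedged sketch of the rcond route mentioned above. It assumes a SciPy new enough (0.19 or later) to expose the LAPACK ?GECON wrappers, and that dgecon takes the LU factors plus the 1-norm of the original matrix (check the docstring of your SciPy version). The point is that the estimate reuses the O(n^3) factorization you would need for the solve anyway, instead of the full SVD behind np.linalg.cond:

import numpy as np
from scipy.linalg import lu_factor
from scipy.linalg.lapack import dgecon      # exposed from SciPy 0.19 onwards

A = np.random.rand(2000, 2000)

anorm = np.linalg.norm(A, 1)                # 1-norm of the original matrix
lu, piv = lu_factor(A)                      # the same factorization you would reuse for solving
rcond, info = dgecon(lu, anorm)             # estimates 1 / cond_1(A)
print("estimated 1-norm condition number:", 1.0 / rcond)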
> > Thanks for your time, > Ned > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Fri Jan 27 13:24:16 2017 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 27 Jan 2017 18:24:16 +0000 Subject: [Numpy-discussion] Numpy development version wheels for testing Message-ID: Hi, I've taken advantage of the new travis-ci cron job feature [1] to set up daily builds of numpy manylinux and OSX wheels for the current trunk, uploading to: https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com The numpy build process already builds Ubuntu Precise numpy wheels for the current trunk, available at [2], but the cron-job manylinux wheels have the following advantages: * they are built the same way as our usual pypi wheels, using openblas, and so will be closer to the eventual numpy distributed wheel; * manylinux wheels will install on all the travis-ci containers, not just the Precise container; * manylinux wheels don't need any extra packages installed by apt, because they are self-contained. There's an example of use at https://github.com/matthew-brett/nibabel/blob/use-pre/.travis.yml#L23 Cheers, Matthew [1] https://docs.travis-ci.com/user/cron-jobs [2] https://f66d8a5767b134cb96d3-4ffdece11fd3f72855e4665bc61c7445.ssl.cf2.rackcdn.com From evgeny.burovskiy at gmail.com Sat Jan 28 06:37:28 2017 From: evgeny.burovskiy at gmail.com (Evgeni Burovski) Date: Sat, 28 Jan 2017 14:37:28 +0300 Subject: [Numpy-discussion] Numpy development version wheels for testing In-Reply-To: References: Message-ID: On Fri, Jan 27, 2017 at 9:24 PM, Matthew Brett wrote: > Hi, > > I've taken advantage of the new travis-ci cron job feature [1] to set > up daily builds of numpy manylinux and OSX wheels for the current > trunk, uploading to: > > https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com > > The numpy build process already builds Ubuntu Precise numpy wheels for > the current trunk, available at [2], but the cron-job manylinux wheels > have the following advantages: > > * they are built the same way as our usual pypi wheels, using > openblas, and so will be closer to the eventual numpy distributed > wheel; > * manylinux wheels will install on all the travis-ci containers, not > just the Precise container; > * manylinux wheels don't need any extra packages installed by apt, > because they are self-contained. > > There's an example of use at > https://github.com/matthew-brett/nibabel/blob/use-pre/.travis.yml#L23 > > Cheers, > > Matthew > > [1] https://docs.travis-ci.com/user/cron-jobs > [2] https://f66d8a5767b134cb96d3-4ffdece11fd3f72855e4665bc61c7445.ssl.cf2.rackcdn.com This is great, thank you Matthew! From faltet at gmail.com Sun Jan 29 08:07:48 2017 From: faltet at gmail.com (Francesc Alted) Date: Sun, 29 Jan 2017 14:07:48 +0100 Subject: [Numpy-discussion] ANN: numexpr 2.6.2 released! Message-ID: ========================= Announcing Numexpr 2.6.2 ========================= What's new ========== This is a maintenance release that fixes several issues, with special emphasis in keeping compatibility with newer NumPy versions. Also, initial support for POWER processors is here. Thanks to Oleksandr Pavlyk, Alexander Shadchin, Breno Leitao, Fernando Seiti Furusato and Antonio Valentino for their nice contributions. 
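As a quick taste of the kind of expression the package accelerates (a minimal sketch, assuming numexpr and numpy are importable):

import numpy as np
import numexpr as ne

a = np.random.rand(1000000)
b = np.random.rand(1000000)
c = ne.evaluate("3*a + 4*b")    # evaluated multi-threaded, without large temporaries
np.testing.assert_allclose(c, 3*a + 4*b)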
In case you want to know more in detail what has changed in this version, see: https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst What's Numexpr ============== Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python. It wears multi-threaded capabilities, as well as support for Intel's MKL (Math Kernel Library), which allows an extremely fast evaluation of transcendental functions (sin, cos, tan, exp, log...) while squeezing the last drop of performance out of your multi-core processors. Look here for a some benchmarks of numexpr using MKL: https://github.com/pydata/numexpr/wiki/NumexprMKL Its only dependency is NumPy (MKL is optional), so it works well as an easy-to-deploy, easy-to-use, computational engine for projects that don't want to adopt other solutions requiring more heavy dependencies. Where I can find Numexpr? ========================= The project is hosted at GitHub in: https://github.com/pydata/numexpr You can get the packages from PyPI as well (but not for RC releases): http://pypi.python.org/pypi/numexpr Share your experience ===================== Let us know of any bugs, suggestions, gripes, kudos, etc. you may have. Enjoy data! -- Francesc Alted -------------- next part -------------- An HTML attachment was scrubbed... URL: From spluque at gmail.com Tue Jan 31 19:56:30 2017 From: spluque at gmail.com (Seb) Date: Tue, 31 Jan 2017 18:56:30 -0600 Subject: [Numpy-discussion] composing Euler rotation matrices Message-ID: <87h94e4tkx.fsf@otaria.sebmel.org> Hello, I'm trying to compose Euler rotation matrices shown in https://en.wikipedia.org/wiki/Euler_angles#Rotation_matrix. For example, The Z1Y2X3 Tait-Bryan rotation shown in the table can be represented in Numpy using the function: def z1y2x3(alpha, beta, gamma): """Rotation matrix given Euler angles""" return np.array([[np.cos(alpha) * np.cos(beta), np.cos(alpha) * np.sin(beta) * np.sin(gamma) - np.cos(gamma) * np.sin(alpha), np.sin(alpha) * np.sin(gamma) + np.cos(alpha) * np.cos(gamma) * np.sin(beta)], [np.cos(beta) * np.sin(alpha), np.cos(alpha) * np.cos(gamma) + np.sin(alpha) * np.sin(beta) * np.sin(gamma), np.cos(gamma) * np.sin(alpha) * np.sin(beta) - np.cos(alpha) * np.sin(gamma)], [-np.sin(beta), np.cos(beta) * np.sin(gamma), np.cos(beta) * np.cos(gamma)]]) which given alpha, beta, gamma as: angles = np.radians(np.array([30, 20, 10])) returns the following matrix: In [31]: z1y2x3(angles[0], angles[1], angles[2]) Out[31]: array([[ 0.81379768, -0.44096961, 0.37852231], [ 0.46984631, 0.88256412, 0.01802831], [-0.34202014, 0.16317591, 0.92541658]]) If I understand correctly, one should be able to compose this matrix by multiplying the rotation matrices that it is made of. However, I cannot reproduce this matrix via composition; i.e. by multiplying the underlying rotation matrices. Any tips would be appreciated. -- Seb From jfoxrabinovitz at gmail.com Tue Jan 31 21:23:55 2017 From: jfoxrabinovitz at gmail.com (Joseph Fox-Rabinovitz) Date: Tue, 31 Jan 2017 21:23:55 -0500 Subject: [Numpy-discussion] composing Euler rotation matrices In-Reply-To: <87h94e4tkx.fsf@otaria.sebmel.org> References: <87h94e4tkx.fsf@otaria.sebmel.org> Message-ID: Could you show what you are doing to get the statement "However, I cannot reproduce this matrix via composition; i.e. by multiplying the underlying rotation matrices.". 
I would guess something involving the `*` operator instead of `@`, but guessing probably won't help you solve your issue. -Joe On Tue, Jan 31, 2017 at 7:56 PM, Seb wrote: > Hello, > > I'm trying to compose Euler rotation matrices shown in > https://en.wikipedia.org/wiki/Euler_angles#Rotation_matrix. For > example, The Z1Y2X3 Tait-Bryan rotation shown in the table can be > represented in Numpy using the function: > > def z1y2x3(alpha, beta, gamma): > """Rotation matrix given Euler angles""" > return np.array([[np.cos(alpha) * np.cos(beta), > np.cos(alpha) * np.sin(beta) * np.sin(gamma) - > np.cos(gamma) * np.sin(alpha), > np.sin(alpha) * np.sin(gamma) + > np.cos(alpha) * np.cos(gamma) * np.sin(beta)], > [np.cos(beta) * np.sin(alpha), > np.cos(alpha) * np.cos(gamma) + > np.sin(alpha) * np.sin(beta) * np.sin(gamma), > np.cos(gamma) * np.sin(alpha) * np.sin(beta) - > np.cos(alpha) * np.sin(gamma)], > [-np.sin(beta), np.cos(beta) * np.sin(gamma), > np.cos(beta) * np.cos(gamma)]]) > > which given alpha, beta, gamma as: > > angles = np.radians(np.array([30, 20, 10])) > > returns the following matrix: > > In [31]: z1y2x3(angles[0], angles[1], angles[2]) > Out[31]: > > array([[ 0.81379768, -0.44096961, 0.37852231], > [ 0.46984631, 0.88256412, 0.01802831], > [-0.34202014, 0.16317591, 0.92541658]]) > > If I understand correctly, one should be able to compose this matrix by > multiplying the rotation matrices that it is made of. However, I cannot > reproduce this matrix via composition; i.e. by multiplying the > underlying rotation matrices. Any tips would be appreciated. > > -- > Seb > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From spluque at gmail.com Tue Jan 31 22:27:35 2017 From: spluque at gmail.com (Seb) Date: Tue, 31 Jan 2017 21:27:35 -0600 Subject: [Numpy-discussion] composing Euler rotation matrices References: <87h94e4tkx.fsf@otaria.sebmel.org> Message-ID: <87d1f24ml4.fsf@otaria.sebmel.org> On Tue, 31 Jan 2017 21:23:55 -0500, Joseph Fox-Rabinovitz wrote: > Could you show what you are doing to get the statement "However, I > cannot reproduce this matrix via composition; i.e. by multiplying the > underlying rotation matrices.". I would guess something involving the > `*` operator instead of `@`, but guessing probably won't help you > solve your issue. Sure, although composition is not something I can take credit for, as it's a well-described operation for generating linear transformations. It is the matrix multiplication of two or more transformation matrices. In the case of Euler transformations, it's matrices specifying rotations around 3 orthogonal axes by 3 given angles. I'm using `numpy.dot' to perform matrix multiplication on 2D arrays representing matrices. However, it's not obvious from the link I provided what particular rotation matrices are multiplied and in what order (i.e. what composition) is used to arrive at the Z1Y2X3 rotation matrix shown. Perhaps I'm not understanding the conventions used therein. 
This is one of my attempts at reproducing that rotation matrix via
composition:

---<--------------------cut here---------------start------------------->---
import numpy as np

angles = np.radians(np.array([30, 20, 10]))

def z1y2x3(alpha, beta, gamma):
    """Z1Y2X3 rotation matrix given Euler angles"""
    return np.array([[np.cos(alpha) * np.cos(beta),
                      np.cos(alpha) * np.sin(beta) * np.sin(gamma) -
                      np.cos(gamma) * np.sin(alpha),
                      np.sin(alpha) * np.sin(gamma) +
                      np.cos(alpha) * np.cos(gamma) * np.sin(beta)],
                     [np.cos(beta) * np.sin(alpha),
                      np.cos(alpha) * np.cos(gamma) +
                      np.sin(alpha) * np.sin(beta) * np.sin(gamma),
                      np.cos(gamma) * np.sin(alpha) * np.sin(beta) -
                      np.cos(alpha) * np.sin(gamma)],
                     [-np.sin(beta), np.cos(beta) * np.sin(gamma),
                      np.cos(beta) * np.cos(gamma)]])

euler_mat = z1y2x3(angles[0], angles[1], angles[2])

## Now via composition

def rotation_matrix(theta, axis, active=False):
    """Generate rotation matrix for a given axis

    Parameters
    ----------
    theta: numeric, optional
        The angle (degrees) by which to perform the rotation.  Default is
        0, which means return the coordinates of the vector in the rotated
        coordinate system, when rotate_vectors=False.
    axis: int, optional
        Axis around which to perform the rotation (x=0; y=1; z=2)
    active: bool, optional
        Whether to return active transformation matrix.

    Returns
    -------
    numpy.ndarray
        3x3 rotation matrix
    """
    theta = np.radians(theta)
    if axis == 0:
        R_theta = np.array([[1, 0, 0],
                            [0, np.cos(theta), -np.sin(theta)],
                            [0, np.sin(theta), np.cos(theta)]])
    elif axis == 1:
        R_theta = np.array([[np.cos(theta), 0, np.sin(theta)],
                            [0, 1, 0],
                            [-np.sin(theta), 0, np.cos(theta)]])
    else:
        R_theta = np.array([[np.cos(theta), -np.sin(theta), 0],
                            [np.sin(theta), np.cos(theta), 0],
                            [0, 0, 1]])
    if active:
        R_theta = np.transpose(R_theta)
    return R_theta

## The rotations are given as active
xmat = rotation_matrix(angles[2], 0, active=True)
ymat = rotation_matrix(angles[1], 1, active=True)
zmat = rotation_matrix(angles[0], 2, active=True)
## The operation seems to imply this composition
euler_comp_mat = np.dot(xmat, np.dot(ymat, zmat))
---<--------------------cut here---------------end--------------------->---

I believe the matrices `euler_mat' and `euler_comp_mat' should be the
same, but they aren't, so it's unclear to me what particular composition
is meant to produce the matrix specified by this Z1Y2X3 transformation.
What am I missing?

--
Seb

From shoyer at gmail.com Tue Jan 31 23:19:08 2017
From: shoyer at gmail.com (Stephan Hoyer)
Date: Tue, 31 Jan 2017 20:19:08 -0800
Subject: [Numpy-discussion] ANN: xarray v0.9 released
Message-ID: 

I'm pleased to announce the release of the latest major version of
xarray, v0.9.

xarray is an open source project and Python package that provides a
toolkit and data structures for N-dimensional labeled arrays. Its
approach combines an API inspired by pandas with the Common Data Model
for self-described scientific data.

This release includes five months' worth of enhancements and bug fixes
from 24 contributors, including some significant changes to the data
model that are not fully backwards compatible.

Highlights include:

- Coordinates are now optional in the xarray data model, even for
  dimensions.
- Changes to caching, lazy loading and pickling to improve xarray's
  experience for parallel computing.
- Improvements for accessing and manipulating pandas.MultiIndex levels.
- Many new methods and functions, including quantile(), cumsum(),
  cumprod(), combine_first(), set_index(), reset_index(),
  reorder_levels(), full_like(), zeros_like(), ones_like(),
  open_dataarray(), compute(), Dataset.info(), testing.assert_equal(),
  testing.assert_identical(), and testing.assert_allclose().

For more details, read the full release notes:
http://xarray.pydata.org/en/latest/whats-new.html

You can install xarray with pip or conda:

pip install xarray
conda install -c conda-forge xarray

Best,
Stephan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
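As a brief hand-written sketch (not part of the announcement above; the
dimension names, coordinate labels and values are invented), this is
roughly what the labeled-array API and a couple of the v0.9 additions
mentioned in the highlights look like:

import numpy as np
import xarray as xr

# Hypothetical labeled array: 3 time steps x 2 sites (names are invented).
# No coordinate is supplied for "time"; coordinates being optional even
# for dimensions is one of the v0.9 changes listed above.
arr = xr.DataArray(np.arange(6).reshape(3, 2),
                   dims=("time", "space"),
                   coords={"space": ["site_a", "site_b"]})

# Two of the newly added functions/methods:
print(arr.cumsum(dim="time"))  # cumulative sum along a labeled dimension
print(xr.zeros_like(arr))      # same dims/coords/shape, filled with zeros

Both calls preserve the dimension names and any attached coordinates,
which is the main point of working with labeled arrays.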