From jfoxrabinovitz at gmail.com Tue Apr 3 17:45:21 2018
From: jfoxrabinovitz at gmail.com (Joseph Fox-Rabinovitz)
Date: Tue, 3 Apr 2018 17:45:21 -0400
Subject: [Numpy-discussion] Possible bug in np.array type calculation
Message-ID:

I recently asked a question on Stack Overflow about whether `np.array`
could raise an error if not passed a dtype parameter:
https://stackoverflow.com/q/49639414/2988730.

Turns out it can:

    np.array([1, [2]])

raises `ValueError: setting an array element with a sequence.`
Surprisingly though, the following does not, and gives the expected
array with `dtype=object`:

    np.array([[1], 2])

Is this behavior a bug of sorts, or is there some arcane reason behind
it?

Regards,

- Joe

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From charlesr.harris at gmail.com Tue Apr 3 21:05:22 2018
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 3 Apr 2018 19:05:22 -0600
Subject: [Numpy-discussion] Possible bug in np.array type calculation
In-Reply-To: References: Message-ID:

On Tue, Apr 3, 2018 at 3:45 PM, Joseph Fox-Rabinovitz <jfoxrabinovitz at gmail.com> wrote:

> I recently asked a question on Stack Overflow about whether `np.array`
> could raise an error if not passed a dtype parameter:
> https://stackoverflow.com/q/49639414/2988730.
>
> Turns out it can:
>
>     np.array([1, [2]])
>
> raises `ValueError: setting an array element with a sequence.`
> Surprisingly though, the following does not, and gives the expected
> array with `dtype=object`:
>
>     np.array([[1], 2])
>
> Is this behavior a bug of sorts, or is there some arcane reason behind
> it?

It's a bug of sorts, but the creation of object arrays is weird anyway.
There has been a long-standing semi-proposal to raise an error in these
cases unless `dtype=object` is specified, and even then there is a
question of where the array ends and the objects begin: nested lists of
mixed sizes and such.
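The asymmetry Joe reports can be reproduced with a short script. The
helper name `describe` is just for illustration, and the raise-vs-object
split noted in the comments is the 2018-era behavior; later NumPy
releases tightened ragged-array handling, so the inferred-dtype results
are version-dependent:

```python
import numpy as np

def describe(data):
    """Report the inferred dtype for np.array(data), or the error it raises."""
    try:
        return repr(np.array(data).dtype)
    except ValueError as exc:
        return "ValueError: %s" % exc

# The two ragged inputs from the report. On NumPy of this era (~1.14)
# the first raised while the second inferred an object array; the
# output is version-dependent on later releases.
print(describe([1, [2]]))
print(describe([[1], 2]))

# Passing dtype=object explicitly builds the ragged array either way:
print(np.array([[1], 2], dtype=object).dtype)  # object
```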
Chuck

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stefanv at berkeley.edu Fri Apr 6 02:08:24 2018
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Thu, 5 Apr 2018 23:08:24 -0700
Subject: [Numpy-discussion] Introduction: NumPy developers at BIDS
Message-ID: <20180406060824.65ldhpbynwqbmgek@carbo>

Hi everyone,

I am excited to report that we have completed the hiring of two full
time NumPy developers at BIDS [0].

Matti Picus has done extensive work on the PyPy project and specifically
cpyext, their C compatibility layer that allows PyPy to run NumPy. In
the course of this work, Matti has also been contributing to NumPy
itself. He will officially start on Monday.

Tyler Reddy joins us from Los Alamos National Lab for a two year
sabbatical. Tyler has been working mainly on SciPy, and will start at
BIDS late in June.

We are very excited about this opportunity to develop NumPy further,
together with the NumPy community, and look forward to making in-person
introductions at SciPy2018 in July.

Best regards,
Stéfan

[0] Berkeley Institute for Data Science at UC Berkeley

From charlesr.harris at gmail.com Fri Apr 6 09:42:34 2018
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 6 Apr 2018 07:42:34 -0600
Subject: [Numpy-discussion] Introduction: NumPy developers at BIDS
In-Reply-To: <20180406060824.65ldhpbynwqbmgek@carbo>
References: <20180406060824.65ldhpbynwqbmgek@carbo>
Message-ID:

On Fri, Apr 6, 2018 at 12:08 AM, Stefan van der Walt wrote:

> Hi everyone,
>
> I am excited to report that we have completed the hiring of two full
> time NumPy developers at BIDS [0].
>
> Matti Picus has done extensive work on the PyPy project and specifically
> cpyext, their C compatibility layer that allows PyPy to run NumPy. In
> the course of this work, Matti has also been contributing to NumPy
> itself. He will officially start on Monday.
>
> Tyler Reddy joins us from Los Alamos National Lab for a two year
> sabbatical.
> Tyler has been working mainly on SciPy, and will start at
> BIDS late in June.
>
> We are very excited about this opportunity to develop NumPy further,
> together with the NumPy community, and look forward to making in-person
> introductions at SciPy2018 in July.

It begins ... :) Congratulations and welcome to Matti and Tyler. I
expect the first couple of days will be spent dealing with
organizational details. What is first on the agenda after that?

Chuck

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From charlesr.harris at gmail.com Fri Apr 6 12:42:48 2018
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 6 Apr 2018 10:42:48 -0600
Subject: [Numpy-discussion] Switch to pytest
Message-ID:

Hi All,

Just a heads up that there is a PR switching NumPy testing from nose to
pytest. I will put it in soon if there are no complaints.

Chuck

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From cournape at gmail.com Fri Apr 6 23:51:21 2018
From: cournape at gmail.com (David Cournapeau)
Date: Sat, 7 Apr 2018 12:51:21 +0900
Subject: [Numpy-discussion] PyData Man AHL Hackathon
Message-ID:

Hi there,

Man AHL is organizing a hackathon for various projects around PyData,
including NumPy (but also SciPy, etc.: https://www.ahl.com/hackathon).

They have generously offered some funding to get some contributors to
help. Not many current contributors from NumPy could make the trip, so I
agreed to help. As I have not contributed to NumPy for some time, I
think it would make more sense for me to help onboard new contributors
who may be interested, and also do some code reviews instead of
contributing directly.

I see that we already have more than 30 bugs labeled as easy, which
would be a good starting point. Are there particular areas where we
should focus?

regards,

David

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From charlesr.harris at gmail.com Sat Apr 7 00:16:36 2018
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 6 Apr 2018 22:16:36 -0600
Subject: [Numpy-discussion] PyData Man AHL Hackathon
In-Reply-To: References: Message-ID:

On Fri, Apr 6, 2018 at 9:51 PM, David Cournapeau wrote:

> Hi there,
>
> Man AHL is organizing a hackathon for various projects around PyData,
> including NumPy (but also SciPy, etc.: https://www.ahl.com/hackathon).
>
> They have generously offered some funding to get some contributors to
> help. Not many current contributors from NumPy could make the trip, so
> I agreed to help. As I have not contributed to NumPy for some time, I
> think it would make more sense for me to help onboard new contributors
> who may be interested, and also do some code reviews instead of
> contributing directly.
>
> I see that we already have more than 30 bugs labeled as easy, which
> would be a good starting point. Are there particular areas where we
> should focus?
>
> regards,
>
> David

If you don't get caught up in onboarding developers, code review would
be helpful. We have a real problem trying to keep up.

Chuck

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jfoxrabinovitz at gmail.com Sun Apr 8 02:49:41 2018
From: jfoxrabinovitz at gmail.com (Joseph Fox-Rabinovitz)
Date: Sun, 8 Apr 2018 02:49:41 -0400
Subject: [Numpy-discussion] ENH: Adding a count parameter to np.unpackbits
Message-ID:

Hi,

I have added PR #10855 to allow unpackbits to unpack less than the
entire set of bits. This is not a very big change, and 100% backwards
compatible. It serves two purposes:

1. To make packbits and unpackbits completely invertible (and prevent
   things like this from being necessary:
   https://stackoverflow.com/a/44962805/2988730)
2. To prevent an unnecessary waste of space for large arrays that are
   unpacked along a single dimension.

Regards,

- Joe

-------------- next part --------------
An HTML attachment was scrubbed...
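The round-trip problem the PR addresses is easy to demonstrate:
`packbits` zero-pads its input up to whole bytes, so `unpackbits` alone
is not its inverse. A minimal sketch of the status quo and the slicing
workaround (the `count` parameter proposed here was only a PR at the
time):

```python
import numpy as np

bits = np.array([1, 0, 1], dtype=np.uint8)

packed = np.packbits(bits)        # zero-pads to a whole byte: array([160], dtype=uint8)
unpacked = np.unpackbits(packed)  # 8 bits come back, not 3

print(unpacked)                   # [1 0 1 0 0 0 0 0]

# Without a count-style parameter, making the round trip exact requires
# slicing off the padding by hand:
restored = unpacked[:bits.size]
assert np.array_equal(restored, bits)
```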
URL:

From stefanv at berkeley.edu Sun Apr 8 03:19:31 2018
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Sun, 8 Apr 2018 00:19:31 -0700
Subject: [Numpy-discussion] Introduction: NumPy developers at BIDS
In-Reply-To: References: <20180406060824.65ldhpbynwqbmgek@carbo>
Message-ID: <20180408071931.kohuvysr5sfcmfzl@carbo>

On Fri, 06 Apr 2018 07:42:34 -0600, Charles R Harris wrote:
> On Fri, Apr 6, 2018 at 12:08 AM, Stefan van der Walt
> wrote:
> > We are very excited about this opportunity to develop NumPy further,
> > together with the NumPy community, and look forward to making in-person
> > introductions at SciPy2018 in July.
>
> It begins ... :) Congratulations and welcome to Matti and Tyler. I expect
> the first couple of days will be spent dealing with organizational details.
> What is first on the agenda after that?

On a high level, we have the following focuses:

- To support the community by providing developer time to do code
  review, triage issues, fix bugs, help with releases, implement
  infrastructure (e.g., improve benchmarking, inter-package testing,
  project analytics), etc. The first work done will be mainly in this
  category.

- To solve medium- and large-scale issues through the design and
  implementation of community-approved NEPs. This might include items
  such as duck arrays, parameterized dtypes, and missing value support.

- To provide logistical (and some financial) support for the
  organization of NumPy developer meetings, coding sprints, sabbaticals,
  technical talks, and similar community-building activities. We started
  with the recent NEP writing sprint; this is to be followed by an
  Airspeed Velocity sprint with Mike Droettboom at BIDS around mid-June,
  and a NumPy developer meeting at SciPy2018 on July 14-15.

We would love community input on identifying the best areas & issues to
pay attention to, and I invite developers who want to meet with the team
at Berkeley to contact me.
Best regards,
Stéfan

From efiring at hawaii.edu Sun Apr 8 14:02:19 2018
From: efiring at hawaii.edu (Eric Firing)
Date: Sun, 8 Apr 2018 08:02:19 -1000
Subject: [Numpy-discussion] Introduction: NumPy developers at BIDS
In-Reply-To: <20180408071931.kohuvysr5sfcmfzl@carbo>
References: <20180406060824.65ldhpbynwqbmgek@carbo> <20180408071931.kohuvysr5sfcmfzl@carbo>
Message-ID:

On 2018/04/07 9:19 PM, Stefan van der Walt wrote:
> We would love community input on identifying the best areas & issues to
> pay attention to,

Stefan,

What is the best way to provide this, and how will the decisions be
made?

Eric

From einstein.edison at gmail.com Mon Apr 9 07:37:22 2018
From: einstein.edison at gmail.com (Hameer Abbasi)
Date: Mon, 9 Apr 2018 13:37:22 +0200
Subject: [Numpy-discussion] PR to add an initializer kwarg to ufunc.reduce (and similar functions)
In-Reply-To: <1522087740.11797.7.camel@sipsolutions.net>
References: <1522079156.4888.12.camel@sipsolutions.net> <1522082927.4888.24.camel@sipsolutions.net> <1522083291.8319.3.camel@sipsolutions.net> <1522084192.8883.6.camel@sipsolutions.net> <1522087740.11797.7.camel@sipsolutions.net>
Message-ID:

I've renamed the kwarg to `initial`. I'm willing to make the object
dtype changes as well, if someone pointed me to relevant bits of code.

Unfortunately, currently, the identity is only used for object dtypes
if the reduction is empty. I think this is to prevent things like `0`
being passed in the sum of objects (and similar cases), which makes
sense.

However, with the kwarg, it makes sense to include it in the reduction.
I think the change will be somewhere along the lines of: detect if
`initial` was passed; if so, include it for object, otherwise exclude
it.

I personally feel `initial` renders `default` redundant. It can be used
for both purposes. I can't think of a reasonable use case where you
would want the default to be different from the initial value.
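The initial-versus-default distinction being debated maps directly onto
`functools.reduce`, which the thread cites: its third argument plays
both roles at once. A small sketch, with the NaN case as an example of
where the two roles diverge:

```python
import functools
import math
import operator

# functools.reduce's third argument both participates in a non-empty
# reduction and serves as the result of an empty one:
assert functools.reduce(operator.add, [10], 5) == 15  # participates
assert functools.reduce(operator.add, [], 5) == 5     # empty-case default

# NaN works as an empty-case *default*, but as an *initial* value it
# poisons the whole reduction:
poisoned = functools.reduce(min, [1.0, 2.0], math.nan)
print(poisoned)  # nan
```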
However, I do agree that fixing the object case is important; we don't
want users to get used to this behaviour and then rely on it later.

Hameer

On Mon, Mar 26, 2018 at 8:09 PM, Sebastian Berg wrote:
> On Mon, 2018-03-26 at 17:40 +0000, Eric Wieser wrote:
> > The difficulty in supporting object arrays is that func.reduce(arr,
> > initial=func.identity) and func.reduce(arr) have different meanings -
> > whereas with the current patch, they are equivalent.
>
> True, but the current meaning is:
>
>     func.reduce(arr, initial=, default=func.identity)
>
> in the case for object dtype. Luckily for normal dtypes, func.identity
> is both the correct default "default" and a no-op for initial. Thus the
> name "identity" kinda works there. I am also not really sure that both
> kwargs would make real sense (plus initial probably disallows
> default...), but I got some feeling that the "default" meaning may be
> even more useful to simplify special casing the empty case.
>
> Anyway, still just pointing out that it gives me some headaches to see
> such a special case for objects :(.
>
> - Sebastian
>
> > On Mon, 26 Mar 2018 at 10:10 Sebastian Berg <sebastian at sipsolutions.net> wrote:
> > > On Mon, 2018-03-26 at 12:59 -0400, Hameer Abbasi wrote:
> > > > That may be complicated. Currently, the identity isn't used in
> > > > object dtype reductions. We may need to change that, which could
> > > > cause a whole lot of other backwards incompatible changes. For
> > > > example, sum actually including zero in object reductions. Or we
> > > > could pass in a flag saying an initializer was passed in to change
> > > > that behaviour. If this is agreed upon and someone is kind enough
> > > > to point me to the code, I'd be willing to make this change.
> > >
> > > I realize the implication, I am not suggesting to change the default
> > > behaviour (when no initial=... is passed), I would think about
> > > deprecating it, but probably only if we also have the `default`
> > > argument, since otherwise you cannot replicate the old behaviour.
> > >
> > > What I think I would like to see is to change how it works if (and
> > > only if) the initializer is passed in. Yes, this will require
> > > holding on to some extra information since you will have to
> > > know/remember whether the "identity" was passed in or defined
> > > otherwise.
> > >
> > > I did not check the code, but I would hope that it is not awfully
> > > tricky to do that.
> > >
> > > - Sebastian
> > >
> > > PS: A side note, but I see your emails as a single block of text
> > > with no/broken new-lines.
> > >
> > > > On 26/03/2018 at 18:54, Sebastian wrote:
> > > > On Mon, 2018-03-26 at 18:48 +0200, Sebastian Berg wrote:
> > > > On Mon, 2018-03-26 at 11:53 -0400, Hameer Abbasi wrote:
> > > >
> > > > It'll need to be thought out for object arrays and subclasses. But
> > > > for regular numeric stuff, Numpy uses fmin and this would have the
> > > > desired effect.
> > > >
> > > > I do not want to block this, but I would like a clearer opinion
> > > > about this issue. `np.nansum` as Benjamin noted would require
> > > > something like np.nansum([np.nan], default=np.nan), because
> > > > np.sum([1], initializer=np.nan) and np.nansum([1],
> > > > initializer=np.nan) would both give NaN if the logic is the same
> > > > as the current `np.sum`. And yes, I guess for fmin/fmax NaN
> > > > happens to work. And then there are many nonsense reduces which
> > > > could make sense with `initializer`. Now nansum is not implemented
> > > > in a way that could make use of the new kwarg anyway, so maybe it
> > > > does not matter in some sense. We can in principle use `default`
> > > > in nansum and at some point possibly add `default` to the normal
> > > > ufuncs. If we argue like that, the only annoying thing is the
> > > > `object` dtype which confuses the two use cases currently. This
> > > > confusion IMO is not harmless, because I might want to use it
> > > > (e.g. sum with initializer=5), and I would expect things like
> > > > dropping in `decimal.Decimal` to work most of the time, while here
> > > > it would give silently bad results. In other words: I am very very
> > > > much in favor if you get rid of that object dtype special case. I
> > > > frankly do not see why not (except that it needs a bit more code
> > > > change). If given explicitly, we might as well force the use and
> > > > not do the funny stuff which is designed to be more type agnostic!
> > > > If it happens to fail due to not being type agnostic, it will at
> > > > least fail loudly. If you leave that object special case I am
> > > > *very* hesitant about it. That I think I would like a `default`
> > > > argument as well is another issue, and it can wait for another
> > > > day.
> > > >
> > > > - Sebastian
> > > >
> > > > On 26/03/2018 at 17:45, Sebastian wrote:
> > > > On Mon, 2018-03-26 at 11:39 -0400, Hameer Abbasi wrote:
> > > >
> > > > That is the idea, but NaN functions are in a separate branch for
> > > > another PR to be discussed later. You can see it on my fork, if
> > > > you're interested.
> > > >
> > > > Except that as far as I understand I am not sure it will help much
> > > > with it, since it is not a default, but an initializer.
> > > > Initializing to NaN would just make all results NaN.
> > > >
> > > > - Sebastian
> > > >
> > > > On 26/03/2018 at 17:35, Benjamin wrote:
> > > >
> > > > Hmm, this is neat. I imagine it would finally give some people a
> > > > choice on what np.nansum([np.nan]) should return? It caused a huge
> > > > hullabaloo a few years ago when we changed it from returning NaN
> > > > to returning zero.
> > > >
> > > > Ben Root
> > > >
> > > > On Mon, Mar 26, 2018 at 11:16 AM, Sebastian Berg <sebastian at sipsolutions.net> wrote:
> > > >
> > > > OK, the new documentation is actually clear:
> > > >
> > > >     initializer : scalar, optional
> > > >         The value with which to start the reduction. Defaults to
> > > >         the `~numpy.ufunc.identity` of the ufunc. If ``None`` is
> > > >         given, the first element of the reduction is used, and an
> > > >         error is thrown if the reduction is empty. If ``a.dtype``
> > > >         is ``object``, then the initializer is _only_ used if
> > > >         reduction is empty.
> > > >
> > > > I would actually like to say that I do not like the object special
> > > > case much (and it is probably the reason why I was confused), nor
> > > > am I quite sure this is what helps a lot? Logically, I would argue
> > > > there are two things:
> > > >
> > > > 1. initializer/start (always used)
> > > > 2. default (only used for empty reductions)
> > > >
> > > > For example, I might like to give `np.nan` as the default for some
> > > > empty reductions; this will not work. I understand that this is a
> > > > minimally invasive PR and I am not sure I find the solution bad
> > > > enough to really dislike it, but what do others think? My first
> > > > expectation was the default behaviour (in all cases, not just the
> > > > object case) for some reason. To be honest, for now I just wonder
> > > > a bit: How hard would it be to do both, or is that too annoying?
> > > > It would at least get rid of that annoying thing with object
> > > > ufuncs (which currently have a default, but not really an
> > > > identity/initializer).
> > > >
> > > > Best,
> > > >
> > > > Sebastian
> > > >
> > > > On Mon, 2018-03-26 at 08:20 -0400, Hameer Abbasi wrote:
> > > > > Actually, the behavior right now isn't that of `default` but
> > > > > that of `initializer` or `start`.
> > > > >
> > > > > This was discussed further down in the PR but to reiterate:
> > > > > `np.sum([10], initializer=5)` becomes `15`. Also, `np.min([5],
> > > > > initializer=0)` becomes `0`, so it isn't really the default
> > > > > value, it's the initial value among which the reduction is
> > > > > performed.
> > > > >
> > > > > This was the reason to call it initializer in the first place. I
> > > > > like `initial` and `initial_value` as well, and `start` also
> > > > > makes sense but isn't descriptive enough.
> > > > >
> > > > > Hameer
> > > > > Sent from Astro for Mac
> > > >
> > > > [earlier messages in the thread, including the original PR
> > > > announcement (https://github.com/numpy/numpy/pull/10635), trimmed]

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at python.org
https://mail.python.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sebastian at sipsolutions.net Mon Apr 9 07:55:03 2018
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Mon, 09 Apr 2018 13:55:03 +0200
Subject: [Numpy-discussion] PR to add an initializer kwarg to ufunc.reduce (and similar functions)
In-Reply-To: References: <1522079156.4888.12.camel@sipsolutions.net> <1522082927.4888.24.camel@sipsolutions.net> <1522083291.8319.3.camel@sipsolutions.net> <1522084192.8883.6.camel@sipsolutions.net> <1522087740.11797.7.camel@sipsolutions.net>
Message-ID:

On Mon, 2018-04-09 at 13:37 +0200, Hameer Abbasi wrote:
> I've renamed the kwarg to `initial`.
> I'm willing to make the object
> dtype changes as well, if someone pointed me to relevant bits of
> code.
>
> Unfortunately, currently, the identity is only used for object dtypes
> if the reduction is empty. I think this is to prevent things like `0`
> being passed in the sum of objects (and similar cases), which makes
> sense.
>
> However, with the kwarg, it makes sense to include it in the
> reduction. I think the change will be somewhere along the lines of:
> detect if `initial` was passed; if so, include it for object,
> otherwise exclude it.
>
> I personally feel `initial` renders `default` redundant. It can be
> used for both purposes. I can't think of a reasonable use case where
> you would want the default to be different from the initial value.
> However, I do agree that fixing the object case is important; we
> don't want users to get used to this behaviour and then rely on it
> later.

The reason would be the case of NaN, which is not a possible initial
value for the reduction. I personally find the object case important;
if someone seriously argues the opposite I might be swayed.

- Sebastian

> Hameer
>
> On Mon, Mar 26, 2018 at 8:09 PM, Sebastian Berg <sebastian at sipsolutions.net> wrote:
> > On Mon, 2018-03-26 at 17:40 +0000, Eric Wieser wrote:
> > > The difficulty in supporting object arrays is that func.reduce(arr,
> > > initial=func.identity) and func.reduce(arr) have different meanings -
> > > whereas with the current patch, they are equivalent.
> >
> > True, but the current meaning is:
> >
> >     func.reduce(arr, initial=, default=func.identity)
> >
> > in the case for object dtype. Luckily for normal dtypes, func.identity
I am also not really sure that > > both > > kwargs would make real sense (plus initial probably disallows > > default...), but I got some feeling that the "default" meaning may > > be > > even more useful to simplify special casing the empty case. > > > > Anyway, still just pointing out that I it gives me some headaches > > to > > see such a special case for objects :(. > > > > - Sebastian > > > > > > > > > > On Mon, 26 Mar 2018 at 10:10 Sebastian Berg > ns.n > > > et> wrote: > > > > On Mon, 2018-03-26 at 12:59 -0400, Hameer Abbasi wrote: > > > > > That may be complicated. Currently, the identity isn't used > > in > > > > object > > > > > dtype reductions. We may need to change that, which could > > cause a > > > > > whole lot of other backwards incompatible changes. For > > example, > > > > sum > > > > > actually including zero in object reductions. Or we could > > pass in > > > > a > > > > > flag saying an initializer was passed in to change that > > > > behaviour. If > > > > > this is agreed upon and someone is kind enough to point me to > > the > > > > > code, I'd be willing to make this change. > > > > > > > > I realize the implication, I am not suggesting to change the > > > > default > > > > behaviour (when no initial=... is passed), I would think about > > > > deprecating it, but probably only if we also have the `default` > > > > argument, since otherwise you cannot replicate the old > > behaviour. > > > > > > > > What I think I would like to see is to change how it works if > > (and > > > > only > > > > if) the initializer is passed in. Yes, this will require > > holding on > > > > to > > > > some extra information since you will have to know/remember > > whether > > > > the > > > > "identity" was passed in or defined otherwise. > > > > > > > > I did not check the code, but I would hope that it is not > > awfully > > > > tricky to do that. 
> > > > > > > > - Sebastian > > > > > > > > > > > > PS: A side note, but I see your emails as a single block of > > text > > > > with > > > > no/broken new-lines. > > > > > > > > > > > > > On 26/03/2018 at 18:54, > > > > > Sebastian wrote: On Mon, 2018-03-26 at 18:48 +0200, Sebastian > > > > Berg > > > > > wrote: On Mon, 2018-03-26 at 11:53 -0400, Hameer Abbasi > > wrote: > > > > It'll > > > > > need to be thought out for object arrays and subclasses. But > > for > > > > > Regular numeric stuff, Numpy uses fmin and this would have > > the > > > > > desired > > > > > effect. I do not want to block this, but I would like a > > clearer > > > > > opinion about this issue, `np.nansum` as Benjamin noted would > > > > require > > > > > something like: np.nansum([np.nan], default=np.nan) because > > > > > np.sum([1], initializer=np.nan) np.nansum([1], > > > > initializer=np.nan) > > > > > would both give NaN if the logic is the same as the current > > > > `np.sum`. > > > > > And yes, I guess for fmin/fmax NaN happens to work. And then > > > > there > > > > > are > > > > > many nonsense reduces which could make sense with > > `initializer`. > > > > Now > > > > > nansum is not implemented in a way that could make use of the > > new > > > > > kwarg anyway, so maybe it does not matter in some sense. We > > can > > > > in > > > > > principle use `default` in nansum and at some point possibly > > add > > > > > `default` to the normal ufuncs. If we argue like that, the > > only > > > > > annoying thing is the `object` dtype which confuses the two > > use > > > > cases > > > > > currently. This confusion IMO is not harmless, because I > > might > > > > want > > > > > to > > > > > use it (e.g. sum with initializer=5), and I would expect > > things > > > > like > > > > > dropping in `decimal.Decimal` to work most of the time, while > > > > here it > > > > > would give silently bad results. 
In other words: I am very > > very > > > > much > > > > > in favor if you get rid that object dtype special case. I > > frankly > > > > not > > > > > see why not (except that it needs a bit more code change). If > > > > given > > > > > explicitly, we might as well force the use and not do the > > funny > > > > stuff > > > > > which is designed to be more type agnostic! If it happens to > > fail > > > > due > > > > > to not being type agnostic, it will at least fail loudly. If > > you > > > > > leave > > > > > that object special case I am *very* hesitant about it. That > > I > > > > think > > > > > I > > > > > would like a `default` argument as well, is another issue and > > it > > > > can > > > > > wait to another day. - Sebastian - Sebastian On 26/03/2018 at > > > > 17:45, > > > > > Sebastian wrote: On Mon, 2018-03-26 at 11:39 -0400, Hameer > > Abbasi > > > > > wrote: That is the idea, but NaN functions are in a separate > > > > branch > > > > > for another PR to be discussed later. You can see it on my > > fork, > > > > if > > > > > you're interested. Except that as far as I understand I am > > not > > > > sure > > > > > it > > > > > will help much with it, since it is not a default, but an > > > > > initializer. > > > > > Initializing to NaN would just make all results NaN. - > > Sebastian > > > > On > > > > > 26/03/2018 at 17:35, Benjamin wrote: Hmm, this is neat. I > > imagine > > > > it > > > > > would finally give some people a choice on what > > > > np.nansum([np.nan]) > > > > > should return? It caused a huge hullabeloo a few years ago > > when > > > > we > > > > > changed it from returning NaN to returning zero. Ben Root On > > Mon, > > > > Mar > > > > > 26, 2018 at 11:16 AM, Sebastian Berg > net> > > > > > wrote: OK, the new documentation is actually clear: > > initializer : > > > > > scalar, optional The value with which to start the reduction. > > > > > Defaults > > > > > to the `~numpy.ufunc.identity` of the ufunc. 
If ``None`` is > > > > given, > > > > > the > > > > > first element of the reduction is used, and an error is > > thrown if > > > > the > > > > > reduction is empty. If ``a.dtype`` is ``object``, then the > > > > > initializer > > > > > is _only_ used if reduction is empty. I would actually like > > to > > > > say > > > > > that I do not like the object special case much (and it is > > > > probably > > > > > the reason why I was confused), nor am I quite sure this is > > what > > > > > helps > > > > > a lot? Logically, I would argue there are two things: 1. > > > > > initializer/start (always used) 2. default (oly used for > > empty > > > > > reductions) For example, I might like to give `np.nan` as the > > > > default > > > > > for some empty reductions, this will not work. I understand > > that > > > > this > > > > > is a minimal invasive PR and I am not sure I find the > > solution > > > > bad > > > > > enough to really dislike it, but what do other think? My > > first > > > > > expectation was the default behaviour (in all cases, not just > > > > object > > > > > case) for some reason. To be honest, for now I just wonder a > > bit: > > > > How > > > > > hard would it be to do both, or is that too annoying? It > > would at > > > > > least get rid of that annoying thing with object ufuncs > > (which > > > > > currently have a default, but not really an > > > > identity/initializer). > > > > > Best, Sebastian On Mon, 2018-03-26 at 08:20 -0400, Hameer > > Abbasi > > > > > wrote: > Actually, the behavior right now isn?t that of > > `default` > > > > but > > > > > that of > `initializer` or `start`. > > This was discussed > > > > further > > > > > down in the PR but to reiterate: > `np.sum([10], > > initializer=5)` > > > > > becomes `15`. > > Also, `np.min([5], initializer=0)` becomes > > `0`, > > > > so > > > > > it isn?t really > the default value, it?s the initial value > > among > > > > > which the reduction > is performed. 
> > This was the reason > > to > > > > call > > > > > it > > > > > initializer in the first place. I like > `initial` and > > > > > `initial_value` > > > > > as well, and `start` also makes sense > but isn?t descriptive > > > > enough. > > > > > > > Hameer > Sent from Astro for Mac > > > On Mar 26, 2018 at > > > > 12:06, > > > > > > > > > > Sebastian Berg > t> wrote: > > > > > > > > > > > Initializer or this sounds fine to me. As an other data point > > > > which > > > > > > > I > > think has been mentioned before, `sum` uses start and > > > > min/max > > > > > > > > > > use > > default. `start` does not work, unless we also change > > the > > > > > code > > > > > to > > always use the identity if given (currently that is > > not > > > > the > > > > > case), > > in > > which case it might be nice. However, > > "start" > > > > seems > > > > > a bit like > > solving > > a different issue in any case. > > > > > > > > > > > Anyway, mostly noise. I really like adding this, the only > > thing > > > > > > > > > > > worth > > discussing a bit is the name :). - Sebastian > > > > > > > > > > > > On > > > > > Mon, 2018-03-26 at 05:57 -0400, Hameer Abbasi wrote: > > > It > > > > calls > > > > > it > > > > > `initializer` - See https://docs.python.org/3.5/libra > > > > > ry/f > > > > > > > > > > > > > > > > > unctools.html#functools.reduce > > > > > > Sent from Astro > > for > > > > Mac On > > > > > Mar 26, 2018 at 09:54, Eric Wieser > > > > > > > com> > > > > > > > > > wrote: > > > > > > > > It turns out I mispoke - > > > > > > > > > > functools.reduce calls the argument > > > > `initial` > > > > > > > > > > > > > > > > > > On > > > > > Mon, 26 Mar 2018 at 00:17 Stephan Hoyer > > > > > > > > > > wrote: > > > > > This looks like a very logical addition to > > the > > > > > reduce > > > > > interface. > > > > > It has my support! 
> > > > > > > > > I > > would > > > > > have > > > > > preferred the more descriptive name > > > > > > > "initial_value", > > > > > > > > > > > > > > > > > > > but consistency with functools.reduce makes a compelling > > case > > > > > > > > > > > > > > for > > > > > "initializer". > > > > > On Sun, Mar 25, > > 2018 > > > > at > > > > > > > > > > 1:15 PM Eric Wieser > > > > ail.com> > > > > wrote: > > > > > To reiterate my comments in the issue - I'm in favor of > > > this. > > > > > > > > > > > > > > > > > > > > > It seems seem especially valuable for > > identity- > > > > less > > > > > > > > > > > > > > > > > > > > > > functions > > > > (`min`, `max`, `lcm`), and the > > > > argument > > > > > > > > > > name is consistent > > > with > `functools.reduce`. too. > > > > > > > > > > > > > > > > > > > > > > > > > > > > The only argument I can see against merging this > > would > > > > be > > > > > > > > > > > `kwarg`-creep of `reduce`, and I think this has > > enough > > > > use > > > > > > > > > > > > > > > > > > > > > > > > > > > cases to justify that. > > > > > > > > > > > > I'd like > > to > > > > > > > > merge > > > > > > > > > > in a few days, if no one else has any > > > > > > opinions. > > > > > > > > > > > > > > > > > Eric > > > > > > > > > > > > On Fri, 16 Mar 2018 at 10:13 > > > > Hameer > > > > > > > > > > Abbasi > > > > > @gma > > > > > > il.com> > > > > wrote: > > > > > Hello, everyone. I?ve submitted a PR to add a initializer > > kwarg > > > > to > > > > > ufunc.reduce. This is useful in a few cases, e.g., > > > > > > > > > > > > > it > > > > > allows one to supply a ?default? value for identity- > > > > > > > > > > > > > > > > > > less > > > > > > > ufunc reductions, and specify an initial > > value > > > > for > > > > > > > > > > > > reductions such as sum (other than zero.) 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Please feel free to review or leave feedback, > > (although > > > > I > > > > > > > > > > > > > > > > > > > think Eric and Marten have picked it apart pretty > > well). > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/numpy/numpy/pull/10635 > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > Hameer > > > > Sent from Astro for Mac > > > > > > > > > > _______________________________________________ > > > NumPy- > > > > > Discussion > > > > > mailing list > > > > > > > NumPy-Discussion at python.org > > > > > > > > > > > > > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > > NumPy-Discussion mailing list > > > > > > NumPy-Discussion at py > > thon > > > > .o > > > > > rg > > > > > https://mail.python.org/mailman/listinfo/numpy-discussi on > > > > > _______________________________________________ > > > > > > > > > NumPy-Discussion mailing list > > > > > NumPy-Discussion at pyth > > on.o > > > > rg > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > _______________________________________________ > > > > > > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion at python > > .org > > > > > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > _______________________________________________ > > > NumPy- > > > > > Discussion mailing list > > > NumPy-Discussion at python.org > > > > > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > _______________________________________________ > > NumPy- > > > > Discussion > > > > > mailing list > > NumPy-Discussion at python.org > > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > _______________________________________________ > NumPy- > > > > Discussion > > > > > mailing list > 
NumPy-Discussion at python.org > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ NumPy- > > Discussion > > > > > mailing list NumPy-Discussion at python.org > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ NumPy- > > Discussion > > > > > mailing list NumPy-Discussion at python.org > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ NumPy- > > Discussion > > > > > mailing list NumPy-Discussion at python.org > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ NumPy- > > Discussion > > > > > mailing list NumPy-Discussion at python.org > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > > > > > NumPy-Discussion mailing list > > > > > NumPy-Discussion at python.org > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion____ > > ____ > > > > _______________________________________ > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion at python.org > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: 

From einstein.edison at gmail.com  Mon Apr  9 08:47:23 2018
From: einstein.edison at gmail.com (Hameer Abbasi)
Date: Mon, 9 Apr 2018 14:47:23 +0200
Subject: [Numpy-discussion] PR to add an initializer kwarg to ufunc.reduce (and similar functions)
In-Reply-To: 
References: <1522079156.4888.12.camel@sipsolutions.net>
	<1522082927.4888.24.camel@sipsolutions.net>
	<1522083291.8319.3.camel@sipsolutions.net>
	<1522084192.8883.6.camel@sipsolutions.net>
	<1522087740.11797.7.camel@sipsolutions.net>
Message-ID: 

> The reason would be the case of NaN which is not a possible initial
> value for the reduction.

Ah, I didn't think of that. However, at least for `min` and `max` this
can be accomplished with `fmin` and `fmax`.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From wieser.eric+numpy at gmail.com  Tue Apr 10 01:24:01 2018
From: wieser.eric+numpy at gmail.com (Eric Wieser)
Date: Tue, 10 Apr 2018 05:24:01 +0000
Subject: [Numpy-discussion] Changing the return type of np.histogramdd
Message-ID: 

Numpy has three histogram functions - histogram, histogram2d, and
histogramdd. histogram is by far the most widely used, and in the
absence of weights and normalization, returns an np.intp count for each
bin. histogramdd (for which histogram2d is a wrapper) returns
np.float64 in all circumstances.

As a contrived comparison

    >>> x = np.linspace(0, 1)
    >>> h, e = np.histogram(x*x, bins=4); h
    array([25, 10, 8, 7], dtype=int64)
    >>> h, e = np.histogramdd((x*x,), bins=4); h
    array([25., 10., 8., 7.])

https://github.com/numpy/numpy/issues/7845 tracks this inconsistency.
The fix is now trivial: the question is, will changing the return type
break people's code?

Either we should:

1. Just change it, and hope no one is broken by it
2.
Add a dtype argument:
   - If dtype=None, behave like np.histogram
   - If dtype is not specified, emit a future warning recommending to
     use dtype=None or dtype=float
   - In future, change the default to None
3. Create a new better-named function histogram_nd, which can also be
   created without the mistake that is
   https://github.com/numpy/numpy/issues/10864.

Thoughts?

Eric
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From Jerome.Kieffer at esrf.fr  Tue Apr 10 03:22:05 2018
From: Jerome.Kieffer at esrf.fr (Jerome Kieffer)
Date: Tue, 10 Apr 2018 09:22:05 +0200
Subject: [Numpy-discussion] Changing the return type of np.histogramdd
In-Reply-To: 
References: 
Message-ID: <20180410092205.20c263c0@lintaillefer.esrf.fr>

> Either we should:
>
> 1. Just change it, and hope no one is broken by it
> 2. Add a dtype argument:
>    - If dtype=None, behave like np.histogram
>    - If dtype is not specified, emit a future warning recommending to
>      use dtype=None or dtype=float
>    - In future, change the default to None
> 3. Create a new better-named function histogram_nd, which can also be
>    created without the mistake that is
>    https://github.com/numpy/numpy/issues/10864.
>
> Thoughts?

I like the option 2.

By the way, we (@ESRF) re-developed many times histogram and
histogram_nd in various projects in order to have better consistency on
the one hand and better performance on the other (re-written in C or
C++). I noticed a noticeable gain in performance in the last years of
numpy, but I did not check consistency.

The issue is that every bin should be an interval open on the
right-hand side, which causes stability issues, since the smallest
value greater than the max depends on the input dtype. For example, the
smallest value greater than 10 is 11 in int, but 10.000001 in float32
and 10.000000000000002 in float64.
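One way to see this dtype dependence concretely (a small illustration,
not from our C/C++ code) is np.nextafter, which returns the next
representable floating-point value after its first argument:

```python
import numpy as np

# Smallest representable value strictly greater than 10, per dtype.
# np.nextafter(x, y) steps from x by one ulp towards y.
print(np.nextafter(np.float32(10), np.float32(np.inf)))  # 10.000001
print(np.nextafter(np.float64(10), np.inf))              # 10.000000000000002
```

So a bin edge placed at the maximum behaves differently depending on
the precision in which the edges were computed.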
Cheers,

-- 
Jérôme Kieffer

From matti.picus at gmail.com  Tue Apr 10 05:29:21 2018
From: matti.picus at gmail.com (Matti Picus)
Date: Tue, 10 Apr 2018 12:29:21 +0300
Subject: [Numpy-discussion] Introduction: NumPy developers at BIDS
In-Reply-To: 
References: <20180406060824.65ldhpbynwqbmgek@carbo>
	<20180408071931.kohuvysr5sfcmfzl@carbo>
Message-ID: <19f47605-6c6b-5b7a-a02a-535975519b20@gmail.com>

On 08/04/18 21:02, Eric Firing wrote:
> On 2018/04/07 9:19 PM, Stefan van der Walt wrote:
>> We would love community input on identifying the best areas & issues
>> to pay attention to,
>
> Stefan,
>
> What is the best way to provide this, and how will the decisions be
> made?
>
> Eric
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

Hi. I feel very lucky to be able to dedicate the next phase of my
career to working on NumPy. Even though BIDS has hired me, I view
myself as working for the community, in an open and transparent way. In
thinking about how to help make NumPy contributors more productive, we
laid out these tasks:

- triage open issues and pull requests, picking up some of the
  long-standing issues and trying to resolve them
- help with code review
- review and suggest improvements to the NumPy documentation
- if needed, help with releases and infrastructure maintenance tasks

Down the road, the next level of things would be

- setting up a benchmark site like speed.python.org
- add more downstream package testing to the NumPy CI so we can verify
  that new releases work with packages such as scipy, scikit-learn,
  astropy

To document my work, I have set up a wiki
https://github.com/mattip/numpy/wiki that lists some longer-term tasks
and ideas. I look forward to meeting and working with Tyler, as well as
to SciPy2018, where there will be both a BOF meeting to discuss NumPy
and a two-day sprint.
BIDS is ultimately responsible to the funders to make sure my work
achieves the goals Stefan laid out, but I am going to try to be as
responsive as possible to any input from the wider community, either
directly (mattip on github and #numpy on IRC), via email, or this
mailing list.

Matti

From sebastian at sipsolutions.net  Tue Apr 10 07:33:18 2018
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 10 Apr 2018 13:33:18 +0200
Subject: [Numpy-discussion] Introduction: NumPy developers at BIDS
In-Reply-To: <19f47605-6c6b-5b7a-a02a-535975519b20@gmail.com>
References: <20180406060824.65ldhpbynwqbmgek@carbo>
	<20180408071931.kohuvysr5sfcmfzl@carbo>
	<19f47605-6c6b-5b7a-a02a-535975519b20@gmail.com>
Message-ID: <9219924681f69036f799edc395c13f1103809db9.camel@sipsolutions.net>

On Tue, 2018-04-10 at 12:29 +0300, Matti Picus wrote:
> On 08/04/18 21:02, Eric Firing wrote:
> > On 2018/04/07 9:19 PM, Stefan van der Walt wrote:
> > > We would love community input on identifying the best areas &
> > > issues to pay attention to,
> >
> > Stefan,
> >
> > What is the best way to provide this, and how will the decisions be
> > made?
> >
> > Eric
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
>
> Hi. I feel very lucky to be able to dedicate the next phase of my
> career to working on NumPy. Even though BIDS has hired me, I view
> myself as working for the community, in an open and transparent way.
> In thinking about how to help make NumPy contributors more
> productive, we laid out these tasks:

Welcome also from me :), I am looking forward to seeing how things
develop!
- Sebastian

> - triage open issues and pull requests, picking up some of the
> long-standing issues and trying to resolve them
>
> - help with code review
>
> - review and suggest improvements to the NumPy documentation
>
> - if needed, help with releases and infrastructure maintenance tasks
>
> Down the road, the next level of things would be
>
> - setting up a benchmark site like speed.python.org
>
> - add more downstream package testing to the NumPy CI so we can
> verify that new releases work with packages such as scipy,
> scikit-learn, astropy
>
> To document my work, I have set up a wiki
> https://github.com/mattip/numpy/wiki that lists some longer-term
> tasks and ideas. I look forward to meeting and working with Tyler as
> well as SciPy2018 where there will be both a BOF meeting to discuss
> NumPy and a two-day sprint.
>
> BIDS is ultimately responsible to the funders to make sure my work
> achieves the goals Stefan laid out, but I am going to try to be as
> responsive as possible to any input from the wider community, either
> directly (mattip on github and #numpy on IRC), via email, or this
> mailing list.
>
> Matti
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: 

From stefanv at berkeley.edu  Tue Apr 10 12:59:41 2018
From: stefanv at berkeley.edu (Stefan van der Walt)
Date: Tue, 10 Apr 2018 09:59:41 -0700
Subject: [Numpy-discussion] Introduction: NumPy developers at BIDS
In-Reply-To: 
References: <20180406060824.65ldhpbynwqbmgek@carbo>
	<20180408071931.kohuvysr5sfcmfzl@carbo>
Message-ID: <20180410165941.2ycpbriyzl2wif6e@carbo>

Hi Eric,

On Sun, 08 Apr 2018 08:02:19 -1000, Eric Firing wrote:
> On 2018/04/07 9:19 PM, Stefan van der Walt wrote:
> > We would love community input on identifying the best areas &
> > issues to pay attention to,
>
> What is the best way to provide this, and how will the decisions be
> made?

These are good questions.  We are also new at this, so while we have
some ideas on how things could work, we may have to refine the process
along the way.

We want to operate as openly as we can, so discussing ideas on the
mailing list is a preferred first option.  But we're also open to
inchoate ideas and recommendations (including on how we run things on
our end) via email.  Unless instructed explicitly otherwise, those
ideas will likely bubble up into posts here anyway.

Since we're learning the ropes, we'd like to expose the team to a wide
variety of ideas.  Visitors to the team are most welcome---please reach
out to me if you want to talk to us, either in person or via video
chat.

Can you help us think of good ways to learn "community priorities"?
E.g., for GitHub issues, should we take monthly polls, count the number
of "thumbs up"s, consider issues with the most comments, or tally the
number of explicit mentions of team members?
Best regards,
Stéfan

From nathan12343 at gmail.com  Tue Apr 10 13:03:06 2018
From: nathan12343 at gmail.com (Nathan Goldbaum)
Date: Tue, 10 Apr 2018 10:03:06 -0700
Subject: [Numpy-discussion] Introduction: NumPy developers at BIDS
In-Reply-To: <20180410165941.2ycpbriyzl2wif6e@carbo>
References: <20180406060824.65ldhpbynwqbmgek@carbo>
	<20180408071931.kohuvysr5sfcmfzl@carbo>
	<20180410165941.2ycpbriyzl2wif6e@carbo>
Message-ID: 

On Tue, Apr 10, 2018 at 9:59 AM, Stefan van der Walt wrote:

> Hi Eric,
>
> On Sun, 08 Apr 2018 08:02:19 -1000, Eric Firing wrote:
> > On 2018/04/07 9:19 PM, Stefan van der Walt wrote:
> > > We would love community input on identifying the best areas &
> > > issues to pay attention to,
> >
> > What is the best way to provide this, and how will the decisions be
> > made?
>
> These are good questions.  We are also new at this, so while we have
> some ideas on how things could work, we may have to refine the
> process along the way.
>
> We want to operate as openly as we can, so discussing ideas on the
> mailing list is a preferred first option.  But we're also open to
> inchoate ideas and recommendations (including on how we run things on
> our end) via email.  Unless instructed explicitly otherwise, those
> ideas will likely bubble up into posts here anyway.
>
> Since we're learning the ropes, we'd like to expose the team to a
> wide variety of ideas.  Visitors to the team are most
> welcome---please reach out to me if you want to talk to us, either in
> person or via video chat.
>
> Can you help us think of good ways to learn "community priorities"?
> E.g., for GitHub issues, should we take monthly polls, count the
> number of "thumbs up"s, consider issues with the most comments, or
> tally the number of explicit mentions of team members?
>

Keep in mind that only a subset of the community engages on GitHub
(mostly developers who are already engaged in the numpy community). You
may want to explore other venues for this sort of feedback, e.g.
a SciPy BoF session, which will capture a different subset of the
community.

> Best regards,
> Stéfan
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From matthew.brett at gmail.com  Tue Apr 10 13:05:14 2018
From: matthew.brett at gmail.com (Matthew Brett)
Date: Tue, 10 Apr 2018 18:05:14 +0100
Subject: [Numpy-discussion] Introduction: NumPy developers at BIDS
In-Reply-To: <20180410165941.2ycpbriyzl2wif6e@carbo>
References: <20180406060824.65ldhpbynwqbmgek@carbo>
	<20180408071931.kohuvysr5sfcmfzl@carbo>
	<20180410165941.2ycpbriyzl2wif6e@carbo>
Message-ID: 

Yo,

How about weekly open developer hangouts, recorded, to keep it all
public?

Cheers,

Matthew

On Tue, Apr 10, 2018 at 5:59 PM, Stefan van der Walt wrote:
> Hi Eric,
>
> On Sun, 08 Apr 2018 08:02:19 -1000, Eric Firing wrote:
>> On 2018/04/07 9:19 PM, Stefan van der Walt wrote:
>> > We would love community input on identifying the best areas &
>> > issues to pay attention to,
>>
>> What is the best way to provide this, and how will the decisions be
>> made?
>
> These are good questions.  We are also new at this, so while we have
> some ideas on how things could work, we may have to refine the
> process along the way.
>
> We want to operate as openly as we can, so discussing ideas on the
> mailing list is a preferred first option.  But we're also open to
> inchoate ideas and recommendations (including on how we run things on
> our end) via email.  Unless instructed explicitly otherwise, those
> ideas will likely bubble up into posts here anyway.
>
> Since we're learning the ropes, we'd like to expose the team to a
> wide variety of ideas.  Visitors to the team are most
> welcome---please reach out to me if you want to talk to us, either in
> person or via video chat.
> Can you help us think of good ways to learn "community priorities"?
> E.g., for GitHub issues, should we take monthly polls, count the
> number of "thumbs up"s, consider issues with the most comments, or
> tally the number of explicit mentions of team members?
>
> Best regards,
> Stéfan
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

From theodore.goetz at gmail.com  Tue Apr 10 13:34:32 2018
From: theodore.goetz at gmail.com (John T. Goetz)
Date: Tue, 10 Apr 2018 10:34:32 -0700
Subject: [Numpy-discussion] Changing the return type of np.histogramdd
In-Reply-To: <20180410092205.20c263c0@lintaillefer.esrf.fr>
References: <20180410092205.20c263c0@lintaillefer.esrf.fr>
Message-ID: <1523381672.7880.36.camel@gmail.com>

On Tue, 2018-04-10 at 09:22 +0200, Jerome Kieffer wrote:
> > Either we should:
> >
> > 1. Just change it, and hope no one is broken by it
> > 2. Add a dtype argument:
> >    - If dtype=None, behave like np.histogram
> >    - If dtype is not specified, emit a future warning recommending
> >      to use dtype=None or dtype=float
> >    - In future, change the default to None
> > 3. Create a new better-named function histogram_nd, which can also
> >    be created without the mistake that is
> >    https://github.com/numpy/numpy/issues/10864.
> >
> > Thoughts?
>
> I like the option 2.
>
> By the way, we (@ESRF) re-developed many times histogram and
> histogram_nd in various projects in order to have better consistency
> on the one hand and better performance on the other (re-written in C
> or C++).
> --
> Jérôme Kieffer

I think this was a mistake and should be fixed, so option 1 is my
preference. A dtype argument might be convenient, but what does that
gain over having the user do something like result.astype(np.float64)?
Jérôme, as to performance, I have a PR that pushes histogramming code
into C here: https://github.com/numpy/numpy/pull/9910. After it was
submitted, there was a rearrangement of the code which broke the merge
to master. I've been meaning to update the PR to get it through, but
haven't had the time.

-- 
John T. Goetz

From howard.page at localdipity.com  Wed Apr 11 10:52:16 2018
From: howard.page at localdipity.com (Howard Page)
Date: Wed, 11 Apr 2018 14:52:16 +0000
Subject: [Numpy-discussion] Looking for data sets for auto analysis
In-Reply-To: 
References: 
Message-ID: 

I am working on an automated math analyzer (sort of a statistical
analyst in a box) that

1. uses artificial intelligence to recognize probable candidate
   mathematical models for the data
2. performs standard mathematical model fits for the best candidate
   models found
3. computes the standard statistical goodness of fit and statistical
   power statistics
4. computes and reports a quantitative measure of the "falsifiability"
   of the mathematical model
5. generates NumPy code for verification

I'm looking for data sets to analyze to test out the machine learning
capabilities. We've done some interesting work so far, but there is a
lot of validation we need to do. We'd love to work with anyone on the
group to analyze your data and show you the output logs and generated
code.

Howard...
404-754-8763

From jfoxrabinovitz at gmail.com  Thu Apr 12 13:36:23 2018
From: jfoxrabinovitz at gmail.com (Joseph Fox-Rabinovitz)
Date: Thu, 12 Apr 2018 13:36:23 -0400
Subject: [Numpy-discussion] Adding a return value to np.random.shuffle
Message-ID: 

Would it break backwards compatibility to add the input as a return
value to np.random.shuffle? I doubt anyone out there is relying on the
None return value.

The change is trivial, and allows shuffling a new array in one line
instead of two:

    x = np.random.shuffle(np.array(some_junk))

I've implemented the change in PR#10893.
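For comparison, the copy-then-shuffle one-liner already exists today
via np.random.permutation, which returns a shuffled copy and leaves its
argument untouched (a sketch of current behaviour, separate from the
proposed change to shuffle itself):

```python
import numpy as np

some_junk = [3, 1, 4, 1, 5, 9]

# np.random.permutation returns a shuffled *copy* as a new array,
# so the original sequence is not modified in place:
x = np.random.permutation(some_junk)

print(sorted(x))   # same elements, [1, 1, 3, 4, 5, 9]
print(some_junk)   # original list unchanged: [3, 1, 4, 1, 5, 9]
```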
Regards,

- Joe
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From wieser.eric+numpy at gmail.com  Thu Apr 12 13:52:51 2018
From: wieser.eric+numpy at gmail.com (Eric Wieser)
Date: Thu, 12 Apr 2018 17:52:51 +0000
Subject: [Numpy-discussion] Adding a return value to np.random.shuffle
In-Reply-To: 
References: 
Message-ID: 

I'm against this change, because it:

- Is inconsistent with the builtin random.shuffle
- Makes it easy to fall into the trap of assuming that
  np.random.shuffle does not mutate its input

Eric

On Thu, 12 Apr 2018 at 10:37 Joseph Fox-Rabinovitz wrote:

> Would it break backwards compatibility to add the input as a return
> value to np.random.shuffle? I doubt anyone out there is relying on
> the None return value.
>
> The change is trivial, and allows shuffling a new array in one line
> instead of two:
>
>     x = np.random.shuffle(np.array(some_junk))
>
> I've implemented the change in PR#10893.
>
> Regards,
>
> - Joe
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sebastian at sipsolutions.net  Thu Apr 12 13:54:00 2018
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Thu, 12 Apr 2018 19:54:00 +0200
Subject: [Numpy-discussion] Adding a return value to np.random.shuffle
In-Reply-To: 
References: 
Message-ID: <2f44bf42df6d0b623ae87a032e1b1e4c13670373.camel@sipsolutions.net>

On Thu, 2018-04-12 at 13:36 -0400, Joseph Fox-Rabinovitz wrote:
> Would it break backwards compatibility to add the input as a return
> value to np.random.shuffle? I doubt anyone out there is relying on
> the None return value.

Well, python discourages this IIRC, and opts to not do these things for
in place functions (see random package specifically).
Numpy breaks this in a few places, but that is mostly because we have
the out argument as an optional input argument.

As is, it is a nice way of making people not write:

    new = np.random.shuffle(old)

and think old won't change. So I think we should probably just stick
with the python/Guido van Rossum ideals, or did those change?

- Sebastian

> The change is trivial, and allows shuffling a new array in one line
> instead of two:
>
>     x = np.random.shuffle(np.array(some_junk))
>
> I've implemented the change in PR#10893.
>
> Regards,
>
> - Joe
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: 

From jfoxrabinovitz at gmail.com  Thu Apr 12 14:00:58 2018
From: jfoxrabinovitz at gmail.com (Joseph Fox-Rabinovitz)
Date: Thu, 12 Apr 2018 14:00:58 -0400
Subject: [Numpy-discussion] Adding a return value to np.random.shuffle
In-Reply-To: <2f44bf42df6d0b623ae87a032e1b1e4c13670373.camel@sipsolutions.net>
References: <2f44bf42df6d0b623ae87a032e1b1e4c13670373.camel@sipsolutions.net>
Message-ID: 

Sounds good. I will close the PR.

- Joe

On Thu, Apr 12, 2018 at 1:54 PM, Sebastian Berg wrote:

> On Thu, 2018-04-12 at 13:36 -0400, Joseph Fox-Rabinovitz wrote:
> > Would it break backwards compatibility to add the input as a return
> > value to np.random.shuffle? I doubt anyone out there is relying on
> > the None return value.
>
> Well, python discourages this IIRC, and opts to not do these things
> for in place functions (see random package specifically). Numpy
> breaks this in a few places, but that is mostly because we have the
> out argument as an optional input argument.
> > As is, it is a nice way of making people not write: > > new = np.random.shuffle(old) > > and think old won't change. So I think we should probably just stick > with the python/Guido van Rossum ideals, or did those change? > > - Sebastian > > > > > The change is trivial, and allows shuffling a new array in one line > > instead of two: > > > > x = np.random.shuffle(np.array(some_junk)) > > > > I've implemented the change in PR#10893. > > > > Regards, > > > > - Joe > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Thu Apr 12 16:24:50 2018 From: alan.isaac at gmail.com (Alan Isaac) Date: Thu, 12 Apr 2018 16:24:50 -0400 Subject: [Numpy-discussion] Adding a return value to np.random.shuffle In-Reply-To: References: Message-ID: Some people consider that not to be Pythonic: https://mail.python.org/pipermail/python-dev/2003-October/038855.html Alan Isaac On 4/12/2018 1:36 PM, Joseph Fox-Rabinovitz wrote: > Would it break backwards compatibility to add the input as a return value to np.random.shuffle? I doubt anyone out there is relying on the None return value. > > The change is trivial, and allows shuffling a new array in one line instead of two: > > ??? x = np.random.shuffle(np.array(some_junk)) > > I've implemented the change in PR#10893. > From jfoxrabinovitz at gmail.com Thu Apr 12 16:36:32 2018 From: jfoxrabinovitz at gmail.com (Joseph Fox-Rabinovitz) Date: Thu, 12 Apr 2018 16:36:32 -0400 Subject: [Numpy-discussion] Adding a return value to np.random.shuffle In-Reply-To: References: Message-ID: Agreed. I closed the PR. 
- Joe On Thu, Apr 12, 2018 at 4:24 PM, Alan Isaac wrote: > Some people consider that not to be Pythonic: > https://mail.python.org/pipermail/python-dev/2003-October/038855.html > > Alan Isaac > > On 4/12/2018 1:36 PM, Joseph Fox-Rabinovitz wrote: > >> Would it break backwards compatibility to add the input as a return value >> to np.random.shuffle? I doubt anyone out there is relying on the None >> return value. >> >> The change is trivial, and allows shuffling a new array in one line >> instead of two: >> >> x = np.random.shuffle(np.array(some_junk)) >> >> I've implemented the change in PR#10893. >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanv at berkeley.edu Tue Apr 17 17:07:38 2018 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Tue, 17 Apr 2018 14:07:38 -0700 Subject: [Numpy-discussion] Introduction: NumPy developers at BIDS In-Reply-To: References: <20180406060824.65ldhpbynwqbmgek@carbo> <20180408071931.kohuvysr5sfcmfzl@carbo> <20180410165941.2ycpbriyzl2wif6e@carbo> Message-ID: <20180417210738.h7xjr2fscpxu6sxz@carbo> On Tue, 10 Apr 2018 10:03:06 -0700, Nathan Goldbaum wrote: > You may want to explore other venues for this sort of feedback, e.g. a > SciPy BoF session, which will capture a different subset of the > community. Thanks for the suggestion, Nathan. We are coordinating with SciPy2018 to have both a BoF and sprint at the end of the conference. On Tue, 10 Apr 2018 18:05:14 +0100, Matthew Brett wrote: > How about weekly open developer hangouts, recorded, to keep it all > public? Thanks for that idea, Matthew. While we are ramping up, there's a lot of noise in sorting things out. 
So how about we do dedicated monthly hangouts, where everyone can weigh in on the discussion, and where the signal-to-noise ratio is higher for interested parties? We are tracking all work items here on Trello: https://trello.com/b/Azg4fYZH/numpy-at-bids (Of course, a lot happens directly on NumPy issues too, but this board is for publicly tracking "work to support the work"). Best regards, St?fan From matthew.brett at gmail.com Wed Apr 18 11:42:49 2018 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 18 Apr 2018 16:42:49 +0100 Subject: [Numpy-discussion] Introduction: NumPy developers at BIDS In-Reply-To: <20180417210738.h7xjr2fscpxu6sxz@carbo> References: <20180406060824.65ldhpbynwqbmgek@carbo> <20180408071931.kohuvysr5sfcmfzl@carbo> <20180410165941.2ycpbriyzl2wif6e@carbo> <20180417210738.h7xjr2fscpxu6sxz@carbo> Message-ID: Hi St?fan, On Tue, Apr 17, 2018 at 10:07 PM, Stefan van der Walt wrote: > On Tue, 10 Apr 2018 10:03:06 -0700, Nathan Goldbaum wrote: >> You may want to explore other venues for this sort of feedback, e.g. a >> SciPy BoF session, which will capture a different subset of the >> community. > > Thanks for the suggestion, Nathan. We are coordinating with SciPy2018 > to have both a BoF and sprint at the end of the conference. > > On Tue, 10 Apr 2018 18:05:14 +0100, Matthew Brett wrote: >> How about weekly open developer hangouts, recorded, to keep it all >> public? > > Thanks for that idea, Matthew. While we are ramping up, there's a lot > of noise in sorting things out. So how about we do dedicated monthly > hangouts, where everyone can weigh in on the discussion, and where the > signal-to-noise ratio is higher for interested parties? > > We are tracking all work items here on Trello: > > https://trello.com/b/Azg4fYZH/numpy-at-bids > > (Of course, a lot happens directly on NumPy issues too, but this board > is for publicly tracking "work to support the work"). 
Hum - I see the Trello board is for bite- to meal- size practical issues, but not for the general process of how to engage the community in guiding the the project - is that fair? I was thinking about the engage community part, because it seems to me it would be good to spend time on that first, and if it was me, I think I'd go for more regular public meetings / discussions at this stage rather than less. I'm thinking now of Jarrod's / Brian's story about "more typing", I'm sure you know the one I mean :) Cheers, Matthew From stefanv at berkeley.edu Wed Apr 18 14:21:47 2018 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Wed, 18 Apr 2018 11:21:47 -0700 Subject: [Numpy-discussion] Introduction: NumPy developers at BIDS In-Reply-To: References: <20180406060824.65ldhpbynwqbmgek@carbo> <20180408071931.kohuvysr5sfcmfzl@carbo> <20180410165941.2ycpbriyzl2wif6e@carbo> <20180417210738.h7xjr2fscpxu6sxz@carbo> Message-ID: <20180418182147.yu2sntcsxzyatkkm@carbo> Hi Matthew, On Wed, 18 Apr 2018 16:42:49 +0100, Matthew Brett wrote: > Hum - I see the Trello board is for bite- to meal- size practical > issues, but not for the general process of how to engage the community > in guiding the the project - is that fair? That's correct; it's just another window onto local discussions / planning. > I was thinking about the engage community part, because it seems to me > it would be good to spend time on that first, and if it was me, I > think I'd go for more regular public meetings / discussions at this > stage rather than less. Right, so what do you think of the suggested monthly developer meeting, to start off with. Why don't we try it and see how much interest there is? 
Best regards, St?fan From matti.picus at gmail.com Thu Apr 19 17:21:44 2018 From: matti.picus at gmail.com (Matti Picus) Date: Fri, 20 Apr 2018 00:21:44 +0300 Subject: [Numpy-discussion] Introduction: NumPy developers at BIDS In-Reply-To: <20180418182147.yu2sntcsxzyatkkm@carbo> References: <20180406060824.65ldhpbynwqbmgek@carbo> <20180408071931.kohuvysr5sfcmfzl@carbo> <20180410165941.2ycpbriyzl2wif6e@carbo> <20180417210738.h7xjr2fscpxu6sxz@carbo> <20180418182147.yu2sntcsxzyatkkm@carbo> Message-ID: <6a9a8156-6d2d-8716-d0b1-5cb45643d506@gmail.com> On 18/04/18 21:21, Stefan van der Walt wrote: > Hi Matthew, > > On Wed, 18 Apr 2018 16:42:49 +0100, Matthew Brett wrote: >> I was thinking about the engage community part, because it seems to me >> it would be good to spend time on that first, and if it was me, I >> think I'd go for more regular public meetings / discussions at this >> stage rather than less. > Right, so what do you think of the suggested monthly developer meeting, > to start off with. Why don't we try it and see how much interest there > is? > > Best regards, > St?fan > Let's try holding a video conference on Wed April 25, noon to one Berkeley time. Details are on the Trello card here https://trello.com/c/mTcHBmqq . If there are particular topics you would like to bring up please add them as a comment on the card. 
Matti From snailandmail at gmail.com Sun Apr 22 07:29:36 2018 From: snailandmail at gmail.com (=?utf-8?B?0J3QuNC60LjRgtCwINCa0LDRgNGC0LDRiNC+0LI=?=) Date: Sun, 22 Apr 2018 12:29:36 +0100 Subject: [Numpy-discussion] NpyIter_MultiNew doesn't preserve the subclass of the array even without NPY_ITER_NO_SUBTYPE flag Message-ID: Hi everyone, @cournape and I hit a problem trying to fix this issue: https://github.com/numpy/numpy/issues/10933 In short, the C implementation for `where` depends on the array created by NpyIter_MultiNew and it seems that there is no way to get the subclass from the iterator (even when all of the initial arrays are subclasses). Here is the gist with reproduction steps: https://gist.github.com/nkartashov/6eb56f942a18c7ee237689f9c71a3b27 Could anyone help us with this problem? With regards, Nikita Kartashov -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Tue Apr 24 02:55:08 2018 From: matti.picus at gmail.com (Matti Picus) Date: Tue, 24 Apr 2018 09:55:08 +0300 Subject: [Numpy-discussion] NumPy sprint May 24-25 at BIDS and virtual meetup tomorrow Message-ID: We will take advantage of a few NumPy developers being at Berkeley to hold a two day sprint May 24-25 https://scisprints.github.io/#may-numpy-developer-sprint. Everyone is welcome, drop me a note if you are thinking of coming so we can estimate numbers. As previously announced, I am hosting a video conference open discussion Wed April 25 12:00-13:00 PDT at https://meet.google.com/bmv-fbob-ezp as part of a community building effort. If there are specific topics you wish to bring up, either contact me, add a line to the Trello card here https://trello.com/c/mTcHBmqq, or bring them up at the discussion. Thanks, Matti Picus . 
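[Editor's note: the subclass-dropping behaviour behind gh-10933 can be reproduced from pure Python, without touching the NpyIter C API. `MyArray` below is a hypothetical minimal subclass, used only for illustration:]

```python
import numpy as np

class MyArray(np.ndarray):
    """Trivial ndarray subclass, only for demonstration."""
    pass

a = np.arange(5).view(MyArray)
assert isinstance(a, MyArray)

# Even though both data arguments are MyArray instances,
# np.where hands back a plain ndarray: the subclass is lost
# inside the iterator-based C implementation.
result = np.where(a > 2, a, a)
assert type(result) is np.ndarray
```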
From charlesr.harris at gmail.com Tue Apr 24 10:07:04 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 24 Apr 2018 08:07:04 -0600 Subject: [Numpy-discussion] NumPy sprint May 24-25 at BIDS and virtual meetup tomorrow In-Reply-To: References: Message-ID: On Tue, Apr 24, 2018 at 12:55 AM, Matti Picus wrote: > We will take advantage of a few NumPy developers being at Berkeley to hold > a two day sprint May 24-25 https://scisprints.github.io/# > may-numpy-developer-sprint. > Everyone is welcome, drop me a note if you are thinking of coming so we > can estimate numbers. > > As previously announced, I am hosting a video conference open discussion > Wed April 25 12:00-13:00 PDT at https://meet.google.com/bmv-fbob-ezp as > part of a community building effort. > If there are specific topics you wish to bring up, either contact me, add > a line to the Trello card here https://trello.com/c/mTcHBmqq, or bring > them up at the discussion. > I'd like to add ci testing cleanup to the trello topics. It would be good to simplify the scripts -- maybe use runtest for most things -- and add Mac testing. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Tue Apr 24 13:53:07 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 24 Apr 2018 10:53:07 -0700 Subject: [Numpy-discussion] NumPy sprint May 24-25 at BIDS and virtual meetup tomorrow In-Reply-To: References: Message-ID: On Tue, Apr 24, 2018 at 7:07 AM, Charles R Harris wrote: > > > On Tue, Apr 24, 2018 at 12:55 AM, Matti Picus > wrote: > >> We will take advantage of a few NumPy developers being at Berkeley to >> hold a two day sprint May 24-25 https://scisprints.github.io/# >> may-numpy-developer-sprint. >> Everyone is welcome, drop me a note if you are thinking of coming so we >> can estimate numbers. 
>> >> As previously announced, I am hosting a video conference open discussion >> Wed April 25 12:00-13:00 PDT at https://meet.google.com/bmv-fbob-ezp as >> part of a community building effort. >> If there are specific topics you wish to bring up, either contact me, add >> a line to the Trello card here https://trello.com/c/mTcHBmqq, or bring >> them up at the discussion. >> > > I'd like to add ci testing cleanup to the trello topics. It would be good > to simplify the scripts -- maybe use runtest for most things -- > +1 > and add Mac testing. > macOS testing on TravisCI is often backing up, so rather than adding it to PRs I think it would be better to build a notification mechanism for failures for the macOS testing that we do already have on https://github.com/MacPython/numpy-wheels Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Thu Apr 26 00:56:25 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 25 Apr 2018 21:56:25 -0700 Subject: [Numpy-discussion] Changing the return type of np.histogramdd In-Reply-To: References: Message-ID: On Mon, Apr 9, 2018 at 10:24 PM, Eric Wieser wrote: > Numpy has three histogram functions - histogram, histogram2d, and > histogramdd. > > histogram is by far the most widely used, and in the absence of weights > and normalization, returns an np.intp count for each bin. > > histogramdd (for which histogram2d is a wrapper) returns np.float64 in > all circumstances. > > As a contrived comparison > > >>> x = np.linspace(0, 1)>>> h, e = np.histogram(x*x, bins=4); h > array([25, 10, 8, 7], dtype=int64)>>> h, e = np.histogramdd((x*x,), bins=4); h > array([25., 10., 8., 7.]) > > https://github.com/numpy/numpy/issues/7845 tracks this inconsistency. > > The fix is now trivial: the question is, will changing the return type > break people?s code? > > Either we should: > > 1. Just change it, and hope no one is broken by it > 2. 
Add a dtype argument: > - If dtype=None, behave like np.histogram > - If dtype is not specified, emit a future warning recommending to > use dtype=None or dtype=float > - In future, change the default to None > 3. Create a new better-named function histogram_nd, which can also be > created without the mistake that is https://github.com/numpy/ > numpy/issues/10864. > > Thoughts? > (1) sems like a no-go, taking such risks isn't justified by a minor inconsistency. (2) is still fairly intrusive, you're emitting warnings for everyone and still force people to change their code (and if they don't they may run into a backwards compat break). (3) is the best of these options, however is this really worth a new function? My vote would be "do nothing". Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Thu Apr 26 01:07:56 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Thu, 26 Apr 2018 05:07:56 +0000 Subject: [Numpy-discussion] Changing the return type of np.histogramdd In-Reply-To: References: Message-ID: what does that gain over having the user do something like result.astype() It means that the user can use integer weights without worrying about losing precision due to an intermediate float representation. It also means they can use higher precision values (np.longdouble) or complex weights. you?re emitting warnings for everyone When there?s a risk of precision loss, that seems like the responsible thing to do. Users passing float weights would see no warning, I suppose. is this really worth a new function There ought to be a function for computing histograms with integer weights that doesn?t lose precision. Either we change the existing function to do that, or we make a new function. A possible compromise: like 1, but only change the dtype of the result if a weights argument is passed. #10864 seems like a worrying design flaw too, but I suppose that can be dealt with separately. Eric ? 
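[Editor's note: the dtype mismatch under discussion is easy to observe directly — counts from `histogram` come back as an integer type, while `histogramdd` promotes to float64 even with no weights:]

```python
import numpy as np

x = np.linspace(0, 1)

# histogram returns integer counts when no weights are given...
h, _ = np.histogram(x * x, bins=4)
assert np.issubdtype(h.dtype, np.integer)

# ...but histogramdd (and therefore histogram2d) returns
# float64, even for plain unweighted counts.
hd, _ = np.histogramdd((x * x,), bins=4)
assert hd.dtype == np.float64

# The counts themselves agree; only the dtype differs.
assert np.array_equal(h, hd)
```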
On Wed, 25 Apr 2018 at 21:57 Ralf Gommers wrote: > On Mon, Apr 9, 2018 at 10:24 PM, Eric Wieser > wrote: > >> Numpy has three histogram functions - histogram, histogram2d, and >> histogramdd. >> >> histogram is by far the most widely used, and in the absence of weights >> and normalization, returns an np.intp count for each bin. >> >> histogramdd (for which histogram2d is a wrapper) returns np.float64 in >> all circumstances. >> >> As a contrived comparison >> >> >>> x = np.linspace(0, 1)>>> h, e = np.histogram(x*x, bins=4); h >> array([25, 10, 8, 7], dtype=int64)>>> h, e = np.histogramdd((x*x,), bins=4); h >> array([25., 10., 8., 7.]) >> >> https://github.com/numpy/numpy/issues/7845 tracks this inconsistency. >> >> The fix is now trivial: the question is, will changing the return type >> break people?s code? >> >> Either we should: >> >> 1. Just change it, and hope no one is broken by it >> 2. Add a dtype argument: >> - If dtype=None, behave like np.histogram >> - If dtype is not specified, emit a future warning recommending to >> use dtype=None or dtype=float >> - In future, change the default to None >> 3. Create a new better-named function histogram_nd, which can also be >> created without the mistake that is >> https://github.com/numpy/numpy/issues/10864. >> >> Thoughts? >> > > (1) sems like a no-go, taking such risks isn't justified by a minor > inconsistency. > > (2) is still fairly intrusive, you're emitting warnings for everyone and > still force people to change their code (and if they don't they may run > into a backwards compat break). > > (3) is the best of these options, however is this really worth a new > function? My vote would be "do nothing". > > Ralf > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at gmail.com Thu Apr 26 01:50:18 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 25 Apr 2018 22:50:18 -0700 Subject: [Numpy-discussion] Changing the return type of np.histogramdd In-Reply-To: References: Message-ID: On Wed, Apr 25, 2018 at 10:07 PM, Eric Wieser wrote: > what does that gain over having the user do something like result.astype() > > It means that the user can use integer weights without worrying about > losing precision due to an intermediate float representation. > > It also means they can use higher precision values (np.longdouble) or > complex weights. > None of that seems particularly important to be honest. you?re emitting warnings for everyone > > When there?s a risk of precision loss, that seems like the responsible > thing to do. > For precision loss of the order of float64 eps, I disagree. There will be many such places in numpy and in other core libraries. > Users passing float weights would see no warning, I suppose. > > is this really worth a new function > > There ought to be a function for computing histograms with integer weights > that doesn?t lose precision. Either we change the existing function to do > that, or we make a new function. > It's also possible to refer users to scipy.stats.binned_statistic(_2d/dd), which provides a superset of the histogram functionality and is internally consistent because the implementations of 1d/2d call the dd one. Ralf > A possible compromise: like 1, but only change the dtype of the result if > a weights argument is passed. > > #10864 seems like a > worrying design flaw too, but I suppose that can be dealt with separately. > > Eric > ? > > On Wed, 25 Apr 2018 at 21:57 Ralf Gommers wrote: > >> On Mon, Apr 9, 2018 at 10:24 PM, Eric Wieser > > wrote: >> >>> Numpy has three histogram functions - histogram, histogram2d, and >>> histogramdd. 
>>> >>> histogram is by far the most widely used, and in the absence of weights >>> and normalization, returns an np.intp count for each bin. >>> >>> histogramdd (for which histogram2d is a wrapper) returns np.float64 in >>> all circumstances. >>> >>> As a contrived comparison >>> >>> >>> x = np.linspace(0, 1)>>> h, e = np.histogram(x*x, bins=4); h >>> array([25, 10, 8, 7], dtype=int64)>>> h, e = np.histogramdd((x*x,), bins=4); h >>> array([25., 10., 8., 7.]) >>> >>> https://github.com/numpy/numpy/issues/7845 tracks this inconsistency. >>> >>> The fix is now trivial: the question is, will changing the return type >>> break people?s code? >>> >>> Either we should: >>> >>> 1. Just change it, and hope no one is broken by it >>> 2. Add a dtype argument: >>> - If dtype=None, behave like np.histogram >>> - If dtype is not specified, emit a future warning recommending >>> to use dtype=None or dtype=float >>> - In future, change the default to None >>> 3. Create a new better-named function histogram_nd, which can also >>> be created without the mistake that is https://github.com/numpy/ >>> numpy/issues/10864. >>> >>> Thoughts? >>> >> >> (1) sems like a no-go, taking such risks isn't justified by a minor >> inconsistency. >> >> (2) is still fairly intrusive, you're emitting warnings for everyone and >> still force people to change their code (and if they don't they may run >> into a backwards compat break). >> >> (3) is the best of these options, however is this really worth a new >> function? My vote would be "do nothing". >> >> Ralf >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wieser.eric+numpy at gmail.com Thu Apr 26 02:00:01 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Thu, 26 Apr 2018 06:00:01 +0000 Subject: [Numpy-discussion] Changing the return type of np.histogramdd In-Reply-To: References: Message-ID: For precision loss of the order of float64 eps, I disagree. I was thinking more about precision loss on the order of 1, for large 64-bit integers that can?t fit in a float64 Note also that #10864 incurs deliberate precision loss of the order 10**-6 x smallest bin, which is also much larger than eps. It?s also possible to refer users to scipy.stats.binned_statistic That sounds like a good idea to do irrespective of whether histogramdd has problems - I had no idea those existed. Is there a precedent for referring to more feature-rich scipy functions from the basic numpy ones? ? On Wed, 25 Apr 2018 at 22:51 Ralf Gommers wrote: > On Wed, Apr 25, 2018 at 10:07 PM, Eric Wieser > wrote: > >> what does that gain over having the user do something like result.astype() >> >> It means that the user can use integer weights without worrying about >> losing precision due to an intermediate float representation. >> >> It also means they can use higher precision values (np.longdouble) or >> complex weights. >> > None of that seems particularly important to be honest. > > you?re emitting warnings for everyone >> >> When there?s a risk of precision loss, that seems like the responsible >> thing to do. >> > For precision loss of the order of float64 eps, I disagree. There will be > many such places in numpy and in other core libraries. > > >> Users passing float weights would see no warning, I suppose. >> >> is this really worth a new function >> >> There ought to be a function for computing histograms with integer >> weights that doesn?t lose precision. Either we change the existing function >> to do that, or we make a new function. 
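[Editor's note: the order-1 loss Eric describes is concrete for integers above 2**53, the point past which float64 can no longer represent every int64 exactly. A quick check, assuming a standard IEEE-754 double:]

```python
import numpy as np

# float64 has a 53-bit significand, so 2**53 + 1 is the first
# positive integer it cannot represent exactly.
big = 2**53 + 1

# Round-tripping through float64 silently rounds to 2**53
# (round-half-to-even): an absolute error of 1, far larger
# than float64 eps relative to the value's neighbours in int64.
assert int(np.float64(big)) == 2**53
assert int(np.float64(big)) != big
```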
>> > It's also possible to refer users to scipy.stats.binned_statistic(_2d/dd), > which provides a superset of the histogram functionality and is internally > consistent because the implementations of 1d/2d call the dd one. > > Ralf > > > >> A possible compromise: like 1, but only change the dtype of the result if >> a weights argument is passed. >> >> #10864 seems like a >> worrying design flaw too, but I suppose that can be dealt with separately. >> >> Eric >> ? >> >> On Wed, 25 Apr 2018 at 21:57 Ralf Gommers wrote: >> >>> On Mon, Apr 9, 2018 at 10:24 PM, Eric Wieser < >>> wieser.eric+numpy at gmail.com> wrote: >>> >>>> Numpy has three histogram functions - histogram, histogram2d, and >>>> histogramdd. >>>> >>>> histogram is by far the most widely used, and in the absence of >>>> weights and normalization, returns an np.intp count for each bin. >>>> >>>> histogramdd (for which histogram2d is a wrapper) returns np.float64 in >>>> all circumstances. >>>> >>>> As a contrived comparison >>>> >>>> >>> x = np.linspace(0, 1)>>> h, e = np.histogram(x*x, bins=4); h >>>> array([25, 10, 8, 7], dtype=int64)>>> h, e = np.histogramdd((x*x,), bins=4); h >>>> array([25., 10., 8., 7.]) >>>> >>>> https://github.com/numpy/numpy/issues/7845 tracks this inconsistency. >>>> >>>> The fix is now trivial: the question is, will changing the return type >>>> break people?s code? >>>> >>>> Either we should: >>>> >>>> 1. Just change it, and hope no one is broken by it >>>> 2. Add a dtype argument: >>>> - If dtype=None, behave like np.histogram >>>> - If dtype is not specified, emit a future warning recommending >>>> to use dtype=None or dtype=float >>>> - In future, change the default to None >>>> 3. Create a new better-named function histogram_nd, which can also >>>> be created without the mistake that is >>>> https://github.com/numpy/numpy/issues/10864. >>>> >>>> Thoughts? >>>> >>> >>> (1) sems like a no-go, taking such risks isn't justified by a minor >>> inconsistency. 
>>> >>> (2) is still fairly intrusive, you're emitting warnings for everyone and >>> still force people to change their code (and if they don't they may run >>> into a backwards compat break). >>> >>> (3) is the best of these options, however is this really worth a new >>> function? My vote would be "do nothing". >>> >>> Ralf >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From corinhoad at gmail.com Thu Apr 26 11:44:11 2018 From: corinhoad at gmail.com (Corin Hoad) Date: Thu, 26 Apr 2018 15:44:11 +0000 Subject: [Numpy-discussion] Adding fweights and aweights to numpy.corrcoef Message-ID: Hello, Would it be possible to add the fweights and aweights keyword arguments from np.cov to np.corrcoef? They would retain their meaning from np.cov as frequency- or importance-based weightings respectively. Yours, Corin Hoad -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Thu Apr 26 11:59:57 2018 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 26 Apr 2018 17:59:57 +0200 Subject: [Numpy-discussion] Adding fweights and aweights to numpy.corrcoef In-Reply-To: References: Message-ID: <2897fa67e5db6b0e6c8b00b6c09f6b12b234a62b.camel@sipsolutions.net> I seem to recall that there was a discussion on this and it was a lot trickier then expected. I think statsmodels might have options in this direction. 
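[Editor's note: until such keywords exist, a weighted correlation matrix can be derived from `np.cov` the same way `corrcoef` normalizes an unweighted covariance. A sketch — the data and weights here are made up purely for illustration:]

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 1.0, 4.0, 3.0]])
fw = np.array([1, 2, 2, 1])   # integer frequency weights

# np.cov already accepts fweights/aweights; normalize its output
# by the per-variable standard deviations to get correlations.
c = np.cov(x, fweights=fw)
d = np.sqrt(np.diag(c))
corr = c / np.outer(d, d)

assert np.allclose(np.diag(corr), 1.0)
assert np.allclose(corr, corr.T)
```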
- Sebastian On Thu, 2018-04-26 at 15:44 +0000, Corin Hoad wrote: > Hello, > > Would it be possible to add the fweights and aweights keyword > arguments from np.cov to np.corrcoef? They would retain their meaning > from np.cov as frequency- or importance-based weightings > respectively. > > Yours, > Corin Hoad > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From nathan12343 at gmail.com Thu Apr 26 12:45:12 2018 From: nathan12343 at gmail.com (Nathan Goldbaum) Date: Thu, 26 Apr 2018 16:45:12 +0000 Subject: [Numpy-discussion] Short-circuiting equivalent of np.any or np.all? Message-ID: Hi all, I was surprised recently to discover that both np.any and np.all() do not have a way to exit early: In [1]: import numpy as np In [2]: data = np.arange(1e6) In [3]: print(data[:10]) [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.] In [4]: %timeit np.any(data) 724 us +- 42.4 us per loop (mean +- std. dev. of 7 runs, 1000 loops each) In [5]: data = np.zeros(int(1e6)) In [6]: %timeit np.any(data) 732 us +- 52.9 us per loop (mean +- std. dev. of 7 runs, 1000 loops each) I don't see any discussions about this on the NumPy issue tracker but perhaps I'm missing something. I'm curious if there's a way to get a fast early-terminating search in NumPy? Perhaps there's another package I can depend on that does this? I guess I could also write a bit of cython code that does this but so far this project is pure python and I don't want to deal with the packaging headache of getting wheels built and conda-forge packages set up on all platforms. Thanks for your help! -Nathan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From einstein.edison at gmail.com Thu Apr 26 12:51:20 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Thu, 26 Apr 2018 09:51:20 -0700 Subject: [Numpy-discussion] Short-circuiting equivalent of np.any or np.all? In-Reply-To: References: Message-ID: Hi Nathan, np.any and np.all call np.or.reduce and np.and.reduce respectively, and unfortunately the underlying function (ufunc.reduce) has no way of detecting that the value isn?t going to change anymore. It?s also used for (for example) np.sum (np.add.reduce), np.prod (np.multiply.reduce), np.min(np.minimum.reduce), np.max(np.maximum.reduce). You can find more information about this on the ufunc doc page . I don?t think it?s worth it to break this machinery for any and all, as it has numerous other advantages (such as being able to override in duck arrays, etc) Best regards, Hameer Abbasi Sent from Astro for Mac On Apr 26, 2018 at 18:45, Nathan Goldbaum wrote: Hi all, I was surprised recently to discover that both np.any and np.all() do not have a way to exit early: In [1]: import numpy as np In [2]: data = np.arange(1e6) In [3]: print(data[:10]) [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.] In [4]: %timeit np.any(data) 724 us +- 42.4 us per loop (mean +- std. dev. of 7 runs, 1000 loops each) In [5]: data = np.zeros(int(1e6)) In [6]: %timeit np.any(data) 732 us +- 52.9 us per loop (mean +- std. dev. of 7 runs, 1000 loops each) I don't see any discussions about this on the NumPy issue tracker but perhaps I'm missing something. I'm curious if there's a way to get a fast early-terminating search in NumPy? Perhaps there's another package I can depend on that does this? I guess I could also write a bit of cython code that does this but so far this project is pure python and I don't want to deal with the packaging headache of getting wheels built and conda-forge packages set up on all platforms. Thanks for your help! 
-Nathan _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From nathan12343 at gmail.com Thu Apr 26 12:58:05 2018 From: nathan12343 at gmail.com (Nathan Goldbaum) Date: Thu, 26 Apr 2018 16:58:05 +0000 Subject: [Numpy-discussion] Short-circuiting equivalent of np.any or np.all? In-Reply-To: References: Message-ID: On Thu, Apr 26, 2018 at 11:52 AM Hameer Abbasi wrote: > Hi Nathan, > > np.any and np.all call np.or.reduce and np.and.reduce respectively, and > unfortunately the underlying function (ufunc.reduce) has no way of > detecting that the value isn?t going to change anymore. It?s also used for > (for example) np.sum (np.add.reduce), np.prod (np.multiply.reduce), > np.min(np.minimum.reduce), np.max(np.maximum.reduce). > > You can find more information about this on the ufunc doc page > . I don?t think > it?s worth it to break this machinery for any and all, as it has numerous > other advantages (such as being able to override in duck arrays, etc) > Sure, I'm not saying that numpy should change, more trying to see if there's an alternate way to get what I want in NumPy or some other package. > > Best regards, > Hameer Abbasi > Sent from Astro for Mac > > On Apr 26, 2018 at 18:45, Nathan Goldbaum wrote: > > > Hi all, > > I was surprised recently to discover that both np.any and np.all() do not > have a way to exit early: > > In [1]: import numpy as np > > In [2]: data = np.arange(1e6) > > In [3]: print(data[:10]) > [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.] > > In [4]: %timeit np.any(data) > 724 us +- 42.4 us per loop (mean +- std. dev. of 7 runs, 1000 loops each) > > In [5]: data = np.zeros(int(1e6)) > > In [6]: %timeit np.any(data) > 732 us +- 52.9 us per loop (mean +- std. dev. 
of 7 runs, 1000 loops each) > > I don't see any discussions about this on the NumPy issue tracker but > perhaps I'm missing something. > > I'm curious if there's a way to get a fast early-terminating search in > NumPy? Perhaps there's another package I can depend on that does this? I > guess I could also write a bit of cython code that does this but so far > this project is pure python and I don't want to deal with the packaging > headache of getting wheels built and conda-forge packages set up on all > platforms. > > Thanks for your help! > > -Nathan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jfoxrabinovitz at gmail.com Thu Apr 26 13:00:18 2018 From: jfoxrabinovitz at gmail.com (Joseph Fox-Rabinovitz) Date: Thu, 26 Apr 2018 13:00:18 -0400 Subject: [Numpy-discussion] Short-circuiting equivalent of np.any or np.all? In-Reply-To: References: Message-ID: Would it be useful to have a short-circuited version of the function that is not a ufunc? - Joe On Thu, Apr 26, 2018 at 12:51 PM, Hameer Abbasi wrote: > Hi Nathan, > > np.any and np.all call np.or.reduce and np.and.reduce respectively, and > unfortunately the underlying function (ufunc.reduce) has no way of > detecting that the value isn?t going to change anymore. It?s also used for > (for example) np.sum (np.add.reduce), np.prod (np.multiply.reduce), > np.min(np.minimum.reduce), np.max(np.maximum.reduce). > > You can find more information about this on the ufunc doc page > . 
I don?t think > it?s worth it to break this machinery for any and all, as it has numerous > other advantages (such as being able to override in duck arrays, etc) > > Best regards, > Hameer Abbasi > Sent from Astro for Mac > > On Apr 26, 2018 at 18:45, Nathan Goldbaum wrote: > > > Hi all, > > I was surprised recently to discover that both np.any and np.all() do not > have a way to exit early: > > In [1]: import numpy as np > > In [2]: data = np.arange(1e6) > > In [3]: print(data[:10]) > [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.] > > In [4]: %timeit np.any(data) > 724 us +- 42.4 us per loop (mean +- std. dev. of 7 runs, 1000 loops each) > > In [5]: data = np.zeros(int(1e6)) > > In [6]: %timeit np.any(data) > 732 us +- 52.9 us per loop (mean +- std. dev. of 7 runs, 1000 loops each) > > I don't see any discussions about this on the NumPy issue tracker but > perhaps I'm missing something. > > I'm curious if there's a way to get a fast early-terminating search in > NumPy? Perhaps there's another package I can depend on that does this? I > guess I could also write a bit of cython code that does this but so far > this project is pure python and I don't want to deal with the packaging > headache of getting wheels built and conda-forge packages set up on all > platforms. > > Thanks for your help! > > -Nathan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From einstein.edison at gmail.com Thu Apr 26 13:01:19 2018 From: einstein.edison at gmail.com (Hameer Abbasi) Date: Thu, 26 Apr 2018 10:01:19 -0700 Subject: [Numpy-discussion] Short-circuiting equivalent of np.any or np.all? 
In-Reply-To: References: Message-ID: Ah, in that case, if exotic platforms aren?t important for you, Numba can do the trick quite well. Best regards, Hameer Abbasi Sent from Astro for Mac On Apr 26, 2018 at 18:58, Nathan Goldbaum wrote: On Thu, Apr 26, 2018 at 11:52 AM Hameer Abbasi wrote: > Hi Nathan, > > np.any and np.all call np.or.reduce and np.and.reduce respectively, and > unfortunately the underlying function (ufunc.reduce) has no way of > detecting that the value isn?t going to change anymore. It?s also used for > (for example) np.sum (np.add.reduce), np.prod (np.multiply.reduce), > np.min(np.minimum.reduce), np.max(np.maximum.reduce). > > You can find more information about this on the ufunc doc page > . I don?t think > it?s worth it to break this machinery for any and all, as it has numerous > other advantages (such as being able to override in duck arrays, etc) > Sure, I'm not saying that numpy should change, more trying to see if there's an alternate way to get what I want in NumPy or some other package. > > Best regards, > Hameer Abbasi > Sent from Astro for Mac > > On Apr 26, 2018 at 18:45, Nathan Goldbaum wrote: > > > Hi all, > > I was surprised recently to discover that both np.any and np.all() do not > have a way to exit early: > > In [1]: import numpy as np > > In [2]: data = np.arange(1e6) > > In [3]: print(data[:10]) > [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.] > > In [4]: %timeit np.any(data) > 724 us +- 42.4 us per loop (mean +- std. dev. of 7 runs, 1000 loops each) > > In [5]: data = np.zeros(int(1e6)) > > In [6]: %timeit np.any(data) > 732 us +- 52.9 us per loop (mean +- std. dev. of 7 runs, 1000 loops each) > > I don't see any discussions about this on the NumPy issue tracker but > perhaps I'm missing something. > > I'm curious if there's a way to get a fast early-terminating search in > NumPy? Perhaps there's another package I can depend on that does this? 
I > guess I could also write a bit of cython code that does this but so far > this project is pure python and I don't want to deal with the packaging > headache of getting wheels built and conda-forge packages set up on all > platforms. > > Thanks for your help! > > -Nathan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at python.org https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From nathan12343 at gmail.com Thu Apr 26 13:19:19 2018 From: nathan12343 at gmail.com (Nathan Goldbaum) Date: Thu, 26 Apr 2018 17:19:19 +0000 Subject: [Numpy-discussion] Short-circuiting equivalent of np.any or np.all? In-Reply-To: References: Message-ID: On Thu, Apr 26, 2018 at 12:03 PM Joseph Fox-Rabinovitz < jfoxrabinovitz at gmail.com> wrote: > Would it be useful to have a short-circuited version of the function that > is not a ufunc? > Yes definitely. I could use numba as suggested by Hameer but I'd rather not add a new runtime dependency. I could use cython or C but I'd need to deal with the packaging headaches of including C code in your package. I guess I could also create a new project that just implements the functions I need in cython, deal with the packaging headaches there, and then depend on that package. 
At least that way others won't need to deal with the pain :) > - Joe > > On Thu, Apr 26, 2018 at 12:51 PM, Hameer Abbasi > wrote: > >> Hi Nathan, >> >> np.any and np.all call np.or.reduce and np.and.reduce respectively, and >> unfortunately the underlying function (ufunc.reduce) has no way of >> detecting that the value isn?t going to change anymore. It?s also used for >> (for example) np.sum (np.add.reduce), np.prod (np.multiply.reduce), >> np.min(np.minimum.reduce), np.max(np.maximum.reduce). >> >> You can find more information about this on the ufunc doc page >> . I don?t think >> it?s worth it to break this machinery for any and all, as it has numerous >> other advantages (such as being able to override in duck arrays, etc) >> >> Best regards, >> Hameer Abbasi >> Sent from Astro for Mac >> >> On Apr 26, 2018 at 18:45, Nathan Goldbaum wrote: >> >> >> Hi all, >> >> I was surprised recently to discover that both np.any and np.all() do not >> have a way to exit early: >> >> In [1]: import numpy as np >> >> In [2]: data = np.arange(1e6) >> >> In [3]: print(data[:10]) >> [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.] >> >> In [4]: %timeit np.any(data) >> 724 us +- 42.4 us per loop (mean +- std. dev. of 7 runs, 1000 loops each) >> >> In [5]: data = np.zeros(int(1e6)) >> >> In [6]: %timeit np.any(data) >> 732 us +- 52.9 us per loop (mean +- std. dev. of 7 runs, 1000 loops each) >> >> I don't see any discussions about this on the NumPy issue tracker but >> perhaps I'm missing something. >> >> I'm curious if there's a way to get a fast early-terminating search in >> NumPy? Perhaps there's another package I can depend on that does this? I >> guess I could also write a bit of cython code that does this but so far >> this project is pure python and I don't want to deal with the packaging >> headache of getting wheels built and conda-forge packages set up on all >> platforms. >> >> Thanks for your help! 
>> >> -Nathan >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Thu Apr 26 13:26:53 2018 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 26 Apr 2018 19:26:53 +0200 Subject: [Numpy-discussion] Short-circuiting equivalent of np.any or np.all? In-Reply-To: References: Message-ID: <3cdd1c62656596c486ac0e4d741d4ff326f56586.camel@sipsolutions.net> On Thu, 2018-04-26 at 09:51 -0700, Hameer Abbasi wrote: > Hi Nathan, > > np.any and np.all call np.or.reduce and np.and.reduce respectively, > and unfortunately the underlying function (ufunc.reduce) has no way > of detecting that the value isn?t going to change anymore. It?s also > used for (for example) np.sum (np.add.reduce), np.prod > (np.multiply.reduce), np.min(np.minimum.reduce), > np.max(np.maximum.reduce). I would like to point out that this is not almost, but not quite true. The boolean versions will short circuit on the innermost level, which is good enough for all practical purposes probably. One way to get around it would be to use a chunked iteration using np.nditer in pure python. I admit it is a bit tricky to get start on, but it is basically what numexpr uses also (at least in the simplest mode), and if your arrays are relatively large, there is likely no real performance hit compared to a non-pure python version. - Sebastian > > You can find more information about this on the ufunc doc page. 
I > don?t think it?s worth it to break this machinery for any and all, as > it has numerous other advantages (such as being able to override in > duck arrays, etc) > > Best regards, > Hameer Abbasi > Sent from Astro for Mac > > > On Apr 26, 2018 at 18:45, Nathan Goldbaum > > wrote: > > > > Hi all, > > > > I was surprised recently to discover that both np.any and np.all() > > do not have a way to exit early: > > > > In [1]: import numpy as np > > > > In [2]: data = np.arange(1e6) > > > > In [3]: print(data[:10]) > > [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.] > > > > In [4]: %timeit np.any(data) > > 724 us +- 42.4 us per loop (mean +- std. dev. of 7 runs, 1000 loops > > each) > > > > In [5]: data = np.zeros(int(1e6)) > > > > In [6]: %timeit np.any(data) > > 732 us +- 52.9 us per loop (mean +- std. dev. of 7 runs, 1000 loops > > each) > > > > I don't see any discussions about this on the NumPy issue tracker > > but perhaps I'm missing something. > > > > I'm curious if there's a way to get a fast early-terminating search > > in NumPy? Perhaps there's another package I can depend on that does > > this? I guess I could also write a bit of cython code that does > > this but so far this project is pure python and I don't want to > > deal with the packaging headache of getting wheels built and conda- > > forge packages set up on all platforms. > > > > Thanks for your help! > > > > -Nathan > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From m.h.vankerkwijk at gmail.com Thu Apr 26 15:04:12 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Thu, 26 Apr 2018 15:04:12 -0400 Subject: [Numpy-discussion] Short-circuiting equivalent of np.any or np.all? In-Reply-To: <3cdd1c62656596c486ac0e4d741d4ff326f56586.camel@sipsolutions.net> References: <3cdd1c62656596c486ac0e4d741d4ff326f56586.camel@sipsolutions.net> Message-ID: For a lot more discussion, and a possible solution, see https://github.com/numpy/numpy/pull/8528 From allanhaldane at gmail.com Thu Apr 26 15:21:18 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Thu, 26 Apr 2018 15:21:18 -0400 Subject: [Numpy-discussion] Short-circuiting equivalent of np.any or np.all? In-Reply-To: References: Message-ID: <7769f3ce-a46a-392a-ca03-ca72127ede8e@gmail.com> On 04/26/2018 12:45 PM, Nathan Goldbaum wrote: > I'm curious if there's a way to get a fast early-terminating search in > NumPy? Perhaps there's another package I can depend on that does this? I > guess I could also write a bit of cython code that does this but so far > this project is pure python and I don't want to deal with the packaging > headache of getting wheels built and conda-forge packages set up on all > platforms. > > Thanks for your help! > > -Nathan A current PR that implements short-circuiting for "all"-like operations is: https://github.com/numpy/numpy/pull/8528 Actually, I have a little dream that we will be able to implement this kind of short-circuiting more generally in numpy soon, following the idea in that PR of turning functions into gufuncs. We just need to add some finishing touches on the gufunc implementation first. We are almost there - the one important feature gufuncs are still missing is support for "multiple axis" arguments. See https://github.com/numpy/numpy/issues/8810. 
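[Editorial note: the reduce machinery referred to throughout this thread can be checked directly. A minimal sketch follows; note the actual ufunc names are np.logical_or and np.logical_and (there is no np.or or np.and ufunc), and the equivalences below are only the rough correspondence the thread describes, not the exact internal call path.]

```python
import numpy as np

data = np.array([0.0, 3.0, 0.0, 7.0])

# Each convenience reduction behaves like the corresponding ufunc's .reduce.
assert np.any(data) == np.logical_or.reduce(data)
assert np.all(data) == np.logical_and.reduce(data)
assert np.sum(data) == np.add.reduce(data)
assert np.prod(data) == np.multiply.reduce(data)
assert np.min(data) == np.minimum.reduce(data)
assert np.max(data) == np.maximum.reduce(data)
```

Because all of these funnel through the same ufunc.reduce loop, none of them can exit early, which is the limitation the thread is discussing.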
Once that is done I also think there are some other new and useful short-circuiting gufuncs we could add, like "count" and "first". See some comments: https://github.com/numpy/numpy/pull/8528#issuecomment-365358119 I am imagining we will end up with a "gufunc ecosystem", where there are some core ufuncs like np.add, np.multiply, np.less_than, and then a bunch of "associated" gufuncs for each of these, like reduce, first, all, accessible as attributes of the core ufunc. (It has long been vaguely planned to turn "reduce" into a gufunc too, according to comments in the code. I'm excited for when that can happen!) Allan From sebastian at sipsolutions.net Thu Apr 26 18:13:40 2018 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 27 Apr 2018 00:13:40 +0200 Subject: [Numpy-discussion] Short-circuiting equivalent of np.any or np.all? In-Reply-To: <3cdd1c62656596c486ac0e4d741d4ff326f56586.camel@sipsolutions.net> References: <3cdd1c62656596c486ac0e4d741d4ff326f56586.camel@sipsolutions.net> Message-ID: On Thu, 2018-04-26 at 19:26 +0200, Sebastian Berg wrote: > On Thu, 2018-04-26 at 09:51 -0700, Hameer Abbasi wrote: > > Hi Nathan, > > > > np.any and np.all call np.or.reduce and np.and.reduce respectively, > > and unfortunately the underlying function (ufunc.reduce) has no way > > of detecting that the value isn?t going to change anymore. It?s > > also > > used for (for example) np.sum (np.add.reduce), np.prod > > (np.multiply.reduce), np.min(np.minimum.reduce), > > np.max(np.maximum.reduce). > > > I would like to point out that this is not almost, but not quite > true. > The boolean versions will short circuit on the innermost level, which > is good enough for all practical purposes probably. > > One way to get around it would be to use a chunked iteration using > np.nditer in pure python. 
> I admit it is a bit tricky to get started on,
> but it is basically what numexpr uses also (at least in the simplest
> mode), and if your arrays are relatively large, there is likely no
> real performance hit compared to a non-pure python version.
>

I mean something like this:

def check_any(arr, func=lambda x: x, buffersize=0):
    """
    Check if the function is true for any value in arr
    and stop once the first was found.

    Parameters
    ----------
    arr : ndarray
        Array to test.
    func : function
        Function taking a 1D array as argument and returning an array
        (on which ``np.any`` will be called).
    buffersize : int
        Size of the chunk/buffer in the iteration, zero will use the
        default numpy value.

    Notes
    -----
    The stopping does not occur immediately but in buffersize chunks.
    """
    iterflags = ['buffered', 'external_loop', 'refs_ok', 'zerosize_ok']
    for chunk in np.nditer((arr,), flags=iterflags, buffersize=buffersize):
        if np.any(func(chunk)):
            return True
    return False

Not sure how it performs actually, but you can give it a try, especially if you know you have large arrays, or if "func" is pretty expensive. If the input is already bool, it will be quite a bit slower though I am sure.

- Sebastian

> - Sebastian
>
> > You can find more information about this on the ufunc doc page. I
> > don't think it's worth it to break this machinery for any and all,
> > as it has numerous other advantages (such as being able to override
> > in duck arrays, etc)
> >
> > Best regards,
> > Hameer Abbasi
> > Sent from Astro for Mac
> >
> > > On Apr 26, 2018 at 18:45, Nathan Goldbaum wrote:
> > >
> > > Hi all,
> > >
> > > I was surprised recently to discover that both np.any and
> > > np.all() do not have a way to exit early:
> > >
> > > In [1]: import numpy as np
> > >
> > > In [2]: data = np.arange(1e6)
> > >
> > > In [3]: print(data[:10])
> > > [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
> > >
> > > In [4]: %timeit np.any(data)
> > > 724 us +- 42.4 us per loop (mean +- std. dev.
of 7 runs, 1000 > > > loops > > > each) > > > > > > In [5]: data = np.zeros(int(1e6)) > > > > > > In [6]: %timeit np.any(data) > > > 732 us +- 52.9 us per loop (mean +- std. dev. of 7 runs, 1000 > > > loops > > > each) > > > > > > I don't see any discussions about this on the NumPy issue tracker > > > but perhaps I'm missing something. > > > > > > I'm curious if there's a way to get a fast early-terminating > > > search > > > in NumPy? Perhaps there's another package I can depend on that > > > does > > > this? I guess I could also write a bit of cython code that does > > > this but so far this project is pure python and I don't want to > > > deal with the packaging headache of getting wheels built and > > > conda- > > > forge packages set up on all platforms. > > > > > > Thanks for your help! > > > > > > -Nathan > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: 

From josef.pktd at gmail.com  Thu Apr 26 18:43:33 2018
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 26 Apr 2018 18:43:33 -0400
Subject: [Numpy-discussion] Adding fweights and aweights to numpy.corrcoef
In-Reply-To: <2897fa67e5db6b0e6c8b00b6c09f6b12b234a62b.camel@sipsolutions.net>
References: <2897fa67e5db6b0e6c8b00b6c09f6b12b234a62b.camel@sipsolutions.net>
Message-ID: 

On Thu, Apr 26, 2018 at 11:59 AM, Sebastian Berg wrote:

> I seem to recall that there was a discussion on this and it was a lot
> trickier then expected.
>

But given that numpy has the weights already for cov, I don't see any additional issues with adding it also to corrcoef. corrcoef is just rescaling the cov, so there is nothing special to add except that corrcoef hands off the options to cov.

> I think statsmodels might have options in this direction.
>

statsmodels still has only fweights (case weights) for covariance and correlation.

Josef

> - Sebastian
>
> On Thu, 2018-04-26 at 15:44 +0000, Corin Hoad wrote:
> > Hello,
> >
> > Would it be possible to add the fweights and aweights keyword
> > arguments from np.cov to np.corrcoef? They would retain their meaning
> > from np.cov as frequency- or importance-based weightings
> > respectively.
> >
> > Yours,
> > Corin Hoad
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From corinhoad at gmail.com Fri Apr 27 05:44:47 2018 From: corinhoad at gmail.com (Corin Hoad) Date: Fri, 27 Apr 2018 10:44:47 +0100 Subject: [Numpy-discussion] Adding fweights and aweights to numpy.corrcoef In-Reply-To: References: <2897fa67e5db6b0e6c8b00b6c09f6b12b234a62b.camel@sipsolutions.net> Message-ID: > > I seem to recall that there was a discussion on this and it was a lot >> trickier then expected. >> > > But given that numpy has the weights already for cov, then I don't see > any additional issues > whith adding it also to corrcoef. > > corrcoef is just rescaling the cov, so there is nothing special to add > except that corrcoef hands off the options to cov. > This was my understanding. I am currently just using my own copy of corrcoef which forwards the aweights and fweights arguments directly to np.cov. Is this the correct approach? Corin Hoad -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Apr 28 01:25:36 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 27 Apr 2018 22:25:36 -0700 Subject: [Numpy-discussion] Changing the return type of np.histogramdd In-Reply-To: References: Message-ID: On Wed, Apr 25, 2018 at 11:00 PM, Eric Wieser wrote: > For precision loss of the order of float64 eps, I disagree. > > I was thinking more about precision loss on the order of 1, for large > 64-bit integers that can?t fit in a float64 > It's late and I'm probably missing something, but: >>> np.iinfo(np.int64).max > np.finfo(np.float64).max False Either way, such weights don't really happen in real code I think. > Note also that #10864 > incurs deliberate precision loss of the order 10**-6 x smallest bin, which > is also much larger than eps. > Yeah that's worse. > It?s also possible to refer users to scipy.stats.binned_statistic > > That sounds like a good idea to do irrespective of whether histogramdd has > problems - I had no idea those existed. 
Is there a precedent for referring > to more feature-rich scipy functions from the basic numpy ones? > Yes, there are cross-links to Python, SciPy and Matplotlib functions in the docs. This is done with intersphinx ( https://github.com/numpy/numpy/blob/master/doc/source/conf.py#L215). Example cross-link for convolve: https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.convolve.html Ralf > ? > > On Wed, 25 Apr 2018 at 22:51 Ralf Gommers wrote: > >> On Wed, Apr 25, 2018 at 10:07 PM, Eric Wieser < >> wieser.eric+numpy at gmail.com> wrote: >> >>> what does that gain over having the user do something like >>> result.astype() >>> >>> It means that the user can use integer weights without worrying about >>> losing precision due to an intermediate float representation. >>> >>> It also means they can use higher precision values (np.longdouble) or >>> complex weights. >>> >> None of that seems particularly important to be honest. >> >> you?re emitting warnings for everyone >>> >>> When there?s a risk of precision loss, that seems like the responsible >>> thing to do. >>> >> For precision loss of the order of float64 eps, I disagree. There will be >> many such places in numpy and in other core libraries. >> >> >>> Users passing float weights would see no warning, I suppose. >>> >>> is this really worth a new function >>> >>> There ought to be a function for computing histograms with integer >>> weights that doesn?t lose precision. Either we change the existing function >>> to do that, or we make a new function. >>> >> It's also possible to refer users to scipy.stats.binned_statistic(_2d/dd), >> which provides a superset of the histogram functionality and is internally >> consistent because the implementations of 1d/2d call the dd one. >> >> Ralf >> >> >> >>> A possible compromise: like 1, but only change the dtype of the result >>> if a weights argument is passed. 
>>> >>> #10864 seems like a >>> worrying design flaw too, but I suppose that can be dealt with separately. >>> >>> Eric >>> ? >>> >>> On Wed, 25 Apr 2018 at 21:57 Ralf Gommers >>> wrote: >>> >>>> On Mon, Apr 9, 2018 at 10:24 PM, Eric Wieser < >>>> wieser.eric+numpy at gmail.com> wrote: >>>> >>>>> Numpy has three histogram functions - histogram, histogram2d, and >>>>> histogramdd. >>>>> >>>>> histogram is by far the most widely used, and in the absence of >>>>> weights and normalization, returns an np.intp count for each bin. >>>>> >>>>> histogramdd (for which histogram2d is a wrapper) returns np.float64 >>>>> in all circumstances. >>>>> >>>>> As a contrived comparison >>>>> >>>>> >>> x = np.linspace(0, 1)>>> h, e = np.histogram(x*x, bins=4); h >>>>> array([25, 10, 8, 7], dtype=int64)>>> h, e = np.histogramdd((x*x,), bins=4); h >>>>> array([25., 10., 8., 7.]) >>>>> >>>>> https://github.com/numpy/numpy/issues/7845 tracks this inconsistency. >>>>> >>>>> The fix is now trivial: the question is, will changing the return type >>>>> break people?s code? >>>>> >>>>> Either we should: >>>>> >>>>> 1. Just change it, and hope no one is broken by it >>>>> 2. Add a dtype argument: >>>>> - If dtype=None, behave like np.histogram >>>>> - If dtype is not specified, emit a future warning recommending >>>>> to use dtype=None or dtype=float >>>>> - In future, change the default to None >>>>> 3. Create a new better-named function histogram_nd, which can also >>>>> be created without the mistake that is https://github.com/numpy/ >>>>> numpy/issues/10864. >>>>> >>>>> Thoughts? >>>>> >>>> >>>> (1) sems like a no-go, taking such risks isn't justified by a minor >>>> inconsistency. >>>> >>>> (2) is still fairly intrusive, you're emitting warnings for everyone >>>> and still force people to change their code (and if they don't they may run >>>> into a backwards compat break). >>>> >>>> (3) is the best of these options, however is this really worth a new >>>> function? 
>>>> My vote would be "do nothing".
>>>>
>>>> Ralf
>>>>
>>>> _______________________________________________
>>>> NumPy-Discussion mailing list
>>>> NumPy-Discussion at python.org
>>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From wieser.eric+numpy at gmail.com  Sat Apr 28 02:38:35 2018
From: wieser.eric+numpy at gmail.com (Eric Wieser)
Date: Sat, 28 Apr 2018 06:38:35 +0000
Subject: [Numpy-discussion] Changing the return type of np.histogramdd
In-Reply-To: 
References: 
Message-ID: 

> It's late and I'm probably missing something

The issue is not one of range as you showed there, but of precision. Here's the test case you're missing:

def get_err(u64):
    """ return the absolute error incurred by storing a uint64 in a float64 """
    u64 = np.uint64(u64)
    return u64 - u64.astype(np.float64).astype(np.uint64)

The problem starts appearing with

>>> get_err(2**53 + 1)
1

and only gets worse as the size of the integers increases

>>> get_err(2**64 - 2*10)
9223372036854775788  # this is a lot bigger than float64.eps (although as a relative error, it's similar)

Either way, such weights don't really happen in real code I think.

The counterexample I can think of is someone trying to implement fixed-precision arithmetic with large integers. The intersection of people doing both that and histogramdd is probably very small, but it's at least plausible.
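[Editorial note: the round trip described above can be verified with exact Python integers, which sidesteps the uint64 overflow in the second example. `roundtrip_error` is a name introduced here for the sketch, not from the thread.]

```python
import numpy as np

def roundtrip_error(value):
    # Exact error (as a Python int) from storing a uint64 in a float64
    # and converting back; float64 has a 53-bit significand.
    u = np.uint64(value)
    return int(u) - int(np.uint64(np.float64(u)))

# Integers up to 2**53 survive the round trip exactly...
assert roundtrip_error(2**53) == 0
# ...but just past that, bits start being dropped.
assert roundtrip_error(2**53 + 1) == 1
# Higher up, the spacing between representable floats is 256, so the
# rounding error can be correspondingly larger.
assert roundtrip_error(2**60 + 3) == 3
```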
Yes, there are cross-links to Python, SciPy and Matplotlib functions in the docs. Great, that was what I was unsure of. I was worried that linking to upstream projects would be sort of weird, but practicality beats purity for sure here. Eric ? On Fri, 27 Apr 2018 at 22:26 Ralf Gommers wrote: > On Wed, Apr 25, 2018 at 11:00 PM, Eric Wieser > wrote: > >> For precision loss of the order of float64 eps, I disagree. >> >> I was thinking more about precision loss on the order of 1, for large >> 64-bit integers that can?t fit in a float64 >> > It's late and I'm probably missing something, but: > > >>> np.iinfo(np.int64).max > np.finfo(np.float64).max > False > > Either way, such weights don't really happen in real code I think. > > >> Note also that #10864 >> incurs deliberate precision loss of the order 10**-6 x smallest bin, which >> is also much larger than eps. >> > Yeah that's worse. > > >> It?s also possible to refer users to scipy.stats.binned_statistic >> >> That sounds like a good idea to do irrespective of whether histogramdd >> has problems - I had no idea those existed. Is there a precedent for >> referring to more feature-rich scipy functions from the basic numpy ones? >> > Yes, there are cross-links to Python, SciPy and Matplotlib functions in > the docs. This is done with intersphinx ( > https://github.com/numpy/numpy/blob/master/doc/source/conf.py#L215). > Example cross-link for convolve: > https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.convolve.html > > Ralf > > > >> ? >> >> On Wed, 25 Apr 2018 at 22:51 Ralf Gommers wrote: >> >>> On Wed, Apr 25, 2018 at 10:07 PM, Eric Wieser < >>> wieser.eric+numpy at gmail.com> wrote: >>> >>>> what does that gain over having the user do something like >>>> result.astype() >>>> >>>> It means that the user can use integer weights without worrying about >>>> losing precision due to an intermediate float representation. 
>>>> >>>> It also means they can use higher precision values (np.longdouble) or >>>> complex weights. >>>> >>> None of that seems particularly important to be honest. >>> >>> you?re emitting warnings for everyone >>>> >>>> When there?s a risk of precision loss, that seems like the responsible >>>> thing to do. >>>> >>> For precision loss of the order of float64 eps, I disagree. There will >>> be many such places in numpy and in other core libraries. >>> >>> >>>> Users passing float weights would see no warning, I suppose. >>>> >>>> is this really worth a new function >>>> >>>> There ought to be a function for computing histograms with integer >>>> weights that doesn?t lose precision. Either we change the existing function >>>> to do that, or we make a new function. >>>> >>> It's also possible to refer users to >>> scipy.stats.binned_statistic(_2d/dd), which provides a superset of the >>> histogram functionality and is internally consistent because the >>> implementations of 1d/2d call the dd one. >>> >>> Ralf >>> >>> >>> >>>> A possible compromise: like 1, but only change the dtype of the result >>>> if a weights argument is passed. >>>> >>>> #10864 seems like a >>>> worrying design flaw too, but I suppose that can be dealt with separately. >>>> >>>> Eric >>>> ? >>>> >>>> On Wed, 25 Apr 2018 at 21:57 Ralf Gommers >>>> wrote: >>>> >>>>> On Mon, Apr 9, 2018 at 10:24 PM, Eric Wieser < >>>>> wieser.eric+numpy at gmail.com> wrote: >>>>> >>>>>> Numpy has three histogram functions - histogram, histogram2d, and >>>>>> histogramdd. >>>>>> >>>>>> histogram is by far the most widely used, and in the absence of >>>>>> weights and normalization, returns an np.intp count for each bin. >>>>>> >>>>>> histogramdd (for which histogram2d is a wrapper) returns np.float64 >>>>>> in all circumstances. 
>>>>>> >>>>>> As a contrived comparison >>>>>>
>>>>>> >>> x = np.linspace(0, 1)
>>>>>> >>> h, e = np.histogram(x*x, bins=4); h
>>>>>> array([25, 10, 8, 7], dtype=int64)
>>>>>> >>> h, e = np.histogramdd((x*x,), bins=4); h
>>>>>> array([25., 10., 8., 7.])
>>>>>> >>>>>> https://github.com/numpy/numpy/issues/7845 tracks this inconsistency. >>>>>> >>>>>> The fix is now trivial: the question is, will changing the return >>>>>> type break people's code? >>>>>> >>>>>> Either we should: >>>>>> >>>>>> 1. Just change it, and hope no one is broken by it >>>>>> 2. Add a dtype argument: >>>>>> - If dtype=None, behave like np.histogram >>>>>> - If dtype is not specified, emit a future warning >>>>>> recommending to use dtype=None or dtype=float >>>>>> - In future, change the default to None >>>>>> 3. Create a new better-named function histogram_nd, which can >>>>>> also be created without the mistake that is >>>>>> https://github.com/numpy/numpy/issues/10864. >>>>>> >>>>>> Thoughts? >>>>>> >>>>> >>>>> (1) seems like a no-go, taking such risks isn't justified by a minor >>>>> inconsistency. >>>>> >>>>> (2) is still fairly intrusive, you're emitting warnings for everyone >>>>> and still force people to change their code (and if they don't they may run >>>>> into a backwards compat break). >>>>> >>>>> (3) is the best of these options, however is this really worth a new >>>>> function? My vote would be "do nothing". 
>>>>> >>>>> Ralf >>>>> >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at python.org >>>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at python.org >>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Sat Apr 28 13:50:34 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Sat, 28 Apr 2018 13:50:34 -0400 Subject: [Numpy-discussion] NumPy 1.14.3 released Message-ID: <2135b615-5f9c-8f1c-66b4-bd7a0e1d0506@gmail.com> Hi All, I am pleased to announce the release of NumPy 1.14.3. This is a bugfix release for a few bugs reported following the 1.14.2 release: * np.lib.recfunctions.fromrecords accepts a list-of-lists, until 1.15 * In python2, float types use the new print style when printing to a file * style arg in "legacy" print mode now works for 0d arrays The Python versions supported in this release are 2.7 and 3.4 - 3.6. The Python 3.6 wheels available from PIP are built with Python 3.6.2 and should be compatible with all previous versions of Python 3.6. The source releases were cythonized with Cython 0.28.2. Contributors ============ A total of 6 people contributed to this release. 
People with a "+" by their names contributed a patch for the first time. * Allan Haldane * Charles Harris * Jonathan March + * Malcolm Smith + * Matti Picus * Pauli Virtanen Pull requests merged ==================== A total of 8 pull requests were merged for this release. * `#10862 `__: BUG: floating types should override tp_print (1.14 backport) * `#10905 `__: BUG: for 1.14 back-compat, accept list-of-lists in fromrecords * `#10947 `__: BUG: 'style' arg to array2string broken in legacy mode (1.14... * `#10959 `__: BUG: test, fix for missing flags['WRITEBACKIFCOPY'] key * `#10960 `__: BUG: Add missing underscore to prototype in check_embedded_lapack * `#10961 `__: BUG: Fix encoding regression in ma/bench.py (Issue #10868) * `#10962 `__: BUG: core: fix NPY_TITLE_KEY macro on pypy * `#10974 `__: BUG: test, fix PyArray_DiscardWritebackIfCopy... Cheers, Allan Haldane From charlesr.harris at gmail.com Sat Apr 28 21:50:14 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 29 Apr 2018 01:50:14 +0000 Subject: [Numpy-discussion] NumPy 1.14.3 released In-Reply-To: <2135b615-5f9c-8f1c-66b4-bd7a0e1d0506@gmail.com> References: <2135b615-5f9c-8f1c-66b4-bd7a0e1d0506@gmail.com> Message-ID: On Sat, Apr 28, 2018, 1:51 PM Allan Haldane wrote: > Hi All, > > I am pleased to announce the release of NumPy 1.14.3. This is a bugfix > release for a few bugs reported following the 1.14.2 release: > > * np.lib.recfunctions.fromrecords accepts a list-of-lists, until 1.15 > * In python2, float types use the new print style when printing to a file > * style arg in "legacy" print mode now works for 0d arrays > > The Python versions supported in this release are 2.7 and 3.4 - 3.6. The > Python 3.6 wheels available from PIP are built with Python 3.6.2 and should > be compatible with all previous versions of Python 3.6. The source releases > were cythonized with Cython 0.28.2. > > Contributors > ============ > > A total of 6 people contributed to this release. 
People with a "+" by > their > names contributed a patch for the first time. > > * Allan Haldane > * Charles Harris > * Jonathan March + > * Malcolm Smith + > * Matti Picus > * Pauli Virtanen > > Pull requests merged > ==================== > > A total of 8 pull requests were merged for this release. > > * `#10862 `__: BUG: floating > types should override tp_print (1.14 backport) > * `#10905 `__: BUG: for 1.14 > back-compat, accept list-of-lists in fromrecords > * `#10947 `__: BUG: 'style' > arg to array2string broken in legacy mode (1.14... > * `#10959 `__: BUG: test, fix > for missing flags['WRITEBACKIFCOPY'] key > * `#10960 `__: BUG: Add > missing underscore to prototype in check_embedded_lapack > * `#10961 `__: BUG: Fix > encoding regression in ma/bench.py (Issue #10868) > * `#10962 `__: BUG: core: fix > NPY_TITLE_KEY macro on pypy > * `#10974 `__: BUG: test, fix > PyArray_DiscardWritebackIfCopy... > > Cheers, > > Allan Haldane > Congratulations. If you have any useful information to add to the release documentation, please do so. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Sun Apr 29 05:46:36 2018 From: matti.picus at gmail.com (Matti Picus) Date: Sun, 29 Apr 2018 12:46:36 +0300 Subject: [Numpy-discussion] Extending ufunc signature syntax for matmul, frozen dimensions Message-ID: In looking to solve issue #9028 "no way to override matmul/@ if __array_ufunc__ is set", it seems there is consensus around the idea of making matmul a true gufunc, but matmul can behave differently for different combinations of array and vector: (n,k),(k,m)->(n,m) (n,k),(k) -> (n) (k),(k,m)->(m) Currently there is no way to express that in the ufunc signature. The proposed solution to issue #9029 is to extend the meaning of a signature so "syntax like (n?,k),(k,m?)->(n?,m?) 
could mean that n and m are optional dimensions; if missing in the input, they're treated as 1, and then dropped from the output" Additionally, there is an open pull request #5015 "Add frozen dimensions to gufunc signatures" to allow signatures like '(3),(3)->(3)'. I would like to extend ufunc signature handling to implement both these ideas, in a way that would be backward-compatible with the publicly exposed PyUFuncObject. PyUFunc_FromFuncAndDataAndSignature is used to allocate and initialize a PyUFuncObject. Are there downstream projects that allocate their own PyUFuncObject not via PyUFunc_FromFuncAndDataAndSignature? If so, we could use one of the "reserved" fields, or extend the meaning of the "identity" field to allow version detection. Any thoughts? Any other thoughts about extending the signature syntax? Thanks, Matti From m.h.vankerkwijk at gmail.com Sun Apr 29 11:13:29 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Sun, 29 Apr 2018 11:13:29 -0400 Subject: [Numpy-discussion] Extending ufunc signature syntax for matmul, frozen dimensions In-Reply-To: References: Message-ID: Hi Matti, This sounds great. For completeness, you omitted the vector-vector case for matmul '(k),(k)->()' - but the suggested new signature for `matmul` would cover that case as well, so not a problem. All the best, Marten From vs at it.uu.se Sun Apr 29 16:46:51 2018 From: vs at it.uu.se (Virgil Stokes) Date: Sun, 29 Apr 2018 22:46:51 +0200 Subject: [Numpy-discussion] numpy.pad -- problem? Message-ID: Here is a python code snippet:

# python vers. 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 17:00:18) [MSC v.1900 64 bit (AMD64)]
import numpy as np  # numpy vers. 1.14.3
#import matplotlib.pyplot as plt

N   = 21
amp = 10
t   = np.linspace(0.0,N-1,N)
arg = 2.0*np.pi/(N-1)

y = amp*np.sin(arg*t)
print('y:\n',y)
print('mean(y): ',np.mean(y))

#plt.plot(t,y)
#plt.show()

ypad = np.pad(y, (3,2),'mean')
print('ypad:\n',ypad)

When I execute this the outputs are:

y:
 [ 0.00000000e+00  3.09016994e+00  5.87785252e+00  8.09016994e+00
  9.51056516e+00  1.00000000e+01  9.51056516e+00  8.09016994e+00
  5.87785252e+00  3.09016994e+00  1.22464680e-15 -3.09016994e+00
 -5.87785252e+00 -8.09016994e+00 -9.51056516e+00 -1.00000000e+01
 -9.51056516e+00 -8.09016994e+00 -5.87785252e+00 -3.09016994e+00
 -2.44929360e-15]
mean(y):  -1.3778013372117948e-16
ypad:
 [-1.37780134e-16 -1.37780134e-16 -1.37780134e-16  0.00000000e+00
  3.09016994e+00  5.87785252e+00  8.09016994e+00  9.51056516e+00
  1.00000000e+01  9.51056516e+00  8.09016994e+00  5.87785252e+00
  3.09016994e+00  1.22464680e-15 -3.09016994e+00 -5.87785252e+00
 -8.09016994e+00 -9.51056516e+00 -1.00000000e+01 -9.51056516e+00
 -8.09016994e+00 -5.87785252e+00 -3.09016994e+00 -2.44929360e-15
 -7.40148683e-17 -7.40148683e-17]

The left pad is correct, but the right pad is different and not the mean of y --- why?
-------------- next part -------------- An HTML attachment was scrubbed... URL: From deak.andris at gmail.com Sun Apr 29 17:36:29 2018 From: deak.andris at gmail.com (Andras Deak) Date: Sun, 29 Apr 2018 23:36:29 +0200 Subject: [Numpy-discussion] numpy.pad -- problem? In-Reply-To: References: Message-ID:

> mean(y): -1.3778013372117948e-16
> ypad:
> [-1.37780134e-16 -1.37780134e-16 -1.37780134e-16 0.00000000e+00
> 3.09016994e+00 5.87785252e+00 8.09016994e+00 9.51056516e+00
> 1.00000000e+01 9.51056516e+00 8.09016994e+00 5.87785252e+00
> 3.09016994e+00 1.22464680e-15 -3.09016994e+00 -5.87785252e+00
> -8.09016994e+00 -9.51056516e+00 -1.00000000e+01 -9.51056516e+00
> -8.09016994e+00 -5.87785252e+00 -3.09016994e+00 -2.44929360e-15
> -7.40148683e-17 -7.40148683e-17]
>
> The left pad is correct, but the right pad is different and not the mean of
> y --- why? 
This is how np.pad computes mean padding:
https://github.com/numpy/numpy/blob/01541f2822d0d4b37b96f6b42e35963b132f1947/numpy/lib/arraypad.py#L1396-L1400

elif mode == 'mean':
    for axis, ((pad_before, pad_after), (chunk_before, chunk_after)) \
            in enumerate(zip(pad_width, kwargs['stat_length'])):
        newmat = _prepend_mean(newmat, pad_before, chunk_before, axis)
        newmat = _append_mean(newmat, pad_after, chunk_after, axis)

That is, first the mean is prepended, then appended, and in the latter step the updated (front-padded) array is used for computing the mean again. Note that with arbitrary precision this is fine, since appending n*`mean` to an array with mean `mean` should preserve the mean. But with doubles you can get errors on the order of the machine epsilon, which is what happens here:

In [16]: ypad[3:-2].mean()
Out[16]: -1.1663302849022412e-16

In [17]: ypad[:-2].mean()
Out[17]: -3.700743415417188e-17

So the prepended values are `y.mean()`, but the appended values are `ypad[:-2].mean()` which includes the near-zero padding values. I don't think this error should be a problem in practice, but I agree it's surprising.

Andrés

From deak.andris at gmail.com Sun Apr 29 17:38:23 2018 From: deak.andris at gmail.com (Andras Deak) Date: Sun, 29 Apr 2018 23:38:23 +0200 Subject: [Numpy-discussion] numpy.pad -- problem? In-Reply-To: References: Message-ID: PS. my exact numbers are different from yours (probably a multithreaded thing?), but `ypad[:-2].mean()` agrees with the last 3 elements in `ypad` in my case and I'm sure this is true for yours too. 
On Sun, Apr 29, 2018 at 11:36 PM, Andras Deak wrote: >> mean(y): -1.3778013372117948e-16 >> ypad: >> [-1.37780134e-16 -1.37780134e-16 -1.37780134e-16 0.00000000e+00 >> 3.09016994e+00 5.87785252e+00 8.09016994e+00 9.51056516e+00 >> 1.00000000e+01 9.51056516e+00 8.09016994e+00 5.87785252e+00 >> 3.09016994e+00 1.22464680e-15 -3.09016994e+00 -5.87785252e+00 >> -8.09016994e+00 -9.51056516e+00 -1.00000000e+01 -9.51056516e+00 >> -8.09016994e+00 -5.87785252e+00 -3.09016994e+00 -2.44929360e-15 >> -7.40148683e-17 -7.40148683e-17] >> >> The left pad is correct, but the right pad is different and not the mean of >> y --- why? > > This is how np.pad computes mean padding: > https://github.com/numpy/numpy/blob/01541f2822d0d4b37b96f6b42e35963b132f1947/numpy/lib/arraypad.py#L1396-L1400 > elif mode == 'mean': > for axis, ((pad_before, pad_after), (chunk_before, chunk_after)) \ > in enumerate(zip(pad_width, kwargs['stat_length'])): > newmat = _prepend_mean(newmat, pad_before, chunk_before, axis) > newmat = _append_mean(newmat, pad_after, chunk_after, axis) > > That is, first the mean is prepended, then appended, and in the latter > step the updated (front-padded) array is used for computing the mean > again. Note that with arbitrary precision this is fine, since > appending n*`mean` to an array with mean `mean` should preserve the > mean. But with doubles you can get errors on the order of the machine > epsilon, which is what happens here: > > In [16]: ypad[3:-2].mean() > Out[16]: -1.1663302849022412e-16 > > In [17]: ypad[:-2].mean() > Out[17]: -3.700743415417188e-17 > > So the prepended values are `y.mean()`, but the appended values are > `ypad[:-2].mean()` which includes the near-zero padding values. I > don't think this error should be a problem in practice, but I agree > it's surprising. 
> > Andrés From wieser.eric+numpy at gmail.com Sun Apr 29 17:39:34 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Sun, 29 Apr 2018 21:39:34 +0000 Subject: [Numpy-discussion] numpy.pad -- problem? In-Reply-To: References: Message-ID: I would consider this a bug, and think we should fix this. On Sun, 29 Apr 2018 at 13:48 Virgil Stokes wrote: > Here is a python code snippet: > > # python vers. 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 17:00:18) [MSC > v.1900 64 bit (AMD64)] > import numpy as np # numpy vers. 1.14.3 > #import matplotlib.pyplot as plt > > N = 21 > amp = 10 > t = np.linspace(0.0,N-1,N) > arg = 2.0*np.pi/(N-1) > > y = amp*np.sin(arg*t) > print('y:\n',y) > print('mean(y): ',np.mean(y)) > > #plt.plot(t,y) > #plt.show() > > ypad = np.pad(y, (3,2),'mean') > print('ypad:\n',ypad) > > When I execute this the outputs are: > > y: > [ 0.00000000e+00 3.09016994e+00 5.87785252e+00 8.09016994e+00 > 9.51056516e+00 1.00000000e+01 9.51056516e+00 8.09016994e+00 > 5.87785252e+00 3.09016994e+00 1.22464680e-15 -3.09016994e+00 > -5.87785252e+00 -8.09016994e+00 -9.51056516e+00 -1.00000000e+01 > -9.51056516e+00 -8.09016994e+00 -5.87785252e+00 -3.09016994e+00 > -2.44929360e-15] > mean(y): -1.3778013372117948e-16 > ypad: > [-1.37780134e-16 -1.37780134e-16 -1.37780134e-16 0.00000000e+00 > 3.09016994e+00 5.87785252e+00 8.09016994e+00 9.51056516e+00 > 1.00000000e+01 9.51056516e+00 8.09016994e+00 5.87785252e+00 > 3.09016994e+00 1.22464680e-15 -3.09016994e+00 -5.87785252e+00 > -8.09016994e+00 -9.51056516e+00 -1.00000000e+01 -9.51056516e+00 > -8.09016994e+00 -5.87785252e+00 -3.09016994e+00 -2.44929360e-15 > -7.40148683e-17 -7.40148683e-17] > > The left pad is correct, but the right pad is different and not the mean > of y --- why? 
> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From deak.andris at gmail.com Sun Apr 29 17:44:43 2018 From: deak.andris at gmail.com (Andras Deak) Date: Sun, 29 Apr 2018 23:44:43 +0200 Subject: [Numpy-discussion] numpy.pad -- problem? In-Reply-To: References: Message-ID: On Sun, Apr 29, 2018 at 11:39 PM, Eric Wieser wrote: > I would consider this a bug, and think we should fix this. In that case `mode='median'` should probably be fixed as well. From matti.picus at gmail.com Mon Apr 30 12:24:21 2018 From: matti.picus at gmail.com (Matti Picus) Date: Mon, 30 Apr 2018 19:24:21 +0300 Subject: [Numpy-discussion] summary of "office Hours" open discussion April 25 Message-ID: <039ab238-be1f-c83f-8a2e-8925033b102a@gmail.com>

Office Hours 25 April 2018, 12:00-13:00 PDT

Present: Matti Picus, Allan Haldane, Ralf Gommers, Matthew Brett, Tyler Reddy, Stéfan van der Walt, Hameer Abbasi

Some of the people were not present for the entire discussion, audio was a little flaky at times.

Topics:

Grant background overview

Matti has been browsing through issues and pull-requests to try to get a handle on common themes and community pain points.
- Policy questions:
  - Do we close duplicate issues? (answer - Yes, referencing the other issue, as long as they are true duplicates)
  - Do we close tutorial-like issues that are documented? (answer - Yes, maybe improving documentation)
- Common theme - there are many issues about overflow, mainly about int32. Maybe add a mode or command switch for warning on int32 overflow?
- Requested topic for discussion - improving CI and MacOS testing
  - How to filter CI issues on github? There is a component:build label but it is not CI specific
  - What about MacOS testing - should it be sending notices? (answer - Probably)
  - Running ASV benchmarking (https://asv.readthedocs.io/en/latest/). It is done with SciPy, but it is fragile, not done nightly; need ability to run branches more robustly; documentation on SciPy site https://github.com/scipy/scipy/tree/master/benchmarks
- Hameer: f2py during testing is the system one, not the internal one

Most of the remaining discussion was a meta-discussion about how the community will continue to decide priorities and influence how the full-time developers spend their time.
- Setting up a community-driven roadmap would be useful
- Be aware of the risks of having devoted developer time on a community project
- Influence can be subtle: ideally, community writes roadmap, instead of simply commenting on proposal
- Can we distill past lessons to inform future decisions?
- In general, how to determine community priorities?
- Constant communication paramount, looks like things are going in the right direction.

Further resources to consider:
- How did Jupyter organize their roadmap (ask Brian Granger)?
- How did Pandas run the project with a full time maintainer (Jeff Reback)?
- Can we copy other projects' management guidelines?

We did not set a time for another online discussion, since it was felt that maybe near/during the sprint in May would be appropriate. I apologize for any misrepresentation.

Matti Picus

From m.h.vankerkwijk at gmail.com Mon Apr 30 12:55:52 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Mon, 30 Apr 2018 12:55:52 -0400 Subject: [Numpy-discussion] axis and keepdims arguments for generalized ufuncs Message-ID: Hi All, When introducing the ``axes`` argument for generalized ufuncs, the plan was to eventually also add ``axis`` and ``keepdims`` for reduction-like gufuncs. I have now attempted to do so in https://github.com/numpy/numpy/pull/11018 It is not completely feature-compatible with reductions in that one cannot (yet) pass in a tuple or None to ``axis``. Comments most welcome. 
All the best, Marten From m.h.vankerkwijk at gmail.com Mon Apr 30 12:53:00 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Mon, 30 Apr 2018 12:53:00 -0400 Subject: [Numpy-discussion] Extending ufunc signature syntax for matmul, frozen dimensions In-Reply-To: References: Message-ID: I thought a bit further about this proposal: a disadvantage for matmul specifically is that is does not solve the need for `matvec`, `vecmat`, and `vecvec` gufuncs. That said, it might make sense to implement those as "pseudo-ufuncs" that just add a 1 in the right place and call `matmul`... -- Marten From shoyer at gmail.com Mon Apr 30 17:33:19 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Mon, 30 Apr 2018 21:33:19 +0000 Subject: [Numpy-discussion] Extending ufunc signature syntax for matmul, frozen dimensions In-Reply-To: References: Message-ID: On Sun, Apr 29, 2018 at 2:48 AM Matti Picus wrote: > The proposed solution to issue #9029 is to extend the meaning of a > signature so "syntax like (n?,k),(k,m?)->(n?,m?) could mean that n and m > are optional dimensions; if missing in the input, they're treated as 1, and > then dropped from the output" I agree that this is an elegant fix for matmul, but are there other use-cases for "optional dimensions" in gufuncs? It feels a little wrong to add gufunc features if we can only think of one function that can use them. -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Mon Apr 30 17:38:16 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Mon, 30 Apr 2018 21:38:16 +0000 Subject: [Numpy-discussion] Extending ufunc signature syntax for matmul, frozen dimensions In-Reply-To: References: Message-ID: I think I?m -1 on this - this just makes things harder on the implementers of _array_ufunc__ who now might have to work out which signature matches. 
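[Editor's note: for concreteness, the proposed optional-dimension rule can be mocked up in pure Python. This is only a sketch of the '(n?,k),(k,m?)->(n?,m?)' semantics as described in the thread - optional core dimensions missing from an input are treated as 1 and dropped from the output - not a NumPy API; the function name is invented here.]

```python
import numpy as np

def resolve_matmul_shapes(a_shape, b_shape):
    """Sketch of '(n?,k),(k,m?)->(n?,m?)': optional core dimensions that
    are missing from an input are treated as 1 and dropped from the
    output.  Pure-Python illustration only, not a NumPy API."""
    a_has_n = len(a_shape) >= 2          # is the n? dimension present?
    b_has_m = len(b_shape) >= 2          # is the m? dimension present?
    k = a_shape[-1]
    k2 = b_shape[-2] if b_has_m else b_shape[-1]
    if k != k2:
        raise ValueError("core dimension k does not match")
    out = ()
    if a_has_n:
        out += (a_shape[-2],)            # keep n only if it was present
    if b_has_m:
        out += (b_shape[-1],)            # keep m only if it was present
    return out

# The three cases from the thread, plus the vector-vector case:
assert resolve_matmul_shapes((3, 4), (4, 5)) == (3, 5)  # (n,k),(k,m)->(n,m)
assert resolve_matmul_shapes((3, 4), (4,)) == (3,)      # (n,k),(k)->(n)
assert resolve_matmul_shapes((4,), (4, 5)) == (5,)      # (k),(k,m)->(m)
assert resolve_matmul_shapes((4,), (4,)) == ()          # (k),(k)->()

# np.matmul already produces these output shapes:
assert np.matmul(np.ones((3, 4)), np.ones((4,))).shape == (3,)
```

All four matmul behaviours fall out of the single signature under this rule, which is the appeal of the proposal even if matmul is so far its only known use-case.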
I?d prefer the solution where np.matmul is a wrapper around one of three gufuncs (or maybe just around one with axis insertion) - this is similar to how np.linalg already works. Eric ? On Mon, 30 Apr 2018 at 14:34 Stephan Hoyer wrote: > On Sun, Apr 29, 2018 at 2:48 AM Matti Picus wrote: > >> The proposed solution to issue #9029 is to extend the meaning of a >> signature so "syntax like (n?,k),(k,m?)->(n?,m?) could mean that n and m >> are optional dimensions; if missing in the input, they're treated as 1, and >> then dropped from the output" > > > I agree that this is an elegant fix for matmul, but are there other > use-cases for "optional dimensions" in gufuncs? > > It feels a little wrong to add gufunc features if we can only think of one > function that can use them. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Mon Apr 30 18:45:14 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Mon, 30 Apr 2018 18:45:14 -0400 Subject: [Numpy-discussion] Extending ufunc signature syntax for matmul, frozen dimensions In-Reply-To: References: Message-ID: On 04/29/2018 05:46 AM, Matti Picus wrote: > In looking to solve issue #9028 "no way to override matmul/@ if > __array_ufunc__ is set", it seems there is consensus around the idea of > making matmul a true gufunc, but matmul can behave differently for > different combinations of array and vector: > > (n,k),(k,m)->(n,m) > (n,k),(k) -> (n) > (k),(k,m)->(m) > > Currently there is no way to express that in the ufunc signature. The > proposed solution to issue #9029 is to extend the meaning of a signature > so "syntax like (n?,k),(k,m?)->(n?,m?) 
could mean that n and m are > optional dimensions; if missing in the input, they're treated as 1, and > then dropped from the output" Additionally, there is an open pull > request #5015 "Add frozen dimensions to gufunc signatures" to allow > signatures like '(3),(3)->(3)'. How much harder would it be to implement multiple-dispatch for gufunc signatures, instead of modifying the signature to include `?` ? There was some discussion of this last year: http://numpy-discussion.10968.n7.nabble.com/Changes-to-generalized-ufunc-core-dimension-checking-tp42618p42638.html That sounded like a clean solution to me, although I'm a bit ignorant of the gufunc internals and the compatibility constraints. I assume gufuncs already have code to match the signature to the array dims, so it sounds fairly straightforward (I say without looking at any code) to do this in a loop over alternate signatures until one works. Allan
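[Editor's note: the loop-over-signatures idea sketched above can be written out in a few lines of Python. This is illustrative only - real gufunc matching would also have to strip and broadcast the non-core loop dimensions, which this sketch ignores, and the names `bind` and `dispatch` are invented here.]

```python
# Illustrative multiple-dispatch over gufunc-style signatures: try each
# candidate signature in turn and use the first whose core dimensions
# bind consistently to the input core shapes.
MATMUL_SIGNATURES = [
    ((('n', 'k'), ('k', 'm')), ('n', 'm')),  # (n,k),(k,m)->(n,m)
    ((('n', 'k'), ('k',)), ('n',)),          # (n,k),(k)->(n)
    ((('k',), ('k', 'm')), ('m',)),          # (k),(k,m)->(m)
    ((('k',), ('k',)), ()),                  # (k),(k)->()
]

def bind(core_dims, shapes):
    """Try to bind each named core dimension to a concrete size;
    return the bindings, or None if the signature does not match."""
    sizes = {}
    for dims, shape in zip(core_dims, shapes):
        if len(dims) != len(shape):
            return None                      # wrong number of core dims
        for name, size in zip(dims, shape):
            if sizes.setdefault(name, size) != size:
                return None                  # inconsistent dimension size
    return sizes

def dispatch(shapes):
    # Loop over alternate signatures until one works, as suggested above.
    for core_in, core_out in MATMUL_SIGNATURES:
        sizes = bind(core_in, shapes)
        if sizes is not None:
            return tuple(sizes[name] for name in core_out)
    raise TypeError("no matching signature for shapes %r" % (shapes,))

assert dispatch(((3, 4), (4, 5))) == (3, 5)
assert dispatch(((3, 4), (4,))) == (3,)
assert dispatch(((4,), (4,))) == ()
```

Compared with the `?` syntax, this keeps each signature plain at the cost of a per-call search over the candidate list, which is essentially the trade-off discussed in the linked 2017 thread.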