From felix.hartmann at crans.org Mon Jul 1 08:11:15 2013 From: felix.hartmann at crans.org (=?UTF-8?B?RsOpbGl4?= Hartmann) Date: Mon, 1 Jul 2013 14:11:15 +0200 Subject: [Numpy-discussion] np.insert with axis=-1 Message-ID: <20130701141115.480c8103@artemis.nancy.inra.local> Hi all, I recently upgraded from Numpy 1.6.2 to 1.7.1 on my Debian testing, and then got a bug in a program that was previously working. It turned out that the problem comes from the np.insert function when the argument `axis=-1` is given. Here is a minimal example: >>> u = np.zeros((2,3,4)) >>> ui = np.ones((2,3)) >>> u = np.insert(u, 1, ui, axis=-1) The last line should be equivalent to >>> u = np.insert(u, 1, ui, axis=2) It was indeed the case in Numpy 1.6, but in 1.7.1 it raises a ValueError exception. Note that the problem seems specific to axis=-1, and not to all negative axis values, since the following example works as expected: >>> u = np.zeros((2,3,4)) >>> ui = np.ones((2,4)) >>> u = np.insert(u, 1, ui, axis=-2) # equivalent to axis=1 I didn't check on current master, so maybe things have changed since 1.7.1. If they have not, do you think a bug report would be relevant? Cheers, F?lix From sebastian at sipsolutions.net Mon Jul 1 11:54:36 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 01 Jul 2013 17:54:36 +0200 Subject: [Numpy-discussion] np.insert with axis=-1 In-Reply-To: <20130701141115.480c8103@artemis.nancy.inra.local> References: <20130701141115.480c8103@artemis.nancy.inra.local> Message-ID: <1372694076.16404.2.camel@sebastian-laptop> On Mon, 2013-07-01 at 14:11 +0200, F?lix Hartmann wrote: > Hi all, > > I recently upgraded from Numpy 1.6.2 to 1.7.1 on my Debian testing, and > then got a bug in a program that was previously working. It turned out > that the problem comes from the np.insert function when the argument > `axis=-1` is given. > Dang, yes, its a pretty stupid bug, exists basically the same in both 1.7 and 1.8. If you got a minute, it is because of np.rollaxis usage, and in it there it says `axis-1` which is wrong for negative axes! Could you create a pull request to fix that? That would be great. - Sebastian > Here is a minimal example: > >>> u = np.zeros((2,3,4)) > >>> ui = np.ones((2,3)) > >>> u = np.insert(u, 1, ui, axis=-1) > > The last line should be equivalent to > >>> u = np.insert(u, 1, ui, axis=2) > > It was indeed the case in Numpy 1.6, but in 1.7.1 it raises a > ValueError exception. > > Note that the problem seems specific to axis=-1, and not to all negative > axis values, since the following example works as expected: > >>> u = np.zeros((2,3,4)) > >>> ui = np.ones((2,4)) > >>> u = np.insert(u, 1, ui, axis=-2) # equivalent to axis=1 > > I didn't check on current master, so maybe things have changed since > 1.7.1. If they have not, do you think a bug report would be relevant? 
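Until that fix lands, one interim workaround (just a sketch, not the eventual numpy fix) is to normalize negative axes by hand before calling np.insert:

import numpy as np

u = np.zeros((2, 3, 4))
ui = np.ones((2, 3))

# Convert the negative axis to its positive equivalent (-1 -> 2 for a
# 3-d array), which sidesteps the rollaxis problem described above.
axis = -1
u = np.insert(u, 1, ui, axis=axis % u.ndim)
print(u.shape)   # (2, 3, 5)
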
> > Cheers, > F?lix > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sebastian at sipsolutions.net Mon Jul 1 12:04:18 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 01 Jul 2013 18:04:18 +0200 Subject: [Numpy-discussion] np.insert with axis=-1 In-Reply-To: <1372694076.16404.2.camel@sebastian-laptop> References: <20130701141115.480c8103@artemis.nancy.inra.local> <1372694076.16404.2.camel@sebastian-laptop> Message-ID: <1372694658.16404.3.camel@sebastian-laptop> On Mon, 2013-07-01 at 17:54 +0200, Sebastian Berg wrote: > On Mon, 2013-07-01 at 14:11 +0200, F?lix Hartmann wrote: > > Hi all, > > > > I recently upgraded from Numpy 1.6.2 to 1.7.1 on my Debian testing, and > > then got a bug in a program that was previously working. It turned out > > that the problem comes from the np.insert function when the argument > > `axis=-1` is given. > > > > Dang, yes, its a pretty stupid bug, exists basically the same in both > 1.7 and 1.8. If you got a minute, it is because of np.rollaxis usage, > and in it there it says `axis-1` which is wrong for negative axes! > That is axis + 1 of course... > Could you create a pull request to fix that? That would be great. > > - Sebastian > > > Here is a minimal example: > > >>> u = np.zeros((2,3,4)) > > >>> ui = np.ones((2,3)) > > >>> u = np.insert(u, 1, ui, axis=-1) > > > > The last line should be equivalent to > > >>> u = np.insert(u, 1, ui, axis=2) > > > > It was indeed the case in Numpy 1.6, but in 1.7.1 it raises a > > ValueError exception. > > > > Note that the problem seems specific to axis=-1, and not to all negative > > axis values, since the following example works as expected: > > >>> u = np.zeros((2,3,4)) > > >>> ui = np.ones((2,4)) > > >>> u = np.insert(u, 1, ui, axis=-2) # equivalent to axis=1 > > > > I didn't check on current master, so maybe things have changed since > > 1.7.1. If they have not, do you think a bug report would be relevant? > > > > Cheers, > > F?lix > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From lists at onerussian.com Mon Jul 1 15:30:06 2013 From: lists at onerussian.com (Yaroslav Halchenko) Date: Mon, 1 Jul 2013 15:30:06 -0400 Subject: [Numpy-discussion] Really cruel draft of vbench setup for NumPy (.add.reduce benchmarks since 2011) In-Reply-To: References: <20130506143241.GV5140@onerussian.com> <1367856232.2506.31.camel@sebastian-laptop> <20130506161153.GW5140@onerussian.com> <1367927238.23010.12.camel@sebastian-laptop> Message-ID: <20130701193006.GC27621@onerussian.com> Hi Guys, not quite the recommendations you expressed, but here is my ugly attempt to improve benchmarks coverage: http://www.onerussian.com/tmp/numpy-vbench-20130701/index.html initially I also ran those ufunc benchmarks per each dtype separately, but then resulting webpage is loong which brings my laptop on its knees by firefox. So I commented those out for now, and left only "summary" ones across multiple datatypes. There is a bug in sphinx which forbids embedding some figures for vb_random "as is", so pardon that for now... 
I have not set cpu affinity of the process (but ran it at nice -10), so may be that also contributed to variance of benchmark estimates. And there probably could be more of goodies (e.g. gc control etc) to borrow from https://github.com/pydata/pandas/blob/master/vb_suite/test_perf.py which I have just discovered to minimize variance. nothing really interesting was pin-pointed so far, besides that - svd became a bit faster since few months back ;-) http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_linalg.html - isnan (and isinf, isfinite) got improved http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_ufunc.html#numpy-isnan-a-10types - right_shift got a miniscule slowdown from what it used to be? http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_ufunc.html#numpy-right-shift-a-a-3types As before -- current code of those benchmarks collection is available at http://github.com/yarikoptic/numpy-vbench/pull/new/master if you have specific snippets you would like to benchmark -- just state them here or send a PR -- I will add them in. Cheers, On Tue, 07 May 2013, Da?id wrote: > On 7 May 2013 13:47, Sebastian Berg wrote: > > Indexing/assignment was the first thing I thought of too (also because > > fancy indexing/assignment really could use some speedups...). Other then > > that maybe some timings for small arrays/scalar math, but that might be > > nice for that GSoC project. > Why not going bigger? Ufunc operations on big arrays, CPU and memory bound. > Also, what about interfacing with other packages? It may increase the > compiling overhead, but I would like to see Cython in action (say, > only last version, maybe it can be fixed). > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Yaroslav O. Halchenko, Ph.D. http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org Senior Research Associate, Psychological and Brain Sciences Dept. Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik From lists at onerussian.com Mon Jul 1 17:58:05 2013 From: lists at onerussian.com (Yaroslav Halchenko) Date: Mon, 1 Jul 2013 17:58:05 -0400 Subject: [Numpy-discussion] Really cruel draft of vbench setup for NumPy (.add.reduce benchmarks since 2011) In-Reply-To: <20130701193006.GC27621@onerussian.com> References: <20130506143241.GV5140@onerussian.com> <1367856232.2506.31.camel@sebastian-laptop> <20130506161153.GW5140@onerussian.com> <1367927238.23010.12.camel@sebastian-laptop> <20130701193006.GC27621@onerussian.com> Message-ID: <20130701215804.GG27621@onerussian.com> FWIW -- updated plots with contribution from Julian Taylor http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_indexing.html#mmap-slicing ;-) On Mon, 01 Jul 2013, Yaroslav Halchenko wrote: > Hi Guys, > not quite the recommendations you expressed, but here is my ugly > attempt to improve benchmarks coverage: > http://www.onerussian.com/tmp/numpy-vbench-20130701/index.html > initially I also ran those ufunc benchmarks per each dtype separately, > but then resulting webpage is loong which brings my laptop on its knees > by firefox. So I commented those out for now, and left only "summary" > ones across multiple datatypes. > There is a bug in sphinx which forbids embedding some figures for > vb_random "as is", so pardon that for now... 
> I have not set cpu affinity of the process (but ran it at nice -10), so may be > that also contributed to variance of benchmark estimates. And there probably > could be more of goodies (e.g. gc control etc) to borrow from > https://github.com/pydata/pandas/blob/master/vb_suite/test_perf.py which I have > just discovered to minimize variance. > nothing really interesting was pin-pointed so far, besides that > - svd became a bit faster since few months back ;-) > http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_linalg.html > - isnan (and isinf, isfinite) got improved > http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_ufunc.html#numpy-isnan-a-10types > - right_shift got a miniscule slowdown from what it used to be? > http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_ufunc.html#numpy-right-shift-a-a-3types > As before -- current code of those benchmarks collection is available > at http://github.com/yarikoptic/numpy-vbench/pull/new/master > if you have specific snippets you would like to benchmark -- just state them > here or send a PR -- I will add them in. > Cheers, > On Tue, 07 May 2013, Da?id wrote: > > On 7 May 2013 13:47, Sebastian Berg wrote: > > > Indexing/assignment was the first thing I thought of too (also because > > > fancy indexing/assignment really could use some speedups...). Other then > > > that maybe some timings for small arrays/scalar math, but that might be > > > nice for that GSoC project. > > Why not going bigger? Ufunc operations on big arrays, CPU and memory bound. > > Also, what about interfacing with other packages? It may increase the > > compiling overhead, but I would like to see Cython in action (say, > > only last version, maybe it can be fixed). > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Yaroslav O. Halchenko, Ph.D. http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org Senior Research Associate, Psychological and Brain Sciences Dept. Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik From mdroe at stsci.edu Tue Jul 2 12:39:27 2013 From: mdroe at stsci.edu (Michael Droettboom) Date: Tue, 2 Jul 2013 12:39:27 -0400 Subject: [Numpy-discussion] matplotlib user survey 2013 Message-ID: <51D3023F.3020108@stsci.edu> [Apologies for cross-posting] The matplotlib developers want to hear from you! We are conducting a user survey to determine how and where matplotlib is being used in order to focus its further development. This should only take a couple of minutes. To fill it out, visit: https://docs.google.com/spreadsheet/viewform?fromEmail=true&formkey=dHpQS25pcTZIRWdqX0pNckNSU01sMHc6MQ Please forward to your colleagues, particularly those who don't read these mailing lists. Cheers, Michael Droettboom, and the matplotlib team -------------- next part -------------- An HTML attachment was scrubbed... URL: From jjstickel at gmail.com Tue Jul 2 17:07:14 2013 From: jjstickel at gmail.com (Jonathan Stickel) Date: Tue, 02 Jul 2013 15:07:14 -0600 Subject: [Numpy-discussion] trouble with numpy.float64 multiplied with cvxopt.matrix Message-ID: <51D34102.2080507@gmail.com> I recently ran into some trouble with multiplying scalar variables of type numpy.float64 with cvxopt matrices. The cvxopt matrix is converted to a numpy array. 
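A minimal illustration of the behaviour (a sketch only; it assumes cvxopt is installed and uses the A*s ordering discussed just below):

import numpy as np
from cvxopt import matrix   # assumes cvxopt is installed

A = matrix(1.0, (2, 2))     # 2x2 cvxopt matrix of ones
s = np.float64(2.0)

print(type(s * A))   # numpy ndarray: the numpy scalar takes over and converts A
print(type(A * s))   # cvxopt matrix: the workaround ordering mentioned below
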
I first reported this at the cvxopt google groups, where I give an example: https://groups.google.com/forum/#!topic/cvxopt/4suFNOY75E4 The response I got is that it would be difficult to correct this in cvxopt, and that a workaround would be to do A*s rather than s*A (where s is the scalar of type numpy.float64 and A is the cvxopt matrix). I inferred from this that the leading object takes charge of how operator works, including the type conversion. Now, I don't know much about low-level programming for operators and type conversion, but I want to ask whether this should be considered a bug or whether correcting this might be a reasonable feature request in numpy. It seems to me that scalars of any numpy type, when operated with other objects, should not change the high-level type of those objects. Thanks, Jonathan From brad.froehle at gmail.com Tue Jul 2 23:44:19 2013 From: brad.froehle at gmail.com (Bradley M. Froehle) Date: Tue, 2 Jul 2013 20:44:19 -0700 Subject: [Numpy-discussion] Fancy indexing oddity Message-ID: A colleague just showed me this indexing behavior and I was at a loss to explain what was going on. Can anybody else chime in and help me understand this indexing behavior? >>> import numpy as np >>> np.__version__ '1.7.1' >>> A = np.ones((2,3,5)) >>> mask = np.array([True]*4 + [False], dtype=bool) >>> A.shape (2, 3, 5) >>> A[:,:,mask].shape (2, 3, 4) >>> A[:,1,mask].shape (2, 4) >>> A[1,:,mask].shape (4, 3) # Why is this not (3, 4)? >>> A[1][:,mask].shape (3, 4) Thanks! Brad From sebastian at sipsolutions.net Wed Jul 3 03:52:16 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 03 Jul 2013 09:52:16 +0200 Subject: [Numpy-discussion] Fancy indexing oddity In-Reply-To: References: Message-ID: <1372837936.7112.10.camel@sebastian-laptop> On Tue, 2013-07-02 at 20:44 -0700, Bradley M. Froehle wrote: > A colleague just showed me this indexing behavior and I was at a loss > to explain what was going on. Can anybody else chime in and help me > understand this indexing behavior? > > >>> import numpy as np > >>> np.__version__ > '1.7.1' > >>> A = np.ones((2,3,5)) > >>> mask = np.array([True]*4 + [False], dtype=bool) > >>> A.shape > (2, 3, 5) > >>> A[:,:,mask].shape > (2, 3, 4) > >>> A[:,1,mask].shape > (2, 4) > >>> A[1,:,mask].shape > (4, 3) # Why is this not (3, 4)? > >>> A[1][:,mask].shape > (3, 4) > Numpy has slicing and fancy indexing. But scalars are both. They are fancy indexes, but they do not trigger fancy indexing (you could also add the special case of a scalar result to this, but it doesn't matter for this)! Implementation wise mixed fancy indexing/slicing is a multi step process: 1. Evaluate the slices. (no surprises here) 2. Evaluate the fancy indexing moving the new axes to the *front*. Here this means A[1,:,mask] -> A.transpose(2,0,1) then combining all fancy indexes so that A.shape goes from (2, 3, 5) via transpose (5,2,3) to (4,3), since the combination of 1 and mask gives a 1-d result with 4 entries. 3. If and only if all fancy indexes were consecutive, i.e. A[:,1,mask], A[mask,[[3]],:], numpy can basically guess where it would make sense to put the fancy axes. So it transposes it back. This is what makes a single fancy index behave like a slice. Now in your example A[1,:,mask] is *not* consecutive (remember scalars are fancy in this regard), so the fancy axis goes to the front instead of going to where "mask" was. In short, the resulting axes from the fancy indices is at the front if the fancy indices are not consecutive. 
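A quick check of that rule, using the shapes from the original example (nothing new here, just the cases side by side):

import numpy as np

A = np.ones((2, 3, 5))
mask = np.array([True] * 4 + [False], dtype=bool)

# scalar and boolean index separated by a slice -> not consecutive,
# so the broadcast fancy-index axis goes to the front:
print(A[1, :, mask].shape)    # (4, 3)

# scalar and boolean index next to each other -> consecutive,
# so the fancy axis stays where the indexed axes were:
print(A[:, 1, mask].shape)    # (2, 4)

# applying the scalar first leaves a single fancy index behind:
print(A[1][:, mask].shape)    # (3, 4)
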
And since scalars are considered fancy in this regard they are not consecutive in your example. - Sebastian > Thanks! > Brad > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From thomas.robitaille at gmail.com Thu Jul 4 09:06:49 2013 From: thomas.robitaille at gmail.com (Thomas Robitaille) Date: Thu, 4 Jul 2013 15:06:49 +0200 Subject: [Numpy-discussion] Equality not working as expected with ndarray sub-class Message-ID: Hi everyone, The following example: import numpy as np class SimpleArray(np.ndarray): __array_priority__ = 10000 def __new__(cls, input_array, info=None): return np.asarray(input_array).view(cls) def __eq__(self, other): return False a = SimpleArray(10) print (np.int64(10) == a) print (a == np.int64(10)) gives the following output $ python2.7 eq.py True False so that in the first case, SimpleArray.__eq__ is not called. Is this a bug, and if so, can anyone think of a workaround? If this is expected behavior, how do I ensure SimpleArray.__eq__ gets called in both cases? Thanks, Tom ps: cross-posting to stackoverflow From nouiz at nouiz.org Thu Jul 4 09:09:37 2013 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Thu, 4 Jul 2013 09:09:37 -0400 Subject: [Numpy-discussion] Equality not working as expected with ndarray sub-class In-Reply-To: References: Message-ID: Hi, __array__priority wasn't checked for ==, !=, <, <=, >, >= operation. I added it in the development version and someone else back-ported it to the 1.7.X branch. So this will work with the next release of numpy. I don't know of a workaround until the next release. Fred On Thu, Jul 4, 2013 at 9:06 AM, Thomas Robitaille < thomas.robitaille at gmail.com> wrote: > Hi everyone, > > The following example: > > import numpy as np > > class SimpleArray(np.ndarray): > > __array_priority__ = 10000 > > def __new__(cls, input_array, info=None): > return np.asarray(input_array).view(cls) > > def __eq__(self, other): > return False > > a = SimpleArray(10) > print (np.int64(10) == a) > print (a == np.int64(10)) > > gives the following output > > $ python2.7 eq.py > True > False > > so that in the first case, SimpleArray.__eq__ is not called. Is this a > bug, and if so, can anyone think of a workaround? If this is expected > behavior, how do I ensure SimpleArray.__eq__ gets called in both > cases? > > Thanks, > Tom > > ps: cross-posting to stackoverflow > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Thu Jul 4 09:12:16 2013 From: sebastian at sipsolutions.net (sebastian) Date: Thu, 04 Jul 2013 15:12:16 +0200 Subject: [Numpy-discussion] Equality not working as expected with ndarray sub-class In-Reply-To: References: Message-ID: On 2013-07-04 15:06, Thomas Robitaille wrote: > Hi everyone, > > The following example: > > import numpy as np > > class SimpleArray(np.ndarray): > > __array_priority__ = 10000 > > def __new__(cls, input_array, info=None): > return np.asarray(input_array).view(cls) > > def __eq__(self, other): > return False > > a = SimpleArray(10) > print (np.int64(10) == a) > print (a == np.int64(10)) > > gives the following output > > $ python2.7 eq.py > True > False > > so that in the first case, SimpleArray.__eq__ is not called. 
Is this a > bug, and if so, can anyone think of a workaround? If this is expected > behavior, how do I ensure SimpleArray.__eq__ gets called in both > cases? > This should be working in all development versions. I.e. NumPy >1.7.2 (which is not released yet). - Sebastian > Thanks, > Tom > > ps: cross-posting to stackoverflow > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From nouiz at nouiz.org Thu Jul 4 09:22:56 2013 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Thu, 4 Jul 2013 09:22:56 -0400 Subject: [Numpy-discussion] Equality not working as expected with ndarray sub-class In-Reply-To: References: Message-ID: On Thu, Jul 4, 2013 at 9:12 AM, sebastian wrote: > On 2013-07-04 15:06, Thomas Robitaille wrote: > > Hi everyone, > > > > The following example: > > > > import numpy as np > > > > class SimpleArray(np.ndarray): > > > > __array_priority__ = 10000 > > > > def __new__(cls, input_array, info=None): > > return np.asarray(input_array).view(cls) > > > > def __eq__(self, other): > > return False > > > > a = SimpleArray(10) > > print (np.int64(10) == a) > > print (a == np.int64(10)) > > > > gives the following output > > > > $ python2.7 eq.py > > True > > False > > > > so that in the first case, SimpleArray.__eq__ is not called. Is this a > > bug, and if so, can anyone think of a workaround? If this is expected > > behavior, how do I ensure SimpleArray.__eq__ gets called in both > > cases? > > > > This should be working in all development versions. I.e. NumPy >1.7.2 > (which is not released yet). > I think you mean: NumPy >= 1.7.2 Fred -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Thu Jul 4 14:15:12 2013 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Thu, 04 Jul 2013 20:15:12 +0200 Subject: [Numpy-discussion] Reducing the rounding error of np.sum Message-ID: <51D5BBB0.5060505@googlemail.com> hi, numpys implementation of sum is just a simple: for i in d: sum += d[i] this suffers from rather high rounding errors in the order of the d.size * epsilon. Consider: (np.ones(50000) / 10.).sum() 5000.0000000006585 There are numerous algorithms which reduce the error of this operation. E.g. python implements one in math.fsum which is accurate but slow compared to np.sum [0]. Numpy is currently lacking a precise summation, but I think it would make sense to add one. The question is whether we go with the python approach of adding a new function which is slower but more precise or if we even change the default summation algorithm (or do nothing :) ). For a new function I guess the method used in python itself makes sense, its probably well chosen by the python developers (though I did not lookup the rational for the choice yet). For replacing the default, two algorithms come to my mind, pairwise summation [1] and kahan summation (compensated sum) [2]. pairwise summation adds in pairs so usually the magnitude of the two operands is the same magnitude, this produces an error of O(log n * epsilon) for the common case. This algorithm has the advantage that it is almost as fast as the naive sum with an reasonable error. Problematic might be the buffering numpy does when reducing, this would limit the error reduction to the buffer size. kahan summation adds some extra operations to recover the rounding errors. This results in an error of o(epsilon). 
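A scalar Python sketch of the compensated sum (for illustration only -- the real thing would live in the C inner loop):

import numpy as np

def kahan_sum(a):
    s = 0.0
    c = 0.0                  # running compensation for lost low-order bits
    for x in np.asarray(a, dtype=float).ravel():
        y = x - c            # apply the correction from the previous step
        t = s + y
        c = (t - s) - y      # what got rounded away in s + y
        s = t
    return s

print(np.sum(np.ones(50000) / 10.))     # 5000.0000000006585
print(kahan_sum(np.ones(50000) / 10.))  # ~5000.0
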
It is four times slower than the naive summation but this can be (partially) compensated by vectorizing it. It has the advantage of lower error, simpler implementation and buffering does not interfere. I did prototype implementations (only unit strides) of both in [3, 4]. Any thoughts on this? [0] http://docs.python.org/2/library/math.html#number-theoretic-and-representation-functions [1] http://en.wikipedia.org/wiki/Pairwise_summationg [2] http://en.wikipedia.org/wiki/Kahan_summation_algorithm [3] https://github.com/juliantaylor/numpy/tree/pairwise [4] https://github.com/juliantaylor/numpy/tree/kahan From deil.christoph at googlemail.com Thu Jul 4 14:36:07 2013 From: deil.christoph at googlemail.com (Christoph Deil) Date: Thu, 4 Jul 2013 20:36:07 +0200 Subject: [Numpy-discussion] Reducing the rounding error of np.sum In-Reply-To: <51D5BBB0.5060505@googlemail.com> References: <51D5BBB0.5060505@googlemail.com> Message-ID: <2144B798-CFF8-4E37-B713-865C5965C748@gmail.com> On Jul 4, 2013, at 8:15 PM, Julian Taylor wrote: > hi, > numpys implementation of sum is just a simple: > for i in d: > sum += d[i] > > this suffers from rather high rounding errors in the order of the d.size > * epsilon. Consider: > (np.ones(50000) / 10.).sum() > 5000.0000000006585 > > There are numerous algorithms which reduce the error of this operation. > E.g. python implements one in math.fsum which is accurate but slow > compared to np.sum [0]. > > Numpy is currently lacking a precise summation, but I think it would > make sense to add one. > The question is whether we go with the python approach of adding a new > function which is slower but more precise or if we even change the > default summation algorithm (or do nothing :) ). > > For a new function I guess the method used in python itself makes sense, > its probably well chosen by the python developers (though I did not > lookup the rational for the choice yet). > > For replacing the default, two algorithms come to my mind, pairwise > summation [1] and kahan summation (compensated sum) [2]. > pairwise summation adds in pairs so usually the magnitude of the two > operands is the same magnitude, this produces an error of O(log n * > epsilon) for the common case. > This algorithm has the advantage that it is almost as fast as the naive > sum with an reasonable error. > Problematic might be the buffering numpy does when reducing, this would > limit the error reduction to the buffer size. > > kahan summation adds some extra operations to recover the rounding > errors. This results in an error of o(epsilon). > It is four times slower than the naive summation but this can be > (partially) compensated by vectorizing it. > It has the advantage of lower error, simpler implementation and > buffering does not interfere. > > I did prototype implementations (only unit strides) of both in [3, 4]. > > Any thoughts on this? 
In case you are not aware, there has been some discussion on how numerically stable sum could be added to numpy here: https://github.com/numpy/numpy/issues/2448 > > > [0] > http://docs.python.org/2/library/math.html#number-theoretic-and-representation-functions > [1] http://en.wikipedia.org/wiki/Pairwise_summationg > [2] http://en.wikipedia.org/wiki/Kahan_summation_algorithm > [3] https://github.com/juliantaylor/numpy/tree/pairwise > [4] https://github.com/juliantaylor/numpy/tree/kahan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Thu Jul 4 15:33:43 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 4 Jul 2013 13:33:43 -0600 Subject: [Numpy-discussion] Reducing the rounding error of np.sum In-Reply-To: <51D5BBB0.5060505@googlemail.com> References: <51D5BBB0.5060505@googlemail.com> Message-ID: On Thu, Jul 4, 2013 at 12:15 PM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > hi, > numpys implementation of sum is just a simple: > for i in d: > sum += d[i] > > this suffers from rather high rounding errors in the order of the d.size > * epsilon. Consider: > (np.ones(50000) / 10.).sum() > 5000.0000000006585 > > There are numerous algorithms which reduce the error of this operation. > E.g. python implements one in math.fsum which is accurate but slow > compared to np.sum [0]. > > Numpy is currently lacking a precise summation, but I think it would > make sense to add one. > The question is whether we go with the python approach of adding a new > function which is slower but more precise or if we even change the > default summation algorithm (or do nothing :) ). > > For a new function I guess the method used in python itself makes sense, > its probably well chosen by the python developers (though I did not > lookup the rational for the choice yet). > > For replacing the default, two algorithms come to my mind, pairwise > summation [1] and kahan summation (compensated sum) [2]. > pairwise summation adds in pairs so usually the magnitude of the two > operands is the same magnitude, this produces an error of O(log n * > epsilon) for the common case. > This algorithm has the advantage that it is almost as fast as the naive > sum with an reasonable error. > Problematic might be the buffering numpy does when reducing, this would > limit the error reduction to the buffer size. > > kahan summation adds some extra operations to recover the rounding > errors. This results in an error of o(epsilon). > It is four times slower than the naive summation but this can be > (partially) compensated by vectorizing it. > It has the advantage of lower error, simpler implementation and > buffering does not interfere. > > I did prototype implementations (only unit strides) of both in [3, 4]. > > Any thoughts on this? > > > [0] > > http://docs.python.org/2/library/math.html#number-theoretic-and-representation-functions > [1] http://en.wikipedia.org/wiki/Pairwise_summationg > [2] http://en.wikipedia.org/wiki/Kahan_summation_algorithm > [3] https://github.com/juliantaylor/numpy/tree/pairwise > [4] https://github.com/juliantaylor/numpy/tree/kahan > I think this would be useful as part of a bigger package that included accurate mean, var, and std. In particular, the need for more accurate means and variances have been discussed on the list before, but no one has stepped forward to do anything about them. 
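For reference, the pairwise scheme Julian describes is also easy to sketch in pure Python (again illustration only; the proposed implementation would work on the C buffers):

import numpy as np

def pairwise_sum(a, blocksize=128):
    # Below the block size, fall back to plain summation; above it,
    # split in half so the operands of each addition have similar magnitude.
    a = np.asarray(a, dtype=float).ravel()
    if a.size <= blocksize:
        return a.sum()
    mid = a.size // 2
    return pairwise_sum(a[:mid], blocksize) + pairwise_sum(a[mid:], blocksize)

print(pairwise_sum(np.ones(50000) / 10.))   # ~5000.0, error grows only like log(n)
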
Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Thu Jul 4 15:43:20 2013 From: matti.picus at gmail.com (Matti Picus) Date: Thu, 04 Jul 2013 22:43:20 +0300 Subject: [Numpy-discussion] subtypes of ndarray and round() Message-ID: <51D5D058.9090502@gmail.com> round() does not consistently preserve subtype of the ndarray, is this known behaviour or should I file a bug for it? Python 2.7.3 (default, Sep 26 2012, 21:51:14) [GCC 4.7.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import numpy as np >>> np.version.version '1.7.0' >>> a=np.matrix(range(10)) >>> a.round(decimals=10) matrix([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]) >>> a.round(decimals=-10) array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]) Matti From daschaich at gmail.com Thu Jul 4 20:01:17 2013 From: daschaich at gmail.com (David Schaich) Date: Thu, 04 Jul 2013 18:01:17 -0600 Subject: [Numpy-discussion] Covariance matrix from polyfit Message-ID: <51D60CCD.5090604@gmail.com> Hi all, I recently adopted python, and am in the process of replacing my old analysis tools. For simple (e.g., linear) interpolations and extrapolations, in the past I used gnuplot. Today I set up the equivalent with polyfit in numpy v1.7.1, first running a simple test to reproduce the gnuplot result. A discussion on this list back in February alerted me that I should use 1/sigma for the weights in polyfit as opposed to 1/sigma**2. Fine -- that's not what I'm used to, but I can make a note. http://mail.scipy.org/pipermail/numpy-discussion/2013-February/065649.html Another issue mentioned in that thread is scaling the covariance matrix by fac = resids / (len(x) - order - 2.0) This wreaked havoc on the simple test I mentioned above (and include below), which fit three data points to a straight line. I spent hours trying to figure out why numpy returned negative variances, before tracking down this line. And, indeed, if I add a fake fourth data point, I end up with inf's and nan's. There is some lengthy justification for subtracting that 2 around line 590 in lib/polynomial.py. Fine -- it's nothing I recall seeing before (and I removed it from my local installation), but I'm just a new user. However, I do think it is important to fix polyfit so that it doesn't produce pathological results like those I encountered today. Here are a couple of possibilities that would let the subtraction of 2 remain: * Check whether len(x) > order + 2, and if it is not, either ** Die with an error ** Scale by resids / (len(x) - order) instead of resids / (len(x) - order - 2.0) * Don't bother with this scaling at all, leaving it to the users (who can subtract 2 if they want). This is what scipy.optimize.leastsq does, after what seems to be a good deal of discussion: "This matrix must be multiplied by the residual variance to get the covariance of the parameter estimates ? see curve_fit." http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.leastsq.html https://github.com/scipy/scipy/pull/448 I leave it to those of you with more numpy experience to decide what would be the best way to go. 
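A sketch of getting the covariance directly from the weighted design matrix (this mirrors what gnuplot and scipy.optimize.leastsq do in the examples further down, with len(x) - deg - 1 degrees of freedom; illustration only, not polyfit's internals):

import numpy as np

m = np.array([0.008, 0.01, 0.015])
dat = np.array([1.0822582, 1.0805417, 1.0766624])
sigma = np.array([0.000370, 0.000355, 0.000249])
deg = 1

A = np.vander(m, deg + 1) / sigma[:, np.newaxis]   # weighted design matrix, columns [x, 1]
b = dat / sigma

coef = np.linalg.lstsq(A, b)[0]
chi2_dof = ((b - A.dot(coef)) ** 2).sum() / (len(m) - (deg + 1))
cov = np.linalg.inv(A.T.dot(A)) * chi2_dof          # scale by the residual variance

print(coef)                   # ~[-0.7927, 1.0885]
print(np.sqrt(np.diag(cov)))  # ~[0.0153, 0.00019], matching the gnuplot errors below
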
Cheers, David http://www-hep.colorado.edu/~schaich/ +++ Here's the simple example: >>> import numpy as np >>> m = np.array([0.008, 0.01, 0.015]) >>> dat = np.array([1.0822582, 1.0805417, 1.0766624]) >>> weight = np.array([1/0.000370, 1/0.000355, 1/0.000249]) >>> out, cov = np.polyfit(m, dat, 1, full=False, w=weight, cov=True) >>> print out, '\n', cov [-0.79269957 1.08854252] [[ -2.34965006e-04 2.84428412e-06] [ 2.84428412e-06 -3.66283662e-08]] >>> print np.sqrt(-1. * cov[0][0]) 0.0153285682895 >>> print np.sqrt(-1. * cov[1][1]) 0.000191385386578 +++ Gnuplot gives +++ Final set of parameters Asymptotic Standard Error ======================= ========================== A = -0.792719 +/- 0.01533 (1.934%) B = 1.08854 +/- 0.0001914 (0.01758%) +++ so up to the negative sign, all is good. For its part, scipy.optimize.leastsq needs me to do the scaling: +++ >>> import numpy as np >>> from scipy import optimize >>> m = np.array([0.008, 0.01, 0.015]) >>> dat = np.array([1.0822582, 1.0805417, 1.0766624]) >>> err = np.array([0.000370, 0.000355, 0.000249]) >>> linear = lambda p, x: p[0] * x + p[1] >>> errfunc = lambda p, x, y, err: (linear(p, x) - y) / err >>> p_in = [-1., 1.] >>> all_out = optimize.leastsq(errfunc, p_in[:], args=(m, dat, err), full_output = 1) >>> out = all_out[0] >>> cov = all_out[1] >>> print out, '\n', cov [-0.79269959 1.08854252] [[ 3.40800756e-03 -4.12544212e-05] [ -4.12544212e-05 5.31270007e-07]] >>> chiSq_dof = ((errfunc(out, m, dat, err))**2).sum() / (len(m) - len(out)) >>> cov *= chiSq_dof >>> print cov [[ 2.34964190e-04 -2.84427528e-06] [ -2.84427528e-06 3.66282716e-08]] +++ From josef.pktd at gmail.com Thu Jul 4 21:40:17 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 4 Jul 2013 21:40:17 -0400 Subject: [Numpy-discussion] Covariance matrix from polyfit In-Reply-To: <51D60CCD.5090604@gmail.com> References: <51D60CCD.5090604@gmail.com> Message-ID: On Thu, Jul 4, 2013 at 8:01 PM, David Schaich wrote: > Hi all, > > I recently adopted python, and am in the process of replacing my old > analysis tools. For simple (e.g., linear) interpolations and > extrapolations, in the past I used gnuplot. Today I set up the > equivalent with polyfit in numpy v1.7.1, first running a simple test to > reproduce the gnuplot result. > > A discussion on this list back in February alerted me that I should use > 1/sigma for the weights in polyfit as opposed to 1/sigma**2. Fine -- > that's not what I'm used to, but I can make a note. > http://mail.scipy.org/pipermail/numpy-discussion/2013-February/065649.html > > Another issue mentioned in that thread is scaling the covariance matrix by > fac = resids / (len(x) - order - 2.0) > This wreaked havoc on the simple test I mentioned above (and include > below), which fit three data points to a straight line. I spent hours > trying to figure out why numpy returned negative variances, before > tracking down this line. And, indeed, if I add a fake fourth data point, > I end up with inf's and nan's. > > There is some lengthy justification for subtracting that 2 around line > 590 in lib/polynomial.py. Fine -- it's nothing I recall seeing before > (and I removed it from my local installation), but I'm just a new user. > > However, I do think it is important to fix polyfit so that it doesn't > produce pathological results like those I encountered today. 
Here are a > couple of possibilities that would let the subtraction of 2 remain: > * Check whether len(x) > order + 2, and if it is not, either > ** Die with an error > ** Scale by resids / (len(x) - order) instead of resids / (len(x) - > order - 2.0) I would throw out the -2, or at least make it optional like `ddof`. (It's not in the docstring AFAICS) returning a negative (!) definite covariance matrix is definitely a bug. (should return nan or raise exception) my 1.5 cents Josef > > * Don't bother with this scaling at all, leaving it to the users (who > can subtract 2 if they want). This is what scipy.optimize.leastsq does, > after what seems to be a good deal of discussion: "This matrix must be > multiplied by the residual variance to get the covariance of the > parameter estimates ? see curve_fit." > http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.leastsq.html > https://github.com/scipy/scipy/pull/448 > > I leave it to those of you with more numpy experience to decide what > would be the best way to go. > > Cheers, > David > http://www-hep.colorado.edu/~schaich/ > > > +++ > Here's the simple example: > >>> import numpy as np > >>> m = np.array([0.008, 0.01, 0.015]) > >>> dat = np.array([1.0822582, 1.0805417, 1.0766624]) > >>> weight = np.array([1/0.000370, 1/0.000355, 1/0.000249]) > >>> out, cov = np.polyfit(m, dat, 1, full=False, w=weight, cov=True) > >>> print out, '\n', cov > [-0.79269957 1.08854252] > [[ -2.34965006e-04 2.84428412e-06] > [ 2.84428412e-06 -3.66283662e-08]] > >>> print np.sqrt(-1. * cov[0][0]) > 0.0153285682895 > >>> print np.sqrt(-1. * cov[1][1]) > 0.000191385386578 > +++ > > Gnuplot gives > +++ > Final set of parameters Asymptotic Standard Error > ======================= ========================== > A = -0.792719 +/- 0.01533 (1.934%) > B = 1.08854 +/- 0.0001914 (0.01758%) > +++ > so up to the negative sign, all is good. > > For its part, scipy.optimize.leastsq needs me to do the scaling: > +++ > >>> import numpy as np > >>> from scipy import optimize > >>> m = np.array([0.008, 0.01, 0.015]) > >>> dat = np.array([1.0822582, 1.0805417, 1.0766624]) > >>> err = np.array([0.000370, 0.000355, 0.000249]) > >>> linear = lambda p, x: p[0] * x + p[1] > >>> errfunc = lambda p, x, y, err: (linear(p, x) - y) / err > >>> p_in = [-1., 1.] > >>> all_out = optimize.leastsq(errfunc, p_in[:], args=(m, dat, err), > full_output = 1) > >>> out = all_out[0] > >>> cov = all_out[1] > >>> print out, '\n', cov > [-0.79269959 1.08854252] > [[ 3.40800756e-03 -4.12544212e-05] > [ -4.12544212e-05 5.31270007e-07]] > >>> chiSq_dof = ((errfunc(out, m, dat, err))**2).sum() / (len(m) - > len(out)) > >>> cov *= chiSq_dof > >>> print cov > [[ 2.34964190e-04 -2.84427528e-06] > [ -2.84427528e-06 3.66282716e-08]] > +++ > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From bakhtiyor_zokhidov at mail.ru Fri Jul 5 04:20:28 2013 From: bakhtiyor_zokhidov at mail.ru (=?UTF-8?B?QmFraHRpeW9yIFpva2hpZG92?=) Date: Fri, 05 Jul 2013 12:20:28 +0400 Subject: [Numpy-discussion] =?utf-8?q?Unique=28=29_function_and_avoiding_L?= =?utf-8?q?oop?= Message-ID: <1373012428.804822615@f377.i.mail.ru> Hi everybody, I have a problem with sorting out the following function. What I expect is that I showed as an example below. Two problems are encountered to achieve the result: 1) The function sometimes can't not sort as expected: I showed an example for that below. 
2) I could not do vectorization to avoid loop. OR, Is there another way to solve that problem?? Thanks in advance Example: data = ['', 12, 12, 423, '1', 423, -32, 12, 721, 345]. Expected result:??[0, 12, 12, 423, 0, 423, -32, 12, 721, 345],? here, '' and '1' are string type I need to replace them by zero The result I got:?['', 12, 12, 423, '1', 423, -32, 12, 721, 345] import numpy as np def func(data): ? ? ? ? ? x, i = np.unique(data, return_inverse = True) ? ? ? ? ? f = [ np.where( i == ind )[0] for ind in range(len(x)) ] ? ? ? ? ? new_data = [] ? ? ? ? ? # Obtain 'data' arguments and give these data to New_data ? ? ? ? ? for i in range(len(x)): ? ? ? ? ? ? ? ? ? ? ? if np.size(f[i]) > 1: ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?for j in f[i]: ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?if str(data[j]) <> '': ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? new_data.append(data[j]) ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? else: ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? data[j] = 0 ? ? ? ? ? return data --? Bakhtiyor Zokhidov -------------- next part -------------- An HTML attachment was scrubbed... URL: From grb at skogoglandskap.no Fri Jul 5 14:29:23 2013 From: grb at skogoglandskap.no (Graeme B. Bell) Date: Fri, 5 Jul 2013 18:29:23 +0000 Subject: [Numpy-discussion] simple select(): for anyone with too many conditions for np.select(), or scalar-valued choicelists. Message-ID: <65BDE0BC-0E8B-44C5-AAFD-876490382FE3@skogoglandskap.no> I've made a drop-in replacement for select() which works with large numbers of conditions, and which consistently outperforms numpy.select for my use case (scalar condlist). It fixes a couple of other issues too, and (I feel) improves the internal documentation of the code. I have included benchmarks and some tests. https://github.com/gbb/numpy-simple-select The numpy dev team are welcome to include all or part of this code into the main numpy distribution or it can be kept separate or ignored if they prefer. :-) If you need more than 30 ndarrays in your 'condlist', or if you have an all-scalar choicelist, I think you will find this code particularly interesting. Formal test coverage is incomplete, but I think this is still going to be quite useful for some people. Have a nice weekend, Graeme From mjanikas at esri.com Fri Jul 5 17:48:42 2013 From: mjanikas at esri.com (Mark Janikas) Date: Fri, 5 Jul 2013 21:48:42 +0000 Subject: [Numpy-discussion] PyArray_PutTo Question Message-ID: <1C37EAF5F95D764D99E0EABE944AE95F225C7E53@RED-INF-EXMB-P1.esri.com> Hi All, I am a bit new to the NumPy C-API and I am having a hard time with placing results into output arrays... I am using PyArray_TakeFrom to grab an input dimension of data, then do a calculation, then I want to pack it back to the output... yet the PutTo function does not have an axis argument like the TakeFrom does... I am grabbing by column in a two-dimensional array and I would like to pack it that way. I know that I can build the result in reverse and pack the columns into rows and then reshape the output... but I am wondering why the PutTo does not behave exactly like the take-from does?... The python implementation "numpy.put" also does not have the axis... so I guess I can see the one-to-one reason for the omission. However, is building in reverse and reshaping the normal way to pack by column? Thanks much! MJ -------------- next part -------------- An HTML attachment was scrubbed... 
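One Python-level way around the missing axis argument in put is to write through a view or explicit flat indices instead of reshaping; a small sketch (the array here is only illustrative of the take/compute/put-back pattern described above):

import numpy as np

a = np.arange(12, dtype=float).reshape(3, 4)

col = np.take(a, 2, axis=1)          # grab column 2, like the axis-aware take
col = np.sqrt(col)                   # some per-column calculation

# put is flat, so either assign through the transposed view...
a.T[2] = col
# ...or hand put() the flat indices of that column explicitly.
np.put(a, 2 + np.arange(a.shape[0]) * a.shape[1], col)
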
URL: From alan.isaac at gmail.com Fri Jul 5 18:45:16 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Fri, 05 Jul 2013 18:45:16 -0400 Subject: [Numpy-discussion] SLARTG In-Reply-To: <51D7158C.1090903@american.edu> References: <51D7158C.1090903@american.edu> Message-ID: <51D74C7C.5030500@gmail.com> On 7/5/2013 2:50 PM, Alan G Isaac wrote: > I see that CLARTG is here: > https://github.com/scipy/scipy/blob/master/scipy/sparse/linalg/eigen/arpack/ARPACK/SRC/sstqrb.f > > But is there a Python interface in SciPy? > (Or any other SciPy access to Givens rotation?) Sorry, that was SLARTG, whereas CLARTG is here: https://github.com/scipy/scipy/blob/master/scipy/sparse/linalg/eigen/arpack/ARPACK/SRC/cnapps.f But the question stands: is there a Python interface to Givens rotation? Thanks, Alan Isaac From alan.isaac at gmail.com Sun Jul 7 11:28:02 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Sun, 07 Jul 2013 11:28:02 -0400 Subject: [Numpy-discussion] add .H attribute? Message-ID: <51D98902.1090403@gmail.com> With numpy arrays, I miss being able to spell a.conj().T as a.H, as one can with numpy matrices. Is adding this attribute to arrays ever under consideration? Thanks, Alan Isaac From charlesr.harris at gmail.com Sun Jul 7 16:49:18 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 7 Jul 2013 14:49:18 -0600 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: <51D98902.1090403@gmail.com> References: <51D98902.1090403@gmail.com> Message-ID: On Sun, Jul 7, 2013 at 9:28 AM, Alan G Isaac wrote: > With numpy arrays, I miss being able to spell a.conj().T as a.H, > as one can with numpy matrices. > > Is adding this attribute to arrays ever under consideration? > There was a long thread about this back around 1.1 or so, long time ago in any case. IIRC correctly, Travis was opposed. I think part of the problem was that arr.T is a view, but arr.H would not be. Probably it could be be made to return an iterator that performed the conjugation, or we could simply return a new array. I'm not opposed myself, but I'd have to review the old discussion to see if there was good reason not to have it in the first place. I think the original discussion of an abs method took place about the same time. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From kimsunghyun at kaist.ac.kr Mon Jul 8 12:05:44 2013 From: kimsunghyun at kaist.ac.kr (sunghyun Kim) Date: Tue, 9 Jul 2013 01:05:44 +0900 Subject: [Numpy-discussion] f2py build with mkl lapack Message-ID: Hi I'm trying to use fortran wrapper f2py with intel's mkl following is my command LIB='-L/opt/intel/Compiler/11.1/064/mkl/lib/em64t/ -lguide -lpthread -lmkl_core -lmkl_intel_lp64 -lmkl_sequential' INC=-I/opt/intel/Compiler/11.1/064/mkl/include f2py --fcompiler=intelem $INC $LIB -m solveLE -c solveLE2.f solveLE2.f is simple fortran code using lapack's linear equation solver SGESV ============= CALL SGESV(N, NRHS, A, LDA, IPIV, B, LDB, INFO) ============= When i use the command, compile was done. But when I use the solveLE.so, I received following error massage ================== $python test.py python: symbol lookup error: /opt/intel/Compiler/11.1/064/mkl/lib/em64t/libmkl_lapack.so: undefined symbol: mkl_lapack_sgetrf ================== I think "mkl_lapack_sgetrf" is defined in -lmkl_sequential. I don't know what should I do. Any help would be greatly appreciated! Sunghyun Kim Ph.D. Candidate Theoretical Condensed Matter Physics Group. 
KAIST 291 Daehak-ro(373-1 Guseong-dong), Yuseong-gu, Daejeon 305-701, Republic of Korea +10-4144-5946 -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Mon Jul 8 13:15:50 2013 From: cournape at gmail.com (David Cournapeau) Date: Mon, 8 Jul 2013 18:15:50 +0100 Subject: [Numpy-discussion] f2py build with mkl lapack In-Reply-To: References: Message-ID: On Mon, Jul 8, 2013 at 5:05 PM, sunghyun Kim wrote: > Hi > > I'm trying to use fortran wrapper f2py with intel's mkl > > following is my command > > LIB='-L/opt/intel/Compiler/11.1/064/mkl/lib/em64t/ -lguide -lpthread > -lmkl_core -lmkl_intel_lp64 -lmkl_sequential' > Linking order matters: if A needs B, A should appear before B, so -lpthread/-lguide should be at the end, mkl_intel_lp64 before mkl_core, and mkl_sequential in front of that. See the MKL manual for more details, David > INC=-I/opt/intel/Compiler/11.1/064/mkl/include > f2py --fcompiler=intelem $INC $LIB -m solveLE -c solveLE2.f > solveLE2.f is simple fortran code using lapack's linear equation solver > SGESV > ============= > CALL SGESV(N, NRHS, A, LDA, IPIV, B, LDB, INFO) > ============= > > When i use the command, compile was done. > But when I use the solveLE.so, I received following error massage > > ================== > $python test.py > python: symbol lookup error: > /opt/intel/Compiler/11.1/064/mkl/lib/em64t/libmkl_lapack.so: undefined > symbol: mkl_lapack_sgetrf > ================== > > I think "mkl_lapack_sgetrf" is defined in -lmkl_sequential. > > I don't know what should I do. > > > Any help would be greatly appreciated! > > > > > Sunghyun Kim > Ph.D. Candidate > Theoretical Condensed Matter Physics Group. > KAIST > 291 Daehak-ro(373-1 Guseong-dong), Yuseong-gu, Daejeon 305-701, Republic > of Korea > +10-4144-5946 > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brad.froehle at gmail.com Mon Jul 8 14:37:22 2013 From: brad.froehle at gmail.com (Bradley M. Froehle) Date: Mon, 8 Jul 2013 11:37:22 -0700 Subject: [Numpy-discussion] f2py build with mkl lapack In-Reply-To: References: Message-ID: On Mon, Jul 8, 2013 at 10:15 AM, David Cournapeau wrote: > > > On Mon, Jul 8, 2013 at 5:05 PM, sunghyun Kim > wrote: >> >> Hi >> >> I'm trying to use fortran wrapper f2py with intel's mkl >> >> following is my command >> >> LIB='-L/opt/intel/Compiler/11.1/064/mkl/lib/em64t/ -lguide -lpthread >> -lmkl_core -lmkl_intel_lp64 -lmkl_sequential' > > > Linking order matters: if A needs B, A should appear before B, so > -lpthread/-lguide should be at the end, mkl_intel_lp64 before mkl_core, and > mkl_sequential in front of that. > > See the MKL manual for more details, You may also want to consult the MKL Link Line Advsior [1], which in your case recommends an ordering like: -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm [1]: http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor -Brad From kimsunghyun at kaist.ac.kr Mon Jul 8 21:12:06 2013 From: kimsunghyun at kaist.ac.kr (sunghyun Kim) Date: Tue, 9 Jul 2013 10:12:06 +0900 Subject: [Numpy-discussion] f2py build with mkl lapack In-Reply-To: References: Message-ID: thank you for your help I tried following orders and many combinations... 
===================== LIB='-L/opt/intel/Compiler/11.1/064/mkl/lib/em64t/ -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm' LIB='-L/opt/intel/Compiler/11.1/064/mkl/lib/em64t/ -lmkl_solver_lp64_sequential -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm' ==================== but I still got following massage ================ undefined symbol: mkl_lapack_sgetrf ================ Sunghyun Kim Ph.D. Candidate Theoretical Condensed Matter Physics Group. KAIST 291 Daehak-ro(373-1 Guseong-dong), Yuseong-gu, Daejeon 305-701, Republic of Korea +10-4144-5946 On Tue, Jul 9, 2013 at 3:37 AM, Bradley M. Froehle wrote: > On Mon, Jul 8, 2013 at 10:15 AM, David Cournapeau > wrote: > > > > > > On Mon, Jul 8, 2013 at 5:05 PM, sunghyun Kim > > wrote: > >> > >> Hi > >> > >> I'm trying to use fortran wrapper f2py with intel's mkl > >> > >> following is my command > >> > >> LIB='-L/opt/intel/Compiler/11.1/064/mkl/lib/em64t/ -lguide -lpthread > >> -lmkl_core -lmkl_intel_lp64 -lmkl_sequential' > > > > > > Linking order matters: if A needs B, A should appear before B, so > > -lpthread/-lguide should be at the end, mkl_intel_lp64 before mkl_core, > and > > mkl_sequential in front of that. > > > > See the MKL manual for more details, > > You may also want to consult the MKL Link Line Advsior [1], which in > your case recommends an ordering like: > > -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm > > [1]: http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor > > -Brad > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chaoyuejoy at gmail.com Tue Jul 9 08:55:53 2013 From: chaoyuejoy at gmail.com (Chao YUE) Date: Tue, 9 Jul 2013 14:55:53 +0200 Subject: [Numpy-discussion] np.ma.argmax not respecting the mask? Message-ID: Dear all, I am using 1.7.1 version of numpy and np.ma.argmax is not repecting the mask? I expect for all data that are masked, it should also return a mask, but this is not the case. In [96]: d3 Out[96]: masked_array(data = [[-- -- -- -- 4] [5 -- 7 8 9]], mask = [[ True True True True False] [False True False False False]], fill_value = 6) In [97]: np.ma.argmax(d3,axis=0) Out[97]: array([1, 0, 1, 1, 1]) In [98]: np.__version__ Out[98]: '1.7.1' Can I file a bug report on this? thanks, Chao -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Tue Jul 9 09:14:28 2013 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 9 Jul 2013 15:14:28 +0200 Subject: [Numpy-discussion] np.ma.argmax not respecting the mask? In-Reply-To: References: Message-ID: On Tue, Jul 9, 2013 at 2:55 PM, Chao YUE wrote: > I am using 1.7.1 version of numpy and np.ma.argmax is not repecting the > mask? 
> > In [96]: d3 > Out[96]: > masked_array(data = > [[-- -- -- -- 4] > [5 -- 7 8 9]], > mask = > [[ True True True True False] > [False True False False False]], > fill_value = 6) > > > In [97]: np.ma.argmax(d3,axis=0) > Out[97]: array([1, 0, 1, 1, 1]) This is the result I would expect. If both values are masked, the fill value is used, so there is always an argmin value. The following workaround should have done the trick, but it exposes a different bug: x = np.ma.array([[0,1,2,3,4],[5,6,7,8, 9]], mask=[[1, 1, 1, 1, 0], [0, 1, 0, 0 ,0]], dtype=float) np.nanargmax(x.filled(np.nan), axis=0) This breaks with "ValueError: cannot convert float NaN to integer" St?fan From sebastian at sipsolutions.net Tue Jul 9 10:08:04 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 09 Jul 2013 16:08:04 +0200 Subject: [Numpy-discussion] np.ma.argmax not respecting the mask? In-Reply-To: References: Message-ID: <1373378884.2604.6.camel@sebastian-laptop> On Tue, 2013-07-09 at 15:14 +0200, St?fan van der Walt wrote: > On Tue, Jul 9, 2013 at 2:55 PM, Chao YUE wrote: > > I am using 1.7.1 version of numpy and np.ma.argmax is not repecting the > > mask? > > > > In [96]: d3 > > Out[96]: > > masked_array(data = > > [[-- -- -- -- 4] > > [5 -- 7 8 9]], > > mask = > > [[ True True True True False] > > [False True False False False]], > > fill_value = 6) > > > > > > In [97]: np.ma.argmax(d3,axis=0) > > Out[97]: array([1, 0, 1, 1, 1]) > > This is the result I would expect. If both values are masked, the > fill value is used, so there is always an argmin value. > To be honest, I would expect the exact opposite. If there is no value, there is no minimum argument -> either its an error, or it signals invalid in some other way. On masked arrays I would expect it to be masked to signal this. The error for nanargmax is annoying, but it is right to be an error IMO, due to lack of a better representation. (Ideally mabe the user would be given the option to pass an Identity element for those nanfuncs (basically this is always NaN now, which fails for argmax since the result is integer) for which the ufunc does not have an Identity, and for those that do, we should actually use it. - Sebastian > The following workaround should have done the trick, but it exposes a > different bug: > > x = np.ma.array([[0,1,2,3,4],[5,6,7,8, 9]], mask=[[1, 1, 1, 1, 0], [0, > 1, 0, 0 ,0]], dtype=float) > np.nanargmax(x.filled(np.nan), axis=0) > > This breaks with "ValueError: cannot convert float NaN to integer" > > St?fan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From pgmdevlist at gmail.com Tue Jul 9 10:26:08 2013 From: pgmdevlist at gmail.com (Pierre Gerard-Marchant) Date: Tue, 9 Jul 2013 16:26:08 +0200 Subject: [Numpy-discussion] np.ma.argmax not respecting the mask? In-Reply-To: <1373378884.2604.6.camel@sebastian-laptop> References: <1373378884.2604.6.camel@sebastian-laptop> Message-ID: <200BAAEA-7F9D-4A72-93B9-EE96EE347601@gmail.com> On Jul 9, 2013, at 16:08 , Sebastian Berg wrote: > On Tue, 2013-07-09 at 15:14 +0200, St?fan van der Walt wrote: >> On Tue, Jul 9, 2013 at 2:55 PM, Chao YUE wrote: >>> I am using 1.7.1 version of numpy and np.ma.argmax is not repecting the >>> mask? 
>>> >>> In [96]: d3
>>> Out[96]:
>>> masked_array(data =
>>> [[-- -- -- -- 4]
>>> [5 -- 7 8 9]],
>>> mask =
>>> [[ True True True True False]
>>> [False True False False False]],
>>> fill_value = 6)
>>>
>>>
>>> In [97]: np.ma.argmax(d3,axis=0)
>>> Out[97]: array([1, 0, 1, 1, 1])
>>
>> This is the result I would expect. If both values are masked, the
>> fill value is used, so there is always an argmin value.
>>
>
> To be honest, I would expect the exact opposite. If there is no value,
> there is no minimum argument -> either its an error, or it signals
> invalid in some other way. On masked arrays I would expect it to be
> masked to signal this.

The doc is quite clear: masked values are replaced by `fill_value` when
determining the argmax/argmin. Attaching a mask a posteriori is always
doable, but making the output of np.ma.argstuff a MaskedArray may be a
nuisance at this point (any input from heavy users?).

From chaoyuejoy at gmail.com Tue Jul 9 10:38:18 2013
From: chaoyuejoy at gmail.com (Chao YUE)
Date: Tue, 9 Jul 2013 16:38:18 +0200
Subject: [Numpy-discussion] np.ma.argmax not respecting the mask?
In-Reply-To: <200BAAEA-7F9D-4A72-93B9-EE96EE347601@gmail.com>
References: <1373378884.2604.6.camel@sebastian-laptop> <200BAAEA-7F9D-4A72-93B9-EE96EE347601@gmail.com>
Message-ID:

Sorry, I didn't read the docs very carefully. There is no doc for np.ma.argmax,
but there is one for np.ma.argmin, so it's expected behavior rather than a bug.
Let some heavy users say their ideas.

Practically, the returned value of 0 will always be confused with values which
are not masked but really do have their minimum or maximum at position 0 along
the specified axis.

One way to work around it is:

data_mask = np.ma.mean(data, axis=0).mask

np.ma.masked_array(np.ma.argmax(data, axis=0), mask=data_mask)

Chao

On Tue, Jul 9, 2013 at 4:26 PM, Pierre Gerard-Marchant wrote:

>
> On Jul 9, 2013, at 16:08 , Sebastian Berg
> wrote:
>
> > On Tue, 2013-07-09 at 15:14 +0200, Stéfan van der Walt wrote:
> >> On Tue, Jul 9, 2013 at 2:55 PM, Chao YUE wrote:
> >>> I am using 1.7.1 version of numpy and np.ma.argmax is not respecting the
> >>> mask?
> >>>
> >>> In [96]: d3
> >>> Out[96]:
> >>> masked_array(data =
> >>> [[-- -- -- -- 4]
> >>> [5 -- 7 8 9]],
> >>> mask =
> >>> [[ True True True True False]
> >>> [False True False False False]],
> >>> fill_value = 6)
> >>>
> >>>
> >>> In [97]: np.ma.argmax(d3,axis=0)
> >>> Out[97]: array([1, 0, 1, 1, 1])
> >>
> >> This is the result I would expect. If both values are masked, the
> >> fill value is used, so there is always an argmin value.
> >>
> >
> > To be honest, I would expect the exact opposite. If there is no value,
> > there is no minimum argument -> either its an error, or it signals
> > invalid in some other way. On masked arrays I would expect it to be
> > masked to signal this.
>
> The doc is quite clear: masked values are replaced by `fill_value` when
> determining the argmax/argmin. Attaching a mask a posteriori is always
> doable, but making the output of np.ma.argstuff a MaskedArray may be a
> nuisance at this point (any input from heavy users?).
> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Tue Jul 9 10:55:51 2013 From: pgmdevlist at gmail.com (Pierre Gerard-Marchant) Date: Tue, 9 Jul 2013 16:55:51 +0200 Subject: [Numpy-discussion] np.ma.argmax not respecting the mask? In-Reply-To: References: <1373378884.2604.6.camel@sebastian-laptop> <200BAAEA-7F9D-4A72-93B9-EE96EE347601@gmail.com> Message-ID: On Jul 9, 2013, at 16:38 , Chao YUE wrote: > Sorry I didn't the docs very carefully. there is no doc for np.ma.argmax for indeed there is for np.ma.argmin Yeah, the doc of the function asks you to go check the doc of the method? Not the best. > so it's an expected behavior rather than a bug. Let some heavy users to say their ideas. > > Practicaly, the returned value of 0 will be always confused with the values which are not masked > but do have the minimum or maximum values at the 0 position over the specified axis. Well, it's just an index: if you take the corresponding value from the input array, it'll be masked... > One way to walk around is: > > > data_mask = np.ma.mean(axis=0).mask > > np.ma.masked_array(np.ma.argmax(data,axis=0), mask=data_mask) I find easier to use `mask=x.mask.prod(axis)` to get the combined mask along the desired axis (you could also use a `reduce(np.logical_and, x.mask)` for axis=0, but it's less convenient I think). From chaoyuejoy at gmail.com Tue Jul 9 11:20:46 2013 From: chaoyuejoy at gmail.com (Chao YUE) Date: Tue, 9 Jul 2013 17:20:46 +0200 Subject: [Numpy-discussion] np.ma.argmax not respecting the mask? In-Reply-To: References: <1373378884.2604.6.camel@sebastian-laptop> <200BAAEA-7F9D-4A72-93B9-EE96EE347601@gmail.com> Message-ID: Thanks Pierre, good to know there are so many tricks available. Chao On Tue, Jul 9, 2013 at 4:55 PM, Pierre Gerard-Marchant wrote: > > On Jul 9, 2013, at 16:38 , Chao YUE wrote: > > > Sorry I didn't the docs very carefully. there is no doc for np.ma.argmax > for indeed there is for np.ma.argmin > > Yeah, the doc of the function asks you to go check the doc of the method? > Not the best. > > > > so it's an expected behavior rather than a bug. Let some heavy users to > say their ideas. > > > > Practicaly, the returned value of 0 will be always confused with the > values which are not masked > > but do have the minimum or maximum values at the 0 position over the > specified axis. > > Well, it's just an index: if you take the corresponding value from the > input array, it'll be masked... > > > One way to walk around is: > > > > > > data_mask = np.ma.mean(axis=0).mask > > > > np.ma.masked_array(np.ma.argmax(data,axis=0), mask=data_mask) > > I find easier to use `mask=x.mask.prod(axis)` to get the combined mask > along the desired axis (you could also use a `reduce(np.logical_and, > x.mask)` for axis=0, but it's less convenient I think). 
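For readers following the archive, Pierre's `x.mask.prod(axis)` trick can be
written out as a complete example (an illustrative sketch built around the `d3`
array from earlier in the thread, not code taken from the original messages):

import numpy as np

d3 = np.ma.masked_array([[0, 0, 0, 0, 4],
                         [5, 0, 7, 8, 9]],
                        mask=[[1, 1, 1, 1, 0],
                              [0, 1, 0, 0, 0]])

# Combine the mask along the reduced axis: True only where *every*
# entry in that column is masked.
combined_mask = d3.mask.prod(axis=0).astype(bool)

# Attach it to the argmax result, so all-masked columns come back masked.
arg = np.ma.masked_array(np.ma.argmax(d3, axis=0), mask=combined_mask)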
> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at onerussian.com Tue Jul 9 12:10:07 2013 From: lists at onerussian.com (Yaroslav Halchenko) Date: Tue, 9 Jul 2013 12:10:07 -0400 Subject: [Numpy-discussion] Really cruel draft of vbench setup for NumPy (.add.reduce benchmarks since 2011) In-Reply-To: <20130701215804.GG27621@onerussian.com> References: <20130506143241.GV5140@onerussian.com> <1367856232.2506.31.camel@sebastian-laptop> <20130506161153.GW5140@onerussian.com> <1367927238.23010.12.camel@sebastian-laptop> <20130701193006.GC27621@onerussian.com> <20130701215804.GG27621@onerussian.com> Message-ID: <20130709161007.GL27621@onerussian.com> Julian Taylor contributed some benchmarks he was "concerned" about, so now the collection is even better. I will keep updating tests on the same url: http://www.onerussian.com/tmp/numpy-vbench/ [it is now running and later I will upload with more commits for higher temporal fidelity] of particular interest for you might be: some minor consistent recent losses in http://www.onerussian.com/tmp/numpy-vbench/vb_vb_io.html#strided-assign-float64 http://www.onerussian.com/tmp/numpy-vbench/vb_vb_io.html#strided-assign-float32 http://www.onerussian.com/tmp/numpy-vbench/vb_vb_io.html#strided-assign-int16 http://www.onerussian.com/tmp/numpy-vbench/vb_vb_io.html#strided-assign-int8 seems have lost more than 25% of performance throughout the timeline http://www.onerussian.com/tmp/numpy-vbench/vb_vb_io.html#memcpy-int8 "fast" calls to all/any seemed to be hurt twice in their life time now running *3 times slower* than in 2011 -- inflection points correspond to regressions and/or their fixes in those functions to bring back performance on "slow" cases (when array traversal is needed, e.g. on arrays of zeros for any) http://www.onerussian.com/tmp/numpy-vbench/vb_vb_reduce.html#numpy-all-fast http://www.onerussian.com/tmp/numpy-vbench/vb_vb_reduce.html#numpy-any-fast Enjoy On Mon, 01 Jul 2013, Yaroslav Halchenko wrote: > FWIW -- updated plots with contribution from Julian Taylor > http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_indexing.html#mmap-slicing > ;-) > On Mon, 01 Jul 2013, Yaroslav Halchenko wrote: > > Hi Guys, > > not quite the recommendations you expressed, but here is my ugly > > attempt to improve benchmarks coverage: > > http://www.onerussian.com/tmp/numpy-vbench-20130701/index.html > > initially I also ran those ufunc benchmarks per each dtype separately, > > but then resulting webpage is loong which brings my laptop on its knees > > by firefox. So I commented those out for now, and left only "summary" > > ones across multiple datatypes. > > There is a bug in sphinx which forbids embedding some figures for > > vb_random "as is", so pardon that for now... > > I have not set cpu affinity of the process (but ran it at nice -10), so may be > > that also contributed to variance of benchmark estimates. And there probably > > could be more of goodies (e.g. 
gc control etc) to borrow from > > https://github.com/pydata/pandas/blob/master/vb_suite/test_perf.py which I have > > just discovered to minimize variance. > > nothing really interesting was pin-pointed so far, besides that > > - svd became a bit faster since few months back ;-) > > http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_linalg.html > > - isnan (and isinf, isfinite) got improved > > http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_ufunc.html#numpy-isnan-a-10types > > - right_shift got a miniscule slowdown from what it used to be? > > http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_ufunc.html#numpy-right-shift-a-a-3types > > As before -- current code of those benchmarks collection is available > > at http://github.com/yarikoptic/numpy-vbench/pull/new/master > > if you have specific snippets you would like to benchmark -- just state them > > here or send a PR -- I will add them in. > > Cheers, > > On Tue, 07 May 2013, Da?id wrote: > > > On 7 May 2013 13:47, Sebastian Berg wrote: > > > > Indexing/assignment was the first thing I thought of too (also because > > > > fancy indexing/assignment really could use some speedups...). Other then > > > > that maybe some timings for small arrays/scalar math, but that might be > > > > nice for that GSoC project. > > > Why not going bigger? Ufunc operations on big arrays, CPU and memory bound. > > > Also, what about interfacing with other packages? It may increase the > > > compiling overhead, but I would like to see Cython in action (say, > > > only last version, maybe it can be fixed). > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at scipy.org > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Yaroslav O. Halchenko, Ph.D. http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org Senior Research Associate, Psychological and Brain Sciences Dept. Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik From lists at hilboll.de Wed Jul 10 11:02:07 2013 From: lists at hilboll.de (Andreas Hilboll) Date: Wed, 10 Jul 2013 17:02:07 +0200 Subject: [Numpy-discussion] flip array on axis Message-ID: <51DD776F.4040306@hilboll.de> Hi, there are np.flipud and np.fliplr methods to flip 2d arrays on the first and second dimension, respectively. What can I do to flip an array on an axis which I don't know before runtime? I'd really like to see a np.flip(arr, axis) method which lets me specify which axis to flip on. Any ideas? Cheers, Andreas. From matthew.brett at gmail.com Wed Jul 10 11:06:21 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 10 Jul 2013 11:06:21 -0400 Subject: [Numpy-discussion] flip array on axis In-Reply-To: <51DD776F.4040306@hilboll.de> References: <51DD776F.4040306@hilboll.de> Message-ID: Hi, On Wed, Jul 10, 2013 at 11:02 AM, Andreas Hilboll wrote: > Hi, > > there are np.flipud and np.fliplr methods to flip 2d arrays on the first > and second dimension, respectively. What can I do to flip an array on an > axis which I don't know before runtime? I'd really like to see a > np.flip(arr, axis) method which lets me specify which axis to flip on. 
I have something like that that's a few lines long: https://github.com/nipy/nibabel/blob/master/nibabel/orientations.py#L231 Cheers, Matthew From jgomezdans at gmail.com Wed Jul 10 11:50:25 2013 From: jgomezdans at gmail.com (Jose Gomez-Dans) Date: Wed, 10 Jul 2013 16:50:25 +0100 Subject: [Numpy-discussion] f2py and setup.py how can I specify where the .so file goes? Message-ID: Hi, I am building a package that exposes some Fortran libraries through f2py. The packages directory looks like this: setup.py my_pack/ | |---------->__init__.py |----------> some.pyf |-----------> code.f90 I thoughat that once installed, I'd get the .so and __init__.py in the same directory (namely ~/.local/lib/python2.7/site-packages/my_pack/). However, I get ~/.local/lib/python2.7/site-packages/mypack_fortran.so ~/.local/lib/python2.7/site-packages/my_pack__fortran-1.0.2-py2.7.egg-info ~/.local/lib/python2.7/site-packages/my_pack/__init__.py Thet setup file is this at the end, I am clearly missing some option here to move the *.so into the my_pack directory.... Anybody know which one? Cheers Jose [setup.py] #!/usr/bin/env python def configuration(parent_package='',top_path=None): from numpy.distutils.misc_util import Configuration config = Configuration(parent_package,top_path) config.add_extension('mypack_fortran', ['the_pack/code.f90'] ) return config if __name__ == "__main__": from numpy.distutils.core import setup # Global variables for this extension: name = "mypack_fortran" # name of the generated python extension (.so) description = "blah" author = "" author_email = "" setup( name=name,\ description=description, \ author=author, \ author_email = author_email, \ configuration = configuration, version="1.0.2",\ packages=["my_pack"]) -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at hilboll.de Wed Jul 10 12:03:38 2013 From: lists at hilboll.de (Andreas Hilboll) Date: Wed, 10 Jul 2013 18:03:38 +0200 Subject: [Numpy-discussion] flip array on axis In-Reply-To: References: <51DD776F.4040306@hilboll.de> Message-ID: <51DD85DA.906@hilboll.de> On 10.07.2013 17:06, Matthew Brett wrote: > Hi, > > On Wed, Jul 10, 2013 at 11:02 AM, Andreas Hilboll wrote: >> Hi, >> >> there are np.flipud and np.fliplr methods to flip 2d arrays on the first >> and second dimension, respectively. What can I do to flip an array on an >> axis which I don't know before runtime? I'd really like to see a >> np.flip(arr, axis) method which lets me specify which axis to flip on. > > I have something like that that's a few lines long: > > https://github.com/nipy/nibabel/blob/master/nibabel/orientations.py#L231 > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Thanks, Matthew! Should this go into numpy itself? If so, I could prepare a PR, if you point me to the right place (file) to put it. Cheers, Andreas. From aronne.merrelli at gmail.com Wed Jul 10 14:11:07 2013 From: aronne.merrelli at gmail.com (Aronne Merrelli) Date: Wed, 10 Jul 2013 14:11:07 -0400 Subject: [Numpy-discussion] Unique() function and avoiding Loop In-Reply-To: <1373012428.804822615@f377.i.mail.ru> References: <1373012428.804822615@f377.i.mail.ru> Message-ID: On Fri, Jul 5, 2013 at 4:20 AM, Bakhtiyor Zokhidov < bakhtiyor_zokhidov at mail.ru> wrote: > Hi everybody, > > I have a problem with sorting out the following function. What I expect is > that I showed as an example below. 
> > Two problems are encountered to achieve the result: > 1) The function sometimes can't not sort as expected: I showed an example > for that below. > 2) I could not do vectorization to avoid loop. > > > OR, Is there another way to solve that problem?? > Thanks in advance > > > Example: > data = ['', 12, 12, 423, '1', 423, -32, 12, 721, 345]. Expected > result: [0, 12, 12, 423, 0, 423, -32, 12, 721, 345], here, '' and '1' > are string type I need to replace them by zero > I don't understand your code example, but if your problem is fully described as above (replace the strings '' or '1' with the integer 0), then it would seem simplest to just do this with python built in functions rather than using numpy. The numpy functions work best with arrays, and your "data" variable is a python list with a mixture of integers and strings. Here is a possible solution: >>> data = ['', 12, 12, 423, '1', 423, -32, 12, 721, 345] >>> foo = lambda x: 0 if (x == '') or (x == '1') else x >>> print map(foo, data) [0, 12, 12, 423, 0, 423, -32, 12, 721, 345] Hope that helps, Aronne > The result I got: ['', 12, 12, 423, '1', 423, -32, 12, 721, 345] > > import numpy as np > > def func(data): > > x, i = np.unique(data, return_inverse = True) > f = [ np.where( i == ind )[0] for ind in range(len(x)) ] > > new_data = [] > # Obtain 'data' arguments and give these data to New_data > for i in range(len(x)): > if np.size(f[i]) > 1: > for j in f[i]: > if str(data[j]) <> '': > > new_data.append(data[j]) > else: > data[j] = 0 > return data > > -- > Bakhtiyor Zokhidov > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Wed Jul 10 14:29:00 2013 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Wed, 10 Jul 2013 14:29:00 -0400 Subject: [Numpy-discussion] flip array on axis In-Reply-To: <51DD85DA.906@hilboll.de> References: <51DD776F.4040306@hilboll.de> <51DD85DA.906@hilboll.de> Message-ID: On Wed, Jul 10, 2013 at 12:03 PM, Andreas Hilboll wrote: > On 10.07.2013 17:06, Matthew Brett wrote: > > Hi, > > > > On Wed, Jul 10, 2013 at 11:02 AM, Andreas Hilboll > wrote: > >> Hi, > >> > >> there are np.flipud and np.fliplr methods to flip 2d arrays on the first > >> and second dimension, respectively. What can I do to flip an array on an > >> axis which I don't know before runtime? I'd really like to see a > >> np.flip(arr, axis) method which lets me specify which axis to flip on. > > > > I have something like that that's a few lines long: > > > > https://github.com/nipy/nibabel/blob/master/nibabel/orientations.py#L231 > > > > Cheers, > > > > Matthew > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > Thanks, Matthew! Should this go into numpy itself? If so, I could > prepare a PR, if you point me to the right place (file) to put it. > > Something like this would be nice to have in numpy, so we don't continue to reinvent it (e.g. https://github.com/scipy/scipy/blob/master/scipy/signal/_arraytools.py; see `axis_slice` and `axis_reverse`). Warren > Cheers, Andreas. 
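As a point of reference for the thread, the generic helper under discussion is
only a few lines when written with slice objects; a rough sketch (illustrative
only -- not the nibabel or scipy implementation) might be:

import numpy as np

def flip(arr, axis):
    # Reverse the order of elements along `axis`, returning a view.
    slicer = [slice(None)] * arr.ndim
    slicer[axis] = slice(None, None, -1)
    return arr[tuple(slicer)]

a = np.arange(24).reshape(2, 3, 4)
assert (flip(a, -1) == a[..., ::-1]).all()
assert (flip(a, 0) == a[::-1]).all()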
> _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From blake.a.griffith at gmail.com Wed Jul 10 23:29:05 2013 From: blake.a.griffith at gmail.com (Blake Griffith) Date: Wed, 10 Jul 2013 22:29:05 -0500 Subject: [Numpy-discussion] ufunc overrides Message-ID: Hello NumPy, Part of my GSoC is compatibility with SciPy's sparse matrices and NumPy's ufuncs. Currently there is no feasible way to do this without changing ufuncs a bit. I've been considering a mechanism to override ufuncs based on checking the ufuncs arguments for a __ufunc_override__ attribute. Then handing off the operation to a function specified by that attribute. I prototyped this in python and did a demo in a blog post here: http://cwl.cx/posts/week-6-ufunc-overrides.html This is similar to a previously discussed, but never implemented change: http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html However it seems like the ufunc machinery might be ripped out and replaced with a true multi-method implementation soon. See Travis' blog post: http://technicaldiscovery.blogspot.com/2013/07/thoughts-after-scipy-2013-and-specific.html So I'd like to make my changes as forward compatible as possible. However I'm not sure what I should even consider here, or how forward compatible my current implementation is. Thoughts? Until then, I'm writing up a nep, it is still pretty incomplete, it can be found here: https://github.com/cowlicks/numpy/blob/ufunc-override/doc/neps/ufunc-overrides.rst -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Thu Jul 11 01:01:15 2013 From: travis at continuum.io (Travis Oliphant) Date: Thu, 11 Jul 2013 00:01:15 -0500 Subject: [Numpy-discussion] ufunc overrides In-Reply-To: References: Message-ID: Hey Blake, To be clear, my blog-post is just a pre-NEP and should not be perceived as something that will transpire in NumPy anytime soon. You should take it as a "hey everyone, I think I know how to solve this problem, but I have no time to do it, but wanted to get the word out to those who might have the time" I think the multi-method approach I outline is the *right* thing to do for NumPy. Another attribute on ufuncs would be a bit of a hack (though easier to implement). But, on the other-hand, the current ufunc attributes are also a bit of a hack. While my overall proposal is to make *all* functions in NumPy (and SciPy and Scikits) multimethods, I think it's actually pretty straightforward and a more contained problem to make all *ufuncs* multi-methods. I think that could fit in a summer of code project. I don't think it would be that difficult to make all ufuncs multi-methods that dispatch based on the Python type (they are already multi-methods based on the array dtype). You could basically take the code from Guido's essay or from Peak Rules multi-method implementation or from the links below and integrate it with a wrapped version of the current ufuncs (or do a bit more glue and modify the ufunc_call function in 'C' directly and get nice general multi-methods for ufuncs). Of course, you would need to define a decorator that NumPy users could use to register their multi-method implementation with the ufunc. But, this again would not be too difficult. 
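To make the idea concrete, a toy sketch of such a registration decorator,
dispatching a wrapped np.add on the Python types of its arguments, could look
roughly like the following (purely illustrative -- none of these names are an
existing NumPy API):

import numpy as np

_add_registry = {}

def add_implementation(*types):
    # Register an override of add() for the given argument types.
    def decorator(func):
        _add_registry[types] = func
        return func
    return decorator

def add(x, y):
    # Multi-method style wrapper: dispatch on the argument types,
    # otherwise fall back to the regular ufunc.
    impl = _add_registry.get((type(x), type(y)))
    if impl is not None:
        return impl(x, y)
    return np.add(x, y)

class MySparse(object):
    def __init__(self, dense):
        self.dense = np.asarray(dense)

@add_implementation(MySparse, MySparse)
def _sparse_add(a, b):
    return MySparse(a.dense + b.dense)

add(np.arange(3), np.ones(3))            # plain ufunc path
add(MySparse([1, 2]), MySparse([3, 4]))  # registered override

A real implementation would hook this dispatch into the ufunc machinery itself
rather than a Python wrapper, but the registration pattern is the same.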
Look for examples and inspiration at the following places: http://alexgaynor.net/2010/jun/26/multimethods-python/ https://pypi.python.org/pypi/typed.py I really think this would be a great addition to NumPy (it would simplify a lot of cruft around masked arrays, character arrays, etc.) and be quite useful. I wish you the best. I can't promise I will have time to help, but I will try to chime in the best I can. Best regards, -Travis On Wed, Jul 10, 2013 at 10:29 PM, Blake Griffith wrote: > Hello NumPy, > > Part of my GSoC is compatibility with SciPy's sparse matrices and NumPy's > ufuncs. Currently there is no feasible way to do this without changing > ufuncs a bit. > > I've been considering a mechanism to override ufuncs based on checking the > ufuncs arguments for a __ufunc_override__ attribute. Then handing off the > operation to a function specified by that attribute. I prototyped this in > python and did a demo in a blog post here: > http://cwl.cx/posts/week-6-ufunc-overrides.html > This is similar to a previously discussed, but never implemented change: > http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html > > However it seems like the ufunc machinery might be ripped out and replaced > with a true multi-method implementation soon. See Travis' blog post: > > http://technicaldiscovery.blogspot.com/2013/07/thoughts-after-scipy-2013-and-specific.html > So I'd like to make my changes as forward compatible as possible. However > I'm not sure what I should even consider here, or how forward compatible my > current implementation is. Thoughts? > > Until then, I'm writing up a nep, it is still pretty incomplete, it can be > found here: > > https://github.com/cowlicks/numpy/blob/ufunc-override/doc/neps/ufunc-overrides.rst > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Travis Oliphant Continuum Analytics, Inc. http://www.continuum.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Jul 11 15:00:30 2013 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 11 Jul 2013 20:00:30 +0100 Subject: [Numpy-discussion] flip array on axis In-Reply-To: <51DD85DA.906@hilboll.de> References: <51DD776F.4040306@hilboll.de> <51DD85DA.906@hilboll.de> Message-ID: On Wed, Jul 10, 2013 at 5:03 PM, Andreas Hilboll wrote: > On 10.07.2013 17:06, Matthew Brett wrote: >> Hi, >> >> On Wed, Jul 10, 2013 at 11:02 AM, Andreas Hilboll wrote: >>> Hi, >>> >>> there are np.flipud and np.fliplr methods to flip 2d arrays on the first >>> and second dimension, respectively. What can I do to flip an array on an >>> axis which I don't know before runtime? I'd really like to see a >>> np.flip(arr, axis) method which lets me specify which axis to flip on. >> >> I have something like that that's a few lines long: >> >> https://github.com/nipy/nibabel/blob/master/nibabel/orientations.py#L231 >> >> Cheers, >> >> Matthew >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > Thanks, Matthew! Should this go into numpy itself? Don't see why not. > If so, I could > prepare a PR, if you point me to the right place (file) to put it. I don't think there's a lot of rigid logic to how numpy's source is laid out. numpy/lib/function_base.py maybe, or next to flipud/fliplr in numpy/lib/twodim_base.py? 
-n From scott.sinclair.za at gmail.com Fri Jul 12 05:02:16 2013 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Fri, 12 Jul 2013 11:02:16 +0200 Subject: [Numpy-discussion] f2py and setup.py how can I specify where the .so file goes? In-Reply-To: References: Message-ID: On 10 July 2013 17:50, Jose Gomez-Dans wrote: > Hi, > I am building a package that exposes some Fortran libraries through f2py. > The packages directory looks like this: > setup.py > my_pack/ > | > |---------->__init__.py > |----------> some.pyf > |-----------> code.f90 > > I thoughat that once installed, I'd get the .so and __init__.py in the same > directory (namely ~/.local/lib/python2.7/site-packages/my_pack/). However, I > get > ~/.local/lib/python2.7/site-packages/mypack_fortran.so > ~/.local/lib/python2.7/site-packages/my_pack__fortran-1.0.2-py2.7.egg-info > ~/.local/lib/python2.7/site-packages/my_pack/__init__.py > > Thet setup file is this at the end, I am clearly missing some option here to > move the *.so into the my_pack directory.... Anybody know which one? > > Cheers > Jose > > [setup.py] > > #!/usr/bin/env python > > > def configuration(parent_package='',top_path=None): > from numpy.distutils.misc_util import Configuration > config = Configuration(parent_package,top_path) > config.add_extension('mypack_fortran', ['the_pack/code.f90'] ) > return config > > if __name__ == "__main__": > from numpy.distutils.core import setup > # Global variables for this extension: > name = "mypack_fortran" # name of the generated python > extension (.so) > description = "blah" > author = "" > author_email = "" > > setup( name=name,\ > description=description, \ > author=author, \ > author_email = author_email, \ > configuration = configuration, version="1.0.2",\ > packages=["my_pack"]) Something like the following should work... from numpy.distutils.core import setup, Extension my_ext = Extension(name = 'my_pack._fortran', sources = ['my_pack/code.f90']) if __name__ == "__main__": setup(name = 'my_pack', description = ..., author =..., author_email = ..., version = ..., packages = ['my_pack'], ext_modules = [my_ext], ) Cheers, Scott From sebastian at sipsolutions.net Fri Jul 12 08:38:08 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 12 Jul 2013 14:38:08 +0200 Subject: [Numpy-discussion] Allow == and != to raise errors Message-ID: <1373632688.13968.13.camel@sebastian-laptop> Hey, the array comparisons == and != never raise errors but instead simply return False for invalid comparisons. The main example are arrays of non-matching dimensions, and object arrays with invalid element-wise comparisons: In [1]: np.array([1,2,3]) == np.array([1,2]) Out[1]: False In [2]: np.array([1, np.array([2, 3])], dtype=object) == [1, 2] Out[2]: False This seems wrong to me, and I am sure not just me. I doubt any large projects makes use of such comparisons and assume that most would prefer the shape mismatch to raise an error, so I would like to change it. But I am a bit unsure especially about smaller projects. So to keep the transition a bit safer could imagine implementing a FutureWarning for these cases (and that would at least notify new users that what they are doing doesn't seem like the right thing). So the question is: Is such a change safe enough, or is there some good reason for the current behavior that I am missing? 
Regards, Sebastian (There may be other issues with structured types that would continue returning False I think, because neither side knows how to compare) From ben.root at ou.edu Fri Jul 12 09:13:51 2013 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 12 Jul 2013 09:13:51 -0400 Subject: [Numpy-discussion] Allow == and != to raise errors In-Reply-To: <1373632688.13968.13.camel@sebastian-laptop> References: <1373632688.13968.13.camel@sebastian-laptop> Message-ID: I can see where you are getting at, but I would have to disagree. First of all, when a comparison between two mis-shaped arrays occur, you get back a bone fide python boolean, not a numpy array of bools. So if any action was taken on the result of such a comparison assumed that the result was some sort of an array, it would fail (yes, this does make it a bit difficult to trace back the source of the problem, but not impossible). Second, no semantics are broken with this. Are the arrays equal or not? If they weren't broadcastible, then returning False for == and True for != makes perfect sense to me. At least, that is my take on it. Cheers! Ben Root On Fri, Jul 12, 2013 at 8:38 AM, Sebastian Berg wrote: > Hey, > > the array comparisons == and != never raise errors but instead simply > return False for invalid comparisons. > > The main example are arrays of non-matching dimensions, and object > arrays with invalid element-wise comparisons: > > In [1]: np.array([1,2,3]) == np.array([1,2]) > Out[1]: False > > In [2]: np.array([1, np.array([2, 3])], dtype=object) == [1, 2] > Out[2]: False > > This seems wrong to me, and I am sure not just me. I doubt any large > projects makes use of such comparisons and assume that most would prefer > the shape mismatch to raise an error, so I would like to change it. But > I am a bit unsure especially about smaller projects. So to keep the > transition a bit safer could imagine implementing a FutureWarning for > these cases (and that would at least notify new users that what they are > doing doesn't seem like the right thing). > > So the question is: Is such a change safe enough, or is there some good > reason for the current behavior that I am missing? > > Regards, > > Sebastian > > (There may be other issues with structured types that would continue > returning False I think, because neither side knows how to compare) > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jgomezdans at gmail.com Fri Jul 12 09:26:19 2013 From: jgomezdans at gmail.com (Jose Gomez-Dans) Date: Fri, 12 Jul 2013 14:26:19 +0100 Subject: [Numpy-discussion] f2py and setup.py how can I specify where the .so file goes? In-Reply-To: References: Message-ID: Hi Scott, thanks for your help. On 12 July 2013 10:02, Scott Sinclair wrote: > > Something like the following should work... [...] > Your suggestion works like what I already had. The issue is that the .so created by the Extension is copied to copying /lib/python2.7/site-packages/ and not to /lib/python2.7/site-packages/my_pack As it is, Python finds it with no problems (as site-packages is in the PYTHONPATH), but I'm worried that that might not be the case with all possible setups. But maybe that's the way it's suppossed to work. Thanks! Jose -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gregorio.bastardo at gmail.com Fri Jul 12 10:41:04 2013 From: gregorio.bastardo at gmail.com (Gregorio Bastardo) Date: Fri, 12 Jul 2013 16:41:04 +0200 Subject: [Numpy-discussion] read-only or immutable masked array Message-ID: Hi, I use masked arrays to mark missing values in data and found it very convenient, although sometimes counterintuitive. I'd like to make a pool of masked arrays (shared between several processing steps) read-only (both data and mask property) to protect the arrays from accidental modification (and the array users from hours of debugging). The regular ndarray trick array.flags.writeable = False is perfectly fine, but it does not work on ma-s. Moreover, mask hardening only protects masked elements, and does not raise error (as I'd expect). Could you recommend an easy way to set an ma read-only? Thanks, Gregorio From stefan at sun.ac.za Fri Jul 12 11:45:30 2013 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Fri, 12 Jul 2013 17:45:30 +0200 Subject: [Numpy-discussion] read-only or immutable masked array In-Reply-To: References: Message-ID: On Fri, Jul 12, 2013 at 4:41 PM, Gregorio Bastardo wrote: > array.flags.writeable = False > > is perfectly fine, but it does not work on ma-s. Moreover, mask > hardening only protects masked elements, and does not raise error (as > I'd expect). You probably have to modify the underlying array and mask: x = np.ma.array(...) x.mask.flags.writeable = False x.data.flags.writeable = False St?fan From alan.isaac at gmail.com Fri Jul 12 14:53:44 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Fri, 12 Jul 2013 14:53:44 -0400 Subject: [Numpy-discussion] numpy.sign query Message-ID: <51E050B8.6090006@gmail.com> The docs for numpy.sign at http://docs.scipy.org/doc/numpy/reference/generated/numpy.sign.html do not indicate how complex numbers are handled. Currently, np.sign appears to return the sign of the real part as a complex value. Was this an explicit choice? Was x/abs(x) considered (for non-zero elements)? Thanks, Alan Isaac From nouiz at nouiz.org Fri Jul 12 15:35:51 2013 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Fri, 12 Jul 2013 15:35:51 -0400 Subject: [Numpy-discussion] Allow == and != to raise errors In-Reply-To: References: <1373632688.13968.13.camel@sebastian-laptop> Message-ID: I also don't like that idea, but I'm not able to come to a good reasoning like Benjamin. I don't see advantage to this change and the reason isn't good enough to justify breaking the interface I think. But I don't think we rely on this, so if the change goes in, it probably won't break stuff or they will be easily seen and repared. Fred On Fri, Jul 12, 2013 at 9:13 AM, Benjamin Root wrote: > I can see where you are getting at, but I would have to disagree. First > of all, when a comparison between two mis-shaped arrays occur, you get back > a bone fide python boolean, not a numpy array of bools. So if any action > was taken on the result of such a comparison assumed that the result was > some sort of an array, it would fail (yes, this does make it a bit > difficult to trace back the source of the problem, but not impossible). > > Second, no semantics are broken with this. Are the arrays equal or not? If > they weren't broadcastible, then returning False for == and True for != > makes perfect sense to me. At least, that is my take on it. > > Cheers! 
> Ben Root > > > > On Fri, Jul 12, 2013 at 8:38 AM, Sebastian Berg < > sebastian at sipsolutions.net> wrote: > >> Hey, >> >> the array comparisons == and != never raise errors but instead simply >> return False for invalid comparisons. >> >> The main example are arrays of non-matching dimensions, and object >> arrays with invalid element-wise comparisons: >> >> In [1]: np.array([1,2,3]) == np.array([1,2]) >> Out[1]: False >> >> In [2]: np.array([1, np.array([2, 3])], dtype=object) == [1, 2] >> Out[2]: False >> >> This seems wrong to me, and I am sure not just me. I doubt any large >> projects makes use of such comparisons and assume that most would prefer >> the shape mismatch to raise an error, so I would like to change it. But >> I am a bit unsure especially about smaller projects. So to keep the >> transition a bit safer could imagine implementing a FutureWarning for >> these cases (and that would at least notify new users that what they are >> doing doesn't seem like the right thing). >> >> So the question is: Is such a change safe enough, or is there some good >> reason for the current behavior that I am missing? >> >> Regards, >> >> Sebastian >> >> (There may be other issues with structured types that would continue >> returning False I think, because neither side knows how to compare) >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmp50 at ukr.net Fri Jul 12 15:46:12 2013 From: tmp50 at ukr.net (Dmitrey) Date: Fri, 12 Jul 2013 22:46:12 +0300 Subject: [Numpy-discussion] new free software for knapsack problem Message-ID: <1373658171.847348466.d5wdxm7s@fmst-1.ukr.net> Hi all, FYI new free software for knapsack problem ( http://en.wikipedia.org/wiki/Knapsack_problem ) has been made (written in Python language); it can solve possibly constrained, possibly (with interalg ) nonlinear and multiobjective problems with specifiable accuracy. Along with interalg lots of? MILP ? solvers can be used. See http://openopt.org/KSP for details. Regards, Dmitrey. -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at hilboll.de Fri Jul 12 17:14:40 2013 From: lists at hilboll.de (Andreas Hilboll) Date: Fri, 12 Jul 2013 23:14:40 +0200 Subject: [Numpy-discussion] flip array on axis In-Reply-To: References: <51DD776F.4040306@hilboll.de> Message-ID: <51E071C0.7050504@hilboll.de> Am 10.07.2013 17:06, schrieb Matthew Brett: > Hi, > > On Wed, Jul 10, 2013 at 11:02 AM, Andreas Hilboll wrote: >> Hi, >> >> there are np.flipud and np.fliplr methods to flip 2d arrays on the first >> and second dimension, respectively. What can I do to flip an array on an >> axis which I don't know before runtime? I'd really like to see a >> np.flip(arr, axis) method which lets me specify which axis to flip on. 
> > I have something like that that's a few lines long: > > https://github.com/nipy/nibabel/blob/master/nibabel/orientations.py#L231 > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Hi Matthew, is it okay with you as the original author in nipy if I copy the flip_axis function to numpy, more or less verbatim, including tests? Cheers, Andreas. From josef.pktd at gmail.com Fri Jul 12 19:29:07 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 12 Jul 2013 19:29:07 -0400 Subject: [Numpy-discussion] Allow == and != to raise errors In-Reply-To: References: <1373632688.13968.13.camel@sebastian-laptop> Message-ID: On Fri, Jul 12, 2013 at 3:35 PM, Fr?d?ric Bastien wrote: > I also don't like that idea, but I'm not able to come to a good reasoning > like Benjamin. > > I don't see advantage to this change and the reason isn't good enough to > justify breaking the interface I think. > > But I don't think we rely on this, so if the change goes in, it probably > won't break stuff or they will be easily seen and repared. > > Fred > > > On Fri, Jul 12, 2013 at 9:13 AM, Benjamin Root wrote: >> >> I can see where you are getting at, but I would have to disagree. First >> of all, when a comparison between two mis-shaped arrays occur, you get back >> a bone fide python boolean, not a numpy array of bools. So if any action was >> taken on the result of such a comparison assumed that the result was some >> sort of an array, it would fail (yes, this does make it a bit difficult to >> trace back the source of the problem, but not impossible). >> >> Second, no semantics are broken with this. Are the arrays equal or not? If >> they weren't broadcastible, then returning False for == and True for != >> makes perfect sense to me. At least, that is my take on it. >> >> Cheers! >> Ben Root >> >> >> >> On Fri, Jul 12, 2013 at 8:38 AM, Sebastian Berg >> wrote: >>> >>> Hey, >>> >>> the array comparisons == and != never raise errors but instead simply >>> return False for invalid comparisons. >>> >>> The main example are arrays of non-matching dimensions, and object >>> arrays with invalid element-wise comparisons: >>> >>> In [1]: np.array([1,2,3]) == np.array([1,2]) >>> Out[1]: False >>> >>> In [2]: np.array([1, np.array([2, 3])], dtype=object) == [1, 2] >>> Out[2]: False >>> >>> This seems wrong to me, and I am sure not just me. I doubt any large >>> projects makes use of such comparisons and assume that most would prefer >>> the shape mismatch to raise an error, so I would like to change it. But >>> I am a bit unsure especially about smaller projects. So to keep the >>> transition a bit safer could imagine implementing a FutureWarning for >>> these cases (and that would at least notify new users that what they are >>> doing doesn't seem like the right thing). >>> >>> So the question is: Is such a change safe enough, or is there some good >>> reason for the current behavior that I am missing? 
>>> >>> Regards, >>> >>> Sebastian >>> >>> (There may be other issues with structured types that would continue >>> returning False I think, because neither side knows how to compare) >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > I thought Benjamin sounds pretty convincing, and since I never use this, I don't care. However, I (and I'm pretty convinced all statsmodels code) uses equality comparison only element wise. Getting a boolean back is an indicator for a bug, which is most of the time easy to trace back. There is an inconsistency in the behavior with the inequalities. >>> np.array([1,2,3]) < np.array([1,2]) Traceback (most recent call last): File "", line 1, in ValueError: shape mismatch: objects cannot be broadcast to a single shape >>> np.array([1,2,3]) <= np.array([1,2]) Traceback (most recent call last): File "", line 1, in ValueError: shape mismatch: objects cannot be broadcast to a single shape >>> (np.array([1,2,3]) == np.array([1,2])).any() Traceback (most recent call last): File "", line 1, in AttributeError: 'bool' object has no attribute 'any' The last one could be misleading and difficult to catch. >>> np.any(np.array([1,2,3]) == np.array([1,2])) False numpy 1.5.1 since I'm playing rear guard Josef Josef From charlesr.harris at gmail.com Fri Jul 12 21:25:43 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 12 Jul 2013 19:25:43 -0600 Subject: [Numpy-discussion] numpy.sign query In-Reply-To: <51E050B8.6090006@gmail.com> References: <51E050B8.6090006@gmail.com> Message-ID: On Fri, Jul 12, 2013 at 12:53 PM, Alan G Isaac wrote: > The docs for numpy.sign at > http://docs.scipy.org/doc/numpy/reference/generated/numpy.sign.html > do not indicate how complex numbers are handled. Currently, np.sign > appears to return the sign of the real part as a complex value. > Was this an explicit choice? Was x/abs(x) considered (for non-zero > elements)? > > ISTR some discussion of that. Personally, I like the x/abs(x) idea. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jul 12 21:46:54 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 12 Jul 2013 19:46:54 -0600 Subject: [Numpy-discussion] nansum, nanmean, nanvar, nanstd Message-ID: Hi All, I've been working on Benjamin's PR, which I took down as he didn't have time to finish it. I've made the following changes and thought I'd run them past others before putting up a new pull request. 1. The new functions are consolidated with the old ones inn (new) numpy/lib/nanfunctions.py. 2. There is a new test module numpy/lib/tests/test_nanfunctions.py 3. The functions punt to standard routines if the array is not inexact. 4. If the array is inexact, then so must be the optional out and dtype arguments. 5. Nans are returned for all nan axis, no warnings are raised. 6. If cnt - ddof <= 0 the result is Nan for that axis, no warnings are raised. 7. For scalar returns the type of the array, or the type given by the dtype option, is preserved. 
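(As a sketch of what item 7 is meant to guarantee -- an illustration of the
intent, not output from the branch: np.nanmean(np.zeros(3, dtype=np.float32))
would come back as a np.float32 scalar rather than being upcast to float64,
unless a different dtype is passed explicitly.)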
Number 7 does not hold for current mean, var, and std. I propose that those functions be fixed. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From brady.mccary at gmail.com Fri Jul 12 23:00:08 2013 From: brady.mccary at gmail.com (Brady McCary) Date: Fri, 12 Jul 2013 22:00:08 -0500 Subject: [Numpy-discussion] PIL and NumPy Message-ID: NumPy Folks, I want to load images with PIL and then operate on them with NumPy. According to the PIL and NumPy documentation, I would expect the following to work, but it is not. Python 2.7.4 (default, Apr 19 2013, 18:28:01) [GCC 4.7.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import numpy >>> numpy.version.version >>> >>> import Image >>> Image.VERSION '1.1.7' >>> >>> im = Image.open('big-0.png') >>> im.size (2550, 3300) >>> >>> ar = numpy.asarray(im) >>> ar.size 1 >>> ar.shape () >>> ar array(, dtype=object) By "not working" I mean that I would have expected the data to be loaded/available in ar. PIL and NumPy/SciPy seem to be working fine independently of each other. Any guidance? Brady From brady.mccary at gmail.com Fri Jul 12 23:50:26 2013 From: brady.mccary at gmail.com (Brady McCary) Date: Fri, 12 Jul 2013 22:50:26 -0500 Subject: [Numpy-discussion] PIL and NumPy In-Reply-To: References: Message-ID: NumPy Folks, Sorry for the self-reply, but I have determined that this may have something to do with an alpha channel being present. When I remove the alpha channel, things appear to work as I expect. Any discussion on the matter? Brady On Fri, Jul 12, 2013 at 10:00 PM, Brady McCary wrote: > NumPy Folks, > > I want to load images with PIL and then operate on them with NumPy. > According to the PIL and NumPy documentation, I would expect the > following to work, but it is not. > > > > Python 2.7.4 (default, Apr 19 2013, 18:28:01) > [GCC 4.7.3] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> import numpy >>>> numpy.version.version >>>> >>>> import Image >>>> Image.VERSION > '1.1.7' >>>> >>>> im = Image.open('big-0.png') >>>> im.size > (2550, 3300) >>>> >>>> ar = numpy.asarray(im) >>>> ar.size > 1 >>>> ar.shape > () >>>> ar > array( 0x1E5BA70>, dtype=object) > > > > By "not working" I mean that I would have expected the data to be > loaded/available in ar. PIL and NumPy/SciPy seem to be working fine > independently of each other. Any guidance? > > Brady From sebastian at sipsolutions.net Sat Jul 13 06:26:45 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sat, 13 Jul 2013 12:26:45 +0200 Subject: [Numpy-discussion] Allow == and != to raise errors In-Reply-To: References: <1373632688.13968.13.camel@sebastian-laptop> Message-ID: <1373711205.31992.20.camel@sebastian-laptop> On Fri, 2013-07-12 at 19:29 -0400, josef.pktd at gmail.com wrote: > On Fri, Jul 12, 2013 at 3:35 PM, Fr?d?ric Bastien wrote: > > I also don't like that idea, but I'm not able to come to a good reasoning > > like Benjamin. > > > > I don't see advantage to this change and the reason isn't good enough to > > justify breaking the interface I think. > > > > But I don't think we rely on this, so if the change goes in, it probably > > won't break stuff or they will be easily seen and repared. > > > > Fred > > I thought Benjamin sounds pretty convincing, and since I never use > this, I don't care. > > However, I (and I'm pretty convinced all statsmodels code) uses > equality comparison only element wise. 
Getting a boolean back is an > indicator for a bug, which is most of the time easy to trace back. > > There is an inconsistency in the behavior with the inequalities. > Well, I guess I tend to think on the purity side of things. And the comparisons currently mix container and element-wise comparison up. It seems to me that it can lead to bugs, though I suppose it is unlikely to really hit anyone. One thing that keeping the behaviour means, is that the object array comparisons will be a little buggy (you get False for the whole array, when an element comparison gives an error). Though I admit, that for example arrays inside containers make any equality for the container quirky, since arrays cannot define a truth value. But if there is concern that this really could break code I won't try to press for it. - Sebastian > >>> np.array([1,2,3]) < np.array([1,2]) > Traceback (most recent call last): > File "", line 1, in > ValueError: shape mismatch: objects cannot be broadcast to a single shape > > >>> np.array([1,2,3]) <= np.array([1,2]) > Traceback (most recent call last): > File "", line 1, in > ValueError: shape mismatch: objects cannot be broadcast to a single shape > > >>> (np.array([1,2,3]) == np.array([1,2])).any() > Traceback (most recent call last): > File "", line 1, in > AttributeError: 'bool' object has no attribute 'any' > > > The last one could be misleading and difficult to catch. > > >>> np.any(np.array([1,2,3]) == np.array([1,2])) > False > > numpy 1.5.1 since I'm playing rear guard > > Josef > > > Josef > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From gregorio.bastardo at gmail.com Sat Jul 13 07:36:49 2013 From: gregorio.bastardo at gmail.com (Gregorio Bastardo) Date: Sat, 13 Jul 2013 13:36:49 +0200 Subject: [Numpy-discussion] read-only or immutable masked array In-Reply-To: References: Message-ID: Hi St?fan, Thanks for the suggestion, but it does not protect the array: >>> x = np.ma.masked_array(xrange(4), [0,1,0,1]) >>> x masked_array(data = [0 -- 2 --], mask = [False True False True], fill_value = 999999) >>> x.mask.flags.writeable = False >>> x.data.flags.writeable = False >>> x.data.flags.writeable True >>> x.mask.flags.writeable False >>> x[0] = -1 >>> x masked_array(data = [-1 -- 2 --], mask = [False True False True], fill_value = 999999) Is there a working solution for this problem? Thanks, Gregorio 2013/7/12 St?fan van der Walt : > On Fri, Jul 12, 2013 at 4:41 PM, Gregorio Bastardo > wrote: >> array.flags.writeable = False >> >> is perfectly fine, but it does not work on ma-s. Moreover, mask >> hardening only protects masked elements, and does not raise error (as >> I'd expect). > > You probably have to modify the underlying array and mask: > > x = np.ma.array(...) > x.mask.flags.writeable = False > x.data.flags.writeable = False > > St?fan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From njs at pobox.com Sat Jul 13 09:14:42 2013 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 13 Jul 2013 14:14:42 +0100 Subject: [Numpy-discussion] Allow == and != to raise errors In-Reply-To: References: <1373632688.13968.13.camel@sebastian-laptop> Message-ID: On Fri, Jul 12, 2013 at 2:13 PM, Benjamin Root wrote: > I can see where you are getting at, but I would have to disagree. 
First of > all, when a comparison between two mis-shaped arrays occur, you get back a > bone fide python boolean, not a numpy array of bools. So if any action was > taken on the result of such a comparison assumed that the result was some > sort of an array, it would fail (yes, this does make it a bit difficult to > trace back the source of the problem, but not impossible). > > Second, no semantics are broken with this. Are the arrays equal or not? If > they weren't broadcastible, then returning False for == and True for != > makes perfect sense to me. At least, that is my take on it. But it does break semantics. Sure, it tells you that the arrays aren't equal -- but that's not the question you asked. "==" is not "are these arrays equal"; it's "is each pair of broadcasted aligned elements in these arrays equal", and these are totally different operations. It's unfortunate that "==" is a somewhat confusing name, but that's no reason to mix things up like this. "+" in python sometimes means "add all elements" and sometimes means "concatenate", but no-one would argue that ndarray.__add__ should the former when the arrays were broadcastable and the latter when they weren't. This is the same thing. "Errors should never pass silently", "In the face of ambiguity, refuse the temptation to guess." There's really no sensible interface here -- notice that '==' can return False but can never return True, and Josef gave an example of where it can silently produce misleading results. So to me it seems like a clear bug, but one of the sort that has a higher probability than usual that someone somewhere is depending on it... which makes it less clear what exactly to do with it. I guess one option is to just start raising errors in the first RC and see whether anyone complains! But people people don't seem to test the RCs enough to make this entirely reliable :-(. -n From josef.pktd at gmail.com Sat Jul 13 11:28:06 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 13 Jul 2013 11:28:06 -0400 Subject: [Numpy-discussion] Allow == and != to raise errors In-Reply-To: References: <1373632688.13968.13.camel@sebastian-laptop> Message-ID: On Sat, Jul 13, 2013 at 9:14 AM, Nathaniel Smith wrote: > On Fri, Jul 12, 2013 at 2:13 PM, Benjamin Root wrote: >> I can see where you are getting at, but I would have to disagree. First of >> all, when a comparison between two mis-shaped arrays occur, you get back a >> bone fide python boolean, not a numpy array of bools. So if any action was >> taken on the result of such a comparison assumed that the result was some >> sort of an array, it would fail (yes, this does make it a bit difficult to >> trace back the source of the problem, but not impossible). >> >> Second, no semantics are broken with this. Are the arrays equal or not? If >> they weren't broadcastible, then returning False for == and True for != >> makes perfect sense to me. At least, that is my take on it. > > But it does break semantics. Sure, it tells you that the arrays aren't > equal -- but that's not the question you asked. "==" is not "are these > arrays equal"; it's "is each pair of broadcasted aligned elements in > these arrays equal", and these are totally different operations. It's > unfortunate that "==" is a somewhat confusing name, but that's no > reason to mix things up like this. 
"+" in python sometimes means "add > all elements" and sometimes means "concatenate", but no-one would > argue that ndarray.__add__ should the former when the arrays were > broadcastable and the latter when they weren't. This is the same > thing. > > "Errors should never pass silently", "In the face of ambiguity, refuse > the temptation to guess." > > There's really no sensible interface here -- notice that '==' can > return False but can never return True, and Josef gave an example of > where it can silently produce misleading results. So to me it seems > like a clear bug, but one of the sort that has a higher probability > than usual that someone somewhere is depending on it... which makes it > less clear what exactly to do with it. > > I guess one option is to just start raising errors in the first RC and > see whether anyone complains! But people people don't seem to test the > RCs enough to make this entirely reliable :-(. I'm now +1 on the exception that Sebastian proposed. I like consistency, and having a more straightforward mental model of the numpy behavior for elementwise operations, that don't pretend sometimes to be "python" (when I'm doing array math), like this >>> [1,2,3] < [1,2] False >>> [1,2,3] > [1,2] True Josef > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From alan.isaac at gmail.com Sat Jul 13 11:30:08 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Sat, 13 Jul 2013 11:30:08 -0400 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> Message-ID: <51E17280.2030105@gmail.com> > On Sun, Jul 7, 2013 at 9:28 AM, Alan G Isaac > wrote: > I miss being able to spell a.conj().T as a.H, as one can > with numpy matrices. On 7/7/2013 4:49 PM, Charles R Harris wrote: > There was a long thread about this back around 1.1 or so, > long time ago in any case. IIRC correctly, Travis was > opposed. I think part of the problem was that arr.T is > a view, but arr.H would not be. Probably it could be be > made to return an iterator that performed the conjugation, > or we could simply return a new array. I'm not opposed > myself, but I'd have to review the old discussion to see > if there was good reason not to have it in the first > place. I think the original discussion of an abs method > took place about the same time. If not being a view is determinative, could a .ct() method be considered? Or would the objection apply there too? Thanks, Alan From njs at pobox.com Sat Jul 13 13:46:12 2013 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 13 Jul 2013 18:46:12 +0100 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: <51E17280.2030105@gmail.com> References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> Message-ID: On 13 Jul 2013 16:30, "Alan G Isaac" wrote: > > > On Sun, Jul 7, 2013 at 9:28 AM, Alan G Isaac > wrote: > > I miss being able to spell a.conj().T as a.H, as one can > > with numpy matrices. > > > On 7/7/2013 4:49 PM, Charles R Harris wrote: > > There was a long thread about this back around 1.1 or so, > > long time ago in any case. IIRC correctly, Travis was > > opposed. I think part of the problem was that arr.T is > > a view, but arr.H would not be. Probably it could be be > > made to return an iterator that performed the conjugation, > > or we could simply return a new array. 
I'm not opposed > > myself, but I'd have to review the old discussion to see > > if there was good reason not to have it in the first > > place. I think the original discussion of an abs method > > took place about the same time. > > > If not being a view is determinative, could a .ct() method > be considered? Or would the objection apply there too? Why not just write def H(a): return a.conj().T in your local namespace? The resulting code will be even more concise than if we had a .ct() method. ndarray has way too many attributes already IMHO (though I realize this may be a minority view). -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Sat Jul 13 14:31:32 2013 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 13 Jul 2013 21:31:32 +0300 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> Message-ID: 13.07.2013 20:46, Nathaniel Smith kirjoitti: [clip] > Why not just write > > def H(a): > return a.conj().T In long expressions, this puts H to the wrong side. -- Pauli Virtanen From alan.isaac at gmail.com Sat Jul 13 16:37:09 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Sat, 13 Jul 2013 16:37:09 -0400 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> Message-ID: <51E1BA75.4000806@gmail.com> On 7/13/2013 1:46 PM, Nathaniel Smith wrote: > Why not just write > > def H(a): > return a.conj().T > > in your local namespace? > First of all, I am sympathetic to being conservative about the addition of attributes! But the question about adding a.H about the possibility of improving - speed (relative to adding a function of my own) - readability (including error-free readability of others' code) - consistency (across code bases and objects) - competitiveness (with other array languages) - convenience (including key strokes) I agree that there are alternatives for the last of these. Alan From pgmdevlist at gmail.com Sun Jul 14 09:55:32 2013 From: pgmdevlist at gmail.com (Pierre GM) Date: Sun, 14 Jul 2013 15:55:32 +0200 Subject: [Numpy-discussion] read-only or immutable masked array In-Reply-To: References: Message-ID: <5585563F-9DEA-4142-A60E-6F8028E53D28@gmail.com> On Jul 13, 2013, at 13:36 , Gregorio Bastardo wrote: > Hi St?fan, > > Thanks for the suggestion, but it does not protect the array: Thinking about it, it can't: when `x` is a MaskedArray, `x.data` is just a view of the underlying array as a regular ndarray. As far as I understand, changing the `.flags` of a view doesn't affect the original. I'm a bit surprised, though. Here's what I tried >>> np.version.version <<< 1.7.0 >>> x = np.ma.array([1,2,3], mask=[0,1,0]) >>> x.flags.writeable=False >>> x[0]=-1 <<< ValueError: assignment destination is read-only What did you mean by >>> array.flags.writeable = False >>> >>> is perfectly fine, but it does not work on ma-s. ? Could you post what you did and what you got? >>> Moreover, mask >>> hardening only protects masked elements, and does not raise error (as >>> I'd expect). Yes, that's how it supposed to work. From charlesr.harris at gmail.com Sun Jul 14 14:55:17 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 14 Jul 2013 12:55:17 -0600 Subject: [Numpy-discussion] What should be the result in some statistics corner cases? Message-ID: Some corner cases in the mean, var, std. *Empty arrays* I think these cases should either raise an error or just return nan. 
Warnings seem ineffective to me as they are only issued once by default. In [3]: ones(0).mean() /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:61: RuntimeWarning: invalid value encountered in double_scalars ret = ret / float(rcount) Out[3]: nan In [4]: ones(0).var() /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:76: RuntimeWarning: invalid value encountered in true_divide out=arrmean, casting='unsafe', subok=False) /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100: RuntimeWarning: invalid value encountered in double_scalars ret = ret / float(rcount) Out[4]: nan In [5]: ones(0).std() /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:76: RuntimeWarning: invalid value encountered in true_divide out=arrmean, casting='unsafe', subok=False) /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100: RuntimeWarning: invalid value encountered in double_scalars ret = ret / float(rcount) Out[5]: nan *ddof >= number of elements* I think these should just raise errors. The results for ddof >= #elements is happenstance, and certainly negative numbers should never be returned. In [6]: ones(2).var(ddof=2) /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100: RuntimeWarning: invalid value encountered in double_scalars ret = ret / float(rcount) Out[6]: nan In [7]: ones(2).var(ddof=3) Out[7]: -0.0 * nansum* Currently returns nan for empty arrays. I suspect it should return nan for slices that are all nan, but 0 for empty slices. That would make it consistent with sum in the empty case. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Sun Jul 14 16:55:08 2013 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Sun, 14 Jul 2013 16:55:08 -0400 Subject: [Numpy-discussion] What should be the result in some statistics corner cases? In-Reply-To: References: Message-ID: On 7/14/13, Charles R Harris wrote: > Some corner cases in the mean, var, std. > > *Empty arrays* > > I think these cases should either raise an error or just return nan. > Warnings seem ineffective to me as they are only issued once by default. > > In [3]: ones(0).mean() > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:61: > RuntimeWarning: invalid value encountered in double_scalars > ret = ret / float(rcount) > Out[3]: nan > > In [4]: ones(0).var() > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:76: > RuntimeWarning: invalid value encountered in true_divide > out=arrmean, casting='unsafe', subok=False) > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100: > RuntimeWarning: invalid value encountered in double_scalars > ret = ret / float(rcount) > Out[4]: nan > > In [5]: ones(0).std() > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:76: > RuntimeWarning: invalid value encountered in true_divide > out=arrmean, casting='unsafe', subok=False) > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100: > RuntimeWarning: invalid value encountered in double_scalars > ret = ret / float(rcount) > Out[5]: nan > > *ddof >= number of elements* > > I think these should just raise errors. The results for ddof >= #elements > is happenstance, and certainly negative numbers should never be returned. 
> > In [6]: ones(2).var(ddof=2) > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100: > RuntimeWarning: invalid value encountered in double_scalars > ret = ret / float(rcount) > Out[6]: nan > > In [7]: ones(2).var(ddof=3) > Out[7]: -0.0 > * > nansum* > > Currently returns nan for empty arrays. I suspect it should return nan for > slices that are all nan, but 0 for empty slices. That would make it > consistent with sum in the empty case. > For nansum, I would expect 0 even in the case of all nans. The point of these functions is to simply ignore nans, correct? So I would aim for this behaviour: nanfunc(x) behaves the same as func(x[~isnan(x)]) Warren > Chuck > From charlesr.harris at gmail.com Sun Jul 14 17:35:29 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 14 Jul 2013 15:35:29 -0600 Subject: [Numpy-discussion] What should be the result in some statistics corner cases? In-Reply-To: References: Message-ID: On Sun, Jul 14, 2013 at 2:55 PM, Warren Weckesser < warren.weckesser at gmail.com> wrote: > On 7/14/13, Charles R Harris wrote: > > Some corner cases in the mean, var, std. > > > > *Empty arrays* > > > > I think these cases should either raise an error or just return nan. > > Warnings seem ineffective to me as they are only issued once by default. > > > > In [3]: ones(0).mean() > > > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:61: > > RuntimeWarning: invalid value encountered in double_scalars > > ret = ret / float(rcount) > > Out[3]: nan > > > > In [4]: ones(0).var() > > > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:76: > > RuntimeWarning: invalid value encountered in true_divide > > out=arrmean, casting='unsafe', subok=False) > > > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100: > > RuntimeWarning: invalid value encountered in double_scalars > > ret = ret / float(rcount) > > Out[4]: nan > > > > In [5]: ones(0).std() > > > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:76: > > RuntimeWarning: invalid value encountered in true_divide > > out=arrmean, casting='unsafe', subok=False) > > > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100: > > RuntimeWarning: invalid value encountered in double_scalars > > ret = ret / float(rcount) > > Out[5]: nan > > > > *ddof >= number of elements* > > > > I think these should just raise errors. The results for ddof >= #elements > > is happenstance, and certainly negative numbers should never be returned. > > > > In [6]: ones(2).var(ddof=2) > > > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100: > > RuntimeWarning: invalid value encountered in double_scalars > > ret = ret / float(rcount) > > Out[6]: nan > > > > In [7]: ones(2).var(ddof=3) > > Out[7]: -0.0 > > * > > nansum* > > > > Currently returns nan for empty arrays. I suspect it should return nan > for > > slices that are all nan, but 0 for empty slices. That would make it > > consistent with sum in the empty case. > > > > > For nansum, I would expect 0 even in the case of all nans. The point > of these functions is to simply ignore nans, correct? So I would aim > for this behaviour: nanfunc(x) behaves the same as func(x[~isnan(x)]) > > Agreed, although that changes current behavior. What about the other cases? Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gregorio.bastardo at gmail.com Mon Jul 15 04:04:46 2013 From: gregorio.bastardo at gmail.com (Gregorio Bastardo) Date: Mon, 15 Jul 2013 10:04:46 +0200 Subject: [Numpy-discussion] read-only or immutable masked array In-Reply-To: <5585563F-9DEA-4142-A60E-6F8028E53D28@gmail.com> References: <5585563F-9DEA-4142-A60E-6F8028E53D28@gmail.com> Message-ID: Hi Pierre, > I'm a bit surprised, though. Here's what I tried > >>>> np.version.version > <<< 1.7.0 >>>> x = np.ma.array([1,2,3], mask=[0,1,0]) >>>> x.flags.writeable=False >>>> x[0]=-1 > <<< ValueError: assignment destination is read-only Thanks, it works perfectly =) Sorry, probably have overlooked this simple solution, tried to set x.data and x.mask directly. I noticed that this only protects the data, so mask also has to be set to read-only or be hardened to avoid accidental (un)masking. Gregorio From pgmdevlist at gmail.com Mon Jul 15 07:11:54 2013 From: pgmdevlist at gmail.com (Pierre Gerard-Marchant) Date: Mon, 15 Jul 2013 13:11:54 +0200 Subject: [Numpy-discussion] read-only or immutable masked array In-Reply-To: References: <5585563F-9DEA-4142-A60E-6F8028E53D28@gmail.com> Message-ID: On Jul 15, 2013, at 10:04 , Gregorio Bastardo wrote: > Hi Pierre, > >> I'm a bit surprised, though. Here's what I tried >> >>>>> np.version.version >> <<< 1.7.0 >>>>> x = np.ma.array([1,2,3], mask=[0,1,0]) >>>>> x.flags.writeable=False >>>>> x[0]=-1 >> <<< ValueError: assignment destination is read-only > > Thanks, it works perfectly =) Sorry, probably have overlooked this > simple solution, tried to set x.data and x.mask directly. I noticed > that this only protects the data, so mask also has to be set to > read-only or be hardened to avoid accidental (un)masking. Well, yes and no. Settings the flags of `x` doesn't set (yet) the flags of the mask, that's true. Still, `.writeable=False` should prevent you to unmask data, provided you're not trying to modify the mask directly but use basic assignment like `x[?]=?`. However, assigning `np.ma.masked` to array items does modify the mask and only the mask, hence the absence of error if the array is not writeable. Note as well that hardening the mask only prevents unmasking: you can still grow the mask, which may not be what you want. Use `x.mask.flags.writeable=False` to make the mask really read-only. From gregorio.bastardo at gmail.com Mon Jul 15 08:40:18 2013 From: gregorio.bastardo at gmail.com (Gregorio Bastardo) Date: Mon, 15 Jul 2013 14:40:18 +0200 Subject: [Numpy-discussion] read-only or immutable masked array In-Reply-To: References: <5585563F-9DEA-4142-A60E-6F8028E53D28@gmail.com> Message-ID: Hi Pierre, > Note as well that hardening the mask only prevents unmasking: you can still grow the mask, which may not be what you want. Use `x.mask.flags.writeable=False` to make the mask really read-only. I ran into an unmasking problem with the suggested approach: >>> np.version.version '1.7.0' >>> x = np.ma.masked_array(xrange(4), [0,1,0,1]) >>> x masked_array(data = [0 -- 2 --], mask = [False True False True], fill_value = 999999) >>> x.flags.writeable = False >>> x.mask.flags.writeable = False >>> x.mask[1] = 0 # ok Traceback (most recent call last): ... ValueError: assignment destination is read-only >>> x[1] = 0 # ok Traceback (most recent call last): ... ValueError: assignment destination is read-only >>> x.mask[1] = 0 # ?? >>> x masked_array(data = [0 1 2 --], mask = [False False False True], fill_value = 999999) I noticed that "sharedmask" attribute changes (from True to False) after "x[1] = 0". 
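A minimal sketch of that sequence (same NumPy 1.7 setup as above) makes the
order of events visible; it looks like the failed assignment still unshares
the mask, and the unshared copy comes back writeable:

>>> x = np.ma.masked_array(xrange(4), [0,1,0,1])
>>> x.flags.writeable = False
>>> x.mask.flags.writeable = False
>>> x.sharedmask
True
>>> x[1] = 0
Traceback (most recent call last):
...
ValueError: assignment destination is read-only
>>> x.sharedmask
False
>>> x.mask.flags.writeable
True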
Also, some of the ma operations result mask identity of the new ma, which causes ValueError when the new ma mask is modified: >>> x = np.ma.masked_array(xrange(4), [0,1,0,1]) >>> x.flags.writeable = False >>> x.mask.flags.writeable = False >>> x1 = x > 0 >>> x1.mask is x.mask # ok False >>> x2 = x != 0 >>> x2.mask is x.mask # ?? True >>> x2.mask[1] = 0 Traceback (most recent call last): ... ValueError: assignment destination is read-only which is a bit confusing. And I experienced that *_like operations give mask identity too: >>> y = np.ones_like(x) >>> y.mask is x.mask True but for that I found a recent discussion ("empty_like for masked arrays") on the mailing list: http://mail.scipy.org/pipermail/numpy-discussion/2013-June/066836.html I might be missing something but could you clarify these issues? Thanks, Gregorio From bruno.piguet at gmail.com Mon Jul 15 09:09:12 2013 From: bruno.piguet at gmail.com (bruno Piguet) Date: Mon, 15 Jul 2013 15:09:12 +0200 Subject: [Numpy-discussion] Allow == and != to raise errors In-Reply-To: References: <1373632688.13968.13.camel@sebastian-laptop> Message-ID: Python itself doesn't raise an exception in such cases : >>> (3,4) != (2, 3, 4) True >>> (3,4) == (2, 3, 4) False Should numpy behave differently ? Bruno. 2013/7/12 Fr?d?ric Bastien > I also don't like that idea, but I'm not able to come to a good reasoning > like Benjamin. > > I don't see advantage to this change and the reason isn't good enough to > justify breaking the interface I think. > > But I don't think we rely on this, so if the change goes in, it probably > won't break stuff or they will be easily seen and repared. > > Fred > > > On Fri, Jul 12, 2013 at 9:13 AM, Benjamin Root wrote: > >> I can see where you are getting at, but I would have to disagree. First >> of all, when a comparison between two mis-shaped arrays occur, you get back >> a bone fide python boolean, not a numpy array of bools. So if any action >> was taken on the result of such a comparison assumed that the result was >> some sort of an array, it would fail (yes, this does make it a bit >> difficult to trace back the source of the problem, but not impossible). >> >> Second, no semantics are broken with this. Are the arrays equal or not? >> If they weren't broadcastible, then returning False for == and True for != >> makes perfect sense to me. At least, that is my take on it. >> >> Cheers! >> Ben Root >> >> >> >> On Fri, Jul 12, 2013 at 8:38 AM, Sebastian Berg < >> sebastian at sipsolutions.net> wrote: >> >>> Hey, >>> >>> the array comparisons == and != never raise errors but instead simply >>> return False for invalid comparisons. >>> >>> The main example are arrays of non-matching dimensions, and object >>> arrays with invalid element-wise comparisons: >>> >>> In [1]: np.array([1,2,3]) == np.array([1,2]) >>> Out[1]: False >>> >>> In [2]: np.array([1, np.array([2, 3])], dtype=object) == [1, 2] >>> Out[2]: False >>> >>> This seems wrong to me, and I am sure not just me. I doubt any large >>> projects makes use of such comparisons and assume that most would prefer >>> the shape mismatch to raise an error, so I would like to change it. But >>> I am a bit unsure especially about smaller projects. So to keep the >>> transition a bit safer could imagine implementing a FutureWarning for >>> these cases (and that would at least notify new users that what they are >>> doing doesn't seem like the right thing). 
>>> >>> So the question is: Is such a change safe enough, or is there some good >>> reason for the current behavior that I am missing? >>> >>> Regards, >>> >>> Sebastian >>> >>> (There may be other issues with structured types that would continue >>> returning False I think, because neither side knows how to compare) >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gregorio.bastardo at gmail.com Mon Jul 15 09:33:24 2013 From: gregorio.bastardo at gmail.com (Gregorio Bastardo) Date: Mon, 15 Jul 2013 15:33:24 +0200 Subject: [Numpy-discussion] empty_like for masked arrays Message-ID: Hi, On Mon, Jun 10, 2013 at 3:47 PM, Nathaniel Smith wrote: > Hi all, > > Is there anyone out there using numpy masked arrays, who has an > opinion on how empty_like (and its friends ones_like, zeros_like) > should handle the mask? > > Right now apparently if you call np.ma.empty_like on a masked array, > you get a new masked array that shares the original array's mask, so > modifying one modifies the other. That's almost certainly wrong. This > PR: > https://github.com/numpy/numpy/pull/3404 > makes it so instead the new array has values that are all set to > empty/zero/one, and a mask which is set to match the input array's > mask (so whenever something was masked in the original array, the > empty/zero/one in that place is also masked). We don't know if this is > the desired behaviour for these functions, though. Maybe it's more > intuitive for the new array to match the original array in shape and > dtype, but to always have an empty mask. Or maybe not. None of us > really use np.ma, so if you do and have an opinion then please speak > up... I recently joined the mailing list, so the message might not reach the original thread, sorry for that. I use masked arrays extensively, and would vote for the first option, as I use the *_like operations with the assumption that the resulting array has the same mask as the original. I think it's more intuitive than selecting between all masked or all unmasked behaviour. If it's not too late, please consider my use case. Thanks, Gregorio From charlesr.harris at gmail.com Mon Jul 15 09:52:15 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 15 Jul 2013 07:52:15 -0600 Subject: [Numpy-discussion] What should be the result in some statistics corner cases? In-Reply-To: References: Message-ID: On Sun, Jul 14, 2013 at 3:35 PM, Charles R Harris wrote: > > > On Sun, Jul 14, 2013 at 2:55 PM, Warren Weckesser < > warren.weckesser at gmail.com> wrote: > >> On 7/14/13, Charles R Harris wrote: >> > Some corner cases in the mean, var, std. >> > >> > *Empty arrays* >> > >> > I think these cases should either raise an error or just return nan. >> > Warnings seem ineffective to me as they are only issued once by default. 
>> > >> > In [3]: ones(0).mean() >> > >> /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:61: >> > RuntimeWarning: invalid value encountered in double_scalars >> > ret = ret / float(rcount) >> > Out[3]: nan >> > >> > In [4]: ones(0).var() >> > >> /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:76: >> > RuntimeWarning: invalid value encountered in true_divide >> > out=arrmean, casting='unsafe', subok=False) >> > >> /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100: >> > RuntimeWarning: invalid value encountered in double_scalars >> > ret = ret / float(rcount) >> > Out[4]: nan >> > >> > In [5]: ones(0).std() >> > >> /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:76: >> > RuntimeWarning: invalid value encountered in true_divide >> > out=arrmean, casting='unsafe', subok=False) >> > >> /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100: >> > RuntimeWarning: invalid value encountered in double_scalars >> > ret = ret / float(rcount) >> > Out[5]: nan >> > >> > *ddof >= number of elements* >> > >> > I think these should just raise errors. The results for ddof >= >> #elements >> > is happenstance, and certainly negative numbers should never be >> returned. >> > >> > In [6]: ones(2).var(ddof=2) >> > >> /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100: >> > RuntimeWarning: invalid value encountered in double_scalars >> > ret = ret / float(rcount) >> > Out[6]: nan >> > >> > In [7]: ones(2).var(ddof=3) >> > Out[7]: -0.0 >> > * >> > nansum* >> > >> > Currently returns nan for empty arrays. I suspect it should return nan >> for >> > slices that are all nan, but 0 for empty slices. That would make it >> > consistent with sum in the empty case. >> > >> >> >> For nansum, I would expect 0 even in the case of all nans. The point >> of these functions is to simply ignore nans, correct? So I would aim >> for this behaviour: nanfunc(x) behaves the same as func(x[~isnan(x)]) >> >> > Agreed, although that changes current behavior. What about the other > cases? > > Looks like there isn't much interest in the topic, so I'll just go ahead with the following choices: Non-NaN case 1) Empty array -> ValueError The current behavior with stats is an accident, i.e., the nan arises from 0/0. I like to think that in this case the result is any number, rather than not a number, so *the* value is simply not defined. So in this case raise a ValueError for empty array. 2) ddof >= n -> ValueError If the number of elements, n, is not zero and ddof >= n, raise a ValueError for the ddof value. Nan case 1) Empty array -> Value Error 2) Empty slice -> NaN 3) For slice ddof >= n -> Nan Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Jul 15 10:20:13 2013 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 15 Jul 2013 15:20:13 +0100 Subject: [Numpy-discussion] Allow == and != to raise errors In-Reply-To: References: <1373632688.13968.13.camel@sebastian-laptop> Message-ID: On Mon, Jul 15, 2013 at 2:09 PM, bruno Piguet wrote: > Python itself doesn't raise an exception in such cases : > >>>> (3,4) != (2, 3, 4) > True >>>> (3,4) == (2, 3, 4) > False > > Should numpy behave differently ? 
The numpy equivalent to Python's scalar "==" is called array_equal, and that does indeed behave the same: In [5]: np.array_equal([3, 4], [2, 3, 4]) Out[5]: False But in numpy, the name "==" is shorthand for the ufunc np.equal, which raises an error: In [8]: np.equal([3, 4], [2, 3, 4]) ValueError: operands could not be broadcast together with shapes (2) (3) -n From sebastian at sipsolutions.net Mon Jul 15 10:21:35 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 15 Jul 2013 16:21:35 +0200 Subject: [Numpy-discussion] Allow == and != to raise errors In-Reply-To: References: <1373632688.13968.13.camel@sebastian-laptop> Message-ID: <1373898095.15619.2.camel@sebastian-laptop> On Mon, 2013-07-15 at 15:09 +0200, bruno Piguet wrote: > Python itself doesn't raise an exception in such cases : > > >>> (3,4) != (2, 3, 4) > True > >>> (3,4) == (2, 3, 4) > False > > Should numpy behave differently ? > Yes, because Python tests whether the tuple is different, not whether the elements are: >>> (3, 4) == (3, 4) True >>> np.array([3, 4]) == np.array([3, 4]) array([ True, True], dtype=bool) So doing the test "like python" *changes* the behaviour. - Sebastian > > Bruno. > > > > 2013/7/12 Fr?d?ric Bastien > I also don't like that idea, but I'm not able to come to a > good reasoning like Benjamin. > > > I don't see advantage to this change and the reason isn't good > enough to justify breaking the interface I think. > > > But I don't think we rely on this, so if the change goes in, > it probably won't break stuff or they will be easily seen and > repared. > > > Fred > > > On Fri, Jul 12, 2013 at 9:13 AM, Benjamin Root > wrote: > I can see where you are getting at, but I would have > to disagree. First of all, when a comparison between > two mis-shaped arrays occur, you get back a bone fide > python boolean, not a numpy array of bools. So if any > action was taken on the result of such a comparison > assumed that the result was some sort of an array, it > would fail (yes, this does make it a bit difficult to > trace back the source of the problem, but not > impossible). > > > Second, no semantics are broken with this. Are the > arrays equal or not? If they weren't broadcastible, > then returning False for == and True for != makes > perfect sense to me. At least, that is my take on it. > > > Cheers! > > Ben Root > > > > > On Fri, Jul 12, 2013 at 8:38 AM, Sebastian Berg > wrote: > Hey, > > the array comparisons == and != never raise > errors but instead simply > return False for invalid comparisons. > > The main example are arrays of non-matching > dimensions, and object > arrays with invalid element-wise comparisons: > > In [1]: np.array([1,2,3]) == np.array([1,2]) > Out[1]: False > > In [2]: np.array([1, np.array([2, 3])], > dtype=object) == [1, 2] > Out[2]: False > > This seems wrong to me, and I am sure not just > me. I doubt any large > projects makes use of such comparisons and > assume that most would prefer > the shape mismatch to raise an error, so I > would like to change it. But > I am a bit unsure especially about smaller > projects. So to keep the > transition a bit safer could imagine > implementing a FutureWarning for > these cases (and that would at least notify > new users that what they are > doing doesn't seem like the right thing). > > So the question is: Is such a change safe > enough, or is there some good > reason for the current behavior that I am > missing? 
> > Regards, > > Sebastian > > (There may be other issues with structured > types that would continue > returning False I think, because neither side > knows how to compare) > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ben.root at ou.edu Mon Jul 15 10:25:08 2013 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 15 Jul 2013 10:25:08 -0400 Subject: [Numpy-discussion] What should be the result in some statistics corner cases? In-Reply-To: References: Message-ID: This is going to need to be heavily documented with doctests. Also, just to clarify, are we talking about a ValueError for doing a nansum on an empty array as well, or will that now return a zero? Ben Root On Mon, Jul 15, 2013 at 9:52 AM, Charles R Harris wrote: > > > On Sun, Jul 14, 2013 at 3:35 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Sun, Jul 14, 2013 at 2:55 PM, Warren Weckesser < >> warren.weckesser at gmail.com> wrote: >> >>> On 7/14/13, Charles R Harris wrote: >>> > Some corner cases in the mean, var, std. >>> > >>> > *Empty arrays* >>> > >>> > I think these cases should either raise an error or just return nan. >>> > Warnings seem ineffective to me as they are only issued once by >>> default. >>> > >>> > In [3]: ones(0).mean() >>> > >>> /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:61: >>> > RuntimeWarning: invalid value encountered in double_scalars >>> > ret = ret / float(rcount) >>> > Out[3]: nan >>> > >>> > In [4]: ones(0).var() >>> > >>> /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:76: >>> > RuntimeWarning: invalid value encountered in true_divide >>> > out=arrmean, casting='unsafe', subok=False) >>> > >>> /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100: >>> > RuntimeWarning: invalid value encountered in double_scalars >>> > ret = ret / float(rcount) >>> > Out[4]: nan >>> > >>> > In [5]: ones(0).std() >>> > >>> /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:76: >>> > RuntimeWarning: invalid value encountered in true_divide >>> > out=arrmean, casting='unsafe', subok=False) >>> > >>> /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100: >>> > RuntimeWarning: invalid value encountered in double_scalars >>> > ret = ret / float(rcount) >>> > Out[5]: nan >>> > >>> > *ddof >= number of elements* >>> > >>> > I think these should just raise errors. The results for ddof >= >>> #elements >>> > is happenstance, and certainly negative numbers should never be >>> returned. 
>>> > >>> > In [6]: ones(2).var(ddof=2) >>> > >>> /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100: >>> > RuntimeWarning: invalid value encountered in double_scalars >>> > ret = ret / float(rcount) >>> > Out[6]: nan >>> > >>> > In [7]: ones(2).var(ddof=3) >>> > Out[7]: -0.0 >>> > * >>> > nansum* >>> > >>> > Currently returns nan for empty arrays. I suspect it should return nan >>> for >>> > slices that are all nan, but 0 for empty slices. That would make it >>> > consistent with sum in the empty case. >>> > >>> >>> >>> For nansum, I would expect 0 even in the case of all nans. The point >>> of these functions is to simply ignore nans, correct? So I would aim >>> for this behaviour: nanfunc(x) behaves the same as func(x[~isnan(x)]) >>> >>> >> Agreed, although that changes current behavior. What about the other >> cases? >> >> > Looks like there isn't much interest in the topic, so I'll just go ahead > with the following choices: > > Non-NaN case > > 1) Empty array -> ValueError > > The current behavior with stats is an accident, i.e., the nan arises from > 0/0. I like to think that in this case the result is any number, rather > than not a number, so *the* value is simply not defined. So in this case > raise a ValueError for empty array. > > 2) ddof >= n -> ValueError > > If the number of elements, n, is not zero and ddof >= n, raise a > ValueError for the ddof value. > > Nan case > > 1) Empty array -> Value Error > 2) Empty slice -> NaN > 3) For slice ddof >= n -> Nan > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Jul 15 10:33:47 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 15 Jul 2013 08:33:47 -0600 Subject: [Numpy-discussion] What should be the result in some statistics corner cases? In-Reply-To: References: Message-ID: On Mon, Jul 15, 2013 at 8:25 AM, Benjamin Root wrote: > This is going to need to be heavily documented with doctests. Also, just > to clarify, are we talking about a ValueError for doing a nansum on an > empty array as well, or will that now return a zero? > > I was going to leave nansum as is, as it seems that the result was by choice rather than by accident. Tests, not doctests. I detest doctests ;) Examples, OTOH... Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Mon Jul 15 10:34:16 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 15 Jul 2013 16:34:16 +0200 Subject: [Numpy-discussion] What should be the result in some statistics corner cases? In-Reply-To: References: Message-ID: <1373898856.15619.14.camel@sebastian-laptop> On Mon, 2013-07-15 at 07:52 -0600, Charles R Harris wrote: > > > On Sun, Jul 14, 2013 at 3:35 PM, Charles R Harris > wrote: > > > For nansum, I would expect 0 even in the case of all > nans. The point > of these functions is to simply ignore nans, correct? > So I would aim > for this behaviour: nanfunc(x) behaves the same as > func(x[~isnan(x)]) > > > Agreed, although that changes current behavior. What about the > other cases? 
> > > > Looks like there isn't much interest in the topic, so I'll just go > ahead with the following choices: > > Non-NaN case > > 1) Empty array -> ValueError > > The current behavior with stats is an accident, i.e., the nan arises > from 0/0. I like to think that in this case the result is any number, > rather than not a number, so *the* value is simply not defined. So in > this case raise a ValueError for empty array. > To be honest, I don't mind the current behaviour much sum([]) = 0, len([]) = 0, so it is in a way well defined. At least I am not sure if I would prefer always an error. I am a bit worried that just changing it might break code out there, such as plotting code where it makes perfectly sense to plot a NaN (i.e. nothing), but if that is the case it would probably be visible fast. > 2) ddof >= n -> ValueError > > If the number of elements, n, is not zero and ddof >= n, raise a > ValueError for the ddof value. > Makes sense to me, especially for ddof > n. Just returning nan in all cases for backward compatibility would be fine with me too. > Nan case > > 1) Empty array -> Value Error > 2) Empty slice -> NaN > 3) For slice ddof >= n -> Nan > Personally I would somewhat prefer if 1) and 2) would at least default to the same thing. But I don't use the nanfuncs anyway. I was wondering about adding the option for the user to pick what the fill is (and i.e. if it is None (maybe default) -> ValueError). We could also allow this for normal reductions without an identity, but I am not sure if it is useful there. - Sebastian > Chuck > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Mon Jul 15 10:47:07 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 15 Jul 2013 08:47:07 -0600 Subject: [Numpy-discussion] What should be the result in some statistics corner cases? In-Reply-To: <1373898856.15619.14.camel@sebastian-laptop> References: <1373898856.15619.14.camel@sebastian-laptop> Message-ID: On Mon, Jul 15, 2013 at 8:34 AM, Sebastian Berg wrote: > On Mon, 2013-07-15 at 07:52 -0600, Charles R Harris wrote: > > > > > > On Sun, Jul 14, 2013 at 3:35 PM, Charles R Harris > > wrote: > > > > > > > > > For nansum, I would expect 0 even in the case of all > > nans. The point > > of these functions is to simply ignore nans, correct? > > So I would aim > > for this behaviour: nanfunc(x) behaves the same as > > func(x[~isnan(x)]) > > > > > > Agreed, although that changes current behavior. What about the > > other cases? > > > > > > > > Looks like there isn't much interest in the topic, so I'll just go > > ahead with the following choices: > > > > Non-NaN case > > > > 1) Empty array -> ValueError > > > > The current behavior with stats is an accident, i.e., the nan arises > > from 0/0. I like to think that in this case the result is any number, > > rather than not a number, so *the* value is simply not defined. So in > > this case raise a ValueError for empty array. > > > To be honest, I don't mind the current behaviour much sum([]) = 0, > len([]) = 0, so it is in a way well defined. At least I am not sure if I > would prefer always an error. I am a bit worried that just changing it > might break code out there, such as plotting code where it makes > perfectly sense to plot a NaN (i.e. nothing), but if that is the case it > would probably be visible fast. 
> I'm talking about mean, var, and std as statistics, sum isn't part of that. If there is agreement that nansum of empty arrays/columns should be zero I will do that. Note the sums of empty arrays may or may not be empty. In [1]: ones((0, 3)).sum(axis=0) Out[1]: array([ 0., 0., 0.]) In [2]: ones((3, 0)).sum(axis=0) Out[2]: array([], dtype=float64) Which, sort of, makes sense. > > > 2) ddof >= n -> ValueError > > > > If the number of elements, n, is not zero and ddof >= n, raise a > > ValueError for the ddof value. > > > Makes sense to me, especially for ddof > n. Just returning nan in all > cases for backward compatibility would be fine with me too. > > > Nan case > > > > 1) Empty array -> Value Error > > 2) Empty slice -> NaN > > 3) For slice ddof >= n -> Nan > > > Personally I would somewhat prefer if 1) and 2) would at least default > to the same thing. But I don't use the nanfuncs anyway. I was wondering > about adding the option for the user to pick what the fill is (and i.e. > if it is None (maybe default) -> ValueError). We could also allow this > for normal reductions without an identity, but I am not sure if it is > useful there. > > Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From nouiz at nouiz.org Mon Jul 15 10:57:17 2013 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Mon, 15 Jul 2013 10:57:17 -0400 Subject: [Numpy-discussion] Allow == and != to raise errors In-Reply-To: References: <1373632688.13968.13.camel@sebastian-laptop> Message-ID: Just a question, should == behave like a ufunc or like python == for tuple? I think that all ndarray comparision (==, !=, <=, ...) should behave the same. If they don't (like it was said), making them consistent is good. What is the minimal change to have them behave the same? From my understanding, it is your proposal to change == and != to behave like real ufunc. But I'm not sure if the minimal change is the best, for new user, what they will expect more? The ufunc of the python behavior? Anyway, I see the advantage to simplify the interface to something more consistent. Anyway, if we make all comparison behave like ufunc, there is array_equal as said to have the python behavior of ==, is it useful to have equivalent function the other comparison? Do they already exist. thanks Fred On Mon, Jul 15, 2013 at 10:20 AM, Nathaniel Smith wrote: > On Mon, Jul 15, 2013 at 2:09 PM, bruno Piguet > wrote: > > Python itself doesn't raise an exception in such cases : > > > >>>> (3,4) != (2, 3, 4) > > True > >>>> (3,4) == (2, 3, 4) > > False > > > > Should numpy behave differently ? > > The numpy equivalent to Python's scalar "==" is called array_equal, > and that does indeed behave the same: > > In [5]: np.array_equal([3, 4], [2, 3, 4]) > Out[5]: False > > But in numpy, the name "==" is shorthand for the ufunc np.equal, which > raises an error: > > In [8]: np.equal([3, 4], [2, 3, 4]) > ValueError: operands could not be broadcast together with shapes (2) (3) > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Jul 15 10:58:12 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 15 Jul 2013 08:58:12 -0600 Subject: [Numpy-discussion] What should be the result in some statistics corner cases? 
In-Reply-To: <1373898856.15619.14.camel@sebastian-laptop> References: <1373898856.15619.14.camel@sebastian-laptop> Message-ID: On Mon, Jul 15, 2013 at 8:34 AM, Sebastian Berg wrote: > On Mon, 2013-07-15 at 07:52 -0600, Charles R Harris wrote: > > > > > > On Sun, Jul 14, 2013 at 3:35 PM, Charles R Harris > > wrote: > > > > > > > > > For nansum, I would expect 0 even in the case of all > > nans. The point > > of these functions is to simply ignore nans, correct? > > So I would aim > > for this behaviour: nanfunc(x) behaves the same as > > func(x[~isnan(x)]) > > > > > > Agreed, although that changes current behavior. What about the > > other cases? > > > > > > > > Looks like there isn't much interest in the topic, so I'll just go > > ahead with the following choices: > > > > Non-NaN case > > > > 1) Empty array -> ValueError > > > > The current behavior with stats is an accident, i.e., the nan arises > > from 0/0. I like to think that in this case the result is any number, > > rather than not a number, so *the* value is simply not defined. So in > > this case raise a ValueError for empty array. > > > To be honest, I don't mind the current behaviour much sum([]) = 0, > len([]) = 0, so it is in a way well defined. At least I am not sure if I > would prefer always an error. I am a bit worried that just changing it > might break code out there, such as plotting code where it makes > perfectly sense to plot a NaN (i.e. nothing), but if that is the case it > would probably be visible fast. > > > 2) ddof >= n -> ValueError > > > > If the number of elements, n, is not zero and ddof >= n, raise a > > ValueError for the ddof value. > > > Makes sense to me, especially for ddof > n. Just returning nan in all > cases for backward compatibility would be fine with me too. > Currently if ddof > n it returns a negative number for variance, the NaN only comes when ddof == 0 and n == 0, leading to 0/0 (float is NaN, integer is zero division). > > > Nan case > > > > 1) Empty array -> Value Error > > 2) Empty slice -> NaN > > 3) For slice ddof >= n -> Nan > > > Personally I would somewhat prefer if 1) and 2) would at least default > to the same thing. But I don't use the nanfuncs anyway. I was wondering > about adding the option for the user to pick what the fill is (and i.e. > if it is None (maybe default) -> ValueError). We could also allow this > for normal reductions without an identity, but I am not sure if it is > useful there. > In the NaN case some slices may be empty, others not. My reasoning is that that is going to be data dependent, not operator error, but if the array is empty the writer of the code should deal with that. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From bruno.piguet at gmail.com Mon Jul 15 11:00:19 2013 From: bruno.piguet at gmail.com (bruno Piguet) Date: Mon, 15 Jul 2013 17:00:19 +0200 Subject: [Numpy-discussion] Allow == and != to raise errors In-Reply-To: References: <1373632688.13968.13.camel@sebastian-laptop> Message-ID: Thank-you for your explanations. So, if the operator "==" applied to np.arrays is a shorthand for the ufunc np.equal, it should definitly behave exactly as np.equal(), and raise an error. One side question about style : In case you would like to protect a "x == y" test by a try/except clause, wouldn't it feel more "natural" to write " np.equal(x, y)" ? Bruno. 
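To make the style question concrete, here is a minimal sketch of the two
spellings, with throwaway example arrays x and y (only the np.equal form
raises today; the operator form would only do so under the proposed change):

import numpy as np

x = np.array([3, 4])
y = np.array([2, 3, 4])

# Explicit ufunc spelling: np.equal already raises ValueError on a shape
# mismatch, so the try/except has something to catch right now.
try:
    equal = np.equal(x, y).all()
except ValueError:
    equal = False

# Operator spelling: today `x == y` returns the plain Python bool False for
# mismatched shapes, so the failure surfaces as an AttributeError from .all()
# rather than a ValueError here; under the proposed change the comparison
# itself would raise ValueError and this block would behave like the one above.
try:
    equal = (x == y).all()
except ValueError:
    equal = False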
2013/7/15 Nathaniel Smith > On Mon, Jul 15, 2013 at 2:09 PM, bruno Piguet > wrote: > > Python itself doesn't raise an exception in such cases : > > > >>>> (3,4) != (2, 3, 4) > > True > >>>> (3,4) == (2, 3, 4) > > False > > > > Should numpy behave differently ? > > The numpy equivalent to Python's scalar "==" is called array_equal, > and that does indeed behave the same: > > In [5]: np.array_equal([3, 4], [2, 3, 4]) > Out[5]: False > > But in numpy, the name "==" is shorthand for the ufunc np.equal, which > raises an error: > > In [8]: np.equal([3, 4], [2, 3, 4]) > ValueError: operands could not be broadcast together with shapes (2) (3) > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Jul 15 11:05:45 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Mon, 15 Jul 2013 08:05:45 -0700 Subject: [Numpy-discussion] PIL and NumPy In-Reply-To: References: Message-ID: <4863100717398894614@unknownmsgid> On Jul 12, 2013, at 8:51 PM, Brady McCary wrote: > > something to do with an alpha channel being present. I'd check and see how PIL is storing the alpha channel. If it's RGBA, then I'd expect it to work. But I'd PIL is storing the alpha channel as a separate band, then I'm not surprised you have an issue. Can you either drop the alpha or convert to RGBA? There is also a package called something line "imageArray" that loads and saves image formats directly to/from numpy arrays-maybe that would be helpful. CHB > When I remove the > alpha channel, things appear to work as I expect. Any discussion on > the matter? > > Brady > > On Fri, Jul 12, 2013 at 10:00 PM, Brady McCary wrote: >> NumPy Folks, >> >> I want to load images with PIL and then operate on them with NumPy. >> According to the PIL and NumPy documentation, I would expect the >> following to work, but it is not. >> >> >> >> Python 2.7.4 (default, Apr 19 2013, 18:28:01) >> [GCC 4.7.3] on linux2 >> Type "help", "copyright", "credits" or "license" for more information. >>>>> import numpy >>>>> numpy.version.version >>>>> >>>>> import Image >>>>> Image.VERSION >> '1.1.7' >>>>> >>>>> im = Image.open('big-0.png') >>>>> im.size >> (2550, 3300) >>>>> >>>>> ar = numpy.asarray(im) >>>>> ar.size >> 1 >>>>> ar.shape >> () >>>>> ar >> array(> 0x1E5BA70>, dtype=object) >> >> >> >> By "not working" I mean that I would have expected the data to be >> loaded/available in ar. PIL and NumPy/SciPy seem to be working fine >> independently of each other. Any guidance? >> >> Brady > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From bruno.piguet at gmail.com Mon Jul 15 11:12:58 2013 From: bruno.piguet at gmail.com (bruno Piguet) Date: Mon, 15 Jul 2013 17:12:58 +0200 Subject: [Numpy-discussion] Allow == and != to raise errors In-Reply-To: References: <1373632688.13968.13.camel@sebastian-laptop> Message-ID: 2013/7/15 Fr?d?ric Bastien > Just a question, should == behave like a ufunc or like python == for tuple? > That's what I was also wondering. I see the advantage of consistency for newcomers. 
I'm not experienced enough to see if this is a problem for numerical practitionners Maybe they wouldn't even imagine that "==" applied to arrays could do anything else than element-wise comparison ? "Explicit is better than implicit" : to me, np.equal(x, y) is more explicit than "x == y". But "Beautiful is better than ugly". Is np.equal(x, y) ugly ? Bruno. > I think that all ndarray comparision (==, !=, <=, ...) should behave the > same. If they don't (like it was said), making them consistent is good. > What is the minimal change to have them behave the same? From my > understanding, it is your proposal to change == and != to behave like real > ufunc. But I'm not sure if the minimal change is the best, for new user, > what they will expect more? The ufunc of the python behavior? > > Anyway, I see the advantage to simplify the interface to something more > consistent. > > Anyway, if we make all comparison behave like ufunc, there is array_equal > as said to have the python behavior of ==, is it useful to have equivalent > function the other comparison? Do they already exist. > > thanks > > Fred > > > On Mon, Jul 15, 2013 at 10:20 AM, Nathaniel Smith wrote: > >> On Mon, Jul 15, 2013 at 2:09 PM, bruno Piguet >> wrote: >> > Python itself doesn't raise an exception in such cases : >> > >> >>>> (3,4) != (2, 3, 4) >> > True >> >>>> (3,4) == (2, 3, 4) >> > False >> > >> > Should numpy behave differently ? >> >> The numpy equivalent to Python's scalar "==" is called array_equal, >> and that does indeed behave the same: >> >> In [5]: np.array_equal([3, 4], [2, 3, 4]) >> Out[5]: False >> >> But in numpy, the name "==" is shorthand for the ufunc np.equal, which >> raises an error: >> >> In [8]: np.equal([3, 4], [2, 3, 4]) >> ValueError: operands could not be broadcast together with shapes (2) (3) >> >> -n >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Mon Jul 15 11:25:17 2013 From: pgmdevlist at gmail.com (Pierre Gerard-Marchant) Date: Mon, 15 Jul 2013 17:25:17 +0200 Subject: [Numpy-discussion] read-only or immutable masked array In-Reply-To: References: <5585563F-9DEA-4142-A60E-6F8028E53D28@gmail.com> Message-ID: <93CF6FC4-E06F-4697-93F9-6628E6C3528D@gmail.com> On Jul 15, 2013, at 14:40 , Gregorio Bastardo wrote: > Hi Pierre, > >> Note as well that hardening the mask only prevents unmasking: you can still grow the mask, which may not be what you want. Use `x.mask.flags.writeable=False` to make the mask really read-only. > > I ran into an unmasking problem with the suggested approach: > >>>> np.version.version > '1.7.0' >>>> x = np.ma.masked_array(xrange(4), [0,1,0,1]) >>>> x > masked_array(data = [0 -- 2 --], > mask = [False True False True], > fill_value = 999999) >>>> x.flags.writeable = False >>>> x.mask.flags.writeable = False >>>> x.mask[1] = 0 # ok > Traceback (most recent call last): > ... > ValueError: assignment destination is read-only >>>> x[1] = 0 # ok > Traceback (most recent call last): > ... > ValueError: assignment destination is read-only >>>> x.mask[1] = 0 # ?? >>>> x > masked_array(data = [0 1 2 --], > mask = [False False False True], > fill_value = 999999) Ouch? 
Quick workaround: use `x.harden_mask()` *then* `x.mask.flags.writeable=False` [Longer explanation] > I noticed that "sharedmask" attribute changes (from True to False) > after "x[1] = 0". Indeed, indeed? When setting items, the mask is unshared to limit some issues (like propagation to the other masked_arrays sharing the mask). Unsharing the mask involves a copy, which unfortunately doesn't copy the flags. In other terms, when you try `x[1]=0`, the mask becomes rewritable. That hurts? But! This call to `unshare_mask` is performed only when the mask is 'soft' hence the quick workaround? Note to self (or whomever will fix the issue before I can do it): * We could make sure that copying a mask copies some of its flags to (like the `writeable` one, which other ones?) * The call to `unshare_mask` is made *before* we try to call `__setitem__` on the `_data` part: that's silly, if we called `__setitem__(_data,index,dval)` before, the `ValueError: assignment destination is read-only` would be raised before the mask could get unshared? TLD;DR: move L3073 of np.ma.core to L3068 * There should be some simpler ways to make a masked_array read-only, this little dance is rapidly tiring. > Also, some of the ma operations result mask identity > of the new ma, which causes ValueError when the new ma mask is > modified: > >>>> x = np.ma.masked_array(xrange(4), [0,1,0,1]) >>>> x.flags.writeable = False >>>> x.mask.flags.writeable = False >>>> x1 = x > 0 >>>> x1.mask is x.mask # ok > False >>>> x2 = x != 0 >>>> x2.mask is x.mask # ?? > True >>>> x2.mask[1] = 0 > Traceback (most recent call last): > ... > ValueError: assignment destination is read-only > > which is a bit confusing. Ouch again. [TL;DR] No workaround, sorry [Long version] The inconsistency comes from the fact that '!=' or '==' call the `__ne__` or `__eq__` methods while other comparison operators call their own function. In the first case, because we're comparing with a non-masked scalar, no copy of the mask is made; in the second case, a copy is systematically made. As pointed out earlier, copies of a mask don't preserve its flags? [Note to self] * Define a factory for __lt__/__le__/__gt__/__ge__ based on __eq__ : MaskedArray.__eq__ and __ne__ already have almost the same code.. (but what about filling? Is it an issue?) > And I experienced that *_like operations > give mask identity too: > >>>> y = np.ones_like(x) >>>> y.mask is x.mask > True This may change in the future, depending on a yet-to-be-achieved consensus on the definition of 'least-surprising behaviour'. Right now, the *-like functions return an array that shares the mask with the input, as you've noticed. Some people complained about it, what's your take on that? > I might be missing something but could you clarify these issues? You were not missing anything, np.ma isn't the most straightforward module: plenty of corner cases, and the implementation is pretty naive at times (but hey, it works). My only advice is to never lose hope. From charlesr.harris at gmail.com Mon Jul 15 11:47:50 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 15 Jul 2013 09:47:50 -0600 Subject: [Numpy-discussion] What should be the result in some statistics corner cases? 
In-Reply-To: References: <1373898856.15619.14.camel@sebastian-laptop> Message-ID: On Mon, Jul 15, 2013 at 8:58 AM, Charles R Harris wrote: > > > On Mon, Jul 15, 2013 at 8:34 AM, Sebastian Berg < > sebastian at sipsolutions.net> wrote: > >> On Mon, 2013-07-15 at 07:52 -0600, Charles R Harris wrote: >> > >> > >> > On Sun, Jul 14, 2013 at 3:35 PM, Charles R Harris >> > wrote: >> > >> >> >> >> > >> > For nansum, I would expect 0 even in the case of all >> > nans. The point >> > of these functions is to simply ignore nans, correct? >> > So I would aim >> > for this behaviour: nanfunc(x) behaves the same as >> > func(x[~isnan(x)]) >> > >> > >> > Agreed, although that changes current behavior. What about the >> > other cases? >> > >> > >> > >> > Looks like there isn't much interest in the topic, so I'll just go >> > ahead with the following choices: >> > >> > Non-NaN case >> > >> > 1) Empty array -> ValueError >> > >> > The current behavior with stats is an accident, i.e., the nan arises >> > from 0/0. I like to think that in this case the result is any number, >> > rather than not a number, so *the* value is simply not defined. So in >> > this case raise a ValueError for empty array. >> > >> To be honest, I don't mind the current behaviour much sum([]) = 0, >> len([]) = 0, so it is in a way well defined. At least I am not sure if I >> would prefer always an error. I am a bit worried that just changing it >> might break code out there, such as plotting code where it makes >> perfectly sense to plot a NaN (i.e. nothing), but if that is the case it >> would probably be visible fast. >> >> > 2) ddof >= n -> ValueError >> > >> > If the number of elements, n, is not zero and ddof >= n, raise a >> > ValueError for the ddof value. >> > >> Makes sense to me, especially for ddof > n. Just returning nan in all >> cases for backward compatibility would be fine with me too. >> > > Currently if ddof > n it returns a negative number for variance, the NaN > only comes when ddof == 0 and n == 0, leading to 0/0 (float is NaN, integer > is zero division). > > >> >> > Nan case >> > >> > 1) Empty array -> Value Error >> > 2) Empty slice -> NaN >> > 3) For slice ddof >= n -> Nan >> > >> Personally I would somewhat prefer if 1) and 2) would at least default >> to the same thing. But I don't use the nanfuncs anyway. I was wondering >> about adding the option for the user to pick what the fill is (and i.e. >> if it is None (maybe default) -> ValueError). We could also allow this >> for normal reductions without an identity, but I am not sure if it is >> useful there. >> > > In the NaN case some slices may be empty, others not. My reasoning is that > that is going to be data dependent, not operator error, but if the array is > empty the writer of the code should deal with that. > > In the case of the nanvar, nanstd, it might make more sense to handle ddof as 1) if ddof is >= axis size, raise ValueError 2) if ddof is >= number of values after removing NaNs, return NaN The first would be consistent with the non-nan case, the second accounts for the variable nature of data containing NaNs. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Mon Jul 15 11:55:05 2013 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 15 Jul 2013 11:55:05 -0400 Subject: [Numpy-discussion] What should be the result in some statistics corner cases? 
In-Reply-To: References: <1373898856.15619.14.camel@sebastian-laptop> Message-ID: On Jul 15, 2013 11:47 AM, "Charles R Harris" wrote: > > > On Mon, Jul 15, 2013 at 8:58 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Mon, Jul 15, 2013 at 8:34 AM, Sebastian Berg < >> sebastian at sipsolutions.net> wrote: >> >>> On Mon, 2013-07-15 at 07:52 -0600, Charles R Harris wrote: >>> > >>> > >>> > On Sun, Jul 14, 2013 at 3:35 PM, Charles R Harris >>> > wrote: >>> > >>> >>> >>> >>> > >>> > For nansum, I would expect 0 even in the case of all >>> > nans. The point >>> > of these functions is to simply ignore nans, correct? >>> > So I would aim >>> > for this behaviour: nanfunc(x) behaves the same as >>> > func(x[~isnan(x)]) >>> > >>> > >>> > Agreed, although that changes current behavior. What about the >>> > other cases? >>> > >>> > >>> > >>> > Looks like there isn't much interest in the topic, so I'll just go >>> > ahead with the following choices: >>> > >>> > Non-NaN case >>> > >>> > 1) Empty array -> ValueError >>> > >>> > The current behavior with stats is an accident, i.e., the nan arises >>> > from 0/0. I like to think that in this case the result is any number, >>> > rather than not a number, so *the* value is simply not defined. So in >>> > this case raise a ValueError for empty array. >>> > >>> To be honest, I don't mind the current behaviour much sum([]) = 0, >>> len([]) = 0, so it is in a way well defined. At least I am not sure if I >>> would prefer always an error. I am a bit worried that just changing it >>> might break code out there, such as plotting code where it makes >>> perfectly sense to plot a NaN (i.e. nothing), but if that is the case it >>> would probably be visible fast. >>> >>> > 2) ddof >= n -> ValueError >>> > >>> > If the number of elements, n, is not zero and ddof >= n, raise a >>> > ValueError for the ddof value. >>> > >>> Makes sense to me, especially for ddof > n. Just returning nan in all >>> cases for backward compatibility would be fine with me too. >>> >> >> Currently if ddof > n it returns a negative number for variance, the NaN >> only comes when ddof == 0 and n == 0, leading to 0/0 (float is NaN, integer >> is zero division). >> >> >>> >>> > Nan case >>> > >>> > 1) Empty array -> Value Error >>> > 2) Empty slice -> NaN >>> > 3) For slice ddof >= n -> Nan >>> > >>> Personally I would somewhat prefer if 1) and 2) would at least default >>> to the same thing. But I don't use the nanfuncs anyway. I was wondering >>> about adding the option for the user to pick what the fill is (and i.e. >>> if it is None (maybe default) -> ValueError). We could also allow this >>> for normal reductions without an identity, but I am not sure if it is >>> useful there. >>> >> >> In the NaN case some slices may be empty, others not. My reasoning is >> that that is going to be data dependent, not operator error, but if the >> array is empty the writer of the code should deal with that. >> >> > In the case of the nanvar, nanstd, it might make more sense to handle ddof > as > > 1) if ddof is >= axis size, raise ValueError > 2) if ddof is >= number of values after removing NaNs, return NaN > > The first would be consistent with the non-nan case, the second accounts > for the variable nature of data containing NaNs. > > Chuck > > > I think this is a good idea in that it naturally follows well with the conventions of what to do with empty arrays / empty slices with nanmean, etc. 
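(As a concrete illustration of those two ddof rules, a rough sketch of a nanvar-style reduction is given below. The function name and every detail are made up for illustration only, assuming a NumPy-1.7-era API; this is not the implementation being proposed.)

import numpy as np

def nanvar_sketch(a, axis, ddof=0):
    # Rule 1: ddof at least as large as the axis length is treated as a
    # programming error and raises immediately.
    a = np.asarray(a, dtype=float)
    if ddof >= a.shape[axis]:
        raise ValueError("ddof is >= the length of the reduction axis")
    valid = ~np.isnan(a)
    n = valid.sum(axis=axis)                        # per-slice count of non-NaN values
    total = np.where(valid, a, 0.0).sum(axis=axis)
    mean = total / np.maximum(n, 1)
    dev2 = np.where(valid, (a - np.expand_dims(mean, axis)) ** 2, 0.0)
    denom = n - ddof
    var = dev2.sum(axis=axis) / np.maximum(denom, 1)
    # Rule 2: slices that lose too many values to NaN removal give NaN,
    # since that is data-dependent rather than a coding error.
    return np.where(denom > 0, var, np.nan)

x = np.array([[1.0, 2.0, np.nan],
              [np.nan, np.nan, np.nan]])
nanvar_sketch(x, axis=1, ddof=1)   # array([ 0.5,  nan]) -- second slice has too few values
nanvar_sketch(x, axis=1, ddof=3)   # ValueError: ddof is >= the length of the reduction axis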
Note, however, I am not a very big fan of the idea of having two different behaviors for what I see as semantically the same thing. But, my objections are not strong enough to veto it, and I do think this proposal is well thought-out. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Mon Jul 15 11:55:44 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 15 Jul 2013 17:55:44 +0200 Subject: [Numpy-discussion] What should be the result in some statistics corner cases? In-Reply-To: References: <1373898856.15619.14.camel@sebastian-laptop> Message-ID: <1373903744.15619.35.camel@sebastian-laptop> On Mon, 2013-07-15 at 08:47 -0600, Charles R Harris wrote: > > > On Mon, Jul 15, 2013 at 8:34 AM, Sebastian Berg > wrote: > On Mon, 2013-07-15 at 07:52 -0600, Charles R Harris wrote: > > > > > > On Sun, Jul 14, 2013 at 3:35 PM, Charles R Harris > > wrote: > > > > > > > > > > For nansum, I would expect 0 even in the > case of all > > nans. The point > > of these functions is to simply ignore nans, > correct? > > So I would aim > > for this behaviour: nanfunc(x) behaves the > same as > > func(x[~isnan(x)]) > > > > > > Agreed, although that changes current behavior. What > about the > > other cases? > > > > > > > > Looks like there isn't much interest in the topic, so I'll > just go > > ahead with the following choices: > > > > Non-NaN case > > > > 1) Empty array -> ValueError > > > > The current behavior with stats is an accident, i.e., the > nan arises > > from 0/0. I like to think that in this case the result is > any number, > > rather than not a number, so *the* value is simply not > defined. So in > > this case raise a ValueError for empty array. > > > > To be honest, I don't mind the current behaviour much sum([]) > = 0, > len([]) = 0, so it is in a way well defined. At least I am not > sure if I > would prefer always an error. I am a bit worried that just > changing it > might break code out there, such as plotting code where it > makes > perfectly sense to plot a NaN (i.e. nothing), but if that is > the case it > would probably be visible fast. > > I'm talking about mean, var, and std as statistics, sum isn't part of > that. If there is agreement that nansum of empty arrays/columns should > be zero I will do that. Note the sums of empty arrays may or may not > be empty. > > In [1]: ones((0, 3)).sum(axis=0) > Out[1]: array([ 0., 0., 0.]) > > In [2]: ones((3, 0)).sum(axis=0) > Out[2]: array([], dtype=float64) > > Which, sort of, makes sense. > > I think we can agree that the behaviour for reductions with an identity should default to returning the identity, including for the nanfuncs, i.e. sum([]) is 0, product([]) is 1... Since mean = sum/length is a sensible definition, having 0/0 as a result doesn't seem to bad to me to be honest, it might be accidental but it is not a special case in the code ;). Though I don't mind an error as long as it doesn't break matplotlib or so. I agree about the nanfuncs raising an error would probably be more of a problem then for a usual ufunc, but still a bit hesitant about saying that it is ok too. 
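(For reference, the identity behaviour described above can be checked directly against the ufunc reduce machinery. The session below is only illustrative of NumPy-1.7-era behaviour; exact error messages and warnings may differ between releases.)

>>> import numpy as np
>>> np.add.reduce([])        # additive identity
0.0
>>> np.multiply.reduce([])   # multiplicative identity
1.0
>>> np.maximum.reduce([])    # no identity, so this raises
Traceback (most recent call last):
    ...
ValueError: zero-size array to reduction operation maximum which has no identity
>>> np.mean([])              # sum/length -> 0/0: RuntimeWarning, result is nan
nan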
I could imagine adding a very general "identity" argument (though I would not call it identity, because it is not the same as `np.add.identity`, just used in a place where that would be used otherwise): np.add.reduce([], identity=123) -> [123] np.add.reduce([1], identity=123) -> [1] np.nanmean([np.nan], identity=None) -> Error np.nanmean([np.nan], identity=np.nan) -> np.nan It doesn't really make sense, but: np.subtract.reduce([]) -> Error, since np.substract.identity is None np.subtract.reduce([], identity=0) -> 0, suppressing the error. I am not sure if I am convinced myself, but especially for the nanfuncs it could maybe provide a way to circumvent the problem somewhat. Including functions such as np.nanargmin, whose result type does not even support NaN. Plus it gives an argument allowing for warnings about changing behaviour. - Sebastian > > > 2) ddof >= n -> ValueError > > > > If the number of elements, n, is not zero and ddof >= n, > raise a > > ValueError for the ddof value. > > > > Makes sense to me, especially for ddof > n. Just returning nan > in all > cases for backward compatibility would be fine with me too. > > > Nan case > > > > 1) Empty array -> Value Error > > 2) Empty slice -> NaN > > 3) For slice ddof >= n -> Nan > > > > Personally I would somewhat prefer if 1) and 2) would at least > default > to the same thing. But I don't use the nanfuncs anyway. I was > wondering > about adding the option for the user to pick what the fill is > (and i.e. > if it is None (maybe default) -> ValueError). We could also > allow this > for normal reductions without an identity, but I am not sure > if it is > useful there. > > > Chuck > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sebastian at sipsolutions.net Mon Jul 15 12:18:44 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 15 Jul 2013 18:18:44 +0200 Subject: [Numpy-discussion] Allow == and != to raise errors In-Reply-To: References: <1373632688.13968.13.camel@sebastian-laptop> Message-ID: <1373905124.15619.45.camel@sebastian-laptop> On Mon, 2013-07-15 at 17:12 +0200, bruno Piguet wrote: > > > > 2013/7/15 Fr?d?ric Bastien > Just a question, should == behave like a ufunc or like python > == for tuple? > > > > That's what I was also wondering. I am not sure I understand the question. Of course == should be (mostly?) identical to np.equal. Things like arr[arr == 0] = -1 etc., etc., are a common design pattern. Operations on arrays are element-wise by default, "falling back" to the python tuple/container behaviour" is a special case and I do not see a good reason for it, except possibly backward compatibility. Personally I doubt anyone who seriously uses numpy, uses the np.array([1, 2, 3]) == np.array([1,2]) -> False behaviour, and it seems a bit like a trap to me, because suddenly you get: np.array([1, 2, 3]) == np.array([1]) -> np.array([True, False, False]) (Though in combination with np.all, it can make sense and is then identical to np.array_equiv/np.array_equal) - Sebastian > I see the advantage of consistency for newcomers. > I'm not experienced enough to see if this is a problem for numerical > practitionners Maybe they wouldn't even imagine that "==" applied to > arrays could do anything else than element-wise comparison ? > > "Explicit is better than implicit" : to me, np.equal(x, y) is more > explicit than "x == y". > > But "Beautiful is better than ugly". 
Is np.equal(x, y) ugly ? > > > Bruno. > > > > > > I think that all ndarray comparision (==, !=, <=, ...) should > behave the same. If they don't (like it was said), making them > consistent is good. What is the minimal change to have them > behave the same? From my understanding, it is your proposal to > change == and != to behave like real ufunc. But I'm not sure > if the minimal change is the best, for new user, what they > will expect more? The ufunc of the python behavior? > > > Anyway, I see the advantage to simplify the interface to > something more consistent. > > > Anyway, if we make all comparison behave like ufunc, there is > array_equal as said to have the python behavior of ==, is it > useful to have equivalent function the other comparison? Do > they already exist. > > > thanks > > > > Fred > > > On Mon, Jul 15, 2013 at 10:20 AM, Nathaniel Smith > wrote: > On Mon, Jul 15, 2013 at 2:09 PM, bruno Piguet > wrote: > > Python itself doesn't raise an exception in such > cases : > > > >>>> (3,4) != (2, 3, 4) > > True > >>>> (3,4) == (2, 3, 4) > > False > > > > Should numpy behave differently ? > > > The numpy equivalent to Python's scalar "==" is called > array_equal, > and that does indeed behave the same: > > In [5]: np.array_equal([3, 4], [2, 3, 4]) > Out[5]: False > > But in numpy, the name "==" is shorthand for the ufunc > np.equal, which > raises an error: > > In [8]: np.equal([3, 4], [2, 3, 4]) > ValueError: operands could not be broadcast together > with shapes (2) (3) > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From stefan at sun.ac.za Mon Jul 15 12:25:56 2013 From: stefan at sun.ac.za (=?iso-8859-1?Q?St=E9fan?= van der Walt) Date: Mon, 15 Jul 2013 18:25:56 +0200 Subject: [Numpy-discussion] PIL and NumPy In-Reply-To: References: Message-ID: <20130715162556.GC18804@shinobi> Dear Brady On Fri, 12 Jul 2013 22:00:08 -0500, Brady McCary wrote: > > I want to load images with PIL and then operate on them with NumPy. > According to the PIL and NumPy documentation, I would expect the > following to work, but it is not. Reading images as PIL is a little bit trickier than one would hope. You can find an example of how to do it (taken scikit-image) here: https://github.com/scikit-image/scikit-image/blob/master/skimage/io/_plugins/pil_plugin.py#L15 St?fan From gregorio.bastardo at gmail.com Mon Jul 15 12:41:17 2013 From: gregorio.bastardo at gmail.com (Gregorio Bastardo) Date: Mon, 15 Jul 2013 18:41:17 +0200 Subject: [Numpy-discussion] read-only or immutable masked array In-Reply-To: <93CF6FC4-E06F-4697-93F9-6628E6C3528D@gmail.com> References: <5585563F-9DEA-4142-A60E-6F8028E53D28@gmail.com> <93CF6FC4-E06F-4697-93F9-6628E6C3528D@gmail.com> Message-ID: > Ouch? > Quick workaround: use `x.harden_mask()` *then* `x.mask.flags.writeable=False` Thanks for the update and the detailed explanation. I'll try this trick. > This may change in the future, depending on a yet-to-be-achieved consensus on the definition of 'least-surprising behaviour'. 
Right now, the *-like functions return an array that shares the mask with the input, as you've noticed. Some people complained about it, what's your take on that? I already took part in the survey (possibly out of thread): http://mail.scipy.org/pipermail/numpy-discussion/2013-July/067136.html > You were not missing anything, np.ma isn't the most straightforward module: plenty of corner cases, and the implementation is pretty naive at times (but hey, it works). My only advice is to never lose hope. I agree there are plenty of hard-to-define cases, and I came accross a hot debate on missing data representation in python: https://github.com/njsmith/numpy/wiki/NA-discussion-status but still I believe np.ma is very usable when compression is not strongly needed. From charlesr.harris at gmail.com Mon Jul 15 13:29:43 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 15 Jul 2013 11:29:43 -0600 Subject: [Numpy-discussion] What should be the result in some statistics corner cases? In-Reply-To: <1373903744.15619.35.camel@sebastian-laptop> References: <1373898856.15619.14.camel@sebastian-laptop> <1373903744.15619.35.camel@sebastian-laptop> Message-ID: On Mon, Jul 15, 2013 at 9:55 AM, Sebastian Berg wrote: > On Mon, 2013-07-15 at 08:47 -0600, Charles R Harris wrote: > > > > > > On Mon, Jul 15, 2013 at 8:34 AM, Sebastian Berg > > wrote: > > On Mon, 2013-07-15 at 07:52 -0600, Charles R Harris wrote: > > > > > > > > > On Sun, Jul 14, 2013 at 3:35 PM, Charles R Harris > > > wrote: > > > > > > > > > > > > > > > > > For nansum, I would expect 0 even in the > > case of all > > > nans. The point > > > of these functions is to simply ignore nans, > > correct? > > > So I would aim > > > for this behaviour: nanfunc(x) behaves the > > same as > > > func(x[~isnan(x)]) > > > > > > > > > Agreed, although that changes current behavior. What > > about the > > > other cases? > > > > > > > > > > > > Looks like there isn't much interest in the topic, so I'll > > just go > > > ahead with the following choices: > > > > > > Non-NaN case > > > > > > 1) Empty array -> ValueError > > > > > > The current behavior with stats is an accident, i.e., the > > nan arises > > > from 0/0. I like to think that in this case the result is > > any number, > > > rather than not a number, so *the* value is simply not > > defined. So in > > > this case raise a ValueError for empty array. > > > > > > > To be honest, I don't mind the current behaviour much sum([]) > > = 0, > > len([]) = 0, so it is in a way well defined. At least I am not > > sure if I > > would prefer always an error. I am a bit worried that just > > changing it > > might break code out there, such as plotting code where it > > makes > > perfectly sense to plot a NaN (i.e. nothing), but if that is > > the case it > > would probably be visible fast. > > > > I'm talking about mean, var, and std as statistics, sum isn't part of > > that. If there is agreement that nansum of empty arrays/columns should > > be zero I will do that. Note the sums of empty arrays may or may not > > be empty. > > > > In [1]: ones((0, 3)).sum(axis=0) > > Out[1]: array([ 0., 0., 0.]) > > > > In [2]: ones((3, 0)).sum(axis=0) > > Out[2]: array([], dtype=float64) > > > > Which, sort of, makes sense. > > > > > I think we can agree that the behaviour for reductions with an identity > should default to returning the identity, including for the nanfuncs, > i.e. sum([]) is 0, product([]) is 1... 
> > Since mean = sum/length is a sensible definition, having 0/0 as a result > doesn't seem to bad to me to be honest, it might be accidental but it is > not a special case in the code ;). Though I don't mind an error as long > as it doesn't break matplotlib or so. > > I agree about the nanfuncs raising an error would probably be more of a > problem then for a usual ufunc, but still a bit hesitant about saying > that it is ok too. I could imagine adding a very general "identity" > argument (though I would not call it identity, because it is not the > same as `np.add.identity`, just used in a place where that would be used > otherwise): > > np.add.reduce([], identity=123) -> [123] > np.add.reduce([1], identity=123) -> [1] > np.nanmean([np.nan], identity=None) -> Error > np.nanmean([np.nan], identity=np.nan) -> np.nan > > It doesn't really make sense, but: > np.subtract.reduce([]) -> Error, since np.substract.identity is None > np.subtract.reduce([], identity=0) -> 0, suppressing the error. > > I am not sure if I am convinced myself, but especially for the nanfuncs > it could maybe provide a way to circumvent the problem somewhat. > Including functions such as np.nanargmin, whose result type does not > even support NaN. Plus it gives an argument allowing for warnings about > changing behaviour. > > Let me try to summarize. To begin with, the environment of the nan functions is rather special. 1) if the array is of not of inexact type, they punt to the non-nan versions. 2) if the array is of inexact type, then out and dtype must be inexact if specified The second assumption guarantees that NaN can be used in the return values. *sum and nansum* These should be consistent so that empty sums are 0. This should cover the empty array case, but will change the behaviour of nansum which currently returns NaN if the array isn't empty but the slice is after NaN removal. *mean and nanmean* In the case of empty arrays, an empty slice, this leads to 0/0. For Python this is always a zero division error, for Numpy this raises a warning and and returns NaN for floats, 0 for integers. Currently mean returns NaN and raises a RuntimeWarning when 0/0 occurs. In the special case where dtype=int, the NaN is cast to integer. Option1 1) mean raise error on 0/0 2) nanmean no warning, return NaN Option2 1) mean raise warning, return NaN (current behavior) 2) nanmean no warning, return NaN Option3 1) mean raise warning, return NaN (current behavior) 2) nanmean raise warning, return NaN *var, std, nanvar, nanstd* 1) if ddof > axis(axes) size, raise error, probably a program bug. 2) If ddof=0, then whatever is the case for mean, nanmean For nanvar, nanstd it is possible that some slice are good, some bad, so option1 1) if n - ddof <= 0 for a slice, raise warning, return NaN for slice option2 1) if n - ddof <= 0 for a slice, don't warn, return NaN for slice Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Jul 15 14:55:04 2013 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 15 Jul 2013 19:55:04 +0100 Subject: [Numpy-discussion] What should be the result in some statistics corner cases? In-Reply-To: References: <1373898856.15619.14.camel@sebastian-laptop> <1373903744.15619.35.camel@sebastian-laptop> Message-ID: On Mon, Jul 15, 2013 at 6:29 PM, Charles R Harris wrote: > Let me try to summarize. To begin with, the environment of the nan functions > is rather special. > > 1) if the array is of not of inexact type, they punt to the non-nan > versions. 
> 2) if the array is of inexact type, then out and dtype must be inexact if > specified > > The second assumption guarantees that NaN can be used in the return values. The requirement on the 'out' dtype only exists because currently the nan function like to return nan for things like empty arrays, right? If not for that, it could be relaxed? (it's a rather weird requirement, since the whole point of these functions is that they ignore nans, yet they don't always...) > sum and nansum > > These should be consistent so that empty sums are 0. This should cover the > empty array case, but will change the behaviour of nansum which currently > returns NaN if the array isn't empty but the slice is after NaN removal. I agree that returning 0 is the right behaviour, but we might need a FutureWarning period. > mean and nanmean > > In the case of empty arrays, an empty slice, this leads to 0/0. For Python > this is always a zero division error, for Numpy this raises a warning and > and returns NaN for floats, 0 for integers. > > Currently mean returns NaN and raises a RuntimeWarning when 0/0 occurs. In > the special case where dtype=int, the NaN is cast to integer. > > Option1 > 1) mean raise error on 0/0 > 2) nanmean no warning, return NaN > > Option2 > 1) mean raise warning, return NaN (current behavior) > 2) nanmean no warning, return NaN > > Option3 > 1) mean raise warning, return NaN (current behavior) > 2) nanmean raise warning, return NaN I have mixed feelings about the whole np.seterr apparatus, but since it exists, shouldn't we use it for consistency? I.e., just do whatever numpy is set up to do with 0/0? (Which I think means, warn and return NaN by default, but this can be changed.) > var, std, nanvar, nanstd > > 1) if ddof > axis(axes) size, raise error, probably a program bug. > 2) If ddof=0, then whatever is the case for mean, nanmean > > For nanvar, nanstd it is possible that some slice are good, some bad, so > > option1 > 1) if n - ddof <= 0 for a slice, raise warning, return NaN for slice > > option2 > 1) if n - ddof <= 0 for a slice, don't warn, return NaN for slice I don't really have any intuition for these ddof cases. Just raising an error on negative effective dof is pretty defensible and might be the safest -- it's a easy to turn an error into something sensible later if people come up with use cases... -n From josef.pktd at gmail.com Mon Jul 15 16:24:52 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 15 Jul 2013 16:24:52 -0400 Subject: [Numpy-discussion] What should be the result in some statistics corner cases? In-Reply-To: References: <1373898856.15619.14.camel@sebastian-laptop> <1373903744.15619.35.camel@sebastian-laptop> Message-ID: On Mon, Jul 15, 2013 at 2:55 PM, Nathaniel Smith wrote: > On Mon, Jul 15, 2013 at 6:29 PM, Charles R Harris > wrote: >> Let me try to summarize. To begin with, the environment of the nan functions >> is rather special. >> >> 1) if the array is of not of inexact type, they punt to the non-nan >> versions. >> 2) if the array is of inexact type, then out and dtype must be inexact if >> specified >> >> The second assumption guarantees that NaN can be used in the return values. > > The requirement on the 'out' dtype only exists because currently the > nan function like to return nan for things like empty arrays, right? > If not for that, it could be relaxed? (it's a rather weird > requirement, since the whole point of these functions is that they > ignore nans, yet they don't always...) 
> >> sum and nansum >> >> These should be consistent so that empty sums are 0. This should cover the >> empty array case, but will change the behaviour of nansum which currently >> returns NaN if the array isn't empty but the slice is after NaN removal. > > I agree that returning 0 is the right behaviour, but we might need a > FutureWarning period. > >> mean and nanmean >> >> In the case of empty arrays, an empty slice, this leads to 0/0. For Python >> this is always a zero division error, for Numpy this raises a warning and >> and returns NaN for floats, 0 for integers. >> >> Currently mean returns NaN and raises a RuntimeWarning when 0/0 occurs. In >> the special case where dtype=int, the NaN is cast to integer. >> >> Option1 >> 1) mean raise error on 0/0 >> 2) nanmean no warning, return NaN >> >> Option2 >> 1) mean raise warning, return NaN (current behavior) >> 2) nanmean no warning, return NaN >> >> Option3 >> 1) mean raise warning, return NaN (current behavior) >> 2) nanmean raise warning, return NaN > > I have mixed feelings about the whole np.seterr apparatus, but since > it exists, shouldn't we use it for consistency? I.e., just do whatever > numpy is set up to do with 0/0? (Which I think means, warn and return > NaN by default, but this can be changed.) > >> var, std, nanvar, nanstd >> >> 1) if ddof > axis(axes) size, raise error, probably a program bug. >> 2) If ddof=0, then whatever is the case for mean, nanmean >> >> For nanvar, nanstd it is possible that some slice are good, some bad, so >> >> option1 >> 1) if n - ddof <= 0 for a slice, raise warning, return NaN for slice >> >> option2 >> 1) if n - ddof <= 0 for a slice, don't warn, return NaN for slice > > I don't really have any intuition for these ddof cases. Just raising > an error on negative effective dof is pretty defensible and might be > the safest -- it's a easy to turn an error into something sensible > later if people come up with use cases... related why does reduceat not have empty slices? >>> np.add.reduceat(np.arange(8),[0,4, 5, 7,7]) array([ 6, 4, 11, 7, 7]) I'm in favor of returning nans instead of raising exceptions, except if the return type is int and we cannot cast nan to int. If we get functions into numpy that know how to handle nans, then it would be useful to get the nans, so we can work with them Some cases where this might come in handy are when we iterate over slices of an array that define groups or category levels with possible empty groups *) >>> idx = np.repeat(np.array([0, 1, 2, 3]), [4, 3, 0, 2]) >>> x = np.arange(9) >>> [x[idx==ii].mean() for ii in range(4)] [1.5, 5.0, nan, 7.5] instead of >>> [x[idx==ii].mean() for ii in range(4) if (idx==ii).sum()>0] [1.5, 5.0, 7.5] same for var, I wouldn't have to check that the size is larger than the ddof (whatever that is in the specific case) *) groups could be empty because they were defined for a larger dataset or as a union of different datasets PS: I used mean() above and not var() because >>> np.__version__ '1.5.1' >>> np.mean([]) nan >>> np.var([]) 0.0 Josef > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From josef.pktd at gmail.com Mon Jul 15 16:44:18 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 15 Jul 2013 16:44:18 -0400 Subject: [Numpy-discussion] What should be the result in some statistics corner cases? 
In-Reply-To: References: <1373898856.15619.14.camel@sebastian-laptop> <1373903744.15619.35.camel@sebastian-laptop> Message-ID: On Mon, Jul 15, 2013 at 4:24 PM, wrote: > On Mon, Jul 15, 2013 at 2:55 PM, Nathaniel Smith wrote: >> On Mon, Jul 15, 2013 at 6:29 PM, Charles R Harris >> wrote: >>> Let me try to summarize. To begin with, the environment of the nan functions >>> is rather special. >>> >>> 1) if the array is of not of inexact type, they punt to the non-nan >>> versions. >>> 2) if the array is of inexact type, then out and dtype must be inexact if >>> specified >>> >>> The second assumption guarantees that NaN can be used in the return values. >> >> The requirement on the 'out' dtype only exists because currently the >> nan function like to return nan for things like empty arrays, right? >> If not for that, it could be relaxed? (it's a rather weird >> requirement, since the whole point of these functions is that they >> ignore nans, yet they don't always...) >> >>> sum and nansum >>> >>> These should be consistent so that empty sums are 0. This should cover the >>> empty array case, but will change the behaviour of nansum which currently >>> returns NaN if the array isn't empty but the slice is after NaN removal. >> >> I agree that returning 0 is the right behaviour, but we might need a >> FutureWarning period. >> >>> mean and nanmean >>> >>> In the case of empty arrays, an empty slice, this leads to 0/0. For Python >>> this is always a zero division error, for Numpy this raises a warning and >>> and returns NaN for floats, 0 for integers. >>> >>> Currently mean returns NaN and raises a RuntimeWarning when 0/0 occurs. In >>> the special case where dtype=int, the NaN is cast to integer. >>> >>> Option1 >>> 1) mean raise error on 0/0 >>> 2) nanmean no warning, return NaN >>> >>> Option2 >>> 1) mean raise warning, return NaN (current behavior) >>> 2) nanmean no warning, return NaN >>> >>> Option3 >>> 1) mean raise warning, return NaN (current behavior) >>> 2) nanmean raise warning, return NaN >> >> I have mixed feelings about the whole np.seterr apparatus, but since >> it exists, shouldn't we use it for consistency? I.e., just do whatever >> numpy is set up to do with 0/0? (Which I think means, warn and return >> NaN by default, but this can be changed.) >> >>> var, std, nanvar, nanstd >>> >>> 1) if ddof > axis(axes) size, raise error, probably a program bug. >>> 2) If ddof=0, then whatever is the case for mean, nanmean >>> >>> For nanvar, nanstd it is possible that some slice are good, some bad, so >>> >>> option1 >>> 1) if n - ddof <= 0 for a slice, raise warning, return NaN for slice >>> >>> option2 >>> 1) if n - ddof <= 0 for a slice, don't warn, return NaN for slice >> >> I don't really have any intuition for these ddof cases. Just raising >> an error on negative effective dof is pretty defensible and might be >> the safest -- it's a easy to turn an error into something sensible >> later if people come up with use cases... > > related why does reduceat not have empty slices? > >>>> np.add.reduceat(np.arange(8),[0,4, 5, 7,7]) > array([ 6, 4, 11, 7, 7]) > > > I'm in favor of returning nans instead of raising exceptions, except > if the return type is int and we cannot cast nan to int. 
> > If we get functions into numpy that know how to handle nans, then it > would be useful to get the nans, so we can work with them > > Some cases where this might come in handy are when we iterate over > slices of an array that define groups or category levels with possible > empty groups *) > >>>> idx = np.repeat(np.array([0, 1, 2, 3]), [4, 3, 0, 2]) >>>> x = np.arange(9) >>>> [x[idx==ii].mean() for ii in range(4)] > [1.5, 5.0, nan, 7.5] > > instead of >>>> [x[idx==ii].mean() for ii in range(4) if (idx==ii).sum()>0] > [1.5, 5.0, 7.5] > > same for var, I wouldn't have to check that the size is larger than > the ddof (whatever that is in the specific case) > > *) groups could be empty because they were defined for a larger > dataset or as a union of different datasets background: I wrote several robust anova versions a few weeks ago, that were essentially list comprehension as above. However, I didn't allow nans and didn't check for minimum size. Allowing for empty groups to return nan would mainly be a convenience, since I need to check the group size only once. ddof: tests for proportions have ddof=0, for regular t-test ddof=1, for tests of correlation ddof=2 IIRC so we would need to check for the corresponding minimum size that n-ddof>0 "negative effective dof" doesn't exist, that's np.maximum(n - ddof, 0) which is always non-negative but might result in a zero-division error. :) I don't think making anything conditional on ddof>0 is useful. Josef > > > PS: I used mean() above and not var() because > >>>> np.__version__ > '1.5.1' >>>> np.mean([]) > nan >>>> np.var([]) > 0.0 > > Josef > >> >> -n >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Mon Jul 15 17:34:22 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 15 Jul 2013 15:34:22 -0600 Subject: [Numpy-discussion] What should be the result in some statistics corner cases? In-Reply-To: References: <1373898856.15619.14.camel@sebastian-laptop> <1373903744.15619.35.camel@sebastian-laptop> Message-ID: On Mon, Jul 15, 2013 at 2:44 PM, wrote: > On Mon, Jul 15, 2013 at 4:24 PM, wrote: > > On Mon, Jul 15, 2013 at 2:55 PM, Nathaniel Smith wrote: > >> On Mon, Jul 15, 2013 at 6:29 PM, Charles R Harris > >> wrote: > >>> Let me try to summarize. To begin with, the environment of the nan > functions > >>> is rather special. > >>> > >>> 1) if the array is of not of inexact type, they punt to the non-nan > >>> versions. > >>> 2) if the array is of inexact type, then out and dtype must be inexact > if > >>> specified > >>> > >>> The second assumption guarantees that NaN can be used in the return > values. > >> > >> The requirement on the 'out' dtype only exists because currently the > >> nan function like to return nan for things like empty arrays, right? > >> If not for that, it could be relaxed? (it's a rather weird > >> requirement, since the whole point of these functions is that they > >> ignore nans, yet they don't always...) > >> > >>> sum and nansum > >>> > >>> These should be consistent so that empty sums are 0. This should cover > the > >>> empty array case, but will change the behaviour of nansum which > currently > >>> returns NaN if the array isn't empty but the slice is after NaN > removal. > >> > >> I agree that returning 0 is the right behaviour, but we might need a > >> FutureWarning period. 
> >> > >>> mean and nanmean > >>> > >>> In the case of empty arrays, an empty slice, this leads to 0/0. For > Python > >>> this is always a zero division error, for Numpy this raises a warning > and > >>> and returns NaN for floats, 0 for integers. > >>> > >>> Currently mean returns NaN and raises a RuntimeWarning when 0/0 > occurs. In > >>> the special case where dtype=int, the NaN is cast to integer. > >>> > >>> Option1 > >>> 1) mean raise error on 0/0 > >>> 2) nanmean no warning, return NaN > >>> > >>> Option2 > >>> 1) mean raise warning, return NaN (current behavior) > >>> 2) nanmean no warning, return NaN > >>> > >>> Option3 > >>> 1) mean raise warning, return NaN (current behavior) > >>> 2) nanmean raise warning, return NaN > >> > >> I have mixed feelings about the whole np.seterr apparatus, but since > >> it exists, shouldn't we use it for consistency? I.e., just do whatever > >> numpy is set up to do with 0/0? (Which I think means, warn and return > >> NaN by default, but this can be changed.) > >> > >>> var, std, nanvar, nanstd > >>> > >>> 1) if ddof > axis(axes) size, raise error, probably a program bug. > >>> 2) If ddof=0, then whatever is the case for mean, nanmean > >>> > >>> For nanvar, nanstd it is possible that some slice are good, some bad, > so > >>> > >>> option1 > >>> 1) if n - ddof <= 0 for a slice, raise warning, return NaN for slice > >>> > >>> option2 > >>> 1) if n - ddof <= 0 for a slice, don't warn, return NaN for slice > >> > >> I don't really have any intuition for these ddof cases. Just raising > >> an error on negative effective dof is pretty defensible and might be > >> the safest -- it's a easy to turn an error into something sensible > >> later if people come up with use cases... > > > > related why does reduceat not have empty slices? > > > >>>> np.add.reduceat(np.arange(8),[0,4, 5, 7,7]) > > array([ 6, 4, 11, 7, 7]) > > > > > > I'm in favor of returning nans instead of raising exceptions, except > > if the return type is int and we cannot cast nan to int. > > > > If we get functions into numpy that know how to handle nans, then it > > would be useful to get the nans, so we can work with them > > > > Some cases where this might come in handy are when we iterate over > > slices of an array that define groups or category levels with possible > > empty groups *) > > > >>>> idx = np.repeat(np.array([0, 1, 2, 3]), [4, 3, 0, 2]) > >>>> x = np.arange(9) > >>>> [x[idx==ii].mean() for ii in range(4)] > > [1.5, 5.0, nan, 7.5] > > > > instead of > >>>> [x[idx==ii].mean() for ii in range(4) if (idx==ii).sum()>0] > > [1.5, 5.0, 7.5] > > > > same for var, I wouldn't have to check that the size is larger than > > the ddof (whatever that is in the specific case) > > > > *) groups could be empty because they were defined for a larger > > dataset or as a union of different datasets > > background: > > I wrote several robust anova versions a few weeks ago, that were > essentially list comprehension as above. However, I didn't allow nans > and didn't check for minimum size. > Allowing for empty groups to return nan would mainly be a convenience, > since I need to check the group size only once. > > ddof: tests for proportions have ddof=0, for regular t-test ddof=1, > for tests of correlation ddof=2 IIRC > so we would need to check for the corresponding minimum size that n-ddof>0 > > "negative effective dof" doesn't exist, that's np.maximum(n - ddof, 0) > which is always non-negative but might result in a zero-division > error. 
:) > > I don't think making anything conditional on ddof>0 is useful. > > So how would you want it? To summarize the problem areas: 1) What is the sum of an empty slice? NaN or 0? 2) What is mean of empy slice? NaN, NaN and warn, or error? 3) What if n - ddof < 0 for slice? NaN, NaN and warn, or error? 4) What if n - ddof = 0 for slice? NaN, NaN and warn, or error? I'm tending to NaN and warn for 2 -- 3, because, as Nathaniel notes, the warning can be turned into an error by the user. The errstate context manager would be good for that. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Jul 15 17:57:56 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 15 Jul 2013 17:57:56 -0400 Subject: [Numpy-discussion] What should be the result in some statistics corner cases? In-Reply-To: References: <1373898856.15619.14.camel@sebastian-laptop> <1373903744.15619.35.camel@sebastian-laptop> Message-ID: On Mon, Jul 15, 2013 at 5:34 PM, Charles R Harris wrote: > > > On Mon, Jul 15, 2013 at 2:44 PM, wrote: >> >> On Mon, Jul 15, 2013 at 4:24 PM, wrote: >> > On Mon, Jul 15, 2013 at 2:55 PM, Nathaniel Smith wrote: >> >> On Mon, Jul 15, 2013 at 6:29 PM, Charles R Harris >> >> wrote: >> >>> Let me try to summarize. To begin with, the environment of the nan >> >>> functions >> >>> is rather special. >> >>> >> >>> 1) if the array is of not of inexact type, they punt to the non-nan >> >>> versions. >> >>> 2) if the array is of inexact type, then out and dtype must be inexact >> >>> if >> >>> specified >> >>> >> >>> The second assumption guarantees that NaN can be used in the return >> >>> values. >> >> >> >> The requirement on the 'out' dtype only exists because currently the >> >> nan function like to return nan for things like empty arrays, right? >> >> If not for that, it could be relaxed? (it's a rather weird >> >> requirement, since the whole point of these functions is that they >> >> ignore nans, yet they don't always...) >> >> >> >>> sum and nansum >> >>> >> >>> These should be consistent so that empty sums are 0. This should cover >> >>> the >> >>> empty array case, but will change the behaviour of nansum which >> >>> currently >> >>> returns NaN if the array isn't empty but the slice is after NaN >> >>> removal. >> >> >> >> I agree that returning 0 is the right behaviour, but we might need a >> >> FutureWarning period. >> >> >> >>> mean and nanmean >> >>> >> >>> In the case of empty arrays, an empty slice, this leads to 0/0. For >> >>> Python >> >>> this is always a zero division error, for Numpy this raises a warning >> >>> and >> >>> and returns NaN for floats, 0 for integers. >> >>> >> >>> Currently mean returns NaN and raises a RuntimeWarning when 0/0 >> >>> occurs. In >> >>> the special case where dtype=int, the NaN is cast to integer. >> >>> >> >>> Option1 >> >>> 1) mean raise error on 0/0 >> >>> 2) nanmean no warning, return NaN >> >>> >> >>> Option2 >> >>> 1) mean raise warning, return NaN (current behavior) >> >>> 2) nanmean no warning, return NaN >> >>> >> >>> Option3 >> >>> 1) mean raise warning, return NaN (current behavior) >> >>> 2) nanmean raise warning, return NaN >> >> >> >> I have mixed feelings about the whole np.seterr apparatus, but since >> >> it exists, shouldn't we use it for consistency? I.e., just do whatever >> >> numpy is set up to do with 0/0? (Which I think means, warn and return >> >> NaN by default, but this can be changed.) 
>> >> >> >>> var, std, nanvar, nanstd >> >>> >> >>> 1) if ddof > axis(axes) size, raise error, probably a program bug. >> >>> 2) If ddof=0, then whatever is the case for mean, nanmean >> >>> >> >>> For nanvar, nanstd it is possible that some slice are good, some bad, >> >>> so >> >>> >> >>> option1 >> >>> 1) if n - ddof <= 0 for a slice, raise warning, return NaN for slice >> >>> >> >>> option2 >> >>> 1) if n - ddof <= 0 for a slice, don't warn, return NaN for slice >> >> >> >> I don't really have any intuition for these ddof cases. Just raising >> >> an error on negative effective dof is pretty defensible and might be >> >> the safest -- it's a easy to turn an error into something sensible >> >> later if people come up with use cases... >> > >> > related why does reduceat not have empty slices? >> > >> >>>> np.add.reduceat(np.arange(8),[0,4, 5, 7,7]) >> > array([ 6, 4, 11, 7, 7]) >> > >> > >> > I'm in favor of returning nans instead of raising exceptions, except >> > if the return type is int and we cannot cast nan to int. >> > >> > If we get functions into numpy that know how to handle nans, then it >> > would be useful to get the nans, so we can work with them >> > >> > Some cases where this might come in handy are when we iterate over >> > slices of an array that define groups or category levels with possible >> > empty groups *) >> > >> >>>> idx = np.repeat(np.array([0, 1, 2, 3]), [4, 3, 0, 2]) >> >>>> x = np.arange(9) >> >>>> [x[idx==ii].mean() for ii in range(4)] >> > [1.5, 5.0, nan, 7.5] >> > >> > instead of >> >>>> [x[idx==ii].mean() for ii in range(4) if (idx==ii).sum()>0] >> > [1.5, 5.0, 7.5] >> > >> > same for var, I wouldn't have to check that the size is larger than >> > the ddof (whatever that is in the specific case) >> > >> > *) groups could be empty because they were defined for a larger >> > dataset or as a union of different datasets >> >> background: >> >> I wrote several robust anova versions a few weeks ago, that were >> essentially list comprehension as above. However, I didn't allow nans >> and didn't check for minimum size. >> Allowing for empty groups to return nan would mainly be a convenience, >> since I need to check the group size only once. >> >> ddof: tests for proportions have ddof=0, for regular t-test ddof=1, >> for tests of correlation ddof=2 IIRC >> so we would need to check for the corresponding minimum size that n-ddof>0 >> >> "negative effective dof" doesn't exist, that's np.maximum(n - ddof, 0) >> which is always non-negative but might result in a zero-division >> error. :) >> >> I don't think making anything conditional on ddof>0 is useful. >> > > So how would you want it? > > To summarize the problem areas: > > 1) What is the sum of an empty slice? NaN or 0? 0 as it is now for sum, (including 0 for nansum with no valid entries). > 2) What is mean of empy slice? NaN, NaN and warn, or error? > 3) What if n - ddof < 0 for slice? NaN, NaN and warn, or error? > 4) What if n - ddof = 0 for slice? NaN, NaN and warn, or error? > > I'm tending to NaN and warn for 2 -- 3, because, as Nathaniel notes, the > warning can be turned into an error by the user. The errstate context > manager would be good for that. Yes, That's what I would prefer also, NaN and ZeroDivisionError, for 2-4, including mean, var and std, for both nan and non-nan functions. 
with the extra argument that 3) and 4) are the same case (except in polyfit :) Josef > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Mon Jul 15 18:03:01 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 15 Jul 2013 16:03:01 -0600 Subject: [Numpy-discussion] What should be the result in some statistics corner cases? In-Reply-To: References: <1373898856.15619.14.camel@sebastian-laptop> <1373903744.15619.35.camel@sebastian-laptop> Message-ID: On Mon, Jul 15, 2013 at 3:57 PM, wrote: > On Mon, Jul 15, 2013 at 5:34 PM, Charles R Harris > wrote: > > > > > > On Mon, Jul 15, 2013 at 2:44 PM, wrote: > >> > >> On Mon, Jul 15, 2013 at 4:24 PM, wrote: > >> > On Mon, Jul 15, 2013 at 2:55 PM, Nathaniel Smith > wrote: > >> >> On Mon, Jul 15, 2013 at 6:29 PM, Charles R Harris > >> >> wrote: > >> >>> Let me try to summarize. To begin with, the environment of the nan > >> >>> functions > >> >>> is rather special. > >> >>> > >> >>> 1) if the array is of not of inexact type, they punt to the non-nan > >> >>> versions. > >> >>> 2) if the array is of inexact type, then out and dtype must be > inexact > >> >>> if > >> >>> specified > >> >>> > >> >>> The second assumption guarantees that NaN can be used in the return > >> >>> values. > >> >> > >> >> The requirement on the 'out' dtype only exists because currently the > >> >> nan function like to return nan for things like empty arrays, right? > >> >> If not for that, it could be relaxed? (it's a rather weird > >> >> requirement, since the whole point of these functions is that they > >> >> ignore nans, yet they don't always...) > >> >> > >> >>> sum and nansum > >> >>> > >> >>> These should be consistent so that empty sums are 0. This should > cover > >> >>> the > >> >>> empty array case, but will change the behaviour of nansum which > >> >>> currently > >> >>> returns NaN if the array isn't empty but the slice is after NaN > >> >>> removal. > >> >> > >> >> I agree that returning 0 is the right behaviour, but we might need a > >> >> FutureWarning period. > >> >> > >> >>> mean and nanmean > >> >>> > >> >>> In the case of empty arrays, an empty slice, this leads to 0/0. For > >> >>> Python > >> >>> this is always a zero division error, for Numpy this raises a > warning > >> >>> and > >> >>> and returns NaN for floats, 0 for integers. > >> >>> > >> >>> Currently mean returns NaN and raises a RuntimeWarning when 0/0 > >> >>> occurs. In > >> >>> the special case where dtype=int, the NaN is cast to integer. > >> >>> > >> >>> Option1 > >> >>> 1) mean raise error on 0/0 > >> >>> 2) nanmean no warning, return NaN > >> >>> > >> >>> Option2 > >> >>> 1) mean raise warning, return NaN (current behavior) > >> >>> 2) nanmean no warning, return NaN > >> >>> > >> >>> Option3 > >> >>> 1) mean raise warning, return NaN (current behavior) > >> >>> 2) nanmean raise warning, return NaN > >> >> > >> >> I have mixed feelings about the whole np.seterr apparatus, but since > >> >> it exists, shouldn't we use it for consistency? I.e., just do > whatever > >> >> numpy is set up to do with 0/0? (Which I think means, warn and return > >> >> NaN by default, but this can be changed.) > >> >> > >> >>> var, std, nanvar, nanstd > >> >>> > >> >>> 1) if ddof > axis(axes) size, raise error, probably a program bug. 
> >> >>> 2) If ddof=0, then whatever is the case for mean, nanmean > >> >>> > >> >>> For nanvar, nanstd it is possible that some slice are good, some > bad, > >> >>> so > >> >>> > >> >>> option1 > >> >>> 1) if n - ddof <= 0 for a slice, raise warning, return NaN for slice > >> >>> > >> >>> option2 > >> >>> 1) if n - ddof <= 0 for a slice, don't warn, return NaN for slice > >> >> > >> >> I don't really have any intuition for these ddof cases. Just raising > >> >> an error on negative effective dof is pretty defensible and might be > >> >> the safest -- it's a easy to turn an error into something sensible > >> >> later if people come up with use cases... > >> > > >> > related why does reduceat not have empty slices? > >> > > >> >>>> np.add.reduceat(np.arange(8),[0,4, 5, 7,7]) > >> > array([ 6, 4, 11, 7, 7]) > >> > > >> > > >> > I'm in favor of returning nans instead of raising exceptions, except > >> > if the return type is int and we cannot cast nan to int. > >> > > >> > If we get functions into numpy that know how to handle nans, then it > >> > would be useful to get the nans, so we can work with them > >> > > >> > Some cases where this might come in handy are when we iterate over > >> > slices of an array that define groups or category levels with possible > >> > empty groups *) > >> > > >> >>>> idx = np.repeat(np.array([0, 1, 2, 3]), [4, 3, 0, 2]) > >> >>>> x = np.arange(9) > >> >>>> [x[idx==ii].mean() for ii in range(4)] > >> > [1.5, 5.0, nan, 7.5] > >> > > >> > instead of > >> >>>> [x[idx==ii].mean() for ii in range(4) if (idx==ii).sum()>0] > >> > [1.5, 5.0, 7.5] > >> > > >> > same for var, I wouldn't have to check that the size is larger than > >> > the ddof (whatever that is in the specific case) > >> > > >> > *) groups could be empty because they were defined for a larger > >> > dataset or as a union of different datasets > >> > >> background: > >> > >> I wrote several robust anova versions a few weeks ago, that were > >> essentially list comprehension as above. However, I didn't allow nans > >> and didn't check for minimum size. > >> Allowing for empty groups to return nan would mainly be a convenience, > >> since I need to check the group size only once. > >> > >> ddof: tests for proportions have ddof=0, for regular t-test ddof=1, > >> for tests of correlation ddof=2 IIRC > >> so we would need to check for the corresponding minimum size that > n-ddof>0 > >> > >> "negative effective dof" doesn't exist, that's np.maximum(n - ddof, 0) > >> which is always non-negative but might result in a zero-division > >> error. :) > >> > >> I don't think making anything conditional on ddof>0 is useful. > >> > > > > So how would you want it? > > > > To summarize the problem areas: > > > > 1) What is the sum of an empty slice? NaN or 0? > 0 as it is now for sum, (including 0 for nansum with no valid entries). > > > 2) What is mean of empy slice? NaN, NaN and warn, or error? > > 3) What if n - ddof < 0 for slice? NaN, NaN and warn, or error? > > 4) What if n - ddof = 0 for slice? NaN, NaN and warn, or error? > > > > I'm tending to NaN and warn for 2 -- 3, because, as Nathaniel notes, the > > warning can be turned into an error by the user. The errstate context > > manager would be good for that. > > Yes, That's what I would prefer also, NaN and ZeroDivisionError, for > 2-4, including mean, var and std, for both nan and non-nan functions. 
> > with the extra argument that 3) and 4) are the same case (except in > polyfit :) > One extra possibility with the nan functions could be a new keyword, error, which would turn warnings into errors. But that might be a bit much. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Mon Jul 15 20:22:26 2013 From: stefan at sun.ac.za (=?iso-8859-1?Q?St=E9fan?= van der Walt) Date: Tue, 16 Jul 2013 02:22:26 +0200 Subject: [Numpy-discussion] What should be the result in some statistics corner cases? In-Reply-To: References: Message-ID: <20130716002226.GB864@shinobi> On Mon, 15 Jul 2013 08:33:47 -0600, Charles R Harris wrote: > On Mon, Jul 15, 2013 at 8:25 AM, Benjamin Root wrote: > > > This is going to need to be heavily documented with doctests. Also, just > > to clarify, are we talking about a ValueError for doing a nansum on an > > empty array as well, or will that now return a zero? > > > > > I was going to leave nansum as is, as it seems that the result was by > choice rather than by accident. That makes sense--I like Sebastian's explanation whereby operations that define an identity yields that upon empty input. St?fan From charlesr.harris at gmail.com Mon Jul 15 20:46:33 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 15 Jul 2013 18:46:33 -0600 Subject: [Numpy-discussion] What should be the result in some statistics corner cases? In-Reply-To: <20130716002226.GB864@shinobi> References: <20130716002226.GB864@shinobi> Message-ID: On Mon, Jul 15, 2013 at 6:22 PM, St?fan van der Walt wrote: > On Mon, 15 Jul 2013 08:33:47 -0600, Charles R Harris wrote: > > On Mon, Jul 15, 2013 at 8:25 AM, Benjamin Root wrote: > > > > > This is going to need to be heavily documented with doctests. Also, > just > > > to clarify, are we talking about a ValueError for doing a nansum on an > > > empty array as well, or will that now return a zero? > > > > > > > > I was going to leave nansum as is, as it seems that the result was by > > choice rather than by accident. > > That makes sense--I like Sebastian's explanation whereby operations that > define an identity yields that upon empty input. > So nansum should return zeros rather than the current NaNs? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Mon Jul 15 20:55:04 2013 From: stefan at sun.ac.za (=?iso-8859-1?Q?St=E9fan?= van der Walt) Date: Tue, 16 Jul 2013 02:55:04 +0200 Subject: [Numpy-discussion] What should be the result in some statistics corner cases? In-Reply-To: References: <20130716002226.GB864@shinobi> Message-ID: <20130716005504.GA2199@shinobi> On Mon, 15 Jul 2013 18:46:33 -0600, Charles R Harris wrote: > So nansum should return zeros rather than the current NaNs? Yes, my feeling is that nansum([]) should be 0. St?fan From ben.root at ou.edu Mon Jul 15 20:58:48 2013 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 15 Jul 2013 20:58:48 -0400 Subject: [Numpy-discussion] What should be the result in some statistics corner cases? In-Reply-To: References: <20130716002226.GB864@shinobi> Message-ID: To add a bit of context to the question of nansum on empty results, we currently differ from MATLAB and R in this respect, they return zero no matter what. Personally, I think it should return zero, but our current behavior of returning nans has existed for a long time. Personally, I think we need a deprecation warning and possibly wait to change this until 2.0, with plenty of warning that this will change. 
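(A minimal sketch of the difference under discussion, assuming NumPy-1.7-era behaviour: the helper name is made up and only mimics the MATLAB/R convention, it is not a proposed API.)

import numpy as np

def nansum_like_matlab(a, axis=None):
    # Treat NaNs as if absent, so an empty or all-NaN slice falls back to the
    # additive identity, 0 -- the behaviour MATLAB and R already have.
    a = np.asarray(a, dtype=float)
    return np.where(np.isnan(a), 0.0, a).sum(axis=axis)

nansum_like_matlab([])                  # 0.0
nansum_like_matlab([np.nan, np.nan])    # 0.0
np.nansum([np.nan, np.nan])             # nan on 1.7-era releases (the long-standing behaviour)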
Ben Root On Jul 15, 2013 8:46 PM, "Charles R Harris" wrote: > > > On Mon, Jul 15, 2013 at 6:22 PM, St?fan van der Walt wrote: > >> On Mon, 15 Jul 2013 08:33:47 -0600, Charles R Harris wrote: >> > On Mon, Jul 15, 2013 at 8:25 AM, Benjamin Root wrote: >> > >> > > This is going to need to be heavily documented with doctests. Also, >> just >> > > to clarify, are we talking about a ValueError for doing a nansum on an >> > > empty array as well, or will that now return a zero? >> > > >> > > >> > I was going to leave nansum as is, as it seems that the result was by >> > choice rather than by accident. >> >> That makes sense--I like Sebastian's explanation whereby operations that >> define an identity yields that upon empty input. >> > > So nansum should return zeros rather than the current NaNs? > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Catherine.M.Moroney at jpl.nasa.gov Mon Jul 15 21:03:26 2013 From: Catherine.M.Moroney at jpl.nasa.gov (Moroney, Catherine M (398D)) Date: Tue, 16 Jul 2013 01:03:26 +0000 Subject: [Numpy-discussion] retrieving original array locations from 2d argsort Message-ID: <36D0B3E2-E2CD-4622-89CE-E17D3737A7FE@jpl.nasa.gov> I know that there's an easy way to solve this problem, but I'm not sufficiently knowledgeable about numpy indexing to figure it out. Here is the problem: Take a 2-d array a, of any size. Sort it in ascending order using, I presume, argsort. Step through the sorted array in order, and for each element in the sorted array, retrieve what the corresponding (line, sample) indices in the original array are. For instance: a = numpy.arange(0, 16).reshape(4,4) a[0,:] = -1*numpy.arange(0,4) a[2,:] = -1*numpy.arange(4,8) asort = numpy.sort(a, axis=None) for idx in xrange(0, asort.size): element = asort[idx] !! Find the line and sample location in a that corresponds to the i-th element in assort Thank-you for your help, Catherine From warren.weckesser at gmail.com Mon Jul 15 21:23:30 2013 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Mon, 15 Jul 2013 21:23:30 -0400 Subject: [Numpy-discussion] retrieving original array locations from 2d argsort In-Reply-To: <36D0B3E2-E2CD-4622-89CE-E17D3737A7FE@jpl.nasa.gov> References: <36D0B3E2-E2CD-4622-89CE-E17D3737A7FE@jpl.nasa.gov> Message-ID: On 7/15/13, Moroney, Catherine M (398D) wrote: > I know that there's an easy way to solve this problem, but I'm not > sufficiently knowledgeable > about numpy indexing to figure it out. > > Here is the problem: > > Take a 2-d array a, of any size. > Sort it in ascending order using, I presume, argsort. > Step through the sorted array in order, and for each element in the sorted > array, > retrieve what the corresponding (line, sample) indices in the original array > are. > > For instance: > > a = numpy.arange(0, 16).reshape(4,4) > a[0,:] = -1*numpy.arange(0,4) > a[2,:] = -1*numpy.arange(4,8) > > asort = numpy.sort(a, axis=None) > for idx in xrange(0, asort.size): > element = asort[idx] > !! 
Find the line and sample location in a that corresponds to the > i-th element in assort > One way is to use argsort and `numpy.unravel_index` to recover the original 2D indices: import numpy a = numpy.arange(0, 16).reshape(4,4) a[0,:] = -1*numpy.arange(0,4) a[2,:] = -1*numpy.arange(4,8) flat_sort_indices = numpy.argsort(a, axis=None) original_indices = numpy.unravel_index(flat_sort_indices, a.shape) print " i j a[i,j]" for i, j in zip(*original_indices): element = a[i,j] print "%3d %3d %6d" % (i, j, element) Warren > Thank-you for your help, > > Catherine > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Mon Jul 15 21:50:34 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 15 Jul 2013 19:50:34 -0600 Subject: [Numpy-discussion] What should be the result in some statistics corner cases? In-Reply-To: References: <20130716002226.GB864@shinobi> Message-ID: On Mon, Jul 15, 2013 at 6:58 PM, Benjamin Root wrote: > To add a bit of context to the question of nansum on empty results, we > currently differ from MATLAB and R in this respect, they return zero no > matter what. Personally, I think it should return zero, but our current > behavior of returning nans has existed for a long time. > > Personally, I think we need a deprecation warning and possibly wait to > change this until 2.0, with plenty of warning that this will change. > Waiting for the mythical 2.0 probably won't work ;) We also need to give folks a way to adjust ahead of time. I think the easiest way to do that is with an extra keyword, say nanok, with True as the starting default, then later we can make False the default. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Tue Jul 16 01:36:51 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 16 Jul 2013 07:36:51 +0200 Subject: [Numpy-discussion] What should be the result in some statistics corner cases? In-Reply-To: References: <20130716002226.GB864@shinobi> Message-ID: On Tue, Jul 16, 2013 at 3:50 AM, Charles R Harris wrote: > > > On Mon, Jul 15, 2013 at 6:58 PM, Benjamin Root wrote: > >> To add a bit of context to the question of nansum on empty results, we >> currently differ from MATLAB and R in this respect, they return zero no >> matter what. Personally, I think it should return zero, but our current >> behavior of returning nans has existed for a long time. >> >> Personally, I think we need a deprecation warning and possibly wait to >> change this until 2.0, with plenty of warning that this will change. >> > Waiting for the mythical 2.0 probably won't work ;) We also need to give > folks a way to adjust ahead of time. I think the easiest way to do that is > with an extra keyword, say nanok, with True as the starting default, then > later we can make False the default. > No special keywords to work around behavior change please, it doesn't work well and you end up with a keyword you don't really want. Why not just give a FutureWarning in 1.8 and change to returning zero in 1.9? Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From arinkverma at gmail.com Tue Jul 16 06:34:47 2013 From: arinkverma at gmail.com (Arink Verma) Date: Tue, 16 Jul 2013 16:04:47 +0530 Subject: [Numpy-discussion] Speedup by avoiding memory alloc twice in scalar array Message-ID: Hi, I am working on performance parity between numpy scalar/small array and python array as GSOC mentored By Charles. Currently I am looking at PyArray_Return, which allocate separate memory just for scalar return. Unlike python which allocate memory once for returning result of scalar operations; numpy calls malloc twice once for the array object itself, and a second time for the array data. These memory allocations are happening in PyArray_NewFromDescr and PyArray_Scalar. Stashing both within a single allocation would be more efficient. In, PyArray_Scalar, new struct (PyLongScalarObject) need allocation in case of scalar arrays. Instead, can we just some how convert/cast PyArrayObject to PyLongScalarObject.?? -- Arink Verma www.arinkverma.in -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Jul 16 07:10:30 2013 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 16 Jul 2013 12:10:30 +0100 Subject: [Numpy-discussion] Speedup by avoiding memory alloc twice in scalar array In-Reply-To: References: Message-ID: On 16 Jul 2013 11:35, "Arink Verma" wrote: > > Hi, > > I am working on performance parity between numpy scalar/small array and python array as GSOC mentored By Charles. > > Currently I am looking at PyArray_Return, which allocate separate memory just for scalar return. Unlike python which allocate memory once for returning result of scalar operations; numpy calls malloc twice once for the array object itself, and a second time for the array data. > > These memory allocations are happening in PyArray_NewFromDescr and PyArray_Scalar. Stashing both within a single allocation would be more efficient. > In, PyArray_Scalar, new struct (PyLongScalarObject) need allocation in case of scalar arrays. Instead, can we just some how convert/cast PyArrayObject to > PyLongScalarObject.?? I think there are more than 2 mallocs you're talking about here? Each ndarray does two mallocs, for the obj and buffer. These could be combined into 1 - just allocate the total size and do some pointer arithmetic, then set OWNDATA to false. Converting array to scalar does more allocations. I doubt there's a way to avoid these, but can't say for sure (on my phone now). In any case the idea of the project is to make scalars obsolete by making arrays competitive, right? So no need to go optimizing the competition ;-). (And more seriously, this slowdown *only* exists because of the array/scalar split, so ignoring it is fair.) In the bigger picture, these are pretty tiny optimizations, aren't they? In the quick profiling I did a while ago, it looked like there was a lot of much bigger low-hanging fruit, and fiddling around with one malloc versus two isn't going to do much if we're still wasting an order of magnitude more time in inefficient loop selection and unnecessary writes to the FP control word? -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From arinkverma at gmail.com Tue Jul 16 09:34:57 2013 From: arinkverma at gmail.com (Arink Verma) Date: Tue, 16 Jul 2013 19:04:57 +0530 Subject: [Numpy-discussion] Speedup by avoiding memory alloc twice in scalar array In-Reply-To: References: Message-ID: >Each ndarray does two mallocs, for the obj and buffer. 
These could be
combined into 1 - just allocate the total size and do some pointer
>arithmetic, then set OWNDATA to false.

So those are the two mallocs that were mentioned in the project introduction; I had
that wrong.

>magnitude more time in inefficient loop selection and unnecessary writes
to the FP control word?

Loop selection contributes around 2~3% of the time. I implemented a cache with
PyThreadState_GetDict(), but it didn't help.
Even generating a prepopulated dict/list in code_generator/generate_umath.py does
not help.

Here is the distribution of time for addition operations.
All memory > related and BuildValue operations cost more than 7%, rest looping ones are > around 2-3%: > > - PyUFunc_AddititonTypeResolver(7.6%) > - *SimpleBinaryOperationTypeResolver(6.2%)* > > > - *execute_legacy_ufunc_loop(20.7%)* > - trivial_three_operand_loop(8.6%) ,this will be around 3.4% when pr # > 3521 get merged > - *PYArray_NewFromDescr(7.3%)* > - PyUFunc_DefaultLegacyInnerLoopSelector(2.5%) > > > - PyUFunc_GetPyValues(12.0%) > - *_extract_pyvals(9.2%)* > - *PyArray_Return(14.3%)* > > Hmm, you prodded me into running those numbers again to see :-) At http://www.arinkverma.in/2013/06/finding-bottleneck-in-pythonnumpy.htmlyou say that you're using a Python compiled with --with-pydebug. Is this true? If so then stop! You want numpy compiled with generic debugging information ("-g" on gcc), and maybe it helps to have Python compiled with "-g" as well. But --with-pydebug goes much further -- it actually changes the Python interpreter in many ways to add lots of expensive self-checks. On my machine simple operations like "[]" (allocate a list) or "1.0 + 1.0" go about 4x slower when I use Ubuntu's python-dbg package (which is compiled with --with-pydebug). You can't trust speed measurements you get from a --with-pydebug build. Anyway, I'm using 64-bit python2.7 from Ubuntu's repo, self-compiled numpy master, with this measurement code: import ctypes profiler = ctypes.CDLL("libprofiler.so.0") def loop(n): import numpy as np print "Numpy:", np.__version__ x = np.asarray([1.0, 2.0]) for i in xrange(n): x + x profiler.ProfilerStart("/tmp/master-array-float64-add.prof") loop(10000000) profiler.ProfilerStop() Graph attached. Notice: - because my benchmark has a 2-element array instead of a scalar array, the special-case scalar return logic (PyArray_Return etc.) disappears. This makes all percentages a bit higher in my graph, because the operation is overall faster. - PyArray_NewFromDescr does indeed take 11.6% of the time, but it's not clear why. Half that time is directly inside PyArray_NewFromDescr, not in any sub-calls to malloc-related functions. Also, you see a lot more time in array_alloc than I do, which may be caused by --with-pydebug. Taking a closer look with google-pprof --disasm=PyArray_NewFromDescr (also attached), it looks like the major cost here is, bizarrely enough, the calculation of the array size?! Out of 338 cumulative samples in this function, I count 175 that are associated with various div/mul instructions, while all the mallocs together take only 164 (= 5.6% of total time). This is pretty bizarre for a bunch of 1-dimensional 2-element arrays!? - PyUFunc_AdditionTypeResolver takes 10.9% of the time, and PyUFunc_DefaultLegacyInnerLoopSelector takes another 4.2% of the time, and this pretty absurd considering that we're talking about locating the float64 + float64 loop, which should not require any complicated logic. This should be like 0.1% or something. I'm not surprised that PyThreadState_GetDict() doesn't help -- doing dict lookups was probably was more expensive than the thing you replaced! But some sort of simple table lookup scheme that reduces loop lookup to chasing a few pointers should be totally doable. - We're spending 13.6% of the time in PyUFunc_getfperr. I'm pretty sure that a lot of this is totally wasted time, because we implement both 'set' and 'clear' operations as 'set+clear', making them twice as costly as necessary. 
(Eventually it would be even better if we could disable this logic entirely for integer arrays, and for when the user has turned off fp error reporting. But neither of these would help for this simple float+float benchmark.) - _extract_pyvals and PyUFunc_GetPyValues (not sure why they aren't linked in my graph, but they seem to be the same code) together use >11% of time. This is also completely silly -- all this time is spent on doing elaborate stuff to look up entries in a python dict, extract them, and convert them into, like, some C level bitmasks. And then doing that again and again on every operation. Instead we should convert this stuff to a C values once, when they're set in the first place, and stash those C values directly into a thread-local variable. See PyThread_*_key in pythread.h for a raw TLS implementation that's always available (and which is what PyThreadState_GetDict() is built on top of). The documentation is in the Python source distribution in comments in Python/thread.c. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- ROUTINE ====================== PyArray_NewFromDescr 168 505 samples (flat, cumulative) 17.4% of total -------------------- /tmp/pip-UN1TwQ-build/numpy/core/src/multiarray/ctors.c 3 5 838: { 1 1 4daf0: push %r15 . . 4daf2: mov %edx,%r11d . . 4daf5: mov %rsi,%r15 . . 4daf8: push %r14 . . 4dafa: push %r13 . . 4dafc: push %r12 . . 4dafe: push %rbp . 1 4daff: mov %rcx,%rbp 1 2 4db02: push %rbx 1 1 4db03: mov %r8,%rbx . . 4db06: sub $0x248,%rsp . . 845: if (descr->subarray) { . . 4db0d: mov 0x28(%rsi),%r13 . . 838: { . . 4db11: mov %rdi,0x28(%rsp) . . 4db16: mov %r9,0x30(%rsp) . . 845: if (descr->subarray) { . . 4db1b: test %r13,%r13 . . 4db1e: je 4dcb0 . . 849: memcpy(newdims, dims, nd*sizeof(npy_intp)); . . 4db24: movslq %edx,%r12 -------------------- /usr/include/x86_64-linux-gnu/bits/string3.h . . 52: return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest)); . . 4db27: lea 0x40(%rsp),%rdi . . 4db2c: mov $0x200,%ecx -------------------- /tmp/pip-UN1TwQ-build/numpy/core/src/multiarray/ctors.c . . 849: memcpy(newdims, dims, nd*sizeof(npy_intp)); . . 4db31: shl $0x3,%r12 -------------------- /usr/include/x86_64-linux-gnu/bits/string3.h . . 52: return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest)); . . 4db35: mov %rbp,%rsi . . 4db38: mov %r11d,0x10(%rsp) . . 4db3d: mov %r12,%rdx . . 4db40: callq 1a1a0 <__memcpy_chk at plt> -------------------- /tmp/pip-UN1TwQ-build/numpy/core/src/multiarray/ctors.c . . 850: if (strides) { . . 4db45: test %rbx,%rbx . . 848: npy_intp *newstrides = NULL; . . 4db48: movq $0x0,0x20(%rsp) . . 850: if (strides) { . . 4db51: mov 0x10(%rsp),%r11d . . 4db56: je 4db7d . . 851: newstrides = newdims + NPY_MAXDIMS; . . 4db58: lea 0x140(%rsp),%rbp -------------------- /usr/include/x86_64-linux-gnu/bits/string3.h . . 52: return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest)); . . 4db60: mov $0x100,%ecx . . 4db65: mov %r12,%rdx . . 4db68: mov %rbx,%rsi . . 4db6b: mov %rbp,%rdi . . 4db6e: callq 1a1a0 <__memcpy_chk at plt> -------------------- /tmp/pip-UN1TwQ-build/numpy/core/src/multiarray/ctors.c . . 851: newstrides = newdims + NPY_MAXDIMS; . . 4db73: mov 0x10(%rsp),%r11d . . 4db78: mov %rbp,0x20(%rsp) . . 228: tuple = PyTuple_Check(old->subarray->shape); . . 4db7d: mov 0x8(%r13),%rdi . . 227: mydim = newdims + oldnd; . . 4db81: lea 0x40(%rsp),%r14 . . 224: *des = old->subarray->base; . . 4db86: mov 0x0(%r13),%rbp . . 
[... the bulk of the annotated google-pprof --disasm listing is omitted here for
readability; only the head and tail of the PyArray_NewFromDescr routine are kept.
As noted above, most of the samples in this routine land on the array size/stride
arithmetic (the div/idiv/imul instructions) and on the PyMem_Malloc/malloc calls. ...]
4e0bb: jmpq 4ddad -------------------- /tmp/pip-UN1TwQ-build/numpy/core/src/multiarray/ctors.c . . 884: sd = descr->elsize = 1; . . 4e0c0: movl $0x1,0x20(%rax) . . 4e0c7: mov %rax,%r15 . . 4e0ca: mov $0x1,%r12d . . 4e0d0: movabs $0x7fffffffffffffff,%rax . . 4e0da: jmpq 4dcd5 . . 4e0df: mov 0x8(%rbp),%rax . . 4e0e3: mov %rbp,%rdi . . 4e0e6: callq *0x30(%rax) . . 4e0e9: jmpq 4de6e . . 4e0ee: mov 0x8(%rbx),%rax . . 4e0f2: mov %rbx,%rdi . . 4e0f5: callq *0x30(%rax) . . 4e0f8: jmpq 4de63 . . 4e0fd: mov 0x8(%r15),%rdx . . 4e101: mov %r15,%rdi . . 4e104: mov %rax,0x18(%rsp) . . 4e109: callq *0x30(%rdx) . . 4e10c: mov 0x10(%rsp),%r11d . . 4e111: mov 0x18(%rsp),%rax . . 4e116: jmpq 4dffc -------------------- ...ip-UN1TwQ-build/numpy/core/include/numpy/npy_3kcompat.h . . 377: return PyCObject_AsVoidPtr(ptr); . . 4e11b: mov %rax,%rdi . . 4e11e: callq 1a6a0 -------------------- /tmp/pip-UN1TwQ-build/numpy/core/src/multiarray/ctors.c . . 1034: Py_DECREF(func); . . 4e123: subq $0x1,0x0(%rbp) -------------------- ...ip-UN1TwQ-build/numpy/core/include/numpy/npy_3kcompat.h . . 377: return PyCObject_AsVoidPtr(ptr); . . 4e128: mov %rax,%rbx -------------------- /tmp/pip-UN1TwQ-build/numpy/core/src/multiarray/ctors.c . . 1034: Py_DECREF(func); . . 4e12b: je 4e18e . 1 1035: if (cfunc((PyArrayObject *)fa, obj) < 0) { . . 4e12d: mov 0x288(%rsp),%rsi . . 4e135: mov %r13,%rdi . . 4e138: callq *%rbx . . 4e13a: test %eax,%eax . . 4e13c: mov %r13,%rbx . . 4e13f: jns 4dc45 . 1 4e145: jmpq 4e033 4 25 961: sd = _array_fill_strides(fa->strides, dims, nd, sd, 1 2 4e14a: mov 0x280(%rsp),%r8d 1 1 4e152: lea 0x40(%r13),%r9 . . 4e156: mov %r12,%rcx . 1 4e159: mov %r11d,%edx 1 1 4e15c: mov %rbp,%rsi . 19 4e15f: callq 4da20 <_array_fill_strides> 1 1 4e164: mov %rax,%r12 . . 4e167: jmpq 4dd9c . . 871: size = 1; . . 4e16c: mov $0x1,%r14d . . 4e172: jmpq 4dd21 . . 938: fa->flags |= NPY_ARRAY_F_CONTIGUOUS; . . 4e177: movl $0x503,0x40(%rax) . . 942: flags = NPY_ARRAY_F_CONTIGUOUS; . . 4e17e: movl $0x2,0x280(%rsp) . . 4e189: jmpq 4dd72 . . 4e18e: mov 0x8(%rbp),%rax . . 4e192: mov %rbp,%rdi . . 4e195: callq *0x30(%rax) . . 4e198: jmp 4e12d . . 4e19a: nopw 0x0(%rax,%rax,1) -------------- next part -------------- A non-text attachment was scrubbed... Name: master-array-float64-add.pdf Type: application/pdf Size: 19235 bytes Desc: not available URL: From nouiz at nouiz.org Tue Jul 16 14:53:30 2013 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Tue, 16 Jul 2013 14:53:30 -0400 Subject: [Numpy-discussion] Speedup by avoiding memory alloc twice in scalar array In-Reply-To: References: Message-ID: Hi, On Tue, Jul 16, 2013 at 11:55 AM, Nathaniel Smith wrote: > On Tue, Jul 16, 2013 at 2:34 PM, Arink Verma wrote: > >> >Each ndarray does two mallocs, for the obj and buffer. These could be >> combined into 1 - just allocate the total size and do some pointer >> >arithmetic, then set OWNDATA to false. >> So, that two mallocs has been mentioned in project introduction. I got >> that wrong. >> > > On further thought/reading the code, it appears to be more complicated > than that, actually. > > It looks like (for a non-scalar array) we have 2 calls to PyMem_Malloc: 1 > for the array object itself, and one for the shapes + strides. And, one > call to regular-old malloc: for the data buffer. > > (Mysteriously, shapes + strides together have 2*ndim elements, but to hold > them we allocate a memory region sized to hold 3*ndim elements. I'm not > sure why.) 
> > And contrary to what I said earlier, this is about as optimized as it can > be without breaking ABI. We need at least 2 calls to malloc/PyMem_Malloc, > because the shapes+strides may need to be resized without affecting the > much larger data area. But it's tempting to allocate the array object and > the data buffer in a single memory region, like I suggested earlier. And > this would ALMOST work. But, it turns out there is code out there which > assumes (whether wisely or not) that you can swap around which data buffer > a given PyArrayObject refers to (hi Theano!). And supporting this means > that data buffers and PyArrayObjects need to be in separate memory regions. > Are you sure that Theano "swap" the data ptr of an ndarray? When we play with that, it is on a newly create ndarray. So a node in our graph, won't change the input ndarray structure. It will create a new ndarray structure with new shape/strides and pass a data ptr and we flag the new ndarray with own_data correctly to my knowledge. If Theano pose a problem here, I'll suggest that I fix Theano. But currently I don't see the problem. So if this make you change your mind about this optimization, tell me. I don't want Theano to prevent optimization in NumPy. Fred -------------- next part -------------- An HTML attachment was scrubbed... URL: From irving at naml.us Tue Jul 16 19:44:37 2013 From: irving at naml.us (Geoffrey Irving) Date: Tue, 16 Jul 2013 16:44:37 -0700 Subject: [Numpy-discussion] restricting object arrays to a single Python type Message-ID: Is there a standard way of creating an object array restricted to a particular python type? I want a safe way of sending arrays of objects back and forth between Python and C++, and it'd be great if I could use numpy arrays on the Python side instead of creating a new type. For example, I might have a C++ class Force which is simultaneously a valid Python extension type (also named "Force"). I'd like to be able to switch between Array on the C++ side and a suitable numpy array on the Python side, while preventing Python from ever storing an object with different type (say, a tuple) in the array. Note that Array has the memory representation of an array of PyObject*'s. Thanks, Geoffrey From scopatz at gmail.com Tue Jul 16 19:51:58 2013 From: scopatz at gmail.com (Anthony Scopatz) Date: Tue, 16 Jul 2013 18:51:58 -0500 Subject: [Numpy-discussion] restricting object arrays to a single Python type In-Reply-To: References: Message-ID: Hi Geoffrey, Not to toot my own horn here too much, but you really should have a look at xdress (http://xdress.org/ and https://github.com/xdress/xdress). XDress will generate a wrapper of the Force class for you and then also create a custom numpy dtype for this class. In this way, you could get exactly what you want. If you run into any trouble, let me know and I'll be sure to help you out! This is the kind of thing that xdress is *supposed* to do so bugs here are a big priority for me personally =) Be Well Anthony On Tue, Jul 16, 2013 at 6:44 PM, Geoffrey Irving wrote: > Is there a standard way of creating an object array restricted to a > particular python type? I want a safe way of sending arrays of > objects back and forth between Python and C++, and it'd be great if I > could use numpy arrays on the Python side instead of creating a new > type. > > For example, I might have a C++ class Force which is simultaneously a > valid Python extension type (also named "Force"). 
I'd like to be able > to switch between Array on the C++ side and a suitable numpy > array on the Python side, while preventing Python from ever storing an > object with different type (say, a tuple) in the array. Note that > Array has the memory representation of an array of PyObject*'s. > > Thanks, > Geoffrey > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From irving at naml.us Tue Jul 16 20:15:22 2013 From: irving at naml.us (Geoffrey Irving) Date: Tue, 16 Jul 2013 17:15:22 -0700 Subject: [Numpy-discussion] restricting object arrays to a single Python type In-Reply-To: References: Message-ID: On Tue, Jul 16, 2013 at 4:51 PM, Anthony Scopatz wrote: > Hi Geoffrey, > > Not to toot my own horn here too much, but you really should have a look at > xdress (http://xdress.org/ and https://github.com/xdress/xdress). XDress > will generate a wrapper of the Force class for you and then also create a > custom numpy dtype for this class. In this way, you could get exactly what > you want. Unfortunately it's unlikely to work out of the box, since it uses gccxml which appears to still be based on gcc 4.2. All of our code is C++11, and we need to preserve portability to horrible places like Visual Studio (yes, these two constraints are just barely compatible at the moment). > If you run into any trouble, let me know and I'll be sure to help you out! > This is the kind of thing that xdress is supposed to do so bugs here are a > big priority for me personally =) We're currently using a custom Python binding layer which I wrote a while ago after getting fed up with boost::python. Our system is extremely lightweight but also limited, and in particular is missing a few key features like automatic support for named and default arguments (since these can't be introspected inside C++). It'd be great to chat more about our two feature sets and whether there are opportunities for collaboration and/or merging. I'm not sure if this list is a good place for that discussion, so we could optionally take it off list or to skype if you're up for that (send me an email directly if so). Here are links to our system, which unfortunately is undocumented at the moment: https://github.com/otherlab/core https://github.com/otherlab/core/blob/master/python/ClassTest.cpp # Unit test for wrapping a class Geoffrey From scopatz at gmail.com Tue Jul 16 22:15:03 2013 From: scopatz at gmail.com (Anthony Scopatz) Date: Tue, 16 Jul 2013 21:15:03 -0500 Subject: [Numpy-discussion] restricting object arrays to a single Python type In-Reply-To: References: Message-ID: Hey Geoffrey, Let's definitely take this off (this) list. The discussion could get involved :). Be Well Anthony On Tue, Jul 16, 2013 at 7:15 PM, Geoffrey Irving wrote: > On Tue, Jul 16, 2013 at 4:51 PM, Anthony Scopatz > wrote: > > Hi Geoffrey, > > > > Not to toot my own horn here too much, but you really should have a look > at > > xdress (http://xdress.org/ and https://github.com/xdress/xdress). > XDress > > will generate a wrapper of the Force class for you and then also create a > > custom numpy dtype for this class. In this way, you could get exactly > what > > you want. > > Unfortunately it's unlikely to work out of the box, since it uses > gccxml which appears to still be based on gcc 4.2. 
All of our code is > C++11, and we need to preserve portability to horrible places like > Visual Studio (yes, these two constraints are just barely compatible > at the moment). > > > If you run into any trouble, let me know and I'll be sure to help you > out! > > This is the kind of thing that xdress is supposed to do so bugs here are > a > > big priority for me personally =) > > We're currently using a custom Python binding layer which I wrote a > while ago after getting fed up with boost::python. Our system is > extremely lightweight but also limited, and in particular is missing a > few key features like automatic support for named and default > arguments (since these can't be introspected inside C++). It'd be > great to chat more about our two feature sets and whether there are > opportunities for collaboration and/or merging. I'm not sure if this > list is a good place for that discussion, so we could optionally take > it off list or to skype if you're up for that (send me an email > directly if so). > > Here are links to our system, which unfortunately is undocumented at the > moment: > > https://github.com/otherlab/core > https://github.com/otherlab/core/blob/master/python/ClassTest.cpp > # Unit test for wrapping a class > > Geoffrey > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at onerussian.com Tue Jul 16 23:53:48 2013 From: lists at onerussian.com (Yaroslav Halchenko) Date: Tue, 16 Jul 2013 23:53:48 -0400 Subject: [Numpy-discussion] Really cruel draft of vbench setup for NumPy (.add.reduce benchmarks since 2011) In-Reply-To: <20130709161007.GL27621@onerussian.com> References: <20130506143241.GV5140@onerussian.com> <1367856232.2506.31.camel@sebastian-laptop> <20130506161153.GW5140@onerussian.com> <1367927238.23010.12.camel@sebastian-laptop> <20130701193006.GC27621@onerussian.com> <20130701215804.GG27621@onerussian.com> <20130709161007.GL27621@onerussian.com> Message-ID: <20130717035348.GN27621@onerussian.com> and to put so far reported findings into some kind of automated form, please welcome http://www.onerussian.com/tmp/numpy-vbench/#benchmarks-performance-analysis This is based on a simple 1-way anova of last 10 commits and some point in the past where 10 other commits had smallest timing and were significantly different from the last 10 commits. "Possible recent" is probably too noisy and not sure if useful -- it should point to a closest in time (to the latest commits) diff where a significant excursion from current performance was detected. So per se it has nothing to do with the initial detected performance hit, but in some cases seems still to reasonably locate commits hitting on performance. Enjoy, On Tue, 09 Jul 2013, Yaroslav Halchenko wrote: > Julian Taylor contributed some benchmarks he was "concerned" about, so > now the collection is even better. 
> I will keep updating tests on the same url: > http://www.onerussian.com/tmp/numpy-vbench/ > [it is now running and later I will upload with more commits for higher temporal fidelity] > of particular interest for you might be: > some minor consistent recent losses in > http://www.onerussian.com/tmp/numpy-vbench/vb_vb_io.html#strided-assign-float64 > http://www.onerussian.com/tmp/numpy-vbench/vb_vb_io.html#strided-assign-float32 > http://www.onerussian.com/tmp/numpy-vbench/vb_vb_io.html#strided-assign-int16 > http://www.onerussian.com/tmp/numpy-vbench/vb_vb_io.html#strided-assign-int8 > seems have lost more than 25% of performance throughout the timeline > http://www.onerussian.com/tmp/numpy-vbench/vb_vb_io.html#memcpy-int8 > "fast" calls to all/any seemed to be hurt twice in their life time now running > *3 times slower* than in 2011 -- inflection points correspond to regressions > and/or their fixes in those functions to bring back performance on "slow" > cases (when array traversal is needed, e.g. on arrays of zeros for any) > http://www.onerussian.com/tmp/numpy-vbench/vb_vb_reduce.html#numpy-all-fast > http://www.onerussian.com/tmp/numpy-vbench/vb_vb_reduce.html#numpy-any-fast > Enjoy > On Mon, 01 Jul 2013, Yaroslav Halchenko wrote: > > FWIW -- updated plots with contribution from Julian Taylor > > http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_indexing.html#mmap-slicing > > ;-) > > On Mon, 01 Jul 2013, Yaroslav Halchenko wrote: > > > Hi Guys, > > > not quite the recommendations you expressed, but here is my ugly > > > attempt to improve benchmarks coverage: > > > http://www.onerussian.com/tmp/numpy-vbench-20130701/index.html > > > initially I also ran those ufunc benchmarks per each dtype separately, > > > but then resulting webpage is loong which brings my laptop on its knees > > > by firefox. So I commented those out for now, and left only "summary" > > > ones across multiple datatypes. > > > There is a bug in sphinx which forbids embedding some figures for > > > vb_random "as is", so pardon that for now... > > > I have not set cpu affinity of the process (but ran it at nice -10), so may be > > > that also contributed to variance of benchmark estimates. And there probably > > > could be more of goodies (e.g. gc control etc) to borrow from > > > https://github.com/pydata/pandas/blob/master/vb_suite/test_perf.py which I have > > > just discovered to minimize variance. > > > nothing really interesting was pin-pointed so far, besides that > > > - svd became a bit faster since few months back ;-) > > > http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_linalg.html > > > - isnan (and isinf, isfinite) got improved > > > http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_ufunc.html#numpy-isnan-a-10types > > > - right_shift got a miniscule slowdown from what it used to be? > > > http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_ufunc.html#numpy-right-shift-a-a-3types > > > As before -- current code of those benchmarks collection is available > > > at http://github.com/yarikoptic/numpy-vbench/pull/new/master > > > if you have specific snippets you would like to benchmark -- just state them > > > here or send a PR -- I will add them in. > > > Cheers, > > > On Tue, 07 May 2013, Da?id wrote: > > > > On 7 May 2013 13:47, Sebastian Berg wrote: > > > > > Indexing/assignment was the first thing I thought of too (also because > > > > > fancy indexing/assignment really could use some speedups...). 
Other then > > > > > that maybe some timings for small arrays/scalar math, but that might be > > > > > nice for that GSoC project. > > > > Why not going bigger? Ufunc operations on big arrays, CPU and memory bound. > > > > Also, what about interfacing with other packages? It may increase the > > > > compiling overhead, but I would like to see Cython in action (say, > > > > only last version, maybe it can be fixed). > > > > _______________________________________________ > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion at scipy.org > > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Yaroslav O. Halchenko, Ph.D. http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org Senior Research Associate, Psychological and Brain Sciences Dept. Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik From scopatz at gmail.com Wed Jul 17 01:50:17 2013 From: scopatz at gmail.com (Anthony Scopatz) Date: Wed, 17 Jul 2013 00:50:17 -0500 Subject: [Numpy-discussion] restricting object arrays to a single Python type In-Reply-To: References: Message-ID: On Tue, Jul 16, 2013 at 7:15 PM, Geoffrey Irving wrote: > On Tue, Jul 16, 2013 at 4:51 PM, Anthony Scopatz > wrote: > > Hi Geoffrey, > > > > Not to toot my own horn here too much, but you really should have a look > at > > xdress (http://xdress.org/ and https://github.com/xdress/xdress). > XDress > > will generate a wrapper of the Force class for you and then also create a > > custom numpy dtype for this class. In this way, you could get exactly > what > > you want. > > Unfortunately it's unlikely to work out of the box, since it uses > gccxml which appears to still be based on gcc 4.2. All of our code is > C++11, and we need to preserve portability to horrible places like > Visual Studio (yes, these two constraints are just barely compatible > at the moment). > Hey Geoffrey, I don't think that GCC-XML should be a show stopper. There are a couple of reasons for this. The first is that, correct me if I am wrong, but most of the C++11 updates are not really changes that affect top-level API elements -- which is the only thing that you care about when creating wrappers. So unless you are relying on a lot of lambdas or something, my guess is that GCC-XML might just work anyways. The second reason is that xdress is written to be *very* modular. There is no reason that it needs to rely on GCC-XML at all. Other parsers and ASTs, such as Clang or SWIG or ROSE, could be used. In fact there is a mostly complete version of a Clang AST present in XDress already. I have disabled it and am not worrying about it personally because the current Clang Python AST bindings do not support template arguments. Since this is a major use case of mine, I had to abandon that code line. However, other people could forge ahead with Clang in one of a few ways: 1. Use the nascent XML output of Clang, 2. Use the existing Clang Python AST bindings, understanding that they are incomplete 3. Fix the Python Clang AST bindings 4. Write your on Python Clang AST Bindings (I know people who have done this but they are not open sorce), possibly using XDress! In any event, none of this is super difficult, wouldn't impair xdress development at all, and everyone would benefit. Alternative parsers are something that is on my radar and I would love to support. > > If you run into any trouble, let me know and I'll be sure to help you > out! 
> > This is the kind of thing that xdress is supposed to do so bugs here are > a > > big priority for me personally =) > > We're currently using a custom Python binding layer which I wrote a > while ago after getting fed up with boost::python. I have been down that painful road =) > Our system is > extremely lightweight but also limited, and in particular is missing a > few key features like automatic support for named and default > arguments (since these can't be introspected inside C++). XDress supports these. > It'd be > great to chat more about our two feature sets and whether there are > opportunities for collaboration and/or merging. I'm not sure if this > list is a good place for that discussion, so we could optionally take > it off list or to skype if you're up for that (send me an email > directly if so). > I'd be happy to! My skype name is 'scopatz' or you can find me on Google+. I tend not to just hang out on skype and I have a lot to do tomorrow, so if you want to set a time that would probably be best. My schedule is pretty flexible if busy. Be Well Anthony > Here are links to our system, which unfortunately is undocumented at the > moment: > > https://github.com/otherlab/core > https://github.com/otherlab/core/blob/master/python/ClassTest.cpp > # Unit test for wrapping a class > > Geoffrey > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Jul 17 10:25:43 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 17 Jul 2013 15:25:43 +0100 Subject: [Numpy-discussion] ufunc overrides In-Reply-To: References: Message-ID: On Thu, Jul 11, 2013 at 4:29 AM, Blake Griffith wrote: > > Hello NumPy, > > Part of my GSoC is compatibility with SciPy's sparse matrices and NumPy's ufuncs. Currently there is no feasible way to do this without changing ufuncs a bit. > > I've been considering a mechanism to override ufuncs based on checking the ufuncs arguments for a __ufunc_override__ attribute. Then handing off the operation to a function specified by that attribute. I prototyped this in python and did a demo in a blog post here: > http://cwl.cx/posts/week-6-ufunc-overrides.html > This is similar to a previously discussed, but never implemented change: > http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html I've just posted long comment with a slightly different proposal in the PR: https://github.com/numpy/numpy/pull/3524#issuecomment-21115548 Mentioning this here because this has the potential to majorly affect anyone working with ndarray subclasses or other array-like objects (e.g., masked arrays, GPU arrays, etc.), so if you care about these things then please take a look and help us make sure that the final API is flexible enough to handle your needs. -n From njs at pobox.com Wed Jul 17 10:39:54 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 17 Jul 2013 15:39:54 +0100 Subject: [Numpy-discussion] Speedup by avoiding memory alloc twice in scalar array In-Reply-To: References: Message-ID: On Tue, Jul 16, 2013 at 7:53 PM, Fr?d?ric Bastien wrote: > Hi, > > > On Tue, Jul 16, 2013 at 11:55 AM, Nathaniel Smith wrote: >> >> On Tue, Jul 16, 2013 at 2:34 PM, Arink Verma wrote: >>> >>> >Each ndarray does two mallocs, for the obj and buffer. 
These could be >>> > combined into 1 - just allocate the total size and do some pointer >>> > >arithmetic, then set OWNDATA to false. >>> So, that two mallocs has been mentioned in project introduction. I got >>> that wrong. >> >> >> On further thought/reading the code, it appears to be more complicated >> than that, actually. >> >> It looks like (for a non-scalar array) we have 2 calls to PyMem_Malloc: 1 >> for the array object itself, and one for the shapes + strides. And, one call >> to regular-old malloc: for the data buffer. >> >> (Mysteriously, shapes + strides together have 2*ndim elements, but to hold >> them we allocate a memory region sized to hold 3*ndim elements. I'm not sure >> why.) >> >> And contrary to what I said earlier, this is about as optimized as it can >> be without breaking ABI. We need at least 2 calls to malloc/PyMem_Malloc, >> because the shapes+strides may need to be resized without affecting the much >> larger data area. But it's tempting to allocate the array object and the >> data buffer in a single memory region, like I suggested earlier. And this >> would ALMOST work. But, it turns out there is code out there which assumes >> (whether wisely or not) that you can swap around which data buffer a given >> PyArrayObject refers to (hi Theano!). And supporting this means that data >> buffers and PyArrayObjects need to be in separate memory regions. > > > Are you sure that Theano "swap" the data ptr of an ndarray? When we play > with that, it is on a newly create ndarray. So a node in our graph, won't > change the input ndarray structure. It will create a new ndarray structure > with new shape/strides and pass a data ptr and we flag the new ndarray with > own_data correctly to my knowledge. > > If Theano pose a problem here, I'll suggest that I fix Theano. But currently > I don't see the problem. So if this make you change your mind about this > optimization, tell me. I don't want Theano to prevent optimization in NumPy. It's entirely possible I misunderstood, so let's see if we can work it out. I know that you want to assign to the ->data pointer in a PyArrayObject, right? That's what caused some trouble with the 1.7 API deprecations, which were trying to prevent direct access to this field? Creating a new array given a pointer to a memory region is no problem, and obviously will be supported regardless of any optimizations. But if that's all you were doing then you shouldn't have run into the deprecation problem. Or maybe I'm misremembering! The problem is if one wants to (a) create a PyArrayObject, which will by default allocate a new memory region and assign a pointer to it to the ->data field, and *then* (b) "steal" that memory region and replace it with another one, while keeping the same PyArrayObject. This is technically possible right now (though I wouldn't say it was necessarily a good idea!), but it would become impossible if we allocated the PyArrayObject and data into a single region. The profiles suggest that this would only make allocation of arrays maybe 15% faster, with probably a similar effect on deallocation. And I'm not sure how often array allocation per se is actually a bottleneck -- usually you also do things with the arrays, which is more expensive :-). But hey, 15% is nothing to sneeze at. 
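(If anyone wants a rough feel for how much of their own small-array workload is pure allocation overhead, here is a completely unscientific sketch -- the array size, the repeat count, and of course whatever numbers come back are all made up / machine-dependent:

import numpy as np, timeit
setup = "import numpy as np; a = np.empty(3)"
print(timeit.timeit("np.empty(3)", setup, number=100000))  # allocate a tiny array, throw it away
print(timeit.timeit("a + a", setup, number=100000))        # one allocation *plus* real ufunc work

The second statement also has to allocate its output array, so the gap between the two is roughly the non-allocation part of the work.)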
-n From njs at pobox.com Wed Jul 17 11:18:07 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 17 Jul 2013 16:18:07 +0100 Subject: [Numpy-discussion] empty_like for masked arrays In-Reply-To: References: Message-ID: On Mon, Jul 15, 2013 at 2:33 PM, Gregorio Bastardo wrote: > Hi, > > On Mon, Jun 10, 2013 at 3:47 PM, Nathaniel Smith wrote: >> Hi all, >> >> Is there anyone out there using numpy masked arrays, who has an >> opinion on how empty_like (and its friends ones_like, zeros_like) >> should handle the mask? >> >> Right now apparently if you call np.ma.empty_like on a masked array, >> you get a new masked array that shares the original array's mask, so >> modifying one modifies the other. That's almost certainly wrong. This >> PR: >> https://github.com/numpy/numpy/pull/3404 >> makes it so instead the new array has values that are all set to >> empty/zero/one, and a mask which is set to match the input array's >> mask (so whenever something was masked in the original array, the >> empty/zero/one in that place is also masked). We don't know if this is >> the desired behaviour for these functions, though. Maybe it's more >> intuitive for the new array to match the original array in shape and >> dtype, but to always have an empty mask. Or maybe not. None of us >> really use np.ma, so if you do and have an opinion then please speak >> up... > > I recently joined the mailing list, so the message might not reach the > original thread, sorry for that. > > I use masked arrays extensively, and would vote for the first option, > as I use the *_like operations with the assumption that the resulting > array has the same mask as the original. I think it's more intuitive > than selecting between all masked or all unmasked behaviour. If it's > not too late, please consider my use case. The original submitter of that PR has been silent since then, so so far nothing has happened. So that's 2 votes for copying the mask and 3 against, I guess. That's not very consensus-ful. If there's really a lot of confusion here, then it's possible the answer is that np.ma.empty_like should just raise an error or not be defined. Or can you all agree? -n From nouiz at nouiz.org Wed Jul 17 12:57:16 2013 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Wed, 17 Jul 2013 12:57:16 -0400 Subject: [Numpy-discussion] Speedup by avoiding memory alloc twice in scalar array In-Reply-To: References: Message-ID: On Wed, Jul 17, 2013 at 10:39 AM, Nathaniel Smith wrote: > On Tue, Jul 16, 2013 at 7:53 PM, Fr?d?ric Bastien wrote: > > Hi, > > > > > > On Tue, Jul 16, 2013 at 11:55 AM, Nathaniel Smith wrote: > >> > >> On Tue, Jul 16, 2013 at 2:34 PM, Arink Verma > wrote: > >>> > >>> >Each ndarray does two mallocs, for the obj and buffer. These could be > >>> > combined into 1 - just allocate the total size and do some pointer > >>> > >arithmetic, then set OWNDATA to false. > >>> So, that two mallocs has been mentioned in project introduction. I got > >>> that wrong. > >> > >> > >> On further thought/reading the code, it appears to be more complicated > >> than that, actually. > >> > >> It looks like (for a non-scalar array) we have 2 calls to PyMem_Malloc: > 1 > >> for the array object itself, and one for the shapes + strides. And, one > call > >> to regular-old malloc: for the data buffer. > >> > >> (Mysteriously, shapes + strides together have 2*ndim elements, but to > hold > >> them we allocate a memory region sized to hold 3*ndim elements. I'm not > sure > >> why.) 
> >> > >> And contrary to what I said earlier, this is about as optimized as it > can > >> be without breaking ABI. We need at least 2 calls to > malloc/PyMem_Malloc, > >> because the shapes+strides may need to be resized without affecting the > much > >> larger data area. But it's tempting to allocate the array object and the > >> data buffer in a single memory region, like I suggested earlier. And > this > >> would ALMOST work. But, it turns out there is code out there which > assumes > >> (whether wisely or not) that you can swap around which data buffer a > given > >> PyArrayObject refers to (hi Theano!). And supporting this means that > data > >> buffers and PyArrayObjects need to be in separate memory regions. > > > > > > Are you sure that Theano "swap" the data ptr of an ndarray? When we play > > with that, it is on a newly create ndarray. So a node in our graph, won't > > change the input ndarray structure. It will create a new ndarray > structure > > with new shape/strides and pass a data ptr and we flag the new ndarray > with > > own_data correctly to my knowledge. > > > > If Theano pose a problem here, I'll suggest that I fix Theano. But > currently > > I don't see the problem. So if this make you change your mind about this > > optimization, tell me. I don't want Theano to prevent optimization in > NumPy. > > It's entirely possible I misunderstood, so let's see if we can work it > out. I know that you want to assign to the ->data pointer in a > PyArrayObject, right? That's what caused some trouble with the 1.7 API > deprecations, which were trying to prevent direct access to this > field? Creating a new array given a pointer to a memory region is no > problem, and obviously will be supported regardless of any > optimizations. But if that's all you were doing then you shouldn't > have run into the deprecation problem. Or maybe I'm misremembering! > What is currently done at only 1 place is to create a new PyArrayObject with a given ptr. So NumPy don't do the allocation. We later change that ptr to another one. It is the change to the ptr of the just created PyArrayObject that caused problem with the interface deprecation. I fixed all other problem releated to the deprecation (mostly just rename of function/macro). But I didn't fixed this one yet. I would need to change the logic to compute the final ptr before creating the PyArrayObject object and create it with the final data ptr. But in call cases, NumPy didn't allocated data memory for this object, so this case don't block your optimization. One thing in our optimization "wish list" is to reuse allocated PyArrayObject between Theano function call for intermediate results(so completly under Theano control). This could be useful in particular for reshape/transpose/subtensor. Those functions are pretty fast and from memory, I already found the allocation time was significant. But in those cases, it is on PyArrayObject that are views, so the metadata and the data would be in different memory region in all cases. The other cases of optimization "wish list" is if we want to reuse the PyArrayObject when the shape isn't the good one (but the number of dimensions is the same). If we do that for operation like addition, we will need to use PyArray_Resize(). This will be done on PyArrayObject whose data memory was allocated by NumPy. So if you do one memory allowcation for metadata and data, just make sure that PyArray_Resize() will handle that correctly. 
On the usefulness of doing only 1 memory allocation, on our old gpu ndarray, we where doing 2 alloc on the GPU, one for metadata and one for data. I removed this, as this was a bottleneck. allocation on the CPU are faster the on the GPU, but this is still something that is slow except if you reuse memory. Do PyMem_Malloc, reuse previous small allocation? For those that read up all this, the conclusion is that Theano should block this optimization. If you optimize the allocation of new PyArrayObject, they will be less incentive to do the "wish list" optimization. One last thing to keep in mind is that you should keep the data segment aligned. I would arg that alignment on the datatype size isn't enough, so I would suggest on cache line size or something like this. But I don't have number to base this one. This would also help in the case of resize that change the number of dimensions. Fred -------------- next part -------------- An HTML attachment was scrubbed... URL: From brady.mccary at gmail.com Wed Jul 17 13:21:43 2013 From: brady.mccary at gmail.com (Brady McCary) Date: Wed, 17 Jul 2013 12:21:43 -0500 Subject: [Numpy-discussion] Size/Shape Message-ID: NumPy Folks, Would someone please discuss or point me to a discussion about the discrepancy in size vs shape in the following MWE? In this example I have used a grayscale PNG version of the ImageMagick logo, but any image which is not square will do. $ python Python 2.7.4 (default, Apr 19 2013, 18:28:01) [GCC 4.7.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import Image >>> import matplotlib.pyplot as plt >>> >>> s = 'logo.png' >>> >>> im = Image.open(s) >>> ar = plt.imread(s) >>> >>> im.size (640, 480) >>> >>> ar.shape (480, 640) >>> The extents/shape of the NumPy array (as loaded by matplotlib, but this convention seems uniform through NumPy) are transposed from what seems to be the usual convention. Why was this choice made? Brady From robert.kern at gmail.com Wed Jul 17 13:41:37 2013 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 17 Jul 2013 18:41:37 +0100 Subject: [Numpy-discussion] Size/Shape In-Reply-To: References: Message-ID: On Wed, Jul 17, 2013 at 6:21 PM, Brady McCary wrote: > > NumPy Folks, > > Would someone please discuss or point me to a discussion about the > discrepancy in size vs shape in the following MWE? In this example I > have used a grayscale PNG version of the ImageMagick logo, but any > image which is not square will do. > > $ python > Python 2.7.4 (default, Apr 19 2013, 18:28:01) > [GCC 4.7.3] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import Image > >>> import matplotlib.pyplot as plt > >>> > >>> s = 'logo.png' > >>> > >>> im = Image.open(s) > >>> ar = plt.imread(s) > >>> > >>> im.size > (640, 480) > >>> > >>> ar.shape > (480, 640) > >>> > > The extents/shape of the NumPy array (as loaded by matplotlib, but > this convention seems uniform through NumPy) are transposed from what > seems to be the usual convention. Why was this choice made? It matches Python sequence semantics better. ar[i] will index along the first axis to return an array of one less dimension, which itself can be indexed (ar[i])[j]. Try using a list of lists to see what we are trying to be consistent with. To extend this to multidimensional indexing, we want ar[i,j] to give the same thing as ar[i][j]. The .shape attribute needs to be given in the same order that indexing happens. 
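Concretely -- using a throwaway all-zeros array of the same shape, rather than your actual logo.png:

>>> ar = np.zeros((480, 640))
>>> ar.shape
(480, 640)
>>> ar[0].shape              # the first index peels off the first axis
(640,)
>>> ar[10, 20] == ar[10][20]
True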
Note that what you call the "usual convention" isn't all that standard for general multidimensional arrays. It's just one of two fairly arbitrary choices, usually derived from the default memory layout at a very low level. Fortran picked one convention, C picked another; numpy and Python are built with C so we use its default conventions. Now, you are right that image dimensions are usually quoted as (width, height), but numpy arrays represent a much broader range of objects than images. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Jul 17 17:42:57 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 17 Jul 2013 15:42:57 -0600 Subject: [Numpy-discussion] Speedup by avoiding memory alloc twice in scalar array In-Reply-To: References: Message-ID: On Wed, Jul 17, 2013 at 10:57 AM, Fr?d?ric Bastien wrote: > > > > On Wed, Jul 17, 2013 at 10:39 AM, Nathaniel Smith wrote: > >> On Tue, Jul 16, 2013 at 7:53 PM, Fr?d?ric Bastien >> wrote: >> > Hi, >> > >> > >> > On Tue, Jul 16, 2013 at 11:55 AM, Nathaniel Smith >> wrote: >> >> >> >> On Tue, Jul 16, 2013 at 2:34 PM, Arink Verma >> wrote: >> >>> >> >>> >Each ndarray does two mallocs, for the obj and buffer. These could be >> >>> > combined into 1 - just allocate the total size and do some pointer >> >>> > >arithmetic, then set OWNDATA to false. >> >>> So, that two mallocs has been mentioned in project introduction. I got >> >>> that wrong. >> >> >> >> >> >> On further thought/reading the code, it appears to be more complicated >> >> than that, actually. >> >> >> >> It looks like (for a non-scalar array) we have 2 calls to >> PyMem_Malloc: 1 >> >> for the array object itself, and one for the shapes + strides. And, >> one call >> >> to regular-old malloc: for the data buffer. >> >> >> >> (Mysteriously, shapes + strides together have 2*ndim elements, but to >> hold >> >> them we allocate a memory region sized to hold 3*ndim elements. I'm >> not sure >> >> why.) >> >> >> >> And contrary to what I said earlier, this is about as optimized as it >> can >> >> be without breaking ABI. We need at least 2 calls to >> malloc/PyMem_Malloc, >> >> because the shapes+strides may need to be resized without affecting >> the much >> >> larger data area. But it's tempting to allocate the array object and >> the >> >> data buffer in a single memory region, like I suggested earlier. And >> this >> >> would ALMOST work. But, it turns out there is code out there which >> assumes >> >> (whether wisely or not) that you can swap around which data buffer a >> given >> >> PyArrayObject refers to (hi Theano!). And supporting this means that >> data >> >> buffers and PyArrayObjects need to be in separate memory regions. >> > >> > >> > Are you sure that Theano "swap" the data ptr of an ndarray? When we play >> > with that, it is on a newly create ndarray. So a node in our graph, >> won't >> > change the input ndarray structure. It will create a new ndarray >> structure >> > with new shape/strides and pass a data ptr and we flag the new ndarray >> with >> > own_data correctly to my knowledge. >> > >> > If Theano pose a problem here, I'll suggest that I fix Theano. But >> currently >> > I don't see the problem. So if this make you change your mind about this >> > optimization, tell me. I don't want Theano to prevent optimization in >> NumPy. >> >> It's entirely possible I misunderstood, so let's see if we can work it >> out. 
I know that you want to assign to the ->data pointer in a >> PyArrayObject, right? That's what caused some trouble with the 1.7 API >> deprecations, which were trying to prevent direct access to this >> field? Creating a new array given a pointer to a memory region is no >> problem, and obviously will be supported regardless of any >> optimizations. But if that's all you were doing then you shouldn't >> have run into the deprecation problem. Or maybe I'm misremembering! >> > > What is currently done at only 1 place is to create a new PyArrayObject > with a given ptr. So NumPy don't do the allocation. We later change that > ptr to another one. > > It is the change to the ptr of the just created PyArrayObject that caused > problem with the interface deprecation. I fixed all other problem releated > to the deprecation (mostly just rename of function/macro). But I didn't > fixed this one yet. I would need to change the logic to compute the final > ptr before creating the PyArrayObject object and create it with the final > data ptr. But in call cases, NumPy didn't allocated data memory for this > object, so this case don't block your optimization. > > One thing in our optimization "wish list" is to reuse allocated > PyArrayObject between Theano function call for intermediate results(so > completly under Theano control). This could be useful in particular for > reshape/transpose/subtensor. Those functions are pretty fast and from > memory, I already found the allocation time was significant. But in those > cases, it is on PyArrayObject that are views, so the metadata and the data > would be in different memory region in all cases. > > The other cases of optimization "wish list" is if we want to reuse the > PyArrayObject when the shape isn't the good one (but the number of > dimensions is the same). If we do that for operation like addition, we will > need to use PyArray_Resize(). This will be done on PyArrayObject whose data > memory was allocated by NumPy. So if you do one memory allowcation for > metadata and data, just make sure that PyArray_Resize() will handle that > correctly. > > On the usefulness of doing only 1 memory allocation, on our old gpu > ndarray, we where doing 2 alloc on the GPU, one for metadata and one for > data. I removed this, as this was a bottleneck. allocation on the CPU are > faster the on the GPU, but this is still something that is slow except if > you reuse memory. Do PyMem_Malloc, reuse previous small allocation? > > For those that read up all this, the conclusion is that Theano should > block this optimization. If you optimize the allocation of new > PyArrayObject, they will be less incentive to do the "wish list" > optimization. > > One last thing to keep in mind is that you should keep the data segment > aligned. I would arg that alignment on the datatype size isn't enough, so I > would suggest on cache line size or something like this. But I don't have > number to base this one. This would also help in the case of resize that > change the number of dimensions. > > There is a similar thing done in f2py which is still keeping it from being current with the 1.7 macro replacement by functions. I'd like to add a 'swap' type function and would welcome discussion/implementation fo such. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jrocher at enthought.com Wed Jul 17 18:43:47 2013 From: jrocher at enthought.com (Jonathan Rocher) Date: Wed, 17 Jul 2013 17:43:47 -0500 Subject: [Numpy-discussion] [ANN] 4th Python Symposium at AMS2014 Message-ID: [Apologies for the cross-post] Dear all, If you work with Python around themes like big data, climate, meteorological or oceanic science, and/or GIS, you should come present at the 4th Python Symposium, as part of the American Meteorological Society conference in Atlanta in Feb 2014: http://annual.ametsoc.org/2014/index.cfm/programs-and-events/conferences-and-symposia/fourth-symposium-on-advances-in-modeling-and-analysis-using-python/ The *abstract deadline is Aug 1st*! Jonathan -- Jonathan Rocher, PhD Scientific software developer SciPy2013 conference co-chair Enthought, Inc. jrocher at enthought.com 1-512-536-1057 http://www.enthought.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From grb at skogoglandskap.no Thu Jul 18 03:32:31 2013 From: grb at skogoglandskap.no (Graeme B. Bell) Date: Thu, 18 Jul 2013 07:32:31 +0000 Subject: [Numpy-discussion] np.select use case In-Reply-To: References: Message-ID: Quick question: Can anyone think of a realistic/real-world use case for array broadcasting and np.select, (other than scalar to ndarray broadcasting)? e.g. differently shaped arrays with matching lower dimensions. (I don't know if a use case even exists). Graeme. From njs at pobox.com Thu Jul 18 08:52:00 2013 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 18 Jul 2013 13:52:00 +0100 Subject: [Numpy-discussion] Bringing order to higher dimensional operations Message-ID: Hi all, I hadn't realized until Pauli just pointed it out that np.dot and the new gufuncs actually have different rules for how they handle extra axes: https://github.com/numpy/numpy/pull/3524#issuecomment-21117601 This is turning into a big mess, and it may take some hard decisions to fix it. Before I explain the problem, a bit of terminology: a "vectorized operation" is built by taking an "intrinsic operation" and putting a loop around it, to apply it many times. So for np.add for example, the intrinsic operation is scalar addition, and if you put a loop around this you get vectorized addition. The question then is, given some input arrays: which parts do you loop over, and which things you pass to the intrinsic operation? In the case of scalar operations (like classic ufuncs), this is pretty straightforward: we broadcast the input arrays together, and loop over them in parallel. So np.add(ones((2, 3, 4)), ones((3, 4))).shape == (2, 3, 4) The other obvious option is, instead of looping over the two arrays in parallel, instead find all combinations. This is what the .outer method on ufuncs does, so np.add.outer(ones((2, 3)), ones((4, 5))).shape == (2, 3, 4, 5) Now, that's for vectorized versions scalar operations. We also have a bunch of vectorized operations whose "intrinsic operation" is itself a function over multidimensional arrays. For example, reduction operations like 'sum' intrinsically take a 1-dim array and return a 0-dim (scalar), but they can be vectorized to apply to a single axis of a >1-dim array. Or matrix multiplication intrinsically takes two 2-dim arrays and returns a 2-dim array; it can be vectorized to apply to >2 dim inputs. (As shorthand I'll write "takes two 2-dim arrays and returns a 2-dim array" as "2,2->2"; so 'sum' is 1->0 and 'cumsum' is 1->1.) ----- Okay, now I can explain the problem. 
For vectorized multidimensional operations, we have four (!) different conventions deployed: Convention 1: Ufunc .reduce (1->0), .accumulate (1->1), .reduceat (1,1->1): These pick the 0th axis for the intrinsic axis and loop over the rest. (By default; can be modified by specifying axis.) np.add.reduce(np.ones((2, 3, 4))).shape == (3, 4) np.add.accumulate(np.ones((2, 3, 4))).shape == (2, 3, 4) Convention 2: Reduction (1->0) and accumulation (1->1) operations defined as top-level functions and ndarray methods (np.sum, ndarray.sum, np.mean, np.cumprod, etc.): These flatten the array and use the whole thing as the intrinsic axis. (By default; can be modified by specifying axis=.) np.sum(np.ones((2, 3, 4))).shape == () np.cumsum(np.ones((2, 3, 4))).shape == (24,) Convention 3: gufuncs (any->any): These take the *last* k axes for the intrinsic axes, and then broadcast and parallel-loop over the rest. Cannot currently be modified. gu_dot = np.linalg._umath_linalg.matrix_multiply # requires current master gu_dot(np.ones((2, 3, 4, 5)), np.ones((1, 3, 5, 6))).shape == (2, 3, 4, 6) (So what's happened here is that the gufunc pulled off the last two axes of each array, so the intrinsic operation is always going to be a matrix multiply of a (4, 5) array by a (5, 6) array, producing a (4, 6) array. Then it broadcast the remaining axes together: (2, 3) and (1, 3) broadcast to (2, 3), and did a parallel iteration over them: output[0, 0, :, :] is the result of dot(input1[0, 0, :, :], input2[0, 0, :, :]).) Convention 4: np.dot (2,2->2): this one is bizarre: np.dot(np.ones((1, 2, 10, 11)), np.ones((101, 102, 11, 12))).shape == (1, 2, 10, 101, 102, 12) So what it's done is picked the last two axes to be the intrinsic axes, just like the gufunc -- so it always does a bunch of matrix multiplies of a (10, 11) array with an (11, 12) array. But then it didn't do a ufunc-style parallel loop. Instead it did a ufunc.outer-style outer loop, in which it found all n^2 ways of matching up a matrix in the first input with a matrix in the second input, and took the dot of each. And then it packed these up into an array with a rather weird shape: first all the looping axes from the first input, then the first axis of the output matrix, then all the looping axes from the second input, and then finally the second axis of the output matrix. ----- There are plenty of general reasons to want to simplify this -- it'd make numpy easier to explain and understand, simplify the code, etc. -- but also a few more specific reasons that make it urgent: - gufuncs haven't really been used much yet, so maybe it'd be easy to change how they work now. But in the next release, everything in np.linalg will become a gufunc, so it'll become much harder to change after that. - we'd really like np.dot to become a gufunc -- in particular, in combination with Blake's work on ufunc overrides, this would allow np.dot() to work on scipy.sparse matrices. - pretty soon we'll want to turn things like 'mean' into gufuncs too, for the same reason. ----- Okay, what to do? The gufunc convention actually seems like the right one to me. This is unfortunate, because it's also the only one we could easily change :-(. But we obviously want our vectorized operations to do broadcasting by default, both for consistency with scalar ufuncs, and because it just seems to be the most useful convention. So that rules out the np.dot convention.
Then given that we're broadcasting, the two options are to pick intrinsic axes from the right like gufuncs do, or to follow the ufunc.reduce convention and pick intrinsic axes from the left. But picking from the left seems confusing to me, because broadcasting is itself a right-to-left operation. This doesn't matter for .reduce and such because they only take one input, but for something like 'dot', it means you can have 1's inserted in the "middle" of the array, and then broadcast up to a higher dimension. Compare: gu_dot_leftwards(ones((10, 11, 4)), ones((11, 12, 3, 4))) -> (10, 12, 3, 4) versus gu_dot_rightwards(ones((4, 10, 11)), ones((3, 4, 11, 12))) -> (3, 4, 10, 12) To me, it's easier to figure out which axes end up where in the second case. Working from the right, we take two axes to be the intrinsic axes, then we match up the next axis (4 matches 4), then we append a 1 and match up the last axis (1 broadcasts to match 3). So: QUESTION 1: does that sound right: that in a perfect world, the current gufunc convention would be the only one, and that's what we should work towards, at least in the cases where that's possible? QUESTION 2: Assuming that's right, it would be *really nice* if we could at least get np.dot onto our new convention, for consistency with the rest of np.linalg, and to allow it to be overridden. I'm sort of terrified to touch np.dot's API, but the only cases where it would act differently is when *both* arguments have *3 or more dimensions*, and I guess there are very very few users who fall into that category. So maybe we could start raising some big FutureWarnings for this case in the next release, and eventually switch? (I'm even more terrified of trying to mess with np.sum or np.add.reduce, so I'll leave that alone for now -- maybe we're just stuck with them. And at least they do already go through the ufunc machinery.) -n From cjwilliams43 at gmail.com Thu Jul 18 09:18:59 2013 From: cjwilliams43 at gmail.com (Colin J. Williams) Date: Thu, 18 Jul 2013 09:18:59 -0400 Subject: [Numpy-discussion] User Guide Message-ID: <51E7EB43.7010904@gmail.com> An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Thu Jul 18 09:23:39 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 18 Jul 2013 15:23:39 +0200 Subject: [Numpy-discussion] Bringing order to higher dimensional operations In-Reply-To: References: Message-ID: <1374153819.14751.17.camel@sebastian-laptop> On Thu, 2013-07-18 at 13:52 +0100, Nathaniel Smith wrote: > Hi all, > > > So: > > QUESTION 1: does that sound right: that in a perfect world, the > current gufunc convention would be the only one, and that's what we > should work towards, at least in the cases where that's possible? > Sounds right to me, ufunc/gufunc broadcasting assumes the "inner" dimensions are the right-most. Since we are normally in C-order arrays, this also seems the sensible way if you consider the memory layout. > QUESTION 2: Assuming that's right, it would be *really nice* if we > could at least get np.dot onto our new convention, for consistency > with the rest of np.linalg, and to allow it to be overridden. I'm sort > of terrified to touch np.dot's API, but the only cases where it would > act differently is when *both* arguments have *3 or more dimensions*, > and I guess there are very very few users who fall into that category. > So maybe we could start raising some big FutureWarnings for this case > in the next release, and eventually switch? 
> It is noble to try to get dot to use the gufunc convention, but if you look at the new gufunc linalg functions, they already have to have some weird tricks in the case of np.linalg.solve. It is so difficult because of the fact that dot is basically a combination of many functions: o vector * vector -> vector o vector * matrix -> matrix (add dimensions to vector on right) o matrix * vector -> matrix (add dimensions to vector on left) o matrix * matrix -> matrix plus scalar cases. I somewhat believe we should not touch dot, or deprecate anything but the most basic dot functionality. Then we can point to matrix_multiply, inner1d, etc. which are gufuncs (even if they are not exposed at this time). The whole dance that is already done for np.linalg.solve right now is not pretty there, and it will be worse for dot. Because dot is basically overloaded, marrying it with the broadcasting machinery in a general way is impossible. > (I'm even more terrified of trying to mess with np.sum or > np.add.reduce, so I'll leave that alone for now -- maybe we're just > stuck with them. And at least they do already go through the ufunc > machinery.) I did not understand where the inconsistency/problem for the reductions is. - Sebastian > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From andrew.collette at gmail.com Thu Jul 18 09:23:59 2013 From: andrew.collette at gmail.com (Andrew Collette) Date: Thu, 18 Jul 2013 07:23:59 -0600 Subject: [Numpy-discussion] ANN: HDF5 for Python (h5py) 2.2 BETA Message-ID: Announcing HDF5 for Python (h5py) 2.2.0 BETA ============================================ We are proud to announce that HDF5 for Python 2.2.0 (beta) is now available. Because of the large number of new features in this release, we are actively seeking community feedback over the (2-week) beta period. The h5py package is a Pythonic interface to the HDF5 binary data format. It lets you store huge amounts of numerical data, and easily manipulate that data from NumPy. For example, you can slice into multi-terabyte datasets stored on disk, as if they were real NumPy arrays. Thousands of datasets can be stored in a single file, categorized and tagged however you want. H5py uses straightforward NumPy and Python metaphors, like dictionary and NumPy array syntax. For example, you can iterate over datasets in a file, or check out the .shape or .dtype attributes of datasets. You don't need to know anything special about HDF5 to get started. Documentation and download links are available at: http://www.h5py.org Parallel HDF5 ============= This version of h5py introduces support for MPI/Parallel HDF5, using the mpi4py package. Parallel HDF5 is the native method for sharing files and objects across multiple processes. Unlike "multiprocessing" based solutions, all processes in an MPI-based program can read from and write to the same shared HDF5 file.
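To give a flavor of what this looks like, here is a minimal sketch (it assumes h5py was built against an MPI-enabled HDF5, and that you launch it with something like "mpiexec -n 4 python demo.py"; the file and dataset names are just placeholders):

from mpi4py import MPI
import h5py

rank = MPI.COMM_WORLD.rank   # each process writes one element of the shared dataset

f = h5py.File('parallel_test.hdf5', 'w', driver='mpio', comm=MPI.COMM_WORLD)
dset = f.create_dataset('test', (4,), dtype='i')
dset[rank] = rank
f.close()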
There is a guide to using Parallel HDF5 at the h5py web site: http://h5py.org/docs/build/html/topics/mpi.html Other new features ================== * Support for Python 3.3 * Support for 16-bit "mini" floats * Access to the HDF5 scale-offset filter * Field names are now allowed when writing to a dataset * Region references now preserve the shape of their selections * File-resident "committed" types can be linked to datasets and attributes * A new "move" method on Group objects * Many new options for Group.copy From njs at pobox.com Thu Jul 18 09:36:50 2013 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 18 Jul 2013 14:36:50 +0100 Subject: [Numpy-discussion] Speedup by avoiding memory alloc twice in scalar array In-Reply-To: References: Message-ID: On Wed, Jul 17, 2013 at 5:57 PM, Fr?d?ric Bastien wrote: > On Wed, Jul 17, 2013 at 10:39 AM, Nathaniel Smith wrote: >> > >> > On Tue, Jul 16, 2013 at 11:55 AM, Nathaniel Smith wrote: >> It's entirely possible I misunderstood, so let's see if we can work it >> out. I know that you want to assign to the ->data pointer in a >> PyArrayObject, right? That's what caused some trouble with the 1.7 API >> deprecations, which were trying to prevent direct access to this >> field? Creating a new array given a pointer to a memory region is no >> problem, and obviously will be supported regardless of any >> optimizations. But if that's all you were doing then you shouldn't >> have run into the deprecation problem. Or maybe I'm misremembering! > > What is currently done at only 1 place is to create a new PyArrayObject with > a given ptr. So NumPy don't do the allocation. We later change that ptr to > another one. Hmm, OK, so that would still work. If the array has the OWNDATA flag set (or you otherwise know where the data came from), then swapping the data pointer would still work. The change would be that in most cases when asking numpy to allocate a new array from scratch, the OWNDATA flag would not be set. That's because the OWNDATA flag really means "when this object is deallocated, call free(self->data)", but if we allocate the array struct and the data buffer together in a single memory region, then deallocating the object will automatically cause the data buffer to be deallocated as well, without the array destructor having to take any special effort. > It is the change to the ptr of the just created PyArrayObject that caused > problem with the interface deprecation. I fixed all other problem releated > to the deprecation (mostly just rename of function/macro). But I didn't > fixed this one yet. I would need to change the logic to compute the final > ptr before creating the PyArrayObject object and create it with the final > data ptr. But in call cases, NumPy didn't allocated data memory for this > object, so this case don't block your optimization. Right. > One thing in our optimization "wish list" is to reuse allocated > PyArrayObject between Theano function call for intermediate results(so > completly under Theano control). This could be useful in particular for > reshape/transpose/subtensor. Those functions are pretty fast and from > memory, I already found the allocation time was significant. But in those > cases, it is on PyArrayObject that are views, so the metadata and the data > would be in different memory region in all cases. > > The other cases of optimization "wish list" is if we want to reuse the > PyArrayObject when the shape isn't the good one (but the number of > dimensions is the same). 
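(Side note for anyone following along from Python rather than C: the flag in question is the one you can see as .flags.owndata. A throwaway example:

>>> a = np.zeros(3)    # numpy allocated this buffer, so it will also free it
>>> a.flags.owndata
True
>>> b = a[::2]         # a view borrows someone else's buffer
>>> b.flags.owndata
False

Under the scheme sketched above, most freshly created arrays would stop reporting True here, even though their memory would still be released correctly when the array object goes away.)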
If we do that for operation like addition, we will > need to use PyArray_Resize(). This will be done on PyArrayObject whose data > memory was allocated by NumPy. So if you do one memory allowcation for > metadata and data, just make sure that PyArray_Resize() will handle that > correctly. I'm not sure I follow the details here, but it does turn out that a really surprising amount of time in PyArray_NewFromDescr is spent in just calculating and writing out the shape and strides buffers, so for programs that e.g. use hundreds of small 3-element arrays to represent points in space, re-using even these buffers might be a big win... > On the usefulness of doing only 1 memory allocation, on our old gpu ndarray, > we where doing 2 alloc on the GPU, one for metadata and one for data. I > removed this, as this was a bottleneck. allocation on the CPU are faster the > on the GPU, but this is still something that is slow except if you reuse > memory. Do PyMem_Malloc, reuse previous small allocation? Yes, at least in theory PyMem_Malloc is highly-optimized for small buffer re-use. (For requests >256 bytes it just calls malloc().) And it's possible to define type-specific freelists; not sure if there's any value in doing that for PyArrayObjects. See Objects/obmalloc.c in the Python source tree. -n From mdroe at stsci.edu Thu Jul 18 09:42:42 2013 From: mdroe at stsci.edu (Michael Droettboom) Date: Thu, 18 Jul 2013 09:42:42 -0400 Subject: [Numpy-discussion] Results of matplotlib user survey 2013 Message-ID: <51E7F0D2.8040807@stsci.edu> We have had 508 responses to the matplotlib user survey. Quite a nice turnout! You can view the results here: https://docs.google.com/spreadsheet/viewanalytics?key=0AjrPjlTMRTwTdHpQS25pcTZIRWdqX0pNckNSU01sMHc&gridId=0#chart and from there, you can access the complete raw results. I will be doing more analysis of the results over the coming days and weeks, including dedup'ing some of the responses and converting some of the free-form responses into github issues etc. Volunteers to help with this are of course welcome! Cheers, Mike -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Thu Jul 18 10:18:56 2013 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 18 Jul 2013 16:18:56 +0200 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> Message-ID: On Sat, Jul 13, 2013 at 7:46 PM, Nathaniel Smith wrote: > Why not just write > > def H(a): > return a.conj().T It's hard to convince students that this is the Best Way of doing things in NumPy. Why, they ask, can you do it using a' in MATLAB, then? I've tripped over this one before, since it's not the kind of thing you imagine would be unimplemented, and then spend some time trying to find it. St?fan From mdroe at stsci.edu Thu Jul 18 10:20:00 2013 From: mdroe at stsci.edu (Michael Droettboom) Date: Thu, 18 Jul 2013 10:20:00 -0400 Subject: [Numpy-discussion] Results of matplotlib user survey 2013 In-Reply-To: <51E7F0D2.8040807@stsci.edu> References: <51E7F0D2.8040807@stsci.edu> Message-ID: <51E7F990.3060307@stsci.edu> Apologies: I didn't realize the link to the raw results only exists for users with edit permissions. The public URL for the raw results is: https://docs.google.com/spreadsheet/ccc?key=0AjrPjlTMRTwTdHpQS25pcTZIRWdqX0pNckNSU01sMHc&usp=sharing Mike On 07/18/2013 09:42 AM, Michael Droettboom wrote: > We have had 508 responses to the matplotlib user survey. 
Quite a nice > turnout! > > You can view the results here: > > https://docs.google.com/spreadsheet/viewanalytics?key=0AjrPjlTMRTwTdHpQS25pcTZIRWdqX0pNckNSU01sMHc&gridId=0#chart > > and from there, you can access the complete raw results. > > I will be doing more analysis of the results over the coming days and > weeks, including dedup'ing some of the responses and converting some > of the free-form responses into github issues etc. Volunteers to help > with this are of course welcome! > > Cheers, > Mike -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Thu Jul 18 12:57:19 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Thu, 18 Jul 2013 12:57:19 -0400 Subject: [Numpy-discussion] azip Message-ID: <51E81E6F.8010302@gmail.com> I'm floating this thought even though it is not fleshed out. On occasion, I run into the following problem: I have a rectangular array A to which I want to append a (probably) one dimensional vector b to make [A|b]. Of course this can be done as np.hstack((x,b[:,None])) (or obscurely np.r_['1,2,0',x,b]), but this has the following issues: - what if ``b`` turns out to be a list? - what if ``b`` turns out to be 2d (e.g., a column vector)? - it's a bit ugly - it is not obvious when read by others (e.g., students) (The last is a key motivation for me to talk about this.) All of which leads me to wonder if there might be profit in a numpy.azip function that takes as arguments - a tuple of arraylike iterables - an axis along which to concatenate (say, like r_ does) iterated items To make that a little clearer (but not to provide a suggested implementation), it might behave something like def azip(alst, axis=1): results = [] for tpl in zip(*alst): results.append(np.r_[tpl]) return np.rollaxis(np.array(results), axis-1) Alan Isaac From robert.kern at gmail.com Thu Jul 18 13:03:06 2013 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 18 Jul 2013 18:03:06 +0100 Subject: [Numpy-discussion] azip In-Reply-To: <51E81E6F.8010302@gmail.com> References: <51E81E6F.8010302@gmail.com> Message-ID: On Thu, Jul 18, 2013 at 5:57 PM, Alan G Isaac wrote: > > I'm floating this thought even though it is not fleshed out. > > On occasion, I run into the following problem: > I have a rectangular array A to which I want to append > a (probably) one dimensional vector b to make [A|b]. > Of course this can be done as np.hstack((x,b[:,None])) > (or obscurely np.r_['1,2,0',x,b]), but this has the following issues: > > - what if ``b`` turns out to be a list? > - what if ``b`` turns out to be 2d (e.g., a column vector)? > - it's a bit ugly > - it is not obvious when read by others (e.g., students) np.column_stack([x, b]) does everything you need. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Thu Jul 18 13:06:59 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Thu, 18 Jul 2013 13:06:59 -0400 Subject: [Numpy-discussion] azip In-Reply-To: References: <51E81E6F.8010302@gmail.com> Message-ID: <51E820B3.8000208@gmail.com> On 7/18/2013 1:03 PM, Robert Kern wrote: > np.column_stack([x, b]) does everything you need. So it does. It's not referenced from the hstack or concatenate documentation. Thanks! 
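A quick sanity check with throwaway inputs (not my real data), in case anyone finds this thread later:

>>> A = np.zeros((3, 2))
>>> np.column_stack((A, [1, 2, 3])).shape         # a plain list is fine
(3, 3)
>>> np.column_stack((A, np.ones((3, 1)))).shape   # so is a 2d column vector
(3, 3)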
Alan From stefan at sun.ac.za Thu Jul 18 13:14:06 2013 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 18 Jul 2013 19:14:06 +0200 Subject: [Numpy-discussion] azip In-Reply-To: <51E820B3.8000208@gmail.com> References: <51E81E6F.8010302@gmail.com> <51E820B3.8000208@gmail.com> Message-ID: On Thu, Jul 18, 2013 at 7:06 PM, Alan G Isaac wrote: > On 7/18/2013 1:03 PM, Robert Kern wrote: >> np.column_stack([x, b]) does everything you need. > > So it does. > > It's not referenced from the hstack or concatenate documentation. A pull request would fix all of that in seconds! GitHub now allows online editing, and provides a one-click option for creating the PR. St?fan From ben.root at ou.edu Thu Jul 18 13:18:06 2013 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 18 Jul 2013 13:18:06 -0400 Subject: [Numpy-discussion] azip In-Reply-To: References: <51E81E6F.8010302@gmail.com> <51E820B3.8000208@gmail.com> Message-ID: Forgive my ignorance, but has numpy and scipy stopped doing that weird doc editing thing that existed back in the days of Trac? I have actually held back on submitting doc edits because I hated using that thing so much. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Thu Jul 18 13:27:03 2013 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 18 Jul 2013 19:27:03 +0200 Subject: [Numpy-discussion] azip In-Reply-To: References: <51E81E6F.8010302@gmail.com> <51E820B3.8000208@gmail.com> Message-ID: Hi Ben On Thu, Jul 18, 2013 at 7:18 PM, Benjamin Root wrote: > Forgive my ignorance, but has numpy and scipy stopped doing that weird doc > editing thing that existed back in the days of Trac? I have actually held > back on submitting doc edits because I hated using that thing so much. That thing helps people without hacking experience to contribute, but you are welcome to issue pull-requests instead. St?fan From alan.isaac at gmail.com Thu Jul 18 13:50:42 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Thu, 18 Jul 2013 13:50:42 -0400 Subject: [Numpy-discussion] azip In-Reply-To: References: <51E81E6F.8010302@gmail.com> Message-ID: <51E82AF2.60905@gmail.com> On 7/18/2013 1:03 PM, Robert Kern wrote: > np.column_stack([x, b]) does everything you need. I am curious: why is column_stack in numpy/lib/shape_base.py while hstack and vstack are in numpy/core/shape_base.py ? Thanks, Alan From pav at iki.fi Thu Jul 18 13:51:08 2013 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 18 Jul 2013 20:51:08 +0300 Subject: [Numpy-discussion] azip In-Reply-To: References: <51E81E6F.8010302@gmail.com> <51E820B3.8000208@gmail.com> Message-ID: 18.07.2013 20:18, Benjamin Root kirjoitti: > Forgive my ignorance, but has numpy and scipy stopped doing that weird > doc editing thing that existed back in the days of Trac? I have actually > held back on submitting doc edits because I hated using that thing so much. You were never required to use it. -- Pauli Virtanen From ben.root at ou.edu Thu Jul 18 14:11:46 2013 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 18 Jul 2013 14:11:46 -0400 Subject: [Numpy-discussion] azip In-Reply-To: References: <51E81E6F.8010302@gmail.com> <51E820B3.8000208@gmail.com> Message-ID: Well, that's nice to know now. However, I distinctly remember being told that any changes made to the docstrings directly in the source would end up getting replaced by whatever was in the doc edit system whenever a merge from it happens. 
Therefore, if one wanted their edits to be persistent, they had to submit it through the doc edit system. Note, much of my animosity towards the doc edit system was due to issues with the scipy.org being so sluggish back then, and the length of time it took for any edits to finally make it down to the docstrings. Now that scipy.org is much more responsive, and that numpy and scipy has moved on to git, perhaps those two issues are gone now? Sorry for hijacking the thread, this is just the first I am hearing that one can submit documentation edits via PRs and was surprised. Cheers! Ben Root On Thu, Jul 18, 2013 at 1:51 PM, Pauli Virtanen wrote: > 18.07.2013 20:18, Benjamin Root kirjoitti: > > Forgive my ignorance, but has numpy and scipy stopped doing that weird > > doc editing thing that existed back in the days of Trac? I have actually > > held back on submitting doc edits because I hated using that thing so > much. > > You were never required to use it. > > -- > Pauli Virtanen > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Thu Jul 18 14:21:19 2013 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 18 Jul 2013 21:21:19 +0300 Subject: [Numpy-discussion] azip In-Reply-To: References: <51E81E6F.8010302@gmail.com> <51E820B3.8000208@gmail.com> Message-ID: 18.07.2013 21:11, Benjamin Root kirjoitti: > Well, that's nice to know now. However, I distinctly remember being told > that any changes made to the docstrings directly in the source would end > up getting replaced by whatever was in the doc edit system whenever a > merge from it happens. Therefore, if one wanted their edits to be > persistent, they had to submit it through the doc edit system. I think there must have been some misunderstanding here: the doc editor works similarly to VCS, in that it will detect merge conflicts and require someone to manually resolve conflicts if the docstring in the source code has been changed. -- Pauli Virtanen From smortaz at exchange.microsoft.com Thu Jul 18 18:06:40 2013 From: smortaz at exchange.microsoft.com (Shahrokh Mortazavi) Date: Thu, 18 Jul 2013 22:06:40 +0000 Subject: [Numpy-discussion] Mixed Python + C debugging support in Visual Studio Message-ID: Hi folks, 1st time poster - apologies if I'm breaking any protocols... We were told that this would be a good alias to announce this on: a few Python & OSS enthusiasts and Microsoft have created a plug-in for Visual Studio that enables Python <-> C/C++ debugging. You may find this useful for debugging your extension modules. A quick video overview of the mixed mode debugging feature: http://www.youtube.com/watch?v=wvJaKQ94lBY&hd=1 (HD) Documentation: https://pytools.codeplex.com/wikipage?title=Mixed-mode%20debugging Python Tools for Visual Studio is free (and OSS): http://pytools.codeplex.com Cheers, s -------------- next part -------------- An HTML attachment was scrubbed... URL: From rob.clewley at gmail.com Thu Jul 18 18:21:24 2013 From: rob.clewley at gmail.com (Rob Clewley) Date: Thu, 18 Jul 2013 18:21:24 -0400 Subject: [Numpy-discussion] User Guide In-Reply-To: <51E7EB43.7010904@gmail.com> References: <51E7EB43.7010904@gmail.com> Message-ID: Hi, I see the desire for stylistic improvement by removing the awkward parens but your correction has incorrect grammar. 
One cannot have "arrays of Python," nor are Numpy objects a subset of "Python" (because Python is not a set) -- both of which are what your sentence technically states. I.e., the commas are in the wrong place. You could say "The exception: one can have arrays of python objects (including those from numpy) thereby allowing for arrays of different sized elements." but I think it is even clear to just unpack this a bit more with "The exception: one can have arrays of python objects, including numpy objects, which allows arrays to contain different sized elements." In my experience, attempting to be extremely concise in technical writing is a common cause of awkward grammar problems like this. I do it all the time :) -Rob On Thu, Jul 18, 2013 at 9:18 AM, Colin J. Williams wrote: > Returning to numpy after a while away, I'm impressed with the style and > content of the User Guide and the Reference. This is to offer a Guide > correction - I couldn't figure out how to offer the correction on-line. > > What is Numpy? > > > Suggest: > > "The exception: one can have arrays of (Python, including NumPy) objects, > thereby allowing for arrays of different sized elements." > > to: > > The exception: one can have arrays of Python, including NumPy objects, > thereby allowing for arrays of different sized elements. > > Colin W. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Robert Clewley, Ph.D. Assistant Professor Neuroscience Institute and Department of Mathematics and Statistics Georgia State University PO Box 5030 Atlanta, GA 30302, USA tel: 404-413-6420 fax: 404-413-5446 http://neuroscience.gsu.edu/rclewley.html From lists at onerussian.com Thu Jul 18 22:49:20 2013 From: lists at onerussian.com (Yaroslav Halchenko) Date: Thu, 18 Jul 2013 22:49:20 -0400 Subject: [Numpy-discussion] the mean, var, std of non-arrays Message-ID: <20130719024920.GO27621@onerussian.com> Hi everyone, Some of my elderly code stopped working upon upgrades of numpy and upcoming pandas: https://github.com/pydata/pandas/issues/4290 so I have looked at the code of 2481 def mean(a, axis=None, dtype=None, out=None, keepdims=False): 2482 """ ... 2489 Parameters 2490 ---------- 2491 a : array_like 2492 Array containing numbers whose mean is desired. If `a` is not an 2493 array, a conversion is attempted. ... 2555 """ 2556 if type(a) is not mu.ndarray: 2557 try: 2558 mean = a.mean 2559 return mean(axis=axis, dtype=dtype, out=out) 2560 except AttributeError: 2561 pass 2562 2563 return _methods._mean(a, axis=axis, dtype=dtype, 2564 out=out, keepdims=keepdims) here 'array_like'ness is checked by a having mean function. Then it is assumed that it has the same definition as ndarray, including dtype keyword argument. Not sure anyways if my direct numpy.mean application to pandas DataFrame is "kosher" -- initially I just assumed that any argument is asanyarray'ed first -- but I think here catching TypeError for those incompatible .mean's would not hurt either. What do you think? Similar logic applies to mean cousins (var, std, ...?) decorated around _methods implementations. -- Yaroslav O. Halchenko, Ph.D. http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org Senior Research Associate, Psychological and Brain Sciences Dept. 
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik From jsseabold at gmail.com Thu Jul 18 23:18:49 2013 From: jsseabold at gmail.com (Skipper Seabold) Date: Thu, 18 Jul 2013 23:18:49 -0400 Subject: [Numpy-discussion] the mean, var, std of non-arrays In-Reply-To: <20130719024920.GO27621@onerussian.com> References: <20130719024920.GO27621@onerussian.com> Message-ID: On Thu, Jul 18, 2013 at 10:49 PM, Yaroslav Halchenko wrote: > Hi everyone, > > Some of my elderly code stopped working upon upgrades of numpy and > upcoming pandas: https://github.com/pydata/pandas/issues/4290 so I have > looked at the code of > > 2481 def mean(a, axis=None, dtype=None, out=None, keepdims=False): > 2482 """ > ... > 2489 Parameters > 2490 ---------- > 2491 a : array_like > 2492 Array containing numbers whose mean is desired. If `a` is > not an > 2493 array, a conversion is attempted. > ... > 2555 """ > 2556 if type(a) is not mu.ndarray: > 2557 try: > 2558 mean = a.mean > 2559 return mean(axis=axis, dtype=dtype, out=out) > 2560 except AttributeError: > 2561 pass > 2562 > 2563 return _methods._mean(a, axis=axis, dtype=dtype, > 2564 out=out, keepdims=keepdims) > > here 'array_like'ness is checked by a having mean function. Then it is > assumed > that it has the same definition as ndarray, including dtype keyword > argument. > > Not sure anyways if my direct numpy.mean application to pandas DataFrame is > "kosher" -- initially I just assumed that any argument is asanyarray'ed > first > -- but I think here catching TypeError for those incompatible .mean's > would not > hurt either. What do you think? Similar logic applies to mean cousins > (var, > std, ...?) decorated around _methods implementations. Related? From a while ago. https://github.com/numpy/numpy/pull/160 Skipper -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at onerussian.com Thu Jul 18 23:24:42 2013 From: lists at onerussian.com (Yaroslav Halchenko) Date: Thu, 18 Jul 2013 23:24:42 -0400 Subject: [Numpy-discussion] the mean, var, std of non-arrays In-Reply-To: References: <20130719024920.GO27621@onerussian.com> Message-ID: <20130719032442.GP27621@onerussian.com> On Thu, 18 Jul 2013, Skipper Seabold wrote: > Not sure anyways if my direct numpy.mean application to pandas DataFrame > is > "kosher" -- initially I just assumed that any argument is asanyarray'ed > first > -- but I think here catching TypeError for those incompatible .mean's > would not > hurt either. ?What do you think? ?Similar logic applies to mean cousins > (var, > std, ...?) decorated around _methods implementations. > Related? From a while ago. > [3]https://github.com/numpy/numpy/pull/160 yeah... That is how I thought "it is working", but I guess it was left without asanyarraying for additional flexibility/performance so any array-like object could be used, not just ndarray derived classes. -- Yaroslav O. Halchenko, Ph.D. http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org Senior Research Associate, Psychological and Brain Sciences Dept. 
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik From charlesr.harris at gmail.com Fri Jul 19 00:12:45 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 18 Jul 2013 22:12:45 -0600 Subject: [Numpy-discussion] the mean, var, std of non-arrays In-Reply-To: <20130719032442.GP27621@onerussian.com> References: <20130719024920.GO27621@onerussian.com> <20130719032442.GP27621@onerussian.com> Message-ID: On Thu, Jul 18, 2013 at 9:24 PM, Yaroslav Halchenko wrote: > > On Thu, 18 Jul 2013, Skipper Seabold wrote: > > > Not sure anyways if my direct numpy.mean application to pandas > DataFrame > > is > > "kosher" -- initially I just assumed that any argument is > asanyarray'ed > > first > > -- but I think here catching TypeError for those incompatible > .mean's > > would not > > hurt either. ?What do you think? ?Similar logic applies to mean > cousins > > (var, > > std, ...?) decorated around _methods implementations. > > > Related? From a while ago. > > [3]https://github.com/numpy/numpy/pull/160 > > yeah... That is how I thought "it is working", but I guess it was left > without asanyarraying for additional flexibility/performance so any > array-like object could be used, not just ndarray derived classes. > Speaking of which, there is a PR for nan{mean, var, std) that you might want to check before it gets committed. There might be some modifications that you would want to add. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Fri Jul 19 04:20:49 2013 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Fri, 19 Jul 2013 10:20:49 +0200 Subject: [Numpy-discussion] User Guide In-Reply-To: References: <51E7EB43.7010904@gmail.com> Message-ID: On Fri, Jul 19, 2013 at 12:21 AM, Rob Clewley wrote: > "The exception: one can have arrays of python objects, including numpy > objects, which allows arrays to contain different sized elements." What are numpy objects? "numpy objects" -> "numpy ndarrays" or "numpy ndarray objects"? St?fan From njs at pobox.com Fri Jul 19 11:14:27 2013 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 19 Jul 2013 16:14:27 +0100 Subject: [Numpy-discussion] Bringing order to higher dimensional operations In-Reply-To: <1374153819.14751.17.camel@sebastian-laptop> References: <1374153819.14751.17.camel@sebastian-laptop> Message-ID: On Thu, Jul 18, 2013 at 2:23 PM, Sebastian Berg wrote: > On Thu, 2013-07-18 at 13:52 +0100, Nathaniel Smith wrote: >> Hi all, >> > >> >> So: >> >> QUESTION 1: does that sound right: that in a perfect world, the >> current gufunc convention would be the only one, and that's what we >> should work towards, at least in the cases where that's possible? >> > > Sounds right to me, ufunc/gufunc broadcasting assumes the "inner" > dimensions are the right-most. Since we are normally in C-order arrays, > this also seems the sensible way if you consider the memory layout. > >> QUESTION 2: Assuming that's right, it would be *really nice* if we >> could at least get np.dot onto our new convention, for consistency >> with the rest of np.linalg, and to allow it to be overridden. I'm sort >> of terrified to touch np.dot's API, but the only cases where it would >> act differently is when *both* arguments have *3 or more dimensions*, >> and I guess there are very very few users who fall into that category. 
>> So maybe we could start raising some big FutureWarnings for this case >> in the next release, and eventually switch? >> > > It is noble to try to get do to use the gufunc convention, but if you > look at the new gufunc linalg functions, they already have to have some > weird tricks in the case of np.linalg.solve. > It is so difficult because of the fact that dot is basically a > combination of many functions: > o vector * vector -> vector > o vector * matrix -> matrix (add dimensions to vector on right) > o matrix * vector -> matrix (add dimensions to vector on left) > o matrix * matrix -> matrix > plus scalar cases. Oh ugh, I forgot about all those special cases. > I somewhat believe we should not touch dot, or deprecate anything but > the most basic dot functionality. Then we can point to matrix_multiply, > inner1d, etc. which are gufuncs (even if they are not exposed at this > time). The whole dance that is already done for np.linalg.solve right > now is not pretty there, and it will be worse for dot. Because dot is > basically overloaded, marrying it with the broadcasting machinery in a > general way is impossible. While it would be kind of nice if we could eventually make isinstance(np.dot, np.ufunc) be True, I'm not so bothered if it remains a wrapper around gufuncs (like the np.linalg wrappers currently are). Most of these special cases, while ugly, can be handled perfectly well by this sort of mechanism. What I'm most bothered about is pseudo-outer case: np.dot(array with ndim >2, array with ndim >2) This simply *can't* be emulated with a gufunc. And as long as that's true it's going to be very hard to get 'dot' to play along with the general ufunc machinery. So that's specifically the case I was talking about in question 2. >> (I'm even more terrified of trying to mess with np.sum or >> np.add.reduce, so I'll leave that alone for now -- maybe we're just >> stuck with them. And at least they do already go through the ufunc >> machinery.) > > I did not understand where the inconsistency/problem for the reductions > is. What I mean is: Suppose we wrote a gufunc for 'sum', where the intrinsic operation took a vector and returned a scalar. (E.g. we want to implement one of the specialized algorithms for vector summation, like Kahan summation, which can be more accurate than applying scalar addition repeatedly.) Then we'd have: np.sum(ones((2, 3))).shape == () np.add.reduce(ones((2, 3))).shape == (3,) gufunc_sum(ones((2, 3))).shape == (2,) These are three names for exactly the same underlying function... but they all have different defaults for how they vectorize. -n From njs at pobox.com Fri Jul 19 11:31:31 2013 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 19 Jul 2013 16:31:31 +0100 Subject: [Numpy-discussion] Bringing order to higher dimensional operations In-Reply-To: <1374153819.14751.17.camel@sebastian-laptop> References: <1374153819.14751.17.camel@sebastian-laptop> Message-ID: On Thu, Jul 18, 2013 at 2:23 PM, Sebastian Berg wrote: > It is so difficult because of the fact that dot is basically a > combination of many functions: > o vector * vector -> vector > o vector * matrix -> matrix (add dimensions to vector on right) > o matrix * vector -> matrix (add dimensions to vector on left) > o matrix * matrix -> matrix > plus scalar cases. Though, just throwing this out there for the archives since I was thinking about it... 
I think we *could* consolidate all dot's functionality into a single gufunc, with a few small changes: 1) Deprecate and get rid of the scalar special cases. (For those following along: right now, np.dot(10, array) does scalar multiplication, but this doesn't make much sense conceptually, it's not documented, and I don't think anyone uses it. Except maybe np.matrix.__mul__, but that could be fixed.) 2) Deprecate the strange "broadcasting" behaviour for high-dimensional inputs, in favor of the gufunc version suggested in the previous email. That leaves the vector * vector, vector * matrix, matrix * vector, matrix * matrix cases. To handle these: 3) Extend the gufunc machinery to understand the idea that some core dimensions are allowed to take on a special "nonexistent" size. So the signature for dot would be: (m*,k) x (k, n*) -> (m*, n*) where '*' denotes dimensions who are allowed to take on the "nonexistent" size if necessary. So dot(ones((2, 3)), ones((3, 4))) would have m = 2 k = 3 n = 4 and produce an output with shape (m, n) = (2, 4). But dot(ones((2, 3)), ones((3,))) would have m = 2 k = 3 n = and produce an output with shape (m, n) = (2, ) = (2,). And dot(ones((3,)), ones((3,))) would have m = k = 3 n = and produce an output with shape (m, n) = (, ) = (), i.e., dot(vector, vector) would return a scalar. I'm not sure if there are any other cases where this would be useful, but even if it were just for 'dot', that's still a pretty important case that might justify the mechanism all on its own. -n From stefan at sun.ac.za Fri Jul 19 11:32:11 2013 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Fri, 19 Jul 2013 17:32:11 +0200 Subject: [Numpy-discussion] Bringing order to higher dimensional operations In-Reply-To: References: Message-ID: On Thu, Jul 18, 2013 at 2:52 PM, Nathaniel Smith wrote: > Compare: > gu_dot_leftwards(ones((10, 11, 4)), ones((11, 12, 3, 4))) -> (10, 12, 3, 4) > versus > gu_dot_rightwards(ones((4, 10, 11)), ones((3, 4, 11, 12))) -> (3, 4, 10, 12) The second makes quite a bit more sense to me, and fits with the current way we match broadcasting dimensions (align to the right, match right to left). The np.dot outer example you gave and other exceptions like that will probably give us headaches in the future, so I'd opt for moving away from them. The way ellipses are broadcast, well that's a battle for another day. St?fan From stefan at sun.ac.za Fri Jul 19 11:36:56 2013 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Fri, 19 Jul 2013 17:36:56 +0200 Subject: [Numpy-discussion] Bringing order to higher dimensional operations In-Reply-To: References: <1374153819.14751.17.camel@sebastian-laptop> Message-ID: On Fri, Jul 19, 2013 at 5:31 PM, Nathaniel Smith wrote: > 3) Extend the gufunc machinery to understand the idea that some core > dimensions are allowed to take on a special "nonexistent" size. So the > signature for dot would be: > (m*,k) x (k, n*) -> (m*, n*) > where '*' denotes dimensions who are allowed to take on the > "nonexistent" size if necessary. So dot(ones((2, 3)), ones((3, 4))) > would have > m = 2 > k = 3 > n = 4 > and produce an output with shape (m, n) = (2, 4). But dot(ones((2, > 3)), ones((3,))) would have > m = 2 > k = 3 > n = > and produce an output with shape (m, n) = (2, ) = (2,). And > dot(ones((3,)), ones((3,))) would have > m = > k = 3 > n = > and produce an output with shape (m, n) = (, ) = (), > i.e., dot(vector, vector) would return a scalar. 
This looks like a fairly clean solution; it could be implemented in a shape pre- and post-processing step, where we pad the array dimensions to match the full signature, and remove it again afterwards. St?fan From sebastian at sipsolutions.net Fri Jul 19 12:05:00 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 19 Jul 2013 18:05:00 +0200 Subject: [Numpy-discussion] Bringing order to higher dimensional operations In-Reply-To: References: <1374153819.14751.17.camel@sebastian-laptop> Message-ID: <1374249900.3254.28.camel@sebastian-laptop> On Fri, 2013-07-19 at 16:31 +0100, Nathaniel Smith wrote: > On Thu, Jul 18, 2013 at 2:23 PM, Sebastian Berg > wrote: > > It is so difficult because of the fact that dot is basically a > > combination of many functions: > > o vector * vector -> vector > > o vector * matrix -> matrix (add dimensions to vector on right) > > o matrix * vector -> matrix (add dimensions to vector on left) > > o matrix * matrix -> matrix > > plus scalar cases. > > Though, just throwing this out there for the archives since I was > thinking about it... > > I think we *could* consolidate all dot's functionality into a single > gufunc, with a few small changes: > > 1) Deprecate and get rid of the scalar special cases. (For those > following along: right now, np.dot(10, array) does scalar > multiplication, but this doesn't make much sense conceptually, it's > not documented, and I don't think anyone uses it. Except maybe > np.matrix.__mul__, but that could be fixed.) > > 2) Deprecate the strange "broadcasting" behaviour for high-dimensional > inputs, in favor of the gufunc version suggested in the previous > email. > > That leaves the vector * vector, vector * matrix, matrix * vector, > matrix * matrix cases. To handle these: > > 3) Extend the gufunc machinery to understand the idea that some core > dimensions are allowed to take on a special "nonexistent" size. So the > signature for dot would be: > (m*,k) x (k, n*) -> (m*, n*) > where '*' denotes dimensions who are allowed to take on the > "nonexistent" size if necessary. So dot(ones((2, 3)), ones((3, 4))) > would have > m = 2 > k = 3 > n = 4 > and produce an output with shape (m, n) = (2, 4). But dot(ones((2, > 3)), ones((3,))) would have > m = 2 > k = 3 > n = > and produce an output with shape (m, n) = (2, ) = (2,). And > dot(ones((3,)), ones((3,))) would have > m = > k = 3 > n = > and produce an output with shape (m, n) = (, ) = (), > i.e., dot(vector, vector) would return a scalar. > > I'm not sure if there are any other cases where this would be useful, > but even if it were just for 'dot', that's still a pretty important > case that might justify the mechanism all on its own. > Yeah this would work. It is basically what np.linalg.solve currently does in the preparation step. So maybe this is not that bad implemented in the machinery. The logic itself is pretty simple after all. Though it would be one of those features I would probably not want to see used a lot ;). 
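For the archives, the wrapper-level version of that pad-and-strip dance is only a handful of
lines of pure Python. Below is just a sketch of the shape handling being discussed -- the
helper name is made up, and np.einsum merely stands in for the core matrix-matrix gufunc, so
this is not how the machinery itself would implement it:

import numpy as np

def dot_padded(a, b):
    # emulate a hypothetical "(m*,k),(k,n*)->(m*,n*)" signature by padding
    # missing core dimensions to length 1 and stripping them off afterwards
    a = np.asanyarray(a)
    b = np.asanyarray(b)
    a_vec = (a.ndim == 1)
    b_vec = (b.ndim == 1)
    if a_vec:
        a = a[np.newaxis, :]      # (k,) -> (1, k): m is "nonexistent"
    if b_vec:
        b = b[:, np.newaxis]      # (k,) -> (k, 1): n is "nonexistent"
    out = np.einsum('...ij,...jk->...ik', a, b)   # stand-in for the core loop
    if a_vec:
        out = out[..., 0, :]      # strip the padded m axis again
    if b_vec:
        out = out[..., 0]         # strip the padded n axis again
    return out

print(dot_padded(np.ones(3), np.ones(3)).shape)                  # ()
print(dot_padded(np.ones((2, 3)), np.ones(3)).shape)             # (2,)
print(dot_padded(np.ones(3), np.ones((3, 4))).shape)             # (4,)
print(dot_padded(np.ones((5, 2, 3)), np.ones((5, 3, 4))).shape)  # (5, 2, 4)

(The vector-vector case comes back as a 0-d array rather than a true scalar, but the point
here is only the shape bookkeeping.)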
- Sebastian > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From sebastian at sipsolutions.net Fri Jul 19 12:10:34 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 19 Jul 2013 18:10:34 +0200 Subject: [Numpy-discussion] Bringing order to higher dimensional operations In-Reply-To: References: <1374153819.14751.17.camel@sebastian-laptop> Message-ID: <1374250234.3254.33.camel@sebastian-laptop> On Fri, 2013-07-19 at 16:14 +0100, Nathaniel Smith wrote: > On Thu, Jul 18, 2013 at 2:23 PM, Sebastian Berg > wrote: > > On Thu, 2013-07-18 at 13:52 +0100, Nathaniel Smith wrote: > >> Hi all, > >> > > What I mean is: Suppose we wrote a gufunc for 'sum', where the > intrinsic operation took a vector and returned a scalar. (E.g. we want > to implement one of the specialized algorithms for vector summation, > like Kahan summation, which can be more accurate than applying scalar > addition repeatedly.) > > Then we'd have: > > np.sum(ones((2, 3))).shape == () > np.add.reduce(ones((2, 3))).shape == (3,) > gufunc_sum(ones((2, 3))).shape == (2,) > Ah, indeed! So we have a different default behaviour for ufunc.reduce and all other reduce-like functions, didn't realize that. Changing that would be one huge thing... As to implementing such thing as a Kahan summation, it is true, I also can't see how it fits into the machinery. Maybe it shouldn't even be a gufunc, but we rather need a way to specialize the reduction, or tag on more information into the ufunc itself? - Sebastian > These are three names for exactly the same underlying function... but > they all have different defaults for how they vectorize. > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From cjwilliams43 at gmail.com Fri Jul 19 13:16:52 2013 From: cjwilliams43 at gmail.com (Colin J. Williams) Date: Fri, 19 Jul 2013 13:16:52 -0400 Subject: [Numpy-discussion] NumPy-Discussion Digest, Vol 82, Issue 34 In-Reply-To: References: Message-ID: <51E97484.1020001@gmail.com> An HTML attachment was scrubbed... URL: From lists at onerussian.com Fri Jul 19 13:04:12 2013 From: lists at onerussian.com (Yaroslav Halchenko) Date: Fri, 19 Jul 2013 13:04:12 -0400 Subject: [Numpy-discussion] the mean, var, std of non-arrays In-Reply-To: References: <20130719024920.GO27621@onerussian.com> <20130719032442.GP27621@onerussian.com> Message-ID: <20130719170412.GQ27621@onerussian.com> On Thu, 18 Jul 2013, Charles R Harris wrote: > yeah... ?That is how I thought "it is working", but I guess it was left > without asanyarraying for additional flexibility/performance so any > array-like object could be used, not just ndarray derived classes. > Speaking of which, there is a PR for [3]nan{mean, var, std)? that you > might want to check before it gets committed. There might be some > modifications that you would want to add. well -- the only modifications to non-nan mean was a docstring's see also. there though input is explicitly converted/copied to ndarray so no custom .mean() functions would be called, thus issue a bit orthogonal as far as I see -- Yaroslav O. Halchenko, Ph.D. http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org Senior Research Associate, Psychological and Brain Sciences Dept. 
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik From grb at skogoglandskap.no Fri Jul 19 14:23:00 2013 From: grb at skogoglandskap.no (Graeme B. Bell) Date: Fri, 19 Jul 2013 18:23:00 +0000 Subject: [Numpy-discussion] simpleselect v2.0 Message-ID: Hi all, I've just released version 2.0 of simple select. In brief this is a drop-in replacement for numpy.select with the following qualities: - Faster! (benchmarks 2-5x faster than numpy.select depending on use case and faster than v1.0 simpleselect) - Full broadcasting. - All bugs in numpy.select fixed, tested. - Better documented code. - Improvements to the test harness. I'm also submitting this as a pull request to the main numpy distribution, since it now covers all the functionality of numpy.select, but is faster and fixed long-standing bugs. I've tested with the numpy runtests.py as well as a test harness and unit tests of my own. Hope it's useful. Drop me a mail if you have any feedback. Have a nice weekend all, Graeme Bell From grb at skogoglandskap.no Fri Jul 19 14:23:52 2013 From: grb at skogoglandskap.no (Graeme B. Bell) Date: Fri, 19 Jul 2013 18:23:52 +0000 Subject: [Numpy-discussion] simpleselect v2.0 In-Reply-To: References: Message-ID: URL: https://github.com/gbb/numpy-simple-select > I've just released version 2.0 of simple select. In brief this is a drop-in replacement for numpy.select with the following qualities: > > - Faster! (benchmarks 2-5x faster than numpy.select depending on use case and faster than v1.0 simpleselect) > - Full broadcasting. > - All bugs in numpy.select fixed, tested. > - Better documented code. > - Improvements to the test harness. From lists at onerussian.com Fri Jul 19 18:07:54 2013 From: lists at onerussian.com (Yaroslav Halchenko) Date: Fri, 19 Jul 2013 18:07:54 -0400 Subject: [Numpy-discussion] fresh performance hits: numpy.linalg.pinv >30% slowdown In-Reply-To: <20130717035348.GN27621@onerussian.com> References: <20130506143241.GV5140@onerussian.com> <1367856232.2506.31.camel@sebastian-laptop> <20130506161153.GW5140@onerussian.com> <1367927238.23010.12.camel@sebastian-laptop> <20130701193006.GC27621@onerussian.com> <20130701215804.GG27621@onerussian.com> <20130709161007.GL27621@onerussian.com> <20130717035348.GN27621@onerussian.com> Message-ID: <20130719220754.GR27621@onerussian.com> I have just added a few more benchmarks, and here they come http://www.onerussian.com/tmp/numpy-vbench/vb_vb_linalg.html#numpy-linalg-pinv-a-float32 it seems to be very recent so my only check based on 10 commits didn't pick it up yet so they are not present in the summary table. could well be related to 80% faster det()? ;) norm was hit as well a bit earlier, might well be within these commits: https://github.com/numpy/numpy/compare/24a0aa5...29dcc54 I will rerun now benchmarking for the rest of commits (was running last in the day iirc) Cheers, On Tue, 16 Jul 2013, Yaroslav Halchenko wrote: > and to put so far reported findings into some kind of automated form, > please welcome > http://www.onerussian.com/tmp/numpy-vbench/#benchmarks-performance-analysis > This is based on a simple 1-way anova of last 10 commits and some point > in the past where 10 other commits had smallest timing and were significantly > different from the last 10 commits. 
> "Possible recent" is probably too noisy and not sure if useful -- it should > point to a closest in time (to the latest commits) diff where a > significant excursion from current performance was detected. So per se it has > nothing to do with the initial detected performance hit, but in some cases > seems still to reasonably locate commits hitting on performance. > Enjoy, -- Yaroslav O. Halchenko, Ph.D. http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org Senior Research Associate, Psychological and Brain Sciences Dept. Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik From njs at pobox.com Fri Jul 19 18:38:14 2013 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 19 Jul 2013 23:38:14 +0100 Subject: [Numpy-discussion] fresh performance hits: numpy.linalg.pinv >30% slowdown In-Reply-To: <20130719220754.GR27621@onerussian.com> References: <20130506143241.GV5140@onerussian.com> <1367856232.2506.31.camel@sebastian-laptop> <20130506161153.GW5140@onerussian.com> <1367927238.23010.12.camel@sebastian-laptop> <20130701193006.GC27621@onerussian.com> <20130701215804.GG27621@onerussian.com> <20130709161007.GL27621@onerussian.com> <20130717035348.GN27621@onerussian.com> <20130719220754.GR27621@onerussian.com> Message-ID: The biggest ~recent change in master's linalg was the switch to gufunc back ends - you might want to check for that event in your commit log. On 19 Jul 2013 23:08, "Yaroslav Halchenko" wrote: > I have just added a few more benchmarks, and here they come > > http://www.onerussian.com/tmp/numpy-vbench/vb_vb_linalg.html#numpy-linalg-pinv-a-float32 > it seems to be very recent so my only check based on 10 commits > didn't pick it up yet so they are not present in the summary table. > > could well be related to 80% faster det()? ;) > > norm was hit as well a bit earlier, might well be within these commits: > https://github.com/numpy/numpy/compare/24a0aa5...29dcc54 > I will rerun now benchmarking for the rest of commits (was running last > in the day iirc) > > Cheers, > > On Tue, 16 Jul 2013, Yaroslav Halchenko wrote: > > > and to put so far reported findings into some kind of automated form, > > please welcome > > > > http://www.onerussian.com/tmp/numpy-vbench/#benchmarks-performance-analysis > > > This is based on a simple 1-way anova of last 10 commits and some point > > in the past where 10 other commits had smallest timing and were > significantly > > different from the last 10 commits. > > > "Possible recent" is probably too noisy and not sure if useful -- it > should > > point to a closest in time (to the latest commits) diff where a > > significant excursion from current performance was detected. So per se > it has > > nothing to do with the initial detected performance hit, but in some > cases > > seems still to reasonably locate commits hitting on performance. > > > Enjoy, > -- > Yaroslav O. Halchenko, Ph.D. > http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org > Senior Research Associate, Psychological and Brain Sciences Dept. > Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 > Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 > WWW: http://www.linkedin.com/in/yarik > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From warren.weckesser at gmail.com Fri Jul 19 21:27:40 2013 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Fri, 19 Jul 2013 21:27:40 -0400 Subject: [Numpy-discussion] fresh performance hits: numpy.linalg.pinv >30% slowdown In-Reply-To: <20130719220754.GR27621@onerussian.com> References: <20130506143241.GV5140@onerussian.com> <1367856232.2506.31.camel@sebastian-laptop> <20130506161153.GW5140@onerussian.com> <1367927238.23010.12.camel@sebastian-laptop> <20130701193006.GC27621@onerussian.com> <20130701215804.GG27621@onerussian.com> <20130709161007.GL27621@onerussian.com> <20130717035348.GN27621@onerussian.com> <20130719220754.GR27621@onerussian.com> Message-ID: On 7/19/13, Yaroslav Halchenko wrote: > I have just added a few more benchmarks, and here they come > http://www.onerussian.com/tmp/numpy-vbench/vb_vb_linalg.html#numpy-linalg-pinv-a-float32 > it seems to be very recent so my only check based on 10 commits > didn't pick it up yet so they are not present in the summary table. > > could well be related to 80% faster det()? ;) > > norm was hit as well a bit earlier, Well, this is embarrassing: https://github.com/numpy/numpy/pull/3539 Thanks for benchmarks! I'm now an even bigger fan. :) Warren might well be within these commits: > https://github.com/numpy/numpy/compare/24a0aa5...29dcc54 > I will rerun now benchmarking for the rest of commits (was running last > in the day iirc) > > Cheers, > > On Tue, 16 Jul 2013, Yaroslav Halchenko wrote: > >> and to put so far reported findings into some kind of automated form, >> please welcome > >> http://www.onerussian.com/tmp/numpy-vbench/#benchmarks-performance-analysis > >> This is based on a simple 1-way anova of last 10 commits and some point >> in the past where 10 other commits had smallest timing and were >> significantly >> different from the last 10 commits. > >> "Possible recent" is probably too noisy and not sure if useful -- it >> should >> point to a closest in time (to the latest commits) diff where a >> significant excursion from current performance was detected. So per se it >> has >> nothing to do with the initial detected performance hit, but in some >> cases >> seems still to reasonably locate commits hitting on performance. > >> Enjoy, > -- > Yaroslav O. Halchenko, Ph.D. > http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org > Senior Research Associate, Psychological and Brain Sciences Dept. > Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 > Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 > WWW: http://www.linkedin.com/in/yarik > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Sat Jul 20 01:44:18 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 20 Jul 2013 01:44:18 -0400 Subject: [Numpy-discussion] Bringing order to higher dimensional operations In-Reply-To: <1374250234.3254.33.camel@sebastian-laptop> References: <1374153819.14751.17.camel@sebastian-laptop> <1374250234.3254.33.camel@sebastian-laptop> Message-ID: On Fri, Jul 19, 2013 at 12:10 PM, Sebastian Berg wrote: > On Fri, 2013-07-19 at 16:14 +0100, Nathaniel Smith wrote: >> On Thu, Jul 18, 2013 at 2:23 PM, Sebastian Berg >> wrote: >> > On Thu, 2013-07-18 at 13:52 +0100, Nathaniel Smith wrote: >> >> Hi all, >> >> > >> >> What I mean is: Suppose we wrote a gufunc for 'sum', where the >> intrinsic operation took a vector and returned a scalar. 
(E.g. we want >> to implement one of the specialized algorithms for vector summation, >> like Kahan summation, which can be more accurate than applying scalar >> addition repeatedly.) >> >> Then we'd have: >> >> np.sum(ones((2, 3))).shape == () >> np.add.reduce(ones((2, 3))).shape == (3,) >> gufunc_sum(ones((2, 3))).shape == (2,) >> > > Ah, indeed! So we have a different default behaviour for ufunc.reduce > and all other reduce-like functions, didn't realize that. Changing that > would be one huge thing... I thought reduce, accumulate and reduceat (and map in python) are functions on iterators, and numpy still uses axis=0 to iterate over. related: is there any advantage to np.add.reduce? I find it more difficult to read than sum() and still see it used sometimes. (dot with more than 3 dimension is weird, and I never found a use for it.) Josef > As to implementing such thing as a Kahan summation, it is true, I also > can't see how it fits into the machinery. Maybe it shouldn't even be a > gufunc, but we rather need a way to specialize the reduction, or tag on > more information into the ufunc itself? > > - Sebastian > >> These are three names for exactly the same underlying function... but >> they all have different defaults for how they vectorize. >> >> -n >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ralf.gommers at gmail.com Sat Jul 20 06:36:57 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 20 Jul 2013 12:36:57 +0200 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> Message-ID: On Thu, Jul 18, 2013 at 4:18 PM, St?fan van der Walt wrote: > On Sat, Jul 13, 2013 at 7:46 PM, Nathaniel Smith wrote: > > Why not just write > > > > def H(a): > > return a.conj().T > > It's hard to convince students that this is the Best Way of doing > things in NumPy. Why, they ask, can you do it using a' in MATLAB, > then? > > I've tripped over this one before, since it's not the kind of thing > you imagine would be unimplemented, and then spend some time trying to > find it. > +1 for adding a H attribute. Here's the end of the old discussion Chuck referred to: http://thread.gmane.org/gmane.comp.python.numeric.general/6637. No strong arguments against and then several more votes in favor. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Jul 20 06:58:08 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 20 Jul 2013 12:58:08 +0200 Subject: [Numpy-discussion] subtypes of ndarray and round() In-Reply-To: <51D5D058.9090502@gmail.com> References: <51D5D058.9090502@gmail.com> Message-ID: On Thu, Jul 4, 2013 at 9:43 PM, Matti Picus wrote: > round() does not consistently preserve subtype of the ndarray, > is this known behaviour or should I file a bug for it? > That looks like a bug to me. The docstring explicitly says that return type equals input type. Ralf > Python 2.7.3 (default, Sep 26 2012, 21:51:14) > [GCC 4.7.2] on linux2 > Type "help", "copyright", "credits" or "license" for more information. 
> >>> import numpy as np > >>> np.version.version > '1.7.0' > >>> a=np.matrix(range(10)) > >>> a.round(decimals=10) > matrix([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]) > >>> a.round(decimals=-10) > array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]) > > Matti > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Sat Jul 20 09:28:11 2013 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 20 Jul 2013 16:28:11 +0300 Subject: [Numpy-discussion] fresh performance hits: numpy.linalg.pinv >30% slowdown In-Reply-To: References: <20130506143241.GV5140@onerussian.com> <1367856232.2506.31.camel@sebastian-laptop> <20130506161153.GW5140@onerussian.com> <1367927238.23010.12.camel@sebastian-laptop> <20130701193006.GC27621@onerussian.com> <20130701215804.GG27621@onerussian.com> <20130709161007.GL27621@onerussian.com> <20130717035348.GN27621@onerussian.com> <20130719220754.GR27621@onerussian.com> Message-ID: 20.07.2013 01:38, Nathaniel Smith kirjoitti: > The biggest ~recent change in master's linalg was the switch to gufunc > back ends - you might want to check for that event in your commit log. That was in mid-April, which doesn't match with the location of the uptick in the graph. Pauli From seb.haase at gmail.com Sat Jul 20 09:30:48 2013 From: seb.haase at gmail.com (Sebastian Haase) Date: Sat, 20 Jul 2013 15:30:48 +0200 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> Message-ID: On Sat, Jul 20, 2013 at 12:36 PM, Ralf Gommers wrote: > > > > > On Thu, Jul 18, 2013 at 4:18 PM, St?fan van der Walt wrote: >> >> On Sat, Jul 13, 2013 at 7:46 PM, Nathaniel Smith wrote: >> > Why not just write >> > >> > def H(a): >> > return a.conj().T >> >> It's hard to convince students that this is the Best Way of doing >> things in NumPy. Why, they ask, can you do it using a' in MATLAB, >> then? >> >> I've tripped over this one before, since it's not the kind of thing >> you imagine would be unimplemented, and then spend some time trying to >> find it. > > > +1 for adding a H attribute. > > Here's the end of the old discussion Chuck referred to: >http://thread.gmane.org/gmane.comp.python.numeric.general/6637. No strong arguments against and then > several more votes in favor. Are there other precedents where an attribute would involve data-copying ? I'm thinking that numpy generally does better than matlab by being more explicit about it's memory usage... (But, I'm no mathematician and I could see it beeing much of a convenience to have .H ) My two cents, Sebastian Haase From ralf.gommers at gmail.com Sat Jul 20 09:49:34 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 20 Jul 2013 15:49:34 +0200 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> Message-ID: On Sat, Jul 20, 2013 at 3:30 PM, Sebastian Haase wrote: > On Sat, Jul 20, 2013 at 12:36 PM, Ralf Gommers > wrote: > > > > > > > > > > On Thu, Jul 18, 2013 at 4:18 PM, St?fan van der Walt > wrote: > >> > >> On Sat, Jul 13, 2013 at 7:46 PM, Nathaniel Smith wrote: > >> > Why not just write > >> > > >> > def H(a): > >> > return a.conj().T > >> > >> It's hard to convince students that this is the Best Way of doing > >> things in NumPy. Why, they ask, can you do it using a' in MATLAB, > >> then? 
> >> > >> I've tripped over this one before, since it's not the kind of thing > >> you imagine would be unimplemented, and then spend some time trying to > >> find it. > > > > > > +1 for adding a H attribute. > > > > Here's the end of the old discussion Chuck referred to: > > http://thread.gmane.org/gmane.comp.python.numeric.general/6637. No > strong arguments against and then > > several more votes in favor. > > Are there other precedents where an attribute would involve > data-copying ? np.matrix.H for example. If you meant ndarray attributes and not attributes of numpy objects, I guess no. I don't think that matters much compared to having an intuitive and consistent API though. Ralf > I'm thinking that numpy generally does better than > matlab by being more explicit about it's memory usage... > (But, I'm no mathematician and I could see it beeing much of a > convenience to have .H ) > > My two cents, > Sebastian Haase > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Sat Jul 20 14:09:09 2013 From: matti.picus at gmail.com (Matti Picus) Date: Sat, 20 Jul 2013 21:09:09 +0300 Subject: [Numpy-discussion] subtypes of ndarray and round() In-Reply-To: References: Message-ID: <51EAD245.7030605@gmail.com> An HTML attachment was scrubbed... URL: From magawake at gmail.com Sun Jul 21 02:24:16 2013 From: magawake at gmail.com (Mag Gam) Date: Sun, 21 Jul 2013 02:24:16 -0400 Subject: [Numpy-discussion] Mag Gam Message-ID: http://houtwormbestrijding-houtwormbestrijding.nl/mlwoeh/ibivodpmj.ikuklorzxzgycwtj Mag Gam 7/21/2013 7:24:11 AM From stefan at sun.ac.za Sun Jul 21 18:37:42 2013 From: stefan at sun.ac.za (=?iso-8859-1?Q?St=E9fan?= van der Walt) Date: Mon, 22 Jul 2013 00:37:42 +0200 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> Message-ID: <20130721223742.GA20415@shinobi> On Sat, 20 Jul 2013 15:30:48 +0200, Sebastian Haase wrote: > Are there other precedents where an attribute would involve > data-copying ? I'm thinking that numpy generally does better than > matlab by being more explicit about it's memory usage... > (But, I'm no mathematician and I could see it beeing much of a > convenience to have .H ) Hopefully we'll eventually have lazily evaluated arrays so that we can do things like views of ufuncs on data. Unfortunately, this is not doable with the current ndarray, since its structure is tied to a pointer and strides. St?fan From anubhab91 at gmail.com Mon Jul 22 00:04:57 2013 From: anubhab91 at gmail.com (Anubhab Baksi) Date: Mon, 22 Jul 2013 09:34:57 +0530 Subject: [Numpy-discussion] Mag Gam In-Reply-To: References: Message-ID: I don't know, but it is redirected to https://www.google.co.in/?gws_rd=cr . On Sun, Jul 21, 2013 at 11:54 AM, Mag Gam wrote: > > http://houtwormbestrijding-houtwormbestrijding.nl/mlwoeh/ibivodpmj.ikuklorzxzgycwtj > > > > > > Mag Gam > > > 7/21/2013 7:24:11 AM > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stefan at sun.ac.za Mon Jul 22 05:02:19 2013 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 22 Jul 2013 11:02:19 +0200 Subject: [Numpy-discussion] Bringing order to higher dimensional operations In-Reply-To: References: <1374153819.14751.17.camel@sebastian-laptop> <1374250234.3254.33.camel@sebastian-laptop> Message-ID: On Sat, Jul 20, 2013 at 7:44 AM, wrote: > related: is there any advantage to np.add.reduce? > I find it more difficult to read than sum() and still see it used sometimes. I think ``np.add.reduce`` just falls out of the ufunc implementation--there's no "per ufunc" choice to remove certain parts of the API, if I recall correctly. Stéfan From lists at onerussian.com Mon Jul 22 10:51:58 2013 From: lists at onerussian.com (Yaroslav Halchenko) Date: Mon, 22 Jul 2013 10:51:58 -0400 Subject: [Numpy-discussion] fresh performance hits: numpy.linalg.pinv >30% slowdown In-Reply-To: References: <1367856232.2506.31.camel@sebastian-laptop> <20130506161153.GW5140@onerussian.com> <1367927238.23010.12.camel@sebastian-laptop> <20130701193006.GC27621@onerussian.com> <20130701215804.GG27621@onerussian.com> <20130709161007.GL27621@onerussian.com> <20130717035348.GN27621@onerussian.com> <20130719220754.GR27621@onerussian.com> Message-ID: <20130722145157.GS27621@onerussian.com> On Fri, 19 Jul 2013, Warren Weckesser wrote: > Well, this is embarrassing: https://github.com/numpy/numpy/pull/3539 > Thanks for benchmarks! I'm now an even bigger fan. :) Great to see that those came of help! I thought to provide a detailed breakdown (benchmarking all recent commits) to pin down the exact point of regression, but embarrassingly I made that run outside of the benchmarking chroot, so consistency was not guaranteed. Anyways -- rerunning it correctly now (with recent commits included). -- Yaroslav O. Halchenko, Ph.D. http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org Senior Research Associate, Psychological and Brain Sciences Dept. 
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik From ben.root at ou.edu Mon Jul 22 13:16:22 2013 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 22 Jul 2013 13:16:22 -0400 Subject: [Numpy-discussion] fresh performance hits: numpy.linalg.pinv >30% slowdown In-Reply-To: <20130722145514.GT27621@onerussian.com> References: <20130506161153.GW5140@onerussian.com> <1367927238.23010.12.camel@sebastian-laptop> <20130701193006.GC27621@onerussian.com> <20130701215804.GG27621@onerussian.com> <20130709161007.GL27621@onerussian.com> <20130717035348.GN27621@onerussian.com> <20130719220754.GR27621@onerussian.com> <20130722145514.GT27621@onerussian.com> Message-ID: On Mon, Jul 22, 2013 at 10:55 AM, Yaroslav Halchenko wrote: > At some point I hope to tune up the report with an option of viewing the > plot using e.g. nvd3 JS so it could be easier to pin point/analyze > interactively. > > shameless plug... the soon-to-be-finalized matplotlib-1.3 has a WebAgg backend that allows for interactivity. Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at onerussian.com Mon Jul 22 13:28:28 2013 From: lists at onerussian.com (Yaroslav Halchenko) Date: Mon, 22 Jul 2013 13:28:28 -0400 Subject: [Numpy-discussion] fresh performance hits: numpy.linalg.pinv >30% slowdown In-Reply-To: References: <20130701193006.GC27621@onerussian.com> <20130701215804.GG27621@onerussian.com> <20130709161007.GL27621@onerussian.com> <20130717035348.GN27621@onerussian.com> <20130719220754.GR27621@onerussian.com> <20130722145514.GT27621@onerussian.com> Message-ID: <20130722172828.GU27621@onerussian.com> On Mon, 22 Jul 2013, Benjamin Root wrote: > At some point I hope to tune up the report with an option of viewing the > plot using e.g. nvd3 JS so it could be easier to pin point/analyze > interactively. > shameless plug... the soon-to-be-finalized matplotlib-1.3 has a WebAgg > backend that allows for interactivity. "that's just sick!" do you know about any motion in python-sphinx world on supporting it? is there any demo page you would recommend to assess what to expect supported in upcoming webagg? -- Yaroslav O. Halchenko, Ph.D. http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org Senior Research Associate, Psychological and Brain Sciences Dept. Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik From ben.root at ou.edu Mon Jul 22 13:43:56 2013 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 22 Jul 2013 13:43:56 -0400 Subject: [Numpy-discussion] fresh performance hits: numpy.linalg.pinv >30% slowdown In-Reply-To: <20130722172828.GU27621@onerussian.com> References: <20130701193006.GC27621@onerussian.com> <20130701215804.GG27621@onerussian.com> <20130709161007.GL27621@onerussian.com> <20130717035348.GN27621@onerussian.com> <20130719220754.GR27621@onerussian.com> <20130722145514.GT27621@onerussian.com> <20130722172828.GU27621@onerussian.com> Message-ID: On Mon, Jul 22, 2013 at 1:28 PM, Yaroslav Halchenko wrote: > > On Mon, 22 Jul 2013, Benjamin Root wrote: > > At some point I hope to tune up the report with an option of > viewing the > > plot using e.g. nvd3 JS so it could be easier to pin point/analyze > > interactively. > > shameless plug... the soon-to-be-finalized matplotlib-1.3 has a WebAgg > > backend that allows for interactivity. 
> > "that's just sick!" > > do you know about any motion in python-sphinx world on supporting it? > > is there any demo page you would recommend to assess what to expect > supported in upcoming webagg? > > Oldie but goodie: http://mdboom.github.io/blog/2012/10/11/matplotlib-in-the-browser-its-coming/ Official Announcement: http://matplotlib.org/1.3.0/users/whats_new.html#webagg-backend Note, this is different than what is now available in IPython Notebook (it isn't really interactive there). As for what is supported, just about everything you can do normally, can be done in WebAgg. I have no clue about sphinx-level support. Now, back to your regularly scheduled program. Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Jul 22 15:10:42 2013 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 22 Jul 2013 20:10:42 +0100 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> Message-ID: On Thu, Jul 18, 2013 at 3:18 PM, St?fan van der Walt wrote: > On Sat, Jul 13, 2013 at 7:46 PM, Nathaniel Smith wrote: >> Why not just write >> >> def H(a): >> return a.conj().T > > It's hard to convince students that this is the Best Way of doing > things in NumPy. Why, they ask, can you do it using a' in MATLAB, > then? I guess I'd try to treat it as a teachable moment... the answer points to a basic difference in numpy versus MATLAB. Numpy operates at a slightly lower level of abstraction. In MATLAB you're encouraged to think of arrays as just mathematical matrices and let MATLAB worry about how to actually represent those inside the computer. Sometimes it does a good job, sometimes not. In numpy you need to think of arrays as structured representations of a chunk of memory. There disadvantages to this -- e.g. keeping track of which arrays return view and which return copies can be tricky -- but it also gives a lot of power: views are awesome, you get better interoperability with C libraries/Cython, better ability to predict which operations are expensive or cheap, more opportunities to use clever tricks when you need to, etc. And one example of this is that transpose and conjugate transpose really are very different at this level, because one is a cheap stride manipulation that returns a view, and the other is a (relatively) expensive data copying operation. The convention in Python is that attribute access is supposed to be cheap, while function calls serve as a warning that something expensive might be going on. So in short: MATLAB is optimized for doing linear algebra and not thinking too hard about programming; numpy is optimized for writing good programs. Having .T but not .H is an example of this split. Also it's a good opportunity to demonstrate the value of making little helper functions, which is a powerful technique that students generally need to be taught ;-). -n From evgeny.toder at jpmorgan.com Mon Jul 22 15:39:47 2013 From: evgeny.toder at jpmorgan.com (Toder, Evgeny) Date: Mon, 22 Jul 2013 19:39:47 +0000 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> Message-ID: What if .H is not an attribute, but a method? Is this enough of a warning about copying? 
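For what it's worth, the copy/view asymmetry that such a warning would be about is easy to
check at the prompt -- a quick illustration with plain ndarrays, nothing new:

import numpy as np

a = np.arange(6, dtype=complex).reshape(2, 3)

t = a.T          # plain transpose: only a re-striding, no data copied
h = a.conj().T   # "Hermitian": conj() has to materialize a new array first

print(t.base is a)                 # True  -- .T is a view of a
print(np.may_share_memory(h, a))   # False -- the conjugate was copied
print(np.allclose(h, a.T.conj()))  # True  -- same values either way

So whatever the spelling ends up being, it is the conj() step, not the transpose, that costs
the copy.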
Eugene -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Nathaniel Smith Sent: Monday, July 22, 2013 3:11 PM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] add .H attribute? On Thu, Jul 18, 2013 at 3:18 PM, St?fan van der Walt wrote: > On Sat, Jul 13, 2013 at 7:46 PM, Nathaniel Smith wrote: >> Why not just write >> >> def H(a): >> return a.conj().T > > It's hard to convince students that this is the Best Way of doing > things in NumPy. Why, they ask, can you do it using a' in MATLAB, > then? I guess I'd try to treat it as a teachable moment... the answer points to a basic difference in numpy versus MATLAB. Numpy operates at a slightly lower level of abstraction. In MATLAB you're encouraged to think of arrays as just mathematical matrices and let MATLAB worry about how to actually represent those inside the computer. Sometimes it does a good job, sometimes not. In numpy you need to think of arrays as structured representations of a chunk of memory. There disadvantages to this -- e.g. keeping track of which arrays return view and which return copies can be tricky -- but it also gives a lot of power: views are awesome, you get better interoperability with C libraries/Cython, better ability to predict which operations are expensive or cheap, more opportunities to use clever tricks when you need to, etc. And one example of this is that transpose and conjugate transpose really are very different at this level, because one is a cheap stride manipulation that returns a view, and the other is a (relatively) expensive data copying operation. The convention in Python is that attribute access is supposed to be cheap, while function calls serve as a warning that something expensive might be going on. So in short: MATLAB is optimized for doing linear algebra and not thinking too hard about programming; numpy is optimized for writing good programs. Having .T but not .H is an example of this split. Also it's a good opportunity to demonstrate the value of making little helper functions, which is a powerful technique that students generally need to be taught ;-). -n _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion This email is confidential and subject to important disclaimers and conditions including on offers for the purchase or sale of securities, accuracy and completeness of information, viruses, confidentiality, legal privilege, and legal entity disclaimers, available at http://www.jpmorgan.com/pages/disclosures/email. From bryanv at continuum.io Mon Jul 22 16:04:47 2013 From: bryanv at continuum.io (Bryan Van de Ven) Date: Mon, 22 Jul 2013 16:04:47 -0400 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> Message-ID: On the other hand, the most salient quality an unavoidable copy is that it is unavoidable. For people for whom using Hermitian conjugates is common, it's not like they won't do it just because they can't avoid a copy that can't be avoided. Given that if a problem dictates a Hermitian conjugate be taken, then it will be taken, then: a.H is closer to the mathematical notation, eases migration for matlab users, and does not require everyone to reinvent their own little version of the same function over and over. All of that seems more compelling that this particular arbitrary convention, personally. 
Bryan On Jul 22, 2013, at 3:10 PM, Nathaniel Smith wrote: > On Thu, Jul 18, 2013 at 3:18 PM, St?fan van der Walt wrote: >> On Sat, Jul 13, 2013 at 7:46 PM, Nathaniel Smith wrote: >>> Why not just write >>> >>> def H(a): >>> return a.conj().T >> >> It's hard to convince students that this is the Best Way of doing >> things in NumPy. Why, they ask, can you do it using a' in MATLAB, >> then? > > I guess I'd try to treat it as a teachable moment... the answer points > to a basic difference in numpy versus MATLAB. Numpy operates at a > slightly lower level of abstraction. In MATLAB you're encouraged to > think of arrays as just mathematical matrices and let MATLAB worry > about how to actually represent those inside the computer. Sometimes > it does a good job, sometimes not. In numpy you need to think of > arrays as structured representations of a chunk of memory. There > disadvantages to this -- e.g. keeping track of which arrays return > view and which return copies can be tricky -- but it also gives a lot > of power: views are awesome, you get better interoperability with C > libraries/Cython, better ability to predict which operations are > expensive or cheap, more opportunities to use clever tricks when you > need to, etc. > > And one example of this is that transpose and conjugate transpose > really are very different at this level, because one is a cheap stride > manipulation that returns a view, and the other is a (relatively) > expensive data copying operation. The convention in Python is that > attribute access is supposed to be cheap, while function calls serve > as a warning that something expensive might be going on. So in short: > MATLAB is optimized for doing linear algebra and not thinking too hard > about programming; numpy is optimized for writing good programs. > Having .T but not .H is an example of this split. > > Also it's a good opportunity to demonstrate the value of making little > helper functions, which is a powerful technique that students > generally need to be taught ;-). > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From alan.isaac at gmail.com Mon Jul 22 16:07:28 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 22 Jul 2013 16:07:28 -0400 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> Message-ID: <51ED9100.8040108@gmail.com> On 7/22/2013 3:10 PM, Nathaniel Smith wrote: > Having .T but not .H is an example of this split. Hate to do this but ... Readability counts. Special cases aren't special enough to break the rules. Although practicality beats purity. How much is the split a rule or "just" a convention, and is there enough practicality here to beat the purity of the split? Note: this is not a rhetorical question. However: if you propose A.conjugate().transpose() as providing a teachable moment about why to use NumPy instead of A' in Matlab, I conclude you do not ever teach most of my students. The real world matters. Since practicality beats purity, we do have A.conj().T, which is better but still not as readable as A.H would be. Or even A.H(), should that satisfy your objections (and still provide a teachable moment). Alan From dave.hirschfeld at gmail.com Tue Jul 23 03:35:41 2013 From: dave.hirschfeld at gmail.com (Dave Hirschfeld) Date: Tue, 23 Jul 2013 07:35:41 +0000 (UTC) Subject: [Numpy-discussion] add .H attribute? 
References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> Message-ID: Alan G Isaac gmail.com> writes: > > On 7/22/2013 3:10 PM, Nathaniel Smith wrote: > > Having .T but not .H is an example of this split. > > Hate to do this but ... > > Readability counts. +10! A.conjugate().transpose() is unspeakably horrible IMHO. Since there's no way to avoid a copy you gain nothing by not providing the convenience function. It should be fairly obvious that an operation which changes the values of an array (and doesn't work in-place) necessarily takes a copy. I think it's more than sufficient to simply document the fact that A.H will return a copy. A user coming from Matlab probably doesn't care that it takes a copy but you'd be hard pressed to convince them there's any benefit of writing A.conjugate().transpose() over exactly what it looks like in textbooks - A.H Regards, Dave From fperez.net at gmail.com Tue Jul 23 04:07:27 2013 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 23 Jul 2013 01:07:27 -0700 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> Message-ID: On Tue, Jul 23, 2013 at 12:35 AM, Dave Hirschfeld wrote: > Alan G Isaac gmail.com> writes: > >> >> On 7/22/2013 3:10 PM, Nathaniel Smith wrote: >> > Having .T but not .H is an example of this split. >> >> Hate to do this but ... >> >> Readability counts. > > +10! > > A.conjugate().transpose() is unspeakably horrible IMHO. Since there's no way > to avoid a copy you gain nothing by not providing the convenience function. Silly suggestion: why not just make .H a callable? a.H() is nearly as short/handy as .H, it fits easily into the mnemonic pattern suggested by .T, yet the extra () are indicative that something potentially big/expensive is happening... Cheers, f -- Fernando Perez (@fperez_org; http://fperez.org) fperez.net-at-gmail: mailing lists only (I ignore this when swamped!) fernando.perez-at-berkeley: contact me here for any direct mail From d.s.seljebotn at astro.uio.no Tue Jul 23 04:35:16 2013 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 23 Jul 2013 10:35:16 +0200 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> Message-ID: <51EE4044.5060509@astro.uio.no> On 07/23/2013 09:35 AM, Dave Hirschfeld wrote: > Alan G Isaac gmail.com> writes: > >> >> On 7/22/2013 3:10 PM, Nathaniel Smith wrote: >>> Having .T but not .H is an example of this split. >> >> Hate to do this but ... >> >> Readability counts. > > +10! > > A.conjugate().transpose() is unspeakably horrible IMHO. Since there's no way > to avoid a copy you gain nothing by not providing the convenience function. > > It should be fairly obvious that an operation which changes the values of an > array (and doesn't work in-place) necessarily takes a copy. I think it's more > than sufficient to simply document the fact that A.H will return a copy. I don't think this is obvious at all. In fact, I'd fully expect A.H to return a view that conjugates the values on the fly as they are read/written (just the same way the array is "transposed on the fly" or "sliced on the fly" with other views). There's lots of uses for A.H to be a conjugating-view, e.g., np.dot(A.H, A) can be done on-the-fly by BLAS at no extra cost, and so on. 
These are currently not possible with pure NumPy without a copy, which is a pretty big defect IMO (and one reason I'd call BLAS myself using Cython rather than use np.dot...) So -1 on using A.H for anything but a proper view, and "A.conjt()" or something similar for a method that does a copy. Dag Sverre From d.s.seljebotn at astro.uio.no Tue Jul 23 04:36:33 2013 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 23 Jul 2013 10:36:33 +0200 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: <51EE4044.5060509@astro.uio.no> References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> Message-ID: <51EE4091.1080009@astro.uio.no> On 07/23/2013 10:35 AM, Dag Sverre Seljebotn wrote: > On 07/23/2013 09:35 AM, Dave Hirschfeld wrote: >> Alan G Isaac gmail.com> writes: >> >>> >>> On 7/22/2013 3:10 PM, Nathaniel Smith wrote: >>>> Having .T but not .H is an example of this split. >>> >>> Hate to do this but ... >>> >>> Readability counts. >> >> +10! >> >> A.conjugate().transpose() is unspeakably horrible IMHO. Since there's >> no way >> to avoid a copy you gain nothing by not providing the convenience >> function. >> >> It should be fairly obvious that an operation which changes the values >> of an >> array (and doesn't work in-place) necessarily takes a copy. I think >> it's more >> than sufficient to simply document the fact that A.H will return a copy. > > I don't think this is obvious at all. In fact, I'd fully expect A.H to > return a view that conjugates the values on the fly as they are > read/written (just the same way the array is "transposed on the fly" or > "sliced on the fly" with other views). > > There's lots of uses for A.H to be a conjugating-view, e.g., np.dot(A.H, > A) can be done on-the-fly by BLAS at no extra cost, and so on. These are > currently not possible with pure NumPy without a copy, which is a pretty > big defect IMO (and one reason I'd call BLAS myself using Cython rather > than use np.dot...) > > So -1 on using A.H for anything but a proper view, and "A.conjt()" or > something similar for a method that does a copy. Sorry: I'm +1 on another name for a method that does a copy. Which can eventually be made redundant with A.H.copy(), if somebody ever takes on the work to make that happen...but at least I think the path to that should be kept open. Dag Sverre From josef.pktd at gmail.com Tue Jul 23 04:46:44 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 23 Jul 2013 04:46:44 -0400 Subject: [Numpy-discussion] what is data attribute of numpy.str_ ? Message-ID: python 3.3 I have a bug because we have a check for a .data attribute, that is not supposed to be available for a string. 
(Pdb) dir(data) ['T', '__abs__', '__add__', '__and__', '__array__', '__array_interface__', '__array_priority__', '__array_struct__', '__array_wrap__', '__bool__', '__class__', '__contains__', '__copy__', '__deepcopy__', '__delattr__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__int__', '__invert__', '__iter__', '__le__', '__len__', '__lshift__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__round__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__xor__', 'all', 'any', 'argmax', 'argmin', 'argsort', 'astype', 'base', 'byteswap', 'capitalize', 'casefold', 'center', 'choose', 'clip', 'compress', 'conj', 'conjugate', 'copy', 'count', 'cumprod', 'cumsum', 'data', 'diagonal', 'dtype', 'dump', 'dumps', 'encode', 'endswith', 'expandtabs', 'fill', 'find', 'flags', 'flat', 'flatten', 'format', 'format_map', 'getfield', 'imag', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'item', 'itemset', 'itemsize', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'max', 'mean', 'min', 'nbytes', 'ndim', 'newbyteorder', 'nonzero', 'partition', 'prod', 'ptp', 'put', 'ravel', 'real', 'repeat', 'replace', 'reshape', 'resize', 'rfind', 'rindex', 'rjust', 'round', 'rpartition', 'rsplit', 'rstrip', 'searchsorted', 'setfield', 'setflags', 'shape', 'size', 'sort', 'split', 'splitlines', 'squeeze', 'startswith', 'std', 'strides', 'strip', 'sum', 'swapaxes', 'swapcase', 'take', 'title', 'tofile', 'tolist', 'tostring', 'trace', 'translate', 'transpose', 'upper', 'var', 'view', 'zfill'] (Pdb) data '0' (Pdb) type(data) (Pdb) data.data *** TypeError: memoryview: numpy.str_ object does not have the buffer interface (Pdb) Josef From alan.isaac at gmail.com Tue Jul 23 08:35:01 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Tue, 23 Jul 2013 08:35:01 -0400 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: <51EE4091.1080009@astro.uio.no> References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> <51EE4091.1080009@astro.uio.no> Message-ID: <51EE7875.1080803@gmail.com> On 7/23/2013 4:36 AM, Dag Sverre Seljebotn wrote: > I'm +1 on another name for a method that does a copy. Which can > eventually be made redundant with A.H.copy(), if somebody ever takes on > the work to make that happen...but at least I think the path to that > should be kept open. If that is the decision, I would suggest A.ct(). But, it this really necessary? An obvious path is to introduce A.H now, document that it makes a copy, and document that it may eventually produce an iterative view. Think how much nicer things would be evolving if diagonal had been implemented as an attribute with documentation that it would eventually be a writable view. Isn't there some analogy with this situation? Alan From stefan at sun.ac.za Tue Jul 23 08:42:49 2013 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 23 Jul 2013 14:42:49 +0200 Subject: [Numpy-discussion] add .H attribute? 
In-Reply-To: <51EE4044.5060509@astro.uio.no> References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> Message-ID: On Tue, Jul 23, 2013 at 10:35 AM, Dag Sverre Seljebotn wrote: > So -1 on using A.H for anything but a proper view, and "A.conjt()" or > something similar for a method that does a copy. "A.T.conj()" is just as clear, so my feeling is that we should either add A.H / A.H() or leave it be. St?fan From pav at iki.fi Tue Jul 23 09:09:27 2013 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 23 Jul 2013 16:09:27 +0300 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> Message-ID: 23.07.2013 15:42, St?fan van der Walt kirjoitti: > On Tue, Jul 23, 2013 at 10:35 AM, Dag Sverre Seljebotn > wrote: >> So -1 on using A.H for anything but a proper view, and "A.conjt()" or >> something similar for a method that does a copy. > > "A.T.conj()" is just as clear, so my feeling is that we should either > add A.H / A.H() or leave it be. The .H property has been implemented in Numpy matrices and Scipy's sparse matrices for many years, and AFAIK the view issue apparently hasn't caused much confusion. I think having it return an iterator (similarly to .flat which I think is rarely used) that is not compatible with ndarrays would be quite confusing. Implementing a full complex-conjugating ndarray view for this purpose on the other hand seems quite a large hassle, for somewhat dubious gains. If it is implemented as returning a copy, it can be documented in a way that leaves leeway for changing the implementation to a view later on. -- Pauli Virtanen From Jerome.Kieffer at esrf.fr Tue Jul 23 09:37:34 2013 From: Jerome.Kieffer at esrf.fr (Jerome Kieffer) Date: Tue, 23 Jul 2013 15:37:34 +0200 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> Message-ID: <20130723153734.3a235e49.Jerome.Kieffer@esrf.fr> On Tue, 23 Jul 2013 01:07:27 -0700 Fernando Perez wrote: > Silly suggestion: why not just make .H a callable? > > a.H() +1 -- J?r?me Kieffer On-Line Data analysis / Software Group ISDD / ESRF tel +33 476 882 445 From alan.isaac at gmail.com Tue Jul 23 09:39:08 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Tue, 23 Jul 2013 09:39:08 -0400 Subject: [Numpy-discussion] .flat (was: add .H attribute?) In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> Message-ID: <51EE877C.4080504@gmail.com> On 7/23/2013 9:09 AM, Pauli Virtanen wrote: > .flat which I think > is rarely used Until ``diagonal`` completes its transition, use of ``flat`` seems the best way to reset the diagonal on an array. Am I wrong? I use it that way all the time. Alan Isaac From stefan at sun.ac.za Tue Jul 23 10:11:47 2013 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 23 Jul 2013 16:11:47 +0200 Subject: [Numpy-discussion] .flat (was: add .H attribute?) 
In-Reply-To: <51EE877C.4080504@gmail.com> References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> <51EE877C.4080504@gmail.com> Message-ID: On Tue, Jul 23, 2013 at 3:39 PM, Alan G Isaac wrote: > On 7/23/2013 9:09 AM, Pauli Virtanen wrote: >> .flat which I think >> is rarely used > > Until ``diagonal`` completes its transition, > use of ``flat`` seems the best way to reset > the diagonal on an array. Am I wrong? > I use it that way all the time. I usually write x[np.diag_indices_from(x)] = [1,2,3] St?fan From sebastian at sipsolutions.net Tue Jul 23 10:29:44 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 23 Jul 2013 16:29:44 +0200 Subject: [Numpy-discussion] Allow == and != to raise errors In-Reply-To: References: <1373632688.13968.13.camel@sebastian-laptop> Message-ID: <1374589784.13486.32.camel@sebastian-laptop> On Sat, 2013-07-13 at 11:28 -0400, josef.pktd at gmail.com wrote: > On Sat, Jul 13, 2013 at 9:14 AM, Nathaniel Smith wrote: > > I'm now +1 on the exception that Sebastian proposed. > > I like consistency, and having a more straightforward mental model of > the numpy behavior for elementwise operations, that don't pretend > sometimes to be "python" (when I'm doing array math), like this > I am not sure what the result of this discussion is. As far as I see Benjamin and Fr?d?ric were opposing and overall it seemed pretty mixed, so unless you two changed your mind or say that it was just a small personal preference I would drop it for now. I obviously think the current behaviour is inconsistent to buggy and am really only afraid of possibly breaking code out there. Which is why I think I maybe should first add a FutureWarning if we decide on changing it. Regards, Sebastian > >>> [1,2,3] < [1,2] > False > >>> [1,2,3] > [1,2] > True > > Josef > > > > > > -n > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From ben.root at ou.edu Tue Jul 23 10:34:22 2013 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 23 Jul 2013 10:34:22 -0400 Subject: [Numpy-discussion] .flat (was: add .H attribute?) In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> <51EE877C.4080504@gmail.com> Message-ID: On Tue, Jul 23, 2013 at 10:11 AM, St?fan van der Walt wrote: > On Tue, Jul 23, 2013 at 3:39 PM, Alan G Isaac > wrote: > > On 7/23/2013 9:09 AM, Pauli Virtanen wrote: > >> .flat which I think > >> is rarely used > > > Don't assume .flat is not commonly used. A common idiom in matlab is "a[:]" to flatten an array. When porting code over from matlab, it is typical to replace that with either "a.flat" or "a.flatten()", depending on whether an iterator or an array is needed. Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pav at iki.fi Tue Jul 23 10:46:20 2013 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 23 Jul 2013 17:46:20 +0300 Subject: [Numpy-discussion] .flat In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> <51EE877C.4080504@gmail.com> Message-ID: 23.07.2013 17:34, Benjamin Root kirjoitti: [clip] > Don't assume .flat is not commonly used. A common idiom in matlab is > "a[:]" to flatten an array. When porting code over from matlab, it is > typical to replace that with either "a.flat" or "a.flatten()", depending > on whether an iterator or an array is needed. It is much more rarely used than `ravel()` and `flatten()`, as can be verified by grepping e.g. the matplotlib source code. -- Pauli Virtanen From njs at pobox.com Tue Jul 23 10:51:46 2013 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 23 Jul 2013 15:51:46 +0100 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: <51EE4044.5060509@astro.uio.no> References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> Message-ID: On Tue, Jul 23, 2013 at 9:35 AM, Dag Sverre Seljebotn wrote: > I don't think this is obvious at all. In fact, I'd fully expect A.H to > return a view that conjugates the values on the fly as they are > read/written (just the same way the array is "transposed on the fly" or > "sliced on the fly" with other views). > > There's lots of uses for A.H to be a conjugating-view, e.g., np.dot(A.H, > A) can be done on-the-fly by BLAS at no extra cost, and so on. These are > currently not possible with pure NumPy without a copy, which is a pretty > big defect IMO (and one reason I'd call BLAS myself using Cython rather > than use np.dot...) I was skeptical about this at first on the grounds that yeah, it'd be nice if at some point we allowed for on-the-fly transformations, it isn't happening anytime soon. But on second thought, we actually could implement this pretty easily -- just define a new dtype "conjcomplex" that stores the value x+iy as two doubles (x, -y). Then complex_arr.view(conjcomplex) would preserve memory contents but invert the numeric sign of all imaginary components, while complex_arr.astype(conjcomplex) would preserve numeric value but alter the memory representation. Because this latter cast is safe, all the existing ufuncs would automatically work fine on conjcomplex arrays. But we could also define conjcomplex-specific ufunc loops for cases like dot() where a more efficient implementation is possible (using the above-mentioned BLAS flags). Don't know if we want to actually do this, but it's doable. (I don't have any in-principle objection to .H(), but won't it just lead to more threads complaining about how confusing it is that .T and .H() are different?) -n From stefan at sun.ac.za Tue Jul 23 10:54:23 2013 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 23 Jul 2013 16:54:23 +0200 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> Message-ID: On Tue, Jul 23, 2013 at 4:51 PM, Nathaniel Smith wrote: > Don't know if we want to actually do this, but it's doable. Would we need a matching conjugate data-type for each complex data-type then, or can the data-type be "parameterized"? 
Stéfan From ben.root at ou.edu Tue Jul 23 10:57:58 2013 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 23 Jul 2013 10:57:58 -0400 Subject: [Numpy-discussion] .flat In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> <51EE877C.4080504@gmail.com> Message-ID: On Tue, Jul 23, 2013 at 10:46 AM, Pauli Virtanen wrote: > 23.07.2013 17:34, Benjamin Root wrote: > [clip] > > Don't assume .flat is not commonly used. A common idiom in matlab is > > "a[:]" to flatten an array. When porting code over from matlab, it is > > typical to replace that with either "a.flat" or "a.flatten()", depending > > on whether an iterator or an array is needed. > > It is much more rarely used than `ravel()` and `flatten()`, as can be > verified by grepping e.g. the matplotlib source code. > > The matplotlib source code is not a port from Matlab, so grepping that wouldn't prove anything. Meanwhile, the "NumPy for Matlab users" page notes that a.flatten() makes a copy. A newbie to NumPy would then (correctly) look up the documentation for a.flatten() and see in the "See Also" section that "a.flat" is just an iterator rather than a copy, and would often use that to avoid the copy. Cheers! Ben Root From pav at iki.fi Tue Jul 23 11:02:13 2013 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 23 Jul 2013 18:02:13 +0300 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> Message-ID: 23.07.2013 17:51, Nathaniel Smith wrote: [clip: conjcomplex dtype] > Because this latter cast is safe, all the existing ufuncs would > automatically work fine on conjcomplex arrays. But we could also > define conjcomplex-specific ufunc loops for cases like dot() where a > more efficient implementation is possible (using the above-mentioned > BLAS flags). > > Don't know if we want to actually do this, but it's doable. There's quite a lot of 3rd party code that doesn't do automatic casting (e.g. all of Cython code interfacing with Numpy, C extensions, f2py I think), but rather fails for incompatible input dtypes. Having arrays with a new complex dtype around would require changes in this sort of code. In this sense having an iterator of some sort with an __array__ attribute would work. However, an iterator doesn't support (without a lot of work) the various ndarray attributes, which would be confusing. -- Pauli Virtanen From nouiz at nouiz.org Tue Jul 23 11:10:51 2013 From: nouiz at nouiz.org (Frédéric Bastien) Date: Tue, 23 Jul 2013 11:10:51 -0400 Subject: [Numpy-discussion] Allow == and != to raise errors In-Reply-To: <1374589784.13486.32.camel@sebastian-laptop> References: <1373632688.13968.13.camel@sebastian-laptop> <1374589784.13486.32.camel@sebastian-laptop> Message-ID: I'm mixed, because I see the good value, but I'm not able to guess the consequences of the interface change. So doing your FutureWarning would allow us to gather some data about this, and if it seems to cause too many problems, we could cancel the change. Also, if there is some software that depends on the old behaviour, this will cause a crash (except if it has a catch-all Exception handler), which is not a bad result. I think it is always hard to predict the consequences of an interface change in NumPy.
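(For readers skimming the archive, a small illustration of the behaviour under discussion; the exact output is version dependent, but on the 1.7-era releases discussed in this thread the comparison collapses to a single Python bool instead of raising:)

    import numpy as np

    a = np.arange(3)
    b = np.arange(4)

    # Shapes that cannot be broadcast: rather than raising, == and !=
    # currently fall back to a scalar answer on these releases.
    print(a == b)   # False
    print(a != b)   # True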
To help measure it, we could make/as people to contribute to a collection of software that use NumPy with a good tests suites. We could test interface change on them by running there tests suites to try to have a guess of the impact of those change. What do you think of that? I think it was already discussed on the mailing list, but not acted upon. Fred On Tue, Jul 23, 2013 at 10:29 AM, Sebastian Berg wrote: > On Sat, 2013-07-13 at 11:28 -0400, josef.pktd at gmail.com wrote: > > On Sat, Jul 13, 2013 at 9:14 AM, Nathaniel Smith wrote: > > > > > I'm now +1 on the exception that Sebastian proposed. > > > > I like consistency, and having a more straightforward mental model of > > the numpy behavior for elementwise operations, that don't pretend > > sometimes to be "python" (when I'm doing array math), like this > > > > I am not sure what the result of this discussion is. As far as I see > Benjamin and Fr?d?ric were opposing and overall it seemed pretty mixed, > so unless you two changed your mind or say that it was just a small > personal preference I would drop it for now. > I obviously think the current behaviour is inconsistent to buggy and am > really only afraid of possibly breaking code out there. Which is why I > think I maybe should first add a FutureWarning if we decide on changing > it. > > Regards, > > Sebastian > > > >>> [1,2,3] < [1,2] > > False > > >>> [1,2,3] > [1,2] > > True > > > > Josef > > > > > > > > > > -n > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at scipy.org > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Jul 23 12:10:52 2013 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 23 Jul 2013 17:10:52 +0100 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> Message-ID: On 23 Jul 2013 16:03, "Pauli Virtanen" wrote: > > 23.07.2013 17:51, Nathaniel Smith kirjoitti: > [clip: conjcomplex dtype] > > Because this latter cast is safe, all the existing ufuncs would > > automatically work fine on conjcomplex arrays. But we could also > > define conjcomplex-specific ufunc loops for cases like dot() where a > > more efficient implementation is possible (using the above-mentioned > > BLAS flags). > > > > Don't know if we want to actually do this, but it's doable. > > There's somewhat a lot of 3rd party code that doesn't do automatic > casting (e.g. all of Cython code interfacing with Numpy, C extensions, > f2py I think), but rather fails for incompatible input dtypes. Having > arrays with a new complex dtype around would require changes in this > sort of code. > > In this sense having an iterator of some sort with an __array__ > attribute would work. However, an iterator doesn't support (without a > lot of work) the various ndarray attributes which would be confusing. 
Surely there's more code that handles unusual but correctly castable dtypes dtypes than there is code that handles custom iterator objects that are missing ndarray attributes? -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Jul 23 12:13:40 2013 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 23 Jul 2013 17:13:40 +0100 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> Message-ID: On 23 Jul 2013 15:55, "St?fan van der Walt" wrote: > > On Tue, Jul 23, 2013 at 4:51 PM, Nathaniel Smith wrote: > > Don't know if we want to actually do this, but it's doable. > > Would we need a matching conjugate data-type for each complex > data-type then, or can the data-type be "parameterized"? Right now dtypes can't be parametrized. In this particular case it doesn't matter a whole lot anyway I think - you'd have to write basically the same code to handle different width complex types in either case, the difference is just whether that code got called at runtime or build time. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Jul 23 12:22:13 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 23 Jul 2013 10:22:13 -0600 Subject: [Numpy-discussion] .flat In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> <51EE877C.4080504@gmail.com> Message-ID: On Tue, Jul 23, 2013 at 8:46 AM, Pauli Virtanen wrote: > 23.07.2013 17:34, Benjamin Root kirjoitti: > [clip] > > Don't assume .flat is not commonly used. A common idiom in matlab is > > "a[:]" to flatten an array. When porting code over from matlab, it is > > typical to replace that with either "a.flat" or "a.flatten()", depending > > on whether an iterator or an array is needed. > > It is much more rarely used than `ravel()` and `flatten()`, as can be > verified by grepping e.g. the matplotlib source code. > Grepping in my code, I find a lot of things like dfx = van.dot((ax2 - ax1).flat) IIRC, the flat version was faster than other methods. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Tue Jul 23 12:35:54 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 23 Jul 2013 18:35:54 +0200 Subject: [Numpy-discussion] .flat In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> <51EE877C.4080504@gmail.com> Message-ID: <1374597354.13486.37.camel@sebastian-laptop> On Tue, 2013-07-23 at 10:22 -0600, Charles R Harris wrote: > > > On Tue, Jul 23, 2013 at 8:46 AM, Pauli Virtanen wrote: > 23.07.2013 17:34, Benjamin Root kirjoitti: > [clip] > > Don't assume .flat is not commonly used. A common idiom in > matlab is > > "a[:]" to flatten an array. When porting code over from > matlab, it is > > typical to replace that with either "a.flat" or > "a.flatten()", depending > > on whether an iterator or an array is needed. > > It is much more rarely used than `ravel()` and `flatten()`, as > can be > verified by grepping e.g. the matplotlib source code. > > Grepping in my code, I find a lot of things like > > dfx = van.dot((ax2 - ax1).flat) > > IIRC, the flat version was faster than other methods. 
> Faster than flatten certainly (since flatten forces a copy), I would be quite surprised if it is faster than ravel, and since dot can't make use of the iterator, that seems more natural to me. - Sebastian > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From pav at iki.fi Tue Jul 23 12:36:08 2013 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 23 Jul 2013 19:36:08 +0300 Subject: [Numpy-discussion] .flat In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> <51EE877C.4080504@gmail.com> Message-ID: 23.07.2013 19:22, Charles R Harris wrote: [clip] > Grepping in my code, I find a lot of things like > > dfx = van.dot((ax2 - ax1).flat) > > IIRC, the flat version was faster than other methods. That goes through the same code path as `van.dot(np.asarray((ax2 - ax1).flat))`, which calls the `__array__` attribute of the flatiter object. If it's faster than .ravel(), that is surprising. -- Pauli Virtanen From charlesr.harris at gmail.com Tue Jul 23 13:05:06 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 23 Jul 2013 11:05:06 -0600 Subject: [Numpy-discussion] .flat In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> <51EE877C.4080504@gmail.com> Message-ID: On Tue, Jul 23, 2013 at 10:36 AM, Pauli Virtanen wrote: > 23.07.2013 19:22, Charles R Harris wrote: > [clip] > > Grepping in my code, I find a lot of things like > > > > dfx = van.dot((ax2 - ax1).flat) > > > > IIRC, the flat version was faster than other methods. > > That goes through the same code path as > `van.dot(np.asarray((ax2 - ax1).flat))`, which calls the `__array__` > attribute of the flatiter object. If it's faster than .ravel(), that is > surprising. > > Well, I never use ravel, there are zero examples in my code ;) So you may be correct. I'm not sure the example I gave is the one where '*.flat' wins, but I recall such a case and have just used flat a lot ever since.
> > Chuck just another survey scipy: ravel: 136 (including stats) flat: 6 flatten: 37 (not current master) statsmodels ravel: 137, flat: 0 flatten: 9 I only use ravel (what am I supposed to do with an iterator if I want a view?) (I think the equivalent of matlab x(:) is x.ravel("F") not flat or flatten) Josef > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From alan.isaac at gmail.com Tue Jul 23 13:53:54 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Tue, 23 Jul 2013 13:53:54 -0400 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> Message-ID: <51EEC332.7070805@gmail.com> I'm trying to understand the state of this discussion. I believe that propoents of adding a .H attribute have primarily emphasized - readability (and general ease of use) - consistency with matrix and masked array - forward looking (to a future when .H can be a view) The opponents have primarily emphasized - inconsistency with convention that for arrays instance attributes should return views Is this a correct summary? If it is correct, I believe the proponents' case is stronger. All the considerations are valid, so it is a matter of deciding how to weight them. The alternative of offering a new method seems inferior in terms of readability and consistency, and it is not adequately forward looking. If the alternative is nevertheless chosen, I suggest that it should definitely *not* be .H(), both because of the conflict with uses by matrix and masked array, and because I expect that eventually the desire for an attribute will win the day, and it would be a shame for the obvious notation to be lost. Alan Isaac From d.s.seljebotn at astro.uio.no Tue Jul 23 17:08:11 2013 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 23 Jul 2013 23:08:11 +0200 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: <51EEC332.7070805@gmail.com> References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> <51EEC332.7070805@gmail.com> Message-ID: <51EEF0BB.60508@astro.uio.no> On 07/23/2013 07:53 PM, Alan G Isaac wrote: > I'm trying to understand the state of this discussion. > I believe that propoents of adding a .H attribute have > primarily emphasized > > - readability (and general ease of use) > - consistency with matrix and masked array > - forward looking (to a future when .H can be a view) I disagree with this being forward looking, as it explicitly creates a situation where code will break if .H becomes a view, e.g.: xh = x.H x *= 2 assert np.all(2 * xh == x.H) > > The opponents have primarily emphasized > > - inconsistency with convention that for arrays > instance attributes should return views I'd formulate this as simply "inconsistency with .T"; they are both motivated primarily as notational shorthands. Dag Sverre From josef.pktd at gmail.com Tue Jul 23 17:32:52 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 23 Jul 2013 17:32:52 -0400 Subject: [Numpy-discussion] add .H attribute? 
In-Reply-To: <51EEF0BB.60508@astro.uio.no> References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> <51EEC332.7070805@gmail.com> <51EEF0BB.60508@astro.uio.no> Message-ID: On Tue, Jul 23, 2013 at 5:08 PM, Dag Sverre Seljebotn wrote: > On 07/23/2013 07:53 PM, Alan G Isaac wrote: >> I'm trying to understand the state of this discussion. >> I believe that propoents of adding a .H attribute have >> primarily emphasized >> >> - readability (and general ease of use) >> - consistency with matrix and masked array >> - forward looking (to a future when .H can be a view) > > I disagree with this being forward looking, as it explicitly creates a > situation where code will break if .H becomes a view, e.g.: > > xh = x.H > x *= 2 > assert np.all(2 * xh == x.H) > >> >> The opponents have primarily emphasized >> >> - inconsistency with convention that for arrays >> instance attributes should return views > > I'd formulate this as simply "inconsistency with .T"; they are both > motivated primarily as notational shorthands. Do we really need a one letter shorthand for `a.conj().T` ? I don't. Josef (The one who wrote np.max(np.abs(y - x)) and np.max(np.abs(y / x - 1)) 30 or more times in the last 24 hours, in pdb.) > > Dag Sverre > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From alan.isaac at gmail.com Tue Jul 23 18:22:15 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Tue, 23 Jul 2013 18:22:15 -0400 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: <51EEF0BB.60508@astro.uio.no> References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> <51EEC332.7070805@gmail.com> <51EEF0BB.60508@astro.uio.no> Message-ID: <51EF0217.209@gmail.com> On 7/23/2013 5:08 PM, Dag Sverre Seljebotn wrote: > I disagree with this being forward looking, as it explicitly creates a > situation where code will break if .H becomes a view Well yes, we cannot have everything. Just like it is taking a while for ``diagonal`` to transition to providing a view, this would be true for .H when the time comes. Naturally, this would be documented (that it may change to a view). Just as it is documented with ``diagonal``. But it is nevertheless forward looking in an obvious sense: it provides access to an extremely convenient and much more readable notation that will in any case eventually be available. Also, the current context is the matrices and masked arrays have this attribute, so this transitional issue already exists. Out of curiosity: do you use NumPy to work with complex arrays? Alan From alan.isaac at gmail.com Tue Jul 23 18:30:42 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Tue, 23 Jul 2013 18:30:42 -0400 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> <51EEC332.7070805@gmail.com> <51EEF0BB.60508@astro.uio.no> Message-ID: <51EF0412.5030003@gmail.com> On 7/23/2013 5:32 PM, josef.pktd at gmail.com wrote: > Do we really need a one letter shorthand for `a.conj().T` ? One way to assess this would be to propose removing it from matrix and masked array objects. If the yelping is loud enough, there is apparently need. I suspect the yelping would be pretty loud. 
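(As a reminder of what those objects already provide -- a minimal check, assuming a current NumPy and SciPy, where both the matrix class and scipy.sparse matrices expose an .H property while plain ndarrays do not:)

    import numpy as np
    import scipy.sparse as sp

    m = np.matrix([[1 + 2j, 3j], [4, 5 - 1j]])
    print(m.H)               # conjugate transpose, available on matrix today

    s = sp.csr_matrix(m)
    print(s.H.toarray())     # scipy.sparse matrices offer the same shorthand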
Indeed, the reason I started this thread is that I'm using the matrix object less and less, and I definitely miss the .H attribute it offers. In any case, need is the wrong criterion. The question is, do the gains in readability, consistency (across objects), convenience, and advertising appeal (e.g., to those used to other languages) outweigh the costs? It's a cost benefit analysis. Obviously some people think the costs outweigh the benefits and others say they do not. We should look for a ways to determine which group has the better case. This discussion has made me much more inclined to believe it is a good idea to add this attribute. I agree that it would be an even better idea to add it as an iterative view, but nobody seems to feel that can happen quickly. Alan From alan.isaac at gmail.com Tue Jul 23 19:57:39 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Tue, 23 Jul 2013 19:57:39 -0400 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: <51EF078E.8050603@astro.uio.no> References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> <51EEC332.7070805@gmail.com> <51EEF0BB.60508@astro.uio.no> <51EF0217.209@gmail.com> <51EF078E.8050603@astro.uio.no> Message-ID: <51EF1873.7050202@gmail.com> On 7/23/2013 6:45 PM, Dag Sverre Seljebotn wrote: > It'd be great if you could try to incorporate it to create a more balanced overview Attempt 2: I believe that propoents of adding a .H attribute have primarily emphasized - readability (and general ease of use, including in teaching) - consistency with matrix and masked array - forward looking (to a future when .H can be a view) in the following sense: it gives access now to the conjugate transpose via .H, which is likely to be implemented in the future, so as long as we document (as with ``diagonal``) that this may change, it gives a large chunk of the desired benefit now. The opponents have primarily emphasized - inconsistency with convention that for arrays instance attributes should return views - NOT forward looking (to a future when .H can be a view) in the following sense: it gives access now to the conjugate transpose via .H but NOT as a view, which is likely to be the preferred implementation in the future, and if the implementation changes in this preferred way then code that relied on behavior rather than documentation will break Finally, I think (?) everyone (proponents and opponents) would be happy if .H could provide access to an iterative view of the conjugate transpose. (Any objections?) Better? Alan From chris.barker at noaa.gov Tue Jul 23 20:15:25 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Tue, 23 Jul 2013 17:15:25 -0700 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> Message-ID: On Tue, Jul 23, 2013 at 6:09 AM, Pauli Virtanen wrote: > The .H property has been implemented in Numpy matrices and Scipy's > sparse matrices for many years. Then we're done. Numpy is an array package, NOT a matrix package, and while you can implement matrix math with arrays (and we do), having quick and easy mnemonics for common matrix math operations (but uncommon general purpose array operations) is not eh job of numpy. That's what the matrix object is for. 
Yes, I know the matrix object isn't really what it should be, and doesn't get much use, but if you want something that is natural for doing matrix math, and particularly natural for teaching it -- that's what it's for -- work to make it what it could be, rather than polluting numpy with this stuff. One of the things I've loved about numpy after moving from MATLAB is that matrixes are second-class citizens, not the other way around. (OK, I'll go away now....) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From stefan at sun.ac.za Wed Jul 24 02:53:56 2013 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 24 Jul 2013 08:53:56 +0200 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> Message-ID: On Wed, Jul 24, 2013 at 2:15 AM, Chris Barker - NOAA Federal wrote: > > On Tue, Jul 23, 2013 at 6:09 AM, Pauli Virtanen wrote: > > > The .H property has been implemented in Numpy matrices and Scipy's > > sparse matrices for many years. > > Then we're done. Numpy is an array package, NOT a matrix package, and > while you can implement matrix math with arrays (and we do), having > quick and easy mnemonics for common matrix math operations (but > uncommon general purpose array operations) is not eh job of numpy. > That's what the matrix object is for. I would argue that the ship sailed when we added .T already. Most users see no difference between the addition of .T and .H. The matrix class should probably be deprecated and removed from NumPy in the long run--being a second class citizen not used by the developers themselves is not sustainable. And, now that we have "dot" as a method, there's very little advantage to it. St?fan From seb.haase at gmail.com Wed Jul 24 03:15:56 2013 From: seb.haase at gmail.com (Sebastian Haase) Date: Wed, 24 Jul 2013 09:15:56 +0200 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> Message-ID: On Wed, Jul 24, 2013 at 8:53 AM, St?fan van der Walt wrote: > On Wed, Jul 24, 2013 at 2:15 AM, Chris Barker - NOAA Federal > wrote: >> >> On Tue, Jul 23, 2013 at 6:09 AM, Pauli Virtanen wrote: >> >> > The .H property has been implemented in Numpy matrices and Scipy's >> > sparse matrices for many years. >> >> Then we're done. Numpy is an array package, NOT a matrix package, and >> while you can implement matrix math with arrays (and we do), having >> quick and easy mnemonics for common matrix math operations (but >> uncommon general purpose array operations) is not eh job of numpy. >> That's what the matrix object is for. > > I would argue that the ship sailed when we added .T already. Most > users see no difference between the addition of .T and .H. > > The matrix class should probably be deprecated and removed from NumPy > in the long run--being a second class citizen not used by the > developers themselves is not sustainable. And, now that we have "dot" > as a method, there's very little advantage to it. > > St?fan Maybe this is the point where one just needs to do a poll. And finally someone has to make the decision. I feel that adding a method .H() would be the compromise ! 
Alan, could you live with that ? It is short enough and still emphasises the fact that it is NOT a view and therefore behaves sensitively different in certain scenarios as .T . It also leaves the door open to adding an iterator .H attribute later on without introducing the above mentioned code breaks. Who could make (i.e. is willing to make) the decision ? (( I would not open the discussion about ndarray vs. matrix -- it gets far to involving and we would be talking about far-future directions instead of "a single letter addition", which abvious already has big enough support and had so years ago)) Regards, Sebastian Haase From stefan at sun.ac.za Wed Jul 24 03:30:55 2013 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 24 Jul 2013 09:30:55 +0200 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> Message-ID: On Wed, Jul 24, 2013 at 9:15 AM, Sebastian Haase wrote: > I feel that adding a method > .H() > would be the compromise ! Thinking about this more, I think it would just confuse most users... why .T and not .H; then you have to start explaining the underlying implementation detail. For users who already understand the implementation detail, finding .T.conj() would not be too hard. > (( I would not open the discussion about ndarray vs. matrix -- it gets > far to involving and we would be talking about far-future directions > instead of "a single letter addition", which abvious already has big > enough support and had so years ago)) I am willing to write up a NEP if there's any interest. The plan would be to remove the Matrix class from numpy over two or three releases, and publish it as a separate package on PyPi. St?fan From josef.pktd at gmail.com Wed Jul 24 03:40:58 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 24 Jul 2013 03:40:58 -0400 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> Message-ID: I think a H is feature creep and too specialized What's .H of a int a str a bool ? It's just .T and a view, so you cannot rely that conj() makes a copy if you don't work with complex. .T is just a reshape function and has **nothing** to do with matrix algebra. >>> x = np.arange(12).reshape(3,4) >>> x array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]]) >>> np.may_share_memory(x, x.T) True >>> np.may_share_memory(x, x.conj()) True >>> y = x + 1j >>> np.may_share_memory(y, y.conj()) False >>> y.dtype dtype('complex128') >>> x.conj().dtype dtype('int32') Josef On Wed, Jul 24, 2013 at 3:30 AM, St?fan van der Walt wrote: > On Wed, Jul 24, 2013 at 9:15 AM, Sebastian Haase wrote: >> I feel that adding a method >> .H() >> would be the compromise ! > > Thinking about this more, I think it would just confuse most users... > why .T and not .H; then you have to start explaining the underlying > implementation detail. For users who already understand the > implementation detail, finding .T.conj() would not be too hard. > >> (( I would not open the discussion about ndarray vs. matrix -- it gets >> far to involving and we would be talking about far-future directions >> instead of "a single letter addition", which abvious already has big >> enough support and had so years ago)) > > I am willing to write up a NEP if there's any interest. 
The plan > would be to remove the Matrix class from numpy over two or three > releases, and publish it as a separate package on PyPi. > > St?fan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From josef.pktd at gmail.com Wed Jul 24 04:05:24 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 24 Jul 2013 04:05:24 -0400 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: <51EE4044.5060509@astro.uio.no> References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> Message-ID: On Tue, Jul 23, 2013 at 4:35 AM, Dag Sverre Seljebotn wrote: ... > There's lots of uses for A.H to be a conjugating-view, e.g., np.dot(A.H, > A) can be done on-the-fly by BLAS at no extra cost, and so on. These are > currently not possible with pure NumPy without a copy, which is a pretty > big defect IMO (and one reason I'd call BLAS myself using Cython rather > than use np.dot...) Wouldn't the simpler way not just be to expose those linalg functions? hdot(X, Y) == dot(X.T, Y) (if not complex) == dot(X.H, Y) (if complex) Josef From dave.hirschfeld at gmail.com Wed Jul 24 04:23:09 2013 From: dave.hirschfeld at gmail.com (Dave Hirschfeld) Date: Wed, 24 Jul 2013 08:23:09 +0000 (UTC) Subject: [Numpy-discussion] add .H attribute? References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> Message-ID: gmail.com> writes: > > I think a H is feature creep and too specialized > > What's .H of a int a str a bool ? > > It's just .T and a view, so you cannot rely that conj() makes a copy > if you don't work with complex. > > .T is just a reshape function and has **nothing** to do with matrix algebra. > It seems to me that that ship has already sailed - i.e. conj doesn't make much sense for str arrays, but it still works in the sense that it's a nop In [16]: A = asarray(list('abcdefghi')).reshape(3,3) ...: np.all(A.T == A.conj().T) ...: Out[16]: True If we're voting my vote goes to add the .H attribute for all the reasons Alan has specified. Document that it returns a copy but that it may in future return a view so it it not future proof to operate on the result inplace. I'm -1 on .H() as it will require code changes if it ever changes to a property and it will simply result in questions about why .T is a property and .H is a function (and why it's a property for (sparse) matrices) Regarding Dag's example: xh = x.H x *= 2 assert np.all(2 * xh == x.H) I'm sceptical that there's much code out there actually relying on the fact that a transpose is a view with the specified intention of altering the original array inplace. I work with a lot of beginners and whenever I've seen them operate inplace on a transpose it has been a bug in the code, leading to a discussion of how, for performance reasons, numpy will return a view where possible, leading to yet further discussion of when it is and isn't possible to return a view. The third option of .H returning a view would probably be agreeable to everyone but I don't think we should punt on this decision for something that if it does happen is likely years away. It seems that work on this front is happening in different projects to numpy. 
Even if for example sometime in the future numpy's internals were replaced with libdynd or other expression graph engine surely this would result in more breaking changes than .H returning a view rather than a copy?! IANAD so I'm happy with whatever the consensus is I just thought I'd put forward the view from a (specific type of) user perspective. Regards, Dave From klemm at phys.ethz.ch Wed Jul 24 05:46:25 2013 From: klemm at phys.ethz.ch (Hanno Klemm) Date: Wed, 24 Jul 2013 11:46:25 +0200 Subject: [Numpy-discussion] Question regarding documentation of structured arrays Message-ID: <45e848b3afcc062d96726186d2cb827b@phys.ethz.ch> Hi, I found the following inconsistency between the advertised and the actual behviour of structured arrays: on http://docs.scipy.org/doc/numpy/user/basics.rec.html it says in the section "Accessing multiple fields at once" Notice that the fields are always returned in the same order regardless of the sequence they are asked for. Fortunately that does not seem to be the case in my simple test (see below). Is that a change in behaviour I can rely on or am I somehow lucky in this particular example? Thanks, Hanno In [596]: test_array = np.ones((10),dtype=[('a', float), ('b',float)]) In [597]: test_array Out[597]: array([(1.0, 1.0), (1.0, 1.0), (1.0, 1.0), (1.0, 1.0), (1.0, 1.0), (1.0, 1.0), (1.0, 1.0), (1.0, 1.0), (1.0, 1.0), (1.0, 1.0)], dtype=[('a', ' References: <45e848b3afcc062d96726186d2cb827b@phys.ethz.ch> Message-ID: Hi Hanno On Wed, Jul 24, 2013 at 11:46 AM, Hanno Klemm wrote: > I found the following inconsistency between the advertised and the > actual behviour of structured arrays: > > on http://docs.scipy.org/doc/numpy/user/basics.rec.html it says in the > section > > "Accessing multiple fields at once" > Notice that the fields are always returned in the same order regardless > of the sequence they are asked for. I can confirm the behavior you see under the latest development version. Would you mind filing a pull request against the docs? St?fan From njs at pobox.com Wed Jul 24 06:54:28 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 24 Jul 2013 11:54:28 +0100 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> Message-ID: On Wed, Jul 24, 2013 at 9:23 AM, Dave Hirschfeld wrote: > If we're voting my vote goes to add the .H attribute for all the reasons > Alan has specified. Document that it returns a copy but that it may in > future return a view so it it not future proof to operate on the result > inplace. As soon as you talk about attributes "returning" things you've already broken Python's mental model... attributes are things that sit there, not things that execute arbitrary code. Of course this is not how the actual implementation works, attribute access *can* in fact execute arbitrary code, but the mental model is important, so we should preserve it where-ever we can. Just mentioning an attribute should not cause unbounded memory allocations. Consider these two expressions: x = solve(dot(arr, arr.T), arr.T) x = solve(dot(arr, arr.H), arr.H) Mathematically, they're very similar, and the mathematics-like notation does a good job of expressing that similarity while hiding mathematically irrelevant details. Which is what mathematical notation is for. 
But numpy isn't a toolkit for writing mathematical formula, it's a toolkit for writing computational algorithms that implement mathematical formula, and algorithmically, those two expressions are radically different. The first one allocates one temporary (the result from 'dot'); the second one allocates 3 temporaries. The second one is gratuitously inefficient, since two of those temporaries are identical, but they're being computed twice anyway. > I'm sceptical that there's much code out there actually relying on the fact > that a transpose is a view with the specified intention of altering the > original array inplace. > > I work with a lot of beginners and whenever I've seen them operate inplace > on a transpose it has been a bug in the code, leading to a discussion of > how, for performance reasons, numpy will return a view where possible, > leading to yet further discussion of when it is and isn't possible to return > a view. The point isn't that there's code that relies specifically on .T returning a view. It's that to be a good programmer, you need to *know whether* it returns a view -- exactly as you say in the second paragraph. And a library should not hide these kinds of details. -n From njs at pobox.com Wed Jul 24 06:56:38 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 24 Jul 2013 11:56:38 +0100 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> Message-ID: On Wed, Jul 24, 2013 at 8:30 AM, St?fan van der Walt wrote: > I am willing to write up a NEP if there's any interest. The plan > would be to remove the Matrix class from numpy over two or three > releases, and publish it as a separate package on PyPi. Please do! There are some sticky issues to work through (e.g. how to deprecate the "matrix" entry in the numpy namespace, what to do with scipy.sparse), and I don't know whether we'll decide to go through with it in the end, but the way to figure that out is to, you know, work through them :-). -n From dave.hirschfeld at gmail.com Wed Jul 24 07:08:29 2013 From: dave.hirschfeld at gmail.com (Dave Hirschfeld) Date: Wed, 24 Jul 2013 11:08:29 +0000 (UTC) Subject: [Numpy-discussion] add .H attribute? References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> Message-ID: Nathaniel Smith pobox.com> writes: > > > As soon as you talk about attributes "returning" things you've already > broken Python's mental model... attributes are things that sit there, > not things that execute arbitrary code. Of course this is not how the > actual implementation works, attribute access *can* in fact execute > arbitrary code, but the mental model is important, so we should > preserve it where-ever we can. Just mentioning an attribute should not > cause unbounded memory allocations. > Yep, sorry - sloppy use of terminology which I agree is important in helping understand what's happening. -Dave From klemm at phys.ethz.ch Wed Jul 24 07:29:36 2013 From: klemm at phys.ethz.ch (Hanno Klemm) Date: Wed, 24 Jul 2013 13:29:36 +0200 Subject: [Numpy-discussion] Question regarding documentation of structured arrays In-Reply-To: References: <45e848b3afcc062d96726186d2cb827b@phys.ethz.ch> Message-ID: <0b2c3a3a2e18264ff5bc72e0e16b0d14@phys.ethz.ch> Hi Stefan, I would be happy to file a pull request against the docs if you (or somebody) could point me to a document on how and where to do that. 
Hanno On 24.07.2013 12:31, St?fan van der Walt wrote: > Hi Hanno > > On Wed, Jul 24, 2013 at 11:46 AM, Hanno Klemm > wrote: >> I found the following inconsistency between the advertised and the >> actual behviour of structured arrays: >> >> on http://docs.scipy.org/doc/numpy/user/basics.rec.html it says in the >> section >> >> "Accessing multiple fields at once" >> Notice that the fields are always returned in the same order >> regardless >> of the sequence they are asked for. > > I can confirm the behavior you see under the latest development > version. Would you mind filing a pull request against the docs? > > St?fan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Hanno Klemm klemm at phys.ethz.ch From davidmenhur at gmail.com Wed Jul 24 08:47:59 2013 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Wed, 24 Jul 2013 14:47:59 +0200 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> Message-ID: An idea: If .H is ideally going to be a view, and we want to keep it this way, we could have a .h() method with the present implementation. This would preserve the name .H for the conjugate view --when someone finds the way to do it. This way we would increase the readability, simplify some matrix algebra code, and keep the API consistency. On 24 July 2013 13:08, Dave Hirschfeld wrote: > Nathaniel Smith pobox.com> writes: > >> >> >> As soon as you talk about attributes "returning" things you've already >> broken Python's mental model... attributes are things that sit there, >> not things that execute arbitrary code. Of course this is not how the >> actual implementation works, attribute access *can* in fact execute >> arbitrary code, but the mental model is important, so we should >> preserve it where-ever we can. Just mentioning an attribute should not >> cause unbounded memory allocations. >> > > Yep, sorry - sloppy use of terminology which I agree is important in helping > understand what's happening. > > -Dave > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From alan.isaac at gmail.com Wed Jul 24 08:58:34 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Wed, 24 Jul 2013 08:58:34 -0400 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> Message-ID: <51EFCF7A.6030805@gmail.com> On 7/24/2013 3:15 AM, Sebastian Haase wrote: > I feel that adding a method > .H() > would be the compromise ! > > Alan, could you live with that ? I feel .H() now would get in the way of a .H attribute later, which some have indicated could be added as an iterative view in a future numpy. I'd rather wait for that. My assessment of the conversation so far: there is not adequate support for a .H attribute until it can be an iterative view. I believe that almost everyone (possibly not Josef) would accept or want a .H attribute if it could provide an iterative view. (Is that correct?) So I'll drop out of the conversation, but I hope the user interest that has been displayed stimulates interest in that feature request. Thanks to everyone who shared their perspective on this issue. 
And my apologies to those (e.g., Dag) whom I annoyed by being too bullheaded. Cheers, Alan From ben.root at ou.edu Wed Jul 24 09:54:52 2013 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 24 Jul 2013 09:54:52 -0400 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> Message-ID: On Wed, Jul 24, 2013 at 8:47 AM, Da?id wrote: > An idea: > > If .H is ideally going to be a view, and we want to keep it this way, > we could have a .h() method with the present implementation. This > would preserve the name .H for the conjugate view --when someone finds > the way to do it. > > This way we would increase the readability, simplify some matrix > algebra code, and keep the API consistency. > > I could get behind a .h() method until .H attribute is ready. +1 Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Wed Jul 24 10:57:01 2013 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 24 Jul 2013 16:57:01 +0200 Subject: [Numpy-discussion] Question regarding documentation of structured arrays In-Reply-To: <0b2c3a3a2e18264ff5bc72e0e16b0d14@phys.ethz.ch> References: <45e848b3afcc062d96726186d2cb827b@phys.ethz.ch> <0b2c3a3a2e18264ff5bc72e0e16b0d14@phys.ethz.ch> Message-ID: Hallo Hanno On Wed, Jul 24, 2013 at 1:29 PM, Hanno Klemm wrote: > I would be happy to file a pull request against the docs if you (or > somebody) could point me to a document on how and where to do that. The file you want to edit is here: https://github.com/numpy/numpy/blob/master/numpy/doc/structured_arrays.py#L194 You can click on the "edit" button, then GitHub will help you to make a pull request. Thanks! St?fan From lists at onerussian.com Wed Jul 24 11:00:37 2013 From: lists at onerussian.com (Yaroslav Halchenko) Date: Wed, 24 Jul 2013 11:00:37 -0400 Subject: [Numpy-discussion] fresh performance boosts and elderly hits e.g. identity, ones In-Reply-To: <20130719220754.GR27621@onerussian.com> References: <20130506143241.GV5140@onerussian.com> <1367856232.2506.31.camel@sebastian-laptop> <20130506161153.GW5140@onerussian.com> <1367927238.23010.12.camel@sebastian-laptop> <20130701193006.GC27621@onerussian.com> <20130701215804.GG27621@onerussian.com> <20130709161007.GL27621@onerussian.com> <20130717035348.GN27621@onerussian.com> <20130719220754.GR27621@onerussian.com> Message-ID: <20130724150037.GW27621@onerussian.com> Added some basic constructors benchmarks: http://www.onerussian.com/tmp/numpy-vbench/vb_vb_core.html quite a bit of fresh enhancements are present (cool) but also some freshly discovered elderly hits, e.g. http://www.onerussian.com/tmp/numpy-vbench/vb_vb_core.html#numpy-identity-100 http://www.onerussian.com/tmp/numpy-vbench/vb_vb_core.html#numpy-ones-100 Cheers, On Fri, 19 Jul 2013, Yaroslav Halchenko wrote: > I have just added a few more benchmarks, and here they come > http://www.onerussian.com/tmp/numpy-vbench/vb_vb_linalg.html#numpy-linalg-pinv-a-float32 > it seems to be very recent so my only check based on 10 commits > didn't pick it up yet so they are not present in the summary table. > could well be related to 80% faster det()? ;) > norm was hit as well a bit earlier, might well be within these commits: > https://github.com/numpy/numpy/compare/24a0aa5...29dcc54 > I will rerun now benchmarking for the rest of commits (was running last > in the day iirc) > Cheers, -- Yaroslav O. 
Halchenko, Ph.D. http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org Senior Research Associate, Psychological and Brain Sciences Dept. Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik From stefan at sun.ac.za Wed Jul 24 11:02:52 2013 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 24 Jul 2013 17:02:52 +0200 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> Message-ID: On Wed, Jul 24, 2013 at 12:54 PM, Nathaniel Smith wrote: > The point isn't that there's code that relies specifically on .T > returning a view. It's that to be a good programmer, you need to *know > whether* it returns a view -- exactly as you say in the second > paragraph. And a library should not hide these kinds of details. After listening to the arguments by yourself and Dag, I think I buy into the idea that we should hold off on this until we have ufunc views or something similar implemented. Also, if we split off the matrix package, we can give other people who really care about that (perhaps Alan is interested?) ownership, and let them run with it (I mainly use ndarrays myself). St?fan From chris.barker at noaa.gov Wed Jul 24 11:24:00 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Wed, 24 Jul 2013 08:24:00 -0700 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> Message-ID: <-132241265923687312@unknownmsgid> On Jul 23, 2013, at 11:54 PM, "St?fan van der Walt" wrote: >>> The .H property has been implemented in Numpy matrices and Scipy's >>> sparse matrices for many years. >> >> Then we're done. Numpy is an array package, NOT a matrix package, and >> while you can implement matrix math with arrays (and we do), having >> quick and easy mnemonics for common matrix math operations (but >> uncommon general purpose array operations) is not eh job of numpy. >> That's what the matrix object is for. > > I would argue that the ship sailed when we added .T already. Most > users see no difference between the addition of .T and .H. I don't know who can speak for "most users", but I see them quite differently. Transposing is a common operation outside of linear algebra--I, for one, use it to work with image arrays, which are often stored in a way by image libraries that is the transpose of the "natural" numpy way. But anyway, just because we have one domain-specific convenience attribute, doesn't mean we should add them all. > The matrix class should probably be deprecated and removed from NumPy > in the long run--being a second class citizen not used by the > developers themselves is not sustainable. I agree, but the fact that no one has stepped up to maintain and improve it tells me that there is not a very large community that wants a clean linear algebra interface, not that we should try to build such an interface directly into numpy. Is there really a point to a clean interface to the Hermetian transpose, but not plain old matrix multiply? 
> And, now that we have "dot" > as a method, Agh, but "dot" is a method--so we still don't have a clean relationship with the math in text books: AB => A.dot(B) Anyway, adding .H is clearly not a big deal, I just don't think it's going to satisfy anyone anyway. -Chris From chris.barker at noaa.gov Wed Jul 24 11:29:16 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Wed, 24 Jul 2013 08:29:16 -0700 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> Message-ID: <-3986359840241221136@unknownmsgid> >> >> plan >> would be to remove the Matrix class from numpy over two or three >> releases, and publish it as a separate package on PyPi. Anyone willing to take ownership of it? Maybe we should still do it of not-- at least it will make it clear that it is orphaned. Though one plus to having matrix in numpy is that it was a testbed for ndarray subclassing... -Chris > Please do! There are some sticky issues to work through (e.g. how to > deprecate the "matrix" entry in the numpy namespace, what to do with > scipy.sparse), and I don't know whether we'll decide to go through > with it in the end, but the way to figure that out is to, you know, > work through them :-). > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From pav at iki.fi Wed Jul 24 11:33:16 2013 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 24 Jul 2013 18:33:16 +0300 Subject: [Numpy-discussion] Splitting numpydoc to a separate repo Message-ID: Hi, How about splitting doc/sphinxext out from the main Numpy repository to a separate `numpydoc` repo under Numpy project? It's a separate Python package, after all. Moreover, this would make it easier to use it as a git submodule (e.g. in Scipy). Moreover, its release cycle is not in any way tied to that of Numpy. Pauli From stefan at sun.ac.za Wed Jul 24 11:35:24 2013 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 24 Jul 2013 17:35:24 +0200 Subject: [Numpy-discussion] Splitting numpydoc to a separate repo In-Reply-To: References: Message-ID: On Wed, Jul 24, 2013 at 5:33 PM, Pauli Virtanen wrote: > How about splitting doc/sphinxext out from the main Numpy repository to > a separate `numpydoc` repo under Numpy project? That would be great, also for scikits that rely on these extensions. St?fan From robert.kern at gmail.com Wed Jul 24 11:35:42 2013 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 24 Jul 2013 16:35:42 +0100 Subject: [Numpy-discussion] Splitting numpydoc to a separate repo In-Reply-To: References: Message-ID: On Wed, Jul 24, 2013 at 4:33 PM, Pauli Virtanen wrote: > > Hi, > > How about splitting doc/sphinxext out from the main Numpy repository to > a separate `numpydoc` repo under Numpy project? > > It's a separate Python package, after all. Moreover, this would make it > easier to use it as a git submodule (e.g. in Scipy). Moreover, its > release cycle is not in any way tied to that of Numpy. Works for me. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From lists at onerussian.com Wed Jul 24 11:36:58 2013 From: lists at onerussian.com (Yaroslav Halchenko) Date: Wed, 24 Jul 2013 11:36:58 -0400 Subject: [Numpy-discussion] Splitting numpydoc to a separate repo In-Reply-To: References: Message-ID: <20130724153658.GX27621@onerussian.com> On Wed, 24 Jul 2013, Pauli Virtanen wrote: > How about splitting doc/sphinxext out from the main Numpy repository to > a separate `numpydoc` repo under Numpy project? +1 > It's a separate Python package, after all. Moreover, this would make it > easier to use it as a git submodule (e.g. in Scipy). Moreover, its > release cycle is not in any way tied to that of Numpy. yeap -- it has a life of its own -- Yaroslav O. Halchenko, Ph.D. http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org Senior Research Associate, Psychological and Brain Sciences Dept. Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik From tantczak at operasolutions.com Wed Jul 24 11:36:53 2013 From: tantczak at operasolutions.com (Trevor Antczak) Date: Wed, 24 Jul 2013 11:36:53 -0400 Subject: [Numpy-discussion] Casting Errors in AIX Message-ID: Hello Numpy Discussion List, So I'm trying to get numpy working on an AIX 6.1 system. Initially I had a lot of problems trying to compile the package because the xlc compiler weren't installed on this machine, but apparently the Python package we installed had been built with them. Once we got xlc installed the process seemed to work pretty well until we got to compiling heapsort.c. At this point I began to get a huge number of errors in the form: compile options: '-Inumpy/core/include -Ibuild/src.aix-6.1-2.7/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include -I/opt/freeware/include/python2.7 -Ibuild/src.aix-6.1-2.7/numpy/core/src/multiarray -Ibuild/src.aix-6.1-2.7/numpy/core/src/umath -c' xlc_r: build/src.aix-6.1-2.7/numpy/core/src/npysort/heapsort.c "/usr/include/stdio.h", line 528.12: 1506-343 (S) Redeclaration of fgetpos64 differs from previous declaration on line 323 of "/usr/include/stdio.h". "/usr/include/stdio.h", line 528.12: 1506-377 (I) The type "long long*" of parameter 2 differs from the previous type "long* restrict". "/usr/include/stdio.h", line 531.12: 1506-343 (S) Redeclaration of fseeko64 differs from previous declaration on line 471 of "/usr/include/stdio.h". "/usr/include/stdio.h", line 531.12: 1506-377 (I) The type "long long" of parameter 2 differs from the previous type "long". "/usr/include/stdio.h", line 532.12: 1506-343 (S) Redeclaration of fsetpos64 differs from previous declaration on line 325 of "/usr/include/stdio.h". "/usr/include/stdio.h", line 532.12: 1506-377 (I) The type "const long long*" of parameter 2 differs from the previous type "const long*". "/usr/include/stdio.h", line 533.16: 1506-343 (S) Redeclaration of ftello64 differs from previous declaration on line 472 of "/usr/include/stdio.h". "/usr/include/stdio.h", line 533.16: 1506-050 (I) Return type "long long" in redeclaration is not compatible with the previous return type "long". "/usr/include/unistd.h", line 171.17: 1506-343 (S) Redeclaration of lseek64 differs from previous declaration on line 169 of "/usr/include/unistd.h". 
"/usr/include/unistd.h", line 171.17: 1506-050 (I) Return type "long long" in redeclaration is not compatible with the previous return type "long". "/usr/include/unistd.h", line 171.17: 1506-377 (I) The type "long long" of parameter 2 differs from the previous type "long". "/usr/include/sys/lockf.h", line 64.20: 1506-343 (S) Redeclaration of lockf64 differs from previous declaration on line 62 of "/usr/include/sys/lockf.h". ................................................... "/usr/include/unistd.h", line 942.25: 1506-050 (I) Return type "long long" in redeclaration is not compatible with the previous return type "long". "/usr/include/unistd.h", line 942.25: 1506-377 (I) The type "long long" of parameter 2 differs from the previous type "long". "/usr/include/unistd.h", line 943.25: 1506-343 (S) Redeclaration of fsync_range64 differs from previous declaration on line 940 of "/usr/include/unistd.h". "/usr/include/unistd.h", line 943.25: 1506-377 (I) The type "long long" of parameter 3 differs from the previous type "long". error: Command "/usr/vac/bin/xlc_r -DAIX_GENUINE_CPLUSCPLUS -D_LINUX_SOURCE_COMPAT -q32 -qbitfields=signed -qmaxmem=70000 -qalloca -bmaxdata:0x80000000 -Wl,-brtl -I/usr/include -I/opt/freeware/include -I/opt/freeware/include/ncurses -DNDEBUG -O2 -Inumpy/core/include -Ibuild/src.aix-6.1-2.7/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include -I/opt/freeware/include/python2.7 -Ibuild/src.aix-6.1-2.7/numpy/core/src/multiarray -Ibuild/src.aix-6.1-2.7/numpy/core/src/umath -c build/src.aix-6.1-2.7/numpy/core/src/npysort/heapsort.c -o build/temp.aix-6.1-2.7/build/src.aix-6.1-2.7/numpy/core/src/npysort/heapsort.o" failed with exit status 1 There are a lot more than this. Probably in neighborhood of 40 lines all told. I spent some time doing research and this appears to be something not terribly uncommon when compiling F/OSS on AIX. Most of the instances appeared to involve either sshd or smb. Unfortunately the most commonly cited solution (using the --disable-largefile to configure) won't work for compiling a Python module. One suggestion I found that did help was to explicitly include some of the standard libraries in the .c file. So I added: #include #include To heapsort.c. That dramatically reduced the error messages. Now I get: compile options: '-Inumpy/core/include -Ibuild/src.aix-6.1-2.7/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include -I/opt/freeware/include/python2.7 -Ibuild/src.aix-6.1-2.7/numpy/core/src/multiarray -Ibuild/src.aix-6.1-2.7/numpy/core/src/umath -c' xlc_r: build/src.aix-6.1-2.7/numpy/core/src/npysort/heapsort.c "/usr/include/stdio.h", line 528.12: 1506-343 (S) Redeclaration of fgetpos64 differs from previous declaration on line 323 of "/usr/include/stdio.h". "/usr/include/stdio.h", line 528.12: 1506-377 (I) The type "long long*" of parameter 2 differs from the previous type "long* restrict". "/usr/include/stdio.h", line 531.12: 1506-343 (S) Redeclaration of fseeko64 differs from previous declaration on line 471 of "/usr/include/stdio.h". "/usr/include/stdio.h", line 531.12: 1506-377 (I) The type "long long" of parameter 2 differs from the previous type "long". 
"/usr/include/stdio.h", line 532.12: 1506-343 (S) Redeclaration of fsetpos64 differs from previous declaration on line 325 of "/usr/include/stdio.h". "/usr/include/stdio.h", line 532.12: 1506-377 (I) The type "const long long*" of parameter 2 differs from the previous type "const long*". "/usr/include/stdio.h", line 533.16: 1506-343 (S) Redeclaration of ftello64 differs from previous declaration on line 472 of "/usr/include/stdio.h". "/usr/include/stdio.h", line 533.16: 1506-050 (I) Return type "long long" in redeclaration is not compatible with the previous return type "long". "/usr/include/stdio.h", line 528.12: 1506-343 (S) Redeclaration of fgetpos64 differs from previous declaration on line 323 of "/usr/include/stdio.h". "/usr/include/stdio.h", line 528.12: 1506-377 (I) The type "long long*" of parameter 2 differs from the previous type "long* restrict". "/usr/include/stdio.h", line 531.12: 1506-343 (S) Redeclaration of fseeko64 differs from previous declaration on line 471 of "/usr/include/stdio.h". "/usr/include/stdio.h", line 531.12: 1506-377 (I) The type "long long" of parameter 2 differs from the previous type "long". "/usr/include/stdio.h", line 532.12: 1506-343 (S) Redeclaration of fsetpos64 differs from previous declaration on line 325 of "/usr/include/stdio.h". "/usr/include/stdio.h", line 532.12: 1506-377 (I) The type "const long long*" of parameter 2 differs from the previous type "const long*". "/usr/include/stdio.h", line 533.16: 1506-343 (S) Redeclaration of ftello64 differs from previous declaration on line 472 of "/usr/include/stdio.h". "/usr/include/stdio.h", line 533.16: 1506-050 (I) Return type "long long" in redeclaration is not compatible with the previous return type "long". error: Command "/usr/vac/bin/xlc_r -DAIX_GENUINE_CPLUSCPLUS -D_LINUX_SOURCE_COMPAT -q32 -qbitfields=signed -qmaxmem=70000 -qalloca -bmaxdata:0x80000000 -Wl,-brtl -I/usr/include -I/opt/freeware/include -I/opt/freeware/include/ncurses -DNDEBUG -O2 -Inumpy/core/include -Ibuild/src.aix-6.1-2.7/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include -I/opt/freeware/include/python2.7 -Ibuild/src.aix-6.1-2.7/numpy/core/src/multiarray -Ibuild/src.aix-6.1-2.7/numpy/core/src/umath -c build/src.aix-6.1-2.7/numpy/core/src/npysort/heapsort.c -o build/temp.aix-6.1-2.7/build/src.aix-6.1-2.7/numpy/core/src/npysort/heapsort.o" failed with exit status 1 And that's all of them, and all related to stdio.h. Unfortunately the obvious solution of explicitly including stdio.h didn't help. It also seems really odd that I would have to explicitly include standard system libraries. I'm hoping there some sort of solution to this that doesn't involve a massive amount of recoding. Thanks in advance for your help! Trevor -------------- next part -------------- An HTML attachment was scrubbed... URL: From jjhelmus at gmail.com Wed Jul 24 11:37:30 2013 From: jjhelmus at gmail.com (Jonathan J. Helmus) Date: Wed, 24 Jul 2013 10:37:30 -0500 Subject: [Numpy-discussion] Splitting numpydoc to a separate repo In-Reply-To: References: Message-ID: On Jul 24, 2013, at 10:33 AM, Pauli Virtanen wrote: > Hi, > > How about splitting doc/sphinxext out from the main Numpy repository to > a separate `numpydoc` repo under Numpy project? > > It's a separate Python package, after all. Moreover, this would make it > easier to use it as a git submodule (e.g. in Scipy). 
Moreover, its > release cycle is not in any way tied to that of Numpy. > > Pauli > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion I'm a big +1 on this idea. I've used the numpydoc sphinx extensions in a number of package I've worked on, having them as a separate git repo would make these even easier to use. - Jonathan Helmus From lists at hilboll.de Wed Jul 24 11:39:18 2013 From: lists at hilboll.de (Andreas Hilboll) Date: Wed, 24 Jul 2013 17:39:18 +0200 Subject: [Numpy-discussion] Splitting numpydoc to a separate repo In-Reply-To: References: Message-ID: <51EFF526.80306@hilboll.de> On 24.07.2013 17:33, Pauli Virtanen wrote: > Hi, > > How about splitting doc/sphinxext out from the main Numpy repository to > a separate `numpydoc` repo under Numpy project? +1 -- Andreas From pav at iki.fi Wed Jul 24 12:32:58 2013 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 24 Jul 2013 19:32:58 +0300 Subject: [Numpy-discussion] Splitting numpydoc to a separate repo In-Reply-To: References: Message-ID: 24.07.2013 18:33, Pauli Virtanen kirjoitti: > How about splitting doc/sphinxext out from the main Numpy repository to > a separate `numpydoc` repo under Numpy project? Done: https://github.com/numpy/numpydoc https://github.com/numpy/numpy/pull/3547 https://github.com/scipy/scipy/pull/2657 From cjwilliams43 at gmail.com Wed Jul 24 14:13:58 2013 From: cjwilliams43 at gmail.com (Colin J. Williams) Date: Wed, 24 Jul 2013 14:13:58 -0400 Subject: [Numpy-discussion] Treatment of the Matrix by Numpy Message-ID: <51F01966.6080603@gmail.com> An HTML attachment was scrubbed... URL: From grb at skogoglandskap.no Thu Jul 25 04:47:03 2013 From: grb at skogoglandskap.no (Graeme B. Bell) Date: Thu, 25 Jul 2013 08:47:03 +0000 Subject: [Numpy-discussion] unit tests / developing numpy In-Reply-To: References: Message-ID: <7B2B6664-7E15-4F65-9366-06898322F773@skogoglandskap.no> Does anyone know how to get the unit tests to run on a local fork, without doing a complete install of numpy? If so, please can you describe it, or better still, update: http://docs.scipy.org/doc/numpy/dev/gitwash/development_workflow.html It seems strange that the development workflow doesn't mention running any tests before committing/pushing/pulling. Graeme. From grb at skogoglandskap.no Thu Jul 25 05:09:51 2013 From: grb at skogoglandskap.no (Graeme B. Bell) Date: Thu, 25 Jul 2013 09:09:51 +0000 Subject: [Numpy-discussion] unit tests / developing numpy In-Reply-To: <7B2B6664-7E15-4F65-9366-06898322F773@skogoglandskap.no> References: <7B2B6664-7E15-4F65-9366-06898322F773@skogoglandskap.no> Message-ID: <80BE42C2-3DAF-4BA8-A271-5A3DD2E098DE@skogoglandskap.no> To answer my own question in a clumsy way: To run unit tests on a dev version of numpy: python setup.py build python setup.py install --prefix=/tmp/numpy export PYTHONPATH="/tmp/numpy/lib64/python2.7/site-packages/" python >>> import numpy >>> print numpy.version.version >>> numpy.test() Adjust according to your version of python. On Jul 25, 2013, at 10:47 AM, Graeme Bell wrote: > > Does anyone know how to get the unit tests to run on a local fork, without doing a complete install of numpy? > > If so, please can you describe it, or better still, update: http://docs.scipy.org/doc/numpy/dev/gitwash/development_workflow.html > > It seems strange that the development workflow doesn't mention running any tests before committing/pushing/pulling. > > Graeme. 
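A related sanity check before trusting the results is to confirm which numpy the interpreter actually picked up (a small sketch; the expected values in the comments are only illustrative):

import numpy
print(numpy.version.version)   # should show the development version, not the released one
print(numpy.__file__)          # should point into the temporary prefix, not the system site-packages
numpy.test()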
From grb at skogoglandskap.no Thu Jul 25 05:17:19 2013 From: grb at skogoglandskap.no (Graeme B. Bell) Date: Thu, 25 Jul 2013 09:17:19 +0000 Subject: [Numpy-discussion] unit tests / developing numpy In-Reply-To: <80BE42C2-3DAF-4BA8-A271-5A3DD2E098DE@skogoglandskap.no> References: <7B2B6664-7E15-4F65-9366-06898322F773@skogoglandskap.no> <80BE42C2-3DAF-4BA8-A271-5A3DD2E098DE@skogoglandskap.no> Message-ID: <0634E4DC-15CA-4134-A4D0-CA54ADFA3A33@skogoglandskap.no> To run unit tests on a dev version of numpy: It won't run if you start in the source directory, so a cd is also needed: python setup.py build python setup.py install --prefix=/tmp/numpy export PYTHONPATH="/tmp/numpy/lib64/python2.7/site-packages/" cd .. python >>> import numpy >>> print numpy.version.version >>> numpy.test() Adjust according to your version of python. From njs at pobox.com Thu Jul 25 06:02:10 2013 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 25 Jul 2013 11:02:10 +0100 Subject: [Numpy-discussion] unit tests / developing numpy In-Reply-To: <0634E4DC-15CA-4134-A4D0-CA54ADFA3A33@skogoglandskap.no> References: <7B2B6664-7E15-4F65-9366-06898322F773@skogoglandskap.no> <80BE42C2-3DAF-4BA8-A271-5A3DD2E098DE@skogoglandskap.no> <0634E4DC-15CA-4134-A4D0-CA54ADFA3A33@skogoglandskap.no> Message-ID: A cleaner option is to use virtualenv: virtualenv --python=/usr/bin/python2.7 my-test-env cd my-test-env bin/pip install $NUMPY_SRCDIR bin/python $NUMPY_SRCDIR/tools/test-installed-numpy.py --mode=full Or you can install 'tox', and then running 'tox -e py27' will do the above, 'tox -e py27,py33' will check both 2.7 and 3.3, plain 'tox' will test all supported configurations (this requires you have lots of python interpreter versions installed), etc. Those docs are in doc/source/dev/gitwash in the numpy source tree -- if you have any thoughts on how to make them clearer than pull requests are appreciated :-). You probably have a better idea than us how to put things clearly to someone who's just starting... -n On Thu, Jul 25, 2013 at 10:17 AM, Graeme B. Bell wrote: > > To run unit tests on a dev version of numpy: > It won't run if you start in the source directory, so a cd is also needed: > > > > python setup.py build > python setup.py install --prefix=/tmp/numpy > export PYTHONPATH="/tmp/numpy/lib64/python2.7/site-packages/" > cd .. > python > >>>> import numpy >>>> print numpy.version.version >>>> numpy.test() > > > Adjust according to your version of python. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From pav at iki.fi Thu Jul 25 06:25:51 2013 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 25 Jul 2013 13:25:51 +0300 Subject: [Numpy-discussion] unit tests / developing numpy In-Reply-To: References: <7B2B6664-7E15-4F65-9366-06898322F773@skogoglandskap.no> <80BE42C2-3DAF-4BA8-A271-5A3DD2E098DE@skogoglandskap.no> <0634E4DC-15CA-4134-A4D0-CA54ADFA3A33@skogoglandskap.no> Message-ID: 25.07.2013 13:02, Nathaniel Smith kirjoitti: [clip] > Or you can install 'tox', and then running 'tox -e py27' will do the > above, 'tox -e py27,py33' will check both 2.7 and 3.3, plain 'tox' > will test all supported configurations (this requires you have lots of > python interpreter versions installed), etc. 
Or:

python runtests.py

-- 
Pauli Virtanen

From njs at pobox.com  Thu Jul 25 07:48:10 2013
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 25 Jul 2013 12:48:10 +0100
Subject: [Numpy-discussion] Allow == and != to raise errors
In-Reply-To: 
References: <1373632688.13968.13.camel@sebastian-laptop> <1374589784.13486.32.camel@sebastian-laptop>
Message-ID: 

On Tue, Jul 23, 2013 at 4:10 PM, Frédéric Bastien wrote:
> I'm mixed, because I see the good value, but I'm not able to guess the
> consequence of the interface change.
>
> So doing your FutureWarning would allow to gather some data about this, and
> if it seems to cause too much of a problem, we could cancel the change.
>
> Also, in the case there is a few software that depend on the old behaviour,
> this will cause a crash (except if they have a catch-all Exception case), not
> a bad result.

I think we have to be willing to fix bugs, even if we can't be sure what all the consequences are.
Carefully of course, and with due > consideration to possible compatibility consequences, but if we > rejected every change that might have unforeseen effects then we'd > have to stop accepting changes altogether. (And anyway the > show-stopper regressions that make it into releases always seem to be > the ones we didn't anticipate at all, so I doubt that being 50% more > careful with obscure corner cases like this will have any measurable > impact in our overall release-to-release compatibility.) So I'd > consider Fred's comments above to be a vote for the change, in > practice... > > > I think it is always hard to predict the consequence of interface change > in > > NumPy. To help measure it, we could make/as people to contribute to a > > collection of software that use NumPy with a good tests suites. We could > > test interface change on them by running there tests suites to try to > have a > > guess of the impact of those change. What do you think of that? I think > it > > was already discussed on the mailing list, but not acted upon. > > Yeah, if we want to be careful then it never hurts to run other > projects test suites to flush out bugs :-). > > We don't do this systematically right now. Maybe we should stick some > precompiled copies of scipy and other core numpy-dependants up on a > host somewhere and then pull them down and run their test suite as > part of the Travis tests? We have maybe 10 minutes of CPU budget for > tests still. Theano tests will be too long. I'm not sure that doing this on travis-ci is the right place. Doing this for each version of a PR will be too long for travis and will limit the project that we will test on. What about doing a vagrant VM that update/install the development version of NumPy and then reinstall some predetermined version of other project and run there tests? I started playing with vagrant VM to help test differente OS configuration for Theano. I haven't finished this, but it seam to do the job well. People just cd in a directory, then run "vagrant up" and then all is automatic. They just wait and read the output. Other idea? I know some other project used jenkins. Would this be a better idea? Fred -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Thu Jul 25 10:49:44 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Thu, 25 Jul 2013 07:49:44 -0700 Subject: [Numpy-discussion] add .H attribute? In-Reply-To: <51EF1873.7050202@gmail.com> References: <51D98902.1090403@gmail.com> <51E17280.2030105@gmail.com> <51ED9100.8040108@gmail.com> <51EE4044.5060509@astro.uio.no> <51EEC332.7070805@gmail.com> <51EEF0BB.60508@astro.uio.no> <51EF0217.209@gmail.com> <51EF078E.8050603@astro.uio.no> <51EF1873.7050202@gmail.com> Message-ID: <-29565752093052626@unknownmsgid> On Jul 23, 2013, at 4:57 PM, Alan G Isaac wrote: > Finally, I think (?) everyone (proponents and opponents) > would be happy if .H could provide access to an iterative > view of the conjugate transpose. Except those of us that don't think numpy needs it at all. But I'll call it a -0 -Chris From grb at skogoglandskap.no Thu Jul 25 10:52:07 2013 From: grb at skogoglandskap.no (Graeme B. Bell) Date: Thu, 25 Jul 2013 14:52:07 +0000 Subject: [Numpy-discussion] unit tests / developing numpy In-Reply-To: References: Message-ID: <20D22199-8C71-49A0-8292-CEBBBB70B902@skogoglandskap.no> Nathaniel, Pauli: Thanks for the suggestions! 
= runtests.py is a nice solution, but unless you also set up your PYTHONPATH and install the code you've been working on, you're going to run with whichever version of numpy you have installed normally rather than the code you've just been working on (e.g. 1.7.1 rather than 1.8dev).

Unfortunately, this caused me to submit a buggy set of commits, thinking they had passed the tests.

= env approach: thanks, I'll give that a try and compare it with the /tmp install approach.

Can we add this to the dev workflow web pages?

Graeme.

From pav at iki.fi  Thu Jul 25 11:38:11 2013
From: pav at iki.fi (Pauli Virtanen)
Date: Thu, 25 Jul 2013 18:38:11 +0300
Subject: [Numpy-discussion] unit tests / developing numpy
In-Reply-To: <20D22199-8C71-49A0-8292-CEBBBB70B902@skogoglandskap.no>
References: <20D22199-8C71-49A0-8292-CEBBBB70B902@skogoglandskap.no>
Message-ID: 

25.07.2013 17:52, Graeme B. Bell kirjoitti:
[clip]
> = runtests.py is a nice solution, but unless you also set up your
> PYTHONPATH and install the code you've been working on, you're going
> to run with whichever version of numpy you have installed normally
> rather than the code you've just been working on (e.g. 1.7.1 rather than 1.8dev).

No, runtests builds the code and sets PYTHONPATH accordingly.

-- 
Pauli Virtanen

From grb at skogoglandskap.no  Thu Jul 25 13:15:22 2013
From: grb at skogoglandskap.no (Graeme B. Bell)
Date: Thu, 25 Jul 2013 17:15:22 +0000
Subject: [Numpy-discussion] unit tests / developing numpy (Pauli Virtanen)
In-Reply-To: 
References: 
Message-ID: 

> 
> No, runtests builds the code and sets PYTHONPATH accordingly.
> 
> --
> Pauli Virtanen

Hello Pauli,

Thanks again for writing back. I agree that may be what runtests.py is intended to do, but it is unfortunately not what it actually does in its default configuration, at least on my computer. I got burned by this a few nights ago.

$ pwd
/home/X/github/numpy

$ more numpy/version.py
# THIS FILE IS GENERATED FROM NUMPY SETUP.PY
short_version = '1.8.0'

$ echo $PYTHONPATH

$ python runtests.py
Building, see build.log...
Build OK
Running unit tests for numpy
NumPy version 1.7.1
NumPy is installed in /usr/lib64/python2.7/site-packages/numpy
Python version 2.7.3 (default, Aug 9 2012, 17:23:57) [GCC 4.7.1 20120720 (Red Hat 4.7.1-5)]
nose version 1.3.0

*note the version number and directory used by runtests.py*

It reported 100% tests passed (unsurprising since it was testing the release version!). In reality, the current directory at that time failed a test when I pushed it to the main repository.

Can you suggest anything that I may be doing wrong here?

Graeme
From pav at iki.fi  Thu Jul 25 13:49:33 2013
From: pav at iki.fi (Pauli Virtanen)
Date: Thu, 25 Jul 2013 20:49:33 +0300
Subject: [Numpy-discussion] unit tests / developing numpy (Pauli Virtanen)
In-Reply-To: 
References: 
Message-ID: 

25.07.2013 20:15, Graeme B. Bell kirjoitti:
[clip]
> I agree that may be what runtests.py is intended to do, but it is
> unfortunately not what it actually does in its default configuration,
> at least on my computer. I got burned by this a few nights ago.

That is interesting, as it has worked for me on all configurations.

You can check under 'build/testenv/' --- does your Python version by chance install it to a `lib64` directory instead of `lib`?

(i)

What does

import os
from distutils.sysconfig import get_python_lib
get_python_lib(prefix=os.path.abspath('build/testenv'))

report?
Is there a 'numpy' directory below the reported directory after running runtests.py? (ii) Start a fresh Python interpreter and run import sys print sys.modules.get('numpy') (Note: no "import numpy" above). Does it print `None` or something else? -- Pauli Virtanen From stefan at sun.ac.za Wed Jul 24 22:53:19 2013 From: stefan at sun.ac.za (=?iso-8859-1?Q?St=E9fan?= van der Walt) Date: Thu, 25 Jul 2013 04:53:19 +0200 Subject: [Numpy-discussion] unit tests / developing numpy In-Reply-To: <7B2B6664-7E15-4F65-9366-06898322F773@skogoglandskap.no> References: <7B2B6664-7E15-4F65-9366-06898322F773@skogoglandskap.no> Message-ID: <20130725025319.GC7821@shinobi> On Thu, 25 Jul 2013 08:47:03 +0000, Graeme B. Bell wrote: > > Does anyone know how to get the unit tests to run on a local fork, without doing a complete install of numpy? > I usually do an in-place build with either bentomaker build -i -j or python setup.py build_ext -i Then export PYTHONPATH=$PYTHONPATH:/path/to/numpy and nosetests numpy St?fan From grb at skogoglandskap.no Mon Jul 29 03:43:28 2013 From: grb at skogoglandskap.no (Graeme B. Bell) Date: Mon, 29 Jul 2013 07:43:28 +0000 Subject: [Numpy-discussion] unit tests / developing numpy (Pauli Virtanen) In-Reply-To: References: Message-ID: Hi Pauli, Thanks for looking into this. Apologies for mangling the email subject line in my previous reply. Answers are below, inline: > That is interesting, as it has worked for me on all configurations. > You can check under 'build/testenv/' --- does your Python version by > chance install it to a `lib64` directory instead of `lib`? $ ls build/testenv/ bin lib64 > (i) > > What does > > import os > from distutils.sysconfig import get_python_lib > get_python_lib(prefix=os.path.abspath('build/testenv')) > > report? Is there a 'numpy' directory below the reported directory after > running runtests.py? >>> import os >>> from distutils.sysconfig import get_python_lib >>> get_python_lib(prefix=os.path.abspath('build/testenv')) '/ssd-space/home/X/github/numpy/build/testenv/lib/python2.7/site-packages' I'm running this in a fresh python client in the directory 'github/numpy'. > (ii) > > Start a fresh Python interpreter and run > > import sys > print sys.modules.get('numpy') > > (Note: no "import numpy" above). Does it print `None` or something else? >>> import sys >>> print sys.modules.get('numpy') None Thanks, Pauli and Stefan. Graeme. From ncreati at inogs.it Mon Jul 29 08:50:03 2013 From: ncreati at inogs.it (Nicola Creati) Date: Mon, 29 Jul 2013 14:50:03 +0200 Subject: [Numpy-discussion] Search array in array Message-ID: <51F664FB.5000509@inogs.it> Hello, I'm wondering if there is a fast way to solve the following problem. I have two arrays: A = [[ 4, 9, 10], [ 7, 4, 17], [12, 21, 14], [12, 24, 11], [18, 21, 3], [16, 3, 7], [17, 21, 5], [24, 3, 14]] B = [[17, 5], [14, 21]] I need to search rows of A that contain elements of each row of B regardless of the order of the elements in B. The searched results is: [2, 6] . Thanks. Nicola -- _____________________________________________________________________ Nicola Creati Istituto Nazionale di Oceanografia e di Geofisica Sperimentale - OGS IRI (Ricerca Teconologica e Infrastrutture) Department B.go Grotta Gigante - Brisciki 42/c 34010 Sgonico - Zgonik (TS) - Italy Tel. 
+39-0402140213 Fax +39-040327307

From gbs25 at drexel.edu  Mon Jul 29 16:27:47 2013
From: gbs25 at drexel.edu (Gabe Schwartz)
Date: Mon, 29 Jul 2013 20:27:47 +0000 (UTC)
Subject: [Numpy-discussion] Search array in array
References: <51F664FB.5000509@inogs.it>
Message-ID: 

Nicola Creati <ncreati at inogs.it> writes:
>
> I need to search rows of A that contain elements of each row of B
> regardless of the order of the elements in B.
>

I don't know how fast this is, but it is fairly short:

C = (A[..., np.newaxis, np.newaxis] == B)
rows = (C.sum(axis=(1,2,3)) >= B.shape[1]).nonzero()[0]

From ncreati at inogs.it  Tue Jul 30 02:57:13 2013
From: ncreati at inogs.it (Nicola Creati)
Date: Tue, 30 Jul 2013 08:57:13 +0200
Subject: [Numpy-discussion] Search array in array
In-Reply-To: 
References: <51F664FB.5000509@inogs.it>
Message-ID: <51F763C9.4030203@inogs.it>

On 07/29/2013 10:27 PM, Gabe Schwartz wrote:
> C = (A[..., np.newaxis, np.newaxis] == B)
> rows = (C.sum(axis=(1,2,3)) >= B.shape[1]).nonzero()[0]

Hello, thank you, it's not fast but really nice.

Nicola

-- 
_____________________________________________________________________
Nicola Creati
Istituto Nazionale di Oceanografia e di Geofisica Sperimentale - OGS
IRI (Ricerca Teconologica e Infrastrutture) Department
B.go Grotta Gigante - Brisciki 42/c
34010 Sgonico - Zgonik (TS) - Italy
Tel. +39-0402140213 Fax +39-040327307
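For completeness, the same broadcasting idea can be turned into an exact per-row test (a sketch; it assumes the values within each row of B are distinct, and it is not necessarily faster):

import numpy as np

A = np.array([[ 4,  9, 10], [ 7,  4, 17], [12, 21, 14], [12, 24, 11],
              [18, 21,  3], [16,  3,  7], [17, 21,  5], [24,  3, 14]])
B = np.array([[17, 5], [14, 21]])

# eq[i, j, k, l] is True when A[i, k] == B[j, l]
eq = A[:, None, :, None] == B[None, :, None, :]

# row j of B matches row i of A when every element of B[j] occurs somewhere in A[i]
match = eq.any(axis=2).all(axis=2)       # shape (len(A), len(B))
rows = np.nonzero(match.any(axis=1))[0]
print(rows)                              # [2 6]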