From Nicolas.Rougier at inria.fr Fri Mar 1 02:30:52 2013 From: Nicolas.Rougier at inria.fr (Nicolas Rougier) Date: Fri, 1 Mar 2013 08:30:52 +0100 Subject: [Numpy-discussion] Array indexing and repeated indices Message-ID: <6C2ED1E7-9630-4DC3-A6D4-6B149D8AB725@inria.fr> Hi, I'm trying to increment an array using indexing and a second array for increment values (since it might be a little tedious to explain, see below for a short example). Using "direct" indexing, the values in the example are incremented by 1 only, while I want to achieve the alternative behavior. My question is whether there is such a function in numpy, or if there is a better way to achieve the same result? (I would like to avoid the while statement) I found and adapted the alternative solution from: http://stackoverflow.com/questions/2004364/increment-numpy-array-with-repeated-indices but it is only for a fixed increment from what I've understood. Nicolas # ------------------------ import numpy as np n,p = 5,100 nodes = np.zeros( n, [('value', 'f4', 1)] ) links = np.zeros( p, [('source', 'i4', 1), ('target', 'i4', 1)]) links['source'] = np.random.randint(0, n, p) links['target'] = np.random.randint(0, n, p) targets = links['target'] # Indices can be repeated K = np.ones(len(targets)) # Note K could be anything # Direct indexing nodes['value'] = 0 nodes['value'][targets] += K print nodes # "Alternative" indexing nodes['value'] = 0 B = np.bincount(targets) while B.any(): I = np.argwhere(B>=1) nodes['value'][I] += K[I] B = np.maximum(B-1,0) print nodes From sebastian at sipsolutions.net Fri Mar 1 05:04:07 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 01 Mar 2013 11:04:07 +0100 Subject: [Numpy-discussion] Array indexing and repeated indices In-Reply-To: <6C2ED1E7-9630-4DC3-A6D4-6B149D8AB725@inria.fr> References: <6C2ED1E7-9630-4DC3-A6D4-6B149D8AB725@inria.fr> Message-ID: <1362132247.9796.1.camel@sebastian-laptop> On Fri, 2013-03-01 at 08:30 +0100, Nicolas Rougier wrote: > Hi, > > 
I'm trying to increment an array using indexing and a second array for increment values (since it might be a little tedious to explain, see below for a short example). > > Using "direct" indexing, the values in the example are incremented by 1 only while I want to achieve the alternative behavior. My question is whether there is such function in numpy or if there a re better way to achieve the same result ? > (I would like to avoid the while statement) > > I found and adapted the alternative solution from: http://stackoverflow.com/questions/2004364/increment-numpy-array-with-repeated-indices but it is only for a fixed increment from what I've understood. > > > Nicolas > > > # ------------------------ > > import numpy as np > > n,p = 5,100 > nodes = np.zeros( n, [('value', 'f4', 1)] ) > links = np.zeros( p, [('source', 'i4', 1), > ('target', 'i4', 1)]) > links['source'] = np.random.randint(0, n, p) > links['target'] = np.random.randint(0, n, p) > > targets = links['target'] # Indices can be repeated > K = np.ones(len(targets)) # Note K could be anything > > # Direct indexing > nodes['value'] = 0 > nodes['value'][targets] += K > print nodes > > # "Alternative" indexing > nodes['value'] = 0 > B = np.bincount(targets) bincount takes a weights argument which should do exactly what you are looking for. 
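A minimal sketch of the weights approach, reusing the names from the example above (the random data here is purely illustrative):

```python
import numpy as np

n, p = 5, 100
targets = np.random.randint(0, n, p)   # indices, possibly repeated
K = np.random.uniform(1, 2, p)         # per-link increments; could be anything

# One bincount call accumulates every K[i] into bin targets[i],
# handling repeated indices correctly:
values = np.bincount(targets, weights=K, minlength=n)

# The equivalent explicit loop, for comparison:
expected = np.zeros(n)
for t, k in zip(targets, K):
    expected[t] += k
assert np.allclose(values, expected)
```

The `minlength=n` argument guarantees the result has one bin per node even when the highest indices happen not to occur in `targets`.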
- Sebastian > while B.any(): > I = np.argwhere(B>=1) > nodes['value'][I] += K[I] > B = np.maximum(B-1,0) > print nodes > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From Nicolas.Rougier at inria.fr Fri Mar 1 05:21:41 2013 From: Nicolas.Rougier at inria.fr (Nicolas Rougier) Date: Fri, 1 Mar 2013 11:21:41 +0100 Subject: [Numpy-discussion] Array indexing and repeated indices In-Reply-To: <1362132247.9796.1.camel@sebastian-laptop> References: <6C2ED1E7-9630-4DC3-A6D4-6B149D8AB725@inria.fr> <1362132247.9796.1.camel@sebastian-laptop> Message-ID: <990526F5-6078-4DDF-88F3-C8E4C92DCE03@inria.fr> > > bincount takes a weights argument which should do exactly what you are > looking for. Fantastic ! Thanks ! Nicolas From sebastian at sipsolutions.net Fri Mar 1 07:25:20 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 01 Mar 2013 13:25:20 +0100 Subject: [Numpy-discussion] step paramter for linspace Message-ID: <1362140720.13987.0.camel@sebastian-laptop> Hi, there has been a request on the issue tracker for a step parameter to linspace. This is of course tricky with the imprecision of floating point numbers. As a trade-off, I was thinking of a step parameter that is used to calculate the integer number of steps. However, to be certain that it never misbehaves, this would be done strictly, up to the numerical precision of the (float) numbers. Effectively this means: In [9]: np.linspace(0, 1.2, step=0.3) Out[9]: array([ 0. , 0.3, 0.6, 0.9, 1.2]) In [10]: np.linspace(0, 1.2+5-5, step=0.3) Out[10]: array([ 0. , 0.3, 0.6, 0.9, 1.2]) In [11]: np.linspace(0, 1.2+500-500, step=0.3) ValueError: could not determine exact number of samples for given step I.e. the last fails, because 1.2 + 500 - 500 == 1.1999999999999886, which is an error that is larger than the imprecision of floating point numbers. 
Is this considered useful, or is it not, given that it can easily fail for calculated numbers and is thus only a convenience? Regards, Sebastian From heng at cantab.net Fri Mar 1 07:33:14 2013 From: heng at cantab.net (Henry Gomersall) Date: Fri, 01 Mar 2013 12:33:14 +0000 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: <1362140720.13987.0.camel@sebastian-laptop> References: <1362140720.13987.0.camel@sebastian-laptop> Message-ID: <1362141194.7312.27.camel@farnsworth> On Fri, 2013-03-01 at 13:25 +0100, Sebastian Berg wrote: > there has been a request on the issue tracker for a step parameter to > linspace. This is of course tricky with the imprecision of floating > point numbers. How is that different to arange? Either you specify the number of points with linspace, or you specify the step with arange. Is there a third option? My usual hack to deal with the numerical bounds issue is to add/subtract half the step. Henry From sebastian at sipsolutions.net Fri Mar 1 07:44:05 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 01 Mar 2013 13:44:05 +0100 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: <1362141194.7312.27.camel@farnsworth> References: <1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> Message-ID: <1362141845.13987.10.camel@sebastian-laptop> On Fri, 2013-03-01 at 12:33 +0000, Henry Gomersall wrote: > On Fri, 2013-03-01 at 13:25 +0100, Sebastian Berg wrote: > > there has been a request on the issue tracker for a step parameter to > > linspace. This is of course tricky with the imprecision of floating > > point numbers. > > How is that different to arange? Either you specify the number of points > with linspace, or you specify the step with arange. Is there a third > option? > > My usual hack to deal with the numerical bounds issue is to add/subtract > half the step. > There is not much. 
It does that half step logic for you, and you actually know that the end point is exact (since linspace makes sure of that). In arange, the start and step are exact. In linspace the start and stop are exact (even with a given step, it would vary on the order of floating point accuracy). Maybe the larger point is the hope that by adding this to linspace it is easier to get new users to use it and avoid pitfalls of arange with floating points when you are not aware of that half step thing. > Henry > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From sebastian at sipsolutions.net Fri Mar 1 07:58:38 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 01 Mar 2013 13:58:38 +0100 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: <1362141845.13987.10.camel@sebastian-laptop> References: <1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> <1362141845.13987.10.camel@sebastian-laptop> Message-ID: <1362142718.13987.13.camel@sebastian-laptop> On Fri, 2013-03-01 at 13:44 +0100, Sebastian Berg wrote: > On Fri, 2013-03-01 at 12:33 +0000, Henry Gomersall wrote: > > On Fri, 2013-03-01 at 13:25 +0100, Sebastian Berg wrote: > > > there has been a request on the issue tracker for a step parameter to > > > linspace. This is of course tricky with the imprecision of floating > > > point numbers. > > > > How is that different to arange? Either you specify the number of points > > with linspace, or you specify the step with arange. Is there a third > > option? > > > > My usual hack to deal with the numerical bounds issue is to add/subtract > > half the step. > > > > There is not much. It does that half step logic for you, and you > actually know that the end point is exact (since linspace makes sure of > that). > > In arange, the start and step are exact. 
In linspace the start and stop > are exact (even with a given step, it would vary on the order of > floating point accuracy). > > Maybe the larger point is the hope that by adding this to linspace it is > easier to get new users to use it and avoid pitfalls of arange with > floating points when you are not aware of that half step thing. > That said, I am honestly not sure this is worth it. I guess I might use it once in a while, but overall probably hardly at all and it is easy to do something else... > > Henry > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From njs at pobox.com Fri Mar 1 08:34:48 2013 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 1 Mar 2013 13:34:48 +0000 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: <1362141194.7312.27.camel@farnsworth> References: <1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> Message-ID: On Fri, Mar 1, 2013 at 12:33 PM, Henry Gomersall wrote: > On Fri, 2013-03-01 at 13:25 +0100, Sebastian Berg wrote: >> there has been a request on the issue tracker for a step parameter to >> linspace. This is of course tricky with the imprecision of floating >> point numbers. > > How is that different to arange? Either you specify the number of points > with linspace, or you specify the step with arange. Is there a third > option? arange is designed for ints and gives you a half-open interval, linspace is designed for floats and gives you a closed interval. This means that when arange is used on floats, it does weird things that linspace doesn't: In [11]: eps = np.finfo(float).eps In [12]: np.arange(0, 1, step=0.2) Out[12]: array([ 0. 
, 0.2, 0.4, 0.6, 0.8]) In [13]: np.arange(0, 1 + eps, step=0.2) Out[13]: array([ 0. , 0.2, 0.4, 0.6, 0.8, 1. ]) In [14]: np.linspace(0, 1, 6) Out[14]: array([ 0. , 0.2, 0.4, 0.6, 0.8, 1. ]) In [15]: np.linspace(0, 1 + eps, 6) Out[15]: array([ 0. , 0.2, 0.4, 0.6, 0.8, 1. ]) The half-open/closed thing also has effects on what kind of api is reasonable. arange(0, 1, step=0.8) makes perfect sense (it acts like python range(0, 10, step=8)). linspace(0, 1, step=0.8) is just incoherent, though, because linspace guarantees that both the start and end points are included. > My usual hack to deal with the numerical bounds issue is to add/subtract > half the step. Right. Which is exactly the sort of annoying, content-free code that a library is supposed to handle for you, so you can save mental energy for more important things :-). The problem is to figure out exactly how strict we should be. Like, presumably linspace(0, 1, step=0.8) should fail, rather than round 0.8 to 0.5 or 1. That would clearly violate "in the face of ambiguity, refuse the temptation to guess". OTOH, as Sebastian points out, requiring that the step be *exactly* a divisor of the value (stop - start), within 1 ULP, is probably obnoxious. Would anything bad happen if we just required that, say, (stop - start)/step had to be within "np.allclose" of an integer, i.e., to some reasonable relative and absolute precision, and then rounded the number of steps to match that integer exactly? -n From heng at cantab.net Fri Mar 1 09:14:35 2013 From: heng at cantab.net (Henry Gomersall) Date: Fri, 01 Mar 2013 14:14:35 +0000 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: References: <1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> Message-ID: <1362147275.7312.43.camel@farnsworth> On Fri, 2013-03-01 at 13:34 +0000, Nathaniel Smith wrote: > > My usual hack to deal with the numerical bounds issue is to > add/subtract > > half the step. > > Right. 
Which is exactly the sort of annoying, content-free code that a > library is supposed to handle for you, so you can save mental energy > for more important things :-). I agree with the sentiment (I sometimes wish a library could read my mind ;) but putting this sort of logic into the library seems dangerous to me. The point is that the coder _should_ understand the subtleties of floating point numbers. IMO arange _should_ be well specified and actually operate on the half open interval; continuing to add a step until >= the limit is clear and always unambiguous. Unfortunately, the docs tell me that this isn't the case: "For floating point arguments, the length of the result is ``ceil((stop - start)/step)``. Because of floating point overflow, this rule may result in the last element of `out` being greater than `stop`." In my jet-lag addled state, i can't see when this out[-1] > stop case will occur, but I can take it as true. It does seem to be problematic though. As soon as you allow freeform setting of the stop value, problems are going to be encountered. Who's to say that the stop - delta is actually _meant_ to be below the limit, or is meant to be the limit? Certainly not the library! It just seems to me that this will lead to lots of bad code in which the writer has glossed over an ambiguous case. Henry From warren.weckesser at gmail.com Fri Mar 1 09:24:57 2013 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Fri, 1 Mar 2013 09:24:57 -0500 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: <1362147275.7312.43.camel@farnsworth> References: <1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> <1362147275.7312.43.camel@farnsworth> Message-ID: On 3/1/13, Henry Gomersall wrote: > On Fri, 2013-03-01 at 13:34 +0000, Nathaniel Smith wrote: >> > My usual hack to deal with the numerical bounds issue is to >> add/subtract >> > half the step. >> >> Right. 
Which is exactly the sort of annoying, content-free code that a >> library is supposed to handle for you, so you can save mental energy >> for more important things :-). > > I agree with the sentiment (I sometimes wish a library could read my > mind ;) but putting this sort of logic into the library seems dangerous > to me. > > The point is that the coder _should_ understand the subtleties of > floating point numbers. IMO arange _should_ be well specified and > actually operate on the half open interval; continuing to add a step > until >= the limit is clear and always unambiguous. > > Unfortunately, the docs tell me that this isn't the case: > "For floating point arguments, the length of the result is > ``ceil((stop - start)/step)``. Because of floating point overflow, > this rule may result in the last element of `out` being greater > than `stop`." > > In my jet-lag addled state, i can't see when this out[-1] > stop case > will occur, but I can take it as true. It does seem to be problematic > though. Here you go: In [32]: end = 2.2 In [33]: x = arange(0.1, end, 0.3) In [34]: x[-1] Out[34]: 2.2000000000000006 In [35]: x[-1] > end Out[35]: True Warren > > As soon as you allow freeform setting of the stop value, problems are > going to be encountered. Who's to say that the stop - delta is actually > _meant_ to be below the limit, or is meant to be the limit? Certainly > not the library! > > It just seems to me that this will lead to lots of bad code in which the > writer has glossed over an ambiguous case. 
> > Henry > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From heng at cantab.net Fri Mar 1 09:32:48 2013 From: heng at cantab.net (Henry Gomersall) Date: Fri, 01 Mar 2013 14:32:48 +0000 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: References: <1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> <1362147275.7312.43.camel@farnsworth> Message-ID: <1362148368.7312.48.camel@farnsworth> On Fri, 2013-03-01 at 09:24 -0500, Warren Weckesser wrote: > > In my jet-lag addled state, i can't see when this out[-1] > stop > case > > will occur, but I can take it as true. It does seem to be > problematic > > though. > > > Here you go: > > In [32]: end = 2.2 > > In [33]: x = arange(0.1, end, 0.3) Thanks! I'll assert then that there should be an equivalent for floats that unambiguously returns a range for the half open interval. IMO this is more useful than a hacky version of linspace. Henry From heng at cantab.net Fri Mar 1 09:35:38 2013 From: heng at cantab.net (Henry Gomersall) Date: Fri, 01 Mar 2013 14:35:38 +0000 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: <1362148368.7312.48.camel@farnsworth> References: <1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> <1362147275.7312.43.camel@farnsworth> <1362148368.7312.48.camel@farnsworth> Message-ID: <1362148538.7312.49.camel@farnsworth> On Fri, 2013-03-01 at 14:32 +0000, Henry Gomersall wrote: > I'll assert then that there should be an equivalent for floats that > unambiguously returns a range for the half open interval. IMO this is > more useful than a hacky version of linspace. And, no, I haven't thought carefully about how to handle a negative step. 
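One possible shape for such a helper (`float_range` is a hypothetical name, not a NumPy function; negative steps are left unhandled here too): generate the candidate grid once from `start + k*step` rather than by repeated addition, then keep only the points strictly below the stop value, so the last element can never overshoot the limit the way arange can.

```python
import numpy as np

def float_range(start, stop, step):
    """Half-open [start, stop) analogue of range() for floats.

    Hypothetical sketch, not a NumPy API. Note this pins one
    interpretation of the ambiguity discussed above: a stop that
    lands (up to rounding) exactly on the grid may or may not be
    excluded, depending on float noise in start + k*step.
    """
    # Upper bound on the number of grid points, with one spare:
    nmax = int(np.ceil((stop - start) / step)) + 1
    vals = start + step * np.arange(nmax)
    # Strict comparison guarantees vals[-1] < stop:
    return vals[vals < stop]

print(float_range(0.0, 1.0, 0.25))   # 1.0 itself is excluded
```

With exactly representable steps such as 0.25 the behavior is fully deterministic; with steps like 0.3 the helper still guarantees a half-open result, which is the "clear and always unambiguous" reading above.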
Henry From sebastian at sipsolutions.net Fri Mar 1 09:53:53 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 01 Mar 2013 15:53:53 +0100 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: References: <1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> Message-ID: <1362149633.13987.41.camel@sebastian-laptop> On Fri, 2013-03-01 at 13:34 +0000, Nathaniel Smith wrote: > On Fri, Mar 1, 2013 at 12:33 PM, Henry Gomersall wrote: > > On Fri, 2013-03-01 at 13:25 +0100, Sebastian Berg wrote: > >> there has been a request on the issue tracker for a step parameter to > >> linspace. This is of course tricky with the imprecision of floating > >> point numbers. > > > > How is that different to arange? Either you specify the number of points > > with linspace, or you specify the step with arange. Is there a third > > option? > > arange is designed for ints and gives you a half-open interval, > linspace is designed for floats and gives you a closed interval. This > means that when arange is used on floats, it does weird things that > linspace doesn't: > > In [11]: eps = np.finfo(float).eps > > In [12]: np.arange(0, 1, step=0.2) > Out[12]: array([ 0. , 0.2, 0.4, 0.6, 0.8]) > > In [13]: np.arange(0, 1 + eps, step=0.2) > Out[13]: array([ 0. , 0.2, 0.4, 0.6, 0.8, 1. ]) > > In [14]: np.linspace(0, 1, 6) > Out[14]: array([ 0. , 0.2, 0.4, 0.6, 0.8, 1. ]) > > In [15]: np.linspace(0, 1 + eps, 6) > Out[15]: array([ 0. , 0.2, 0.4, 0.6, 0.8, 1. ]) > > The half-open/closed thing also has effects on what kind of api is > reasonable. arange(0, 1, step=0.8) makes perfect sense (it acts like > python range(0, 10, step=8)). linspace(0, 1, step=0.8) is just > incoherent, though, because linspace guarantees that both the start > and end points are included. > > > My usual hack to deal with the numerical bounds issue is to add/subtract > > half the step. > > Right. 
Which is exactly the sort of annoying, content-free code that a > library is supposed to handle for you, so you can save mental energy > for more important things :-). > > The problem is to figure out exactly how strict we should be. Like, > presumably linspace(0, 1, step=0.8) should fail, rather than round 0.8 > to 0.5 or 1. That would clearly violate "in the face of ambiguity, > refuse the temptation to guess". > > OTOH, as Sebastian points out, requiring that the step be *exactly* a > divisor of the value (stop - start), within 1 ULP, is probably > obnoxious. > > Would anything bad happen if we just required that, say, (stop - > start)/step had to be within "np.allclose" of an integer, i.e., to > some reasonable relative and absolute precision, and then rounded the > number of steps to match that integer exactly? I was a bit worried about what happens for huge a number of steps. Have to rethink a bit about it, but I guess one should be able to relax it... or maybe someone here has a nice idea on how to relax it. It seems to me that there is a bit of a trade off if you get into the millions of steps range, because absolute errors that make sense for few steps are suddenly in the range integers. 
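The allclose-based relaxation suggested above can be sketched as follows (`linspace_step` is a hypothetical function, not an actual NumPy API; the tolerances are simply np.allclose defaults and purely illustrative):

```python
import numpy as np

def linspace_step(start, stop, step):
    """Derive linspace's sample count from a step size.

    Hypothetical sketch: the step is accepted only if
    (stop - start)/step is close to an integer (np.allclose
    tolerances); otherwise we refuse the temptation to guess.
    linspace then keeps start and stop exact, as it always does.
    """
    nsteps = (stop - start) / step
    num = int(round(nsteps))
    if num < 1 or not np.allclose(nsteps, num):
        raise ValueError(
            "could not determine exact number of samples for given step")
    return np.linspace(start, stop, num + 1)

print(linspace_step(0, 1.2, 0.3))   # array([0. , 0.3, 0.6, 0.9, 1.2])
```

With these default tolerances, the `1.2 + 500 - 500` case from the first message in this thread would be accepted and snapped to 4 steps, while `linspace_step(0, 1, 0.8)` still fails, since 1/0.8 = 1.25 is not close to any integer.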
> > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From alan.isaac at gmail.com Fri Mar 1 10:01:29 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Fri, 01 Mar 2013 10:01:29 -0500 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: <1362148368.7312.48.camel@farnsworth> References: <1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> <1362147275.7312.43.camel@farnsworth> <1362148368.7312.48.camel@farnsworth> Message-ID: <5130C2C9.3070603@gmail.com> On 3/1/2013 9:32 AM, Henry Gomersall wrote: > there should be an equivalent for floats that > unambiguously returns a range for the half open interval If I've understood you: start + stepsize*np.arange(nsteps) fwiw, Alan Isaac From heng at cantab.net Fri Mar 1 10:07:46 2013 From: heng at cantab.net (Henry Gomersall) Date: Fri, 01 Mar 2013 15:07:46 +0000 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: <5130C2C9.3070603@gmail.com> References: <1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> <1362147275.7312.43.camel@farnsworth> <1362148368.7312.48.camel@farnsworth> <5130C2C9.3070603@gmail.com> Message-ID: <1362150466.7312.50.camel@farnsworth> On Fri, 2013-03-01 at 10:01 -0500, Alan G Isaac wrote: > On 3/1/2013 9:32 AM, Henry Gomersall wrote: > > there should be an equivalent for floats that > > unambiguously returns a range for the half open interval > > > If I've understood you: > start + stepsize*np.arange(nsteps) yes, except that nsteps is computed for you, otherwise you could just use linspace ;) hen From sebastian at sipsolutions.net Fri Mar 1 10:27:29 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 01 Mar 2013 16:27:29 +0100 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: <1362150466.7312.50.camel@farnsworth> References: 
<1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> <1362147275.7312.43.camel@farnsworth> <1362148368.7312.48.camel@farnsworth> <5130C2C9.3070603@gmail.com> <1362150466.7312.50.camel@farnsworth> Message-ID: <1362151649.13987.57.camel@sebastian-laptop> On Fri, 2013-03-01 at 15:07 +0000, Henry Gomersall wrote: > On Fri, 2013-03-01 at 10:01 -0500, Alan G Isaac wrote: > > On 3/1/2013 9:32 AM, Henry Gomersall wrote: > > > there should be an equivalent for floats that > > > unambiguously returns a range for the half open interval > > > > > > If I've understood you: > > start + stepsize*np.arange(nsteps) > > yes, except that nsteps is computed for you, otherwise you could just > use linspace ;) If you could just use linspace, you should use linspace (and give it a step argument) in my opinion, but I don't think you meant that ;). linspace holds start and stop exact and guarantees that you actually get to stop. Even a modified/new arange will never do that, but I think many use arange like that and giving linspace a step argument could migrate that usage (which is simply ill defined for arange) to it. That might give an error once in a while, but that should be much less often and much more enlightening then a sudden "one value too much". I think the accuracy requirements for the step for linspace can be relaxed enough probably, though I am not quite certain yet as to how (there is a bit of a trade off/problem when you get to a very large number of steps). 
> > hen > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From alan.isaac at gmail.com Fri Mar 1 10:49:00 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Fri, 01 Mar 2013 10:49:00 -0500 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: <1362150466.7312.50.camel@farnsworth> References: <1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> <1362147275.7312.43.camel@farnsworth> <1362148368.7312.48.camel@farnsworth> <5130C2C9.3070603@gmail.com> <1362150466.7312.50.camel@farnsworth> Message-ID: <5130CDEC.6010206@gmail.com> One motivation of this thread was that adding a step parameter to linspace might make things easier for beginners. I claim this thread has put the lie to that, starting with the initial post. So what is the persuasive case for the change? Imo, the current situation is good: use arange if you want to specify the stepsize, or use linspace if you want to specify the number of points. Nice and clean. Cheers, Alan Isaac From sebastian at sipsolutions.net Fri Mar 1 11:29:57 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 01 Mar 2013 17:29:57 +0100 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: <5130CDEC.6010206@gmail.com> References: <1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> <1362147275.7312.43.camel@farnsworth> <1362148368.7312.48.camel@farnsworth> <5130C2C9.3070603@gmail.com> <1362150466.7312.50.camel@farnsworth> <5130CDEC.6010206@gmail.com> Message-ID: <1362155397.13987.99.camel@sebastian-laptop> On Fri, 2013-03-01 at 10:49 -0500, Alan G Isaac wrote: > One motivation of this thread was that > adding a step parameter to linspace might make > things easier for beginners. > > I claim this thread has put the lie to that, > starting with the initial post. 
So what is the > persuasive case for the change? > > Imo, the current situation is good: > use arange if you want to specify the stepsize, > or use linspace if you want to specify the > number of points. Nice and clean. > Maybe you are right, and it is not easier. But there was a "please include an end_point=True/False option to arange" request, and that does not sense by arange logic. The fact that the initial example was overly strict is something that can be relaxed quite a bit I am sure, though I guess you may always have an odd case here or there with floats. I agree the difference is nice and clean right now, but I disagree that this would change much. Arange guarantees the step size. Linspace the end point. There is a bit of a shift, but if I thought it was less clean I would not have asked if it is deemed useful :). At this time it seems there is more sentiment against it and that is fine with me. I thought it might be useful for some who normally want the linspace behavior, but do not want to worry about the right num in some cases. Someone who actually wants an error if the step they put in quickly (and which they would have used to calculate num) was wrong. 
> Cheers, > Alan Isaac > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From heng at cantab.net Fri Mar 1 11:36:03 2013 From: heng at cantab.net (Henry Gomersall) Date: Fri, 01 Mar 2013 16:36:03 +0000 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: <1362155397.13987.99.camel@sebastian-laptop> References: <1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> <1362147275.7312.43.camel@farnsworth> <1362148368.7312.48.camel@farnsworth> <5130C2C9.3070603@gmail.com> <1362150466.7312.50.camel@farnsworth> <5130CDEC.6010206@gmail.com> <1362155397.13987.99.camel@sebastian-laptop> Message-ID: <1362155763.7312.62.camel@farnsworth> On Fri, 2013-03-01 at 17:29 +0100, Sebastian Berg wrote: > At this time it seems there is more sentiment against it and that is > fine with me. I thought it might be useful for some who normally want > the linspace behavior, but do not want to worry about the right num in > some cases. Someone who actually wants an error if the step they put > in > quickly (and which they would have used to calculate num) was wrong. Actually, I buy this could be useful. I think it's helpful to think about the potential problems though. 
Henry From scollis.acrf at gmail.com Sat Mar 2 17:32:28 2013 From: scollis.acrf at gmail.com (Scott Collis) Date: Sat, 2 Mar 2013 16:32:28 -0600 Subject: [Numpy-discussion] feature tracking in numpy/scipy Message-ID: <4CDBEE7F-FF2B-43DD-BFA6-AFF80D5364B8@gmail.com> Good afternoon list, I am looking at feature tracking in a 2D numpy array, along the lines of Dixon and Wiener 1993 (for tracking precipitating storms) Identifying features based on threshold is quite trivial using ndimage.label b_fld=np.zeros(mygrid.fields['rain_rate_A']['data'].shape) rr=10 b_fld[mygrid.fields['rain_rate_A']['data'] > rr]=1.0 labels, numobjects = ndimage.label(b_fld[0,0,:,:]) (note mygrid.fields['rain_rate_A']['data'] is dimensions time,height, y, x) using the matplotlib contouring and fetching the vertices I can get a nice list of polygons of rain rate above a certain threshold? Now from here I can just go and implement the Dixon and Wiener methodology but I thought I would check here first to see if anyone know of a object/feature tracking algorithm in numpy/scipy or using numpy arrays (it just seems like something people would want to do!).. i.e. something that looks back and forward in time and identifies polygon movement and identifies objects with temporal persistence.. Cheers! Scott Dixon, M., and G. Wiener, 1993: TITAN: Thunderstorm Identification, Tracking, Analysis, and Nowcasting?A Radar-based Methodology. Journal of Atmospheric and Oceanic Technology, 10, 785?797, doi:10.1175/1520-0426(1993)010<0785:TTITAA>2.0.CO;2. http://journals.ametsoc.org/doi/abs/10.1175/1520-0426%281993%29010%3C0785%3ATTITAA%3E2.0.CO%3B2 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sudheer.joseph at yahoo.com Sat Mar 2 21:03:11 2013 From: sudheer.joseph at yahoo.com (Sudheer Joseph) Date: Sun, 3 Mar 2013 10:03:11 +0800 (SGT) Subject: [Numpy-discussion] reshaping arrays In-Reply-To: <4CDBEE7F-FF2B-43DD-BFA6-AFF80D5364B8@gmail.com> References: <4CDBEE7F-FF2B-43DD-BFA6-AFF80D5364B8@gmail.com> Message-ID: <1362276191.55648.YahooMailNeo@web193404.mail.sg3.yahoo.com> Hi all, ? ? ? ? For a 3d array in matlab, I can do the below to reshape it before an eof analysis. Is there a way to do the same using numpy? Please help. [nlat,nlon,ntim ]=size(ssh); tssh=reshape(ssh,nlat*nlon,ntim); and afterwards eofout=[] eofout=reshape(eof1,nlat,nlon,ntime) with best regards, Sudheer -------------- next part -------------- An HTML attachment was scrubbed... URL: From brad.froehle at gmail.com Sat Mar 2 22:20:42 2013 From: brad.froehle at gmail.com (Bradley M. Froehle) Date: Sat, 2 Mar 2013 19:20:42 -0800 Subject: [Numpy-discussion] reshaping arrays In-Reply-To: <1362276191.55648.YahooMailNeo@web193404.mail.sg3.yahoo.com> References: <4CDBEE7F-FF2B-43DD-BFA6-AFF80D5364B8@gmail.com> <1362276191.55648.YahooMailNeo@web193404.mail.sg3.yahoo.com> Message-ID: On Sat, Mar 2, 2013 at 6:03 PM, Sudheer Joseph wrote: > Hi all, > For a 3d array in matlab, I can do the below to reshape it before > an eof analysis. Is there a way to do the same using numpy? Please help. > > [nlat,nlon,ntim ]=size(ssh); > tssh=reshape(ssh,nlat*nlon,ntim); > and afterwards > eofout=[] > eofout=reshape(eof1,nlat,nlon,ntime) > Yes, easy: nlat, nlon, ntim = ssh.shape tssh = ssh.reshape(nlat*nlon, ntim, order='F') and afterwards eofout = eofl.reshape(nlat, nlon, ntim, order='F') You probably want to go read through http://www.scipy.org/NumPy_for_Matlab_Users. Cheers, Brad -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sudheer.joseph at yahoo.com Sat Mar 2 22:49:09 2013 From: sudheer.joseph at yahoo.com (Sudheer Joseph) Date: Sun, 3 Mar 2013 11:49:09 +0800 (SGT) Subject: [Numpy-discussion] reshaping arrays In-Reply-To: References: <4CDBEE7F-FF2B-43DD-BFA6-AFF80D5364B8@gmail.com> <1362276191.55648.YahooMailNeo@web193404.mail.sg3.yahoo.com> Message-ID: <1362282549.95401.YahooMailNeo@web193406.mail.sg3.yahoo.com> Thank you Brad, for the quick reply with solution,?special?thanks to the link for matlab users. with best regards, Sudheer ? *************************************************************** Sudheer Joseph Indian National Centre for Ocean Information Services Ministry of Earth Sciences, Govt. of India POST BOX NO: 21, IDA Jeedeemetla P.O. Via Pragathi Nagar,Kukatpally, Hyderabad; Pin:5000 55 Tel:+91-40-23886047(O),Fax:+91-40-23895011(O), Tel:+91-40-23044600(R),Tel:+91-40-9440832534(Mobile) E-mail:sjo.India at gmail.com;sudheer.joseph at yahoo.com Web- http://oppamthadathil.tripod.com *************************************************************** ________________________________ From: Bradley M. Froehle To: Discussion of Numerical Python Sent: Sunday, 3 March 2013 8:50 AM Subject: Re: [Numpy-discussion] reshaping arrays On Sat, Mar 2, 2013 at 6:03 PM, Sudheer Joseph wrote: Hi all, >? ? ? ? For a 3d array in matlab, I can do the below to reshape it before an eof analysis. Is there a way to do the same using numpy? Please help. > > >[nlat,nlon,ntim ]=size(ssh); >tssh=reshape(ssh,nlat*nlon,ntim); >and afterwards >eofout=[] >eofout=reshape(eof1,nlat,nlon,ntime) Yes, easy: nlat, nlon, ntim = ssh.shape tssh = ssh.reshape(nlat*nlon, ntim, order='F') and afterwards eofout = eofl.reshape(nlat, nlon, ntim, order='F') You probably want to go read through?http://www.scipy.org/NumPy_for_Matlab_Users. 
Cheers, Brad _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From sudheer.joseph at yahoo.com Sat Mar 2 23:35:43 2013 From: sudheer.joseph at yahoo.com (Sudheer Joseph) Date: Sun, 3 Mar 2013 12:35:43 +0800 (SGT) Subject: [Numpy-discussion] reshaping arrays In-Reply-To: <1362282549.95401.YahooMailNeo@web193406.mail.sg3.yahoo.com> References: <4CDBEE7F-FF2B-43DD-BFA6-AFF80D5364B8@gmail.com> <1362276191.55648.YahooMailNeo@web193404.mail.sg3.yahoo.com> <1362282549.95401.YahooMailNeo@web193406.mail.sg3.yahoo.com> Message-ID: <1362285343.43953.YahooMailNeo@web193402.mail.sg3.yahoo.com> Hi Brad, ? ? ? ? ? ? ? ? I am not getting the attribute reshape for the array, are you having a different version of numpy than mine? I have? In [55]: np.__version__ Out[55]: '1.7.0' and detail of the shape details of variable? In [57]: ssh?? Type: ? ? ? NetCDFVariable String Form: Namespace: ?Interactive Length: ? ? 75 Docstring: ?NetCDF Variable In [58]: ssh.shape Out[58]: (75, 140, 180) ssh?? Type: ? ? ? NetCDFVariable String Form: Namespace: ?Interactive Length: ? ? 75 Docstring: ?NetCDF Variable In [66]: ssh.shape Out[66]: (75, 140, 180) In [67]: ssh.reshape(75,140*180) --------------------------------------------------------------------------- AttributeError ? ? ? ? ? ? ? ? ? ? ? ? ? ?Traceback (most recent call last) /home/sjo/RAMA_20120807/adcp/ in () ----> 1 ssh.reshape(75,140*180) AttributeError: reshape ? *************************************************************** Sudheer Joseph Indian National Centre for Ocean Information Services Ministry of Earth Sciences, Govt. of India POST BOX NO: 21, IDA Jeedeemetla P.O. 
Via Pragathi Nagar,Kukatpally, Hyderabad; Pin:5000 55 Tel:+91-40-23886047(O),Fax:+91-40-23895011(O), Tel:+91-40-23044600(R),Tel:+91-40-9440832534(Mobile) E-mail:sjo.India at gmail.com;sudheer.joseph at yahoo.com Web- http://oppamthadathil.tripod.com *************************************************************** ________________________________ From: Sudheer Joseph To: Discussion of Numerical Python Sent: Sunday, 3 March 2013 9:19 AM Subject: Re: [Numpy-discussion] reshaping arrays Thank you Brad, for the quick reply with solution,?special?thanks to the link for matlab users. with best regards, Sudheer ? *************************************************************** Sudheer Joseph Indian National Centre for Ocean Information Services Ministry of Earth Sciences, Govt. of India POST BOX NO: 21, IDA Jeedeemetla P.O. Via Pragathi Nagar,Kukatpally, Hyderabad; Pin:5000 55 Tel:+91-40-23886047(O),Fax:+91-40-23895011(O), Tel:+91-40-23044600(R),Tel:+91-40-9440832534(Mobile) E-mail:sjo.India at gmail.com;sudheer.joseph at yahoo.com Web- http://oppamthadathil.tripod.com *************************************************************** ________________________________ From: Bradley M. Froehle To: Discussion of Numerical Python Sent: Sunday, 3 March 2013 8:50 AM Subject: Re: [Numpy-discussion] reshaping arrays On Sat, Mar 2, 2013 at 6:03 PM, Sudheer Joseph wrote: Hi all, >? ? ? ? For a 3d array in matlab, I can do the below to reshape it before an eof analysis. Is there a way to do the same using numpy? Please help. > > >[nlat,nlon,ntim ]=size(ssh); >tssh=reshape(ssh,nlat*nlon,ntim); >and afterwards >eofout=[] >eofout=reshape(eof1,nlat,nlon,ntime) Yes, easy: nlat, nlon, ntim = ssh.shape tssh = ssh.reshape(nlat*nlon, ntim, order='F') and afterwards eofout = eofl.reshape(nlat, nlon, ntim, order='F') You probably want to go read through?http://www.scipy.org/NumPy_for_Matlab_Users. 
Cheers, Brad _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From opossumnano at gmail.com Mon Mar 4 05:36:55 2013 From: opossumnano at gmail.com (Tiziano Zito) Date: Mon, 4 Mar 2013 11:36:55 +0100 Subject: [Numpy-discussion] EuroSciPy 2013 Call for Abstracts Message-ID: <20130304103653.GA30426@bio230.biologie.hu-berlin.de> Dear Scientist using Python, EuroSciPy 2013, the Sixth Annual Conference on Python in Science, takes place in Brussels on 21 - 24 August 2013. The conference features two days of tutorials followed by two days of scientific talks that start with our keynote speakers, Cameron Neylon and Peter Wang. The topics presented at EuroSciPy are very diverse, with a focus on advanced software engineering and original uses of Python and its scientific libraries, either in theoretical or experimental research, from both academia and industry. The program includes contributed talks and posters. Submissions for talks and posters are welcome on our website (http://www.euroscipy.org/). Authors must use the web interface to submit an abstract to the conference. In your abstract, please provide details on what Python tools are being employed, and how. The deadline for submission is 28 April 2013. Until 31 March 2013, you can apply for a sprint session on 25 August 2013. Also, potential organizers for EuroSciPy 2014 are welcome to contact the conference committee. SciPythonic Regards, The EuroSciPy 2013 Committee http://www.euroscipy.org/ Conference Chairs: Pierre de Buyl and Nicolas Pettiaux, Université
libre de Bruxelles, Belgium Tutorial Chair: Nicolas Rougier, INRIA, Nancy, France Program Chair: Tiziano Zito, Humboldt-Universität zu Berlin, Germany Program Committee Ralf Gommers, ASML, The Netherlands Emmanuelle Gouillart, Joint Unit CNRS/Saint-Gobain, France Kael Hanson, Université Libre de Bruxelles, Belgium Konrad Hinsen, Centre National de la Recherche Scientifique (CNRS), France Hans Petter Langtangen, Simula and University of Oslo, Norway Mike Müller, Python Academy, Germany Raphael Ritz, International Neuroinformatics Coordinating Facility, Stockholm, Sweden Stéfan van der Walt, Applied Mathematics, Stellenbosch University, South Africa Gaël Varoquaux, INRIA Parietal, Saclay, France Nelle Varoquaux, Mines ParisTech, France Pauli Virtanen, Aalto University, Finland Organizing Committee Nicolas Chauvat, Logilab, France Emmanuelle Gouillart, Joint Unit CNRS/Saint-Gobain, France Kael Hanson, Université Libre de Bruxelles, Belgium Renaud Lambiotte, University of Namur, Belgium Thomas Lecocq, Royal Observatory of Belgium Mike Müller, Python Academy, Germany Didrik Pinte, Enthought Europe Gaël Varoquaux, INRIA Parietal, Saclay, France Nelle Varoquaux, Mines ParisTech, France From ben.root at ou.edu Mon Mar 4 09:10:54 2013 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 4 Mar 2013 09:10:54 -0500 Subject: [Numpy-discussion] reshaping arrays In-Reply-To: <1362285343.43953.YahooMailNeo@web193402.mail.sg3.yahoo.com> References: <4CDBEE7F-FF2B-43DD-BFA6-AFF80D5364B8@gmail.com> <1362276191.55648.YahooMailNeo@web193404.mail.sg3.yahoo.com> <1362282549.95401.YahooMailNeo@web193406.mail.sg3.yahoo.com> <1362285343.43953.YahooMailNeo@web193402.mail.sg3.yahoo.com> Message-ID: On Sat, Mar 2, 2013 at 11:35 PM, Sudheer Joseph wrote: > Hi Brad, > I am not getting the attribute reshape for the array, are > you having a different version of numpy than mine?
> > I have > In [55]: np.__version__ > Out[55]: '1.7.0' > and detail of the shape > > details of variable > > In [57]: ssh?? > Type: NetCDFVariable > String Form: > Namespace: Interactive > Length: 75 > Docstring: NetCDF Variable > > In [58]: ssh.shape > Out[58]: (75, 140, 180) > > ssh?? > Type: NetCDFVariable > String Form: > Namespace: Interactive > Length: 75 > Docstring: NetCDF Variable > > In [66]: ssh.shape > Out[66]: (75, 140, 180) > > In [67]: ssh.reshape(75,140*180) > --------------------------------------------------------------------------- > AttributeError Traceback (most recent call last) > /home/sjo/RAMA_20120807/adcp/ in () > ----> 1 ssh.reshape(75,140*180) > > AttributeError: reshape > > > Ah, you have a NetCDF variable, which in many ways purposefully looks like a NumPy array, but isn't. Just keep in mind that a NetCDF variable is merely a way to have the data available without actually reading it in until you need it. If you do: ssh_data = ssh[:] Then the NetCDF variable will read all the data in the file and return it as a numpy array that can be manipulated as you wish. I hope that helps! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From sudheer.joseph at yahoo.com Mon Mar 4 09:43:06 2013 From: sudheer.joseph at yahoo.com (Sudheer Joseph) Date: Mon, 4 Mar 2013 22:43:06 +0800 (SGT) Subject: [Numpy-discussion] reshaping arrays In-Reply-To: References: <4CDBEE7F-FF2B-43DD-BFA6-AFF80D5364B8@gmail.com> <1362276191.55648.YahooMailNeo@web193404.mail.sg3.yahoo.com> <1362282549.95401.YahooMailNeo@web193406.mail.sg3.yahoo.com> <1362285343.43953.YahooMailNeo@web193402.mail.sg3.yahoo.com> Message-ID: <1362408186.10860.YahooMailNeo@web193405.mail.sg3.yahoo.com> Thanks a lot ?Benjamin,? ?it did the trick. I have another question, I have ?ocean section along latitude 0 ( equator) which is sampled at depths. 
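Ben's fix — slice the NetCDF variable with `[:]` to get a real ndarray, then reshape — can be sketched end to end. Real code would slice an actual `NetCDFVariable`; the tiny class below is a made-up stand-in that only mimics the two behaviours that matter here (`.shape` and `[:]`):

```python
import numpy as np

class FakeNetCDFVariable:
    """Minimal stand-in for a NetCDF variable: has .shape, supports [:],
    but deliberately has no .reshape method (like the real thing)."""
    def __init__(self, data):
        self._data = data
        self.shape = data.shape
    def __getitem__(self, index):
        return self._data[index]

ssh = FakeNetCDFVariable(np.random.rand(75, 140, 180))

# The variable itself has no .reshape; pull the data into memory first
ssh_data = ssh[:]                       # now a plain numpy ndarray
tssh = ssh_data.reshape(75, 140 * 180)  # reshape works as usual
```

This is why `ssh.reshape(...)` raised `AttributeError` while `ssh[:].reshape(...)` works.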
size of the array is 12x14 but this is just the index of the array I need to make a plot which shows depth value as one axis and longitude values as another axis. Is there a quick way to rescale the data to lat depth section by adding a new axis? depth=[0,10,20,30,40,50,60,70,80,90,100,120] lon=[ 40, ?45, ?50, ?55, ?60, ?65, ?70, ?75, ?80, ?85, ?90, ?95, 100, 105] In [20]: data.shape Out[20]: (12, 14) can you please advice me on what is the best way to?re-scale?the data to depth lat dimensions from the indices 1-12 and 1-14 With best regards, Sudheer From: Benjamin Root To: Discussion of Numerical Python Sent: Monday, 4 March 2013 7:40 PM Subject: Re: [Numpy-discussion] reshaping arrays On Sat, Mar 2, 2013 at 11:35 PM, Sudheer Joseph wrote: Hi Brad, >? ? ? ? ? ? ? ? I am not getting the attribute reshape for the array, are you having a different version of numpy than mine? > > >I have? >In [55]: np.__version__ >Out[55]: '1.7.0' >and detail of the shape > > >details of variable? > > >In [57]: ssh?? >Type: ? ? ? NetCDFVariable >String Form: >Namespace: ?Interactive >Length: ? ? 75 >Docstring: ?NetCDF Variable > > >In [58]: ssh.shape >Out[58]: (75, 140, 180) > > >ssh?? >Type: ? ? ? NetCDFVariable >String Form: >Namespace: ?Interactive >Length: ? ? 75 >Docstring: ?NetCDF Variable > > >In [66]: ssh.shape >Out[66]: (75, 140, 180) > > >In [67]: ssh.reshape(75,140*180) >--------------------------------------------------------------------------- >AttributeError ? ? ? ? ? ? ? ? ? ? ? ? ? ?Traceback (most recent call last) >/home/sjo/RAMA_20120807/adcp/ in () >----> 1 ssh.reshape(75,140*180) > > >AttributeError: reshape > > > Ah, you have a NetCDF variable, which in many ways purposefully looks like a NumPy array, but isn't.? Just keep in mind that a NetCDF variable is merely a way to have the data available without actually reading it in until you need it.? 
If you do: ssh_data = ssh[:] Then the NetCDF variable will read all the data in the file and return it as a numpy array that can be manipulated as you wish. I hope that helps! Ben Root _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Mar 4 13:04:09 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Mon, 4 Mar 2013 10:04:09 -0800 Subject: [Numpy-discussion] step paramter for linspace In-Reply-To: <1362155763.7312.62.camel@farnsworth> References: <1362140720.13987.0.camel@sebastian-laptop> <1362141194.7312.27.camel@farnsworth> <1362147275.7312.43.camel@farnsworth> <1362148368.7312.48.camel@farnsworth> <5130C2C9.3070603@gmail.com> <1362150466.7312.50.camel@farnsworth> <5130CDEC.6010206@gmail.com> <1362155397.13987.99.camel@sebastian-laptop> <1362155763.7312.62.camel@farnsworth> Message-ID: <-9069729710388254232@unknownmsgid> On Mar 1, 2013, at 8:39 AM, Henry Gomersall wrote: > On Fri, 2013-03-01 at 17:29 > Actually, I buy this could be useful. Yes, it could. How about a "farange", designed for floating point values -- I imagine someone smarter than me about for could write one that would guarantee that end-point was exact, and steps were within For error of exact. CHB > I think it's helpful to think > about the potential problems though. > > Henry > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From paul.anton.letnes at gmail.com Mon Mar 4 15:39:10 2013 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Mon, 4 Mar 2013 21:39:10 +0100 Subject: [Numpy-discussion] Adding .abs() method to the array object In-Reply-To: References: Message-ID: On 24. feb. 
2013, at 02:20, josef.pktd at gmail.com wrote: > On Sat, Feb 23, 2013 at 3:33 PM, Robert Kern wrote: >> On Sat, Feb 23, 2013 at 7:25 PM, Nathaniel Smith wrote: >>> On Sat, Feb 23, 2013 at 3:38 PM, Till Stensitzki wrote: >>>> Hello, >>>> i know that the array object is already crowded, but i would like >>>> to see the abs method added, especially doing work on the console. >>>> Considering that many much less used functions are also implemented >>>> as a method, i don't think adding one more would be problematic. >>> >>> My gut feeling is that we have too many methods on ndarray, not too >>> few, but in any case, can you elaborate? What's the rationale for why >>> np.abs(a) is so much harder than a.abs(), and why this function and >>> not other unary functions? >> >> Or even abs(a). > > > my reason is that I often use > > arr.max() > but then decide I want to us abs and need > np.max(np.abs(arr)) > instead of arr.abs().max() (and often I write that first to see the > error message) > > I don't like > np.abs(arr).max() > because I have to concentrate to much on the braces, especially if arr > is a calculation > > I wrote several times > def maxabs(arr): > return np.max(np.abs(arr)) > > silly, but I use it often and np.is_close is not useful (doesn't show how close) > > Just a small annoyance, but I think it's the method that I miss most often. > > Josef Very well put. I wholeheartedly agree. I'd be sort of happy with all functions becoming np.xxx() in numpy 2.0, for consistency. Paul From ralf.gommers at gmail.com Mon Mar 4 15:41:46 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 4 Mar 2013 21:41:46 +0100 Subject: [Numpy-discussion] GSOC 2013 In-Reply-To: References: Message-ID: On Tue, Feb 26, 2013 at 11:17 AM, Todd wrote: > Is numpy planning to participate in GSOC this year, either on their own or > as a part of another group? > If we participate, it should be under the PSF organization. 
I suspect participation for NumPy (and SciPy) largely depends on mentors being available. > If so, should we start trying to get some project suggestions together? > That can't hurt - good project descriptions will be useful not just for GSOC but also for people new to the project looking for ways to contribute. I suggest to use the wiki on Github for that. Ralf > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Mon Mar 4 17:29:38 2013 From: toddrjen at gmail.com (Todd) Date: Mon, 4 Mar 2013 23:29:38 +0100 Subject: [Numpy-discussion] GSOC 2013 In-Reply-To: References: Message-ID: On Mon, Mar 4, 2013 at 9:41 PM, Ralf Gommers wrote: > > > > On Tue, Feb 26, 2013 at 11:17 AM, Todd wrote: > >> Is numpy planning to participate in GSOC this year, either on their own >> or as a part of another group? >> > > If we participate, it should be under the PSF organization. I suspect > participation for NumPy (and SciPy) largely depends on mentors being > available. > > >> If so, should we start trying to get some project suggestions together? >> > > That can't hurt - good project descriptions will be useful not just for > GSOC but also for people new to the project looking for ways to contribute. > I suggest to use the wiki on Github for that. > > Ralf > > > > I have some ideas, but they may not be suitable for GSOC or may just be terrible ideas, so feel free to reject them: 1. A polar dtype. It would be similar to the complex dtype in that it would have two components, but instead of them being real and imaginary, they would be amplitude and angle. Besides the dtype, there should be either functions or methods to convert between complex and polar dtypes, and existing functions should be prepared to handle the new dtype. 
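Until such a dtype exists, the complex-to-polar conversion functions Todd describes can be approximated today with a structured dtype and NumPy's existing `abs`/`angle` machinery. This is only a sketch of the behaviour, not a proposal for the actual API:

```python
import numpy as np

# A two-field structured dtype standing in for a hypothetical polar dtype
polar = np.dtype([('r', 'f8'), ('theta', 'f8')])

def complex_to_polar(z):
    """Convert a complex array to (magnitude, angle) pairs."""
    out = np.empty(np.shape(z), dtype=polar)
    out['r'] = np.abs(z)
    out['theta'] = np.angle(z)
    return out

def polar_to_complex(p):
    """Inverse conversion: r * exp(i * theta)."""
    return p['r'] * np.exp(1j * p['theta'])

z = np.array([1 + 1j, -2 + 0j, 3j])
round_trip = polar_to_complex(complex_to_polar(z))
```

A real polar dtype would go further by letting ufuncs operate on the pairs directly, which the structured-dtype workaround cannot do.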
If it could be made to handle an arbitrary number of dimensions this would be better yet, but I don't know if this is possible, not to mention practical. There is a lot of mathematics, including both signal processing and vector analysis, that is often convenient to work with in this format. 2. We discussed this before, but right now subclasses of ndarray don't have any way to preserve their class attributes when using functions that work on multiple ndarrays, such as concatenate. The current __array_finalize__ method only takes a single array. This project would be to work out a method to handle this sort of situation, perhaps requiring a new method, and making sure numpy methods and functions properly invoke it. 3. Structured arrays are accessed in a manner similar to python dictionaries, using a key. However, they don't support the normal python dictionary methods like keys, values, items, iterkeys, itervalues, iteritems, etc. This project would be to implement as much of the dictionary (and ordereddict) API as possible in structured arrays (making sure that the resulting API presented to the user takes into account whether python 2 or python 3 is being used). 4. The numpy ndarray class stores data in a regular manner in memory. This makes many linear algebra operations easier, but makes changing the number of elements in an array nearly impossible in practice unless you are very careful. There are other data structures that make adding and removing elements easier, but are not as efficient at linear algebra operations. The purpose of this project would be to create such a class in numpy, one that is duck-type compatible with ndarray but makes resizing feasible. This would obviously come at a performance penalty for linear algebra related functions. They would still have consistent dtypes and could not be nested, unlike python lists. This could either be based on a new c-based type or be a subclass of list under the hood. 5.
Currently dtypes are limited to a set of fixed types, or combinations of these types. You can't have, say, a 48 bit float or a 1-bit bool. This project would be to allow users to create entirely new, non-standard dtypes based on simple rules, such as specifying the length of the sign, length of the exponent, and length of the mantissa for a custom floating-point number. Hopefully this would mostly be used for reading in non-standard data and not used that often, but for some situations it could be useful for storing data too (such as large amounts of boolean data, or genetic code which can be stored in 2 bits and is often very large). -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Mon Mar 4 18:21:05 2013 From: jaime.frio at gmail.com (=?ISO-8859-1?Q?Jaime_Fern=E1ndez_del_R=EDo?=) Date: Mon, 4 Mar 2013 15:21:05 -0800 Subject: [Numpy-discussion] GSOC 2013 In-Reply-To: References: Message-ID: On Mon, Mar 4, 2013 at 2:29 PM, Todd wrote: > > 5. Currently dtypes are limited to a set of fixed types, or combinations > of these types. You can't have, say, a 48 bit float or a 1-bit bool. This > project would be to allow users to create entirely new, non-standard dtypes > based on simple rules, such as specifying the length of the sign, length of > the exponent, and length of the mantissa for a custom floating-point > number. Hopefully this would mostly be used for reading in non-standard > data and not used that often, but for some situations it could be useful > for storing data too (such as large amounts of boolean data, or genetic > code which can be stored in 2 bits and is often very large). > I second this general idea. Simply having a pair of packbits/unpackbits functions that could work with 2 and 4 bit uints would make my life easier. If it were possible to have an array of dtype 'uint4' that used half the space of a 'uint8', but could have ufuncs an the like ran on it, it would be pure bliss. 
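The 2-bit packing Jaime wishes for can already be emulated with shifts and masks, at the cost of doing it by hand and losing ufunc support on the packed form. A sketch, assuming values fit in 2 bits and the array length is divisible by 4:

```python
import numpy as np

def pack2(a):
    """Pack an array of 2-bit values (0-3) into bytes, four per byte."""
    a = np.asarray(a, dtype=np.uint8).reshape(-1, 4)
    return (a[:, 0] << 6) | (a[:, 1] << 4) | (a[:, 2] << 2) | a[:, 3]

def unpack2(packed):
    """Inverse of pack2: expand each byte back into four 2-bit values."""
    p = np.asarray(packed, dtype=np.uint8)
    out = np.empty((p.size, 4), dtype=np.uint8)
    out[:, 0] = (p >> 6) & 3
    out[:, 1] = (p >> 4) & 3
    out[:, 2] = (p >> 2) & 3
    out[:, 3] = p & 3
    return out.ravel()

codes = np.array([0, 1, 2, 3, 3, 2, 1, 0], dtype=np.uint8)
packed = pack2(codes)   # 8 two-bit values stored in 2 bytes
```

This gives the 4x storage saving (e.g. for genetic codes) but, unlike a hypothetical 'uint4'/'uint2' dtype, arithmetic still requires unpacking first.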
Not that I'm complaining, but a man can dream... Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Mon Mar 4 19:23:46 2013 From: jaime.frio at gmail.com (=?ISO-8859-1?Q?Jaime_Fern=E1ndez_del_R=EDo?=) Date: Mon, 4 Mar 2013 16:23:46 -0800 Subject: [Numpy-discussion] polyfit with fixed points Message-ID: A couple of days back, answering a question in StackExchange ( http://stackoverflow.com/a/15196628/110026), I found myself using Lagrange multipliers to fit a polynomial with least squares to data, making sure it went through some fixed points. This time it was relatively easy, because some 5 years ago I came across the same problem in real life, and spent the better part of a week banging my head against it. Even knowing what you are doing, it is far from simple, and in my own experience very useful: I think the only time ever I have fitted a polynomial to data with a definite purpose, it required that some points were fixed. Seeing that polyfit is entirely coded in python, it would be relatively straightforward to add support for fixed points. It is also something I feel capable, and willing, of doing. * Is such an additional feature something worthy of investigating, or will it never find its way into numpy.polyfit? * Any ideas on the best syntax for the extra parameters? Thanks, Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial. -------------- next part -------------- An HTML attachment was scrubbed...
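The Lagrange-multiplier construction Jaime describes can be sketched for the ordinary polynomial basis: minimize ||Vc - y||^2 subject to V_f c = y_f (V and V_f being Vandermonde matrices), which leads to a linear KKT system in the coefficients and multipliers. This is an illustrative helper under those assumptions, not the API being proposed for numpy:

```python
import numpy as np

def polyfit_fixed(x, y, deg, x_fixed, y_fixed):
    """Least-squares polynomial fit constrained through fixed points.

    Solves the KKT system of
        minimize ||V c - y||^2  subject to  V_f c = y_f.
    Returns coefficients highest-degree-first, like np.polyfit.
    """
    V = np.vander(x, deg + 1)
    Vf = np.vander(x_fixed, deg + 1)
    n, m = deg + 1, len(x_fixed)
    # KKT matrix: [[2 V^T V, Vf^T], [Vf, 0]]
    A = np.zeros((n + m, n + m))
    A[:n, :n] = 2.0 * V.T @ V
    A[:n, n:] = Vf.T
    A[n:, :n] = Vf
    b = np.concatenate([2.0 * V.T @ y, y_fixed])
    sol = np.linalg.solve(A, b)
    return sol[:n]  # drop the Lagrange multipliers

# Fit a parabola to data that does NOT pass through the origin,
# while forcing the fit through (0, 0)
x = np.linspace(0.0, 1.0, 20)
y = 1.0 + 2.0 * x + 3.0 * x**2
coeffs = polyfit_fixed(x, y, 2, np.array([0.0]), np.array([0.0]))
```

The constraint is satisfied to machine precision even though the unconstrained fit would have an intercept of 1.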
URL: From aron at ahmadia.net Mon Mar 4 19:53:57 2013 From: aron at ahmadia.net (Aron Ahmadia) Date: Mon, 4 Mar 2013 19:53:57 -0500 Subject: [Numpy-discussion] polyfit with fixed points In-Reply-To: References: Message-ID: Interesting, that question would probably have gotten a different response on scicomp, it is a pity we are not attracting more questions there! I know there are two polyfit modules in numpy, one in numpy.polyfit, the other in numpy.polynomial, the functionality you are suggesting seems to fit in either. What parameters/functionality are you considering adding? A On Mon, Mar 4, 2013 at 7:23 PM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > A couple of days back, answering a question in StackExchange ( > http://stackoverflow.com/a/15196628/110026), I found myself using > Lagrange multipliers to fit a polynomial with least squares to data, making > sure it went through some fixed points. This time it was relatively easy, > because some 5 years ago I came across the same problem in real life, and > spent the better part of a week banging my head against it. Even knowing > what you are doing, it is far from simple, and in my own experience very > useful: I think the only time ever I have fitted a polynomial to data with > a definite purpose, it required that some points were fixed. > > Seeing that polyfit is entirely coded in python, it would be relatively > straightforward to add support for fixed points. It is also something I > feel capable, and willing, of doing. > > * Is such an additional feature something worthy of investigating, or > will it never find its way into numpy.polyfit? > * Any ideas on the best syntax for the extra parameters? > > Thanks, > > Jaime > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes > de dominaci?n mundial. 
> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mail.till at gmx.de Mon Mar 4 20:09:00 2013 From: mail.till at gmx.de (Till Stensitzki) Date: Tue, 5 Mar 2013 01:09:00 +0000 (UTC) Subject: [Numpy-discussion] GSOC 2013 References: Message-ID: Todd gmail.com> writes: > > I have some ideas, but they may not be suitable for GSOC or may just be terrible ideas, so feel free to reject them: > I have also a possible (terrible?) idea in my mind: Including (maybe optional as blas) faster transcendental functions into numpy. Something like https://github.com/herumi/fmath or using the MKL. I think numpy just uses the standard std functions, whiche are not optimized for speed. greetings Till From jaime.frio at gmail.com Mon Mar 4 20:45:45 2013 From: jaime.frio at gmail.com (=?ISO-8859-1?Q?Jaime_Fern=E1ndez_del_R=EDo?=) Date: Mon, 4 Mar 2013 17:45:45 -0800 Subject: [Numpy-discussion] polyfit with fixed points In-Reply-To: References: Message-ID: On Mon, Mar 4, 2013 at 4:53 PM, Aron Ahmadia wrote: > Interesting, that question would probably have gotten a different response > on scicomp, it is a pity we are not attracting more questions there! > > I know there are two polyfit modules in numpy, one in numpy.polyfit, the > other in numpy.polynomial, the functionality you are suggesting seems to > fit in either. > > What parameters/functionality are you considering adding? > Well, you need two more array-likes, x_fixed and y_fixed, which could probably be fed to polyfit as an optional tuple parameter: polyfit(x, y, deg, fixed_points=(x_fixed, y_fixed)) The standard return would still be the deg + 1 coefficients of the fitted polynomial, so the workings would be perfectly backwards compatible. 
An optional return, either when full=True, or by setting an additional lagrange_mult=True flag, could include the values of the Lagrange multipliers calculated during the fit. Jaime > A > > > On Mon, Mar 4, 2013 at 7:23 PM, Jaime Fern?ndez del R?o < > jaime.frio at gmail.com> wrote: > >> A couple of days back, answering a question in StackExchange ( >> http://stackoverflow.com/a/15196628/110026), I found myself using >> Lagrange multipliers to fit a polynomial with least squares to data, making >> sure it went through some fixed points. This time it was relatively easy, >> because some 5 years ago I came across the same problem in real life, and >> spent the better part of a week banging my head against it. Even knowing >> what you are doing, it is far from simple, and in my own experience very >> useful: I think the only time ever I have fitted a polynomial to data with >> a definite purpose, it required that some points were fixed. >> >> Seeing that polyfit is entirely coded in python, it would be relatively >> straightforward to add support for fixed points. It is also something I >> feel capable, and willing, of doing. >> >> * Is such an additional feature something worthy of investigating, or >> will it never find its way into numpy.polyfit? >> * Any ideas on the best syntax for the extra parameters? >> >> Thanks, >> >> Jaime >> >> -- >> (\__/) >> ( O.o) >> ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes >> de dominaci?n mundial. >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Mar 4 23:10:21 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 4 Mar 2013 21:10:21 -0700 Subject: [Numpy-discussion] Remove interactive setup Message-ID: In distutils there are three files that provide some interactive setup: 1. numpy/distutils/core.py 2. numpy/distutils/fcompiler/gnu.py 3. numpy/distutils/interactive.py In Python3 `raw_input` has been renamed 'input' and python2 'input' is gone. I propose that the easiest solution to this compatibility problem is to remove all support for interactive numpy setup. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From aron at ahmadia.net Mon Mar 4 23:12:47 2013 From: aron at ahmadia.net (Aron Ahmadia) Date: Mon, 4 Mar 2013 23:12:47 -0500 Subject: [Numpy-discussion] Remove interactive setup In-Reply-To: References: Message-ID: I've built numpy on many different machines, including supercomputers, and I have never used interactive setup. I agree with the proposal to remove it. A On Mon, Mar 4, 2013 at 11:10 PM, Charles R Harris wrote: > In distutils there are three files that provide some interactive setup: > > > 1. numpy/distutils/core.py > 2. numpy/distutils/fcompiler/gnu.py > 3. numpy/distutils/interactive.py > > In Python3 `raw_input` has been renamed 'input' and python2 'input' is > gone. I propose that the easiest solution to this compatibility problem is > to remove all support for interactive numpy setup. > > Thoughts? > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Mon Mar 4 23:25:48 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 4 Mar 2013 21:25:48 -0700 Subject: [Numpy-discussion] polyfit with fixed points In-Reply-To: References: Message-ID: On Mon, Mar 4, 2013 at 5:53 PM, Aron Ahmadia wrote: > Interesting, that question would probably have gotten a different response > on scicomp, it is a pity we are not attracting more questions there! > > I know there are two polyfit modules in numpy, one in numpy.polyfit, the > other in numpy.polynomial, the functionality you are suggesting seems to > fit in either. > > What parameters/functionality are you considering adding? > > A > The discussion list convention is to bottom post. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Mar 4 23:37:43 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 4 Mar 2013 21:37:43 -0700 Subject: [Numpy-discussion] polyfit with fixed points In-Reply-To: References: Message-ID: On Mon, Mar 4, 2013 at 5:23 PM, Jaime Fernández del Río < jaime.frio at gmail.com> wrote: > A couple of days back, answering a question in StackExchange ( > http://stackoverflow.com/a/15196628/110026), I found myself using > Lagrange multipliers to fit a polynomial with least squares to data, making > sure it went through some fixed points. This time it was relatively easy, > because some 5 years ago I came across the same problem in real life, and > spent the better part of a week banging my head against it. Even knowing > what you are doing, it is far from simple, and in my own experience very > useful: I think the only time ever I have fitted a polynomial to data with > a definite purpose, it required that some points were fixed. > > Seeing that polyfit is entirely coded in python, it would be relatively > straightforward to add support for fixed points. It is also something I > feel capable, and willing, of doing. 
> > * Is such an additional feature something worthy of investigating, or > will it never find its way into numpy.polyfit? > * Any ideas on the best syntax for the extra parameters? > > There are actually seven versions of polynomial fit, two for the usual polynomial basis, and one each for Legendre, Chebyshev, Hermite, Hermite_e, and Laguerre ;) How do you propose to implement it? I think Lagrange multipliers is overkill, I'd rather see using the weights (approximate) or change of variable -- a permutation in this case -- followed by qr and lstsq. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwmsmith at gmail.com Mon Mar 4 23:49:30 2013 From: kwmsmith at gmail.com (Kurt Smith) Date: Mon, 4 Mar 2013 22:49:30 -0600 Subject: [Numpy-discussion] GSOC 2013 In-Reply-To: References: Message-ID: On Mon, Mar 4, 2013 at 4:29 PM, Todd wrote: > > 3. Structured arrays are accessed in a manner similar to python dictionaries, > using a key. However, they don't support the normal python dictionary > methods like keys, values, items, iterkeys, itervalues, iteritems, etc. This > project would be to implement as much of the dictionary (and ordereddict) API > as possible in structured arrays (making sure that the resulting API > presented to the user takes into account whether python 2 or python 3 is > being used). Along these lines: what about implementing the new "memory friendly" dictionary [0] with a NumPy structured array backend for the dense array portion, and allowing any specified column of the array to be the dictionary keys? This would merge the strengths of NumPy structured arrays with Python dictionaries. Some thought would have to be given to mutability / immutability issues, but these are surmountable. Further enhancements would be to allow for multiple key columns -- analogous to multiple indexes into a database. [0] http://mail.python.org/pipermail/python-dev/2012-December/123028.html > > 4. 
The numpy ndarray class stores data in a regular manner in memory. This > makes many linear algebra operations easier, but makes changing the number > of elements in an array nearly impossible in practice unless you are very > careful. There are other data structures that make adding and removing > elements easier, but are not as efficient at linear algebra operations. The > purpose of this project would be to create such a class in numpy, one that > is duck type compatible with ndarray but makes resizing feasible. This > would obviously come at a performance penalty for linear algebra related > functions. They would still have consistent dtypes and could not be nested, > unlike python lists. This could either be based on a new c-based type or be > a subclass of list under the hood. This made me think of a serious performance limitation of structured dtypes: a structured dtype is always "packed", which may lead to terrible byte alignment for common types. For instance, `dtype([('a', 'u1'), ('b', 'u8')]).itemsize == 9`, meaning that the 8-byte integer is not aligned as an equivalent C-struct's would be, leading to all sorts of horrors at the cache and register level. Python's ctypes does the right thing here, and can be mined for ideas. For instance, the equivalent ctypes Structure adds pad bytes so the 8-byte integer is on the correct boundary: class Aligned(ctypes.Structure): _fields_ = [('a', ctypes.c_uint8), ('b', ctypes.c_uint64)] print ctypes.sizeof(Aligned()) # --> 16 I'd be surprised if someone hasn't already proposed fixing this, although perhaps this would be outside the scope of a GSOC project. I'm willing to wager that the performance improvements would be easily measurable. Just some more thoughts. 
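The packed-versus-aligned difference is easy to check directly; a small verification sketch, assuming a typical 64-bit platform where a C uint64 is 8-byte aligned (np.dtype also accepts an align flag — see the reply further down this thread):

```python
import ctypes
import numpy as np

# Packed layout: the u8 field starts at offset 1, giving an itemsize of 9.
packed = np.dtype([('a', 'u1'), ('b', 'u8')])

# With align=True numpy inserts pad bytes the way a C compiler would.
aligned = np.dtype([('a', 'u1'), ('b', 'u8')], align=True)

class Aligned(ctypes.Structure):
    _fields_ = [('a', ctypes.c_uint8), ('b', ctypes.c_uint64)]

print(packed.itemsize, aligned.itemsize, ctypes.sizeof(Aligned))
# typically prints: 9 16 16
```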
Kurt From ralf.gommers at gmail.com Tue Mar 5 00:37:49 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 5 Mar 2013 06:37:49 +0100 Subject: [Numpy-discussion] Remove interactive setup In-Reply-To: References: Message-ID: On Tue, Mar 5, 2013 at 5:12 AM, Aron Ahmadia wrote: > I've built numpy on many different machines, including supercomputers, and > I have never used interactive setup. I agree with the proposal to remove > it. > > A > > > On Mon, Mar 4, 2013 at 11:10 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> In distutils there are three files that provide some interactive setup: >> >> >> 1. numpy/distutils/core.py >> 2. numpy/distutils/fcompiler/gnu.py >> 3. numpy/distutils/interactive.py >> >> In Python3 `raw_input` has been renamed 'input' and python2 'input' is >> gone. I propose that the easiest solution to this compatibility problem is >> to remove all support for interactive numpy setup. >> >> Thoughts? >> > +1 for removing Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Mar 5 01:34:39 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 4 Mar 2013 23:34:39 -0700 Subject: [Numpy-discussion] Remove interactive setup In-Reply-To: References: Message-ID: On Mon, Mar 4, 2013 at 10:37 PM, Ralf Gommers wrote: > > > > On Tue, Mar 5, 2013 at 5:12 AM, Aron Ahmadia wrote: > >> I've built numpy on many different machines, including supercomputers, >> and I have never used interactive setup. I agree with the proposal to >> remove it. >> >> A >> >> >> On Mon, Mar 4, 2013 at 11:10 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> In distutils there are three files that provide some interactive setup: >>> >>> >>> 1. numpy/distutils/core.py >>> 2. numpy/distutils/fcompiler/gnu.py >>> 3. numpy/distutils/interactive.py >>> >>> In Python3 `raw_input` has been renamed 'input' and python2 'input' is >>> gone. 
I propose that the easiest solution to this compatibility problem is >>> to remove all support for interactive numpy setup. >>> >>> Thoughts? >>> >> > +1 for removing > > I note that the way to access it is to run python setup.py with no arguments. I wonder what the proper message should be in that case? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From brad.froehle at gmail.com Tue Mar 5 01:59:33 2013 From: brad.froehle at gmail.com (Bradley M. Froehle) Date: Mon, 4 Mar 2013 22:59:33 -0800 Subject: [Numpy-discussion] Remove interactive setup In-Reply-To: References: Message-ID: On Mon, Mar 4, 2013 at 10:34 PM, Charles R Harris wrote: > I note that the way to access it is to run python setup.py with no > arguments. I wonder what the proper message should be in that case? > How about usage instructions and an error message, similar to what a basic distutils setup script would provide? -Brad -------------- next part -------------- An HTML attachment was scrubbed... URL: From Nicolas.Rougier at inria.fr Tue Mar 5 02:01:32 2013 From: Nicolas.Rougier at inria.fr (Nicolas Rougier) Date: Tue, 5 Mar 2013 08:01:32 +0100 Subject: [Numpy-discussion] GSOC 2013 In-Reply-To: References: Message-ID: > This made me think of a serious performance limitation of structured dtypes: a > structured dtype is always "packed", which may lead to terrible byte alignment > for common types. For instance, `dtype([('a', 'u1'), ('b', > 'u8')]).itemsize == 9`, > meaning that the 8-byte integer is not aligned as an equivalent C-struct's > would be, leading to all sorts of horrors at the cache and register level. > Python's ctypes does the right thing here, and can be mined for ideas. 
For > instance, the equivalent ctypes Structure adds pad bytes so the 8-byte integer > is on the correct boundary: > > class Aligned(ctypes.Structure): > _fields_ = [('a', ctypes.c_uint8), > ('b', ctypes.c_uint64)] > > print ctypes.sizeof(Aligned()) # --> 16 > > I'd be surprised if someone hasn't already proposed fixing this, although > perhaps this would be outside the scope of a GSOC project. I'm willing to > wager that the performance improvements would be easily measureable. I've been confronted to this very problem and ended up coding a "group class" which is a "split" structured array (each field is stored as a single array) offering the same interface as a regular structured array. http://www.loria.fr/~rougier/coding/software/numpy_group.py Nicolas From jaime.frio at gmail.com Tue Mar 5 02:41:01 2013 From: jaime.frio at gmail.com (=?ISO-8859-1?Q?Jaime_Fern=E1ndez_del_R=EDo?=) Date: Mon, 4 Mar 2013 23:41:01 -0800 Subject: [Numpy-discussion] polyfit with fixed points In-Reply-To: References: Message-ID: On Mon, Mar 4, 2013 at 8:37 PM, Charles R Harris wrote: > > There are actually seven versions of polynomial fit, two for the usual > polynomial basis, and one each for Legendre, Chebyshev, Hermite, Hermite_e, > and Laguerre ;) > Correct me if I am wrong, but the fitted function is the same regardless of the polynomial basis used. I don't know if there can be numerical stability issues, but chebfit(x, y, n) returns the same as poly2cheb(polyfit(x, y, n)). In any case, with all the already existing support for these special polynomials, it wouldn't be too hard to set the problem up to calculate the right coefficients directly for each case. > How do you propose to implement it? I think Lagrange multipliers is > overkill, I'd rather see using the weights (approximate) or change of > variable -- a permutation in this case -- followed by qr and lstsq. 
> The weights method is already in place, but I find it rather inelegant and unsatisfactory as a solution to this problem. But if it is deemed sufficient, then there is of course no need to go any further. I hadn't thought of any other way than using Lagrange multipliers, but looking at it in more detail, I am not sure it will be possible to formulate it in a manner that can be fed to lstsq, as polyfit does today. And if it can't, it probably wouldn't make much sense to have two different methods which cannot produce the same full output running under the same hood. I can't figure out your "change of variable" method from the succinct description, could you elaborate a little more? Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Tue Mar 5 02:45:20 2013 From: efiring at hawaii.edu (Eric Firing) Date: Mon, 04 Mar 2013 21:45:20 -1000 Subject: [Numpy-discussion] GSOC 2013 In-Reply-To: References: Message-ID: <5135A290.2050405@hawaii.edu> On 2013/03/04 9:01 PM, Nicolas Rougier wrote: >> >This made me think of a serious performance limitation of structured dtypes: a >> >structured dtype is always "packed", which may lead to terrible byte alignment >> >for common types. For instance, `dtype([('a', 'u1'), ('b', >> >'u8')]).itemsize == 9`, >> >meaning that the 8-byte integer is not aligned as an equivalent C-struct's >> >would be, leading to all sorts of horrors at the cache and register level. Doesn't the "align" kwarg of np.dtype do what you want? 
In [2]: dt = np.dtype(dict(names=['a', 'b'], formats=['u1', 'u8']), align=True) In [3]: dt.itemsize Out[3]: 16 Eric From robert.kern at gmail.com Tue Mar 5 04:09:44 2013 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 5 Mar 2013 09:09:44 +0000 Subject: [Numpy-discussion] Remove interactive setup In-Reply-To: References: Message-ID: On Tue, Mar 5, 2013 at 6:34 AM, Charles R Harris wrote: > I note that the way to access it is to run python setup.py with no > arguments. I wonder what the proper message should be in that case? Just let distutils handle it. $ python setup.py usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...] or: setup.py --help [cmd1 cmd2 ...] or: setup.py --help-commands or: setup.py cmd --help error: no commands supplied Anyone who was expecting the interactive setup will probably complain here. -- Robert Kern From charlesr.harris at gmail.com Tue Mar 5 08:23:49 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 5 Mar 2013 06:23:49 -0700 Subject: [Numpy-discussion] polyfit with fixed points In-Reply-To: References: Message-ID: On Tue, Mar 5, 2013 at 12:41 AM, Jaime Fernández del Río < jaime.frio at gmail.com> wrote: > On Mon, Mar 4, 2013 at 8:37 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> There are actually seven versions of polynomial fit, two for the usual >> polynomial basis, and one each for Legendre, Chebyshev, Hermite, Hermite_e, >> and Laguerre ;) >> > > Correct me if I am wrong, but the fitted function is the same regardless > of the polynomial basis used. I don't know if there can be numerical > stability issues, but chebfit(x, y, n) returns the same as > poly2cheb(polyfit(x, y, n)). > > In any case, with all the already existing support for these special > polynomials, it wouldn't be too hard to set the problem up to calculate the > right coefficients directly for each case. > > >> How do you propose to implement it? 
I think Lagrange multipliers is >> overkill, I'd rather see using the weights (approximate) or change of >> variable -- a permutation in this case -- followed by qr and lstsq. >> > The weights method is already in place, but I find it rather inelegant and > unsatisfactory as a solution to this problem. But if it is deemed > sufficient, then there is of course no need to go any further. > > I hadn't thought of any other way than using Lagrange multipliers, but > looking at it in more detail, I am not sure it will be possible to > formulate it in a manner that can be fed to lstsq, as polyfit does today. > And if it can't, it probably wouldn't make much sense to have two different > methods which cannot produce the same full output running under the same > hood. > > I can't figure out your "change of variable" method from the succinct > description, could you elaborate a little more? > I think the place to add this is to lstsq as linear constraints. That is, the coefficients must satisfy B * c = y_c for some set of equations B. In the polynomial case the rows of B would be the powers of x at the points you want to constrain. Then do an svd on B, B = u * d * v. Apply v to the design matrix of the unconstrained points A' = A * v.T so that B' becomes u * d. The coefficients are now replaced by new variables c' with the constraints in the first two columns. If there are, say, 2 constraints, u * d will be 2x2. Solve that equation for the first two constraints then multiply the first two columns of the design matrix A' by the result and put them on the rhs, i.e., y = y - A'[:, :2] * c'[:2] then solve the usual least squares thing with A[:, 2:] * c'[2:] = y to get the rest of the transformed coefficients c'. Put the coefficients altogether and multiply with v^T to get c = v^T * c' Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
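In numpy terms, that change-of-variable recipe might be sketched as follows (a hypothetical constrained_polyfit helper, generalising to k constraints instead of the 2 used in the prose; np.vander as the design matrix is an assumption of the sketch, not existing numpy code):

```python
import numpy as np

def constrained_polyfit(x, y, deg, x_fixed, y_fixed):
    # Minimize ||A c - y|| subject to B c = y_fixed via the substitution
    # c = v @ c', where B = u @ diag(s) @ vt is the full SVD of B.
    A = np.vander(x, deg + 1)          # design matrix at the free points
    B = np.vander(x_fixed, deg + 1)    # powers of x at the fixed points
    k = B.shape[0]                     # number of constraints
    u, s, vt = np.linalg.svd(B)        # vt is (deg+1) x (deg+1)
    v = vt.T
    cp = np.zeros(deg + 1)
    # The constraints become u @ (s * c'[:k]) = y_fixed, solved directly.
    cp[:k] = (u.T @ y_fixed) / s[:k]
    Ap = A @ v                         # transformed design matrix
    resid = y - Ap[:, :k] @ cp[:k]     # move the fixed part to the rhs
    cp[k:] = np.linalg.lstsq(Ap[:, k:], resid, rcond=None)[0]
    return v @ cp                      # back to the original coefficients
```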
URL: From charlesr.harris at gmail.com Tue Mar 5 08:41:55 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 5 Mar 2013 06:41:55 -0700 Subject: [Numpy-discussion] polyfit with fixed points In-Reply-To: References: Message-ID: On Tue, Mar 5, 2013 at 6:23 AM, Charles R Harris wrote: > > > On Tue, Mar 5, 2013 at 12:41 AM, Jaime Fernández del Río < > jaime.frio at gmail.com> wrote: > >> On Mon, Mar 4, 2013 at 8:37 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> There are actually seven versions of polynomial fit, two for the usual >>> polynomial basis, and one each for Legendre, Chebyshev, Hermite, Hermite_e, >>> and Laguerre ;) >>> >> >> Correct me if I am wrong, but the fitted function is the same regardless >> of the polynomial basis used. I don't know if there can be numerical >> stability issues, but chebfit(x, y, n) returns the same as >> poly2cheb(polyfit(x, y, n)). >> >> In any case, with all the already existing support for these special >> polynomials, it wouldn't be too hard to set the problem up to calculate the >> right coefficients directly for each case. >> >> >>> How do you propose to implement it? I think Lagrange multipliers is >>> overkill, I'd rather see using the weights (approximate) or change of >>> variable -- a permutation in this case -- followed by qr and lstsq. >>> >> >> The weights method is already in place, but I find it rather inelegant >> and unsatisfactory as a solution to this problem. But if it is deemed >> sufficient, then there is of course no need to go any further. >> >> I hadn't thought of any other way than using Lagrange multipliers, but >> looking at it in more detail, I am not sure it will be possible to >> formulate it in a manner that can be fed to lstsq, as polyfit does today. >> And if it can't, it probably wouldn't make much sense to have two different >> methods which cannot produce the same full output running under the same >> hood. 
>> >> I can't figure out your "change of variable" method from the succinct >> description, could you elaborate a little more? >> > I think the place to add this is to lstsq as linear constraints. That is, > the coefficients must satisfy B * c = y_c for some set of equations B. In > the polynomial case the rows of B would be the powers of x at the points > you want to constrain. Then do an svd on B, B = u * d * v. Apply v to the > design matrix of the unconstrained points A' = A * v.T so that B' becomes > u * d. The coefficients are now replaced by new variables c' with the > constraints in the first two columns. If there are, say, 2 constraints, u * > d will be 2x2. Solve that equation for the first two constraints then > multiply the first two columns of the design matrix A' by the result and > put them on the rhs, i.e., > > y = y - A'[:, :2] * c'[:2] > > then solve the usual least squares thing with > > A[:, 2:] * c'[2:] = y > > to get the rest of the transformed coefficients c'. Put the coefficients > altogether and multiply with v^T to get > > c = v^T * c' > > There are a few missing `'` in there, but I think you can get the idea, we are making the substitution c = v^T * c'. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From pelson.pub at gmail.com Tue Mar 5 09:15:07 2013 From: pelson.pub at gmail.com (Phil Elson) Date: Tue, 5 Mar 2013 14:15:07 +0000 Subject: [Numpy-discussion] Implementing a "find first" style function Message-ID: The ticket https://github.com/numpy/numpy/issues/2269 discusses the possibility of implementing a "find first" style function which can optimise the process of finding the first value(s) which match a predicate in a given 1D array. 
For example: >>> a = np.sin(np.linspace(0, np.pi, 200)) >>> print find_first(a, lambda a: a > 0.9) ((71, ), 0.900479032457) This has been discussed in several locations: https://github.com/numpy/numpy/issues/2269 https://github.com/numpy/numpy/issues/2333 http://stackoverflow.com/questions/7632963/numpy-array-how-to-find-index-of-first-occurrence-of-item *Rationale* For small arrays there is no real reason to avoid doing: >>> a = np.sin(np.linspace(0, np.pi, 200)) >>> ind = (a > 0.9).nonzero()[0][0] >>> print (ind, ), a[ind] (71,) 0.900479032457 But for larger arrays, this can lead to massive amounts of work even if the result is one of the first to be computed. Example: >>> a = np.arange(1e8) >>> print (a == 5).nonzero()[0][0] 5 So a function which terminates when the first matching value is found is desirable. As mentioned in #2269, it is possible to define a consistent ordering which allows this functionality for >1D arrays, but IMHO it overcomplicates the problem and was not a case that I personally needed, so I've limited the scope to 1D arrays only. *Implementation* My initial assumption was that to get any kind of performance I would need to write the *find* function in C, however after prototyping with some array chunking it became apparent that a trivial python function would be quick enough for my needs. The approach I've implemented in the code found in #2269 simply breaks the array into sub-arrays of maximum length *chunk_size* (2048 by default, though there is no real science to this number), applies the given predicating function, and yields the results from *nonzero()*. The given function should be a python function which operates on the whole of the sub-array element-wise (i.e. the function should be vectorized). Returning a generator also has the benefit of allowing users to get the first *n* matching values/indices. 
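The chunked generator itself is only a few lines; a minimal sketch of the approach (not the exact code from the linked comment):

```python
import numpy as np

def find(a, predicate, chunk_size=2048):
    # Scan the 1D array `a` chunk by chunk, applying the vectorized
    # `predicate` to each chunk and yielding ((index,), value) pairs,
    # so iteration can stop as soon as the first match is found.
    for start in range(0, len(a), chunk_size):
        chunk = a[start:start + chunk_size]
        for i in predicate(chunk).nonzero()[0]:
            yield (start + int(i),), chunk[i]
```

With this, next(find(a, lambda a: a > 0.9)) on the sine example above returns ((71,), 0.900479032457) without evaluating the predicate over the whole array.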
*Results* I timed the implementation of *find* found in my comment at https://github.com/numpy/numpy/issues/2269#issuecomment-14436725 with an obvious test: In [1]: from np_utils import find In [2]: import numpy as np In [3]: import numpy.random In [4]: np.random.seed(1) In [5]: a = np.random.randn(1e8) In [6]: a.min(), a.max() Out[6]: (-6.1194900990552776, 5.9632246301166321) In [7]: next(find(a, lambda a: np.abs(a) > 6)) Out[7]: ((33105441,), -6.1194900990552776) In [8]: (np.abs(a) > 6).nonzero() Out[8]: (array([33105441]),) In [9]: %timeit (np.abs(a) > 6).nonzero() 1 loops, best of 3: 1.51 s per loop In [10]: %timeit next(find(a, lambda a: np.abs(a) > 6)) 1 loops, best of 3: 912 ms per loop In [11]: %timeit next(find(a, lambda a: np.abs(a) > 6, chunk_size=100000)) 1 loops, best of 3: 470 ms per loop In [12]: %timeit next(find(a, lambda a: np.abs(a) > 6, chunk_size=1000000)) 1 loops, best of 3: 483 ms per loop This shows that picking a sensible *chunk_size* can yield massive speed-ups (nonzero is x3 slower in one case). A similar example with a much smaller 1D array shows similar promise: In [41]: a = np.random.randn(1e4) In [42]: %timeit next(find(a, lambda a: np.abs(a) > 3)) 10000 loops, best of 3: 35.8 us per loop In [43]: %timeit (np.abs(a) > 3).nonzero() 10000 loops, best of 3: 148 us per loop As I commented on the issue tracker, if you think this function is worth taking forward, I'd be happy to open up a pull request. Feedback gratefully received. Cheers, Phil -------------- next part -------------- An HTML attachment was scrubbed... URL: From gbuday at gmail.com Tue Mar 5 09:58:38 2013 From: gbuday at gmail.com (Gergely Buday) Date: Tue, 5 Mar 2013 15:58:38 +0100 Subject: [Numpy-discussion] scipy_distutils.fcompiler Message-ID: Hi there, I try to compile a program developed with scipy. 
It is installed on my Ubuntu 12.04 box but upon make I get: Traceback (most recent call last): File "/usr/local/bin/f2py", line 4, in f2py2e.main() File "/usr/local/lib/python2.7/dist-packages/f2py2e/f2py2e.py", line 677, in main run_compile() File "/usr/local/lib/python2.7/dist-packages/f2py2e/f2py2e.py", line 536, in run_compile import scipy_distutils.fcompiler ImportError: No module named scipy_distutils.fcompiler What should I do to fix this? I have scipy Version: 0.9.0+dfsg1-1ubuntu2 - Gergely From djpine at gmail.com Tue Mar 5 10:50:34 2013 From: djpine at gmail.com (David Pine) Date: Tue, 5 Mar 2013 10:50:34 -0500 Subject: [Numpy-discussion] polyfit with fixed points In-Reply-To: References: Message-ID: Jaime, If you are going to work on this, you should also take a look at the recent thread http://mail.scipy.org/pipermail/numpy-discussion/2013-February/065649.html, which is about the weighting function, which is in a confused state in the current version of polyfit. By the way, Numerical Recipes has a nice discussion both about fixing parameters and about weighting the data in different ways in polynomial least squares fitting. David On Mon, Mar 4, 2013 at 7:23 PM, Jaime Fernández del Río < jaime.frio at gmail.com> wrote: > A couple of days back, answering a question in StackExchange ( > http://stackoverflow.com/a/15196628/110026), I found myself using > Lagrange multipliers to fit a polynomial with least squares to data, making > sure it went through some fixed points. This time it was relatively easy, > because some 5 years ago I came across the same problem in real life, and > spent the better part of a week banging my head against it. Even knowing > what you are doing, it is far from simple, and in my own experience very > useful: I think the only time ever I have fitted a polynomial to data with > a definite purpose, it required that some points were fixed. 
> > Seeing that polyfit is entirely coded in python, it would be relatively > straightforward to add support for fixed points. It is also something I > feel capable, and willing, of doing. > > * Is such an additional feature something worthy of investigating, or > will it never find its way into numpy.polyfit? > * Any ideas on the best syntax for the extra parameters? > > Thanks, > > Jaime > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes > de dominaci?n mundial. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eirik.gjerlow at astro.uio.no Tue Mar 5 10:56:40 2013 From: eirik.gjerlow at astro.uio.no (=?ISO-8859-1?Q?Eirik_Gjerl=F8w?=) Date: Tue, 05 Mar 2013 15:56:40 +0000 Subject: [Numpy-discussion] scipy_distutils.fcompiler In-Reply-To: References: Message-ID: <513615B8.9080006@uio.no> Hey Gergely, On my box, the fcompiler module is in numpy.distutils, so import numpy.distutils.fcompiler works for me at least! Eirik On 05. mars 2013 14:58, Gergely Buday wrote: > Hi there, > > I try to compile a program developed with scipy. It is installed on my > Ubuntu 12.04 box but upon make I get: > > Traceback (most recent call last): > File "/usr/local/bin/f2py", line 4, in > f2py2e.main() > File "/usr/local/lib/python2.7/dist-packages/f2py2e/f2py2e.py", line > 677, in main > run_compile() > File "/usr/local/lib/python2.7/dist-packages/f2py2e/f2py2e.py", line > 536, in run_compile > import scipy_distutils.fcompiler > ImportError: No module named scipy_distutils.fcompiler > > What should I do to fix this? 
> > I have scipy Version: 0.9.0+dfsg1-1ubuntu2 > > - Gergely > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From cournape at gmail.com Tue Mar 5 11:24:31 2013 From: cournape at gmail.com (David Cournapeau) Date: Tue, 5 Mar 2013 16:24:31 +0000 Subject: [Numpy-discussion] scipy_distutils.fcompiler In-Reply-To: References: Message-ID: On Tue, Mar 5, 2013 at 2:58 PM, Gergely Buday wrote: > Hi there, > > I try to compile a program developed with scipy. It is installed on my > Ubuntu 12.04 box but upon make I get: > > Traceback (most recent call last): > File "/usr/local/bin/f2py", line 4, in > f2py2e.main() > File "/usr/local/lib/python2.7/dist-packages/f2py2e/f2py2e.py", line > 677, in main > run_compile() > File "/usr/local/lib/python2.7/dist-packages/f2py2e/f2py2e.py", line > 536, in run_compile > import scipy_distutils.fcompiler > ImportError: No module named scipy_distutils.fcompiler > Looks like you're having an ancient f2py in there. You may want to use the one included in numpy instead. David From andrew.collette at gmail.com Tue Mar 5 12:33:56 2013 From: andrew.collette at gmail.com (Andrew Collette) Date: Tue, 5 Mar 2013 10:33:56 -0700 Subject: [Numpy-discussion] GSOC 2013 In-Reply-To: References: Message-ID: >> 5. Currently dtypes are limited to a set of fixed types, or combinations >> of these types. You can't have, say, a 48 bit float or a 1-bit bool. This >> project would be to allow users to create entirely new, non-standard dtypes >> based on simple rules, such as specifying the length of the sign, length of >> the exponent, and length of the mantissa for a custom floating-point number. 
>> Hopefully this would mostly be used for reading in non-standard data and not >> used that often, but for some situations it could be useful for storing data >> too (such as large amounts of boolean data, or genetic code which can be >> stored in 2 bits and is often very large). > > > I second this general idea. Simply having a pair of packbits/unpackbits > functions that could work with 2 and 4 bit uints would make my life easier. > If it were possible to have an array of dtype 'uint4' that used half the > space of a 'uint8', but could have ufuncs an the like ran on it, it would be > pure bliss. Not that I'm complaining, but a man can dream... I also think this would make a great addition to NumPy. People may even be able to save some work by leveraging the HDF5 code base; the HDF5 guys have piles and piles of carefully tested C code for exactly this purpose; converting between the common IEEE float sizes and those with user-specified mantissa/exponents; 1, 2, 3 bit etc. integers and the like. It's all under a BSD-compatible license. You'd have to replace the bits which talk to the HDF5 type description system, but it might be a good place to start. Andrew From kwmsmith at gmail.com Tue Mar 5 13:14:22 2013 From: kwmsmith at gmail.com (Kurt Smith) Date: Tue, 5 Mar 2013 12:14:22 -0600 Subject: [Numpy-discussion] GSOC 2013 In-Reply-To: <5135A290.2050405@hawaii.edu> References: <5135A290.2050405@hawaii.edu> Message-ID: On Tue, Mar 5, 2013 at 1:45 AM, Eric Firing wrote: > On 2013/03/04 9:01 PM, Nicolas Rougier wrote: >>> >This made me think of a serious performance limitation of structured dtypes: a >>> >structured dtype is always "packed", which may lead to terrible byte alignment >>> >for common types. For instance, `dtype([('a', 'u1'), ('b', >>> >'u8')]).itemsize == 9`, >>> >meaning that the 8-byte integer is not aligned as an equivalent C-struct's >>> >would be, leading to all sorts of horrors at the cache and register level. 
> > Doesn't the "align" kwarg of np.dtype do what you want? > > In [2]: dt = np.dtype(dict(names=['a', 'b'], formats=['u1', 'u8']), > align=True) > > In [3]: dt.itemsize > Out[3]: 16 Thanks! That's what I get for not checking before posting. Consider this my vote to make `align=True` the default. > > Eric > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From njs at pobox.com Tue Mar 5 13:52:56 2013 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 5 Mar 2013 18:52:56 +0000 Subject: [Numpy-discussion] GSOC 2013 In-Reply-To: References: Message-ID: On 4 Mar 2013 23:21, "Jaime Fernández del Río" wrote: > > On Mon, Mar 4, 2013 at 2:29 PM, Todd wrote: >> >> >> 5. Currently dtypes are limited to a set of fixed types, or combinations of these types. You can't have, say, a 48 bit float or a 1-bit bool. This project would be to allow users to create entirely new, non-standard dtypes based on simple rules, such as specifying the length of the sign, length of the exponent, and length of the mantissa for a custom floating-point number. Hopefully this would mostly be used for reading in non-standard data and not used that often, but for some situations it could be useful for storing data too (such as large amounts of boolean data, or genetic code which can be stored in 2 bits and is often very large). > > > I second this general idea. Simply having a pair of packbits/unpackbits functions that could work with 2 and 4 bit uints would make my life easier. If it were possible to have an array of dtype 'uint4' that used half the space of a 'uint8', but could have ufuncs and the like run on it, it would be pure bliss. Not that I'm complaining, but a man can dream... 
This would be quite difficult, since it would require reworking the guts of the ndarray data structure to store strides and buffer offsets in bits rather than bytes, and probably with endianness handling too. Indexing is all done at the ndarray buffer-of-bytes layer, without any involvement of the dtype. Consider: a = zeros(10, dtype=uint4) b = a[1::3] Now b is a view onto a discontiguous set of half-bytes within a... You could have a dtype that represented several uint4s that together added up to an integral number of bytes, sort of like a structured dtype. Or packbits()/unpackbits(), like you say. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Wed Mar 6 03:38:12 2013 From: toddrjen at gmail.com (Todd) Date: Wed, 6 Mar 2013 09:38:12 +0100 Subject: [Numpy-discussion] GSOC 2013 In-Reply-To: References: Message-ID: On Mar 5, 2013 7:53 PM, "Nathaniel Smith" wrote: > > On 4 Mar 2013 23:21, "Jaime Fern?ndez del R?o" wrote: > > > > On Mon, Mar 4, 2013 at 2:29 PM, Todd wrote: > >> > >> > >> 5. Currently dtypes are limited to a set of fixed types, or combinations of these types. You can't have, say, a 48 bit float or a 1-bit bool. This project would be to allow users to create entirely new, non-standard dtypes based on simple rules, such as specifying the length of the sign, length of the exponent, and length of the mantissa for a custom floating-point number. Hopefully this would mostly be used for reading in non-standard data and not used that often, but for some situations it could be useful for storing data too (such as large amounts of boolean data, or genetic code which can be stored in 2 bits and is often very large). > > > > > > I second this general idea. Simply having a pair of packbits/unpackbits functions that could work with 2 and 4 bit uints would make my life easier. 
If it were possible to have an array of dtype 'uint4' that used half the space of a 'uint8', but could have ufuncs an the like ran on it, it would be pure bliss. Not that I'm complaining, but a man can dream... > > This would be quite difficult, since it would require reworking the guts of the ndarray data structure to store strides and buffer offsets in bits rather than bytes, and probably with endianness handling too. Indexing is all done at the ndarray buffer-of-bytes layer, without any involvement of the dtype. > > Consider: > > a = zeros(10, dtype=uint4) > b = a[1::3] > > Now b is a view onto a discontiguous set of half-bytes within a... > > You could have a dtype that represented several uint4s that together added up to an integral number of bytes, sort of like a structured dtype. Or packbits()/unpackbits(), like you say. > > -n Then perhaps such a project could be a four-stage thing. 1. Allow for the creation of int, uint, float, bool, and complex dtypes with an arbitrary number of bytes 2. Allow for the creation of dtypes which are integer fractions of a byte (1, 2, or 4 bits), and must be padded to a whole byte. 3. Have an optional internal value in an array that tells it to exclude the last n bits of the last byte. This would be used to hide the padding from step 2. This should be abstracted into a general-purpose method for excluding bits from the byte-to-dtype conversion so it can be used in step 4. 4. Allow for the creation of dtypes that are non-integer fractions of a byte or non-integer multiples of a byte (3, 5, 6, 7, 9, 10, 11, 12, etc. bits). Each element in the array would be stored as a certain number of bytes, with the method from 3 used to cut it down to the right number of bits. So a 3 bit dtype would have two elements per byte with 2 bits excluded. A 5 bit dtype would have 1 element per byte with 3 bits excluded. A 12 bit dtype would have one element in two bytes with 4 bits excluded from the second byte. 
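[Editor's note: as a rough sketch of the kind of packing stage 2 implies, 4-bit values can already be packed two per byte with plain uint8 arrays and shifts. The helper names below are hypothetical, not an existing NumPy API; only the 1-bit packbits/unpackbits exist today.]

```python
import numpy as np

def pack_uint4(values):
    """Pack an array of 4-bit values (0-15) into bytes, two per byte.
    Pads with a zero nibble if the length is odd."""
    v = np.asarray(values, dtype=np.uint8)
    if v.size % 2:
        v = np.append(v, np.uint8(0))
    return (v[0::2] << 4) | v[1::2]

def unpack_uint4(packed, count):
    """Recover `count` 4-bit values from a packed byte array."""
    p = np.asarray(packed, dtype=np.uint8)
    out = np.empty(p.size * 2, dtype=np.uint8)
    out[0::2] = p >> 4
    out[1::2] = p & 0x0F
    return out[:count]

vals = np.array([1, 15, 7, 3, 9], dtype=np.uint8)
packed = pack_uint4(vals)          # 3 bytes instead of 5
restored = unpack_uint4(packed, len(vals))
```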
This approach would allow for arbitrary numbers of bits without breaking the internal representation, would have each stage building off the previous stage, and we would still have something useful even if not all the stages are completed. -------------- next part -------------- An HTML attachment was scrubbed... URL: From francesc at continuum.io Wed Mar 6 05:29:52 2013 From: francesc at continuum.io (Francesc Alted) Date: Wed, 06 Mar 2013 11:29:52 +0100 Subject: [Numpy-discussion] GSOC 2013 In-Reply-To: References: <5135A290.2050405@hawaii.edu> Message-ID: <51371AA0.1040808@continuum.io> On 3/5/13 7:14 PM, Kurt Smith wrote: > On Tue, Mar 5, 2013 at 1:45 AM, Eric Firing wrote: >> On 2013/03/04 9:01 PM, Nicolas Rougier wrote: >>>>> This made me think of a serious performance limitation of structured dtypes: a >>>>> structured dtype is always "packed", which may lead to terrible byte alignment >>>>> for common types. For instance, `dtype([('a', 'u1'), ('b', >>>>> 'u8')]).itemsize == 9`, >>>>> meaning that the 8-byte integer is not aligned as an equivalent C-struct's >>>>> would be, leading to all sorts of horrors at the cache and register level. >> Doesn't the "align" kwarg of np.dtype do what you want? >> >> In [2]: dt = np.dtype(dict(names=['a', 'b'], formats=['u1', 'u8']), >> align=True) >> >> In [3]: dt.itemsize >> Out[3]: 16 > Thanks! That's what I get for not checking before posting. > > Consider this my vote to make `aligned=True` the default. I would not rush on this. The example above takes 9 bytes to host the structure, while an `aligned=True` one will take 16 bytes. I'd rather leave the default as it is, and in case performance is critical, you can always copy the unaligned field to a new (homogeneous) array. 
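[Editor's note: the trade-off Francesc describes is easy to demonstrate; this small sketch uses the same field spec as the thread, and the 9 vs. 16 byte sizes assume a common platform where 8-byte integers align to 8 bytes.]

```python
import numpy as np

spec = dict(names=['a', 'b'], formats=['u1', 'u8'])
packed = np.dtype(spec)               # NumPy default: no padding
aligned = np.dtype(spec, align=True)  # C-struct-style padding

assert packed.itemsize == 9
assert aligned.itemsize == 16

# If per-field performance matters, copy the field out of the packed
# array into a contiguous, naturally aligned homogeneous array:
arr = np.zeros(1000, dtype=packed)
b = np.ascontiguousarray(arr['b'])
```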
-- Francesc Alted From sunghwanchoi91 at gmail.com Wed Mar 6 07:22:23 2013 From: sunghwanchoi91 at gmail.com (Sunghwan Choi) Date: Wed, 6 Mar 2013 21:22:23 +0900 Subject: [Numpy-discussion] embedding numpy ImportError: numpy.core.multiarray failed to import Message-ID: <1ab301ce1a65$44961a70$cdc24f50$@gmail.com> Hi, I tried embedding numpy in C++ but I got ImportError: numpy.core.multiarray failed to import Do you know any ways to solve this problem? I've copied my code and the error message below. Makefile CXX= icpc all: exe clean: rm -rf *.o exe exe: test.o $(CXX) -o exe test.o -L/home/shchoi/program/epd/lib/ -lpython2.7 test.o : test.cpp $(CXX) -c test.cpp -I/home/shchoi/program/epd/lib/python2.7/site-packages/numpy/core/include/numpy/ -I/home/shchoi/program/epd/include/python2.7 tmp.cpp #include "Python.h" #include "arrayobject.h" #include <iostream> extern "C" void Py_Initialize(); extern "C" void PyErr_Print(); using namespace std; int main(int argc, char* argv[]) { double answer = 0; PyObject *modname, *mod, *mdict, *func, *stringarg, *args, *rslt; Py_Initialize(); import_array(); modname = PyString_FromString("numpy"); mod = PyImport_Import(modname); PyErr_Print(); cout << mod << endl; Py_Finalize(); return 0; } $ make icpc -c test.cpp -I/home/shchoi/program/epd/lib/python2.7/site-packages/numpy/core/include/numpy/ -I/home/shchoi/program/epd/include/python2.7 test.cpp(15): warning #117: non-void function "main" should return a value import_array(); ^ icpc -o exe test.o -L/home/shchoi/program/epd/lib/ -lpython2.7 #-L/home/shchoi/program/epd/lib/python2.7/site-packages/numpy/core/ $ ./exe ImportError: numpy.core.multiarray failed to import Sunghwan Choi -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From marcelcoding+numpy at gmail.com Wed Mar 6 11:14:22 2013 From: marcelcoding+numpy at gmail.com (Marcel Stimberg) Date: Wed, 6 Mar 2013 17:14:22 +0100 Subject: [Numpy-discussion] scipy.weave + nose testing no longer works with numpy 1.7.0 Message-ID: Hi, I noticed that our unit tests running under nose and using scipy.weave started to fail with numpy 1.7.0 because of a change in numpy.distutils.exec_command (called by scipy.weave) which now assumes that sys.stdout always provides a fileno function (which fails because nose redirects stdout to a cStringIO). I guess the combination of scipy.weave and nose is not that unusual for scientific software, maybe 1.7.1 could make the exec_command a bit more robust in that regard? I filed the issue as #2999 on GitHub (including a simple example triggering the error): https://github.com/numpy/numpy/issues/2999 Thanks Marcel From dan.blanchard at gmail.com Wed Mar 6 11:41:25 2013 From: dan.blanchard at gmail.com (Dan Blanchard) Date: Wed, 6 Mar 2013 11:41:25 -0500 Subject: [Numpy-discussion] Help trying to fix issue 368 on Github (math functions fail confusingly on long integers (and object arrays generally)) Message-ID: Hi, I've been trying to take a crack at fixing https://github.com/numpy/numpy/issues/368, and I think I've identified all of the affected functions and even a potential fix, but I'm new to the Python C API and the numpy source, so if anyone has time to look at the discussion on Github and chime in with suggestions, I'd be glad to help finish getting this patched up. It is currently very frustrating that many of the math functions do not work with longs. Thanks, Dan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dynamicgl at gmail.com Wed Mar 6 11:44:32 2013 From: dynamicgl at gmail.com (Gelin Yan) Date: Thu, 7 Mar 2013 00:44:32 +0800 Subject: [Numpy-discussion] a question about freeze on numpy 1.7.0 In-Reply-To: References: Message-ID: On Mon, Feb 25, 2013 at 4:09 PM, Bradley M. Froehle wrote: > I submitted a bug report (and patch) to cx_freeze. You can follow up with > them at http://sourceforge.net/p/cx-freeze/bugs/36/. > > -Brad > > > On Mon, Feb 25, 2013 at 12:06 AM, Gelin Yan wrote: > >> >> >> On Mon, Feb 25, 2013 at 3:53 PM, Bradley M. Froehle < >> brad.froehle at gmail.com> wrote: >> >>> I can reproduce with NumPy 1.7.0, but I'm not convinced the bug lies >>> within NumPy. >>> >>> The exception is not being raised on the `del sys` line. Rather it is >>> being raised in numpy.__init__: >>> >>> File >>> "/home/bfroehle/.local/lib/python2.7/site-packages/cx_Freeze/initscripts/Console.py", >>> line 27, in >>> exec code in m.__dict__ >>> File "numpytest.py", line 1, in >>> import numpy >>> File >>> "/home/bfroehle/.local/lib/python2.7/site-packages/numpy/__init__.py", line >>> 147, in >>> from core import * >>> AttributeError: 'module' object has no attribute 'sys' >>> >>> This is because, somehow, `'sys' in numpy.core.__all__` returns True in >>> the cx_Freeze context but False in the regular Python context. 
>>> >>> -Brad >>> >>> >>> On Sun, Feb 24, 2013 at 10:49 PM, Gelin Yan wrote: >>> >>>> >>>> >>>> On Mon, Feb 25, 2013 at 9:16 AM, Ond?ej ?ert?k >>> > wrote: >>>> >>>>> Hi Gelin, >>>>> >>>>> On Sun, Feb 24, 2013 at 12:08 AM, Gelin Yan >>>>> wrote: >>>>> > Hi All >>>>> > >>>>> > When I used numpy 1.7.0 with cx_freeze 4.3.1 on windows, I >>>>> quickly >>>>> > found out even a simple "import numpy" may lead to program failed >>>>> with >>>>> > following exception: >>>>> > >>>>> > "AttributeError: 'module' object has no attribute 'sys' >>>>> > >>>>> > After a poking around some codes I noticed /numpy/core/__init__.py >>>>> has a >>>>> > line 'del sys' at the bottom. After I commented this line, and >>>>> repacked the >>>>> > whole program, It ran fine. >>>>> > I also noticed this 'del sys' didn't exist on numpy 1.6.2 >>>>> > >>>>> > I am curious why this 'del sys' should be here and whether it is >>>>> safe to >>>>> > omit it. Thanks. >>>>> >>>>> The "del sys" line was introduced in the commit: >>>>> >>>>> >>>>> https://github.com/numpy/numpy/commit/4c0576fe9947ef2af8351405e0990cebd83ccbb6 >>>>> >>>>> and it seems to me that it is needed so that the numpy.core namespace >>>>> is not >>>>> cluttered by it. >>>>> >>>>> Can you post the full stacktrace of your program (and preferably some >>>>> instructions >>>>> how to reproduce the problem)? It should become clear where the >>>>> problem is. >>>>> >>>>> Thanks, >>>>> Ondrej >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>> >>>> >>>> Hi Ondrej >>>> >>>> I attached two files here for demonstration. you need cx_freeze to >>>> build a standalone executable file. simply running python setup.py build >>>> and try to run the executable file you may see this exception. This >>>> example works with numpy 1.6.2. Thanks. 
>>>> Regards >>>> >>>> gelin yan >>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> Hi Bradley >> >> So is it supposed to be a bug of cx_freeze? Any work around for that >> except omit 'del sys'? If the answer is no, I may consider submit a ticket >> on cx_freeze site. Thanks >> >> Regards >> >> gelin yan >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > Hi Brad, please feel free to check it: http://sourceforge.net/p/cx-freeze/bugs/36/ Someone from cx_freeze has replied to it. Thanks again. Regards gelin yan -------------- next part -------------- An HTML attachment was scrubbed... 
For instance, `dtype([('a', 'u1'), ('b', >>>>>> 'u8')]).itemsize == 9`, >>>>>> meaning that the 8-byte integer is not aligned as an equivalent C-struct's >>>>>> would be, leading to all sorts of horrors at the cache and register level. >>> Doesn't the "align" kwarg of np.dtype do what you want? >>> >>> In [2]: dt = np.dtype(dict(names=['a', 'b'], formats=['u1', 'u8']), >>> align=True) >>> >>> In [3]: dt.itemsize >>> Out[3]: 16 >> Thanks! That's what I get for not checking before posting. >> >> Consider this my vote to make `aligned=True` the default. > > I would not run too much. The example above takes 9 bytes to host the > structure, while a `aligned=True` will take 16 bytes. I'd rather let > the default as it is, and in case performance is critical, you can > always copy the unaligned field to a new (homogeneous) array. Yes, I can absolutely see the case you're making here, and I made my "vote" with the understanding that `aligned=False` will almost certainly stay the default. Adding 'aligned=True' is simple for me to do, so no harm done. My case is based on what's the least surprising behavior: C structs / all C compilers, the builtin `struct` module, and ctypes `Structure` subclasses all use padding to ensure aligned fields by default. You can turn this off to get packed structures, but the default behavior in these other places is alignment, which is why I was surprised when I first saw that NumPy structured dtypes are packed by default. 
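[Editor's note: the defaults Kurt mentions are easy to confirm from the standard library; the sizes below assume a typical platform where 8-byte integers align to 8 bytes.]

```python
import ctypes
import struct

class S(ctypes.Structure):
    # padded by default, like a C struct
    _fields_ = [('a', ctypes.c_uint8), ('b', ctypes.c_uint64)]

native = struct.calcsize('BQ')   # native ('@') alignment -> 16
packed = struct.calcsize('=BQ')  # '=' prefix disables padding -> 9

assert ctypes.sizeof(S) == 16
assert native == 16
assert packed == 9
```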
> > -- > Francesc Alted > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From kwmsmith at gmail.com Wed Mar 6 13:42:52 2013 From: kwmsmith at gmail.com (Kurt Smith) Date: Wed, 6 Mar 2013 12:42:52 -0600 Subject: [Numpy-discussion] aligned / unaligned structured dtype behavior (was: GSOC 2013) Message-ID: On Wed, Mar 6, 2013 at 12:12 PM, Kurt Smith wrote: > On Wed, Mar 6, 2013 at 4:29 AM, Francesc Alted wrote: >> >> I would not run too much. The example above takes 9 bytes to host the >> structure, while a `aligned=True` will take 16 bytes. I'd rather let >> the default as it is, and in case performance is critical, you can >> always copy the unaligned field to a new (homogeneous) array. > > Yes, I can absolutely see the case you're making here, and I made my > "vote" with the understanding that `aligned=False` will almost > certainly stay the default. Adding 'aligned=True' is simple for me to > do, so no harm done. > > My case is based on what's the least surprising behavior: C structs / > all C compilers, the builtin `struct` module, and ctypes `Structure` > subclasses all use padding to ensure aligned fields by default. You > can turn this off to get packed structures, but the default behavior > in these other places is alignment, which is why I was surprised when > I first saw that NumPy structured dtypes are packed by default. 
> Some surprises with aligned / unaligned arrays: #----------------------------- import numpy as np packed_dt = np.dtype([('a', 'u1'), ('b', 'u8')], align=False) aligned_dt = np.dtype([('a', 'u1'), ('b', 'u8')], align=True) packed_arr = np.ones((10**6,), dtype=packed_dt) aligned_arr = np.ones((10**6,), dtype=aligned_dt) print "all(packed_arr['a'] == aligned_arr['a'])", np.all(packed_arr['a'] == aligned_arr['a']) # True print "all(packed_arr['b'] == aligned_arr['b'])", np.all(packed_arr['b'] == aligned_arr['b']) # True print "all(packed_arr == aligned_arr)", np.all(packed_arr == aligned_arr) # False (!!) #----------------------------- I can understand what's likely going on under the covers that makes these arrays not compare equal, but I'd expect that if all columns of two structured arrays are everywhere equal, then the arrays themselves would be everywhere equal. Bug? And regarding performance, doing simple timings shows a 30%-ish slowdown for unaligned operations: In [36]: %timeit packed_arr['b']**2 100 loops, best of 3: 2.48 ms per loop In [37]: %timeit aligned_arr['b']**2 1000 loops, best of 3: 1.9 ms per loop Whereas summing shows just a 10%-ish slowdown: In [38]: %timeit packed_arr['b'].sum() 1000 loops, best of 3: 1.29 ms per loop In [39]: %timeit aligned_arr['b'].sum() 1000 loops, best of 3: 1.14 ms per loop From charlesr.harris at gmail.com Wed Mar 6 13:43:41 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 6 Mar 2013 11:43:41 -0700 Subject: [Numpy-discussion] Scheduling the 1.7.1 and 1.8 releases Message-ID: Hi All, There are now some 14 non-merge commits in the 1.7.x branch including the critical diagonal leak fix. I think there is maybe one more critical backport and perhaps several low priority fixes, documentation and such, but I think we should start up the release process with a goal of getting 1.7.1 out by the middle of April. 
The development branch has been accumulating stuff since last summer, I suggest we look to get it out in May, branching at the end of this month. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Wed Mar 6 13:56:47 2013 From: efiring at hawaii.edu (Eric Firing) Date: Wed, 06 Mar 2013 08:56:47 -1000 Subject: [Numpy-discussion] GSOC 2013 In-Reply-To: References: <5135A290.2050405@hawaii.edu> Message-ID: <5137916F.5080708@hawaii.edu> On 2013/03/05 8:14 AM, Kurt Smith wrote: > On Tue, Mar 5, 2013 at 1:45 AM, Eric Firing wrote: >> On 2013/03/04 9:01 PM, Nicolas Rougier wrote: >>>>> This made me think of a serious performance limitation of structured dtypes: a >>>>> structured dtype is always "packed", which may lead to terrible byte alignment >>>>> for common types. For instance, `dtype([('a', 'u1'), ('b', >>>>> 'u8')]).itemsize == 9`, >>>>> meaning that the 8-byte integer is not aligned as an equivalent C-struct's >>>>> would be, leading to all sorts of horrors at the cache and register level. >> >> Doesn't the "align" kwarg of np.dtype do what you want? >> >> In [2]: dt = np.dtype(dict(names=['a', 'b'], formats=['u1', 'u8']), >> align=True) >> >> In [3]: dt.itemsize >> Out[3]: 16 > > Thanks! That's what I get for not checking before posting. > > Consider this my vote to make `aligned=True` the default. I strongly oppose this, because it would break the common usage of structured dtypes for reading packed binary data from files. I see no reason to change the default. 
Eric > >> >> Eric >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From ralf.gommers at gmail.com Wed Mar 6 14:42:45 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 6 Mar 2013 20:42:45 +0100 Subject: [Numpy-discussion] scipy.weave + nose testing no longer works with numpy 1.7.0 In-Reply-To: References: Message-ID: On Wed, Mar 6, 2013 at 5:14 PM, Marcel Stimberg < marcelcoding+numpy at gmail.com> wrote: > Hi, > > I noticed that our unit tests running under nose and using scipy.weave > started to fail with numpy 1.7.0 because of a change in > numpy.distutils.exec_command (called by scipy.weave) which now assumes > that sys.stdout always provides a fileno function (which fails because > nose redirect stdout to a cStringIO). I guess the combination of > scipy.weave and nose is not that unusual for scientific software, > maybe 1.7.1 could make the exec_command a bit more robust in that > regard? I filed the issue as #2999 in github (including a simple > example triggering the error): > https://github.com/numpy/numpy/issues/2999 > That ticket has been sitting in my inbox for the last 2 weeks, sorry for not replying earlier. It's yet again a case of a small and seemingly harmless change in distutils breaking a fair amount of things. I added it to the 1.7.1 milestone. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Wed Mar 6 15:06:16 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 6 Mar 2013 20:06:16 +0000 Subject: [Numpy-discussion] Scheduling the 1.7.1 and 1.8 releases In-Reply-To: References: Message-ID: On Wed, Mar 6, 2013 at 6:43 PM, Charles R Harris wrote: > Hi All, > > There are now some 14 non-merge commits in the 1.7.x branch including the > critical diagonal leak fix. I think there is maybe one more critical > backport and perhaps several low priority fixes, documentation and such, but > I think we should start up the release process with a goal of getting 1.7.1 > out by the middle of April. What's the critical backport you're thinking of? This list shows just two backport PRs waiting to be merged, one trivial one that I just submitted, the other that needs a tweak but won't take long: https://github.com/numpy/numpy/issues?milestone=27&page=1&state=open But I agree, basically we should merge those two (today?) and then release the first RC as soon as Ondrej has a moment to do so... > The development branch has been accumulating stuff since last summer, I > suggest we look to get it out in May, branching at the end of this month. I would say "let's fix the blockers and then branch as soon as Ondrej has time to do it", but in practice I suspect this comes out the same as what you just said :-). I just pruned the list of blockers; here's what we've got: https://github.com/numpy/numpy/issues?milestone=1&page=1&state=open -n From njs at pobox.com Wed Mar 6 15:09:00 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 6 Mar 2013 20:09:00 +0000 Subject: [Numpy-discussion] Numpy deprecation schedule Message-ID: A number of items on the 1.8 todo list are reminders to remove things that we deprecated in 1.7, and said we would remove in 1.8, e.g.: https://github.com/numpy/numpy/issues/596 https://github.com/numpy/numpy/issues/294 But, since 1.8 is so soon after 1.7, we probably shouldn't actually do that. 
I suggest we switch to a time-based deprecation schedule, where instead of saying "this will be removed in N releases" we say "this will be removed in the first release on or after (now+N months)". I also suggest that we set N=12, because it's a round number, it roughly matches numpy's historical release cycle, and because AFAICT that's the number that python itself uses for core and stdlib deprecations. Thoughts? -n From nouiz at nouiz.org Wed Mar 6 15:21:46 2013 From: nouiz at nouiz.org (Frédéric Bastien) Date: Wed, 6 Mar 2013 15:21:46 -0500 Subject: [Numpy-discussion] Numpy deprecation schedule In-Reply-To: References: Message-ID: That sounds good. To be sure, the "now" means the first release that includes the deprecation, in that case NumPy 1.7? Fred On Wed, Mar 6, 2013 at 3:09 PM, Nathaniel Smith wrote: > A number of items on the 1.8 todo list are reminders to remove things > that we deprecated in 1.7, and said we would remove in 1.8, e.g.: > https://github.com/numpy/numpy/issues/596 > https://github.com/numpy/numpy/issues/294 > > But, since 1.8 is so soon after 1.7, we probably shouldn't actually do that. > > I suggest we switch to a time-based deprecation schedule, where > instead of saying "this will be removed in N releases" we say "this > will be removed in the first release on or after (now+N months)". > > I also suggest that we set N=12, because it's a round number, it > roughly matches numpy's historical release cycle, and because AFAICT > that's the number that python itself uses for core and stdlib > deprecations. > > Thoughts? 
> -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From njs at pobox.com Wed Mar 6 15:38:47 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 6 Mar 2013 20:38:47 +0000 Subject: [Numpy-discussion] Numpy deprecation schedule In-Reply-To: References: Message-ID: On Wed, Mar 6, 2013 at 8:21 PM, Fr?d?ric Bastien wrote: > That sound good. To be sure, the "now" mean the first release that > include the deprecation, in that case NumPy 1.7? Yes. -n From ralf.gommers at gmail.com Wed Mar 6 15:52:23 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 6 Mar 2013 21:52:23 +0100 Subject: [Numpy-discussion] Scheduling the 1.7.1 and 1.8 releases In-Reply-To: References: Message-ID: On Wed, Mar 6, 2013 at 9:06 PM, Nathaniel Smith wrote: > On Wed, Mar 6, 2013 at 6:43 PM, Charles R Harris > wrote: > > Hi All, > > > > There are now some 14 non-merge commits in the 1.7.x branch including the > > critical diagonal leak fix. I think there is maybe one more critical > > backport and perhaps several low priority fixes, documentation and such, > but > > I think we should start up the release process with a goal of getting > 1.7.1 > > out by the middle of April. > > What's the critical backport you're thinking of? This last shows just > two backport PRs waiting to be merged, one trivial one that I just > submitted, the other that needs a tweak but won't take long: > https://github.com/numpy/numpy/issues?milestone=27&page=1&state=open > But I agree, basically we should merge those two (today?) and then > release the first RC as soon as Ondrej has a moment to do so... > I added issue 2999, which I think should be taken along. Other than that, +1 for a quick release. > > The development branch has been accumulating stuff since last summer, I > > suggest we look to get it out in May, branching at the end of this month. 
> > I would say "let's fix the blockers and then branch as soon as Ondrej > has time to do it", but in practice I suspect this comes out the same > as what you just said :-). I just pruned the list of blockers; here's > what we've got: > https://github.com/numpy/numpy/issues?milestone=1&page=1&state=open > It looks like we're not doing so well with setting Milestones correctly. Only 4 closed issues for 1.8.... Release quickly after 1.7.1 sounds good. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Wed Mar 6 16:16:19 2013 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 6 Mar 2013 16:16:19 -0500 Subject: [Numpy-discussion] Implementing a "find first" style function In-Reply-To: References: Message-ID: On Tue, Mar 5, 2013 at 9:15 AM, Phil Elson wrote: > The ticket https://github.com/numpy/numpy/issues/2269 discusses the > possibility of implementing a "find first" style function which can > optimise the process of finding the first value(s) which match a predicate > in a given 1D array. For example: > > > >>> a = np.sin(np.linspace(0, np.pi, 200)) > >>> print find_first(a, lambda a: a > 0.9) > ((71, ), 0.900479032457) > > > This has been discussed in several locations: > > https://github.com/numpy/numpy/issues/2269 > https://github.com/numpy/numpy/issues/2333 > > http://stackoverflow.com/questions/7632963/numpy-array-how-to-find-index-of-first-occurrence-of-item > > > *Rationale* > > For small arrays there is no real reason to avoid doing: > > >>> a = np.sin(np.linspace(0, np.pi, 200)) > >>> ind = (a > 0.9).nonzero()[0][0] > >>> print (ind, ), a[ind] > (71,) 0.900479032457 > > > But for larger arrays, this can lead to massive amounts of work even if > the result is one of the first to be computed. Example: > > >>> a = np.arange(1e8) > >>> print (a == 5).nonzero()[0][0] > 5 > > > So a function which terminates when the first matching value is found is > desirable. 
> > As mentioned in #2269, it is possible to define a consistent ordering > which allows this functionality for >1D arrays, but IMHO it overcomplicates > the problem and was not a case that I personally needed, so I've limited > the scope to 1D arrays only. > > > *Implementation* > > My initial assumption was that to get any kind of performance I would need > to write the *find* function in C, however after prototyping with some > array chunking it became apparent that a trivial python function would be > quick enough for my needs. > > The approach I've implemented in the code found in #2269 simply breaks the > array into sub-arrays of maximum length *chunk_size* (2048 by default, > though there is no real science to this number), applies the given > predicating function, and yields the results from *nonzero()*. The given > function should be a python function which operates on the whole of the > sub-array element-wise (i.e. the function should be vectorized). Returning > a generator also has the benefit of allowing users to get the first *n* matching values/indices. 
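[Editor's note: a minimal sketch of the chunked scan described above. This is illustrative only, not Phil's actual code; the 2048 default mirrors his choice of chunk size.]

```python
import numpy as np

def find(a, predicate, chunk_size=2048):
    """Scan a 1D array chunk by chunk, yielding ((index,), value) for
    each element where the vectorized predicate holds, so callers can
    stop after the first match instead of testing the whole array."""
    for start in range(0, len(a), chunk_size):
        chunk = a[start:start + chunk_size]
        for i in np.flatnonzero(predicate(chunk)):
            yield (start + int(i),), chunk[i]

a = np.arange(10**6)
idx, value = next(find(a, lambda c: c == 5))  # only the first chunk is tested
```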
> > > *Results* > > > I timed the implementation of *find* found in my comment at > https://github.com/numpy/numpy/issues/2269#issuecomment-14436725 with an > obvious test: > > > In [1]: from np_utils import find > > In [2]: import numpy as np > > In [3]: import numpy.random > > In [4]: np.random.seed(1) > > In [5]: a = np.random.randn(1e8) > > In [6]: a.min(), a.max() > Out[6]: (-6.1194900990552776, 5.9632246301166321) > > In [7]: next(find(a, lambda a: np.abs(a) > 6)) > Out[7]: ((33105441,), -6.1194900990552776) > > In [8]: (np.abs(a) > 6).nonzero() > Out[8]: (array([33105441]),) > > In [9]: %timeit (np.abs(a) > 6).nonzero() > 1 loops, best of 3: 1.51 s per loop > > In [10]: %timeit next(find(a, lambda a: np.abs(a) > 6)) > 1 loops, best of 3: 912 ms per loop > > In [11]: %timeit next(find(a, lambda a: np.abs(a) > 6, chunk_size=100000)) > 1 loops, best of 3: 470 ms per loop > > In [12]: %timeit next(find(a, lambda a: np.abs(a) > 6, chunk_size=1000000)) > 1 loops, best of 3: 483 ms per loop > > > This shows that picking a sensible *chunk_size* can yield massive > speed-ups (nonzero is x3 slower in one case). A similar example with a much > smaller 1D array shows similar promise: > > In [41]: a = np.random.randn(1e4) > > In [42]: %timeit next(find(a, lambda a: np.abs(a) > 3)) > 10000 loops, best of 3: 35.8 us per loop > > In [43]: %timeit (np.abs(a) > 3).nonzero() > 10000 loops, best of 3: 148 us per loop > > > As I commented on the issue tracker, if you think this function is worth > taking forward, I'd be happy to open up a pull request. > > Feedback greatfully received. > > Cheers, > > Phil > > > In the interest of generalizing code and such, could such approaches be used for functions like np.any() and np.all() for short-circuiting if True or False (respectively) are found? I wonder what other sort of functions in NumPy might benefit from this? Ben Root -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at gmail.com Wed Mar 6 16:24:11 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 6 Mar 2013 22:24:11 +0100 Subject: [Numpy-discussion] Numpy deprecation schedule In-Reply-To: References: Message-ID: On Wed, Mar 6, 2013 at 9:38 PM, Nathaniel Smith wrote: > On Wed, Mar 6, 2013 at 8:21 PM, Fr?d?ric Bastien wrote: > > That sound good. To be sure, the "now" mean the first release that > > include the deprecation, in that case NumPy 1.7? > > Yes. > +1 $ git add HOWTO_DEPRECATE.rst.txt ? Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Mar 6 16:40:47 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 6 Mar 2013 21:40:47 +0000 Subject: [Numpy-discussion] Numpy deprecation schedule In-Reply-To: References: Message-ID: On Wed, Mar 6, 2013 at 9:24 PM, Ralf Gommers wrote: > On Wed, Mar 6, 2013 at 9:38 PM, Nathaniel Smith wrote: >> >> On Wed, Mar 6, 2013 at 8:21 PM, Fr?d?ric Bastien wrote: >> > That sound good. To be sure, the "now" mean the first release that >> > include the deprecation, in that case NumPy 1.7? >> >> Yes. > > > +1 > > $ git add HOWTO_DEPRECATE.rst.txt ? +1 I'm vaguely intimidated by the doc structure, so I'm not sure where this would go, but... aside from a formal description of how one does a deprecation and the difference between DeprecationWarning and FutureWarning, etc., we might even want to just add a whole page in the manual that just lists the current status of all ongoing deprecations, the releases where each change was made, the date for the next change, etc., and use that as our canonical reference that we check before each release? Since this is information that end-users want to be able to see? ("I got this weird warning... what is it trying to tell me? Which release started issuing it? What's my deadline for fixing this?") And because this whole cycle of filing multiple bugs and then shunting them off to the next release is pretty awkward. 
-n From sebastian at sipsolutions.net Wed Mar 6 17:05:22 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 06 Mar 2013 23:05:22 +0100 Subject: [Numpy-discussion] aligned / unaligned structured dtype behavior (was: GSOC 2013) In-Reply-To: References: Message-ID: <1362607522.3944.7.camel@sebastian-laptop> On Wed, 2013-03-06 at 12:42 -0600, Kurt Smith wrote: > On Wed, Mar 6, 2013 at 12:12 PM, Kurt Smith wrote: > > On Wed, Mar 6, 2013 at 4:29 AM, Francesc Alted wrote: > >> > >> I would not run too much. The example above takes 9 bytes to host the > >> structure, while a `aligned=True` will take 16 bytes. I'd rather let > >> the default as it is, and in case performance is critical, you can > >> always copy the unaligned field to a new (homogeneous) array. > > > > Yes, I can absolutely see the case you're making here, and I made my > > "vote" with the understanding that `aligned=False` will almost > > certainly stay the default. Adding 'aligned=True' is simple for me to > > do, so no harm done. > > > > My case is based on what's the least surprising behavior: C structs / > > all C compilers, the builtin `struct` module, and ctypes `Structure` > > subclasses all use padding to ensure aligned fields by default. You > > can turn this off to get packed structures, but the default behavior > > in these other places is alignment, which is why I was surprised when > > I first saw that NumPy structured dtypes are packed by default. 
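The packed-vs-aligned difference Kurt describes shows up directly in the dtype itemsize (the example below uses the documented list-of-tuples dtype spec):

```python
import numpy as np

# Packed (NumPy's default) vs. C-style aligned layout for the same fields.
packed_dt = np.dtype([('a', 'u1'), ('b', 'u8')], align=False)
aligned_dt = np.dtype([('a', 'u1'), ('b', 'u8')], align=True)

print(packed_dt.itemsize)   # 9: the u8 starts immediately after the u1
print(aligned_dt.itemsize)  # 16: 7 pad bytes so 'b' sits on an 8-byte boundary
print(aligned_dt.fields['b'][1])  # 8: the offset of 'b' in the aligned layout
```

This is the trade-off Francesc points out: the aligned struct costs 16 bytes per element instead of 9.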
> > > > Some surprises with aligned / unaligned arrays: > > #----------------------------- > > import numpy as np > > packed_dt = np.dtype([('a', 'u1'), ('b', 'u8')], align=False) > aligned_dt = np.dtype([('a', 'u1'), ('b', 'u8')], align=True) > > packed_arr = np.ones((10**6,), dtype=packed_dt) > aligned_arr = np.ones((10**6,), dtype=aligned_dt) > > print "all(packed_arr['a'] == aligned_arr['a'])", > np.all(packed_arr['a'] == aligned_arr['a']) # True > print "all(packed_arr['b'] == aligned_arr['b'])", > np.all(packed_arr['b'] == aligned_arr['b']) # True > print "all(packed_arr == aligned_arr)", np.all(packed_arr == > aligned_arr) # False (!!) > > #----------------------------- > > I can understand what's likely going on under the covers that makes > these arrays not compare equal, but I'd expect that if all columns of > two structured arrays are everywhere equal, then the arrays themselves > would be everywhere equal. Bug? > Yes and no... equal for structured types seems not implemented, you get the same (wrong) False also with (packed_arr == packed_arr). But if the types are equivalent but np.equal not implemented, just returning False is a bit dangerous I agree. Not sure what the solution is exactly, I think the == operator could really raise an error instead of eating them all though probably... 
- Sebastian > And regarding performance, doing simple timings shows a 30%-ish > slowdown for unaligned operations: > > In [36]: %timeit packed_arr['b']**2 > 100 loops, best of 3: 2.48 ms per loop > > In [37]: %timeit aligned_arr['b']**2 > 1000 loops, best of 3: 1.9 ms per loop > > Whereas summing shows just a 10%-ish slowdown: > > In [38]: %timeit packed_arr['b'].sum() > 1000 loops, best of 3: 1.29 ms per loop > > In [39]: %timeit aligned_arr['b'].sum() > 1000 loops, best of 3: 1.14 ms per loop > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From robert.kern at gmail.com Wed Mar 6 17:33:52 2013 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 6 Mar 2013 22:33:52 +0000 Subject: [Numpy-discussion] Numpy deprecation schedule In-Reply-To: References: Message-ID: On Wed, Mar 6, 2013 at 8:09 PM, Nathaniel Smith wrote: > A number of items on the 1.8 todo list are reminders to remove things > that we deprecated in 1.7, and said we would remove in 1.8, e.g.: > https://github.com/numpy/numpy/issues/596 > https://github.com/numpy/numpy/issues/294 > > But, since 1.8 is so soon after 1.7, we probably shouldn't actually do that. > > I suggest we switch to a time-based deprecation schedule, where > instead of saying "this will be removed in N releases" we say "this > will be removed in the first release on or after (now+N months)". We can always delay removal if a particular release comes sooner than originally expected. The deprecation policy is just that we specify minimum version numbers at which the features can be removed. It's not really a firm schedule. I do take your suggestion to heart, though. We shouldn't remove stuff faster than 12 months or so. I just think that it should modify our release process, not our "marking for deprecation" process. 
-- Robert Kern From njs at pobox.com Wed Mar 6 17:45:53 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 6 Mar 2013 22:45:53 +0000 Subject: [Numpy-discussion] Numpy deprecation schedule In-Reply-To: References: Message-ID: On Wed, Mar 6, 2013 at 10:33 PM, Robert Kern wrote: > On Wed, Mar 6, 2013 at 8:09 PM, Nathaniel Smith wrote: >> A number of items on the 1.8 todo list are reminders to remove things >> that we deprecated in 1.7, and said we would remove in 1.8, e.g.: >> https://github.com/numpy/numpy/issues/596 >> https://github.com/numpy/numpy/issues/294 >> >> But, since 1.8 is so soon after 1.7, we probably shouldn't actually do that. >> >> I suggest we switch to a time-based deprecation schedule, where >> instead of saying "this will be removed in N releases" we say "this >> will be removed in the first release on or after (now+N months)". > > We can always delay removal if a particular release comes sooner than > originally expected. The deprecation policy is just that we specify > minimum version numbers at which the features can be removed. It's not > really a firm schedule. > > I do take your suggestion to heart, though. We shouldn't remove stuff > faster than 12 months or so. I just think that it should modify our > release process, not our "marking for deprecation" process. I'm not sure what this means in practical terms, though? Take the stuff deprecated in 1.7, released 2013-02-10. From here it seems plausible that the first release after 2014-02-10 could be 1.9, 1.10, or even, if we end up really embracing the small-quick-release cycle, 1.11. So which should we write down as our expected version number for the 1.7 deprecations? 
-n From robert.kern at gmail.com Wed Mar 6 17:53:13 2013 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 6 Mar 2013 22:53:13 +0000 Subject: [Numpy-discussion] Numpy deprecation schedule In-Reply-To: References: Message-ID: On Wed, Mar 6, 2013 at 10:45 PM, Nathaniel Smith wrote: > On Wed, Mar 6, 2013 at 10:33 PM, Robert Kern wrote: >> On Wed, Mar 6, 2013 at 8:09 PM, Nathaniel Smith wrote: >>> A number of items on the 1.8 todo list are reminders to remove things >>> that we deprecated in 1.7, and said we would remove in 1.8, e.g.: >>> https://github.com/numpy/numpy/issues/596 >>> https://github.com/numpy/numpy/issues/294 >>> >>> But, since 1.8 is so soon after 1.7, we probably shouldn't actually do that. >>> >>> I suggest we switch to a time-based deprecation schedule, where >>> instead of saying "this will be removed in N releases" we say "this >>> will be removed in the first release on or after (now+N months)". >> >> We can always delay removal if a particular release comes sooner than >> originally expected. The deprecation policy is just that we specify >> minimum version numbers at which the features can be removed. It's not >> really a firm schedule. >> >> I do take your suggestion to heart, though. We shouldn't remove stuff >> faster than 12 months or so. I just think that it should modify our >> release process, not our "marking for deprecation" process. > > I'm not sure what this means in practical terms, though? Take the > stuff deprecated in 1.7, released 2013-02-10. From here it seems > plausible that the first release after 2014-02-10 could be 1.9, 1.10, > or even, if we end up really embracing the small-quick-release cycle, > 1.11. So which should we write down as our expected version number for > the 1.7 deprecations? If. I would leave the policy alone until we consistently implement such a release cycle that makes it regularly problematic. 
-- Robert Kern From njs at pobox.com Wed Mar 6 17:56:37 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 6 Mar 2013 22:56:37 +0000 Subject: [Numpy-discussion] Numpy deprecation schedule In-Reply-To: References: Message-ID: On Wed, Mar 6, 2013 at 10:53 PM, Robert Kern wrote: > On Wed, Mar 6, 2013 at 10:45 PM, Nathaniel Smith wrote: >> On Wed, Mar 6, 2013 at 10:33 PM, Robert Kern wrote: >>> On Wed, Mar 6, 2013 at 8:09 PM, Nathaniel Smith wrote: >>>> A number of items on the 1.8 todo list are reminders to remove things >>>> that we deprecated in 1.7, and said we would remove in 1.8, e.g.: >>>> https://github.com/numpy/numpy/issues/596 >>>> https://github.com/numpy/numpy/issues/294 >>>> >>>> But, since 1.8 is so soon after 1.7, we probably shouldn't actually do that. >>>> >>>> I suggest we switch to a time-based deprecation schedule, where >>>> instead of saying "this will be removed in N releases" we say "this >>>> will be removed in the first release on or after (now+N months)". >>> >>> We can always delay removal if a particular release comes sooner than >>> originally expected. The deprecation policy is just that we specify >>> minimum version numbers at which the features can be removed. It's not >>> really a firm schedule. >>> >>> I do take your suggestion to heart, though. We shouldn't remove stuff >>> faster than 12 months or so. I just think that it should modify our >>> release process, not our "marking for deprecation" process. >> >> I'm not sure what this means in practical terms, though? Take the >> stuff deprecated in 1.7, released 2013-02-10. From here it seems >> plausible that the first release after 2014-02-10 could be 1.9, 1.10, >> or even, if we end up really embracing the small-quick-release cycle, >> 1.11. So which should we write down as our expected version number for >> the 1.7 deprecations? > > If. I would leave the policy alone until we consistently implement > such a release cycle that makes it regularly problematic. 
It's being problematic right now, we need some process in place to handle these bugs through the 1.8 release and to make sure we don't drop them on the floor later... -n From robert.kern at gmail.com Wed Mar 6 18:02:08 2013 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 6 Mar 2013 23:02:08 +0000 Subject: [Numpy-discussion] Numpy deprecation schedule In-Reply-To: References: Message-ID: On Wed, Mar 6, 2013 at 10:56 PM, Nathaniel Smith wrote: > On Wed, Mar 6, 2013 at 10:53 PM, Robert Kern wrote: >> On Wed, Mar 6, 2013 at 10:45 PM, Nathaniel Smith wrote: >>> On Wed, Mar 6, 2013 at 10:33 PM, Robert Kern wrote: >>>> On Wed, Mar 6, 2013 at 8:09 PM, Nathaniel Smith wrote: >>>>> A number of items on the 1.8 todo list are reminders to remove things >>>>> that we deprecated in 1.7, and said we would remove in 1.8, e.g.: >>>>> https://github.com/numpy/numpy/issues/596 >>>>> https://github.com/numpy/numpy/issues/294 >>>>> >>>>> But, since 1.8 is so soon after 1.7, we probably shouldn't actually do that. >>>>> >>>>> I suggest we switch to a time-based deprecation schedule, where >>>>> instead of saying "this will be removed in N releases" we say "this >>>>> will be removed in the first release on or after (now+N months)". >>>> >>>> We can always delay removal if a particular release comes sooner than >>>> originally expected. The deprecation policy is just that we specify >>>> minimum version numbers at which the features can be removed. It's not >>>> really a firm schedule. >>>> >>>> I do take your suggestion to heart, though. We shouldn't remove stuff >>>> faster than 12 months or so. I just think that it should modify our >>>> release process, not our "marking for deprecation" process. >>> >>> I'm not sure what this means in practical terms, though? Take the >>> stuff deprecated in 1.7, released 2013-02-10. 
From here it seems >>> plausible that the first release after 2014-02-10 could be 1.9, 1.10, >>> or even, if we end up really embracing the small-quick-release cycle, >>> 1.11. So which should we write down as our expected version number for >>> the 1.7 deprecations? >> >> If. I would leave the policy alone until we consistently implement >> such a release cycle that makes it regularly problematic. > > It's being problematic right now, Changing existing process is like automation: don't do it until the problem bites you twice. That's why I suggested that we don't change things until it's *regularly* problematic. > we need some process in place to > handle these bugs through the 1.8 release and to make sure we don't > drop them on the floor later... Bump the milestones to 1.9. -- Robert Kern From jaime.frio at gmail.com Wed Mar 6 18:52:11 2013 From: jaime.frio at gmail.com (=?ISO-8859-1?Q?Jaime_Fern=E1ndez_del_R=EDo?=) Date: Wed, 6 Mar 2013 15:52:11 -0800 Subject: [Numpy-discussion] polyfit with fixed points In-Reply-To: References: Message-ID: On Tue, Mar 5, 2013 at 5:23 AM, Charles R Harris wrote: > > > On Tue, Mar 5, 2013 at 12:41 AM, Jaime Fern?ndez del R?o < > jaime.frio at gmail.com> wrote: > >> On Mon, Mar 4, 2013 at 8:37 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> There are actually seven versions of polynomial fit, two for the usual >>> polynomial basis, and one each for Legendre, Chebyshev, Hermite, Hermite_e, >>> and Laguerre ;) >>> >> >> Correct me if I am wrong, but the fitted function is the same regardless >> of the polynomial basis used. I don't know if there can be numerical >> stability issues, but chebfit(x, y, n) returns the same as >> poly2cheb(polyfit(x, y, n)). >> >> In any case, with all the already existing support for these special >> polynomials, it wouldn't be too hard to set the problem up to calculate the >> right coefficients directly for each case. >> >> >>> How do you propose to implement it? 
I think Lagrange multipliers is >>> overkill, I'd rather see using the weights (approximate) or change of >>> variable -- a permutation in this case -- followed by qr and lstsq. >>> >> >> The weights method is already in place, but I find it rather inelegant >> and unsatisfactory as a solution to this problem. But if it is deemed >> sufficient, then there is of course no need to go any further. >> >> I hadn't thought of any other way than using Lagrange multipliers, but >> looking at it in more detail, I am not sure it will be possible to >> formulate it in a manner that can be fed to lstsq, as polyfit does today. >> And if it can't, it probably wouldn't make much sense to have two different >> methods which cannot produce the same full output running under the same >> hood. >> >> I can't figure out your "change of variable" method from the succinct >> description, could you elaborate a little more? >> > > I think the place to add this is to lstsq as linear constraints. That is, > the coefficients must satisfy B * c = y_c for some set of equations B. In > the polynomial case the rows of B would be the powers of x at the points > you want to constrain. Then do an svd on B, B = u * d * v. Apply v to the > design matrix of the unconstrained points A' = A * v.T so that B' becomes > u * d. The coefficients are now replaced by new variables c' with the > contraints in the first two columns. If there are, say, 2 constraints, u * > d will be 2x2. Solve that equation for the first two constraints then > multiply the first two columns of the design matrix A' by the result and > put them on the rhs, i.e., > > y = y - A'[:, :2] * c'[:2] > > then solve the usual l least squares thing with > > A[:, 2:] * c'[2:] = y > > to get the rest of the transformed coefficients c'. Put the coefficients > altogether and multiply with v^T to get > > c = v^T * c' > Very nice, and works beautifully! I have tried the method you describe, and there are a few relevant observations: 1. 
It gives the exact same result as the Lagrange multiplier approach, which is probably expected, but I wasn't all that sure it would be the case. 2. The result also seems to be to what the sequence of fits giving increasing weights to the fixed points converges to. This image http://i1092.photobucket.com/albums/i412/jfrio/image.png is an example. In there: * blue crosses are the data points to fit to * red points are the fixed points * blue line is the standard polyfit * red line is the constrained polyfit * cyan, magenta, yellow and black are polyfits with weights of 2, 4, 8, 16 for the fixed points, 1 for the rest Seeing this last point, probably the cleanest, least disruptive implementation of this, would be to allow np.inf values in the weights parameter, which would get filtered out, and dealt with in the above manner. So I have two questions: 1. Does this make sense? Or will it be better to make it more explicit, with a 'fixed_points' keyword argument defaulting to None? 2. Once I have this implemented, documented and tested... How do I go about submitting it for consideration? Would a patch be the way to go, or should I fork? Thanks, Jaime > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... 
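Jaime's observation 2 (that fits with ever-larger weights on the fixed points converge to the constrained fit) can be checked numerically. The data and the `weighted_fit` helper below are made up for illustration; only `np.polyfit`'s documented `w` argument is relied on.

```python
import numpy as np

rng = np.random.RandomState(0)
x = np.linspace(0.0, 1.0, 20)
y = x**2 + 0.05 * rng.randn(20)
xf = np.array([0.0, 1.0])   # points the fit should pass through
yf = np.array([0.0, 1.0])

def weighted_fit(w):
    # Append the fixed points with weight w, everything else with weight 1.
    xs, ys = np.r_[x, xf], np.r_[y, yf]
    ws = np.r_[np.ones_like(x), np.full_like(xf, w)]
    return np.polyfit(xs, ys, 2, w=ws)

for w in (1, 10, 100, 1000):
    c = weighted_fit(w)
    # Error at the fixed points shrinks as the weight grows.
    print(w, np.abs(np.polyval(c, xf) - yf).max())
```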
URL: From charlesr.harris at gmail.com Wed Mar 6 19:29:24 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 6 Mar 2013 17:29:24 -0700 Subject: [Numpy-discussion] polyfit with fixed points In-Reply-To: References: Message-ID: On Wed, Mar 6, 2013 at 4:52 PM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > On Tue, Mar 5, 2013 at 5:23 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Tue, Mar 5, 2013 at 12:41 AM, Jaime Fern?ndez del R?o < >> jaime.frio at gmail.com> wrote: >> >>> On Mon, Mar 4, 2013 at 8:37 PM, Charles R Harris < >>> charlesr.harris at gmail.com> wrote: >>> >>>> >>>> There are actually seven versions of polynomial fit, two for the usual >>>> polynomial basis, and one each for Legendre, Chebyshev, Hermite, Hermite_e, >>>> and Laguerre ;) >>>> >>> >>> Correct me if I am wrong, but the fitted function is the same regardless >>> of the polynomial basis used. I don't know if there can be numerical >>> stability issues, but chebfit(x, y, n) returns the same as >>> poly2cheb(polyfit(x, y, n)). >>> >>> In any case, with all the already existing support for these special >>> polynomials, it wouldn't be too hard to set the problem up to calculate the >>> right coefficients directly for each case. >>> >>> >>>> How do you propose to implement it? I think Lagrange multipliers is >>>> overkill, I'd rather see using the weights (approximate) or change of >>>> variable -- a permutation in this case -- followed by qr and lstsq. >>>> >>> >>> The weights method is already in place, but I find it rather inelegant >>> and unsatisfactory as a solution to this problem. But if it is deemed >>> sufficient, then there is of course no need to go any further. >>> >>> I hadn't thought of any other way than using Lagrange multipliers, but >>> looking at it in more detail, I am not sure it will be possible to >>> formulate it in a manner that can be fed to lstsq, as polyfit does today. 
>>> And if it can't, it probably wouldn't make much sense to have two different >>> methods which cannot produce the same full output running under the same >>> hood. >>> >>> I can't figure out your "change of variable" method from the succinct >>> description, could you elaborate a little more? >>> >> >> I think the place to add this is to lstsq as linear constraints. That is, >> the coefficients must satisfy B * c = y_c for some set of equations B. In >> the polynomial case the rows of B would be the powers of x at the points >> you want to constrain. Then do an svd on B, B = u * d * v. Apply v to the >> design matrix of the unconstrained points A' = A * v.T so that B' becomes >> u * d. The coefficients are now replaced by new variables c' with the >> contraints in the first two columns. If there are, say, 2 constraints, u * >> d will be 2x2. Solve that equation for the first two constraints then >> multiply the first two columns of the design matrix A' by the result and >> put them on the rhs, i.e., >> >> y = y - A'[:, :2] * c'[:2] >> >> then solve the usual l least squares thing with >> >> A[:, 2:] * c'[2:] = y >> >> to get the rest of the transformed coefficients c'. Put the coefficients >> altogether and multiply with v^T to get >> >> c = v^T * c' >> > > Very nice, and works beautifully! I have tried the method you describe, > and there are a few relevant observations: > > 1. It gives the exact same result as the Lagrange multiplier approach, > which is probably expected, but I wasn't all that sure it would be the case. > It's equivalent, but I'm thinking in algorithmic terms, which is somewhat more specific than the mathematical formulation. > 2. The result also seems to be to what the sequence of fits giving > increasing weights to the fixed points converges to. This image > http://i1092.photobucket.com/albums/i412/jfrio/image.png is an example. 
> In there: > * blue crosses are the data points to fit to > * red points are the fixed points > * blue line is the standard polyfit > * red line is the constrained polyfit > * cyan, magenta, yellow and black are polyfits with weights of 2, 4, > 8, 16 for the fixed points, 1 for the rest > > Seeing this last point, probably the cleanest, least disruptive > implementation of this, would be to allow np.inf values in the weights > parameter, which would get filtered out, and dealt with in the above manner. > > Interesting idea, I like it. It is less general, but probably all that is needed for polynomial fits. I suppose that after you pull out the relevant rows you can set the weights to zero so that they will have no (order of roundoff) effect on the remaining fit and you don't need to rewrite the design matrix. > So I have two questions: > > 1. Does this make sense? Or will it be better to make it more explicit, > with a 'fixed_points' keyword argument defaulting to None? > 2. Once I have this implemented, documented and tested... How do I go > about submitting it for consideration? Would a patch be the way to go, or > should I fork? > > A fork is definitely the way to go. That makes it easy for folks to review the code and tell you everything you did wrong ;) I think adding linear constraints to lstsq would be good, then the upper level routines can make use of them. Something like a new argument constraints=(B, y), with None the default. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Thu Mar 7 09:51:35 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Thu, 07 Mar 2013 09:51:35 -0500 Subject: [Numpy-discussion] scipy.optimize.fminbound bound violation Message-ID: <5138A977.1090001@gmail.com> Under what conditions should I expect fminbound to call the supplied function with argument values substantially outside the user-provided bounds? 
Thanks, Alan Isaac From e.antero.tammi at gmail.com Thu Mar 7 11:22:30 2013 From: e.antero.tammi at gmail.com (eat) Date: Thu, 7 Mar 2013 18:22:30 +0200 Subject: [Numpy-discussion] polyfit with fixed points In-Reply-To: References: Message-ID: Hi, On Thu, Mar 7, 2013 at 1:52 AM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > On Tue, Mar 5, 2013 at 5:23 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Tue, Mar 5, 2013 at 12:41 AM, Jaime Fern?ndez del R?o < >> jaime.frio at gmail.com> wrote: >> >>> On Mon, Mar 4, 2013 at 8:37 PM, Charles R Harris < >>> charlesr.harris at gmail.com> wrote: >>> >>>> >>>> There are actually seven versions of polynomial fit, two for the usual >>>> polynomial basis, and one each for Legendre, Chebyshev, Hermite, Hermite_e, >>>> and Laguerre ;) >>>> >>> >>> Correct me if I am wrong, but the fitted function is the same regardless >>> of the polynomial basis used. I don't know if there can be numerical >>> stability issues, but chebfit(x, y, n) returns the same as >>> poly2cheb(polyfit(x, y, n)). >>> >>> In any case, with all the already existing support for these special >>> polynomials, it wouldn't be too hard to set the problem up to calculate the >>> right coefficients directly for each case. >>> >>> >>>> How do you propose to implement it? I think Lagrange multipliers is >>>> overkill, I'd rather see using the weights (approximate) or change of >>>> variable -- a permutation in this case -- followed by qr and lstsq. >>>> >>> >>> The weights method is already in place, but I find it rather inelegant >>> and unsatisfactory as a solution to this problem. But if it is deemed >>> sufficient, then there is of course no need to go any further. >>> >>> I hadn't thought of any other way than using Lagrange multipliers, but >>> looking at it in more detail, I am not sure it will be possible to >>> formulate it in a manner that can be fed to lstsq, as polyfit does today. 
>>> And if it can't, it probably wouldn't make much sense to have two different >>> methods which cannot produce the same full output running under the same >>> hood. >>> >>> I can't figure out your "change of variable" method from the succinct >>> description, could you elaborate a little more? >>> >> >> I think the place to add this is to lstsq as linear constraints. That is, >> the coefficients must satisfy B * c = y_c for some set of equations B. In >> the polynomial case the rows of B would be the powers of x at the points >> you want to constrain. Then do an svd on B, B = u * d * v. Apply v to the >> design matrix of the unconstrained points A' = A * v.T so that B' becomes >> u * d. The coefficients are now replaced by new variables c' with the >> contraints in the first two columns. If there are, say, 2 constraints, u * >> d will be 2x2. Solve that equation for the first two constraints then >> multiply the first two columns of the design matrix A' by the result and >> put them on the rhs, i.e., >> >> y = y - A'[:, :2] * c'[:2] >> >> then solve the usual l least squares thing with >> >> A[:, 2:] * c'[2:] = y >> >> to get the rest of the transformed coefficients c'. Put the coefficients >> altogether and multiply with v^T to get >> >> c = v^T * c' >> > > Very nice, and works beautifully! I have tried the method you describe, > and there are a few relevant observations: > > 1. It gives the exact same result as the Lagrange multiplier approach, > which is probably expected, but I wasn't all that sure it would be the case. > 2. The result also seems to be to what the sequence of fits giving > increasing weights to the fixed points converges to. This image > http://i1092.photobucket.com/albums/i412/jfrio/image.png is an example. 
> In there: > * blue crosses are the data points to fit to > * red points are the fixed points > * blue line is the standard polyfit > * red line is the constrained polyfit > * cyan, magenta, yellow and black are polyfits with weights of 2, 4, > 8, 16 for the fixed points, 1 for the rest > > Seeing this last point, probably the cleanest, least disruptive > implementation of this, would be to allow np.inf values in the weights > parameter, which would get filtered out, and dealt with in the above manner. > Just to point out that a very simple approach is where one just multiply the constraints with big enough number M, like: In []: def V(x, n= None): ....: """Polynomial package compatible Vandermonde 'matrix'""" ....: return vander(x, n)[:, ::-1] ....: In []: def clsq(A, b, C, d, M= 1e5): ....: """A simple constrained least squared solution of Ax= b, s.t. Cx= d""" ....: return solve(dot(A.T, A)+ M* dot(C.T, C), dot(A.T, b)+ M* dot(C.T, d)) ....: In []: x= linspace(-6, 6, 23) In []: y= sin(x)+ 4e-1* rand(len(x))- 2e-1 In []: x_f, y_f= linspace(-(3./ 2)* pi, (3./ 2)* pi, 4), array([1, -1, 1, -1]) In []: n, x_s= 5, linspace(-6, 6, 123) In []: plot(x, y, 'bo', x_f, y_f, 'bs', x_s, sin(x_s), 'b--') Out[]: In []: for M in 7** (arange(5)): ....: p= Polynomial(clsq(V(x, n), y, V(x_f, n), y_f, M)) ....: plot(x_s, p(x_s)) ....: Out[]: In []: ylim([-2, 2]) Out[]: In []: show() Obviously this is not any 'silver bullet' solution, but simple enough ;-) My 2 cents, -eat > > So I have two questions: > > 1. Does this make sense? Or will it be better to make it more explicit, > with a 'fixed_points' keyword argument defaulting to None? > 2. Once I have this implemented, documented and tested... How do I go > about submitting it for consideration? Would a patch be the way to go, or > should I fork? 
> > Thanks, > > Jaime > > >> >> Chuck >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes > de dominaci?n mundial. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: clsq.png Type: image/png Size: 69891 bytes Desc: not available URL: From charlesr.harris at gmail.com Thu Mar 7 12:07:15 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 7 Mar 2013 10:07:15 -0700 Subject: [Numpy-discussion] polyfit with fixed points In-Reply-To: References: Message-ID: On Thu, Mar 7, 2013 at 9:22 AM, eat wrote: > Hi, > > On Thu, Mar 7, 2013 at 1:52 AM, Jaime Fern?ndez del R?o < > jaime.frio at gmail.com> wrote: > >> On Tue, Mar 5, 2013 at 5:23 AM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> >>> On Tue, Mar 5, 2013 at 12:41 AM, Jaime Fern?ndez del R?o < >>> jaime.frio at gmail.com> wrote: >>> >>>> On Mon, Mar 4, 2013 at 8:37 PM, Charles R Harris < >>>> charlesr.harris at gmail.com> wrote: >>>> >>>>> >>>>> There are actually seven versions of polynomial fit, two for the usual >>>>> polynomial basis, and one each for Legendre, Chebyshev, Hermite, Hermite_e, >>>>> and Laguerre ;) >>>>> >>>> >>>> Correct me if I am wrong, but the fitted function is the same >>>> regardless of the polynomial basis used. I don't know if there can be >>>> numerical stability issues, but chebfit(x, y, n) returns the same as >>>> poly2cheb(polyfit(x, y, n)). 
>>>> >>>> In any case, with all the already existing support for these special >>>> polynomials, it wouldn't be too hard to set the problem up to calculate the >>>> right coefficients directly for each case. >>>> >>>> >>>>> How do you propose to implement it? I think Lagrange multipliers are >>>>> overkill, I'd rather see using the weights (approximate) or change of >>>>> variable -- a permutation in this case -- followed by qr and lstsq. >>>>> >>>> >>>> The weights method is already in place, but I find it rather inelegant >>>> and unsatisfactory as a solution to this problem. But if it is deemed >>>> sufficient, then there is of course no need to go any further. >>>> >>>> I hadn't thought of any other way than using Lagrange multipliers, but >>>> looking at it in more detail, I am not sure it will be possible to >>>> formulate it in a manner that can be fed to lstsq, as polyfit does today. >>>> And if it can't, it probably wouldn't make much sense to have two different >>>> methods which cannot produce the same full output running under the same >>>> hood. >>>> >>>> I can't figure out your "change of variable" method from the succinct >>>> description, could you elaborate a little more? >>>> >>> >>> I think the place to add this is to lstsq as linear constraints. That >>> is, the coefficients must satisfy B * c = y_c for some set of equations B. >>> In the polynomial case the rows of B would be the powers of x at the points >>> you want to constrain. Then do an svd on B, B = u * d * v. Apply v to the >>> design matrix of the unconstrained points A' = A * v.T so that B' becomes >>> u * d. The coefficients are now replaced by new variables c' with the >>> constraints in the first two columns. If there are, say, 2 constraints, u * >>> d will be 2x2. Solve that equation for the first two constraints, then >>> multiply the first two columns of the design matrix A' by the result and >>> put them on the rhs, i.e., >>> >>> y = y - A'[:, :2] * c'[:2] >>> >>> then solve the usual least squares problem >>> >>> A'[:, 2:] * c'[2:] = y >>> >>> to get the rest of the transformed coefficients c'. Put the coefficients >>> all together and multiply with v^T to get >>> >>> c = v^T * c' >>> >> >> Very nice, and works beautifully! I have tried the method you describe, >> and there are a few relevant observations: >> >> 1. It gives the exact same result as the Lagrange multiplier approach, >> which is probably expected, but I wasn't all that sure it would be the case. >> 2. The result also seems to be what the sequence of fits giving >> increasing weights to the fixed points converges to. This image >> http://i1092.photobucket.com/albums/i412/jfrio/image.png is an example. >> In there: >> * blue crosses are the data points to fit to >> * red points are the fixed points >> * blue line is the standard polyfit >> * red line is the constrained polyfit >> * cyan, magenta, yellow and black are polyfits with weights of 2, 4, >> 8, 16 for the fixed points, 1 for the rest >> >> Seeing this last point, probably the cleanest, least disruptive >> implementation of this would be to allow np.inf values in the weights >> parameter, which would get filtered out and dealt with in the above manner. >> > > Just to point out that a very simple approach is one where you just multiply > the constraints by a big enough number M, like: > > In []: def V(x, n= None): > ....: """Polynomial package compatible Vandermonde 'matrix'""" > ....: return vander(x, n)[:, ::-1] > Just to note, there is a polyvander in numpy.polynomial.polynomial, and a chebvander in numpy.polynomial.chebyshev, etc. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
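Chuck's recipe, spelled out for a general number of constraints k, might look like the sketch below (the function name `constrained_lstsq` and the cubic test data are made up for illustration; B's rows are assumed linearly independent):

```python
import numpy as np

def constrained_lstsq(A, y, B, d):
    """min ||A c - y|| subject to B c = d, following the SVD recipe above."""
    k, m = B.shape
    U, s, Vt = np.linalg.svd(B, full_matrices=True)  # B = U @ diag(s) @ Vt
    cp = np.zeros(m)
    cp[:k] = (U.T @ d) / s        # the constraints pin down the first k new variables
    Ap = A @ Vt.T                 # transformed design matrix A' = A v^T
    cp[k:] = np.linalg.lstsq(Ap[:, k:], y - Ap[:, :k] @ cp[:k], rcond=None)[0]
    return Vt.T @ cp              # back to the original coefficients c = v^T c'

# Fit a noisy cubic, forcing the curve through (0, 0) and (2, 8):
rng = np.random.default_rng(42)
x = np.linspace(0.0, 2.0, 50)
y = x**3 + 0.05 * rng.standard_normal(x.size)
A = np.vander(x, 4)[:, ::-1]                       # coefficients low-to-high
B = np.vander(np.array([0.0, 2.0]), 4)[:, ::-1]    # powers of x at the fixed points
c = constrained_lstsq(A, y, B, np.array([0.0, 8.0]))
print(np.abs(B @ c - [0.0, 8.0]).max())  # essentially zero: the constraints hold exactly
```

Unlike the big-M weighting trick, the constraints here are satisfied to machine precision regardless of any tuning parameter.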
URL: From francesc at continuum.io Thu Mar 7 12:47:12 2013 From: francesc at continuum.io (Francesc Alted) Date: Thu, 07 Mar 2013 18:47:12 +0100 Subject: [Numpy-discussion] aligned / unaligned structured dtype behavior (was: GSOC 2013) In-Reply-To: References: Message-ID: <5138D2A0.3080802@continuum.io> On 3/6/13 7:42 PM, Kurt Smith wrote: > And regarding performance, doing simple timings shows a 30%-ish > slowdown for unaligned operations: > > In [36]: %timeit packed_arr['b']**2 > 100 loops, best of 3: 2.48 ms per loop > > In [37]: %timeit aligned_arr['b']**2 > 1000 loops, best of 3: 1.9 ms per loop Hmm, that clearly depends on the architecture. On my machine: In [1]: import numpy as np In [2]: aligned_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=True) In [3]: packed_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=False) In [4]: aligned_arr = np.ones((10**6,), dtype=aligned_dt) In [5]: packed_arr = np.ones((10**6,), dtype=packed_dt) In [6]: baligned = aligned_arr['b'] In [7]: bpacked = packed_arr['b'] In [8]: %timeit baligned**2 1000 loops, best of 3: 1.96 ms per loop In [9]: %timeit bpacked**2 100 loops, best of 3: 7.84 ms per loop That is, the unaligned column is 4x slower (!). numexpr allows somewhat better results: In [11]: %timeit numexpr.evaluate('baligned**2') 1000 loops, best of 3: 1.13 ms per loop In [12]: %timeit numexpr.evaluate('bpacked**2') 1000 loops, best of 3: 865 us per loop Yes, in this case, the unaligned array goes faster (as much as 30%). I think the reason is that numexpr optimizes the unaligned access by doing a copy of the different chunks in internal buffers that fits in L1 cache. Apparently this is very beneficial in this case (not sure why, though). 
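The chunked-copy idea just described can be sketched in pure NumPy (a toy illustration of the buffering strategy, not numexpr's actual implementation):

```python
import numpy as np

def blocked_square(x, block=1024):
    """Square `x` by first copying block-sized chunks into a small,
    freshly allocated (hence aligned) scratch buffer -- a toy sketch of
    the cache-friendly buffering strategy described above."""
    out = np.empty(x.shape, dtype=x.dtype)
    buf = np.empty(block, dtype=x.dtype)          # small enough to live in L1
    for start in range(0, len(x), block):
        n = min(block, len(x) - start)
        buf[:n] = x[start:start + n]              # unaligned -> aligned copy
        np.multiply(buf[:n], buf[:n], out=out[start:start + n])
    return out

packed = np.zeros(10**5, dtype=np.dtype([('a', 'i1'), ('b', 'i8')]))
packed['b'] = np.arange(10**5)
b = packed['b']                                   # strided, unaligned field view
print(np.array_equal(blocked_square(b), b ** 2))  # True
```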
> > Whereas summing shows just a 10%-ish slowdown: > > In [38]: %timeit packed_arr['b'].sum() > 1000 loops, best of 3: 1.29 ms per loop > > In [39]: %timeit aligned_arr['b'].sum() > 1000 loops, best of 3: 1.14 ms per loop On my machine: In [14]: %timeit baligned.sum() 1000 loops, best of 3: 1.03 ms per loop In [15]: %timeit bpacked.sum() 100 loops, best of 3: 3.79 ms per loop Again, the 4x slowdown is here. Using numexpr: In [16]: %timeit numexpr.evaluate('sum(baligned)') 100 loops, best of 3: 2.16 ms per loop In [17]: %timeit numexpr.evaluate('sum(bpacked)') 100 loops, best of 3: 2.08 ms per loop Again, the unaligned case is slightly better. In this case numexpr is a bit slower than NumPy because sum() is not parallelized internally. Hmm, given that, I'm wondering whether some internal copies to L1 in NumPy could help improve unaligned performance. Worth a try? -- Francesc Alted From francesc at continuum.io Thu Mar 7 13:06:03 2013 From: francesc at continuum.io (Francesc Alted) Date: Thu, 07 Mar 2013 19:06:03 +0100 Subject: [Numpy-discussion] aligned / unaligned structured dtype behavior In-Reply-To: <5138D2A0.3080802@continuum.io> References: <5138D2A0.3080802@continuum.io> Message-ID: <5138D70B.2030606@continuum.io> On 3/7/13 6:47 PM, Francesc Alted wrote: > On 3/6/13 7:42 PM, Kurt Smith wrote: >> And regarding performance, doing simple timings shows a 30%-ish >> slowdown for unaligned operations: >> >> In [36]: %timeit packed_arr['b']**2 >> 100 loops, best of 3: 2.48 ms per loop >> >> In [37]: %timeit aligned_arr['b']**2 >> 1000 loops, best of 3: 1.9 ms per loop > > Hmm, that clearly depends on the architecture. 
On my machine: > > In [1]: import numpy as np > > In [2]: aligned_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=True) > > In [3]: packed_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=False) > > In [4]: aligned_arr = np.ones((10**6,), dtype=aligned_dt) > > In [5]: packed_arr = np.ones((10**6,), dtype=packed_dt) > > In [6]: baligned = aligned_arr['b'] > > In [7]: bpacked = packed_arr['b'] > > In [8]: %timeit baligned**2 > 1000 loops, best of 3: 1.96 ms per loop > > In [9]: %timeit bpacked**2 > 100 loops, best of 3: 7.84 ms per loop > > That is, the unaligned column is 4x slower (!). numexpr allows > somewhat better results: > > In [11]: %timeit numexpr.evaluate('baligned**2') > 1000 loops, best of 3: 1.13 ms per loop > > In [12]: %timeit numexpr.evaluate('bpacked**2') > 1000 loops, best of 3: 865 us per loop Just for completeness, here it is what Theano gets: In [18]: import theano In [20]: a = theano.tensor.vector() In [22]: f = theano.function([a], a**2) In [23]: %timeit f(baligned) 100 loops, best of 3: 7.74 ms per loop In [24]: %timeit f(bpacked) 100 loops, best of 3: 12.6 ms per loop So yeah, Theano is also slower for the unaligned case (but less than 2x in this case). > > Yes, in this case, the unaligned array goes faster (as much as 30%). > I think the reason is that numexpr optimizes the unaligned access by > doing a copy of the different chunks in internal buffers that fits in > L1 cache. Apparently this is very beneficial in this case (not sure > why, though). > >> >> Whereas summing shows just a 10%-ish slowdown: >> >> In [38]: %timeit packed_arr['b'].sum() >> 1000 loops, best of 3: 1.29 ms per loop >> >> In [39]: %timeit aligned_arr['b'].sum() >> 1000 loops, best of 3: 1.14 ms per loop > > On my machine: > > In [14]: %timeit baligned.sum() > 1000 loops, best of 3: 1.03 ms per loop > > In [15]: %timeit bpacked.sum() > 100 loops, best of 3: 3.79 ms per loop > > Again, the 4x slowdown is here. 
Using numexpr: > > In [16]: %timeit numexpr.evaluate('sum(baligned)') > 100 loops, best of 3: 2.16 ms per loop > > In [17]: %timeit numexpr.evaluate('sum(bpacked)') > 100 loops, best of 3: 2.08 ms per loop And with Theano: In [26]: f2 = theano.function([a], a.sum()) In [27]: %timeit f2(baligned) 100 loops, best of 3: 2.52 ms per loop In [28]: %timeit f2(bpacked) 100 loops, best of 3: 7.43 ms per loop Again, the unaligned case is significantly slower (as much as 3x here!). -- Francesc Alted From nouiz at nouiz.org Thu Mar 7 13:26:27 2013 From: nouiz at nouiz.org (Frédéric Bastien) Date: Thu, 7 Mar 2013 13:26:27 -0500 Subject: [Numpy-discussion] aligned / unaligned structured dtype behavior In-Reply-To: <5138D70B.2030606@continuum.io> References: <5138D2A0.3080802@continuum.io> <5138D70B.2030606@continuum.io> Message-ID: Hi, It is normal that unaligned accesses are slower: the hardware has been optimized for aligned access, so this is a user's choice of space vs. speed. We can't get around that. We can only minimize the cost of unaligned access in some cases, but not all, and those optimizations depend on the CPU. Newer CPUs have, however, lowered the cost of unaligned access. I'm surprised that Theano worked with the unaligned input. I added some checks to make this raise an error, as we do not support that! Francesc, can you check if Theano gives the right result? It is possible that someone (maybe me) just copies the input to an aligned ndarray when we receive a non-aligned one. That could explain why it worked, but my memory tells me that we raise an error. As you saw in the numbers, this is a bad example for Theano, as the compiled function is too fast: there is more Theano overhead than computation time in that example. We have recently reduced the overhead, but we can do more to lower it. 
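Fred's check for unaligned input can be reproduced without Theano: NumPy exposes alignment via `ndarray.flags`, and a fresh copy is always aligned, so a library can either raise or silently repair its input:

```python
import numpy as np

packed_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=False)
arr = np.ones(1000, dtype=packed_dt)
b = arr['b']                  # int64 view at byte offset 1, stride 9
print(b.strides)              # (9,)
print(b.flags['ALIGNED'])     # typically False where int64 wants 8-byte alignment

# A library that does not support unaligned input can raise here, or
# silently repair the input with a copy into a fresh (aligned) buffer:
b_fixed = np.ascontiguousarray(b) if not b.flags['ALIGNED'] else b
print(b_fixed.flags['ALIGNED'])  # True
```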
Fred On Thu, Mar 7, 2013 at 1:06 PM, Francesc Alted wrote: > On 3/7/13 6:47 PM, Francesc Alted wrote: >> On 3/6/13 7:42 PM, Kurt Smith wrote: >>> And regarding performance, doing simple timings shows a 30%-ish >>> slowdown for unaligned operations: >>> >>> In [36]: %timeit packed_arr['b']**2 >>> 100 loops, best of 3: 2.48 ms per loop >>> >>> In [37]: %timeit aligned_arr['b']**2 >>> 1000 loops, best of 3: 1.9 ms per loop >> >> Hmm, that clearly depends on the architecture. On my machine: >> >> In [1]: import numpy as np >> >> In [2]: aligned_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=True) >> >> In [3]: packed_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=False) >> >> In [4]: aligned_arr = np.ones((10**6,), dtype=aligned_dt) >> >> In [5]: packed_arr = np.ones((10**6,), dtype=packed_dt) >> >> In [6]: baligned = aligned_arr['b'] >> >> In [7]: bpacked = packed_arr['b'] >> >> In [8]: %timeit baligned**2 >> 1000 loops, best of 3: 1.96 ms per loop >> >> In [9]: %timeit bpacked**2 >> 100 loops, best of 3: 7.84 ms per loop >> >> That is, the unaligned column is 4x slower (!). numexpr allows >> somewhat better results: >> >> In [11]: %timeit numexpr.evaluate('baligned**2') >> 1000 loops, best of 3: 1.13 ms per loop >> >> In [12]: %timeit numexpr.evaluate('bpacked**2') >> 1000 loops, best of 3: 865 us per loop > > Just for completeness, here it is what Theano gets: > > In [18]: import theano > > In [20]: a = theano.tensor.vector() > > In [22]: f = theano.function([a], a**2) > > In [23]: %timeit f(baligned) > 100 loops, best of 3: 7.74 ms per loop > > In [24]: %timeit f(bpacked) > 100 loops, best of 3: 12.6 ms per loop > > So yeah, Theano is also slower for the unaligned case (but less than 2x > in this case). > >> >> Yes, in this case, the unaligned array goes faster (as much as 30%). >> I think the reason is that numexpr optimizes the unaligned access by >> doing a copy of the different chunks in internal buffers that fits in >> L1 cache. 
Apparently this is very beneficial in this case (not sure >> why, though). >> >>> >>> Whereas summing shows just a 10%-ish slowdown: >>> >>> In [38]: %timeit packed_arr['b'].sum() >>> 1000 loops, best of 3: 1.29 ms per loop >>> >>> In [39]: %timeit aligned_arr['b'].sum() >>> 1000 loops, best of 3: 1.14 ms per loop >> >> On my machine: >> >> In [14]: %timeit baligned.sum() >> 1000 loops, best of 3: 1.03 ms per loop >> >> In [15]: %timeit bpacked.sum() >> 100 loops, best of 3: 3.79 ms per loop >> >> Again, the 4x slowdown is here. Using numexpr: >> >> In [16]: %timeit numexpr.evaluate('sum(baligned)') >> 100 loops, best of 3: 2.16 ms per loop >> >> In [17]: %timeit numexpr.evaluate('sum(bpacked)') >> 100 loops, best of 3: 2.08 ms per loop > > And with Theano: > > In [26]: f2 = theano.function([a], a.sum()) > > In [27]: %timeit f2(baligned) > 100 loops, best of 3: 2.52 ms per loop > > In [28]: %timeit f2(bpacked) > 100 loops, best of 3: 7.43 ms per loop > > Again, the unaligned case is significantly slower (as much as 3x here!). 
> > -- > Francesc Alted From ben.root at ou.edu Thu Mar 7 14:14:28 2013 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 7 Mar 2013 14:14:28 -0500 Subject: [Numpy-discussion] feature tracking in numpy/scipy In-Reply-To: <4CDBEE7F-FF2B-43DD-BFA6-AFF80D5364B8@gmail.com> References: <4CDBEE7F-FF2B-43DD-BFA6-AFF80D5364B8@gmail.com> Message-ID: On Sat, Mar 2, 2013 at 5:32 PM, Scott Collis wrote: > Good afternoon list, > I am looking at feature tracking in a 2D numpy array, along the lines of > Dixon and Wiener 1993 (for tracking precipitating storms) > > Identifying features based on a threshold is quite trivial using > ndimage.label > > b_fld=np.zeros(mygrid.fields['rain_rate_A']['data'].shape) > rr=10 > b_fld[mygrid.fields['rain_rate_A']['data'] > rr]=1.0 > labels, numobjects = ndimage.label(b_fld[0,0,:,:]) > (note mygrid.fields['rain_rate_A']['data'] has dimensions time, height, y, x) > > Using the matplotlib contouring and fetching the vertices I can get a nice > list of polygons of rain rate above a certain threshold... Now from here I > can just go and implement the Dixon and Wiener methodology, but I thought I > would check here first to see if anyone knows of an object/feature tracking > algorithm in numpy/scipy or using numpy arrays (it just seems like > something people would want to do!), i.e. something that looks back and > forward in time and identifies polygon movement and identifies objects with > temporal persistence. > > Cheers! > Scott > > Dixon, M., and G. Wiener, 1993: TITAN: Thunderstorm Identification, > Tracking, Analysis, and Nowcasting--A Radar-based Methodology. *Journal of > Atmospheric and Oceanic Technology*, *10*, 785-797, > doi:10.1175/1520-0426(1993)010<0785:TTITAA>2.0.CO;2. 
> > http://journals.ametsoc.org/doi/abs/10.1175/1520-0426%281993%29010%3C0785%3ATTITAA%3E2.0.CO%3B2 > > > Say hello to my PhD project: https://github.com/WeatherGod/ZigZag In it, I have the centroid-tracking portion of the TITAN code, along with SCIT, and hooks into MHT. Several of the dependencies are also available in my repositories. Cheers! Ben P.S. - I have personally met Dr. Dixon on multiple occasions and he is a great guy to work with. Feel free to email him or myself with questions about TITAN. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dagamayank at gmail.com Thu Mar 7 14:36:14 2013 From: dagamayank at gmail.com (Mayank Daga) Date: Thu, 7 Mar 2013 13:36:14 -0600 Subject: [Numpy-discussion] Definition of dot function Message-ID: Hi, Can someone point me to the definition of dot() in the numpy source? The only instance of 'def dot()' I found was in numpy/ma/extras.py but that does not seem to be the correct one. ~mayank -- Mayank Daga "Nothing Succeeds Like Success" -------------- next part -------------- An HTML attachment was scrubbed... URL: From heng at cantab.net Thu Mar 7 15:26:47 2013 From: heng at cantab.net (Henry Gomersall) Date: Thu, 07 Mar 2013 20:26:47 +0000 Subject: [Numpy-discussion] Definition of dot function In-Reply-To: References: Message-ID: <1362688007.3893.6.camel@farnsworth> On Thu, 2013-03-07 at 13:36 -0600, Mayank Daga wrote: > Can someone point me to the definition of dot() in the numpy source? > The only instance of 'def dot()' I found was in numpy/ma/extras.py but > that does not seem to be the correct one. It seems to be in a dynamic library. In [9]: numpy.dot.__module__ Out[9]: 'numpy.core.multiarray' In [10]: numpy.core.multiarray.__file__ Out[10]: '/usr/local/lib/python2.7/dist-packages/numpy/core/multiarray.so' so... in here perhaps? 
https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/multiarraymodule.c hen From njs at pobox.com Thu Mar 7 17:21:43 2013 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 7 Mar 2013 22:21:43 +0000 Subject: [Numpy-discussion] Definition of dot function In-Reply-To: <1362688007.3893.6.camel@farnsworth> References: <1362688007.3893.6.camel@farnsworth> Message-ID: On 7 Mar 2013 20:27, "Henry Gomersall" wrote: > > On Thu, 2013-03-07 at 13:36 -0600, Mayank Daga wrote: > > Can someone point me to the definition of dot() in the numpy source? > > The only instance of 'def dot()' I found was in numpy/ma/extras.py but > > that does not seem to be the correct one. > > It seems to be in a dynamic library. > > In [9]: numpy.dot.__module__ > Out[9]: 'numpy.core.multiarray' > > In [10]: numpy.core.multiarray.__file__ > Out[10]: > '/usr/local/lib/python2.7/dist-packages/numpy/core/multiarray.so' > > so... in here perhaps? > https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/multiarraymodule.c The actual entry point is array_matrixproduct in that file, which then calls PyArray_MatrixProduct2, which either does the work or dispatches through a dtype-specific function pointer ('dotfunc'). -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwmsmith at gmail.com Thu Mar 7 22:14:19 2013 From: kwmsmith at gmail.com (Kurt Smith) Date: Thu, 7 Mar 2013 21:14:19 -0600 Subject: [Numpy-discussion] aligned / unaligned structured dtype behavior (was: GSOC 2013) In-Reply-To: <5138D2A0.3080802@continuum.io> References: <5138D2A0.3080802@continuum.io> Message-ID: On Thu, Mar 7, 2013 at 11:47 AM, Francesc Alted wrote: > On 3/6/13 7:42 PM, Kurt Smith wrote: > > Hmm, that clearly depends on the architecture. On my machine: > ... > That is, the unaligned column is 4x slower (!). numexpr allows somewhat > better results: > ... > Yes, in this case, the unaligned array goes faster (as much as 30%). 
I > think the reason is that numexpr optimizes the unaligned access by doing > a copy of the different chunks in internal buffers that fits in L1 > cache. Apparently this is very beneficial in this case (not sure why, > though). > > On my machine: > ... > Again, the 4x slowdown is here. Using numexpr: > ... > Again, the unaligned case is (sligthly better). In this case numexpr is > a bit slower that NumPy because sum() is not parallelized internally. > Hmm, provided that, I'm wondering if some internal copies to L1 in NumPy > could help improving unaligned performance. Worth a try? > Very interesting -- thanks for sharing. > -- > Francesc Alted From kwmsmith at gmail.com Thu Mar 7 22:28:22 2013 From: kwmsmith at gmail.com (Kurt Smith) Date: Thu, 7 Mar 2013 21:28:22 -0600 Subject: [Numpy-discussion] aligned / unaligned structured dtype behavior In-Reply-To: References: <5138D2A0.3080802@continuum.io> <5138D70B.2030606@continuum.io> Message-ID: On Thu, Mar 7, 2013 at 12:26 PM, Fr?d?ric Bastien wrote: > Hi, > > It is normal that unaligned access are slower. The hardware have been > optimized for aligned access. So this is a user choice space vs speed. The quantitative difference is still important, so this thread is useful for future reference, I think. If reading in data into a packed array is 3x faster than reading into an aligned array, but the core computation is 4x slower with a packed array...you get the idea. I would have benefitted years ago knowing (1) numpy structured dtypes are packed by default, and (2) computations with unaligned data can be several factors slower than aligned. That's strong motivation to always make sure I'm using 'aligned=True' except when memory usage is an issue, or for file IO with packed binary data, etc. > We can't go around that. We can only minimize the cost of unaligned > access in some cases, but not all and those optimization depend of the > CPU. But newer CPU have lowered in cost of unaligned access. 
> > I'm surprised that Theano worked with the unaligned input. I added > some check to make this raise an error, as we do not support that! > Francesc, can you check if Theano give the good result? It is possible > that someone (maybe me), just copy the input to an aligned ndarray > when we receive an not aligned one. That could explain why it worked, > but my memory tell me that we raise an error. > > As you saw in the number, this is a bad example for Theano as the > function compiled is too fast . Their is more Theano overhead then > computation time in that example. We have reduced recently the > overhead, but we can do more to lower it. > > Fred > > On Thu, Mar 7, 2013 at 1:06 PM, Francesc Alted wrote: >> On 3/7/13 6:47 PM, Francesc Alted wrote: >>> On 3/6/13 7:42 PM, Kurt Smith wrote: >>>> And regarding performance, doing simple timings shows a 30%-ish >>>> slowdown for unaligned operations: >>>> >>>> In [36]: %timeit packed_arr['b']**2 >>>> 100 loops, best of 3: 2.48 ms per loop >>>> >>>> In [37]: %timeit aligned_arr['b']**2 >>>> 1000 loops, best of 3: 1.9 ms per loop >>> >>> Hmm, that clearly depends on the architecture. On my machine: >>> >>> In [1]: import numpy as np >>> >>> In [2]: aligned_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=True) >>> >>> In [3]: packed_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=False) >>> >>> In [4]: aligned_arr = np.ones((10**6,), dtype=aligned_dt) >>> >>> In [5]: packed_arr = np.ones((10**6,), dtype=packed_dt) >>> >>> In [6]: baligned = aligned_arr['b'] >>> >>> In [7]: bpacked = packed_arr['b'] >>> >>> In [8]: %timeit baligned**2 >>> 1000 loops, best of 3: 1.96 ms per loop >>> >>> In [9]: %timeit bpacked**2 >>> 100 loops, best of 3: 7.84 ms per loop >>> >>> That is, the unaligned column is 4x slower (!). 
numexpr allows >>> somewhat better results: >>> >>> In [11]: %timeit numexpr.evaluate('baligned**2') >>> 1000 loops, best of 3: 1.13 ms per loop >>> >>> In [12]: %timeit numexpr.evaluate('bpacked**2') >>> 1000 loops, best of 3: 865 us per loop >> >> Just for completeness, here it is what Theano gets: >> >> In [18]: import theano >> >> In [20]: a = theano.tensor.vector() >> >> In [22]: f = theano.function([a], a**2) >> >> In [23]: %timeit f(baligned) >> 100 loops, best of 3: 7.74 ms per loop >> >> In [24]: %timeit f(bpacked) >> 100 loops, best of 3: 12.6 ms per loop >> >> So yeah, Theano is also slower for the unaligned case (but less than 2x >> in this case). >> >>> >>> Yes, in this case, the unaligned array goes faster (as much as 30%). >>> I think the reason is that numexpr optimizes the unaligned access by >>> doing a copy of the different chunks in internal buffers that fits in >>> L1 cache. Apparently this is very beneficial in this case (not sure >>> why, though). >>> >>>> >>>> Whereas summing shows just a 10%-ish slowdown: >>>> >>>> In [38]: %timeit packed_arr['b'].sum() >>>> 1000 loops, best of 3: 1.29 ms per loop >>>> >>>> In [39]: %timeit aligned_arr['b'].sum() >>>> 1000 loops, best of 3: 1.14 ms per loop >>> >>> On my machine: >>> >>> In [14]: %timeit baligned.sum() >>> 1000 loops, best of 3: 1.03 ms per loop >>> >>> In [15]: %timeit bpacked.sum() >>> 100 loops, best of 3: 3.79 ms per loop >>> >>> Again, the 4x slowdown is here. Using numexpr: >>> >>> In [16]: %timeit numexpr.evaluate('sum(baligned)') >>> 100 loops, best of 3: 2.16 ms per loop >>> >>> In [17]: %timeit numexpr.evaluate('sum(bpacked)') >>> 100 loops, best of 3: 2.08 ms per loop >> >> And with Theano: >> >> In [26]: f2 = theano.function([a], a.sum()) >> >> In [27]: %timeit f2(baligned) >> 100 loops, best of 3: 2.52 ms per loop >> >> In [28]: %timeit f2(bpacked) >> 100 loops, best of 3: 7.43 ms per loop >> >> Again, the unaligned case is significantly slower (as much as 3x here!). 
>> >> -- >> Francesc Alted From francesc at continuum.io Fri Mar 8 05:22:20 2013 From: francesc at continuum.io (Francesc Alted) Date: Fri, 08 Mar 2013 11:22:20 +0100 Subject: [Numpy-discussion] aligned / unaligned structured dtype behavior In-Reply-To: References: <5138D2A0.3080802@continuum.io> <5138D70B.2030606@continuum.io> Message-ID: <5139BBDC.6090202@continuum.io> On 3/7/13 7:26 PM, Frédéric Bastien wrote: > Hi, > > It is normal that unaligned access are slower. The hardware have been > optimized for aligned access. So this is a user choice space vs speed. > We can't go around that. Well, my benchmarks apparently say that numexpr can get better performance when tackling computations on unaligned arrays (30% faster). This puzzled me a bit yesterday, but after thinking a bit about what was happening, the explanation is clear to me now. The aligned and unaligned arrays were not contiguous, as they had a gap between elements (a consequence of the layout of structured arrays): 8 bytes for the aligned case and 1 byte for the packed one. The hardware of modern machines fetches a complete cache line (64 bytes typically) whenever an element is accessed, and that means that, even though we are only making use of one field in the computations, both fields are brought into cache. That means that, for the aligned object, 16 MB (16 bytes * 1 million elements) are transmitted to the cache, while the unaligned object only has to transmit 9 MB (9 bytes * 1 million). Of course, transmitting 16 MB is considerably more work than just 9 MB. 
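The 16-byte vs. 9-byte element layouts are easy to verify, and the per-field timings can be reproduced with a script along these lines (the absolute numbers will of course vary by machine):

```python
import numpy as np
import timeit

aligned_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=True)
packed_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=False)
print(aligned_dt.itemsize, packed_dt.itemsize)   # 16 9

aligned_arr = np.ones(10**6, dtype=aligned_dt)
packed_arr = np.ones(10**6, dtype=packed_dt)
for name, arr in [("aligned", aligned_arr), ("packed", packed_arr)]:
    # Time squaring the 'b' field; only the layout differs between the two.
    t = timeit.timeit(lambda arr=arr: arr['b'] ** 2, number=20)
    print(f"{name}: {1000 * t / 20:.2f} ms per loop")
```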
Now, the elements land in cache aligned for the aligned case and unaligned for the packed case, and as you say, unaligned access in cache is pretty slow for the CPU, and this is the reason why NumPy can take up to 4x more time to perform the computation. So why is numexpr performing much better for the packed case? Well, it turns out that numexpr has machinery to detect that an array is unaligned, and does an internal copy for every block that is brought to the cache to be computed. This block size is between 1024 elements (8 KB for double precision) and 4096 elements when linked with VML support, and that means that a copy normally happens at L1 or L2 cache speed, which is much faster than a memory-to-memory copy. After the copy numexpr can perform operations with aligned data at full CPU speed. The paradox is that, by doing more copies, you may end up performing faster computations. This is the joy of programming with the memory hierarchy in mind. This is to say that there is more in the equation than just whether an array is aligned or not. You must take into account how (and how much!) data travels from storage to CPU before making assumptions on the performance of your programs. > We can only minimize the cost of unaligned > access in some cases, but not all and those optimization depend of the > CPU. But newer CPU have lowered in cost of unaligned access. > > I'm surprised that Theano worked with the unaligned input. I added > some check to make this raise an error, as we do not support that! > Francesc, can you check if Theano give the good result? It is possible > that someone (maybe me), just copy the input to an aligned ndarray > when we receive an not aligned one. That could explain why it worked, > but my memory tell me that we raise an error. 
It seems to work for me: In [10]: f = theano.function([a], a**2) In [11]: f(baligned) Out[11]: array([ 1., 1., 1., ..., 1., 1., 1.]) In [12]: f(bpacked) Out[12]: array([ 1., 1., 1., ..., 1., 1., 1.]) In [13]: f2 = theano.function([a], a.sum()) In [14]: f2(baligned) Out[14]: array(1000000.0) In [15]: f2(bpacked) Out[15]: array(1000000.0) > > As you saw in the number, this is a bad example for Theano as the > function compiled is too fast . Their is more Theano overhead then > computation time in that example. We have reduced recently the > overhead, but we can do more to lower it. Yeah. I was mainly curious about how different packages handle unaligned arrays. -- Francesc Alted From ondrej.certik at gmail.com Fri Mar 8 06:07:02 2013 From: ondrej.certik at gmail.com (Ondřej Čertík) Date: Fri, 8 Mar 2013 12:07:02 +0100 Subject: [Numpy-discussion] Scheduling the 1.7.1 and 1.8 releases In-Reply-To: References: Message-ID: On Wed, Mar 6, 2013 at 9:52 PM, Ralf Gommers wrote: > > > > On Wed, Mar 6, 2013 at 9:06 PM, Nathaniel Smith wrote: >> >> On Wed, Mar 6, 2013 at 6:43 PM, Charles R Harris >> wrote: >> > Hi All, >> > >> > There are now some 14 non-merge commits in the 1.7.x branch including >> > the >> > critical diagonal leak fix. I think there is maybe one more critical >> > backport and perhaps several low priority fixes, documentation and such, >> > but >> > I think we should start up the release process with a goal of getting >> > 1.7.1 >> > out by the middle of April. 
Other than that, +1 > for a quick release. > > >> >> > The development branch has been accumulating stuff since last summer, I >> > suggest we look to get it out in May, branching at the end of this >> > month. >> >> I would say "let's fix the blockers and then branch as soon as Ondrej >> has time to do it", but in practice I suspect this comes out the same >> as what you just said :-). I just pruned the list of blockers; here's >> what we've got: >> https://github.com/numpy/numpy/issues?milestone=1&page=1&state=open > > > It looks like we're not doing so well with setting Milestones correctly. > Only 4 closed issues for 1.8.... > > Release quickly after 1.7.1 sounds good. I hope to finish the rest of issues for 1.7.1 today or tomorrow. Should I release 1.7.1rc1 first? I think that makes sense, just to be sure, right? Ondrej From mdroe at stsci.edu Fri Mar 8 09:33:24 2013 From: mdroe at stsci.edu (Michael Droettboom) Date: Fri, 8 Mar 2013 09:33:24 -0500 Subject: [Numpy-discussion] SciPy John Hunter Excellence in Plotting Contest In-Reply-To: <513928AF.7010201@stsci.edu> References: <513928AF.7010201@stsci.edu> Message-ID: <5139F6B4.2010800@stsci.edu> Apologies for any accidental cross-posting. Email not displaying correctly? View it in your browser. Scientific Computing with Python-Austin, Texas-June 24-29, 2013 SciPy John Hunter Excellence in Plotting Contest In memory of John Hunter, we are pleased to announce the first SciPy John Hunter Excellence in Plotting Competition. This open competition aims to highlight the importance of quality plotting to scientific progress and showcase the capabilities of the current generation of plotting software. Participants are invited to submit scientific plots to be judged by a panel. The winning entries will be announced and displayed at the conference. 
NumFOCUS is graciously sponsoring cash prizes for the winners in the following amounts: * 1st prize: $500 * 2nd prize: $200 * 3rd prize: $100 Instructions * Entries must be submitted by April 3 via e-mail. * Plots may be produced with any combination of Python-based tools (it is not required that they use matplotlib, for example). * Source code for the plot must be provided, along with a rendering of the plot in a vector format (PDF, PS, etc.). If the data cannot be shared for reasons of size or licensing, "fake" data may be substituted, along with an image of the plot using real data. * Entries will be judged on their clarity, innovation and aesthetics, but most importantly for their effectiveness in illuminating real scientific work. Entrants are encouraged to submit plots that were used during the course of research, rather than merely being hypothetical. * SciPy reserves the right to display the entry at the conference and use it in any materials or on its website, with attribution to the original author(s). Important dates: * April 3rd: Plotting submissions due * Monday-Tuesday, June 24 - 25: SciPy 2013 Tutorials, Austin TX * Wednesday-Thursday, June 26 - 27: SciPy 2013 Conference, Austin TX * Winners will be announced during the conference days * Friday-Saturday, June 28 - 29: SciPy 2013 Sprints, Austin TX & remote We look forward to exciting submissions that push the boundaries of plotting, in this, our first attempt at this kind of competition. The SciPy Plotting Contest Organizer -Michael Droettboom, Space Telescope Science Institute You are receiving this email because you subscribed to the mailing list or registered for the SciPy 2010 or SciPy 2011 conference in Austin, TX. *Our mailing address is:* Enthought, Inc. 515 Congress Ave. Austin, TX 78701 /Copyright (C) 2013 Enthought, Inc.
All rights reserved./ -------------- next part -------------- An HTML attachment was scrubbed... URL: From nouiz at nouiz.org Fri Mar 8 10:16:43 2013 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Fri, 8 Mar 2013 10:16:43 -0500 Subject: [Numpy-discussion] aligned / unaligned structured dtype behavior In-Reply-To: <5139BBDC.6090202@continuum.io> References: <5138D2A0.3080802@continuum.io> <5138D70B.2030606@continuum.io> <5139BBDC.6090202@continuum.io> Message-ID: On Fri, Mar 8, 2013 at 5:22 AM, Francesc Alted wrote: > On 3/7/13 7:26 PM, Frédéric Bastien wrote: >> I'm surprised that Theano worked with the unaligned input. I added >> some checks to make this raise an error, as we do not support that! >> Francesc, can you check if Theano gives the right result? It is possible >> that someone (maybe me) just copies the input to an aligned ndarray >> when we receive a non-aligned one. That could explain why it worked, >> but my memory tells me that we raise an error. > > It seems to work for me: > > In [10]: f = theano.function([a], a**2) > > In [11]: f(baligned) > Out[11]: array([ 1., 1., 1., ..., 1., 1., 1.]) > > In [12]: f(bpacked) > Out[12]: array([ 1., 1., 1., ..., 1., 1., 1.]) > > In [13]: f2 = theano.function([a], a.sum()) > > In [14]: f2(baligned) > Out[14]: array(1000000.0) > > In [15]: f2(bpacked) > Out[15]: array(1000000.0) I understand what happens. You declare the symbolic variable like this: a = theano.tensor.vector() This creates a symbolic variable with dtype floatX, which is float64 by default. baligned and bpacked are of dtype int64. When a Theano function receives as input an ndarray of the wrong dtype, we try to cast it to the expected dtype and check that we don't lose precision. As the inputs are all 1s, there is no loss of precision, so the input is silently accepted and copied. So when we later check the aligned flag, it passes.
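This cast-and-copy behaviour can be reproduced with plain NumPy alone. A minimal sketch (not Theano's actual code path; the variable names are made up):

```python
import numpy as np

# A packed struct: the 8-byte 'b' field starts at offset 1, so the
# column view is unaligned on most platforms.
packed_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=False)
arr = np.zeros(10, dtype=packed_dt)
arr['b'] = 1
bpacked = arr['b']
print(bpacked.flags.aligned)        # False

# Casting int64 -> float64 (what the dtype mismatch triggers) allocates
# a fresh, well-behaved array, so a later alignment check passes.
as_floatx = bpacked.astype(np.float64)
print(as_floatx.flags.aligned)      # True
```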
If you change the symbolic variable to have a dtype of int64, there won't be a copy and we will see the error: a = theano.tensor.lvector() f = theano.function([a], a ** 2) f(bpacked) TypeError: ('Bad input argument to theano function at index 0(0-based)', 'The numpy.ndarray object is not aligned. Theano C code does not support that.', '', 'object shape', (1000000,), 'object strides', (9,)) If I now time this new function, I get: In [14]: timeit baligned**2 100 loops, best of 3: 7.5 ms per loop In [15]: timeit bpacked**2 100 loops, best of 3: 8.25 ms per loop In [16]: timeit f(baligned) 100 loops, best of 3: 7.36 ms per loop So the Theano overhead was the copy in this case. It is not the first time I have seen this. We added the automatic cast to allow specifying most Python ints/lists/reals as input. Fred From nouiz at nouiz.org Fri Mar 8 10:18:10 2013 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Fri, 8 Mar 2013 10:18:10 -0500 Subject: [Numpy-discussion] aligned / unaligned structured dtype behavior In-Reply-To: References: <5138D2A0.3080802@continuum.io> <5138D70B.2030606@continuum.io> Message-ID: I agree that documenting this better would be useful to many people. So if someone wants to summarize this and put it in the docs, I think many people will appreciate it. Fred On Thu, Mar 7, 2013 at 10:28 PM, Kurt Smith wrote: > On Thu, Mar 7, 2013 at 12:26 PM, Frédéric Bastien wrote: >> Hi, >> >> It is normal that unaligned accesses are slower. The hardware has been >> optimized for aligned access. So this is a user choice: space vs. speed. > > The quantitative difference is still important, so this thread is > useful for future reference, I think. If reading data into a > packed array is 3x faster than reading into an aligned array, but the > core computation is 4x slower with a packed array...you get the idea.
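For reference, the layout difference behind this space-vs-speed trade-off can be inspected directly. A small sketch using the standard dtype API (the variable names are made up):

```python
import numpy as np

aligned_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=True)
packed_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=False)  # the default

# align=True pads after 'a' so that 'b' starts on an 8-byte boundary.
print(packed_dt.itemsize, packed_dt.fields['b'][1])    # 9 1
print(aligned_dt.itemsize, aligned_dt.fields['b'][1])  # 16 8

# The padding is what buys the fast path for computations on 'b'...
print(np.zeros(10, dtype=aligned_dt)['b'].flags.aligned)  # True
print(np.zeros(10, dtype=packed_dt)['b'].flags.aligned)   # False
# ...at the price of 7 extra bytes per record when reading/writing.
```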
> > I would have benefited years ago from knowing (1) numpy structured dtypes > are packed by default, and (2) computations with unaligned data can be > several factors slower than with aligned data. That's strong motivation to > always make sure I'm using 'aligned=True' except when memory usage is > an issue, or for file IO with packed binary data, etc. > >> We can't get around that. We can only minimize the cost of unaligned >> access in some cases, but not all, and those optimizations depend on the >> CPU. But newer CPUs have lowered the cost of unaligned access. >> >> I'm surprised that Theano worked with the unaligned input. I added >> some checks to make this raise an error, as we do not support that! >> Francesc, can you check if Theano gives the right result? It is possible >> that someone (maybe me) just copies the input to an aligned ndarray >> when we receive a non-aligned one. That could explain why it worked, >> but my memory tells me that we raise an error. >> >> As you saw in the numbers, this is a bad example for Theano, as the >> compiled function is too fast. There is more Theano overhead than >> computation time in that example. We have recently reduced the >> overhead, but we can do more to lower it. >> >> Fred >> >> On Thu, Mar 7, 2013 at 1:06 PM, Francesc Alted wrote: >>> On 3/7/13 6:47 PM, Francesc Alted wrote: >>>> On 3/6/13 7:42 PM, Kurt Smith wrote: >>>>> And regarding performance, doing simple timings shows a 30%-ish >>>>> slowdown for unaligned operations: >>>>> >>>>> In [36]: %timeit packed_arr['b']**2 >>>>> 100 loops, best of 3: 2.48 ms per loop >>>>> >>>>> In [37]: %timeit aligned_arr['b']**2 >>>>> 1000 loops, best of 3: 1.9 ms per loop >>>> >>>> Hmm, that clearly depends on the architecture.
On my machine: >>>> >>>> In [1]: import numpy as np >>>> >>>> In [2]: aligned_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=True) >>>> >>>> In [3]: packed_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=False) >>>> >>>> In [4]: aligned_arr = np.ones((10**6,), dtype=aligned_dt) >>>> >>>> In [5]: packed_arr = np.ones((10**6,), dtype=packed_dt) >>>> >>>> In [6]: baligned = aligned_arr['b'] >>>> >>>> In [7]: bpacked = packed_arr['b'] >>>> >>>> In [8]: %timeit baligned**2 >>>> 1000 loops, best of 3: 1.96 ms per loop >>>> >>>> In [9]: %timeit bpacked**2 >>>> 100 loops, best of 3: 7.84 ms per loop >>>> >>>> That is, the unaligned column is 4x slower (!). numexpr allows >>>> somewhat better results: >>>> >>>> In [11]: %timeit numexpr.evaluate('baligned**2') >>>> 1000 loops, best of 3: 1.13 ms per loop >>>> >>>> In [12]: %timeit numexpr.evaluate('bpacked**2') >>>> 1000 loops, best of 3: 865 us per loop >>> >>> Just for completeness, here is what Theano gets: >>> >>> In [18]: import theano >>> >>> In [20]: a = theano.tensor.vector() >>> >>> In [22]: f = theano.function([a], a**2) >>> >>> In [23]: %timeit f(baligned) >>> 100 loops, best of 3: 7.74 ms per loop >>> >>> In [24]: %timeit f(bpacked) >>> 100 loops, best of 3: 12.6 ms per loop >>> >>> So yeah, Theano is also slower for the unaligned case (but less than 2x >>> in this case). >>> >>>> >>>> Yes, in this case, the unaligned array goes faster (as much as 30%). >>>> I think the reason is that numexpr optimizes the unaligned access by >>>> doing a copy of the different chunks into internal buffers that fit in >>>> the L1 cache. Apparently this is very beneficial in this case (not sure >>>> why, though).
>>>> >>>>> >>>>> Whereas summing shows just a 10%-ish slowdown: >>>>> >>>>> In [38]: %timeit packed_arr['b'].sum() >>>>> 1000 loops, best of 3: 1.29 ms per loop >>>>> >>>>> In [39]: %timeit aligned_arr['b'].sum() >>>>> 1000 loops, best of 3: 1.14 ms per loop >>>> >>>> On my machine: >>>> >>>> In [14]: %timeit baligned.sum() >>>> 1000 loops, best of 3: 1.03 ms per loop >>>> >>>> In [15]: %timeit bpacked.sum() >>>> 100 loops, best of 3: 3.79 ms per loop >>>> >>>> Again, the 4x slowdown is here. Using numexpr: >>>> >>>> In [16]: %timeit numexpr.evaluate('sum(baligned)') >>>> 100 loops, best of 3: 2.16 ms per loop >>>> >>>> In [17]: %timeit numexpr.evaluate('sum(bpacked)') >>>> 100 loops, best of 3: 2.08 ms per loop >>> >>> And with Theano: >>> >>> In [26]: f2 = theano.function([a], a.sum()) >>> >>> In [27]: %timeit f2(baligned) >>> 100 loops, best of 3: 2.52 ms per loop >>> >>> In [28]: %timeit f2(bpacked) >>> 100 loops, best of 3: 7.43 ms per loop >>> >>> Again, the unaligned case is significantly slower (as much as 3x here!). >>> >>> -- >>> Francesc Alted >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From mdroe at stsci.edu Thu Mar 7 18:54:23 2013 From: mdroe at stsci.edu (Michael Droettboom) Date: Thu, 7 Mar 2013 18:54:23 -0500 Subject: [Numpy-discussion] SciPy John Hunter Excellence in Plotting Contest In-Reply-To: References: Message-ID: <513928AF.7010201@stsci.edu> Apologies for any accidental cross-posting. Email not displaying correctly? View it in your browser. 
-- Michael Droettboom http://www.droettboom.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sergio.callegari at gmail.com Fri Mar 8 11:23:33 2013 From: sergio.callegari at gmail.com (Sergio Callegari) Date: Fri, 8 Mar 2013 16:23:33 +0000 (UTC) Subject: [Numpy-discussion] Casting and promotion rules (e.g. int + uint64 => float) Message-ID: Hi, I have noticed that numpy introduces some unexpected type casts that are in some cases problematic. A very weird cast is int + uint64 -> float. For instance, consider the following snippet: import numpy as np a=np.uint64(1) a+1 -> 2.0 This cast is quite different from what other programming languages (e.g., C) would do in this case, so it is already unexpected. Furthermore, an int64 (or a uint64) is too large to fit into a float without loss of precision, hence this automatic conversion also results in data loss! For instance, consider: a=np.uint64(18446744073709551614) a+np.uint64(1) -> 18446744073709551615 # CORRECT!
a+1 -> 1.8446744073709552e+19 # Actually 1.84467440737095516160e+19 - LOSS OF DATA in fact np.uint64(a+1) -> 0 Weird, isn't it? Another issue is that variables unexpectedly change type with accumulation operators a=np.uint64(1) a+=1 now a is float I believe that some casting/promotion rules should be revised, since they now lead to difficult to catch, intermittent errors. In case this cannot be done immediately, I suggest at least documenting these promotions, providing examples on how to code many conventional tasks. E.g., incrementing an integer of unknown size b=a+type(a)(1) I have also reported this in https://github.com/numpy/numpy/issues/3118 Thanks! From pelson.pub at gmail.com Fri Mar 8 12:38:23 2013 From: pelson.pub at gmail.com (Phil Elson) Date: Fri, 8 Mar 2013 17:38:23 +0000 Subject: [Numpy-discussion] Implementing a "find first" style function In-Reply-To: References: Message-ID: Interesting. I hadn't thought of those. I've implemented (very roughly without a sound logic check) and benchmarked: def my_any(a, predicate, chunk_size=2048): try: next(find(a, predicate, chunk_size)) return True except StopIteration: return False def my_all(a, predicate, chunk_size=2048): return not my_any(a, lambda a: ~predicate(a), chunk_size) With the following setup: import numpy as np import numpy.random np.random.seed(1) a = np.random.randn(1e8) For a low frequency *any*: In [12]: %timeit (np.abs(a) > 6).any() 1 loops, best of 3: 1.29 s per loop In [13]: %timeit my_any(a, lambda a: np.abs(a) > 6) 1 loops, best of 3: 792 ms per loop In [14]: %timeit my_any(a, lambda a: np.abs(a) > 6, chunk_size=10000) 1 loops, best of 3: 654 ms per loop For a False *any*: In [16]: %timeit (np.abs(a) > 7).any() 1 loops, best of 3: 1.22 s per loop In [17]: %timeit my_any(a, lambda a: np.abs(a) > 7) 1 loops, best of 3: 2.4 s per loop For a high probability *any*: In [28]: %timeit (np.abs(a) > 1).any() 1 loops, best of 3: 972 ms per loop In [27]: %timeit my_any(a, lambda a: np.abs(a) > 1) 
10000 loops, best of 3: 67 us per loop --------------- For a low probability *all*: In [18]: %timeit (np.abs(a) < 6).all() 1 loops, best of 3: 1.16 s per loop In [19]: %timeit my_all(a, lambda a: np.abs(a) < 6) 1 loops, best of 3: 880 ms per loop In [20]: %timeit my_all(a, lambda a: np.abs(a) < 6, chunk_size=10000) 1 loops, best of 3: 706 ms per loop For a True *all*: In [22]: %timeit (np.abs(a) < 7).all() 1 loops, best of 3: 1.47 s per loop In [23]: %timeit my_all(a, lambda a: np.abs(a) < 7) 1 loops, best of 3: 2.65 s per loop For a high probability *all*: In [25]: %timeit (np.abs(a) < 1).all() 1 loops, best of 3: 978 ms per loop In [26]: %timeit my_all(a, lambda a: np.abs(a) < 1) 10000 loops, best of 3: 73.6 us per loop On 6 March 2013 21:16, Benjamin Root wrote: > > > On Tue, Mar 5, 2013 at 9:15 AM, Phil Elson wrote: > >> The ticket https://github.com/numpy/numpy/issues/2269 discusses the >> possibility of implementing a "find first" style function which can >> optimise the process of finding the first value(s) which match a predicate >> in a given 1D array. For example: >> >> >> >>> a = np.sin(np.linspace(0, np.pi, 200)) >> >>> print find_first(a, lambda a: a > 0.9) >> ((71, ), 0.900479032457) >> >> >> This has been discussed in several locations: >> >> https://github.com/numpy/numpy/issues/2269 >> https://github.com/numpy/numpy/issues/2333 >> >> http://stackoverflow.com/questions/7632963/numpy-array-how-to-find-index-of-first-occurrence-of-item >> >> >> *Rationale* >> >> For small arrays there is no real reason to avoid doing: >> >> >>> a = np.sin(np.linspace(0, np.pi, 200)) >> >>> ind = (a > 0.9).nonzero()[0][0] >> >>> print (ind, ), a[ind] >> (71,) 0.900479032457 >> >> >> But for larger arrays, this can lead to massive amounts of work even if >> the result is one of the first to be computed. 
Example: >> >>> a = np.arange(1e8) >> >>> print (a == 5).nonzero()[0][0] >> 5 >> >> >> So a function which terminates when the first matching value is found is >> desirable. >> >> As mentioned in #2269, it is possible to define a consistent ordering >> which allows this functionality for >1D arrays, but IMHO it overcomplicates >> the problem and was not a case that I personally needed, so I've limited >> the scope to 1D arrays only. >> >> >> *Implementation* >> >> My initial assumption was that to get any kind of performance I would >> need to write the *find* function in C; however, after prototyping with >> some array chunking it became apparent that a trivial Python function would >> be quick enough for my needs. >> >> The approach I've implemented in the code found in #2269 simply breaks >> the array into sub-arrays of maximum length *chunk_size* (2048 by >> default, though there is no real science to this number), applies the given >> predicate function, and yields the results from *nonzero()*. The given >> function should be a Python function which operates on the whole of the >> sub-array element-wise (i.e. the function should be vectorized). Returning >> a generator also has the benefit of allowing users to get the first *n* matching values/indices.
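(The code itself lives in the linked comment on #2269; reconstructed from the description above, the chunked generator looks roughly like this — a sketch, not the exact implementation:)

```python
import numpy as np

def find(a, predicate, chunk_size=2048):
    """Yield ((index,), value) pairs where `predicate` (a vectorized
    function) is True, scanning the 1D array `a` one chunk at a time
    so the caller can stop as soon as the first match is produced."""
    for start in range(0, len(a), chunk_size):
        chunk = a[start:start + chunk_size]
        for idx in predicate(chunk).nonzero()[0]:
            yield (start + idx,), chunk[idx]

# A smaller array than the 1e8-element example, to keep the sketch light:
a = np.arange(1e6)
(ind,), val = next(find(a, lambda x: x == 5))
print(ind, val)  # 5 5.0 -- found after scanning a single chunk
```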
>> >> >> *Results* >> >> >> I timed the implementation of *find* found in my comment at >> https://github.com/numpy/numpy/issues/2269#issuecomment-14436725 with an >> obvious test: >> >> >> In [1]: from np_utils import find >> >> In [2]: import numpy as np >> >> In [3]: import numpy.random >> >> In [4]: np.random.seed(1) >> >> In [5]: a = np.random.randn(1e8) >> >> In [6]: a.min(), a.max() >> Out[6]: (-6.1194900990552776, 5.9632246301166321) >> >> In [7]: next(find(a, lambda a: np.abs(a) > 6)) >> Out[7]: ((33105441,), -6.1194900990552776) >> >> In [8]: (np.abs(a) > 6).nonzero() >> Out[8]: (array([33105441]),) >> >> In [9]: %timeit (np.abs(a) > 6).nonzero() >> 1 loops, best of 3: 1.51 s per loop >> >> In [10]: %timeit next(find(a, lambda a: np.abs(a) > 6)) >> 1 loops, best of 3: 912 ms per loop >> >> In [11]: %timeit next(find(a, lambda a: np.abs(a) > 6, chunk_size=100000)) >> 1 loops, best of 3: 470 ms per loop >> >> In [12]: %timeit next(find(a, lambda a: np.abs(a) > 6, >> chunk_size=1000000)) >> 1 loops, best of 3: 483 ms per loop >> >> >> This shows that picking a sensible *chunk_size* can yield massive >> speed-ups (nonzero is 3x slower in one case). A similar example with a much >> smaller 1D array shows similar promise: >> >> In [41]: a = np.random.randn(1e4) >> >> In [42]: %timeit next(find(a, lambda a: np.abs(a) > 3)) >> 10000 loops, best of 3: 35.8 us per loop >> >> In [43]: %timeit (np.abs(a) > 3).nonzero() >> 10000 loops, best of 3: 148 us per loop >> >> >> As I commented on the issue tracker, if you think this function is worth >> taking forward, I'd be happy to open up a pull request. >> >> Feedback gratefully received. >> >> Cheers, >> >> Phil >> >> >> > In the interest of generalizing code and such, could such approaches be > used for functions like np.any() and np.all() for short-circuiting if True > or False (respectively) are found? I wonder what other sort of functions > in NumPy might benefit from this?
> > Ben Root > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Mar 8 17:23:11 2013 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 8 Mar 2013 22:23:11 +0000 Subject: [Numpy-discussion] Scheduling the 1.7.1 and 1.8 releases In-Reply-To: References: Message-ID: On Fri, Mar 8, 2013 at 11:07 AM, Ond?ej ?ert?k wrote: > I hope to finish the rest of issues for 1.7.1 today or tomorrow. > Should I release 1.7.1rc1 first? I think that makes sense, just to be > sure, right? Big +1 to doing an RC from me. I guess conceptually this is like we just jumped back in time to right before we released 1.7.0, and merged a bunch more bug-fixes. We'd definitely have done another RC for the new changes then, so we should do one now too :-). -n From sebastian at sipsolutions.net Sat Mar 9 11:17:39 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sat, 09 Mar 2013 17:17:39 +0100 Subject: [Numpy-discussion] Compile time flag for numpy Message-ID: <1362845859.15128.2.camel@sebastian-laptop> Hey, how would I go about making a compile time flag for numpy to use as a macro? The reason is: https://github.com/numpy/numpy/pull/2735 so that it would be possible to compile numpy differently for debugging if software depending on numpy is broken by this change. 
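(For what it's worth, the usual pattern for this — read the environment variable in the build script and turn it into a C macro via `define_macros` — might look like the following sketch; the variable name here is made up for illustration, and this is not numpy's actual build code:)

```python
import os

def build_define_macros(env=None):
    """Translate a build-time environment variable into (name, value)
    pairs suitable for distutils' Extension(define_macros=...)."""
    if env is None:
        env = os.environ
    macros = []
    # If the flag is set when the build runs, the C sources will see a
    # corresponding preprocessor macro that they can #ifdef on.
    if env.get("NPY_DEBUG_COMPAT", "0") == "1":
        macros.append(("NPY_DEBUG_COMPAT", "1"))
    return macros

# In setup.py this would feed the extension definition, e.g.:
#   Extension('multiarray', sources=[...], define_macros=build_define_macros())
print(build_define_macros({"NPY_DEBUG_COMPAT": "1"}))  # [('NPY_DEBUG_COMPAT', '1')]
```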
Regards, Sebastian From sebastian at sipsolutions.net Sat Mar 9 11:30:51 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sat, 09 Mar 2013 17:30:51 +0100 Subject: [Numpy-discussion] Compile time flag for numpy In-Reply-To: <1362845859.15128.2.camel@sebastian-laptop> References: <1362845859.15128.2.camel@sebastian-laptop> Message-ID: <1362846651.15128.3.camel@sebastian-laptop> On Sat, 2013-03-09 at 17:17 +0100, Sebastian Berg wrote: > Hey, > > how would I go about making a compile time flag for numpy to use as a > macro? > To be clear I mean an environment variable. > The reason is: https://github.com/numpy/numpy/pull/2735 > > so that it would be possible to compile numpy differently for debugging > if software depending on numpy is broken by this change. > > Regards, > > Sebastian > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From siu at continuum.io Sun Mar 10 14:12:27 2013 From: siu at continuum.io (Siu Kwan Lam) Date: Sun, 10 Mar 2013 13:12:27 -0500 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit? Message-ID: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Hi all, I am redirecting a discussion on github issue tracker here. My original post (https://github.com/numpy/numpy/issues/3137): "The current implementation of the RNG seems to be MT19937-32. Since 64-bit machines are common nowadays, I am suggesting adding or upgrading to MT19937-64. Thoughts?" Let me start by answering to njsmith's comments on the issue tracker: > Would it be faster? Although I have not benchmarked the 64-bit implementation, it is likely that it will be faster on a 64-bit machine since the number of iteration (controlled by NN and MM in the reference implementation http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/VERSIONS/C-LANG/mt19937-64.c) is reduced by half. 
In addition, each generation in the 64-bit implementation produces a 64-bit random int which can be used to generate a double-precision random number, unlike the 32-bit implementation, which requires generating a pair of 32-bit random ints. But, on a 32-bit machine, a 64-bit instruction is translated into 4 32-bit instructions; thus, it is likely to be slower. (1) > Use less memory? The amount of memory used will remain the same. The size of the RNG state is the same. > Provide higher quality randomness? My naive answer is that the 32-bit and 64-bit implementations have the same 2^19937-1 period. Need to do some research and experiments. > Would it change the output of this program: > import numpy > numpy.random.seed(0) > print numpy.random.random() > ? Unfortunately, yes. The 64-bit implementation generates a different random number sequence with the same seed. (2) My suggestion to overcome (1) and (2) is to allow the user to select between the two implementations (and possibly different algorithms in the future). If the user does not provide a choice, we use MT19937-32 by default. numpy.random.set_state("MT19937_64", ?) # choose the 64-bit implementation Thoughts? Best, Siu -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdirective at gmail.com Sun Mar 10 21:24:59 2013 From: rdirective at gmail.com (QT) Date: Sun, 10 Mar 2013 20:24:59 -0500 Subject: [Numpy-discussion] Numpy 1.7.0 with Intel MKL 11.0.2.146 Message-ID: Dear all, I'm at my wits' end. I've followed Intel's own instructions on how to compile Numpy with Intel MKL. Everything compiled and linked fine and I've installed it locally in my user folder... There is one nasty problem. When one calls the numpy library to do some computation, it does not use all of the available threads. I have 8 "cores" on my machine and it only uses 4 of them. The MKL_NUM_THREADS environment variable can be set to tune the number of threads but setting it to 8 does not change anything.
Indeed, setting it to 3 does limit the threads to 3....What is going on? As a comparison, the numpy (version 1.4.1, installed from yum, which uses BLAS+ATLAS) uses all 8 threads. I do not get this. You can run this test program python -mtimeit -s'import numpy as np; a = np.random.randn(1e3,1e3)' 'np.dot(a, a)' There is one saving grace, the local numpy built with MKL is much faster than the system's numpy. I hope someone can help me. Searching the internet has been fruitless. Best, Quyen My site.cfg for numpy (1.7.0) [mkl] library_dirs = /opt/intel/mkl/lib/intel64 include_dirs = /opt/intel/mkl/include mkl_libs = mkl_rt lapack_libs = I've edited line 37 of numpy/distutils/intelcompiler.py self.cc_exe = 'icc -O3 -fPIC -fp-model strict -fomit-frame-pointer -openmp -parallel -DMKL_ILP64' Also line 54 of numpy/distutils/fcompiler/intel.py return ['-i8 -xhost -openmp -fp-model strict'] My .bash_profile also contains the lines: source /opt/intel/bin/compilervars.sh intel64 source /opt/intel/mkl/bin/mklvars.sh intel64 The above is needed to set the LD_LIBRARY_PATH so that Python can source the intel dynamic library when numpy is called. -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Sun Mar 10 23:18:23 2013 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Sun, 10 Mar 2013 23:18:23 -0400 Subject: [Numpy-discussion] Numpy 1.7.0 with Intel MKL 11.0.2.146 In-Reply-To: References: Message-ID: On 3/10/13, QT wrote: > Dear all, > > I'm at my wits end. I've followed Intel's own > instructionson > how to compile Numpy with Intel MKL. Everything compiled and linked > fine and I've installed it locally in my user folder...There is one nasty > problem. When one calls the numpy library to do some computation, it does > not use all of the available threads. I have 8 "cores" on my machine and > it only uses 4 of them. 
The MKL_NUM_THREADS environmental variable can be > set to tune the number of threads but setting it to 8 does not change > anything. Indeed, setting it to 3 does limit the threads to 3....What is > going on? Does your computer have 8 physical cores, or 4 cores that look like 8 because of hyperthreading? Warren > > As a comparison, the numpy (version 1.4.1, installed from yum, which uses > BLAS+ATLAS) uses all 8 threads. I do not get this. > > You can run this test program > > python -mtimeit -s'import numpy as np; a = np.random.randn(1e3,1e3)' > 'np.dot(a, a)' > > There is one saving grace, the local numpy built with MKL is much faster > than the system's numpy. > > I hope someone can help me. Searching the internet has been fruitless. > > Best, > Quyen > > My site.cfg for numpy (1.7.0) > [mkl] > library_dirs = /opt/intel/mkl/lib/intel64 > include_dirs = /opt/intel/mkl/include > mkl_libs = mkl_rt > lapack_libs = > > I've edited line 37 of numpy/distutils/intelcompiler.py > self.cc_exe = 'icc -O3 -fPIC -fp-model strict -fomit-frame-pointer -openmp > -parallel -DMKL_ILP64' > > Also line 54 of numpy/distutils/fcompiler/intel.py > return ['-i8 -xhost -openmp -fp-model strict'] > > My .bash_profile also contains the lines: > source /opt/intel/bin/compilervars.sh intel64 > source /opt/intel/mkl/bin/mklvars.sh intel64 > > The above is needed to set the LD_LIBRARY_PATH so that Python can source > the intel dynamic library when numpy is called. > From warren.weckesser at gmail.com Sun Mar 10 23:31:15 2013 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Sun, 10 Mar 2013 23:31:15 -0400 Subject: [Numpy-discussion] Numpy 1.7.0 with Intel MKL 11.0.2.146 In-Reply-To: References: Message-ID: On 3/10/13, Warren Weckesser wrote: > On 3/10/13, QT wrote: >> Dear all, >> >> I'm at my wits end. I've followed Intel's own >> instructionson >> how to compile Numpy with Intel MKL. 
Everything compiled and linked >> fine and I've installed it locally in my user folder...There is one nasty >> problem. When one calls the numpy library to do some computation, it >> does >> not use all of the available threads. I have 8 "cores" on my machine and >> it only uses 4 of them. The MKL_NUM_THREADS environmental variable can >> be >> set to tune the number of threads but setting it to 8 does not change >> anything. Indeed, setting it to 3 does limit the threads to 3....What is >> going on? > > > Does your computer have 8 physical cores, or 4 cores that look like 8 > because of hyperthreading? > Here's why I ask this: http://software.intel.com/en-us/forums/topic/294954 > Warren > > >> >> As a comparison, the numpy (version 1.4.1, installed from yum, which uses >> BLAS+ATLAS) uses all 8 threads. I do not get this. >> >> You can run this test program >> >> python -mtimeit -s'import numpy as np; a = np.random.randn(1e3,1e3)' >> 'np.dot(a, a)' >> >> There is one saving grace, the local numpy built with MKL is much faster >> than the system's numpy. >> >> I hope someone can help me. Searching the internet has been fruitless. >> >> Best, >> Quyen >> >> My site.cfg for numpy (1.7.0) >> [mkl] >> library_dirs = /opt/intel/mkl/lib/intel64 >> include_dirs = /opt/intel/mkl/include >> mkl_libs = mkl_rt >> lapack_libs = >> >> I've edited line 37 of numpy/distutils/intelcompiler.py >> self.cc_exe = 'icc -O3 -fPIC -fp-model strict -fomit-frame-pointer >> -openmp >> -parallel -DMKL_ILP64' >> >> Also line 54 of numpy/distutils/fcompiler/intel.py >> return ['-i8 -xhost -openmp -fp-model strict'] >> >> My .bash_profile also contains the lines: >> source /opt/intel/bin/compilervars.sh intel64 >> source /opt/intel/mkl/bin/mklvars.sh intel64 >> >> The above is needed to set the LD_LIBRARY_PATH so that Python can source >> the intel dynamic library when numpy is called. 
>> > From rdirective at gmail.com Sun Mar 10 23:38:11 2013 From: rdirective at gmail.com (QT) Date: Sun, 10 Mar 2013 22:38:11 -0500 Subject: [Numpy-discussion] Numpy 1.7.0 with Intel MKL 11.0.2.146 In-Reply-To: References: Message-ID: Dear Warren, It's an Intel i7 950, 4 cores, 8 with hyper-threading. I used MKL 11.0.2.146, but I will read your link. It seems spot on. Best, Quyen On Sun, Mar 10, 2013 at 10:31 PM, Warren Weckesser < warren.weckesser at gmail.com> wrote: > On 3/10/13, Warren Weckesser wrote: > > On 3/10/13, QT wrote: > >> Dear all, > >> > >> I'm at my wits end. I've followed Intel's own > >> instructions< > http://software.intel.com/en-us/articles/numpyscipy-with-intel-mkl>on > >> how to compile Numpy with Intel MKL. Everything compiled and linked > >> fine and I've installed it locally in my user folder...There is one > nasty > >> problem. When one calls the numpy library to do some computation, it > >> does > >> not use all of the available threads. I have 8 "cores" on my machine > and > >> it only uses 4 of them. The MKL_NUM_THREADS environmental variable can > >> be > >> set to tune the number of threads but setting it to 8 does not change > >> anything. Indeed, setting it to 3 does limit the threads to 3....What > is > >> going on? > > > > > > Does your computer have 8 physical cores, or 4 cores that look like 8 > > because of hyperthreading? > > > > > Here's why I ask this: http://software.intel.com/en-us/forums/topic/294954 > > > > Warren > > > > > >> > >> As a comparison, the numpy (version 1.4.1, installed from yum, which > uses > >> BLAS+ATLAS) uses all 8 threads. I do not get this. > >> > >> You can run this test program > >> > >> python -mtimeit -s'import numpy as np; a = np.random.randn(1e3,1e3)' > >> 'np.dot(a, a)' > >> > >> There is one saving grace, the local numpy built with MKL is much faster > >> than the system's numpy. > >> > >> I hope someone can help me. Searching the internet has been fruitless. 
> >> > >> Best, > >> Quyen > >> > >> My site.cfg for numpy (1.7.0) > >> [mkl] > >> library_dirs = /opt/intel/mkl/lib/intel64 > >> include_dirs = /opt/intel/mkl/include > >> mkl_libs = mkl_rt > >> lapack_libs = > >> > >> I've edited line 37 of numpy/distutils/intelcompiler.py > >> self.cc_exe = 'icc -O3 -fPIC -fp-model strict -fomit-frame-pointer > >> -openmp > >> -parallel -DMKL_ILP64' > >> > >> Also line 54 of numpy/distutils/fcompiler/intel.py > >> return ['-i8 -xhost -openmp -fp-model strict'] > >> > >> My .bash_profile also contains the lines: > >> source /opt/intel/bin/compilervars.sh intel64 > >> source /opt/intel/mkl/bin/mklvars.sh intel64 > >> > >> The above is needed to set the LD_LIBRARY_PATH so that Python can source > >> the intel dynamic library when numpy is called. > >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Mar 11 05:46:54 2013 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 11 Mar 2013 09:46:54 +0000 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit? In-Reply-To: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam wrote: > Hi all, > > I am redirecting a discussion on github issue tracker here. My original > post (https://github.com/numpy/numpy/issues/3137): > > "The current implementation of the RNG seems to be MT19937-32. Since 64-bit > machines are common nowadays, I am suggesting adding or upgrading to > MT19937-64. Thoughts?" > > Let me start by answering to njsmith's comments on the issue tracker: > > Would it be faster? 
> > > Although I have not benchmarked the 64-bit implementation, it is likely that > it will be faster on a 64-bit machine since the number of iteration > (controlled by NN and MM in the reference implementation > http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/VERSIONS/C-LANG/mt19937-64.c) > is reduced by half. In addition, each generation in the 64-bit > implementation produces a 64-bit random int which can be used to generate > double precision random number. Unlike the 32-bit implementation which > requires generating a pair of 32-bit random int. >From the last time this was brought up, it looks like getting a single 64-bit integer out from MT19937-64 takes about the same amount of time as getting a single 32-bit integer from MT19937-32, perhaps a little slower, even on a 64-bit machine. http://comments.gmane.org/gmane.comp.python.numeric.general/27773 So getting a single double would be not quite twice as fast. > But, on a 32-bit machine, a 64-bit instruction is translated into 4 32-bit > instructions; thus, it is likely to be slower. (1) > > Use less memory? > > > The amount of memory use will remain the same. The size of the RNG state is > the same. > > Provide higher quality randomness? > > > My naive answer is that 32-bit and 64-bit implementation have the same > 2^19937-1 period. Need to do some research and experiments. > > Would it change the output of this program: import numpy > numpy.random.seed(0) print numpy.random.random() ? > > > Unfortunately, yes. The 64-bit implementation generates a different random > number sequence with the same seed. (2) > > > My suggestion to overcome (1) and (2) is to allow the user to select between > the two implementations (and possibly different algorithms in the future). > If user does not provide a choice, we use the MT19937-32 by default. > > numpy.random.set_state("MT19937_64", ?) # choose the 64-bit > implementation Most likely, the different PRNGs should be different subclasses of RandomState. 
The module-level convenience API should probably be left alone. If you need to control the PRNG that you are using, you really need to be passing around a RandomState instance and not relying on reseeding the shared global instance. Aside: I really wish we hadn't exposed `set_state()` in the module API. It's an attractive nuisance. There is some low-level C work that needs to be done to allow the non-uniform distributions to be shared between implementations of the core uniform PRNG, but that's the same no matter how you organize the upper layer. -- Robert Kern From chris.barker at noaa.gov Mon Mar 11 13:07:05 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Mon, 11 Mar 2013 10:07:05 -0700 Subject: [Numpy-discussion] Casting and promotion rules (e.g. int + uint64 => float) In-Reply-To: References: Message-ID: On Fri, Mar 8, 2013 at 8:23 AM, Sergio Callegari wrote: > I have noticed that numpy introduces some unexpected type casts, that are > in some cases problematic. There has been a lot of discussion about casting on this list in the last couple months -- I suggest you peruse that discussion and see what conclusions it has lead to. > A very weird cast is > > int + uint64 -> float I think the idea here is that an int can hold negative numbers, so you can't put it in a uint64 -- but you can't put a uint64 into a signed int64. A float64 can hold the range of numbers of both a int and uint64, so it is used, even though it can't hold the full precision of a uint64 (far from it!) > Another issue is that variables unexpectedly change type with accumulation > operators > > a=np.uint64(1) > a+=1 > > now a is float yeah -- that should NEVER happen -- += is supposed to be an iin=place operator, it should never change the array! However, what you've crated here is not an array, but a numpy scalar, and the rules are different there (but should they be?). 
I suspect that part of the issue is that array scalars behave a bit more like the built-in numpy number types, and thus += is not an in-place operator, but rather, translates to: a = a + 1 and as you've seen, that casts to a float64. A little test: In [34]: d = np.int64(2) In [35]: e = d # e and d are the same object In [36]: d += 1 In [37]: e is d Out[37]: False # they are not longer the same object -- the += created a new object In [38]: type(d) Out[38]: numpy.int64 # even though it's still the same type (no casting needed) If you do use an array, you don't get casting with +=: In [39]: a = np.array((1,), dtype=np.uint64) In [40]: a Out[40]: array([1], dtype=uint64) In [41]: a + 1.0 Out[41]: array([ 2.]) # got a cast with the additon and creation of a new array In [42]: a += 1.0 In [43]: a Out[43]: array([2], dtype=uint64) # but no cast with the in-place operator. Personally, I think the "in-place" operators should be just that -- and only work for mutable objects, but I guess the ability to easily increment in integer was just too tempting! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From sergio.callegari at gmail.com Mon Mar 11 14:17:22 2013 From: sergio.callegari at gmail.com (Sergio Callegari) Date: Mon, 11 Mar 2013 18:17:22 +0000 (UTC) Subject: [Numpy-discussion] Casting and promotion rules (e.g. int + uint64 => float) References: Message-ID: Thanks for the explanation. Chris Barker - NOAA Federal noaa.gov> writes: > There has been a lot of discussion about casting on this list in the > last couple months -- I suggest you peruse that discussion and see > what conclusions it has lead to. I'll look at it. My message to the ml followed an invitation to do so after I posted a bug about weird castings. 
> > int + uint64 -> float > > I think the idea here is that an int can hold negative numbers, so you > can't put it in a uint64 -- but you can't put a uint64 into a signed > int64. A float64 can hold the range of numbers of both a int and > uint64, so it is used, even though it can't hold the full precision > of a uint64 (far from it!) I understand the good intention. Yet, this does not follow the principle of least surprise. This is not what most other languages (possibly following C) would do and, most important, dealing with integers, one expects overflows and wraparounds, not certainly a loss of precision. Another issue is that the promotion rule breaks indexing a = np.uint64(1) b=[0,1,2,3,4,5] b[a] -> 1 # OK b[a+1] -> Error I really would like to suggest changing this behavior. Thanks Sergio From wfspotz at sandia.gov Mon Mar 11 23:55:58 2013 From: wfspotz at sandia.gov (Bill Spotz) Date: Mon, 11 Mar 2013 21:55:58 -0600 Subject: [Numpy-discussion] Request code review of numpy.i changes Message-ID: https://github.com/wfspotz/numpy/compare/numpy-swig ** Bill Spotz ** ** Sandia National Laboratories Voice: (505)845-0170 ** ** P.O. Box 5800 Fax: (505)284-0154 ** ** Albuquerque, NM 87185-0370 Email: wfspotz at sandia.gov ** From soumendotganguly at gmail.com Tue Mar 12 03:20:21 2013 From: soumendotganguly at gmail.com (soumen ganguly) Date: Tue, 12 Mar 2013 12:50:21 +0530 Subject: [Numpy-discussion] unclear output format for numpy.argmax() Message-ID: Hello, There are some doubts that i have regarding the argmax() method of numpy.As described in reference doc's of numpy,argmax(axis=None,out=None) returns the indices of the maximum value along the given axis(In this case 0 is default). So, i tried to implement the method to a 2d array with elements say,[[1,2,3],[4,5,6]] along the axis 1.The output to this code is [2,2] and when i implement it along the axis 0,it outputs [1,1,1].I dont see the connection to this output with the scope of argmax method. 
I would appreciate a detailed insight to the argmax method. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjhnson at gmail.com Tue Mar 12 03:40:14 2013 From: tjhnson at gmail.com (T J) Date: Tue, 12 Mar 2013 02:40:14 -0500 Subject: [Numpy-discussion] Vectorize and ufunc attribute Message-ID: Prior to 1.7, I had working compatibility code such as the following: if has_good_functions: # http://projects.scipy.org/numpy/ticket/1096 from numpy import logaddexp, logaddexp2 else: logaddexp = vectorize(_logaddexp, otypes=[numpy.float64]) logaddexp2 = vectorize(_logaddexp2, otypes=[numpy.float64]) # Run these at least once so that .ufunc.reduce exists logaddexp([1.,2.,3.],[1.,2.,3.]) logaddexp2([1.,2.,3.],[1.,2.,3.]) # And then make reduce available at the top level logaddexp.reduce = logaddexp.ufunc.reduce logaddexp2.reduce = logaddexp2.ufunc.reduce The point was that I wanted to treat the output of vectorize as a hacky drop-in replacement for a ufunc. In 1.7, I discovered that vectorize had changed (https://github.com/numpy/numpy/pull/290), and now there is no longer a ufunc attribute at all. Should this be added back in? Besides hackish drop-in replacements, I see value in to being able to call reduce, accumulate, etc (when possible) on the output of vectorize(). -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Tue Mar 12 03:48:36 2013 From: toddrjen at gmail.com (Todd) Date: Tue, 12 Mar 2013 08:48:36 +0100 Subject: [Numpy-discussion] unclear output format for numpy.argmax() In-Reply-To: References: Message-ID: On Tue, Mar 12, 2013 at 8:20 AM, soumen ganguly wrote: > Hello, > > There are some doubts that i have regarding the argmax() method of > numpy.As described in reference doc's of numpy,argmax(axis=None,out=None) > returns the indices of the maximum value along the given axis(In this case > 0 is default). 
> > So, i tried to implement the method to a 2d array with elements > say,[[1,2,3],[4,5,6]] along the axis 1.The output to this code is [2,2] and > when i implement it along the axis 0,it outputs [1,1,1].I dont see the > connection to this output with the scope of argmax method. > > I would appreciate a detailed insight to the argmax method. > > I am not sure I understand the question. For axis 0 (the "outer" dimension in the way it is printed) the things being compared are argmax([1, 4]), argmax(([2, 5]), and argmax([3, 6]).. Amongst those, the second (index 1) is higher in each case, so it returns [1, 1, 1]. With axis 1 (the "inner" dimension in the way it is printed) , the things being compared are argmax([1, 2, 3]) and argmax([4, 5, 6]). In both case the third (index 2) is the highest, so it returns [2, 2]. What is unexpected about this? -------------- next part -------------- An HTML attachment was scrubbed... URL: From dineshbvadhia at hotmail.com Tue Mar 12 09:01:59 2013 From: dineshbvadhia at hotmail.com (Dinesh B Vadhia) Date: Tue, 12 Mar 2013 06:01:59 -0700 Subject: [Numpy-discussion] Yes, this one again "ImportError: No module named multiarray" Message-ID: I've been using Numpy/Scipy for >5 years so know a little on how to get around them. Recently, I've needed to either freeze or create executables with tools such as PyInstaller, Cython, Py2exe and others on both Windows (XP 32-bit, 7 64-bit) and Ubuntu (12.04) Linux (64-bit). The test program (which runs perfectly with the Python interpreter) is very simple: import numpy def main(): print numpy.array([12, 23, 34, 45, 56, 67, 78, 89, 90]) return if __name__ == '__main__': main() The software versions are Python 2.7.3, Numpy 1.7.0, and Scipy 0.11. The "import numpy" causes an "ImportError: No module named multiarray". After endless Googling, I am none the wiser about what (really) causes the ImportError let alone what the solution is. 
The Traceback, similar to others found on the web, is: Traceback (most recent call last): File "test.py", ... File "C:\Python27\lib\site-packages\numpy\__init__.py", line 154, in import add_newdocs File "C:\Python27\lib\site-packages\numpy\add_newdocs.py", line 9, in from numpy.lib import add_newdoc File "C:\Python27\lib\site-packages\numpy\lib\__init__.py", line 4, in from type_check import * File "C:\Python27\lib\site-packages\numpy\lib\type_check.py", line 8, in import numpy.core.numeric as _nx File "C:\Python27\lib\site-packages\numpy\core\__init__.py", line 5, in import multiarray ImportError: No module named multiarray. Could someone shed some light on this - please? Thx. -------------- next part -------------- An HTML attachment was scrubbed... URL: From aron at ahmadia.net Tue Mar 12 09:17:34 2013 From: aron at ahmadia.net (Aron Ahmadia) Date: Tue, 12 Mar 2013 13:17:34 +0000 Subject: [Numpy-discussion] Yes, this one again "ImportError: No module named multiarray" In-Reply-To: References: Message-ID: multiarray is an extension module that lives within numpy/core, that is, when, "import multiarray" is called, (and it's the first imported extension module in numpy), multiarray.ext (ext being dll on Windows I guess), gets dynamically loaded. "No module named multiarray" is indicating problems with your freeze setup. Most of these tools don't support locally imported extension modules. Does this help you get oriented on your problem? A On Tue, Mar 12, 2013 at 1:01 PM, Dinesh B Vadhia wrote: > ** > I've been using Numpy/Scipy for >5 years so know a little on how to get > around them. Recently, I've needed to either freeze or create executables > with tools such as PyInstaller, Cython, Py2exe and others on both Windows > (XP 32-bit, 7 64-bit) and Ubuntu (12.04) Linux (64-bit). 
The test > program (which runs perfectly with the Python interpreter) is very simple: > > import numpy > > def main(): > print numpy.array([12, 23, 34, 45, 56, 67, 78, 89, 90]) > return > > if __name__ == '__main__': > main() > > The software versions are Python 2.7.3, Numpy 1.7.0, and Scipy 0.11. The > "import numpy" causes an "ImportError: No module named multiarray". After > endless Googling, I am none the wiser about what (really) causes the > ImportError let alone what the solution is. The Traceback, similar to > others found on the web, is: > > Traceback (most recent call last): > File "test.py", ... > File "C:\Python27\lib\site-packages\numpy\__init__.py", line 154, in > > import add_newdocs > File "C:\Python27\lib\site-packages\numpy\add_newdocs.py", line 9, in > > from numpy.lib import add_newdoc > File "C:\Python27\lib\site-packages\numpy\lib\__init__.py", line 4, in > > from type_check import * > File "C:\Python27\lib\site-packages\numpy\lib\type_check.py", line 8, in > > import numpy.core.numeric as _nx > File "C:\Python27\lib\site-packages\numpy\core\__init__.py", line 5, in > > import multiarray > ImportError: No module named multiarray. > > Could someone shed some light on this - please? Thx. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ezindy at gmail.com Tue Mar 12 09:27:24 2013 From: ezindy at gmail.com (Egor Zindy) Date: Tue, 12 Mar 2013 13:27:24 +0000 Subject: [Numpy-discussion] Request code review of numpy.i changes In-Reply-To: References: Message-ID: Thanks Bill, I wasn't happy with my use of either PyCObject_FromVoidPtr or PyArray_BASE. Both are now deprecated. 
So I updated all the ARGOUTVIEWM_ definitions with %#ifdef SWIGPY_USE_CAPSULE PyObject* cap = PyCapsule_New((void*)(*$1), SWIGPY_CAPSULE_NAME, SWIG_Python_DestroyModule); %#else PyObject* cap = PyCObject_FromVoidPtr((void*)(*$1), SWIG_Python_DestroyModule); %#endif %#if NPY_API_VERSION < 0x00000007 PyArray_BASE(array) = cap; %#else PyArray_SetBaseObject(array,cap); %#endif This could probably be improved with the use of a macro, and checking the returned value of PyArray_SetBaseObject wouldn't hurt either. Anyway, it's a start. Hopefully I haven't messed my use of either SWIGPY_CAPSULE_NAME or SWIG_Python_DestroyModule here. Other changes I made relate to various warnings, in particular relating to the use of SWIG_Python_AppendOutput($result, XXX) where XXX should be a PyObject but was a PyArrayObject. In ARGOUTVIEW / ARGOUTVIEWM typedefs, I made sure there was a PyObject* obj = PyArray_SimpleNewFromData(3, dims, DATA_TYPECODE, (void*)(*$1)); PyArrayObject* array = (PyArrayObject*) obj; which allows me to then use (instead of ,array) $result = SWIG_Python_AppendOutput($result,obj); In the other few other instances where this construct doesn't apply (ARGOUT_ARRAY1 for example) I used typecasting $result = SWIG_Python_AppendOutput($result,(PyObject*)array$argnum); I can't think of anything else at this stage. Kind regards, Egor On 12 March 2013 03:55, Bill Spotz wrote: > > https://github.com/wfspotz/numpy/compare/numpy-swig > > ** Bill Spotz ** > ** Sandia National Laboratories Voice: (505)845-0170 ** > ** P.O. Box 5800 Fax: (505)284-0154 ** > ** Albuquerque, NM 87185-0370 Email: wfspotz at sandia.gov ** > > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: numpy.i Type: application/octet-stream Size: 97352 bytes Desc: not available URL: From dineshbvadhia at hotmail.com Tue Mar 12 10:05:30 2013 From: dineshbvadhia at hotmail.com (Dinesh B Vadhia) Date: Tue, 12 Mar 2013 07:05:30 -0700 Subject: [Numpy-discussion] Yes, this one again "ImportError: No module named multiarray" In-Reply-To: References: Message-ID: Does that mean numpy won't work with freeze/create_executable type of tools or is there a workaround? From: Aron Ahmadia Sent: Tuesday, March 12, 2013 6:17 AM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Yes,this one again "ImportError: No module named multiarray" multiarray is an extension module that lives within numpy/core, that is, when, "import multiarray" is called, (and it's the first imported extension module in numpy), multiarray.ext (ext being dll on Windows I guess), gets dynamically loaded. "No module named multiarray" is indicating problems with your freeze setup. Most of these tools don't support locally imported extension modules. Does this help you get oriented on your problem? A On Tue, Mar 12, 2013 at 1:01 PM, Dinesh B Vadhia wrote: I've been using Numpy/Scipy for >5 years so know a little on how to get around them. Recently, I've needed to either freeze or create executables with tools such as PyInstaller, Cython, Py2exe and others on both Windows (XP 32-bit, 7 64-bit) and Ubuntu (12.04) Linux (64-bit). The test program (which runs perfectly with the Python interpreter) is very simple: import numpy def main(): print numpy.array([12, 23, 34, 45, 56, 67, 78, 89, 90]) return if __name__ == '__main__': main() The software versions are Python 2.7.3, Numpy 1.7.0, and Scipy 0.11. The "import numpy" causes an "ImportError: No module named multiarray". After endless Googling, I am none the wiser about what (really) causes the ImportError let alone what the solution is. 
The Traceback, similar to others found on the web, is: Traceback (most recent call last): File "test.py", ... File "C:\Python27\lib\site-packages\numpy\__init__.py", line 154, in import add_newdocs File "C:\Python27\lib\site-packages\numpy\add_newdocs.py", line 9, in from numpy.lib import add_newdoc File "C:\Python27\lib\site-packages\numpy\lib\__init__.py", line 4, in from type_check import * File "C:\Python27\lib\site-packages\numpy\lib\type_check.py", line 8, in import numpy.core.numeric as _nx File "C:\Python27\lib\site-packages\numpy\core\__init__.py", line 5, in import multiarray ImportError: No module named multiarray. Could someone shed some light on this - please? Thx. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From aron at ahmadia.net Tue Mar 12 10:08:32 2013 From: aron at ahmadia.net (Aron Ahmadia) Date: Tue, 12 Mar 2013 14:08:32 +0000 Subject: [Numpy-discussion] (@Pat Marion) Re: Yes, this one again "ImportError: No module named multiarray" Message-ID: Pat Marion at Kitware did some work on this, I'm pinging him on the thread. A On Tue, Mar 12, 2013 at 2:05 PM, Dinesh B Vadhia wrote: > ** > Does that mean numpy won't work with freeze/create_executable type of > tools or is there a workaround? > > > *From:* Aron Ahmadia > *Sent:* Tuesday, March 12, 2013 6:17 AM > *To:* Discussion of Numerical Python > *Subject:* Re: [Numpy-discussion] Yes,this one again "ImportError: No > module named multiarray" > > multiarray is an extension module that lives within numpy/core, that is, > when, "import multiarray" is called, (and it's the first imported extension > module in numpy), multiarray.ext (ext being dll on Windows I guess), gets > dynamically loaded. > > "No module named multiarray" is indicating problems with your freeze > setup. 
Most of these tools don't support locally imported extension > modules. > > Does this help you get oriented on your problem? > > A > > > On Tue, Mar 12, 2013 at 1:01 PM, Dinesh B Vadhia < > dineshbvadhia at hotmail.com> wrote: > >> ** >> I've been using Numpy/Scipy for >5 years so know a little on how to get >> around them. Recently, I've needed to either freeze or create executables >> with tools such as PyInstaller, Cython, Py2exe and others on both Windows >> (XP 32-bit, 7 64-bit) and Ubuntu (12.04) Linux (64-bit). The test >> program (which runs perfectly with the Python interpreter) is very simple: >> >> import numpy >> >> def main(): >> print numpy.array([12, 23, 34, 45, 56, 67, 78, 89, 90]) >> return >> >> if __name__ == '__main__': >> main() >> >> The software versions are Python 2.7.3, Numpy 1.7.0, and Scipy 0.11. The >> "import numpy" causes an "ImportError: No module named multiarray". After >> endless Googling, I am none the wiser about what (really) causes the >> ImportError let alone what the solution is. The Traceback, similar to >> others found on the web, is: >> >> Traceback (most recent call last): >> File "test.py", ... >> File "C:\Python27\lib\site-packages\numpy\__init__.py", line 154, in >> >> import add_newdocs >> File "C:\Python27\lib\site-packages\numpy\add_newdocs.py", line 9, in >> >> from numpy.lib import add_newdoc >> File "C:\Python27\lib\site-packages\numpy\lib\__init__.py", line 4, in >> >> from type_check import * >> File "C:\Python27\lib\site-packages\numpy\lib\type_check.py", line 8, >> in >> import numpy.core.numeric as _nx >> File "C:\Python27\lib\site-packages\numpy\core\__init__.py", line 5, in >> >> import multiarray >> ImportError: No module named multiarray. >> >> Could someone shed some light on this - please? Thx. 
>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pat.marion at kitware.com Tue Mar 12 10:23:53 2013 From: pat.marion at kitware.com (Pat Marion) Date: Wed, 13 Mar 2013 00:23:53 +1000 Subject: [Numpy-discussion] (@Pat Marion) Re: Yes, this one again "ImportError: No module named multiarray" In-Reply-To: References: Message-ID: Thanks for copying me, Aron. Hi Dinesh, I have a github project which demonstrates how to use numpy with freeze. The project's readme includes more information: https://github.com/patmarion/NumpyBuiltinExample It does require a small patch to CPython's import.c file. I haven't tried posted this patch to the CPython developers, perhaps there'd be interest incorporating it upstream. Pat On Wed, Mar 13, 2013 at 12:08 AM, Aron Ahmadia wrote: > Pat Marion at Kitware did some work on this, I'm pinging him on the thread. > > A > > > On Tue, Mar 12, 2013 at 2:05 PM, Dinesh B Vadhia < > dineshbvadhia at hotmail.com> wrote: > >> ** >> Does that mean numpy won't work with freeze/create_executable type of >> tools or is there a workaround? >> >> >> *From:* Aron Ahmadia >> *Sent:* Tuesday, March 12, 2013 6:17 AM >> *To:* Discussion of Numerical Python >> *Subject:* Re: [Numpy-discussion] Yes,this one again "ImportError: No >> module named multiarray" >> >> multiarray is an extension module that lives within numpy/core, that is, >> when, "import multiarray" is called, (and it's the first imported extension >> module in numpy), multiarray.ext (ext being dll on Windows I guess), gets >> dynamically loaded. 
>> >> "No module named multiarray" is indicating problems with your freeze >> setup. Most of these tools don't support locally imported extension >> modules. >> >> Does this help you get oriented on your problem? >> >> A >> >> >> On Tue, Mar 12, 2013 at 1:01 PM, Dinesh B Vadhia < >> dineshbvadhia at hotmail.com> wrote: >> >>> ** >>> I've been using Numpy/Scipy for >5 years so know a little on how to get >>> around them. Recently, I've needed to either freeze or create executables >>> with tools such as PyInstaller, Cython, Py2exe and others on both Windows >>> (XP 32-bit, 7 64-bit) and Ubuntu (12.04) Linux (64-bit). The test >>> program (which runs perfectly with the Python interpreter) is very simple: >>> >>> import numpy >>> >>> def main(): >>> print numpy.array([12, 23, 34, 45, 56, 67, 78, 89, 90]) >>> return >>> >>> if __name__ == '__main__': >>> main() >>> >>> The software versions are Python 2.7.3, Numpy 1.7.0, and Scipy 0.11. >>> The "import numpy" causes an "ImportError: No module named multiarray". After >>> endless Googling, I am none the wiser about what (really) causes the >>> ImportError let alone what the solution is. The Traceback, similar to >>> others found on the web, is: >>> >>> Traceback (most recent call last): >>> File "test.py", ... >>> File "C:\Python27\lib\site-packages\numpy\__init__.py", line 154, in >>> >>> import add_newdocs >>> File "C:\Python27\lib\site-packages\numpy\add_newdocs.py", line 9, in >>> >>> from numpy.lib import add_newdoc >>> File "C:\Python27\lib\site-packages\numpy\lib\__init__.py", line 4, in >>> >>> from type_check import * >>> File "C:\Python27\lib\site-packages\numpy\lib\type_check.py", line 8, >>> in >>> import numpy.core.numeric as _nx >>> File "C:\Python27\lib\site-packages\numpy\core\__init__.py", line 5, >>> in >>> import multiarray >>> ImportError: No module named multiarray. >>> >>> Could someone shed some light on this - please? Thx. 
>>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brad.froehle at gmail.com Tue Mar 12 10:59:53 2013 From: brad.froehle at gmail.com (Bradley M. Froehle) Date: Tue, 12 Mar 2013 07:59:53 -0700 Subject: [Numpy-discussion] Vectorize and ufunc attribute In-Reply-To: References: Message-ID: T J: You may want to look into `numpy.frompyfunc` ( http://docs.scipy.org/doc/numpy/reference/generated/numpy.frompyfunc.html). -Brad On Tue, Mar 12, 2013 at 12:40 AM, T J wrote: > Prior to 1.7, I had working compatibility code such as the following: > > > if has_good_functions: > # http://projects.scipy.org/numpy/ticket/1096 > from numpy import logaddexp, logaddexp2 > else: > logaddexp = vectorize(_logaddexp, otypes=[numpy.float64]) > logaddexp2 = vectorize(_logaddexp2, otypes=[numpy.float64]) > > # Run these at least once so that .ufunc.reduce exists > logaddexp([1.,2.,3.],[1.,2.,3.]) > logaddexp2([1.,2.,3.],[1.,2.,3.]) > > # And then make reduce available at the top level > logaddexp.reduce = logaddexp.ufunc.reduce > logaddexp2.reduce = logaddexp2.ufunc.reduce > > > The point was that I wanted to treat the output of vectorize as a hacky > drop-in replacement for a ufunc. In 1.7, I discovered that vectorize had > changed (https://github.com/numpy/numpy/pull/290), and now there is no > longer a ufunc attribute at all. > > Should this be added back in? Besides hackish drop-in replacements, I see > value in to being able to call reduce, accumulate, etc (when possible) on > the output of vectorize(). 
> > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Mar 12 17:25:44 2013 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 12 Mar 2013 21:25:44 +0000 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit? In-Reply-To: References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern wrote: > On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam wrote: >> My suggestion to overcome (1) and (2) is to allow the user to select between >> the two implementations (and possibly different algorithms in the future). >> If user does not provide a choice, we use the MT19937-32 by default. >> >> numpy.random.set_state("MT19937_64", ?) # choose the 64-bit >> implementation > > Most likely, the different PRNGs should be different subclasses of > RandomState. The module-level convenience API should probably be left > alone. If you need to control the PRNG that you are using, you really > need to be passing around a RandomState instance and not relying on > reseeding the shared global instance. +1 > Aside: I really wish we hadn't > exposed `set_state()` in the module API. It's an attractive nuisance. And our own test suite is a serious offender in this regard, we have tests that fail if you run the test suite in a non-default order... https://github.com/numpy/numpy/issues/347 I wonder if we dare deprecate it? The whole idea of a global random state is just a bad one, like every other sort of global shared state. But it's one that's deeply baked into a lot of scientific programmers expectations about how APIs work... 
-n From njs at pobox.com Tue Mar 12 17:27:35 2013 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 12 Mar 2013 21:27:35 +0000 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit? In-Reply-To: References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: On Tue, Mar 12, 2013 at 9:25 PM, Nathaniel Smith wrote: > On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern wrote: >> On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam wrote: >>> My suggestion to overcome (1) and (2) is to allow the user to select between >>> the two implementations (and possibly different algorithms in the future). >>> If user does not provide a choice, we use the MT19937-32 by default. >>> >>> numpy.random.set_state("MT19937_64", ?) # choose the 64-bit >>> implementation >> >> Most likely, the different PRNGs should be different subclasses of >> RandomState. The module-level convenience API should probably be left >> alone. If you need to control the PRNG that you are using, you really >> need to be passing around a RandomState instance and not relying on >> reseeding the shared global instance. > > +1 > >> Aside: I really wish we hadn't >> exposed `set_state()` in the module API. It's an attractive nuisance. > > And our own test suite is a serious offender in this regard, we have > tests that fail if you run the test suite in a non-default order... > https://github.com/numpy/numpy/issues/347 > > I wonder if we dare deprecate it? The whole idea of a global random > state is just a bad one, like every other sort of global shared state. > But it's one that's deeply baked into a lot of scientific programmers > expectations about how APIs work... (To be clear, by 'it' here I meant np.random.set_seed(), not the whole np.random API. Probably. And by 'deprecate' I mean 'whine loudly in some fashion when people use it', not 'rip out in a few releases'. I think.) 
-n From chris.barker at noaa.gov Tue Mar 12 17:50:54 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Tue, 12 Mar 2013 14:50:54 -0700 Subject: [Numpy-discussion] Yes, this one again "ImportError: No module named multiarray" In-Reply-To: References: Message-ID: On Tue, Mar 12, 2013 at 7:05 AM, Dinesh B Vadhia wrote: > Does that mean numpy won't work with freeze/create_executable type of tools > or is there a workaround? I've used numpy with py2exe and py2app out of the box with no issues ( actually, there is an issue with too much stuff getting bundled up, but it works) >> ImportError let alone what the solution is. The Traceback, similar to >> others found on the web, is: >> >> Traceback (most recent call last): >> File "test.py", ... >> File "C:\Python27\lib\site-packages\numpy\__init__.py", line 154, in >> This indicates that your code is importing the numpy that's inside the system installation -- it should be using one in your app bundle. What bundling tool are you using? How did you install python/numpy? What does your bundling tol config look like? And, of course, version numbers of everything. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From josef.pktd at gmail.com Tue Mar 12 18:37:37 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 12 Mar 2013 18:37:37 -0400 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit? 
In-Reply-To: References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: On Tue, Mar 12, 2013 at 5:27 PM, Nathaniel Smith wrote: > On Tue, Mar 12, 2013 at 9:25 PM, Nathaniel Smith wrote: >> On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern wrote: >>> On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam wrote: >>>> My suggestion to overcome (1) and (2) is to allow the user to select between >>>> the two implementations (and possibly different algorithms in the future). >>>> If user does not provide a choice, we use the MT19937-32 by default. >>>> >>>> numpy.random.set_state("MT19937_64", ?) # choose the 64-bit >>>> implementation >>> >>> Most likely, the different PRNGs should be different subclasses of >>> RandomState. The module-level convenience API should probably be left >>> alone. If you need to control the PRNG that you are using, you really >>> need to be passing around a RandomState instance and not relying on >>> reseeding the shared global instance. >> >> +1 >> >>> Aside: I really wish we hadn't >>> exposed `set_state()` in the module API. It's an attractive nuisance. Here is a recipe how to use it http://mail.scipy.org/pipermail/numpy-discussion/2010-September/052911.html (I'm just drawing a random number as seed that I can save, instead of the entire state.) Josef >> >> And our own test suite is a serious offender in this regard, we have >> tests that fail if you run the test suite in a non-default order... >> https://github.com/numpy/numpy/issues/347 >> >> I wonder if we dare deprecate it? The whole idea of a global random >> state is just a bad one, like every other sort of global shared state. >> But it's one that's deeply baked into a lot of scientific programmers >> expectations about how APIs work... > > (To be clear, by 'it' here I meant np.random.set_seed(), not the whole > np.random API. Probably. And by 'deprecate' I mean 'whine loudly in > some fashion when people use it', not 'rip out in a few releases'. I > think.) 
> > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ndbecker2 at gmail.com Tue Mar 12 18:38:54 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 12 Mar 2013 18:38:54 -0400 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit? References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: Nathaniel Smith wrote: > On Tue, Mar 12, 2013 at 9:25 PM, Nathaniel Smith wrote: >> On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern wrote: >>> On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam wrote: >>>> My suggestion to overcome (1) and (2) is to allow the user to select >>>> between the two implementations (and possibly different algorithms in the >>>> future). If user does not provide a choice, we use the MT19937-32 by >>>> default. >>>> >>>> numpy.random.set_state("MT19937_64", ?) # choose the 64-bit >>>> implementation >>> >>> Most likely, the different PRNGs should be different subclasses of >>> RandomState. The module-level convenience API should probably be left >>> alone. If you need to control the PRNG that you are using, you really >>> need to be passing around a RandomState instance and not relying on >>> reseeding the shared global instance. >> >> +1 >> >>> Aside: I really wish we hadn't >>> exposed `set_state()` in the module API. It's an attractive nuisance. >> >> And our own test suite is a serious offender in this regard, we have >> tests that fail if you run the test suite in a non-default order... >> https://github.com/numpy/numpy/issues/347 >> >> I wonder if we dare deprecate it? The whole idea of a global random >> state is just a bad one, like every other sort of global shared state. >> But it's one that's deeply baked into a lot of scientific programmers >> expectations about how APIs work... > > (To be clear, by 'it' here I meant np.random.set_seed(), not the whole > np.random API. Probably. 
And by 'deprecate' I mean 'whine loudly in > some fashion when people use it', not 'rip out in a few releases'. I > think.) > > -n What do you mean that the idea of global shared state is a bad one? How would you prefer the API to look? An alternative is a stateless rng, where you have to pass it it's state on each invocation, which it would update and return. I hope you're not advocating that. From robert.kern at gmail.com Tue Mar 12 19:10:04 2013 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 12 Mar 2013 23:10:04 +0000 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit? In-Reply-To: References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: On Tue, Mar 12, 2013 at 10:38 PM, Neal Becker wrote: > Nathaniel Smith wrote: > >> On Tue, Mar 12, 2013 at 9:25 PM, Nathaniel Smith wrote: >>> On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern wrote: >>>> On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam wrote: >>>>> My suggestion to overcome (1) and (2) is to allow the user to select >>>>> between the two implementations (and possibly different algorithms in the >>>>> future). If user does not provide a choice, we use the MT19937-32 by >>>>> default. >>>>> >>>>> numpy.random.set_state("MT19937_64", ?) # choose the 64-bit >>>>> implementation >>>> >>>> Most likely, the different PRNGs should be different subclasses of >>>> RandomState. The module-level convenience API should probably be left >>>> alone. If you need to control the PRNG that you are using, you really >>>> need to be passing around a RandomState instance and not relying on >>>> reseeding the shared global instance. >>> >>> +1 >>> >>>> Aside: I really wish we hadn't >>>> exposed `set_state()` in the module API. It's an attractive nuisance. >>> >>> And our own test suite is a serious offender in this regard, we have >>> tests that fail if you run the test suite in a non-default order... >>> https://github.com/numpy/numpy/issues/347 >>> >>> I wonder if we dare deprecate it? 
The whole idea of a global random >>> state is just a bad one, like every other sort of global shared state. >>> But it's one that's deeply baked into a lot of scientific programmers >>> expectations about how APIs work... >> >> (To be clear, by 'it' here I meant np.random.set_seed(), not the whole >> np.random API. Probably. And by 'deprecate' I mean 'whine loudly in >> some fashion when people use it', not 'rip out in a few releases'. I >> think.) >> >> -n > > What do you mean that the idea of global shared state is a bad one? The words "global shared state" drives fear into the hearts of experienced programmers everywhere, whatever the context. :-) It's rarely a *good* idea. > How would > you prefer the API to look? There are two current APIs: 1. Instantiate RandomState and call it's methods 2. Just call the functions in numpy.random The latter has a shared global state. In fact, all of those "functions" are just references to the methods on a shared global RandomState instance. We advocate using the former API. Note that it already exists. It was the recommended API from day one. No one is recommending adding a new API. > An alternative is a stateless rng, where you have > to pass it it's state on each invocation, which it would update and return. I > hope you're not advocating that. No. This is a place where OOP solved the problem neatly. -- Robert Kern From ndbecker2 at gmail.com Tue Mar 12 20:16:12 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 12 Mar 2013 20:16:12 -0400 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit? References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: I guess I talked to you about 100 years ago about sharing state between numpy rng and code I have in c++ that wraps boost::random. So is there a C-api for this RandomState object I could use to call from c++? Maybe I could do something with that. The c++ code could invoke via the python api, but that might be slower. 
I'm just rambling here, I'd have to see the API to get some ideas. From tjhnson at gmail.com Tue Mar 12 20:16:52 2013 From: tjhnson at gmail.com (T J) Date: Tue, 12 Mar 2013 19:16:52 -0500 Subject: [Numpy-discussion] Vectorize and ufunc attribute In-Reply-To: References: Message-ID: On Tue, Mar 12, 2013 at 9:59 AM, Bradley M. Froehle wrote: > T J: > > You may want to look into `numpy.frompyfunc` ( > http://docs.scipy.org/doc/numpy/reference/generated/numpy.frompyfunc.html > ). > > Yeah that's better, but it doesn't respect the output type of the function. Be nice if this supported the otypes keyword. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Tue Mar 12 20:33:19 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 12 Mar 2013 20:33:19 -0400 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit? References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: Neal Becker wrote: > I guess I talked to you about 100 years ago about sharing state between numpy > rng and code I have in c++ that wraps boost::random. So is there a C-api for > this RandomState object I could use to call from c++? Maybe I could do > something with that. > > The c++ code could invoke via the python api, but that might be slower. I'm > just rambling here, I'd have to see the API to get some ideas. I think if I could just grab a long int from the underlying mersenne twister, through some c api? From josef.pktd at gmail.com Tue Mar 12 20:48:14 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 12 Mar 2013 20:48:14 -0400 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit? 
In-Reply-To: References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: On Tue, Mar 12, 2013 at 7:10 PM, Robert Kern wrote: > On Tue, Mar 12, 2013 at 10:38 PM, Neal Becker wrote: >> Nathaniel Smith wrote: >> >>> On Tue, Mar 12, 2013 at 9:25 PM, Nathaniel Smith wrote: >>>> On Mon, Mar 11, 2013 at 9:46 AM, Robert Kern wrote: >>>>> On Sun, Mar 10, 2013 at 6:12 PM, Siu Kwan Lam wrote: >>>>>> My suggestion to overcome (1) and (2) is to allow the user to select >>>>>> between the two implementations (and possibly different algorithms in the >>>>>> future). If user does not provide a choice, we use the MT19937-32 by >>>>>> default. >>>>>> >>>>>> numpy.random.set_state("MT19937_64", ?) # choose the 64-bit >>>>>> implementation >>>>> >>>>> Most likely, the different PRNGs should be different subclasses of >>>>> RandomState. The module-level convenience API should probably be left >>>>> alone. If you need to control the PRNG that you are using, you really >>>>> need to be passing around a RandomState instance and not relying on >>>>> reseeding the shared global instance. >>>> >>>> +1 >>>> >>>>> Aside: I really wish we hadn't >>>>> exposed `set_state()` in the module API. It's an attractive nuisance. >>>> >>>> And our own test suite is a serious offender in this regard, we have >>>> tests that fail if you run the test suite in a non-default order... >>>> https://github.com/numpy/numpy/issues/347 >>>> >>>> I wonder if we dare deprecate it? The whole idea of a global random >>>> state is just a bad one, like every other sort of global shared state. >>>> But it's one that's deeply baked into a lot of scientific programmers >>>> expectations about how APIs work... >>> >>> (To be clear, by 'it' here I meant np.random.set_seed(), not the whole >>> np.random API. Probably. And by 'deprecate' I mean 'whine loudly in >>> some fashion when people use it', not 'rip out in a few releases'. I >>> think.) 
>>> >>> -n >> >> What do you mean that the idea of global shared state is a bad one? > > The words "global shared state" drives fear into the hearts of > experienced programmers everywhere, whatever the context. :-) It's > rarely a *good* idea. > >> How would >> you prefer the API to look? > > There are two current APIs: > > 1. Instantiate RandomState and call it's methods > 2. Just call the functions in numpy.random > > The latter has a shared global state. In fact, all of those > "functions" are just references to the methods on a shared global > RandomState instance. > > We advocate using the former API. Note that it already exists. It was > the recommended API from day one. No one is recommending adding a new > API. I never saw much advertising for the RandomState api, and until recently wasn't sure why using the global random state function np.random.norm, ... should be a bad idea. Learning by example, and seeing almost all examples using the global state, is not exactly conducive to figuring out that there is an issue. All of scipy.stats.distribution random numbers are using the global random state. (I guess I should open a ticket.) Josef > >> An alternative is a stateless rng, where you have >> to pass it it's state on each invocation, which it would update and return. I >> hope you're not advocating that. > > No. This is a place where OOP solved the problem neatly. > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From jaakko.luttinen at aalto.fi Wed Mar 13 05:15:36 2013 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Wed, 13 Mar 2013 11:15:36 +0200 Subject: [Numpy-discussion] Bug in einsum? Message-ID: <514043B8.3010703@aalto.fi> Hi, I have encountered a very weird behaviour with einsum. I try to compute something like R*A*R', where * denotes a kind of "matrix multiplication". 
However, for particular shapes of R and A, the results are extremely bad. I compare two einsum results: First, I compute in two einsum calls as (R*A)*R'. Second, I compute the whole result in one einsum call. However, the results are significantly different for some shapes.

My test:

import numpy as np
for D in range(30):
    A = np.random.randn(100,D,D)
    R = np.random.randn(D,D)
    Y1 = np.einsum('...ik,...kj->...ij', R, A)
    Y1 = np.einsum('...ik,...kj->...ij', Y1, R.T)
    Y2 = np.einsum('...ik,...kl,...lj->...ij', R, A, R.T)
    print("D=%d" % D, np.allclose(Y1,Y2), np.linalg.norm(Y1-Y2))

Output:

D=0 True 0.0
D=1 True 0.0
D=2 True 8.40339658678e-15
D=3 True 8.09995399928e-15
D=4 True 3.59428803435e-14
D=5 False 34.755610184
D=6 False 28.3576558351
D=7 False 41.5402690906
D=8 True 2.31709582841e-13
D=9 False 36.0161112799
D=10 True 4.76237746912e-13
D=11 True 4.57944440782e-13
D=12 True 4.90302218301e-13
D=13 True 6.96175851271e-13
D=14 True 1.10067181384e-12
D=15 True 1.29095933163e-12
D=16 True 1.3466837332e-12
D=17 True 1.52265065763e-12
D=18 True 2.05407923852e-12
D=19 True 2.33327630748e-12
D=20 True 2.96849358082e-12
D=21 True 3.31063706175e-12
D=22 True 4.28163620455e-12
D=23 True 3.58951880681e-12
D=24 True 4.69973694769e-12
D=25 True 5.47385264567e-12
D=26 True 5.49643316347e-12
D=27 True 6.75132988402e-12
D=28 True 7.86435437892e-12
D=29 True 7.85453681029e-12

So, for D={5,6,7,9}, allclose returns False and the error norm is HUGE. It doesn't seem like just some small numerical inaccuracy because the error norm is so large. I don't know which one is correct (Y1 or Y2) but at least either one is wrong in my opinion. I ran the same test several times, and each time the same values of D fail. If I change the shapes somehow, the failing values of D might change too, but I usually have several failing values. I'm running the latest version from github (commit bd7104cef4) under Python 3.2.3.
With NumPy 1.6.1 under Python 2.7.3 the test crashes and Python exits printing "Floating point exception". This seems so weird to me that I wonder if I'm just doing something stupid.. Thanks a lot for any help! Jaakko From ndbecker2 at gmail.com Wed Mar 13 09:23:59 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 13 Mar 2013 09:23:59 -0400 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit? References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: Neal Becker wrote: > Neal Becker wrote: > >> I guess I talked to you about 100 years ago about sharing state between numpy >> rng and code I have in c++ that wraps boost::random. So is there a C-api for >> this RandomState object I could use to call from c++? Maybe I could do >> something with that. >> >> The c++ code could invoke via the python api, but that might be slower. I'm >> just rambling here, I'd have to see the API to get some ideas. > > I think if I could just grab a long int from the underlying mersenne twister, > through some c api? Well, this at least appears to work - probably not the most efficient approach - calls the RandomState object via the python interface to get 4 bytes at a time:

int test1 (bp::object & rs) {
    // the template arguments here were eaten by the list archive's HTML
    // filter; call_method<bp::str> and reinterpret_cast<int*> are the
    // obvious readings
    bp::str bytes = call_method<bp::str> (rs.ptr(), "bytes", 4); // get 4 bytes
    return *reinterpret_cast<int*> (PyString_AS_STRING (bytes.ptr()));
}

BOOST_PYTHON_MODULE (numpy_rand) {
    boost::numpy::initialize();
    def ("test1", &test1);
}

From Andrea.Cimatoribus at nioz.nl Wed Mar 13 09:45:23 2013 From: Andrea.Cimatoribus at nioz.nl (Andrea Cimatoribus) Date: Wed, 13 Mar 2013 14:45:23 +0100 Subject: [Numpy-discussion] fast numpy.fromfile skipping data chunks Message-ID: Hi everybody, I hope this has not been discussed before, I couldn't find a solution elsewhere. I need to read some binary data, and I am using numpy.fromfile to do this.
Since the files are huge, and would make me run out of memory, I need to read data skipping some records (I am reading data recorded at high frequency, so basically I want to read subsampling). At the moment, I came up with the code below, which is then compiled using cython. Despite the significant performance increase from the pure python version, the function is still much slower than numpy.fromfile, and only reads one kind of data (in this case uint32), otherwise I do not know how to define the array type in advance. I have basically no experience with cython nor c, so I am a bit stuck. How can I try to make this more efficient and possibly more generic? Thanks

import numpy as np
#For cython!
cimport numpy as np
from libc.stdint cimport uint32_t

def cffskip32(fid, int count=1, int skip=0):
    cdef int k=0
    cdef np.ndarray[uint32_t, ndim=1] data = np.zeros(count, dtype=np.uint32)
    if skip>=0:
        while k<count:
            # (loop body is a reconstruction -- the archive dropped
            #  everything after the "<" above; per the description, read
            #  one record, then seek forward by the skip amount)
            data[k] = np.fromfile(fid, dtype=np.uint32, count=1)[0]
            fid.seek(skip, 1)
            k += 1
    return data

From njs at pobox.com (Nathaniel Smith) Subject: [Numpy-discussion] fast numpy.fromfile skipping data chunks In-Reply-To: References: Message-ID: On Wed, Mar 13, 2013 at 1:45 PM, Andrea Cimatoribus wrote: > Hi everybody, I hope this has not been discussed before, I couldn't find a solution elsewhere. > I need to read some binary data, and I am using numpy.fromfile to do this. Since the files are huge, and would make me run out of memory, I need to read data skipping some records (I am reading data recorded at high frequency, so basically I want to read subsampling). > At the moment, I came up with the code below, which is then compiled using cython.
If your data is stored as fixed-format binary (as it seems it is), then the easiest way is probably

# Exploit the operating system's virtual memory manager to get a "virtual copy" of the entire file in memory
# (This does not actually use any memory until accessed):
virtual_arr = np.memmap(path, np.uint32, "r")
# Get a numpy view onto every 20th entry:
virtual_arr_subsampled = virtual_arr[::20]
# Copy those bits into regular malloc'ed memory:
arr_subsampled = virtual_arr_subsampled.copy()

(Your data is probably large enough that this will only work if you're using a 64-bit system, because of address space limitations; but if you have data that's too large to fit into memory, then I assume you're using a 64-bit system anyway...) -n From nouiz at nouiz.org Wed Mar 13 10:03:10 2013 From: nouiz at nouiz.org (Frédéric Bastien) Date: Wed, 13 Mar 2013 10:03:10 -0400 Subject: [Numpy-discussion] fast numpy.fromfile skipping data chunks In-Reply-To: References: Message-ID: Hi, I would suggest that you look at pytables[1]. It use a different file format, but it seam to do exactly what you want and give an object that have a very similar interface to numpy.ndarray (but fewer function). You would just ask for the slice/indices that you want and it return you a numpy.ndarray. HTH Frédéric [1] http://www.pytables.org/moin On Wed, Mar 13, 2013 at 9:54 AM, Nathaniel Smith wrote: > On Wed, Mar 13, 2013 at 1:45 PM, Andrea Cimatoribus > wrote: >> Hi everybody, I hope this has not been discussed before, I couldn't find a solution elsewhere. >> I need to read some binary data, and I am using numpy.fromfile to do this. Since the files are huge, and would make me run out of memory, I need to read data skipping some records (I am reading data recorded at high frequency, so basically I want to read subsampling). >> At the moment, I came up with the code below, which is then compiled using cython.
Despite the significant performance increase from the pure python version, the function is still much slower than numpy.fromfile, and only reads one kind of data (in this case uint32), otherwise I do not know how to define the array type in advance. I have basically no experience with cython nor c, so I am a bit stuck. How can I try to make this more efficient and possibly more generic? > > If your data is stored as fixed-format binary (as it seems it is), > then the easiest way is probably > > # Exploit the operating system's virtual memory manager to get a > "virtual copy" of the entire file in memory > # (This does not actually use any memory until accessed): > virtual_arr = np.memmap(path, np.uint32, "r") > # Get a numpy view onto every 20th entry: > virtual_arr_subsampled = virtual_arr[::20] > # Copy those bits into regular malloc'ed memory: > arr_subsampled = virtual_arr_subsampled.copy() > > (Your data is probably large enough that this will only work if you're > using a 64-bit system, because of address space limitations; but if > you have data that's too large to fit into memory, then I assume > you're using a 64-bit system anyway...) > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From mpuecker at mit.edu Wed Mar 13 09:56:07 2013 From: mpuecker at mit.edu (Matt U) Date: Wed, 13 Mar 2013 13:56:07 +0000 (UTC) Subject: [Numpy-discussion] numpy reference array Message-ID: Is it possible to create a numpy array which points to the same data in a different numpy array (but in different order etc)? 
For example:

Code:
------------------------------------------------------------------------------
import numpy as np
a = np.arange(10)
ids = np.array([0,0,5,5,9,9,1,1])
b = a[ids]
a[0] = -1
b[0] #should be -1 if b[0] referenced the same data as a[0]
0
------------------------------------------------------------------------------

ctypes almost does it for me, but the access is inconvenient. I would like to access b as a regular numpy array:

Code:
------------------------------------------------------------------------------
import numpy as np
import ctypes
a = np.arange(10)
ids = np.array([0,0,5,5,9,9,1,1])
b = [a[id:id+1].ctypes.data_as(ctypes.POINTER(ctypes.c_long)) for id in ids]
a[0] = -1
b[0][0] #access is inconvenient
-1
------------------------------------------------------------------------------

Some more information: I've written a finite-element code, and I'm working on optimizing the python implementation. Profiling shows the slowest operation is the re-creation of an array that extracts edge degrees of freedom from the volume of the element (similar to b above). So, I'm trying to avoid copying the data every time, and just setting up 'b' once. The ctypes solution is sub-optimal since my code is mostly vectorized, that is, later I'd like to do something like

Code:
------------------------------------------------------------------------------
c[ids] = b[ids] + d[ids]
------------------------------------------------------------------------------

where c and d are the same shape as b but contain different data. Any thoughts? If it's not possible that will save me time searching.
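The behaviour Matt observes follows from a general rule: fancy (integer-array) indexing always copies, while only basic strided slicing returns a view onto the same buffer, which is why a fancy-indexed "reference array" like b cannot exist. A quick illustrative check (not code from the thread):

```python
import numpy as np

a = np.arange(10)
ids = np.array([0, 0, 5, 5, 9, 9, 1, 1])

b = a[ids]        # fancy indexing: b owns a fresh copy of the data
a[0] = -1
print(b[0])       # still 0 -- b did not see the change

c = a[::2]        # basic slicing: c is a strided view on a's buffer
a[2] = 99
print(c[1])       # 99 -- c[1] and a[2] are the same memory
```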
From Andrea.Cimatoribus at nioz.nl Wed Mar 13 10:18:53 2013 From: Andrea.Cimatoribus at nioz.nl (Andrea Cimatoribus) Date: Wed, 13 Mar 2013 15:18:53 +0100 Subject: [Numpy-discussion] R: fast numpy.fromfile skipping data chunks In-Reply-To: References: , Message-ID: This solution does not work for me since I have an offset before the data that is not a multiple of the datatype (it's a header containing various stuff). I'll look at pytables.

# Exploit the operating system's virtual memory manager to get a "virtual copy" of the entire file in memory
# (This does not actually use any memory until accessed):
virtual_arr = np.memmap(path, np.uint32, "r")
# Get a numpy view onto every 20th entry:
virtual_arr_subsampled = virtual_arr[::20]
# Copy those bits into regular malloc'ed memory:
arr_subsampled = virtual_arr_subsampled.copy()

From jaakko.luttinen at aalto.fi Wed Mar 13 10:21:13 2013 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Wed, 13 Mar 2013 16:21:13 +0200 Subject: [Numpy-discussion] Dot/inner products with broadcasting? Message-ID: <51408B59.8090504@aalto.fi> Hi! How can I compute dot product (or similar multiply&sum operations) efficiently so that broadcasting is utilized? For multi-dimensional arrays, NumPy's inner and dot functions do not match the leading axes and use broadcasting, but instead the result has first the leading axes of the first input array and then the leading axes of the second input array.
For instance, I would like to compute the following inner-product:

np.sum(A*B, axis=-1)

But numpy.inner gives:

A = np.random.randn(2,3,4)
B = np.random.randn(3,4)
np.inner(A,B).shape # -> (2, 3, 3) instead of (2, 3)

Similarly for dot product, I would like to compute for instance:

np.sum(A[...,:,:,np.newaxis]*B[...,np.newaxis,:,:], axis=-2)

But numpy.dot gives:

In [12]: A = np.random.randn(2,3,4); B = np.random.randn(2,4,5)
In [13]: np.dot(A,B).shape # -> (2, 3, 2, 5) instead of (2, 3, 5)

I could use einsum for these operations, but I'm not sure whether that's as efficient as using some BLAS-supported(?) dot products. I couldn't find any function which could perform this kind of operations. NumPy's functions seem to either flatten the input arrays (vdot, outer) or just use the axes of the input arrays separately (dot, inner, tensordot). Any help? Best regards, Jaakko From Andrea.Cimatoribus at nioz.nl Wed Mar 13 10:21:50 2013 From: Andrea.Cimatoribus at nioz.nl (Andrea Cimatoribus) Date: Wed, 13 Mar 2013 15:21:50 +0100 Subject: [Numpy-discussion] R: fast numpy.fromfile skipping data chunks In-Reply-To: References: , Message-ID: I see that pytables deals with hdf5 data. It would be very nice if the data were in such a standard format, but that is not the case, and that can't be changed. ________________________________________ From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] on behalf of Frédéric Bastien [nouiz at nouiz.org] Sent: Wednesday, 13 March 2013 15:03 To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] fast numpy.fromfile skipping data chunks Hi, I would suggest that you look at pytables[1]. It use a different file format, but it seam to do exactly what you want and give an object that have a very similar interface to numpy.ndarray (but fewer function). You would just ask for the slice/indices that you want and it return you a numpy.ndarray.
HTH Frédéric [1] http://www.pytables.org/moin On Wed, Mar 13, 2013 at 9:54 AM, Nathaniel Smith wrote: > On Wed, Mar 13, 2013 at 1:45 PM, Andrea Cimatoribus > wrote: >> Hi everybody, I hope this has not been discussed before, I couldn't find a solution elsewhere. >> I need to read some binary data, and I am using numpy.fromfile to do this. Since the files are huge, and would make me run out of memory, I need to read data skipping some records (I am reading data recorded at high frequency, so basically I want to read subsampling). >> At the moment, I came up with the code below, which is then compiled using cython. Despite the significant performance increase from the pure python version, the function is still much slower than numpy.fromfile, and only reads one kind of data (in this case uint32), otherwise I do not know how to define the array type in advance. I have basically no experience with cython nor c, so I am a bit stuck. How can I try to make this more efficient and possibly more generic? > > If your data is stored as fixed-format binary (as it seems it is), > then the easiest way is probably > > # Exploit the operating system's virtual memory manager to get a > "virtual copy" of the entire file in memory > # (This does not actually use any memory until accessed): > virtual_arr = np.memmap(path, np.uint32, "r") > # Get a numpy view onto every 20th entry: > virtual_arr_subsampled = virtual_arr[::20] > # Copy those bits into regular malloc'ed memory: > arr_subsampled = virtual_arr_subsampled.copy() > > (Your data is probably large enough that this will only work if you're > using a 64-bit system, because of address space limitations; but if > you have data that's too large to fit into memory, then I assume > you're using a 64-bit system anyway...) 
> > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From njs at pobox.com Wed Mar 13 10:32:29 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 13 Mar 2013 14:32:29 +0000 Subject: [Numpy-discussion] R: fast numpy.fromfile skipping data chunks In-Reply-To: References: Message-ID: On Wed, Mar 13, 2013 at 2:18 PM, Andrea Cimatoribus wrote: > This solution does not work for me since I have an offset before the data that is not a multiple of the datatype (it's a header containing various stuff). np.memmap takes an offset= argument. -n From Andrea.Cimatoribus at nioz.nl Wed Mar 13 10:37:54 2013 From: Andrea.Cimatoribus at nioz.nl (Andrea Cimatoribus) Date: Wed, 13 Mar 2013 15:37:54 +0100 Subject: [Numpy-discussion] R: R: fast numpy.fromfile skipping data chunks In-Reply-To: References: , Message-ID: Indeed, but that offset "it should be a multiple of the byte-size of dtype" as the help says. Indeed, this is silly. ________________________________________ Da: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] per conto di Nathaniel Smith [njs at pobox.com] Inviato: mercoledì 13 marzo 2013 15.32 A: Discussion of Numerical Python Oggetto: Re: [Numpy-discussion] R: fast numpy.fromfile skipping data chunks On Wed, Mar 13, 2013 at 2:18 PM, Andrea Cimatoribus wrote: > This solution does not work for me since I have an offset before the data that is not a multiple of the datatype (it's a header containing various stuff). np.memmap takes an offset= argument. 
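A minimal sketch of the offset= usage (the 100-byte header and uint32 records below are made-up stand-ins for the real file layout):

```python
import os
import tempfile

import numpy as np

# Build a demo file: a 100-byte header followed by uint32 records
# (header size and dtype are invented for the demo).
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
records = np.arange(1000, dtype=np.uint32)
with open(path, "wb") as f:
    f.write(b"\x00" * 100)      # fake header
    f.write(records.tobytes())

# Map the file starting after the header, then view every 20th record;
# nothing is actually read until the copy at the end.
virtual = np.memmap(path, dtype=np.uint32, mode="r", offset=100)
subsampled = np.array(virtual[::20])  # copy into regular memory
assert np.array_equal(subsampled, records[::20])
```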
-n _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From Andrea.Cimatoribus at nioz.nl Wed Mar 13 10:40:07 2013 From: Andrea.Cimatoribus at nioz.nl (Andrea Cimatoribus) Date: Wed, 13 Mar 2013 15:40:07 +0100 Subject: [Numpy-discussion] R: R: R: fast numpy.fromfile skipping data chunks In-Reply-To: References: , , Message-ID: On top of that, there is another issue: it can be that the data available itself is not a multiple of dtype, since there can be write errors at the end of the file, and I don't know how to deal with that. ________________________________________ Da: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] per conto di Andrea Cimatoribus Inviato: mercoledì 13 marzo 2013 15.37 A: Discussion of Numerical Python Oggetto: [Numpy-discussion] R: R: fast numpy.fromfile skipping data chunks Indeed, but that offset "it should be a multiple of the byte-size of dtype" as the help says. Indeed, this is silly. ________________________________________ Da: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] per conto di Nathaniel Smith [njs at pobox.com] Inviato: mercoledì 13 marzo 2013 15.32 A: Discussion of Numerical Python Oggetto: Re: [Numpy-discussion] R: fast numpy.fromfile skipping data chunks On Wed, Mar 13, 2013 at 2:18 PM, Andrea Cimatoribus wrote: > This solution does not work for me since I have an offset before the data that is not a multiple of the datatype (it's a header containing various stuff). np.memmap takes an offset= argument. 
-n _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From Andrea.Cimatoribus at nioz.nl Wed Mar 13 10:46:30 2013 From: Andrea.Cimatoribus at nioz.nl (Andrea Cimatoribus) Date: Wed, 13 Mar 2013 15:46:30 +0100 Subject: [Numpy-discussion] R: R: R: fast numpy.fromfile skipping data chunks In-Reply-To: References: , , Message-ID: >Indeed, but that offset "it should be a multiple of the byte-size of dtype" as the help says. My mistake, sorry, even if the help says so, it seems that this is not the case in the actual code. Still, the problem with the size of the available data (which is not necessarily a multiple of dtype byte-size) remains. ac From njs at pobox.com Wed Mar 13 10:53:25 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 13 Mar 2013 14:53:25 +0000 Subject: [Numpy-discussion] R: R: R: fast numpy.fromfile skipping data chunks In-Reply-To: References: Message-ID: On Wed, Mar 13, 2013 at 2:46 PM, Andrea Cimatoribus wrote: >>Indeed, but that offset "it should be a multiple of the byte-size of dtype" as the help says. > > My mistake, sorry, even if the help says so, it seems that this is not the case in the actual code. Still, the problem with the size of the available data (which is not necessarily a multiple of dtype byte-size) remains. Worst case you can always work around such issues with an extra layer of view manipulation: # create a raw view onto the contents of the file file_bytes = np.memmap(path, dtype=np.uint8, ...) # cut out any arbitrary number of bytes from the beginning and end data_bytes = file_bytes[...some slice expression...] 
# switch to viewing the bytes as the proper data type data = data_bytes.view(dtype=np.uint32) # proceed as before -n From francesc at continuum.io Wed Mar 13 10:53:38 2013 From: francesc at continuum.io (Francesc Alted) Date: Wed, 13 Mar 2013 15:53:38 +0100 Subject: [Numpy-discussion] fast numpy.fromfile skipping data chunks In-Reply-To: References: Message-ID: <514092F2.6@continuum.io> On 3/13/13 2:45 PM, Andrea Cimatoribus wrote: > Hi everybody, I hope this has not been discussed before, I couldn't find a solution elsewhere. > I need to read some binary data, and I am using numpy.fromfile to do this. Since the files are huge, and would make me run out of memory, I need to read data skipping some records (I am reading data recorded at high frequency, so basically I want to read subsampling). [clip] You can do a fid.seek(offset) prior to np.fromfile() and then it will read from offset. See the docstrings for `file.seek()` on how to use it. -- Francesc Alted From francesc at continuum.io Wed Mar 13 11:04:31 2013 From: francesc at continuum.io (Francesc Alted) Date: Wed, 13 Mar 2013 16:04:31 +0100 Subject: [Numpy-discussion] fast numpy.fromfile skipping data chunks In-Reply-To: <514092F2.6@continuum.io> References: <514092F2.6@continuum.io> Message-ID: <5140957F.4030708@continuum.io> On 3/13/13 3:53 PM, Francesc Alted wrote: > On 3/13/13 2:45 PM, Andrea Cimatoribus wrote: >> Hi everybody, I hope this has not been discussed before, I couldn't >> find a solution elsewhere. >> I need to read some binary data, and I am using numpy.fromfile to do >> this. Since the files are huge, and would make me run out of memory, >> I need to read data skipping some records (I am reading data recorded >> at high frequency, so basically I want to read subsampling). > [clip] > > You can do a fid.seek(offset) prior to np.fromfile() and then it will > read from offset. See the docstrings for `file.seek()` on how to use it. > Oops, you were already using file.seek(). Disregard, please. 
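Nathaniel's byte-view workaround from earlier in the thread can be made concrete as follows (the 7-byte header and 3-byte ragged tail are invented for the demo — deliberately NOT multiples of the 4-byte item size):

```python
import os
import tempfile

import numpy as np

# Demo file: 7 junk bytes, then uint32 records, then a 3-byte ragged tail
path = os.path.join(tempfile.mkdtemp(), "odd.bin")
records = np.arange(50, dtype=np.uint32)
with open(path, "wb") as f:
    f.write(b"\x01" * 7)
    f.write(records.tobytes())
    f.write(b"\x02" * 3)

file_bytes = np.memmap(path, dtype=np.uint8, mode="r")  # raw byte view
data_bytes = file_bytes[7:file_bytes.size - 3]          # trim header and tail
data = data_bytes.view(np.uint32)                       # reinterpret as uint32
assert np.array_equal(data, records)
```

The resulting view may be unaligned, which numpy handles transparently (possibly at some speed cost).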
-- Francesc Alted From Andrea.Cimatoribus at nioz.nl Wed Mar 13 11:13:50 2013 From: Andrea.Cimatoribus at nioz.nl (Andrea Cimatoribus) Date: Wed, 13 Mar 2013 16:13:50 +0100 Subject: [Numpy-discussion] R: R: R: R: fast numpy.fromfile skipping data chunks In-Reply-To: References: , Message-ID: Ok, this seems to be working (well, as soon as I get the right offset and things like that, but that's a different story). The problem is that it does not go any faster than my initial function compiled with cython, and it is still a lot slower than fromfile. Is there a reason why, even with compiled code, reading from a file skipping some records should be slower than reading the whole file? ________________________________________ Da: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] per conto di Nathaniel Smith [njs at pobox.com] Inviato: mercoledì 13 marzo 2013 15.53 A: Discussion of Numerical Python Oggetto: Re: [Numpy-discussion] R: R: R: fast numpy.fromfile skipping data chunks On Wed, Mar 13, 2013 at 2:46 PM, Andrea Cimatoribus wrote: >>Indeed, but that offset "it should be a multiple of the byte-size of dtype" as the help says. > > My mistake, sorry, even if the help says so, it seems that this is not the case in the actual code. Still, the problem with the size of the available data (which is not necessarily a multiple of dtype byte-size) remains. Worst case you can always work around such issues with an extra layer of view manipulation: # create a raw view onto the contents of the file file_bytes = np.memmap(path, dtype=np.uint8, ...) # cut out any arbitrary number of bytes from the beginning and end data_bytes = file_bytes[...some slice expression...] 
# switch to viewing the bytes as the proper data type data = data_bytes.view(dtype=np.uint32) # proceed as before -n _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From njs at pobox.com Wed Mar 13 11:43:02 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 13 Mar 2013 15:43:02 +0000 Subject: [Numpy-discussion] R: R: R: R: fast numpy.fromfile skipping data chunks In-Reply-To: References: Message-ID: On 13 Mar 2013 15:16, "Andrea Cimatoribus" wrote: > > Ok, this seems to be working (well, as soon as I get the right offset and things like that, but that's a different story). > The problem is that it does not go any faster than my initial function compiled with cython, and it is still a lot slower than fromfile. Is there a reason why, even with compiled code, reading from a file skipping some records should be slower than reading the whole file? Oh, in that case you're probably IO bound, not CPU bound, so Cython etc. can't help. Traditional spinning-disk hard drives can read quite quickly, but take a long time to find the right place to read from and start reading. Your OS has heuristics in it to detect sequential reads and automatically start the setup for the next read while you're processing the previous read, so you don't see the seek overhead. If your reads are widely separated enough, these heuristics will get confused and you'll drop back to doing a new disk seek on every call to read(), which is deadly. (And would explain what you're seeing.) If this is what's going on, your best bet is to just write a python loop that uses fromfile() to read some largeish (megabytes?) chunk, subsample those and throw away the rest, and repeat. -n -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jaakko.luttinen at aalto.fi Wed Mar 13 11:46:56 2013 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Wed, 13 Mar 2013 17:46:56 +0200 Subject: [Numpy-discussion] Performance of einsum? Message-ID: <51409F70.800@aalto.fi> Hi, I was wondering if someone could provide some intuition on the performance of einsum? I have found that sometimes it is extremely efficient but sometimes it is several orders of magnitude slower compared to some other approaches, for instance, using multiple dot-calls. My intuition is that the computation time of einsum is linear with respect to the size of the "index space", that is, the product of the maximums of the indices. So, for instance computing the dot product of three matrices A*B*C would not be efficient computed as einsum('ij,jk,kl->il', A, B, C) because there are four indices i=1,...,I, j=1,...,J, k=1,...,K and l=1,...,L so the total computation time is O(I*J*K*L) which is much worse than with two dot products O(I*J*K+I*K*L), or with two einsum-calls for Y=A*B and Y*C. On the other hand, computing einsum('ij,ij,ij->i', A, B, C) would be "efficient" because the computation time is only O(I*J). Is this intuition roughly correct or how could I intuitively understand when the usage of einsum is bad? Best regards, Jaakko From Andrea.Cimatoribus at nioz.nl Wed Mar 13 11:54:24 2013 From: Andrea.Cimatoribus at nioz.nl (Andrea Cimatoribus) Date: Wed, 13 Mar 2013 16:54:24 +0100 Subject: [Numpy-discussion] R: fast numpy.fromfile skipping data chunks In-Reply-To: References: , Message-ID: Thanks a lot for the feedback, I'll try to modify my function to overcome this issue. Since I'm in the process of buying new hardware too, a slight OT (but definitely related). Does an ssd provide substantial improvement in these cases? ________________________________________ Da: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] per conto di Nathaniel Smith [njs at pobox.com] Inviato: mercoledì 
13 marzo 2013 16.43 A: Discussion of Numerical Python Oggetto: Re: [Numpy-discussion] R: R: R: R: fast numpy.fromfile skipping data chunks On 13 Mar 2013 15:16, "Andrea Cimatoribus" > wrote: > > Ok, this seems to be working (well, as soon as I get the right offset and things like that, but that's a different story). > The problem is that it does not go any faster than my initial function compiled with cython, and it is still a lot slower than fromfile. Is there a reason why, even with compiled code, reading from a file skipping some records should be slower than reading the whole file? Oh, in that case you're probably IO bound, not CPU bound, so Cython etc. can't help. Traditional spinning-disk hard drives can read quite quickly, but take a long time to find the right place to read from and start reading. Your OS has heuristics in it to detect sequential reads and automatically start the setup for the next read while you're processing the previous read, so you don't see the seek overhead. If your reads are widely separated enough, these heuristics will get confused and you'll drop back to doing a new disk seek on every call to read(), which is deadly. (And would explain what you're seeing.) If this is what's going on, your best bet is to just write a python loop that uses fromfile() to read some largeish (megabytes?) chunk, subsample those and throw away the rest, and repeat. -n From dineshbvadhia at hotmail.com Wed Mar 13 11:59:12 2013 From: dineshbvadhia at hotmail.com (Dinesh B Vadhia) Date: Wed, 13 Mar 2013 08:59:12 -0700 Subject: [Numpy-discussion] (@Pat Marion) Re: Yes, this one again "ImportError: No module named multiarray" In-Reply-To: References: Message-ID: Many thanks Pat - the numpy discussion list is brill. Go ahead and see if the CPython developers would be interested as it is a problem that appears all the time on boards/lists. Best ... 
Dinesh From: Pat Marion Sent: Tuesday, March 12, 2013 7:23 AM To: Aron Ahmadia Cc: Discussion of Numerical Python Subject: Re: [Numpy-discussion] (@Pat Marion) Re: Yes,this one again "ImportError: No module named multiarray" Thanks for copying me, Aron. Hi Dinesh, I have a github project which demonstrates how to use numpy with freeze. The project's readme includes more information: https://github.com/patmarion/NumpyBuiltinExample It does require a small patch to CPython's import.c file. I haven't tried posted this patch to the CPython developers, perhaps there'd be interest incorporating it upstream. Pat -------------- next part -------------- An HTML attachment was scrubbed... URL: From dineshbvadhia at hotmail.com Wed Mar 13 12:08:22 2013 From: dineshbvadhia at hotmail.com (Dinesh B Vadhia) Date: Wed, 13 Mar 2013 09:08:22 -0700 Subject: [Numpy-discussion] Yes, this one again "ImportError: No module named multiarray" In-Reply-To: References: Message-ID: Hi Chris Darn! It worked this morning and I don't know why. Focused on PyInstaller because it creates a single executable. Testing on all major versions of Windows (32-bit and 64-bit), Linux and OSX. The problem OS is unsurprisingly, Windows XP (SP3). Numpy was upgraded to the mkl-version and maybe that did the trick. Tried to replicate on an identical Windows XP machine using the standard sourceforge distribution but that resulted in a pyinstaller error. Anyway, using the latest releases of all software ie. Python 2.7.3, Numpy 1.7.0, Scipy 0.11.0, PyInstaller 2.0. Will post back if run into problems again. Best ... 
-------------------------------------------------- From: "Chris Barker - NOAA Federal" Sent: Tuesday, March 12, 2013 2:50 PM To: "Discussion of Numerical Python" Subject: Re: [Numpy-discussion] Yes,this one again "ImportError: No module named multiarray" > On Tue, Mar 12, 2013 at 7:05 AM, Dinesh B Vadhia > wrote: >> Does that mean numpy won't work with freeze/create_executable type of >> tools >> or is there a workaround? > > I've used numpy with py2exe and py2app out of the box with no issues ( > actually, there is an issue with too much stuff getting bundled up, > but it works) > >>> ImportError let alone what the solution is. The Traceback, similar to >>> others found on the web, is: >>> >>> Traceback (most recent call last): >>> File "test.py", ... >>> File "C:\Python27\lib\site-packages\numpy\__init__.py", line 154, in >>> > > This indicates that your code is importing the numpy that's inside the > system installation -- it should be using one in your app bundle. > > What bundling tool are you using? > How did you install python/numpy? > What does your bundling tol config look like? > And, of course, version numbers of everything. > > -Chris > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > From rhattersley at gmail.com Wed Mar 13 12:53:07 2013 From: rhattersley at gmail.com (Richard Hattersley) Date: Wed, 13 Mar 2013 16:53:07 +0000 Subject: [Numpy-discussion] fast numpy.fromfile skipping data chunks In-Reply-To: References: Message-ID: > Since the files are huge, and would make me run out of memory, I need to read data skipping some records Is it possible to describe what you're doing with the data once you have subsampled it? And if there were a way to work with the full resolution data, would that be desirable? 
I ask because I've been dabbling with a pure-Python library for handling larger-than-memory datasets - https://github.com/SciTools/biggus, and it uses similar chunking techniques as mentioned in the other replies to process data at the full streaming I/O rate. It's still in the early stages of development so the design can be fluid, so maybe it's worth seeing if there's enough in common with your needs to warrant adding your use case. Richard On 13 March 2013 13:45, Andrea Cimatoribus wrote: > Hi everybody, I hope this has not been discussed before, I couldn't find a > solution elsewhere. > I need to read some binary data, and I am using numpy.fromfile to do this. > Since the files are huge, and would make me run out of memory, I need to > read data skipping some records (I am reading data recorded at high > frequency, so basically I want to read subsampling). > At the moment, I came up with the code below, which is then compiled using > cython. Despite the significant performance increase from the pure python > version, the function is still much slower than numpy.fromfile, and only > reads one kind of data (in this case uint32), otherwise I do not know how > to define the array type in advance. I have basically no experience with > cython nor c, so I am a bit stuck. How can I try to make this more > efficient and possibly more generic? > Thanks > > import numpy as np > #For cython! 
> cimport numpy as np > from libc.stdint cimport uint32_t > > def cffskip32(fid, int count=1, int skip=0): > > cdef int k=0 > cdef np.ndarray[uint32_t, ndim=1] data = np.zeros(count, > dtype=np.uint32) > > if skip>=0: > while k<count: > try: > data[k] = np.fromfile(fid, count=1, dtype=np.uint32) > fid.seek(skip, 1) > k +=1 > except ValueError: > data = data[:k] > break > return data > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From davidmenhur at gmail.com Wed Mar 13 13:41:19 2013 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Wed, 13 Mar 2013 18:41:19 +0100 Subject: [Numpy-discussion] R: fast numpy.fromfile skipping data chunks In-Reply-To: References: Message-ID: On 13 March 2013 16:54, Andrea Cimatoribus wrote: > Since I'm in the process of buying new hardware too, a slight OT (but > definitely related). > Does an ssd provide substantial improvement in these cases? > It should help. Nevertheless, when talking about performance, it is difficult to predict, mainly because in a computer there are many things going on and many layers involved. I have a couple of computers equipped with SSD; if you want, send me some benchmarks and I can run them to see if I get any speedup. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Wed Mar 13 14:40:28 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 13 Mar 2013 14:40:28 -0400 Subject: [Numpy-discussion] can't run cython on mtrand.pyx Message-ID: Grabbed numpy-1.7.0 source. Cython is 0.18 cython mtrand.pyx produces lots of errors. 
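Returning to the fromfile-subsampling thread: Nathaniel's suggestion of reading large sequential chunks and subsampling in memory could be sketched like this (chunk size, dtype, and step are illustrative, not taken from the original files):

```python
import os
import tempfile

import numpy as np

def fromfile_subsampled(path, dtype=np.uint32, step=20,
                        chunk_items=1000000, offset=0):
    """Keep every `step`-th record by streaming large sequential chunks.

    Sequential reads keep the OS read-ahead heuristics happy; the
    subsampling happens in memory on each chunk.
    """
    pieces = []
    pos = 0  # index of the next record to keep, relative to chunk start
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            chunk = np.fromfile(f, dtype=dtype, count=chunk_items)
            if chunk.size == 0:
                break
            pieces.append(chunk[pos::step])
            # carry the phase of the subsampling into the next chunk
            pos = (pos - chunk.size) % step
    return np.concatenate(pieces) if pieces else np.empty(0, dtype=dtype)

# Demo on a small synthetic file
path = os.path.join(tempfile.mkdtemp(), "big.bin")
data = np.arange(10000, dtype=np.uint32)
data.tofile(path)
sub = fromfile_subsampled(path, step=20, chunk_items=333)
assert np.array_equal(sub, data[::20])
```

The modulo bookkeeping keeps the subsampling phase correct even when the chunk size is not a multiple of the step.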
From robert.kern at gmail.com Wed Mar 13 15:01:51 2013 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 13 Mar 2013 19:01:51 +0000 Subject: [Numpy-discussion] can't run cython on mtrand.pyx In-Reply-To: References: Message-ID: On Wed, Mar 13, 2013 at 6:40 PM, Neal Becker wrote: > Grabbed numpy-1.7.0 source. > Cython is 0.18 > > cython mtrand.pyx produces lots of errors. It helps to copy-and-paste the errors that you are seeing. In any case, Cython 0.18 works okay on master's mtrand.pyx sources. -- Robert Kern From ndbecker2 at gmail.com Wed Mar 13 15:20:30 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 13 Mar 2013 15:20:30 -0400 Subject: [Numpy-discussion] can't run cython on mtrand.pyx References: Message-ID: Robert Kern wrote: > On Wed, Mar 13, 2013 at 6:40 PM, Neal Becker wrote: >> Grabbed numpy-1.7.0 source. >> Cython is 0.18 >> >> cython mtrand.pyx produces lots of errors. > > It helps to copy-and-paste the errors that you are seeing. > > In any case, Cython 0.18 works okay on master's mtrand.pyx sources. > Well, this is the first error: cython mtrand.pyx Error compiling Cython file: ------------------------------------------------------------ ... PyArray_DIMS(oa) , NPY_DOUBLE) length = PyArray_SIZE(array) array_data = <double *>PyArray_DATA(array) itera = <flatiter>PyArray_IterNew(<object>oa) for i from 0 <= i < length: array_data[i] = func(state, (<double *>(itera.dataptr))[0]) ^ ------------------------------------------------------------ mtrand.pyx:177:41: Python objects cannot be cast to pointers of primitive types From charlesr.harris at gmail.com Wed Mar 13 19:40:00 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 13 Mar 2013 17:40:00 -0600 Subject: [Numpy-discussion] R: fast numpy.fromfile skipping data chunks In-Reply-To: References: Message-ID: On Wed, Mar 13, 2013 at 9:54 AM, Andrea Cimatoribus < Andrea.Cimatoribus at nioz.nl> wrote: > Thanks a lot for the feedback, I'll try to modify my function to overcome > this issue. 
> Since I'm in the process of buying new hardware too, a slight OT (but > definitely related). > Does an ssd provide substantial improvement in these cases? > It should. Seek time on an ssd is quite low, and readout is fast. Skipping over items will probably not be as fast as a sequential read but I expect it will be substantially faster than a disk. Nathaniel's loop idea will probably work faster also. The sequential readout rate of a modern ssd will be about 500 MB/sec, so you can probably just divide that into your file size to get an estimate of the time needed. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Wed Mar 13 20:50:20 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Wed, 13 Mar 2013 17:50:20 -0700 Subject: [Numpy-discussion] numpy reference array In-Reply-To: References: Message-ID: On Wed, Mar 13, 2013 at 6:56 AM, Matt U wrote: > Is it possible to create a numpy array which points to the same data in a > different numpy array (but in different order etc)? You can do this (easily), but only if the "different order" can be defined in terms of strides. A simple example is a transpose: In [3]: a = np.arange(12).reshape((3,4)) In [4]: a Out[4]: array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]]) In [5]: b = a.T In [6]: b Out[6]: array([[ 0, 4, 8], [ 1, 5, 9], [ 2, 6, 10], [ 3, 7, 11]]) # b is the transpose of a # but a view on the same data block: # change a: In [7]: a[2,1] = 44 In [8]: a Out[8]: array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 44, 10, 11]]) # b is changed, too. In [9]: b Out[9]: array([[ 0, 4, 8], [ 1, 5, 44], [ 2, 6, 10], [ 3, 7, 11]]) check out "stride tricks" for clever things you can do. But numpy does require that the data in your array be a contiguous block, in order, so you can't arbitrarily re-arrange it while keeping a view. HTH, -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From birdada85 at gmail.com Wed Mar 13 21:25:28 2013 From: birdada85 at gmail.com (Birdada Simret) Date: Thu, 14 Mar 2013 02:25:28 +0100 Subject: [Numpy-discussion] Any help from Numpy community? Message-ID: Any help from Numpy community [[ 0. 1.54 0. 0. 0. 1.08 1.08 1.08 ] [ 1.54 0. 1.08 1.08 1.08 0. 0. 0. ] [ 0. 1.08 0. 0. 0. 0. 0. 0. ] [ 0. 1.08 0. 0. 0. 0. 0. 0. ] [ 0. 1.08 0. 0. 0. 0. 0. 0. ] [ 1.08 0. 0. 0. 0. 0. 0. 0. ] [ 1.08 0. 0. 0. 0. 0. 0. 0. ] [ 1.08 0. 0. 0. 0. 0. 0. 0. ]] the above is the numpy array matrix. the numbers represent: C-C: 1.54 and C-H=1.08 So I want to write this form as C of index i is connected to C of index j C of index i is connected to H of index j (C(i),C(j)) # key C(i) and value C(j) (C(i),H(j)) # key C(i) and value H(j) ; the key C(i) can be repeated, once for each of its H(j) values To summarize, the output may look like: C1 is connected to C2 C1 is connected to H1 C1 is connected to H3 C2 is connected to H2 etc.... Any guide is greatly appreciated, thanks birda -------------- next part -------------- An HTML attachment was scrubbed... URL: From pat.marion at kitware.com Wed Mar 13 22:33:01 2013 From: pat.marion at kitware.com (Pat Marion) Date: Thu, 14 Mar 2013 12:33:01 +1000 Subject: [Numpy-discussion] Yes, this one again "ImportError: No module named multiarray" In-Reply-To: References: Message-ID: Glad you got it working! For those who might be interested, the distinction between the example I linked to and packaging tools like PyInstaller or py2exe, is that NumpyBuiltinExample uses static linking to embed numpy as a builtin module. At runtime, there is no dynamic loading, and there is no filesystem access. 
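For the bond-matrix question above, one sketch uses the upper triangle of the matrix and np.nonzero to list each bond once (the atom labels below are assumed, since the question doesn't fix a naming scheme):

```python
import numpy as np

# Bond matrix from the question: 1.54 marks a C-C bond, 1.08 a C-H bond
M = np.zeros((8, 8))
M[0, 1] = M[1, 0] = 1.54                  # C1-C2
for h in (5, 6, 7):                       # H atoms bonded to C1 (index 0)
    M[0, h] = M[h, 0] = 1.08
for h in (2, 3, 4):                       # H atoms bonded to C2 (index 1)
    M[1, h] = M[h, 1] = 1.08

# Assumed labels for indices 0..7 (hypothetical naming)
labels = ['C1', 'C2', 'H1', 'H2', 'H3', 'H4', 'H5', 'H6']

i_idx, j_idx = np.nonzero(np.triu(M))     # upper triangle: each bond once
bonds = [(labels[i], labels[j]) for i, j in zip(i_idx, j_idx)]
for a, b in bonds:
    print(a, 'is connected to', b)
```

Using np.nonzero on the full (symmetric) matrix instead of np.triu would report every bond twice, once in each direction.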
The technique is targeted at HPC or embedded systems where you might want to avoid touching the filesystem, or avoid dynamic loading. Pat On Thu, Mar 14, 2013 at 2:08 AM, Dinesh B Vadhia wrote: > Hi Chris > Darn! It worked this morning and I don't know why. > > Focused on PyInstaller because it creates a single executable. Testing on > all major versions of Windows (32-bit and 64-bit), Linux and OSX. The > problem OS is unsurprisingly, Windows XP (SP3). > > Numpy was upgraded to the mkl-version and maybe that did the trick. Tried > to replicate on an identical Windows XP machine using the standard > sourceforge distribution but that resulted in a pyinstaller error. > > Anyway, using the latest releases of all software ie. Python 2.7.3, Numpy > 1.7.0, Scipy 0.11.0, PyInstaller 2.0. > > Will post back if run into problems again. Best ... > > > -------------------------------------------------- > From: "Chris Barker - NOAA Federal" > Sent: Tuesday, March 12, 2013 2:50 PM > To: "Discussion of Numerical Python" > Subject: Re: [Numpy-discussion] Yes,this one again "ImportError: No module > named multiarray" > > > On Tue, Mar 12, 2013 at 7:05 AM, Dinesh B Vadhia > > wrote: > >> Does that mean numpy won't work with freeze/create_executable type of > >> tools > >> or is there a workaround? > > > > I've used numpy with py2exe and py2app out of the box with no issues ( > > actually, there is an issue with too much stuff getting bundled up, > > but it works) > > > >>> ImportError let alone what the solution is. The Traceback, similar to > >>> others found on the web, is: > >>> > >>> Traceback (most recent call last): > >>> File "test.py", ... > >>> File "C:\Python27\lib\site-packages\numpy\__init__.py", line 154, in > >>> > > > > This indicates that your code is importing the numpy that's inside the > > system installation -- it should be using one in your app bundle. > > > > What bundling tool are you using? > > How did you install python/numpy? 
> > What does your bundling tol config look like? > > And, of course, version numbers of everything. > > > > -Chris > > > > -- > > > > Christopher Barker, Ph.D. > > Oceanographer > > > > Emergency Response Division > > NOAA/NOS/OR&R (206) 526-6959 voice > > 7600 Sand Point Way NE (206) 526-6329 fax > > Seattle, WA 98115 (206) 526-6317 main reception > > > > Chris.Barker at noaa.gov > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Andrea.Cimatoribus at nioz.nl Thu Mar 14 04:48:08 2013 From: Andrea.Cimatoribus at nioz.nl (Andrea Cimatoribus) Date: Thu, 14 Mar 2013 09:48:08 +0100 Subject: [Numpy-discussion] R: R: R: R: R: fast numpy.fromfile skipping data chunks In-Reply-To: References: , Message-ID: Thanks for all the feedback (on the SSD too). As for the "biggus" library for working on larger-than-memory arrays, this is really interesting, but unfortunately I don't have time to test it at the moment; I will try to have a look at it in the future. I hope to see something like that implemented in numpy soon, though. From sudheer.joseph at yahoo.com Thu Mar 14 05:18:01 2013 From: sudheer.joseph at yahoo.com (Sudheer Joseph) Date: Thu, 14 Mar 2013 17:18:01 +0800 (SGT) Subject: [Numpy-discussion] Numpy correlate In-Reply-To: References: , Message-ID: <1363252681.94038.YahooMailNeo@web193401.mail.sg3.yahoo.com> Dear Numpy/Scipy experts, Attached is a script which I made to test numpy.correlate (which is called by plt.xcorr) to see how the cross correlation is calculated. From this it appears that if I call plt.xcorr(x,y), y is slid back in time compared to x, i.e. if y is a process that causes a delayed response in x after 5 timesteps then there should be a high correlation at Lag 5. 
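The lag convention can be checked with a small self-contained experiment (synthetic series; the sizes and seed are arbitrary):

```python
import numpy as np

n = 200
rng = np.random.RandomState(0)
y = rng.randn(n)
x = np.roll(y, 5)   # x responds to y with a 5-step delay
x[:5] = 0.0

# Full cross-correlation of the demeaned series, and its lag axis
c = np.correlate(x - x.mean(), y - y.mean(), mode='full')
lags = np.arange(-(n - 1), n)
peak_lag = lags[np.argmax(c)]
assert peak_lag == 5   # peak at +5: x is delayed with respect to y
```

So with np.correlate(x, y) a positive peak lag means the first argument lags (responds after) the second.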
However in the attached plot the response is seen only on the -ve side of the lags. Can anyone advise me on how to see which way exactly the two series are slid back or forth, and so understand the cause-effect relation better? (I understand that by correlation alone one cannot assume a cause-effect relation, but it is important to know which series is older in time at a given lag.) with best regards, Sudheer *************************************************************** Sudheer Joseph Indian National Centre for Ocean Information Services Ministry of Earth Sciences, Govt. of India POST BOX NO: 21, IDA Jeedeemetla P.O. Via Pragathi Nagar,Kukatpally, Hyderabad; Pin:5000 55 Tel:+91-40-23886047(O),Fax:+91-40-23895011(O), Tel:+91-40-23044600(R),Tel:+91-40-9440832534(Mobile) E-mail:sjo.India at gmail.com;sudheer.joseph at yahoo.com Web- http://oppamthadathil.tripod.com *************************************************************** -------------- next part -------------- A non-text attachment was scrubbed... Name: plt_xcorr.py Type: text/x-python Size: 428 bytes Desc: not available URL:
From robert.kern at gmail.com Thu Mar 14 06:19:06 2013 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 14 Mar 2013 10:19:06 +0000 Subject: [Numpy-discussion] can't run cython on mtrand.pyx In-Reply-To: References: Message-ID: On Wed, Mar 13, 2013 at 7:20 PM, Neal Becker wrote: > Robert Kern wrote: > >> On Wed, Mar 13, 2013 at 6:40 PM, Neal Becker wrote: >>> Grabbed numpy-1.7.0 source. >>> Cython is 0.18 >>> >>> cython mtrand.pyx produces lots of errors. >> >> It helps to copy-and-paste the errors that you are seeing. >> >> In any case, Cython 0.18 works okay on master's mtrand.pyx sources. >> > > Well, this is the first error: > > cython mtrand.pyx > > Error compiling Cython file: > ------------------------------------------------------------ > ...
> PyArray_DIMS(oa) , NPY_DOUBLE)
> length = PyArray_SIZE(array)
> array_data = <double *>PyArray_DATA(array)
> itera = <flatiter>PyArray_IterNew(<object>oa)
> for i from 0 <= i < length:
>     array_data[i] = func(state, (<double *>(itera.dataptr))[0])
>                                               ^
> ------------------------------------------------------------
>
> mtrand.pyx:177:41: Python objects cannot be cast to pointers of primitive types

It looks like Cython 0.18 removed the members of flatiter in its copy of numpy.pxd in favor of the macros that are recommended for numpy 1.7. The irony is not lost on me. This should be (<double *>PyArray_ITER_DATA(itera))[0]. I'm not sure why it appears to work in master, since this code in mtrand.pyx did not change. https://github.com/numpy/numpy/issues/3144 -- Robert Kern
From robert.kern at gmail.com Thu Mar 14 06:24:41 2013 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 14 Mar 2013 10:24:41 +0000 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit? In-Reply-To: References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: On Wed, Mar 13, 2013 at 12:16 AM, Neal Becker wrote: > I guess I talked to you about 100 years ago about sharing state between numpy > rng and code I have in c++ that wraps boost::random. So is there a C-api for > this RandomState object I could use to call from c++? Maybe I could do > something with that. There is not one currently. Cython has provisions for sharing such low-level access to other Cython extensions, but I'm not sure how well it works for exporting data pointers and function pointers to general C/++ code. We could probably package the necessities into a struct and export a pointer to it via a PyCapsule. -- Robert Kern
From ndbecker2 at gmail.com Thu Mar 14 06:54:18 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 14 Mar 2013 06:54:18 -0400 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit?
References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: Robert Kern wrote: > On Wed, Mar 13, 2013 at 12:16 AM, Neal Becker wrote: >> I guess I talked to you about 100 years ago about sharing state between numpy >> rng and code I have in c++ that wraps boost::random. So is there a C-api for >> this RandomState object I could use to call from c++? Maybe I could do >> something with that. > > There is not one currently. Cython has provisions for sharing such > low-level access to other Cython extensions, but I'm not sure how well > it works for exporting data pointers and function pointers to general > C/++ code. We could probably package the necessities into a struct and > export a pointer to it via a PyCapsule. > I did find a way to do this, and the results are good enough. Timing is quite comparable to my pure c++ implementation. I used rk_ulong from mtrand.so. I also tried using rk_fill, but it was a bit slower. The boost::python c++ code is attached, for posterity. -------------- next part -------------- A non-text attachment was scrubbed... Name: pn64.cc Type: text/x-c++src Size: 7382 bytes Desc: not available URL: From ndbecker2 at gmail.com Thu Mar 14 07:00:39 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 14 Mar 2013 07:00:39 -0400 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit? References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: Robert Kern wrote: > On Wed, Mar 13, 2013 at 12:16 AM, Neal Becker wrote: >> I guess I talked to you about 100 years ago about sharing state between numpy >> rng and code I have in c++ that wraps boost::random. So is there a C-api for >> this RandomState object I could use to call from c++? Maybe I could do >> something with that. > > There is not one currently. Cython has provisions for sharing such > low-level access to other Cython extensions, but I'm not sure how well > it works for exporting data pointers and function pointers to general > C/++ code. 
We could probably package the necessities into a struct and > export a pointer to it via a PyCapsule. > One thing this code doesn't do: it requires construction of the wrapper class passing in a RandomState object. It doesn't verify you actually gave it a RandomState object. It's hard to do that. The problem as I see it is to perform this check, I need the RandomStateType object, which unfortunately mtrand.so does not export. The only way to do it is in c++ code: 1. import numpy.random 2. get RandomState class 3. call it to create RandomState instance 4. get the ob_type pointer. Pretty ugly: object mod = object (handle<> (borrowed((PyImport_ImportModule("numpy.random"))))); object rs_obj = mod.attr("RandomState"); object rs_inst = call (rs_obj.ptr(), 0); RandomStateTypeObj = rs_inst.ptr()->ob_type; From robert.kern at gmail.com Thu Mar 14 07:14:32 2013 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 14 Mar 2013 11:14:32 +0000 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit? In-Reply-To: References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: On Thu, Mar 14, 2013 at 11:00 AM, Neal Becker wrote: > Robert Kern wrote: > >> On Wed, Mar 13, 2013 at 12:16 AM, Neal Becker wrote: >>> I guess I talked to you about 100 years ago about sharing state between numpy >>> rng and code I have in c++ that wraps boost::random. So is there a C-api for >>> this RandomState object I could use to call from c++? Maybe I could do >>> something with that. >> >> There is not one currently. Cython has provisions for sharing such >> low-level access to other Cython extensions, but I'm not sure how well >> it works for exporting data pointers and function pointers to general >> C/++ code. We could probably package the necessities into a struct and >> export a pointer to it via a PyCapsule. >> > > One thing this code doesn't do: it requires construction of the wrapper class > passing in a RandomState object. 
It doesn't verify you actually gave it a > RandomState object. It's hard to do that. The problem as I see it is to > perform this check, I need the RandomStateType object, which unfortunately > mtrand.so does not export. > > The only way to do it is in c++ code: > > 1. import numpy.random > 2. get RandomState class > 3. call it to create RandomState instance > 4. get the ob_type pointer. > > Pretty ugly: > > object mod = object (handle<> > (borrowed((PyImport_ImportModule("numpy.random"))))); > object rs_obj = mod.attr("RandomState"); > object rs_inst = call (rs_obj.ptr(), 0); > RandomStateTypeObj = rs_inst.ptr()->ob_type; PyObject_IsInstance() should be sufficient. http://docs.python.org/2/c-api/object.html#PyObject_IsInstance -- Robert Kern From jaakko.luttinen at aalto.fi Thu Mar 14 07:54:06 2013 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Thu, 14 Mar 2013 13:54:06 +0200 Subject: [Numpy-discussion] Dot/inner products with broadcasting? In-Reply-To: <51408B59.8090504@aalto.fi> References: <51408B59.8090504@aalto.fi> Message-ID: <5141BA5E.2020704@aalto.fi> Answering to myself, this pull request seems to implement an inner product with broadcasting (inner1d) and many other useful functions: https://github.com/numpy/numpy/pull/2954/ -J On 03/13/2013 04:21 PM, Jaakko Luttinen wrote: > Hi! > > How can I compute dot product (or similar multiply&sum operations) > efficiently so that broadcasting is utilized? > For multi-dimensional arrays, NumPy's inner and dot functions do not > match the leading axes and use broadcasting, but instead the result has > first the leading axes of the first input array and then the leading > axes of the second input array. 
> > For instance, I would like to compute the following inner-product: > np.sum(A*B, axis=-1) > > But numpy.inner gives: > A = np.random.randn(2,3,4) > B = np.random.randn(3,4) > np.inner(A,B).shape > # -> (2, 3, 3) instead of (2, 3) > > Similarly for dot product, I would like to compute for instance: > np.sum(A[...,:,:,np.newaxis]*B[...,np.newaxis,:,:], axis=-2) > > But numpy.dot gives: > In [12]: A = np.random.randn(2,3,4); B = np.random.randn(2,4,5) > In [13]: np.dot(A,B).shape > # -> (2, 3, 2, 5) instead of (2, 3, 5) > > I could use einsum for these operations, but I'm not sure whether that's > as efficient as using some BLAS-supported(?) dot products. > > I couldn't find any function which could perform this kind of > operations. NumPy's functions seem to either flatten the input arrays > (vdot, outer) or just use the axes of the input arrays separately (dot, > inner, tensordot). > > Any help? > > Best regards, > Jaakko > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion >
From rnelsonchem at gmail.com Thu Mar 14 09:05:17 2013 From: rnelsonchem at gmail.com (Ryan) Date: Thu, 14 Mar 2013 13:05:17 +0000 (UTC) Subject: [Numpy-discussion] Any help from Numpy community? References: Message-ID: Birdada Simret gmail.com> writes:
>
> Any help from Numpy community
> [[ 0.    1.54  0.    0.    0.    1.08  1.08  1.08 ]
>  [ 1.54  0.    1.08  1.08  1.08  0.    0.    0.   ]
>  [ 0.    1.08  0.    0.    0.    0.    0.    0.   ]
>  [ 0.    1.08  0.    0.    0.    0.    0.    0.   ]
>  [ 0.    1.08  0.    0.    0.    0.    0.    0.   ]
>  [ 1.08  0.    0.    0.    0.    0.    0.    0.   ]
>  [ 1.08  0.    0.    0.    0.    0.    0.    0.   ]
>  [ 1.08  0.    0.    0.    0.    0.    0.    0.   ]]
>
> The above is the numpy array matrix; the numbers represent bond lengths: C-C: 1.54 and C-H: 1.08.
> So I want to write this in the form:
> C of index i is connected to C of index j
> C of index i is connected to H of index j
>
> (C(i),C(j)) # key C(i) and value C(j)
> (C(i),H(j)) # key C(i) and value H(j); the key C(i) can be repeated for as many values H(j) as needed
> To summarize, the output may look like:
>
> C1 is connected to C2
> C1 is connected to H1
> C1 is connected to H3
> C2 is connected to H2 etc....
>
> Any guide is greatly appreciated,
> thanks
> birda
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
Birda, I think this will get you some of the way there:

import numpy as np
x = ... # Here's your 2D atomic distance array
# Create an indexing array
index = np.arange( x.size ).reshape( x.shape )
# Find the non-zero indices
items = index[ x != 0 ]
# You only need the first half because your array is symmetric
items = items[ : items.size/2]
rows = items / x.shape[0]
cols = items % x.shape[0]
print 'Rows: ', rows
print 'Columns:', cols
print 'Atomic Distances:', x[rows, cols]

Hope it helps. Ryan
From ndbecker2 at gmail.com Thu Mar 14 09:25:53 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 14 Mar 2013 09:25:53 -0400 Subject: [Numpy-discussion] Adopt Mersenne Twister 64bit?
References: <662AFAE5-ED48-402F-A8C4-89C87FA0A6CF@continuum.io> Message-ID: Robert Kern wrote: > On Thu, Mar 14, 2013 at 11:00 AM, Neal Becker wrote: >> Robert Kern wrote: >> >>> On Wed, Mar 13, 2013 at 12:16 AM, Neal Becker wrote: >>>> I guess I talked to you about 100 years ago about sharing state between >>>> numpy >>>> rng and code I have in c++ that wraps boost::random. So is there a C-api >>>> for >>>> this RandomState object I could use to call from c++? Maybe I could do >>>> something with that. >>> >>> There is not one currently. Cython has provisions for sharing such >>> low-level access to other Cython extensions, but I'm not sure how well >>> it works for exporting data pointers and function pointers to general >>> C/++ code. We could probably package the necessities into a struct and >>> export a pointer to it via a PyCapsule. >>> >> >> One thing this code doesn't do: it requires construction of the wrapper class >> passing in a RandomState object. It doesn't verify you actually gave it a >> RandomState object. It's hard to do that. The problem as I see it is to >> perform this check, I need the RandomStateType object, which unfortunately >> mtrand.so does not export. >> >> The only way to do it is in c++ code: >> >> 1. import numpy.random >> 2. get RandomState class >> 3. call it to create RandomState instance >> 4. get the ob_type pointer. >> >> Pretty ugly: >> >> object mod = object (handle<> >> (borrowed((PyImport_ImportModule("numpy.random"))))); >> object rs_obj = mod.attr("RandomState"); >> object rs_inst = call (rs_obj.ptr(), 0); >> RandomStateTypeObj = rs_inst.ptr()->ob_type; > > PyObject_IsInstance() should be sufficient. > > http://docs.python.org/2/c-api/object.html#PyObject_IsInstance > > -- > Robert Kern Thanks! For the record, an updated version attached. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: pn.cc Type: text/x-c++src Size: 7852 bytes Desc: not available URL: From rnelsonchem at gmail.com Thu Mar 14 09:26:32 2013 From: rnelsonchem at gmail.com (Ryan) Date: Thu, 14 Mar 2013 13:26:32 +0000 (UTC) Subject: [Numpy-discussion] Any help from Numpy community? References: Message-ID: > > Birda, > > I think this will get you some of the way there: > > import numpy as np > x = ... # Here's your 2D atomic distance array > # Create an indexing array > index = np.arange( x.size ).reshape( x.shape ) > # Find the non-zero indices > items = index[ x != 0 ] > # You only need the first half because your array is symmetric > items = items[ : items.size/2] > rows = items / x.shape[0] > cols = items % x.shape[0] > print 'Rows: ', rows > print 'Columns:', cols > print 'Atomic Distances:', x[rows, cols] > > Hope it helps. > > Ryan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Whoops. That doesn't quite work. You shouldn't drop half the items array like that. This will work better (maybe ?): import numpy as np x = ... # Here's your 2D atomic distance array index = np.arange( x.size ).reshape( x.shape ) items = index[ x != 0 ] rows = items / x.shape[0] cols = items % x.shape[0] # This index mask should take better advantage of the array symmetry mask = rows < cols print 'Rows: ', rows[mask] print 'Columns:', cols[mask] print 'Atomic Distances:', x[rows[mask], cols[mask]] Ryan From birdada85 at gmail.com Thu Mar 14 10:51:15 2013 From: birdada85 at gmail.com (Birdada Simret) Date: Thu, 14 Mar 2013 15:51:15 +0100 Subject: [Numpy-discussion] Any help from Numpy community? In-Reply-To: References: Message-ID: Hi Ryan,Thank you very much indeed, I'm not sure if I well understood your code, let say, for the example array matrix given represents H3C-CH3 connection(bonding). the result from your code is: Rows: [0 0 0 0 1 1 1] # is these for C indices? 
Columns: [1 2 3 4 5 6 7] # are these the H indices? but shouldn't there be only 6 H's? Atomic Distances: [ 1. 1. 1. 1. 1. 1. 1.] # of course this is the number of connections or bonds. In fact, if I write it in the form of a dictionary, row indices as keys and column indices as values: {0:1, 0:2, 0:3, 0:4, 1:5, 1:6, 1:7}. So, does it mean C[0] is connected to H[1], C[0] is connected to H[2], ..., C[1] is connected to H[7]? But I have only 6 H's and two C's in this example (H3C-CH3). I have tried something like the following, but still no luck ;(

import numpy as np
from collections import defaultdict
dict = defaultdict(list)
x = ....2d numpy array
I = x.shape[0]
J = x.shape[1]
d = {}
for i in xrange(0, I, 1):
    for j in xrange(0, J, 1):
        if x[i,j] > 0:
            dict[i].append(j)

# the result is:
dict: {0: [1, 2, 3, 4], 1: [0, 5, 6, 7], 2: [0], 3: [0], 4: [0], 5: [1], 6: [1], 7: [1]}
keys: [0, 1, 2, 3, 4, 5, 6, 7]
values: [[1, 2, 3, 4], [0, 5, 6, 7], [0], [0], [0], [1], [1], [1]]

# The H indices can be found by
H_rows = np.nonzero(x.sum(axis=1) == 1)
result => H_rows: [2, 3, 4, 5, 6, 7] # six H's

I am trying to connect these indices with the dict result but I am confused! So, now I want to produce a dictionary or whatever to produce results like: H[2] is connected to C[?] H[3] is connected to C[?] H[4] is connected to C[?], ..... Thanks for any help.

On Thu, Mar 14, 2013 at 2:26 PM, Ryan wrote:
> Birda,
> I think this will get you some of the way there:
>
> import numpy as np
> x = ... # Here's your 2D atomic distance array
> # Create an indexing array
> index = np.arange( x.size ).reshape( x.shape )
> # Find the non-zero indices
> items = index[ x != 0 ]
> # You only need the first half because your array is symmetric
> items = items[ : items.size/2]
> rows = items / x.shape[0]
> cols = items % x.shape[0]
> print 'Rows: ', rows
> print 'Columns:', cols
> print 'Atomic Distances:', x[rows, cols]
>
> Hope it helps.
> > > > Ryan > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > Whoops. > That doesn't quite work. You shouldn't drop half the items array like that. > This will work better (maybe ?): > > import numpy as np > x = ... # Here's your 2D atomic distance array > index = np.arange( x.size ).reshape( x.shape ) > items = index[ x != 0 ] > rows = items / x.shape[0] > cols = items % x.shape[0] > # This index mask should take better advantage of the array symmetry > mask = rows < cols > print 'Rows: ', rows[mask] > print 'Columns:', cols[mask] > print 'Atomic Distances:', x[rows[mask], cols[mask]] > > Ryan > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Thu Mar 14 11:40:53 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Thu, 14 Mar 2013 08:40:53 -0700 Subject: [Numpy-discussion] R: R: R: R: R: fast numpy.fromfile skipping data chunks In-Reply-To: References: Message-ID: On Thu, Mar 14, 2013 at 1:48 AM, Andrea Cimatoribus wrote: > Thanks for all the feedback (on the SSD too). For what concerns "biggus" library, for working on larger-than-memory arrays, this is really interesting, but unfortunately I don't have time to test it at the moment, I will try to have a look at it in the future. I hope to see something like that implemented in numpy soon, though. You may also want to look at carray: https://github.com/FrancescAlted/carray I"ve never used it, but it stores the contents of the array in a compressed from in memory, so if you data compresses well, then it could be a slick solution. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From rnelsonchem at gmail.com Thu Mar 14 12:03:44 2013 From: rnelsonchem at gmail.com (Ryan) Date: Thu, 14 Mar 2013 16:03:44 +0000 (UTC) Subject: [Numpy-discussion] Any help from Numpy community? References: Message-ID: Birdada Simret gmail.com> writes: > > > > Hi Ryan,Thank you very much indeed, I'm not sure if I well understood your code, let say, for the example array matrix given represents ?H3C-CH3 connection(bonding). > the result from your code is: > Rows: ? ?[0 0 0 0 1 1 1] ?# is these for C indices? > Columns: [1 2 3 4 5 6 7] ? # is these for H indices? but it shouldn't be 6 H's? > Atomic Distances: [ 1. ?1. ?1. ?1. ?1. ?1. ?1.] # ofcourse this is the number of connections or bonds. > > In fact, if I write in the form of dictionary: row indices as keys and column indices as values, > {0:1, 0:2, 0:3, 0:4, 1:5, 1:6, 1:7}, So, does it mean C[0] is connected to H[1], C[0] is connected to H[2] , H[1],....,C[1] is connected to H[7]? ?But I have only 6 H's and two C's ?in this example (H3C-CH3)? > > I have tried some thing like: but still no luck ;( > import numpy as np > from collections import defaultdict? > dict = defaultdict(list) > x=....2d numpy array > > I = x.shape[0] > J = x.shape[1] > d={} > for i in xrange(0, I, 1):? > ? for j in xrange(0, J, 1): > ? ? ?if x[i,j] > 0: > ? ? ? ? dict[i].append(j)? > # the result is: > dict: ?{0: [1, 2, 3, 4], 1: [0, 5, 6, 7], 2: [0], 3: [0], 4: [0], 5: [1], 6: [1], 7: [1]}) > keys: [0, 1, 2, 3, 4, 5, 6, 7] > values: ?[[1, 2, 3, 4], [0, 5, 6, 7], [0], [0], [0], [1], [1], [1]] > > ? > #The H indices can be found by > ?H_rows = np.nonzero(x.sum(axis=1)== 1) ? > result=>H_rows : [2, 3, 4, 5, 6, 7] ?# six H's > I am trying to connect this indices with the dict result but I am confused! 
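As an aside (a sketch added here, not from the original thread), the same bond list can be pulled out with less index arithmetic: np.triu keeps only the upper triangle of the symmetric distance matrix, so each bond is counted once, and np.nonzero then returns the row/column index pairs directly.

```python
import numpy as np

x = np.array(
    [[0.,   1.54, 0.,   0.,   0.,   1.08, 1.08, 1.08],
     [1.54, 0.,   1.08, 1.08, 1.08, 0.,   0.,   0.  ],
     [0.,   1.08, 0.,   0.,   0.,   0.,   0.,   0.  ],
     [0.,   1.08, 0.,   0.,   0.,   0.,   0.,   0.  ],
     [0.,   1.08, 0.,   0.,   0.,   0.,   0.,   0.  ],
     [1.08, 0.,   0.,   0.,   0.,   0.,   0.,   0.  ],
     [1.08, 0.,   0.,   0.,   0.,   0.,   0.,   0.  ],
     [1.08, 0.,   0.,   0.,   0.,   0.,   0.,   0.  ]])
atoms = np.array(['C1', 'C2', 'H3', 'H4', 'H5', 'H6', 'H7', 'H8'])

# zero out everything below the diagonal, then read off the non-zero entries
rows, cols = np.nonzero(np.triu(x))
for r, c in zip(rows, cols):
    print('%s is connected to %s (%.2f)' % (atoms[r], atoms[c], x[r, c]))
```

This prints the same seven bonds as the fancy-indexing version, starting with "C1 is connected to C2 (1.54)".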
> So, now I want to produce a dictionary or what ever to produce results as: ?H[2] is connected to C[?] > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? H[3] is connected to C[?] > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? H[4] is connected to C[?], ..... > Thanks for any help > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?. > > > On Thu, Mar 14, 2013 at 2:26 PM, Ryan gmail.com> wrote: > > Birda, I don't know how your getting those values from my code. Here's a slightly modified and fully self-contained version that includes your bonding matrix: import numpy as np x = np.array( [[ 0., 1.54, 0., 0., 0., 1.08, 1.08, 1.08 ], [ 1.54, 0., 1.08, 1.08, 1.08, 0., 0., 0. ], [ 0., 1.08, 0., 0., 0., 0., 0., 0. ], [ 0., 1.08, 0., 0., 0., 0., 0., 0. ], [ 0., 1.08, 0., 0., 0., 0., 0., 0. ], [ 1.08, 0., 0., 0., 0., 0., 0., 0. ], [ 1.08, 0., 0., 0., 0., 0., 0., 0. ], [ 1.08, 0., 0., 0., 0., 0., 0., 0. ]] ) atoms = np.array(['C1', 'C2', 'H3', 'H4', 'H5', 'H6', 'H7', 'H8']) index = np.arange( x.size ).reshape( x.shape ) items = index[ x != 0 ] rows = items / x.shape[0] cols = items % x.shape[0] mask = rows < cols print 'Rows: ', rows[mask] print 'Columns:', cols[mask] print 'Bond Atom 1: ', atoms[ rows[mask] ] print 'Bond Atom 2: ', atoms[ cols[mask] ] print 'Atomic Distances:', x[rows[mask], cols[mask]] If I copy that into a file and run it, I get the following output: Rows: [0 0 0 0 1 1 1] Columns: [1 5 6 7 2 3 4] Bond Atom 1: ['C1' 'C1' 'C1' 'C1' 'C2' 'C2' 'C2'] Bond Atom 2: ['C2' 'H6' 'H7' 'H8' 'H3' 'H4' 'H5'] Atomic Distances: [ 1.54 1.08 1.08 1.08 1.08 1.08 1.08] Honestly, I did not think about your code all that much. Too many 'for' loops for my taste. My code has quite a bit of fancy indexing, which I could imagine is also quite confusing. 
If you really want a dictionary type of interface that still let's you use Numpy magic, I would take a look at Pandas (http://pandas.pydata.org/) Ryan From birdada85 at gmail.com Thu Mar 14 12:50:34 2013 From: birdada85 at gmail.com (Birdada Simret) Date: Thu, 14 Mar 2013 17:50:34 +0100 Subject: [Numpy-discussion] Any help from Numpy community? In-Reply-To: References: Message-ID: Oh, thanks alot. can the " atoms = np.array(['C1', 'C2', 'H3', 'H4', 'H5', 'H6', 'H7', 'H8'])" able to make general? I mean, if I have a big molecule, it seems difficult to label each time. Ofcourse I'm new to python(even for programing) and I didn't had any knowhow about pandas, but i will try it. any ways, it is great help, many thanks Ryan Birda On Thu, Mar 14, 2013 at 5:03 PM, Ryan wrote: > Birdada Simret gmail.com> writes: > > > > > > > > > Hi Ryan,Thank you very much indeed, I'm not sure if I well understood > your > code, let say, for the example array matrix given represents H3C-CH3 > connection(bonding). > > the result from your code is: > > Rows: [0 0 0 0 1 1 1] # is these for C indices? > > Columns: [1 2 3 4 5 6 7] # is these for H indices? but it shouldn't be > 6 H's? > > Atomic Distances: [ 1. 1. 1. 1. 1. 1. 1.] # ofcourse this is the > number > of connections or bonds. > > > > In fact, if I write in the form of dictionary: row indices as keys and > column > indices as values, > > {0:1, 0:2, 0:3, 0:4, 1:5, 1:6, 1:7}, So, does it mean C[0] is connected > to > H[1], C[0] is connected to H[2] , H[1],....,C[1] is connected to H[7]? 
> But I > have only 6 H's and two C's in this example (H3C-CH3) > > > > I have tried some thing like: but still no luck ;( > > import numpy as np > > from collections import defaultdict > > dict = defaultdict(list) > > x=....2d numpy array > > > > I = x.shape[0] > > J = x.shape[1] > > d={} > > for i in xrange(0, I, 1): > > for j in xrange(0, J, 1): > > if x[i,j] > 0: > > dict[i].append(j) > > # the result is: > > dict: {0: [1, 2, 3, 4], 1: [0, 5, 6, 7], 2: [0], 3: [0], 4: [0], 5: > [1], 6: > [1], 7: [1]}) > > keys: [0, 1, 2, 3, 4, 5, 6, 7] > > values: [[1, 2, 3, 4], [0, 5, 6, 7], [0], [0], [0], [1], [1], [1]] > > > > > > #The H indices can be found by > > H_rows = np.nonzero(x.sum(axis=1)== 1) > > result=>H_rows : [2, 3, 4, 5, 6, 7] # six H's > > I am trying to connect this indices with the dict result but I am > confused! > > So, now I want to produce a dictionary or what ever to produce results > as: > H[2] is connected to C[?] > > > > H[3] is connected to C[?] > > > > H[4] is connected to C[?], ..... > > Thanks for any help > > . > > > > > > On Thu, Mar 14, 2013 at 2:26 PM, Ryan gmail.com> > wrote: > > > > > > Birda, > > I don't know how your getting those values from my code. Here's a slightly > modified and fully self-contained version that includes your bonding > matrix: > > import numpy as np > x = np.array( > [[ 0., 1.54, 0., 0., 0., 1.08, 1.08, 1.08 ], > [ 1.54, 0., 1.08, 1.08, 1.08, 0., 0., 0. ], > [ 0., 1.08, 0., 0., 0., 0., 0., 0. ], > [ 0., 1.08, 0., 0., 0., 0., 0., 0. ], > [ 0., 1.08, 0., 0., 0., 0., 0., 0. ], > [ 1.08, 0., 0., 0., 0., 0., 0., 0. ], > [ 1.08, 0., 0., 0., 0., 0., 0., 0. ], > [ 1.08, 0., 0., 0., 0., 0., 0., 0. 
]] > ) > atoms = np.array(['C1', 'C2', 'H3', 'H4', 'H5', 'H6', 'H7', 'H8']) > index = np.arange( x.size ).reshape( x.shape ) > items = index[ x != 0 ] > rows = items / x.shape[0] > cols = items % x.shape[0] > mask = rows < cols > print 'Rows: ', rows[mask] > print 'Columns:', cols[mask] > print 'Bond Atom 1: ', atoms[ rows[mask] ] > print 'Bond Atom 2: ', atoms[ cols[mask] ] > print 'Atomic Distances:', x[rows[mask], cols[mask]] > > If I copy that into a file and run it, I get the following output: > > Rows: [0 0 0 0 1 1 1] > Columns: [1 5 6 7 2 3 4] > Bond Atom 1: ['C1' 'C1' 'C1' 'C1' 'C2' 'C2' 'C2'] > Bond Atom 2: ['C2' 'H6' 'H7' 'H8' 'H3' 'H4' 'H5'] > Atomic Distances: [ 1.54 1.08 1.08 1.08 1.08 1.08 1.08] > > Honestly, I did not think about your code all that much. Too many 'for' > loops > for my taste. My code has quite a bit of fancy indexing, which I could > imagine > is also quite confusing. > > If you really want a dictionary type of interface that still let's you use > Numpy > magic, I would take a look at Pandas (http://pandas.pydata.org/) > > Ryan > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ake.sandgren at hpc2n.umu.se Fri Mar 15 05:19:45 2013 From: ake.sandgren at hpc2n.umu.se (Ake Sandgren) Date: Fri, 15 Mar 2013 10:19:45 +0100 Subject: [Numpy-discussion] Possible bug in numpy 1.6.1 Message-ID: <1363339185.25361.19.camel@lurvas.hpc2n.umu.se> Hi! 
Found this thing that looks like a bug in core/src/multiarray/dtype_transfer.c diff -ru site/numpy/core/src/multiarray/dtype_transfer.c amd64_ubuntu1004-intel-acml/numpy/core/src/multiarray/dtype_transfer.c --- site/numpy/core/src/multiarray/dtype_transfer.c 2011-07-20 20:25:28.000000000 +0200 +++ amd64_ubuntu1004-intel-acml/numpy/core/src/multiarray/dtype_transfer.c 2013-03-14 22:09:38.000000000 +0100 @@ -1064,7 +1064,7 @@ _one_to_n_data *d = (_one_to_n_data *)data; PyArray_StridedTransferFn *subtransfer = d->stransfer, *stransfer_finish_src = d->stransfer_finish_src; - void *subdata = d->data, *data_finish_src = data_finish_src; + void *subdata = d->data, *data_finish_src = d->data_finish_src; npy_intp subN = d->N, dst_itemsize = d->dst_itemsize; while (N > 0) { -- Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden Internet: ake at hpc2n.umu.se Phone: +46 90 7866134 Fax: +46 90 7866126 Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se From njs at pobox.com Fri Mar 15 05:44:30 2013 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 15 Mar 2013 09:44:30 +0000 Subject: [Numpy-discussion] Possible bug in numpy 1.6.1 In-Reply-To: <1363339185.25361.19.camel@lurvas.hpc2n.umu.se> References: <1363339185.25361.19.camel@lurvas.hpc2n.umu.se> Message-ID: That does look unlikely yeah... Does this have any consequences that you've found? Is there a test case that fails before the patch but works after? -n On 15 Mar 2013 09:19, "Ake Sandgren" wrote: > Hi! 
> > Found this thing that looks like a bug in > core/src/multiarray/dtype_transfer.c > > diff -ru site/numpy/core/src/multiarray/dtype_transfer.c > amd64_ubuntu1004-intel-acml/numpy/core/src/multiarray/dtype_transfer.c > --- site/numpy/core/src/multiarray/dtype_transfer.c 2011-07-20 > 20:25:28.000000000 +0200 > +++ > amd64_ubuntu1004-intel-acml/numpy/core/src/multiarray/dtype_transfer.c > 2013-03-14 22:09:38.000000000 +0100 > @@ -1064,7 +1064,7 @@ > _one_to_n_data *d = (_one_to_n_data *)data; > PyArray_StridedTransferFn *subtransfer = d->stransfer, > *stransfer_finish_src = d->stransfer_finish_src; > - void *subdata = d->data, *data_finish_src = data_finish_src; > + void *subdata = d->data, *data_finish_src = d->data_finish_src; > npy_intp subN = d->N, dst_itemsize = d->dst_itemsize; > > while (N > 0) { > > > -- > Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden > Internet: ake at hpc2n.umu.se Phone: +46 90 7866134 Fax: +46 90 7866126 > Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ake.sandgren at hpc2n.umu.se Fri Mar 15 05:52:45 2013 From: ake.sandgren at hpc2n.umu.se (Ake Sandgren) Date: Fri, 15 Mar 2013 10:52:45 +0100 Subject: [Numpy-discussion] Possible bug in numpy 1.6.1 In-Reply-To: References: <1363339185.25361.19.camel@lurvas.hpc2n.umu.se> Message-ID: <1363341165.25361.20.camel@lurvas.hpc2n.umu.se> On Fri, 2013-03-15 at 09:44 +0000, Nathaniel Smith wrote: > That does look unlikely yeah... Does this have any consequences that > you've found? Is there a test case that fails before the patch but > works after? No, just found it during compilation with the intel compiler. It complained about use before initialize on it. 
And it's still there in 1.7.0
From tmp50 at ukr.net Fri Mar 15 09:21:21 2013 From: tmp50 at ukr.net (Dmitrey) Date: Fri, 15 Mar 2013 15:21:21 +0200 Subject: [Numpy-discussion] OpenOpt Suite release 0.45 Message-ID: <1991.1363353681.6942528722304958464@ffe16.ukr.net> Hi all, I'm glad to inform you about the new OpenOpt Suite release 0.45 (2013-March-15):
* Essential improvements for FuncDesigner interval analysis (thus affecting interalg)
* Temporary workaround for a serious bug in the FuncDesigner automatic differentiation kernel, due to a bug in some versions of Python or NumPy; may affect optimization problems, including (MI)LP, (MI)NLP, TSP etc.
* Some other minor bugfixes and improvements
--------------------------- Regards, D. http://openopt.org/Dmitrey -------------- next part -------------- An HTML attachment was scrubbed... URL: From oscar.villellas at continuum.io Fri Mar 15 10:22:21 2013 From: oscar.villellas at continuum.io (Oscar Villellas) Date: Fri, 15 Mar 2013 15:22:21 +0100 Subject: [Numpy-discussion] Dot/inner products with broadcasting?
In-Reply-To: <5141BA5E.2020704@aalto.fi> References: <51408B59.8090504@aalto.fi> <5141BA5E.2020704@aalto.fi> Message-ID: In fact, there is already an inner1d implemented in numpy.core.umath_tests.inner1d from numpy.core.umath_tests import inner1d It should do the trick :) On Thu, Mar 14, 2013 at 12:54 PM, Jaakko Luttinen wrote: > Answering to myself, this pull request seems to implement an inner > product with broadcasting (inner1d) and many other useful functions: > https://github.com/numpy/numpy/pull/2954/ > -J > > On 03/13/2013 04:21 PM, Jaakko Luttinen wrote: >> Hi! >> >> How can I compute dot product (or similar multiply&sum operations) >> efficiently so that broadcasting is utilized? >> For multi-dimensional arrays, NumPy's inner and dot functions do not >> match the leading axes and use broadcasting, but instead the result has >> first the leading axes of the first input array and then the leading >> axes of the second input array. >> >> For instance, I would like to compute the following inner-product: >> np.sum(A*B, axis=-1) >> >> But numpy.inner gives: >> A = np.random.randn(2,3,4) >> B = np.random.randn(3,4) >> np.inner(A,B).shape >> # -> (2, 3, 3) instead of (2, 3) >> >> Similarly for dot product, I would like to compute for instance: >> np.sum(A[...,:,:,np.newaxis]*B[...,np.newaxis,:,:], axis=-2) >> >> But numpy.dot gives: >> In [12]: A = np.random.randn(2,3,4); B = np.random.randn(2,4,5) >> In [13]: np.dot(A,B).shape >> # -> (2, 3, 2, 5) instead of (2, 3, 5) >> >> I could use einsum for these operations, but I'm not sure whether that's >> as efficient as using some BLAS-supported(?) dot products. >> >> I couldn't find any function which could perform this kind of >> operations. NumPy's functions seem to either flatten the input arrays >> (vdot, outer) or just use the axes of the input arrays separately (dot, >> inner, tensordot). >> >> Any help? 
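For reference, both products Jaakko asks about can also be written with np.einsum, whose ellipsis notation broadcasts over the leading axes (a sketch using the shapes from the examples above; inner1d covers the first case):

```python
import numpy as np

# Inner product over the last axis, broadcasting the leading axes:
A = np.random.randn(2, 3, 4)
B = np.random.randn(3, 4)
r1 = np.einsum('...i,...i->...', A, B)
assert r1.shape == (2, 3)
assert np.allclose(r1, np.sum(A * B, axis=-1))

# Matrix product over the last two axes, broadcasting the rest:
A2 = np.random.randn(2, 3, 4)
B2 = np.random.randn(2, 4, 5)
r2 = np.einsum('...ij,...jk->...ik', A2, B2)
assert r2.shape == (2, 3, 5)
assert np.allclose(
    r2,
    np.sum(A2[..., :, :, np.newaxis] * B2[..., np.newaxis, :, :], axis=-2))
```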
>> >> Best regards, >> Jaakko >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From alan.isaac at gmail.com Fri Mar 15 14:38:20 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Fri, 15 Mar 2013 14:38:20 -0400 Subject: [Numpy-discussion] OpenOpt Suite release 0.45 In-Reply-To: <1991.1363353681.6942528722304958464@ffe16.ukr.net> References: <1991.1363353681.6942528722304958464@ffe16.ukr.net> Message-ID: <51436A9C.7010905@gmail.com> On 3/15/2013 9:21 AM, Dmitrey wrote: > Temporary walkaround for a serious bug in FuncDesigner automatic differentiation kernel due to a bug in some versions of Python or NumPy, Are the suspected bugs documented somewhere? Alan PS The word 'banausic' is very rare in English. Perhaps you meant 'unsophisticated'? From warren.weckesser at gmail.com Fri Mar 15 14:47:43 2013 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Fri, 15 Mar 2013 14:47:43 -0400 Subject: [Numpy-discussion] Add ability to disable the autogeneration of the function signature in a ufunc docstring. Message-ID: Hi all, In a recent scipy pull request (https://github.com/scipy/scipy/pull/459), I ran into the problem of ufuncs automatically generating a signature in the docstring using arguments such as 'x' or 'x1, x2'. scipy.special has a lot of ufuncs, and for most of them, there are much more descriptive or conventional argument names than 'x'. For now, we will include a nicer signature in the added docstring, and grudgingly put up with the one generated by the ufunc. In the long term, it would be nice to be able to disable the automatic generation of the signature. 
I submitted a pull request to numpy to allow that: https://github.com/numpy/numpy/pull/3149 Comments on the pull request would be appreciated. Thanks, Warren -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmp50 at ukr.net Fri Mar 15 15:34:36 2013 From: tmp50 at ukr.net (Dmitrey) Date: Fri, 15 Mar 2013 21:34:36 +0200 Subject: [Numpy-discussion] OpenOpt Suite release 0.45 In-Reply-To: <51436A9C.7010905@gmail.com> References: <1991.1363353681.6942528722304958464@ffe16.ukr.net> <51436A9C.7010905@gmail.com> Message-ID: <27202.1363376076.16920329947726938112@ffe12.ukr.net> --- Original message --- From: "Alan G Isaac" Date: 15 March 2013, 20:38:38 On 3/15/2013 9:21 AM, Dmitrey wrote: > Temporary walkaround for a serious bug in FuncDesigner automatic differentiation kernel due to a bug in some versions of Python or NumPy, Are the suspected bugs documented somewhere? The suspected bugs are not documented yet; I guess it will be fixed in future versions of Python or numpy. The bug is hard to locate and isolate; it looks like this:

derivative_items = list(pointDerivative.items())

# temporary workaround for a bug in Python or numpy
derivative_items.sort(key=lambda elem: elem[0])
######################################

for key, val in derivative_items:
    indexes = oovarsIndDict[key]

    # this line is not reached in the involved buggy case
    if not involveSparse and isspmatrix(val): val = val.A

    if r.ndim == 1:
        r[indexes[0]:indexes[1]] = val.flatten() if type(val) == ndarray else val
    else:
        # this line is not reached in the involved buggy case
        r[:, indexes[0]:indexes[1]] = val if val.shape == r.shape else val.reshape((funcLen, prod(val.shape)/funcLen))

So, pointDerivative is a Python dict of pairs (F_i, N_i), where the F_i are hashable objects, and even for the case when the N_i are ordinary scalars (they can be numpy arrays or scipy sparse matrices) the results of this code differ depending on whether derivative_items.sort() was performed; the total number of
nonzero elements is the same for both cases. oovarsIndDict is a dict of pairs (F_i, (n_start_i, n_end_i)), and for the case when the N_i are all scalars, for all i n_end_i = n_start_i - 1. Alan PS The word 'banausic' is very rare in English. Perhaps you meant 'unsophisticated'? google translate tells me "banausic" is a more appropriate translation than "unsophisticated" for the sense I meant (those frameworks are aimed at modelling only numerical optimization problems, while FuncDesigner is suitable for modelling of systems of linear, nonlinear, ordinary differential equations, eigenvalue problems, interval analysis and much more). D. -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Mar 15 16:04:12 2013 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 15 Mar 2013 20:04:12 +0000 Subject: [Numpy-discussion] OpenOpt Suite release 0.45 In-Reply-To: <27202.1363376076.16920329947726938112@ffe12.ukr.net> References: <1991.1363353681.6942528722304958464@ffe16.ukr.net> <51436A9C.7010905@gmail.com> <27202.1363376076.16920329947726938112@ffe12.ukr.net> Message-ID: On Fri, Mar 15, 2013 at 7:34 PM, Dmitrey wrote: > --- Original message --- > From: "Alan G Isaac" > Date: 15 March 2013, 20:38:38 > > On 3/15/2013 9:21 AM, Dmitrey wrote: >> Temporary walkaround for a serious bug in FuncDesigner automatic >> differentiation kernel due to a bug in some versions of Python or NumPy, > > > Are the suspected bugs documented somewhere?
>
> the suspected bugs are not documented yet, I guess it will be fixed in
> future versions of Python or numpy
> the bug is hard to locate and isolate, it looks like this:
>
> derivative_items = list(pointDerivative.items())
>
> # temporary walkaround for a bug in Python or numpy
> derivative_items.sort(key=lambda elem: elem[0])
> ######################################
>
> for key, val in derivative_items:
>     indexes = oovarsIndDict[key]
>
>     # this line is not reached in the involved buggy case
>     if not involveSparse and isspmatrix(val): val = val.A
>
>     if r.ndim == 1:
>         r[indexes[0]:indexes[1]] = val.flatten() if type(val) == ndarray else val
>     else:
>         # this line is not reached in the involved buggy case
>         r[:, indexes[0]:indexes[1]] = val if val.shape == r.shape else val.reshape((funcLen, prod(val.shape)/funcLen))
>
> so, pointDerivative is Python dict of pairs (F_i, N_i), where F_i are
> hashable objects, and even for the case when N_i are ordinary scalars (they
> can be numpy arrays or scipy sparse matrices) results of this code are
> different wrt was or was not derivative_items.sort() performed; total number
> of nonzero elements is same for both cases. oovarsIndDict is dict of pairs
> (F_i, (n_start_i, n_end_i)), and for the case N_i are all scalars for all i
> n_end_i = n_start_i - 1.
If you can turn this into a minimal self-contained working example we can take a look... -n From njs at pobox.com Fri Mar 15 16:05:48 2013 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 15 Mar 2013 20:05:48 +0000 Subject: [Numpy-discussion] Possible bug in numpy 1.6.1 In-Reply-To: <1363341165.25361.20.camel@lurvas.hpc2n.umu.se> References: <1363339185.25361.19.camel@lurvas.hpc2n.umu.se> <1363341165.25361.20.camel@lurvas.hpc2n.umu.se> Message-ID: On Fri, Mar 15, 2013 at 9:52 AM, Ake Sandgren wrote: > On Fri, 2013-03-15 at 09:44 +0000, Nathaniel Smith wrote: >> That does look unlikely yeah... Does this have any consequences that >> you've found?
Is there a test case that fails before the patch but >> works after? > > No, just found it during compilation with the intel compiler. It > complained about use before initialize on it. > > And it's still there in 1.7.0 Clever compiler. Since no-one has jumped up to investigate yet, can you file a bug on the github tracker, so at least it doesn't get lost entirely before someone finds the time to do that? -n From njs at pobox.com Fri Mar 15 16:39:45 2013 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 15 Mar 2013 20:39:45 +0000 Subject: [Numpy-discussion] Add ability to disable the autogeneration of the function signature in a ufunc docstring. In-Reply-To: References: Message-ID: On Fri, Mar 15, 2013 at 6:47 PM, Warren Weckesser wrote: > Hi all, > > In a recent scipy pull request (https://github.com/scipy/scipy/pull/459), I > ran into the problem of ufuncs automatically generating a signature in the > docstring using arguments such as 'x' or 'x1, x2'. scipy.special has a lot > of ufuncs, and for most of them, there are much more descriptive or > conventional argument names than 'x'. For now, we will include a nicer > signature in the added docstring, and grudgingly put up with the one > generated by the ufunc. In the long term, it would be nice to be able to > disable the automatic generation of the signature. I submitted a pull > request to numpy to allow that: https://github.com/numpy/numpy/pull/3149 > > Comments on the pull request would be appreciated. The functionality seems obviously useful, but adding a magic public attribute to all ufuncs seems like a somewhat clumsy way to expose it? Esp. since ufuncs are always created through the C API, including docstring specification, but this can only be set at the Python level? Maybe it's the best option but it seems worth taking a few minutes to consider alternatives. 
Brainstorming:
- If the first line of the docstring starts with "(" and ends with ")", then that's a signature and we skip adding one (I think sphinx does something like this?). Kinda magic and implicit, but highly backwards compatible.
- Declare that henceforth, the signature generation will be disabled by default, and go through and add a special marker like "__SIGNATURE__" to all the existing ufunc docstrings, which gets replaced (if present) by the automagically generated signature.
- Give ufunc arguments actual names in general, that work for things like kwargs, and then use those in the automagically generated signature. This is the most work, but it would mean that people don't have to remember to update their non-magic signatures whenever numpy adds a new feature like out= or where=, and would make the docstrings actually accurate, which right now they aren't:

In [7]: np.add.__doc__.split("\n")[0]
Out[7]: 'add(x1, x2[, out])'

In [8]: np.add(x1=1, x2=2)
ValueError: invalid number of arguments

- Allow some special syntax to describe the argument names in the docstring: "__ARGNAMES__: a b\n" -> "add(a, b[, out])"
- Something else...

-n From alan.isaac at gmail.com Fri Mar 15 16:54:10 2013 From: alan.isaac at gmail.com (Alan G Isaac) Date: Fri, 15 Mar 2013 16:54:10 -0400 Subject: [Numpy-discussion] OpenOpt Suite release 0.45 In-Reply-To: <27202.1363376076.16920329947726938112@ffe12.ukr.net> References: <1991.1363353681.6942528722304958464@ffe16.ukr.net> <51436A9C.7010905@gmail.com> <27202.1363376076.16920329947726938112@ffe12.ukr.net> Message-ID: <51438A72.3000301@gmail.com> On 3/15/2013 3:34 PM, Dmitrey wrote: > the suspected bugs are not documented yet I'm going to guess that the state of the F_i changes when you use them as keys (i.e., when you call __le__). It is very hard to imagine that this is a Python or NumPy bug.
Cheers, Alan From pav at iki.fi Fri Mar 15 17:19:03 2013 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 15 Mar 2013 23:19:03 +0200 Subject: [Numpy-discussion] Add ability to disable the autogeneration of the function signature in a ufunc docstring. In-Reply-To: References: Message-ID: 15.03.2013 22:39, Nathaniel Smith kirjoitti: [clip] > - Something else... How about: scrap the automatic signatures altogether, and directly use the docstring provided to the ufunc creation function? I suspect ufuncs are not very widely used in 3rd party code, as it requires somewhat tricky messing with the C API. The backwards compatibility issue is also just a documentation issue, so nothing drastic. -- Pauli Virtanen From tmp50 at ukr.net Sat Mar 16 05:31:37 2013 From: tmp50 at ukr.net (Dmitrey) Date: Sat, 16 Mar 2013 11:31:37 +0200 Subject: [Numpy-discussion] OpenOpt Suite release 0.45 In-Reply-To: <51438A72.3000301@gmail.com> References: <1991.1363353681.6942528722304958464@ffe16.ukr.net> <51436A9C.7010905@gmail.com> <27202.1363376076.16920329947726938112@ffe12.ukr.net> <51438A72.3000301@gmail.com> Message-ID: <6447.1363426297.5285784474692943872@ffe6.ukr.net> --- Original message --- From: "Alan G Isaac" Date: 15 March 2013, 22:54:21 On 3/15/2013 3:34 PM, Dmitrey wrote: > the suspected bugs are not documented yet I'm going to guess that the state of the F_i changes when you use them as keys (i.e., when you call __le__). no, their state doesn't change for operations like __le__. AFAIK searching a Python dict doesn't call __le__ on the object keys at all; it operates with the method .__hash__(), and the latter returns fixed integer numbers assigned to the objects earlier (at least in my case). It is very hard to imagine that this is a Python or NumPy bug.
Cheers, Alan _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthieu.brucher at gmail.com Sat Mar 16 05:33:32 2013 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Sat, 16 Mar 2013 09:33:32 +0000 Subject: [Numpy-discussion] OpenOpt Suite release 0.45 In-Reply-To: <6447.1363426297.5285784474692943872@ffe6.ukr.net> References: <1991.1363353681.6942528722304958464@ffe16.ukr.net> <51436A9C.7010905@gmail.com> <27202.1363376076.16920329947726938112@ffe12.ukr.net> <51438A72.3000301@gmail.com> <6447.1363426297.5285784474692943872@ffe6.ukr.net> Message-ID: Hi, Different objects can have the same hash, so it compares to find the actual correct object. Usually when you store something in a dict and later you can't find it anymore, it is that the internal state changed and that the hash is not the same anymore. Matthieu 2013/3/16 Dmitrey > > > --- ???????? ????????? --- > ?? ????: "Alan G Isaac" > ????: 15 ????? 2013, 22:54:21 > > On 3/15/2013 3:34 PM, Dmitrey wrote: > > the suspected bugs are not documented yet > > > I'm going to guess that the state of the F_i changes > when you use them as keys (i.e., when you call __le__. > > no, their state doesn't change for operations like __le__ . AFAIK > searching Python dict doesn't calls __le__ on the object keys at all, it > operates with method .__hash__(), and latter returns fixed integer numbers > assigned to the objects earlier (at least in my case). > > > It is very hard to imagine that this is a Python or NumPy bug. 
> > Cheers, > Alan > > _______________________________________________ > NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher Music band: http://liliejay.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmp50 at ukr.net Sat Mar 16 06:36:10 2013 From: tmp50 at ukr.net (Dmitrey) Date: Sat, 16 Mar 2013 12:36:10 +0200 Subject: [Numpy-discussion] OpenOpt Suite release 0.45 In-Reply-To: References: <1991.1363353681.6942528722304958464@ffe16.ukr.net> <51436A9C.7010905@gmail.com> <27202.1363376076.16920329947726938112@ffe12.ukr.net> <51438A72.3000301@gmail.com> <6447.1363426297.5285784474692943872@ffe6.ukr.net> Message-ID: <90157.1363430170.9409858753789952@ffe8.ukr.net> --- ???????? ????????? --- ?? ????: "Matthieu Brucher" ????: 16 ????? 2013, 11:33:39 Hi, Different objects can have the same hash, so it compares to find the actual correct object. Usually when you store something in a dict and later you can't find it anymore, it is that the internal state changed and that the hash is not the same anymore. my objects (oofuns) definitely have different __hash__() results - it's just integers 1,2,3 etc assigned to the oofuns (stored in oofun._id field) when they are created. D. Matthieu 2013/3/16 Dmitrey --- ???????? ????????? --- ?? ????: "Alan G Isaac" ????: 15 ????? 2013, 22:54:21 On 3/15/2013 3:34 PM, Dmitrey wrote: > the suspected bugs are not documented yet I'm going to guess that the state of the F_i changes when you use them as keys (i.e., when you call __le__. no, their state doesn't change for operations like __le__ . 
AFAIK searching Python dict doesn't calls __le__ on the object keys at all, it operates with method .__hash__(), and latter returns fixed integer numbers assigned to the objects earlier (at least in my case). ?It is very hard to imagine that this is a Python or NumPy bug. Cheers, Alan _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher Music band: http://liliejay.com/ _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthieu.brucher at gmail.com Sat Mar 16 06:39:05 2013 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Sat, 16 Mar 2013 10:39:05 +0000 Subject: [Numpy-discussion] OpenOpt Suite release 0.45 In-Reply-To: <90157.1363430170.9409858753789952@ffe8.ukr.net> References: <1991.1363353681.6942528722304958464@ffe16.ukr.net> <51436A9C.7010905@gmail.com> <27202.1363376076.16920329947726938112@ffe12.ukr.net> <51438A72.3000301@gmail.com> <6447.1363426297.5285784474692943872@ffe6.ukr.net> <90157.1363430170.9409858753789952@ffe8.ukr.net> Message-ID: Even if they have different hashes, they can be stored in the same underlying list before they are retrieved. Then, an actual comparison is done to check if the given key (i.e. object instance, not hash) is the same as one of the stored keys. 2013/3/16 Dmitrey > > > --- ???????? ????????? --- > ?? ????: "Matthieu Brucher" > ????: 16 ????? 
2013, 11:33:39 > > Hi, > > Different objects can have the same hash, so it compares to find the > actual correct object. > Usually when you store something in a dict and later you can't find it > anymore, it is that the internal state changed and that the hash is not the > same anymore. > > > my objects (oofuns) definitely have different __hash__() results - it's > just integers 1,2,3 etc assigned to the oofuns (stored in oofun._id field) > when they are created. > > > D. > > > > Matthieu > > > 2013/3/16 Dmitrey > > > > --- ???????? ????????? --- > ?? ????: "Alan G Isaac" > ????: 15 ????? 2013, 22:54:21 > > On 3/15/2013 3:34 PM, Dmitrey wrote: > > the suspected bugs are not documented yet > > > I'm going to guess that the state of the F_i changes > when you use them as keys (i.e., when you call __le__. > > no, their state doesn't change for operations like __le__ . AFAIK > searching Python dict doesn't calls __le__ on the object keys at all, it > operates with method .__hash__(), and latter returns fixed integer numbers > assigned to the objects earlier (at least in my case). > > > It is very hard to imagine that this is a Python or NumPy bug. > > Cheers, > Alan > > _______________________________________________ > NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > -- > Information System Engineer, Ph.D. > Blog: http://matt.eifelle.com > LinkedIn: http://www.linkedin.com/in/matthieubrucher > Music band: http://liliejay.com/ > > _______________________________________________ > NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Information System Engineer, Ph.D. 
Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher Music band: http://liliejay.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmp50 at ukr.net Sat Mar 16 07:48:59 2013 From: tmp50 at ukr.net (Dmitrey) Date: Sat, 16 Mar 2013 13:48:59 +0200 Subject: [Numpy-discussion] OpenOpt Suite release 0.45 In-Reply-To: References: <1991.1363353681.6942528722304958464@ffe16.ukr.net> <51436A9C.7010905@gmail.com> <27202.1363376076.16920329947726938112@ffe12.ukr.net> <51438A72.3000301@gmail.com> <6447.1363426297.5285784474692943872@ffe6.ukr.net> <90157.1363430170.9409858753789952@ffe8.ukr.net> Message-ID: <10909.1363434539.16610087277036634112@ffe16.ukr.net> --- Original message --- From: "Matthieu Brucher" Date: 16 March 2013, 12:39:07 Even if they have different hashes, they can be stored in the same underlying list before they are retrieved. Then, an actual comparison is done to check if the given key (i.e. object instance, not hash) is the same as one of the stored keys. but, as I have already mentioned, comparison of oofun(s) via __le__, __eq__ etc doesn't change their inner state (though the methods can create additional oofun(s)). I have checked via debugger - my methods __le__, __eq__, __lt__, __gt__, __ge__ are not called from the buggy place of the code, only __hash__ is called from there. Python could check key objects' equivalence via id(), but I don't see any possible bug source from using id(). D. 2013/3/16 Dmitrey --- Original message --- From: "Matthieu Brucher" Date: 16 March 2013, 11:33:39 Hi, Different objects can have the same hash, so it compares to find the actual correct object. Usually when you store something in a dict and later you can't find it anymore, it is that the internal state changed and that the hash is not the same anymore.
my objects (oofuns) definitely have different __hash__() results - it's just integers 1,2,3 etc assigned to the oofuns (stored in oofun._id field) when they are created. D. Matthieu 2013/3/16 Dmitrey --- ???????? ????????? --- ?? ????: "Alan G Isaac" ????: 15 ????? 2013, 22:54:21 On 3/15/2013 3:34 PM, Dmitrey wrote: > the suspected bugs are not documented yet I'm going to guess that the state of the F_i changes when you use them as keys (i.e., when you call __le__. no, their state doesn't change for operations like __le__ . AFAIK searching Python dict doesn't calls __le__ on the object keys at all, it operates with method .__hash__(), and latter returns fixed integer numbers assigned to the objects earlier (at least in my case). ?It is very hard to imagine that this is a Python or NumPy bug. Cheers, Alan _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher Music band: http://liliejay.com/ _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher Music band: http://liliejay.com/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Sat Mar 16 08:11:47 2013 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 16 Mar 2013 12:11:47 +0000 Subject: [Numpy-discussion] OpenOpt Suite release 0.45 In-Reply-To: <10909.1363434539.16610087277036634112@ffe16.ukr.net> References: <1991.1363353681.6942528722304958464@ffe16.ukr.net> <51436A9C.7010905@gmail.com> <27202.1363376076.16920329947726938112@ffe12.ukr.net> <51438A72.3000301@gmail.com> <6447.1363426297.5285784474692943872@ffe6.ukr.net> <90157.1363430170.9409858753789952@ffe8.ukr.net> <10909.1363434539.16610087277036634112@ffe16.ukr.net> Message-ID: On 16 Mar 2013 11:49, "Dmitrey" wrote: > > > > > --- ???????? ????????? --- > ?? ????: "Matthieu Brucher" > ????: 16 ????? 2013, 12:39:07 > >> Even if they have different hashes, they can be stored in the same underlying list before they are retrieved. Then, an actual comparison is done to check if the given key (i.e. object instance, not hash) is the same as one of the stored keys. > > >> > but, as I have already mentioned, comparison of oofun(s) via __le__, __eq__ etc doesn't change their inner state (but the methods can create additional oofun(s), although). > I have checked via debugger - my methods __le__, __eq__, __lt__, __gt__, __ge__ are not called from the buggy place of code, only __hash__ is called from there. Python could check key objects equivalence via id(), although, but I don't see any possible bug source from using id(). Dict lookup always calls both __hash__ and __eq__. I guess it might use id() to shortcut the __eq__ call in some cases - there are some places in python that do. Anyway there's no point trying to debug this code by ESP... It's not even clear from what's been said whether dict lookups have anything to do with the problem. -n -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chaoyuejoy at gmail.com Sat Mar 16 12:40:51 2013 From: chaoyuejoy at gmail.com (Chao YUE) Date: Sat, 16 Mar 2013 17:40:51 +0100 Subject: [Numpy-discussion] indexing of arbitrary axis and arbitrary slice? Message-ID: Dear all, Is there some way to index the numpy array by specifying arbitrary axis and arbitrary slice, while not knowing the actual shape of the data? For example, I have a 3-dim data, data.shape = (3,4,5) Is there a way to retrieve data[:,0,:] by using something like np.retrieve_data(data,axis=2,slice=0), by this way you don't have to know the actual shape of the array. for for 4-dim data, np.retrieve_data(data,axis=2,slice=0) will actually be data[:,0,:,:] thanks in advance, Chao -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Mar 16 12:49:13 2013 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 16 Mar 2013 16:49:13 +0000 Subject: [Numpy-discussion] indexing of arbitrary axis and arbitrary slice? In-Reply-To: References: Message-ID: On 16 Mar 2013 16:41, "Chao YUE" wrote: > > Dear all, > > Is there some way to index the numpy array by specifying arbitrary axis and arbitrary slice, while > not knowing the actual shape of the data? > For example, I have a 3-dim data, data.shape = (3,4,5) > Is there a way to retrieve data[:,0,:] by using something like np.retrieve_data(data,axis=2,slice=0), > by this way you don't have to know the actual shape of the array. 
> for for 4-dim data, np.retrieve_data(data,axis=2,slice=0) will actually be data[:,0,:,:]

I don't know of anything quite like that, but it's easy to fake it:

def retrieve_data(a, ax, idx):
    full_idx = [slice(None)] * a.ndim
    full_idx[ax] = idx
    return a[tuple(full_idx)]

Or for the specific case where you do know the axis in advance, you just don't know how many trailing axes there are, use a[:, :, 0, ...] and the ... will expand to represent the appropriate number of :'s. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sat Mar 16 13:54:28 2013 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 16 Mar 2013 17:54:28 +0000 Subject: [Numpy-discussion] OpenOpt Suite release 0.45 In-Reply-To: References: <1991.1363353681.6942528722304958464@ffe16.ukr.net> <51436A9C.7010905@gmail.com> <27202.1363376076.16920329947726938112@ffe12.ukr.net> <51438A72.3000301@gmail.com> <6447.1363426297.5285784474692943872@ffe6.ukr.net> <90157.1363430170.9409858753789952@ffe8.ukr.net> Message-ID: On Sat, Mar 16, 2013 at 10:39 AM, Matthieu Brucher wrote: > Even if they have different hashes, they can be stored in the same > underlying list before they are retrieved. Then, an actual comparison is > done to check if the given key (i.e. object instance, not hash) is the same > as one of the stored keys. Right. And the rule is that if two objects compare equal, then they must also hash equal. Unfortunately, it looks like `oofun` objects do not obey this property. oofun.__eq__() seems to return a Constraint rather than a bool, so oofun objects should simply not be used as dictionary keys. That's quite possibly the source of the bug. Or at least, that's a bug that needs to get fixed first before attempting to debug anything else or attribute bugs to Python or numpy. Also, the lack of a bool-returning __eq__() will prevent proper sorting, which also seems to be used in the code snippet that Dmitrey showed.
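The failure mode described here is easy to reproduce with a toy class (a hypothetical stand-in for oofun, not the actual FuncDesigner code):

```python
class BadKey:
    """Hashable, but __eq__ returns a truthy non-bool (like a Constraint)."""
    def __init__(self, id_):
        self._id = id_
    def __hash__(self):
        return 0  # force every key into the same hash bucket
    def __eq__(self, other):
        return ("constraint", self, other)  # truthy, regardless of the ids

d = {BadKey(1): "first"}

# On a hash collision the dict falls back to __eq__, and the truthy
# tuple makes *any* BadKey compare "equal" to the stored key:
wrong = d[BadKey(2)]  # returns "first" instead of raising KeyError
assert wrong == "first"
```

With distinct per-object hashes, a lookup by the very same instance can still succeed because dict lookup short-circuits on identity before calling __eq__, which is why the breakage only shows up intermittently; and if __lt__ behaves the same way, sorted() will happily return a meaningless order.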
-- Robert Kern From tmp50 at ukr.net Sat Mar 16 14:19:26 2013 From: tmp50 at ukr.net (Dmitrey) Date: Sat, 16 Mar 2013 20:19:26 +0200 Subject: [Numpy-discussion] OpenOpt Suite release 0.45 In-Reply-To: References: <1991.1363353681.6942528722304958464@ffe16.ukr.net> <51436A9C.7010905@gmail.com> <27202.1363376076.16920329947726938112@ffe12.ukr.net> <51438A72.3000301@gmail.com> <6447.1363426297.5285784474692943872@ffe6.ukr.net> <90157.1363430170.9409858753789952@ffe8.ukr.net> Message-ID: <29942.1363457966.17802425454193213440@ffe11.ukr.net> --- Original message --- From: "Robert Kern" Date: 16 March 2013, 19:54:51 On Sat, Mar 16, 2013 at 10:39 AM, Matthieu Brucher < matthieu.brucher at gmail.com > wrote: > Even if they have different hashes, they can be stored in the same > underlying list before they are retrieved. Then, an actual comparison is > done to check if the given key (i.e. object instance, not hash) is the same > as one of the stored keys. Right. And the rule is that if two objects compare equal, then they must also hash equal. Unfortunately, it looks like `oofun` objects do not obey this property. oofun.__eq__() seems to return a Constraint rather than a bool, so oofun objects should simply not be used as dictionary keys. It is one of several base features FuncDesigner is built on and is used extremely often and widely; otherwise the whole of FuncDesigner would work incorrectly. Meanwhile, it is used intensively and solves many problems better than its competitors. That's quite possibly the source of the bug. Or at least, that's a bug that needs to get fixed first before attempting to debug anything else or attribute bugs to Python or numpy. Also, the lack of a bool-returning __eq__() will prevent proper sorting, which also seems to be used in the code snippet that Dmitrey showed. As I have already mentioned, I ensured via debugger that my __eq__, __le__ etc. are not involved from the buggy place of the code; only __hash__ is involved from there.
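Whether the dict machinery really consults __eq__ can be settled with a counting probe (again a hypothetical toy class, not oofun itself): CPython calls the stored key's __eq__ whenever hashes collide and the identity check fails, even though no user code calls it directly — which is why a debugger breakpoint in user code can miss it.

```python
class Probe:
    eq_calls = 0
    def __hash__(self):
        return 0                      # collide on purpose
    def __eq__(self, other):
        Probe.eq_calls += 1           # count every comparison the dict makes
        return self is other

d = {Probe(): 1, Probe(): 2}          # second insertion hits the same bucket
print(Probe.eq_calls)                 # at least 1 -- the dict called __eq__
print(len(d))                         # 2: the keys compared unequal
```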
-- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sat Mar 16 16:14:46 2013 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 16 Mar 2013 20:14:46 +0000 Subject: [Numpy-discussion] OpenOpt Suite release 0.45 In-Reply-To: <29942.1363457966.17802425454193213440@ffe11.ukr.net> References: <1991.1363353681.6942528722304958464@ffe16.ukr.net> <51436A9C.7010905@gmail.com> <27202.1363376076.16920329947726938112@ffe12.ukr.net> <51438A72.3000301@gmail.com> <6447.1363426297.5285784474692943872@ffe6.ukr.net> <90157.1363430170.9409858753789952@ffe8.ukr.net> <29942.1363457966.17802425454193213440@ffe11.ukr.net> Message-ID: On Sat, Mar 16, 2013 at 6:19 PM, Dmitrey wrote: > > > --- Original message --- > From: "Robert Kern" > Date: 16 March 2013, 19:54:51 > > On Sat, Mar 16, 2013 at 10:39 AM, Matthieu Brucher > wrote: >> Even if they have different hashes, they can be stored in the same >> underlying list before they are retrieved. Then, an actual comparison is >> done to check if the given key (i.e. object instance, not hash) is the >> same >> as one of the stored keys. > > Right. And the rule is that if two objects compare equal, then they > must also hash equal. Unfortunately, it looks like `oofun` objects do > not obey this property. oofun.__eq__() seems to return a Constraint > rather than a bool, so oofun objects should simply not be used as > dictionary keys. > > It is one of several base features FuncDesigner is build on and is used > extremely often and wide; then whole FuncDesigner would work incorrectly > while it is used intensively and solves many problems better than its > competitors. I understand. It just means that you can't use oofun objects as dictionary keys. Adding a __hash__() method is not enough to make that work. > That's quite possibly the source of the bug. 
Or at > least, that's a bug that needs to get fixed first before attempting to > debug anything else or attribute bugs to Python or numpy. Also, the > lack of a bool-returning __eq__() will prevent proper sorting, which > also seems to be used in the code snippet that Dmitrey showed. > > as I have already mentioned, I ensured via debugger that my __eq__, __le__ > etc are not involved from the buggy place of the code, only __hash__ is > involved from there. oofun.__lt__() will certainly be called, and it too is problematic. If pointDerivates is a dict mapping oofun objects to other objects as you say, then derivative_items will be a list of (oofun, object) tuples. If you sort derivative_items by the first element, the oofun objects, then oofun.__lt__() *will* be called. That's how list.sort() works. I was wrong: oofun.__eq__() won't be called by the sorting. You are probably not seeing the oofun.__eq__() problem in that code because of an implementation detail in Python: dicts will check identity first before trying to compare with __eq__(). You may be having problems in the construction of the pointDerivates dict or ooVarsIndDict outside of this code snippet, so if you just ran your debugger over this code snippet, you would not detect those calls. However, if you are not seeing the oofun.__lt__() calls from the sorting with your debugger, then your debugger may be missing the oofun.__eq__() calls, too. By all means, if you still think the bug is in someone else's code, please post a short example that other people can run that will demonstrate the problem. -- Robert Kern From njs at pobox.com Sat Mar 16 17:23:35 2013 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 16 Mar 2013 21:23:35 +0000 Subject: [Numpy-discussion] Add ability to disable the autogeneration of the function signature in a ufunc docstring. 
In-Reply-To: References: Message-ID: On Fri, Mar 15, 2013 at 9:19 PM, Pauli Virtanen wrote: > 15.03.2013 22:39, Nathaniel Smith kirjoitti: > [clip] >> - Something else... > > How about: scrap the automatic signatures altogether, and directly use > the docstring provided to the ufunc creation function? > > I suspect ufuncs are not very widely used in 3rd party code, as it > requires somewhat tricky messing with the C API. The backwards > compatibility issue is also just a documentation issue, so nothing drastic. True enough. I guess a question is how much it bothers us that there are tons of ufunc arguments that are just not mentioned in the interpreter docstrings: http://docs.scipy.org/doc/numpy/reference/ufuncs.html#optional-keyword-arguments Obviously not a huge amount of we'd have altered the auto-generation already to include them :-) But IMHO it would be kind of nice if ?np.add mentioned the existence of things like where= and dtype=... and if we decide that docstrings ought to mention such things, then it's going to be a right hassle updating them all by hand every time some new ufunc feature is added. -n From chaoyuejoy at gmail.com Mon Mar 18 05:25:41 2013 From: chaoyuejoy at gmail.com (Chao YUE) Date: Mon, 18 Mar 2013 10:25:41 +0100 Subject: [Numpy-discussion] indexing of arbitrary axis and arbitrary slice? In-Reply-To: References: Message-ID: Hi Nathaniel, thanks for your reply, it works fine and suffice for my purpose. cheers, Chao On Sat, Mar 16, 2013 at 5:49 PM, Nathaniel Smith wrote: > On 16 Mar 2013 16:41, "Chao YUE" wrote: > > > > Dear all, > > > > Is there some way to index the numpy array by specifying arbitrary axis > and arbitrary slice, while > > not knowing the actual shape of the data? > > For example, I have a 3-dim data, data.shape = (3,4,5) > > Is there a way to retrieve data[:,0,:] by using something like > np.retrieve_data(data,axis=2,slice=0), > > by this way you don't have to know the actual shape of the array. 
> > for for 4-dim data, np.retrieve_data(data,axis=2,slice=0) will actually > be data[:,0,:,:] > > I don't know of anything quite like that, but it's easy to fake it: > > def retrieve_data(a, ax, idx): > full_idx = [slice(None)] * a.ndim > full_idx[ax] = idx > return a[tuple(full_idx)] > > Or for the specific case where you do know the axis in advance, you just > don't know how many trailing axes there are, use > a[:, :, 0, ...] > and the ... will expand to represent the appropriate number of :'s. > > -n > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.robitaille at gmail.com Mon Mar 18 05:56:52 2013 From: thomas.robitaille at gmail.com (Thomas Robitaille) Date: Mon, 18 Mar 2013 10:56:52 +0100 Subject: [Numpy-discussion] Memory issue with memory-mapped array assignment Message-ID: Hi everyone, I've come across a memory issue when trying to assign data to slices of a Numpy memory-mapped array. The short story is that if I create a memory mapped array and try to add data to subsets of the array many times in a loop, the memory usage of my code grows over time, suggesting there is some kind of memory leak. 
More specifically, if I run the following script: import random import numpy as np image = np.memmap('image.np', mode='w+', dtype=np.float32, shape=(10000, 10000)) print("Before assignment") for i in range(1000): x = random.uniform(1000, 9000) y = random.uniform(1000, 9000) imin = int(x) - 128 imax = int(x) + 128 jmin = int(y) - 128 jmax = int(y) + 128 data = np.random.random((256,256)) image[imin:imax, jmin:jmax] = image[imin:imax, jmin:jmax] + data del x, y, imin, imax, jmin, jmax, data the memory usage goes up to ~300Mb after 1000 iterations (and proportionally more if I increase the number of iterations). I've written up a more detailed overview of the issue on stackoverflow (with memory profiling): http://stackoverflow.com/questions/15473377/memory-issue-with-numpy-memory-mapped-array-assignment Does anyone have any idea what is going on, and how I can avoid this issue? Thanks! Tom From mpuecker at mit.edu Mon Mar 18 09:42:09 2013 From: mpuecker at mit.edu (Matt U) Date: Mon, 18 Mar 2013 13:42:09 +0000 (UTC) Subject: [Numpy-discussion] numpy reference array References: Message-ID: Chris Barker - NOAA Federal noaa.gov> writes: > check out "stride tricks" for clever things you can do. > > But numpy does require that the data in your array be a contiguous > block, in order, so you can't arbitrarily re-arrange it while keeping > a view. > > HTH, > -Chris > Hi Chris, Thanks for the reply, you've just saved me a lot of time. I did run across 'views' but it looked like I couldn't have my data arbitrarily arranged. Thank you for confirming that. Unfortunately my desired view does not fit a neat striding pattern. 
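For layouts that do follow a regular stride pattern, a copy-free view can be built; a minimal sketch with numpy's as_strided (the array and window size are illustrative, not the actual data discussed above):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(8)
w = 3  # window length
# Overlapping windows of length w: each row starts one element later,
# reusing the same underlying buffer instead of copying.
windows = as_strided(a,
                     shape=(a.size - w + 1, w),
                     strides=(a.strides[0], a.strides[0]))
print(windows[0], windows[-1])        # [0 1 2] [5 6 7]
print(np.shares_memory(windows, a))   # True -- it is a view, not a copy
```

Note that writing through such a view touches aliased elements, so as_strided views are best treated as read-only.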
Cheers, Matt From pierre.haessig at crans.org Mon Mar 18 13:00:02 2013 From: pierre.haessig at crans.org (Pierre Haessig) Date: Mon, 18 Mar 2013 18:00:02 +0100 Subject: [Numpy-discussion] Numpy correlate In-Reply-To: <1363252681.94038.YahooMailNeo@web193401.mail.sg3.yahoo.com> References: , <1363252681.94038.YahooMailNeo@web193401.mail.sg3.yahoo.com> Message-ID: <51474812.2060001@crans.org> Hi Sudheer, Le 14/03/2013 10:18, Sudheer Joseph a ?crit : > Dear Numpy/Scipy experts, > Attached is a script > which I made to test the numpy.correlate ( which is called py > plt.xcorr) to see how the cross correlation is calculated. From this > it appears the if i call plt.xcorr(x,y) > Y is slided back in time compared to x. ie if y is a process that > causes a delayed response in x after 5 timesteps then there should be > a high correlation at Lag 5. However in attached plot the response is > seen in only -ve side of the lags. > Can any one advice me on how to see which way exactly the 2 series > are slided back or forth.? and understand the cause result relation > better?( I understand merely by correlation one cannot assume cause > and result relation, but it is important to know which series is older > in time at a given lag. You indeed pointed out a lack of documentation of in matplotlib.xcorr function because the definition of covariance can be ambiguous. The way I would try to get an interpretation of xcorr function (& its friends) is to go back to the theoretical definition of cross-correlation, which is a normalized version of the covariance. In your example you've created a time series X(k) and a lagged one : Y(k) = X(k-5) Now, the covariance function of X and Y is commonly defined as : Cov_{X,Y}(h) = E(X(k+h) * Y(k)) where E is the expectation (assuming that X and Y are centered for the sake of clarity). If I plug in the definition of Y, I get Cov(h) = E(X(k+h) * X(k-5)). This yields naturally the fact that the covariance is indeed maximal at h=-5 and not h=+5. 
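The sign convention derived above is easy to confirm numerically (a sketch; the series length, the seed and the lag of 5 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = np.roll(x, 5)            # y(k) = x(k-5): y is x delayed by 5 steps

c = np.correlate(x, y, mode="full")
lags = np.arange(-len(x) + 1, len(x))
print(lags[c.argmax()])      # -5: the peak sits on the negative side,
                             # matching Cov(h) = E(X(k+h) X(k-5))
```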
Note that this reasoning does yield the opposite result with a different definition of the covariance, ie. Cov_{X,Y}(h) = E(X(k) * Y(k+h)) (and that's what I first did !). Therefore, I think there should be a definition in of cross correlation in matplotlib xcorr docstring. In R's acf doc, there is this mention : "The lag k value returned by ccf(x, y) estimates the correlation between x[t+k] and y[t]. " (see http://stat.ethz.ch/R-manual/R-devel/library/stats/html/acf.html) Now I believe, this upper discussion really belongs to matplotlib ML. I'll put an issue on github (I just spotted a mistake the definition of normalization anyway) Coming back to numpy : There's a strange thing, the definition of numpy.correlate seems to give the other definition "z[k] = sum_n a[n] * conj(v[n+k])" ( http://docs.scipy.org/doc/numpy/reference/generated/numpy.correlate.html) although its usage prooves otherwise. What did I miss ? best, Pierre -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From jsseabold at gmail.com Mon Mar 18 13:10:16 2013 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 18 Mar 2013 13:10:16 -0400 Subject: [Numpy-discussion] Numpy correlate In-Reply-To: <51474812.2060001@crans.org> References: <1363252681.94038.YahooMailNeo@web193401.mail.sg3.yahoo.com> <51474812.2060001@crans.org> Message-ID: On Mon, Mar 18, 2013 at 1:00 PM, Pierre Haessig wrote: > Hi Sudheer, > > Le 14/03/2013 10:18, Sudheer Joseph a ?crit : > > Dear Numpy/Scipy experts, > Attached is a script which I > made to test the numpy.correlate ( which is called py plt.xcorr) to see how > the cross correlation is calculated. From this it appears the if i call > plt.xcorr(x,y) > Y is slided back in time compared to x. 
ie if y is a process that causes a > delayed response in x after 5 timesteps then there should be a high > correlation at Lag 5. However in attached plot the response is seen in only > -ve side of the lags. > Can any one advice me on how to see which way exactly the 2 series > are slided back or forth.? and understand the cause result relation > better?( I understand merely by correlation one cannot assume cause and > result relation, but it is important to know which series is older in time > at a given lag. > > You indeed pointed out a lack of documentation of in matplotlib.xcorr > function because the definition of covariance can be ambiguous. > > The way I would try to get an interpretation of xcorr function (& its > friends) is to go back to the theoretical definition of cross-correlation, > which is a normalized version of the covariance. > > In your example you've created a time series X(k) and a lagged one : Y(k) > = X(k-5) > > Now, the covariance function of X and Y is commonly defined as : > Cov_{X,Y}(h) = E(X(k+h) * Y(k)) where E is the expectation > (assuming that X and Y are centered for the sake of clarity). > > If I plug in the definition of Y, I get Cov(h) = E(X(k+h) * X(k-5)). This > yields naturally the fact that the covariance is indeed maximal at h=-5 and > not h=+5. > > Note that this reasoning does yield the opposite result with a different > definition of the covariance, ie. Cov_{X,Y}(h) = E(X(k) * Y(k+h)) (and > that's what I first did !). > > > Therefore, I think there should be a definition in of cross correlation in > matplotlib xcorr docstring. In R's acf doc, there is this mention : "The > lag k value returned by ccf(x, y) estimates the correlation between x[t+k] > and y[t]. " > (see http://stat.ethz.ch/R-manual/R-devel/library/stats/html/acf.html) > > Now I believe, this upper discussion really belongs to matplotlib ML. 
I'll > put an issue on github (I just spotted a mistake the definition of > normalization anyway) > You might be interested in the statsmodels implementation which should be similar to the R functionality. http://nbviewer.ipython.org/urls/raw.github.com/jseabold/tutorial/master/tsa_arma.ipynb http://statsmodels.sourceforge.net/devel/generated/statsmodels.tsa.stattools.acf.html http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.tsaplots.plot_acf.html Skipper -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Mar 18 16:21:35 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 18 Mar 2013 16:21:35 -0400 Subject: [Numpy-discussion] Numpy correlate In-Reply-To: References: <1363252681.94038.YahooMailNeo@web193401.mail.sg3.yahoo.com> <51474812.2060001@crans.org> Message-ID: On Mon, Mar 18, 2013 at 1:10 PM, Skipper Seabold wrote: > On Mon, Mar 18, 2013 at 1:00 PM, Pierre Haessig > wrote: >> >> Hi Sudheer, >> >> Le 14/03/2013 10:18, Sudheer Joseph a ?crit : >> >> Dear Numpy/Scipy experts, >> Attached is a script which I >> made to test the numpy.correlate ( which is called py plt.xcorr) to see how >> the cross correlation is calculated. From this it appears the if i call >> plt.xcorr(x,y) >> Y is slided back in time compared to x. ie if y is a process that causes a >> delayed response in x after 5 timesteps then there should be a high >> correlation at Lag 5. However in attached plot the response is seen in only >> -ve side of the lags. >> Can any one advice me on how to see which way exactly the 2 series are >> slided back or forth.? and understand the cause result relation better?( I >> understand merely by correlation one cannot assume cause and result >> relation, but it is important to know which series is older in time at a >> given lag. 
>> >> You indeed pointed out a lack of documentation of in matplotlib.xcorr >> function because the definition of covariance can be ambiguous. >> >> The way I would try to get an interpretation of xcorr function (& its >> friends) is to go back to the theoretical definition of cross-correlation, >> which is a normalized version of the covariance. >> >> In your example you've created a time series X(k) and a lagged one : Y(k) >> = X(k-5) >> >> Now, the covariance function of X and Y is commonly defined as : >> Cov_{X,Y}(h) = E(X(k+h) * Y(k)) where E is the expectation >> (assuming that X and Y are centered for the sake of clarity). >> >> If I plug in the definition of Y, I get Cov(h) = E(X(k+h) * X(k-5)). This >> yields naturally the fact that the covariance is indeed maximal at h=-5 and >> not h=+5. >> >> Note that this reasoning does yield the opposite result with a different >> definition of the covariance, ie. Cov_{X,Y}(h) = E(X(k) * Y(k+h)) (and >> that's what I first did !). >> >> >> Therefore, I think there should be a definition in of cross correlation in >> matplotlib xcorr docstring. In R's acf doc, there is this mention : "The lag >> k value returned by ccf(x, y) estimates the correlation between x[t+k] and >> y[t]. " >> (see http://stat.ethz.ch/R-manual/R-devel/library/stats/html/acf.html) >> >> Now I believe, this upper discussion really belongs to matplotlib ML. I'll >> put an issue on github (I just spotted a mistake the definition of >> normalization anyway) > > > > You might be interested in the statsmodels implementation which should be > similar to the R functionality. > > http://nbviewer.ipython.org/urls/raw.github.com/jseabold/tutorial/master/tsa_arma.ipynb > http://statsmodels.sourceforge.net/devel/generated/statsmodels.tsa.stattools.acf.html > http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.tsaplots.plot_acf.html we don't have any cross-correlation xcorr, AFAIR but I guess it should work the same way. 
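Until such a helper exists, a normalized cross-correlation can be hand-rolled on top of np.correlate; a sketch (the `xcorr` function below is hypothetical, not statsmodels API) following the R ccf convention quoted earlier, where lag k estimates corr(x[t+k], y[t]):

```python
import numpy as np

def xcorr(x, y, maxlag):
    """Normalized cross-correlation of x and y for lags -maxlag..maxlag.

    Lag k estimates corr(x[t+k], y[t]) -- the convention of R's ccf.
    Uses the biased 1/n normalization."""
    x = np.asarray(x, float) - np.mean(x)
    y = np.asarray(y, float) - np.mean(y)
    n = len(x)
    c = np.correlate(x, y, mode="full") / (n * x.std() * y.std())
    mid = n - 1                         # index of lag 0 in the full output
    lags = np.arange(-maxlag, maxlag + 1)
    return lags, c[mid - maxlag: mid + maxlag + 1]
```

With y a copy of x delayed by 5 steps, the peak lands at lag -5, consistent with the covariance argument earlier in the thread.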
Josef > > Skipper > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From sudheer.joseph at yahoo.com Tue Mar 19 03:07:57 2013 From: sudheer.joseph at yahoo.com (Sudheer Joseph) Date: Tue, 19 Mar 2013 15:07:57 +0800 (SGT) Subject: [Numpy-discussion] Numpy correlate In-Reply-To: References: <1363252681.94038.YahooMailNeo@web193401.mail.sg3.yahoo.com> <51474812.2060001@crans.org> Message-ID: <1363676877.47397.YahooMailNeo@web193403.mail.sg3.yahoo.com> Thank you All for the response, ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?acf do not accept 2 variables so naturally? http://nbviewer.ipython.org/urls/raw.github.com/jseabold/tutorial/master/tsa_arma.ipynb >?http://statsmodels.sourceforge.net/devel/generated/statsmodels.tsa.stattools.acf.html >?http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.tsaplots.plot_acf.html These may not work for me. ? *************************************************************** Sudheer Joseph Indian National Centre for Ocean Information Services Ministry of Earth Sciences, Govt. of India POST BOX NO: 21, IDA Jeedeemetla P.O. Via Pragathi Nagar,Kukatpally, Hyderabad; Pin:5000 55 Tel:+91-40-23886047(O),Fax:+91-40-23895011(O), Tel:+91-40-23044600(R),Tel:+91-40-9440832534(Mobile) E-mail:sjo.India at gmail.com;sudheer.joseph at yahoo.com Web- http://oppamthadathil.tripod.com *************************************************************** ________________________________ From: "josef.pktd at gmail.com" To: Discussion of Numerical Python Sent: Tuesday, 19 March 2013 1:51 AM Subject: Re: [Numpy-discussion] Numpy correlate On Mon, Mar 18, 2013 at 1:10 PM, Skipper Seabold wrote: > On Mon, Mar 18, 2013 at 1:00 PM, Pierre Haessig > wrote: >> >> Hi Sudheer, >> >> Le 14/03/2013 10:18, Sudheer Joseph a ?crit : >> >> Dear Numpy/Scipy experts, >>? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 
Attached is a script which I >> made to test the numpy.correlate ( which is called py plt.xcorr) to see how >> the cross correlation is calculated. From this it appears the if i call >> plt.xcorr(x,y) >> Y is slided back in time compared to x. ie if y is a process that causes a >> delayed response in x after 5 timesteps then there should be a high >> correlation at Lag 5. However in attached plot the response is seen in only >> -ve side of the lags. >> Can any one advice me on how to see which way exactly the 2 series are >> slided back or forth.? and understand the cause result relation better?( I >> understand merely by correlation one cannot assume cause and result >> relation, but it is important to know which series is older in time at a >> given lag. >> >> You indeed pointed out a lack of documentation of in matplotlib.xcorr >> function because the definition of covariance can be ambiguous. >> >> The way I would try to get an interpretation of xcorr function (& its >> friends) is to go back to the theoretical definition of cross-correlation, >> which is a normalized version of the covariance. >> >> In your example you've created a time series X(k) and a lagged one : Y(k) >> = X(k-5) >> >> Now, the covariance function of X and Y is commonly defined as : >>? Cov_{X,Y}(h) = E(X(k+h) * Y(k))? where E is the expectation >>? (assuming that X and Y are centered for the sake of clarity). >> >> If I plug in the definition of Y, I get Cov(h) = E(X(k+h) * X(k-5)). This >> yields naturally the fact that the covariance is indeed maximal at h=-5 and >> not h=+5. >> >> Note that this reasoning does yield the opposite result with a different >> definition of the covariance, ie. Cov_{X,Y}(h) = E(X(k) * Y(k+h))? (and >> that's what I first did !). >> >> >> Therefore, I think there should be a definition in of cross correlation in >> matplotlib xcorr docstring. 
In R's acf doc, there is this mention : "The lag >> k value returned by ccf(x, y) estimates the correlation between x[t+k] and >> y[t]. " >> (see http://stat.ethz.ch/R-manual/R-devel/library/stats/html/acf.html) >> >> Now I believe, this upper discussion really belongs to matplotlib ML. I'll >> put an issue on github (I just spotted a mistake the definition of >> normalization anyway) > > > > You might be interested in the statsmodels implementation which should be > similar to the R functionality. > > http://nbviewer.ipython.org/urls/raw.github.com/jseabold/tutorial/master/tsa_arma.ipynb > http://statsmodels.sourceforge.net/devel/generated/statsmodels.tsa.stattools.acf.html > http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.tsaplots.plot_acf.html we don't have any cross-correlation xcorr, AFAIR but I guess it should work the same way. Josef > > Skipper > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From sudheer.joseph at yahoo.com Tue Mar 19 03:12:00 2013 From: sudheer.joseph at yahoo.com (Sudheer Joseph) Date: Tue, 19 Mar 2013 15:12:00 +0800 (SGT) Subject: [Numpy-discussion] Numpy correlate In-Reply-To: <51474812.2060001@crans.org> References: , <1363252681.94038.YahooMailNeo@web193401.mail.sg3.yahoo.com> <51474812.2060001@crans.org> Message-ID: <1363677120.65039.YahooMailNeo@web193405.mail.sg3.yahoo.com> Thank you Pierre, ? ? ? ? ? ? ? ? ? ? ? ? It appears the numpy.correlate uses the frequency domain method for getting the ccf. I would like to know how serious or exactly what is the issue with normalization?. 
I have computed cross correlation using the function and interpreting the results based on it. It will be helpful if you could tell me if there is a significant bug in the function with best regards, Sudheer From: Pierre Haessig To: numpy-discussion at scipy.org Sent: Monday, 18 March 2013 10:30 PM Subject: Re: [Numpy-discussion] Numpy correlate Hi Sudheer, Le 14/03/2013 10:18, Sudheer Joseph a ?crit?: Dear Numpy/Scipy experts, >? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? Attached is a script which I made to test the numpy.correlate ( which is called py plt.xcorr) to see how the cross correlation is calculated. From this it appears the if i call plt.xcorr(x,y) >Y is?slided back in time compared to x. ie if y is a process that causes a delayed response in x after 5 timesteps then there should be a high correlation at Lag 5. However in attached plot the response is seen in only -ve side of the lags. >Can any one advice me on how to see which way exactly the 2 series are?slided?back or forth.? and understand the cause result relation better?( I understand merely by correlation one cannot assume cause and result relation, but it is important to know which series is older in time at a given lag. You indeed pointed out a lack of documentation of in matplotlib.xcorr function because the definition of covariance can be ambiguous. The way I would try to get an interpretation of xcorr function (& its friends) is to go back to the theoretical definition of cross-correlation, which is a normalized version of the covariance. In your example you've created a time series X(k) and a lagged one : Y(k) = X(k-5) Now, the covariance function of X and Y is commonly defined as : ?Cov_{X,Y}(h) = E(X(k+h) * Y(k))?? where E is the expectation ?(assuming that X and Y are centered for the sake of clarity). If I plug in the definition of Y, I get Cov(h) = E(X(k+h) * X(k-5)). This yields naturally the fact that the covariance is indeed maximal at h=-5 and not h=+5. 
Note that this reasoning does yield the opposite result with a different definition of the covariance, ie. Cov_{X,Y}(h) = E(X(k) * Y(k+h))? (and that's what I first did !). Therefore, I think there should be a definition in of cross correlation in matplotlib xcorr docstring. In R's acf doc, there is this mention : "The lag k value returned by ccf(x, y) estimates the correlation between x[t+k] and y[t]. " (see http://stat.ethz.ch/R-manual/R-devel/library/stats/html/acf.html) Now I believe, this upper discussion really belongs to matplotlib ML. I'll put an issue on github (I just spotted a mistake the definition of normalization anyway) Coming back to numpy : There's a strange thing, the definition of numpy.correlate seems to give the other definition "z[k] = sum_n a[n] * conj(v[n+k])" ( http://docs.scipy.org/doc/numpy/reference/generated/numpy.correlate.html) although its usage prooves otherwise. What did I miss ? best, Pierre _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre.haessig at crans.org Wed Mar 20 04:30:59 2013 From: pierre.haessig at crans.org (Pierre Haessig) Date: Wed, 20 Mar 2013 09:30:59 +0100 Subject: [Numpy-discussion] Numpy correlate In-Reply-To: <1363677120.65039.YahooMailNeo@web193405.mail.sg3.yahoo.com> References: , <1363252681.94038.YahooMailNeo@web193401.mail.sg3.yahoo.com> <51474812.2060001@crans.org> <1363677120.65039.YahooMailNeo@web193405.mail.sg3.yahoo.com> Message-ID: <514973C3.9000905@crans.org> Hi, Le 19/03/2013 08:12, Sudheer Joseph a ?crit : > *Thank you Pierre,* > It appears the numpy.correlate uses the > frequency domain method for getting the ccf. I would like to know how > serious or exactly what is the issue with normalization?. I have > computed cross correlation using the function and interpreting the > results based on it. 
It will be helpful if you could tell me if there > is a significant bug in the function > with best regards, > Sudheer np.correlate works in the time domain. I started a discussion about a month ago about the way it's implemented http://mail.scipy.org/pipermail/numpy-discussion/2013-February/065562.html Unfortunately I didn't find time to dig deeper in the matter which needs working in the C code of numpy which I'm not familiar with. Concerning the normalization of mpl.xcorr, I think that what is computed is just fine. It's just the way this normalization is described in the docstring which I think is weird. https://github.com/matplotlib/matplotlib/issues/1835 best, Pierre -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From jaakko.luttinen at aalto.fi Wed Mar 20 09:33:50 2013 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Wed, 20 Mar 2013 15:33:50 +0200 Subject: [Numpy-discussion] Dot/inner products with broadcasting? In-Reply-To: References: <51408B59.8090504@aalto.fi> <5141BA5E.2020704@aalto.fi> Message-ID: <5149BABE.1080306@aalto.fi> I tried using this inner1d as an alternative to dot because it uses broadcasting. However, I found something surprising: Not only is inner1d much much slower than dot, it is also slower than einsum which is much more general: In [68]: import numpy as np In [69]: import numpy.core.gufuncs_linalg as gula In [70]: K = np.random.randn(1000,1000) In [71]: %timeit gula.inner1d(K[:,np.newaxis,:], np.swapaxes(K,-1,-2)[np.newaxis,:,:]) 1 loops, best of 3: 6.05 s per loop In [72]: %timeit np.dot(K,K) 1 loops, best of 3: 392 ms per loop In [73]: %timeit np.einsum('ik,kj->ij', K, K) 1 loops, best of 3: 1.24 s per loop Why is it so? 
I thought that the performance of inner1d would be somewhere in between dot and einsum, probably closer to dot. Now I don't see any reason to use inner1d instead of einsum.. -Jaakko On 03/15/2013 04:22 PM, Oscar Villellas wrote: > In fact, there is already an inner1d implemented in > numpy.core.umath_tests.inner1d > > from numpy.core.umath_tests import inner1d > > It should do the trick :) > > On Thu, Mar 14, 2013 at 12:54 PM, Jaakko Luttinen > wrote: >> Answering to myself, this pull request seems to implement an inner >> product with broadcasting (inner1d) and many other useful functions: >> https://github.com/numpy/numpy/pull/2954/ >> -J >> >> On 03/13/2013 04:21 PM, Jaakko Luttinen wrote: >>> Hi! >>> >>> How can I compute dot product (or similar multiply&sum operations) >>> efficiently so that broadcasting is utilized? >>> For multi-dimensional arrays, NumPy's inner and dot functions do not >>> match the leading axes and use broadcasting, but instead the result has >>> first the leading axes of the first input array and then the leading >>> axes of the second input array. >>> >>> For instance, I would like to compute the following inner-product: >>> np.sum(A*B, axis=-1) >>> >>> But numpy.inner gives: >>> A = np.random.randn(2,3,4) >>> B = np.random.randn(3,4) >>> np.inner(A,B).shape >>> # -> (2, 3, 3) instead of (2, 3) >>> >>> Similarly for dot product, I would like to compute for instance: >>> np.sum(A[...,:,:,np.newaxis]*B[...,np.newaxis,:,:], axis=-2) >>> >>> But numpy.dot gives: >>> In [12]: A = np.random.randn(2,3,4); B = np.random.randn(2,4,5) >>> In [13]: np.dot(A,B).shape >>> # -> (2, 3, 2, 5) instead of (2, 3, 5) >>> >>> I could use einsum for these operations, but I'm not sure whether that's >>> as efficient as using some BLAS-supported(?) dot products. >>> >>> I couldn't find any function which could perform this kind of >>> operations. 
NumPy's functions seem to either flatten the input arrays >>> (vdot, outer) or just use the axes of the input arrays separately (dot, >>> inner, tensordot). >>> >>> Any help? >>> >>> Best regards, >>> Jaakko >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From cjw at ncf.ca Wed Mar 20 09:46:35 2013 From: cjw at ncf.ca (Colin J. Williams) Date: Wed, 20 Mar 2013 09:46:35 -0400 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy Message-ID: <5149BDBB.6060509@ncf.ca> An HTML attachment was scrubbed... URL: From pierre.barbierdereuille at gmail.com Wed Mar 20 09:57:59 2013 From: pierre.barbierdereuille at gmail.com (Pierre Barbier de Reuille) Date: Wed, 20 Mar 2013 14:57:59 +0100 Subject: [Numpy-discussion] Bug in np.records? Message-ID: Hey, I am trying to use titles for the record arrays. In the documentation, it is specified that any column's title can be set to "None". However, trying this fails on numpy 1.6.2 because in np.core.records, on line 195, the "strip" method is called on the title object. This is really annoying. Could we fix this by replacing line 195 with: self._titles = [n.strip() if n is not None else None for n in titles[:self._nfields]] ? Thank you, -- Barbier de Reuille Pierre -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sd at syntonetic.com Wed Mar 20 09:59:33 2013 From: sd at syntonetic.com (Søren) Date: Wed, 20 Mar 2013 14:59:33 +0100 Subject: [Numpy-discussion] numpy array to C API Message-ID: <5149C0C5.6050801@syntonetic.com> Greetings, I'm extending our existing C/C++ software with Python/Numpy in order to do extra number crunching. It already works like a charm calling Python with the C API. But what is the proper way of passing double arrays returned from Python/Numpy routines back to C? I came across PyArray, but I can see from the compiler warnings that it is deprecated, and I don't want to start from scratch on legacy facilities. Going forward, what is the intended way of doing this with neat code on both sides and with a minimum of memory-copy overhead? thanks in advance Søren From davidmenhur at gmail.com Wed Mar 20 10:14:06 2013 From: davidmenhur at gmail.com (Daπid) Date: Wed, 20 Mar 2013 15:14:06 +0100 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: <5149BDBB.6060509@ncf.ca> References: <5149BDBB.6060509@ncf.ca> Message-ID: Without much detailed knowledge of the topic, I would expect both versions to give very similar timing, as it is essentially a call to an ATLAS function; not much is done in Python. Given this, maybe the difference is in ATLAS itself. How have you installed it? When you compile ATLAS, it will do some machine-specific optimisation, but if you have installed a binary, chances are that your version is optimised for a machine quite different from yours. So, two different installations could have been compiled on different machines, and so one is more suited to your machine. If you want to be sure, I would try to compile ATLAS (this may be difficult) or check the same on a very different machine (like an AMD processor, different architecture...). Just for reference, on Linux Python 2.7 64 bits can deal with these matrices easily. 
%timeit mat=np.random.random((6143,6143)); matinv= np.linalg.inv(mat); res = np.dot(mat, matinv); diff= res-np.eye(6143); print np.sum(np.abs(diff)) 2.41799631031e-05 1.13955868701e-05 3.64338191541e-05 1.13484781021e-05 1 loops, best of 3: 156 s per loop Intel i5, 4 GB of RAM and SSD. ATLAS installed from Fedora repository (I don't run heavy stuff on this computer). On 20 March 2013 14:46, Colin J. Williams wrote: > I have a small program which builds random matrices for increasing matrix > orders, inverts the matrix and checks the precision of the product. At some > point, one would expect operations to fail, when the memory capacity is > exceeded. In both Python 2.7 and 3.2, matrices of order 3,071 are handled, > but not 6,143. > > Using wall-clock times, with win32, Python 3.2 is slower than Python 2.7. > The profiler indicates a problem in the solver. > > Done on a Pentium, with 2.7 GHz processor, 2 GB of RAM and 221 GB of free > disk space. Both Python 3.2.3 and Python 2.7.3 use numpy 1.6.2. > > The results are shown below. > > Colin W. 
> > aaaa_ssss > 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] > order= 2 measure ofimprecision= 0.097 Time elapsed (seconds)= > 0.004143 > order= 5 measure ofimprecision= 2.207 Time elapsed (seconds)= > 0.001514 > order= 11 measure ofimprecision= 2.372 Time elapsed (seconds)= > 0.001455 > order= 23 measure ofimprecision= 3.318 Time elapsed (seconds)= > 0.001608 > order= 47 measure ofimprecision= 4.257 Time elapsed (seconds)= > 0.002339 > order= 95 measure ofimprecision= 4.986 Time elapsed (seconds)= > 0.005747 > order= 191 measure ofimprecision= 5.788 Time elapsed (seconds)= > 0.029974 > order= 383 measure ofimprecision= 6.765 Time elapsed (seconds)= > 0.145339 > order= 767 measure ofimprecision= 7.909 Time elapsed (seconds)= > 0.841142 > order= 1535 measure ofimprecision= 8.532 Time elapsed (seconds)= > 5.793630 > order= 3071 measure ofimprecision= 9.774 Time elapsed (seconds)= > 39.559540 > order= 6143 Process terminated by a MemoryError > > Above: 2.7.3 Below: Python 3.2.3 > > bbb_bbb > 3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)] > order= 2 measure ofimprecision= 0.000 Time elapsed (seconds)= > 0.113930 > order= 5 measure ofimprecision= 1.807 Time elapsed (seconds)= > 0.001373 > order= 11 measure ofimprecision= 2.395 Time elapsed (seconds)= > 0.001468 > order= 23 measure ofimprecision= 3.073 Time elapsed (seconds)= > 0.001609 > order= 47 measure ofimprecision= 5.642 Time elapsed (seconds)= > 0.002687 > order= 95 measure ofimprecision= 5.745 Time elapsed (seconds)= > 0.013510 > order= 191 measure ofimprecision= 5.866 Time elapsed (seconds)= > 0.061560 > order= 383 measure ofimprecision= 7.129 Time elapsed (seconds)= > 0.418490 > order= 767 measure ofimprecision= 8.240 Time elapsed (seconds)= > 3.815713 > order= 1535 measure ofimprecision= 8.735 Time elapsed (seconds)= > 27.877270 > order= 3071 measure ofimprecision= 9.996 Time elapsed > (seconds)=212.545610 > order= 6143 Process terminated by a MemoryError > > > > 
_______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From jenshnielsen at gmail.com Wed Mar 20 10:29:37 2013 From: jenshnielsen at gmail.com (Jens Nielsen) Date: Wed, 20 Mar 2013 14:29:37 +0000 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: References: <5149BDBB.6060509@ncf.ca> Message-ID: Hi, It could also be that they are linked against different libraries, such as ATLAS and the standard BLAS. What is the output of numpy.show_config() in the two different python versions? Jens On Wed, Mar 20, 2013 at 2:14 PM, Daπid wrote: > Without much detailed knowledge of the topic, I would expect both > versions to give very similar timing, as it is essentially a call to > ATLAS function, not much is done in Python. > > Given this, maybe the difference is in ATLAS itself. How have you > installed it? When you compile ATLAS, it will do some machine-specific > optimisation, but if you have installed a binary chances are that your > version is optimised for a machine quite different from yours. So, two > different installations could have been compiled in different machines > and so one is more suited for your machine. If you want to be sure, I > would try to compile ATLAS (this may be difficult) or check the same > on a very different machine (like an AMD processor, different > architecture...). > > > > Just for reference, on Linux Python 2.7 64 bits can deal with these > matrices easily. > > %timeit mat=np.random.random((6143,6143)); matinv= np.linalg.inv(mat); > res = np.dot(mat, matinv); diff= res-np.eye(6143); print > np.sum(np.abs(diff)) > 2.41799631031e-05 > 1.13955868701e-05 > 3.64338191541e-05 > 1.13484781021e-05 > 1 loops, best of 3: 156 s per loop > > Intel i5, 4 GB of RAM and SSD. ATLAS installed from Fedora repository > (I don't run heavy stuff on this computer). > > On 20 March 2013 14:46, Colin J. 
Williams wrote: > > I have a small program which builds random matrices for increasing matrix > > orders, inverts the matrix and checks the precision of the product. At > some > > point, one would expect operations to fail, when the memory capacity is > > exceeded. In both Python 2.7 and 3.2 matrices of order 3,071 area > handled, > > but not 6,143. > > > > Using wall-clock times, with win32, Python 3.2 is slower than Python 2.7. > > The profiler indicates a problem in the solver. > > > > Done on a Pentium, with 2.7 GHz processor, 2 GB of RAM and 221 GB of free > > disk space. Both Python 3.2.3 and Python 2.7.3 use numpy 1.6.2. > > > > The results are show below. > > > > Colin W. > > > > aaaa_ssss > > 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] > > order= 2 measure ofimprecision= 0.097 Time elapsed (seconds)= > > 0.004143 > > order= 5 measure ofimprecision= 2.207 Time elapsed (seconds)= > > 0.001514 > > order= 11 measure ofimprecision= 2.372 Time elapsed (seconds)= > > 0.001455 > > order= 23 measure ofimprecision= 3.318 Time elapsed (seconds)= > > 0.001608 > > order= 47 measure ofimprecision= 4.257 Time elapsed (seconds)= > > 0.002339 > > order= 95 measure ofimprecision= 4.986 Time elapsed (seconds)= > > 0.005747 > > order= 191 measure ofimprecision= 5.788 Time elapsed (seconds)= > > 0.029974 > > order= 383 measure ofimprecision= 6.765 Time elapsed (seconds)= > > 0.145339 > > order= 767 measure ofimprecision= 7.909 Time elapsed (seconds)= > > 0.841142 > > order= 1535 measure ofimprecision= 8.532 Time elapsed (seconds)= > > 5.793630 > > order= 3071 measure ofimprecision= 9.774 Time elapsed (seconds)= > > 39.559540 > > order= 6143 Process terminated by a MemoryError > > > > Above: 2.7.3 Below: Python 3.2.3 > > > > bbb_bbb > > 3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)] > > order= 2 measure ofimprecision= 0.000 Time elapsed (seconds)= > > 0.113930 > > order= 5 measure ofimprecision= 1.807 Time elapsed (seconds)= > > 
0.001373 > > order= 11 measure ofimprecision= 2.395 Time elapsed (seconds)= > > 0.001468 > > order= 23 measure ofimprecision= 3.073 Time elapsed (seconds)= > > 0.001609 > > order= 47 measure ofimprecision= 5.642 Time elapsed (seconds)= > > 0.002687 > > order= 95 measure ofimprecision= 5.745 Time elapsed (seconds)= > > 0.013510 > > order= 191 measure ofimprecision= 5.866 Time elapsed (seconds)= > > 0.061560 > > order= 383 measure ofimprecision= 7.129 Time elapsed (seconds)= > > 0.418490 > > order= 767 measure ofimprecision= 8.240 Time elapsed (seconds)= > > 3.815713 > > order= 1535 measure ofimprecision= 8.735 Time elapsed (seconds)= > > 27.877270 > > order= 3071 measure ofimprecision= 9.996 Time elapsed > > (seconds)=212.545610 > > order= 6143 Process terminated by a MemoryError > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nouiz at nouiz.org Wed Mar 20 10:30:48 2013 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Wed, 20 Mar 2013 10:30:48 -0400 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: References: <5149BDBB.6060509@ncf.ca> Message-ID: Hi, win32 do not mean it is a 32 bits windows. sys.platform always return win32 on 32bits and 64 bits windows even for python 64 bits. But that is a good question, is your python 32 or 64 bits? Fred On Wed, Mar 20, 2013 at 10:14 AM, Da?id wrote: > Without much detailed knowledge of the topic, I would expect both > versions to give very similar timing, as it is essentially a call to > ATLAS function, not much is done in Python. 
> > Given this, maybe the difference is in ATLAS itself. How have you > installed it? When you compile ATLAS, it will do some machine-specific > optimisation, but if you have installed a binary chances are that your > version is optimised for a machine quite different from yours. So, two > different installations could have been compiled in different machines > and so one is more suited for your machine. If you want to be sure, I > would try to compile ATLAS (this may be difficult) or check the same > on a very different machine (like an AMD processor, different > architecture...). > > > > Just for reference, on Linux Python 2.7 64 bits can deal with these > matrices easily. > > %timeit mat=np.random.random((6143,6143)); matinv= np.linalg.inv(mat); > res = np.dot(mat, matinv); diff= res-np.eye(6143); print > np.sum(np.abs(diff)) > 2.41799631031e-05 > 1.13955868701e-05 > 3.64338191541e-05 > 1.13484781021e-05 > 1 loops, best of 3: 156 s per loop > > Intel i5, 4 GB of RAM and SSD. ATLAS installed from Fedora repository > (I don't run heavy stuff on this computer). > > On 20 March 2013 14:46, Colin J. Williams wrote: >> I have a small program which builds random matrices for increasing matrix >> orders, inverts the matrix and checks the precision of the product. At some >> point, one would expect operations to fail, when the memory capacity is >> exceeded. In both Python 2.7 and 3.2 matrices of order 3,071 area handled, >> but not 6,143. >> >> Using wall-clock times, with win32, Python 3.2 is slower than Python 2.7. >> The profiler indicates a problem in the solver. >> >> Done on a Pentium, with 2.7 GHz processor, 2 GB of RAM and 221 GB of free >> disk space. Both Python 3.2.3 and Python 2.7.3 use numpy 1.6.2. >> >> The results are show below. >> >> Colin W. 
>> >> aaaa_ssss >> 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] >> order= 2 measure ofimprecision= 0.097 Time elapsed (seconds)= >> 0.004143 >> order= 5 measure ofimprecision= 2.207 Time elapsed (seconds)= >> 0.001514 >> order= 11 measure ofimprecision= 2.372 Time elapsed (seconds)= >> 0.001455 >> order= 23 measure ofimprecision= 3.318 Time elapsed (seconds)= >> 0.001608 >> order= 47 measure ofimprecision= 4.257 Time elapsed (seconds)= >> 0.002339 >> order= 95 measure ofimprecision= 4.986 Time elapsed (seconds)= >> 0.005747 >> order= 191 measure ofimprecision= 5.788 Time elapsed (seconds)= >> 0.029974 >> order= 383 measure ofimprecision= 6.765 Time elapsed (seconds)= >> 0.145339 >> order= 767 measure ofimprecision= 7.909 Time elapsed (seconds)= >> 0.841142 >> order= 1535 measure ofimprecision= 8.532 Time elapsed (seconds)= >> 5.793630 >> order= 3071 measure ofimprecision= 9.774 Time elapsed (seconds)= >> 39.559540 >> order= 6143 Process terminated by a MemoryError >> >> Above: 2.7.3 Below: Python 3.2.3 >> >> bbb_bbb >> 3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)] >> order= 2 measure ofimprecision= 0.000 Time elapsed (seconds)= >> 0.113930 >> order= 5 measure ofimprecision= 1.807 Time elapsed (seconds)= >> 0.001373 >> order= 11 measure ofimprecision= 2.395 Time elapsed (seconds)= >> 0.001468 >> order= 23 measure ofimprecision= 3.073 Time elapsed (seconds)= >> 0.001609 >> order= 47 measure ofimprecision= 5.642 Time elapsed (seconds)= >> 0.002687 >> order= 95 measure ofimprecision= 5.745 Time elapsed (seconds)= >> 0.013510 >> order= 191 measure ofimprecision= 5.866 Time elapsed (seconds)= >> 0.061560 >> order= 383 measure ofimprecision= 7.129 Time elapsed (seconds)= >> 0.418490 >> order= 767 measure ofimprecision= 8.240 Time elapsed (seconds)= >> 3.815713 >> order= 1535 measure ofimprecision= 8.735 Time elapsed (seconds)= >> 27.877270 >> order= 3071 measure ofimprecision= 9.996 Time elapsed >> (seconds)=212.545610 >> order= 
6143 Process terminated by a MemoryError >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From cjwilliams43 at gmail.com Wed Mar 20 10:49:58 2013 From: cjwilliams43 at gmail.com (Colin J. Williams) Date: Wed, 20 Mar 2013 10:49:58 -0400 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: References: <5149BDBB.6060509@ncf.ca> Message-ID: <5149CC96.7090006@gmail.com> An HTML attachment was scrubbed... URL: From cjwilliams43 at gmail.com Wed Mar 20 10:59:26 2013 From: cjwilliams43 at gmail.com (Colin J. Williams) Date: Wed, 20 Mar 2013 10:59:26 -0400 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: References: <5149BDBB.6060509@ncf.ca> Message-ID: <5149CECE.6020107@gmail.com> An HTML attachment was scrubbed... URL: From cjwilliams43 at gmail.com Wed Mar 20 11:01:33 2013 From: cjwilliams43 at gmail.com (Colin J. Williams) Date: Wed, 20 Mar 2013 11:01:33 -0400 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: References: <5149BDBB.6060509@ncf.ca> Message-ID: <5149CF4D.6090906@gmail.com> On 20/03/2013 10:30 AM, Fr?d?ric Bastien wrote: > Hi, > > win32 do not mean it is a 32 bits windows. sys.platform always return > win32 on 32bits and 64 bits windows even for python 64 bits. > > But that is a good question, is your python 32 or 64 bits? 32 bits. Colin W. > > Fred > > On Wed, Mar 20, 2013 at 10:14 AM, Da?id wrote: >> Without much detailed knowledge of the topic, I would expect both >> versions to give very similar timing, as it is essentially a call to >> ATLAS function, not much is done in Python. 
>> >> Given this, maybe the difference is in ATLAS itself. How have you >> installed it? When you compile ATLAS, it will do some machine-specific >> optimisation, but if you have installed a binary chances are that your >> version is optimised for a machine quite different from yours. So, two >> different installations could have been compiled in different machines >> and so one is more suited for your machine. If you want to be sure, I >> would try to compile ATLAS (this may be difficult) or check the same >> on a very different machine (like an AMD processor, different >> architecture...). >> >> >> >> Just for reference, on Linux Python 2.7 64 bits can deal with these >> matrices easily. >> >> %timeit mat=np.random.random((6143,6143)); matinv= np.linalg.inv(mat); >> res = np.dot(mat, matinv); diff= res-np.eye(6143); print >> np.sum(np.abs(diff)) >> 2.41799631031e-05 >> 1.13955868701e-05 >> 3.64338191541e-05 >> 1.13484781021e-05 >> 1 loops, best of 3: 156 s per loop >> >> Intel i5, 4 GB of RAM and SSD. ATLAS installed from Fedora repository >> (I don't run heavy stuff on this computer). >> >> On 20 March 2013 14:46, Colin J. Williams wrote: >>> I have a small program which builds random matrices for increasing matrix >>> orders, inverts the matrix and checks the precision of the product. At some >>> point, one would expect operations to fail, when the memory capacity is >>> exceeded. In both Python 2.7 and 3.2 matrices of order 3,071 area handled, >>> but not 6,143. >>> >>> Using wall-clock times, with win32, Python 3.2 is slower than Python 2.7. >>> The profiler indicates a problem in the solver. >>> >>> Done on a Pentium, with 2.7 GHz processor, 2 GB of RAM and 221 GB of free >>> disk space. Both Python 3.2.3 and Python 2.7.3 use numpy 1.6.2. >>> >>> The results are show below. >>> >>> Colin W. 
>>> >>> aaaa_ssss >>> 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] >>> order= 2 measure ofimprecision= 0.097 Time elapsed (seconds)= >>> 0.004143 >>> order= 5 measure ofimprecision= 2.207 Time elapsed (seconds)= >>> 0.001514 >>> order= 11 measure ofimprecision= 2.372 Time elapsed (seconds)= >>> 0.001455 >>> order= 23 measure ofimprecision= 3.318 Time elapsed (seconds)= >>> 0.001608 >>> order= 47 measure ofimprecision= 4.257 Time elapsed (seconds)= >>> 0.002339 >>> order= 95 measure ofimprecision= 4.986 Time elapsed (seconds)= >>> 0.005747 >>> order= 191 measure ofimprecision= 5.788 Time elapsed (seconds)= >>> 0.029974 >>> order= 383 measure ofimprecision= 6.765 Time elapsed (seconds)= >>> 0.145339 >>> order= 767 measure ofimprecision= 7.909 Time elapsed (seconds)= >>> 0.841142 >>> order= 1535 measure ofimprecision= 8.532 Time elapsed (seconds)= >>> 5.793630 >>> order= 3071 measure ofimprecision= 9.774 Time elapsed (seconds)= >>> 39.559540 >>> order= 6143 Process terminated by a MemoryError >>> >>> Above: 2.7.3 Below: Python 3.2.3 >>> >>> bbb_bbb >>> 3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)] >>> order= 2 measure ofimprecision= 0.000 Time elapsed (seconds)= >>> 0.113930 >>> order= 5 measure ofimprecision= 1.807 Time elapsed (seconds)= >>> 0.001373 >>> order= 11 measure ofimprecision= 2.395 Time elapsed (seconds)= >>> 0.001468 >>> order= 23 measure ofimprecision= 3.073 Time elapsed (seconds)= >>> 0.001609 >>> order= 47 measure ofimprecision= 5.642 Time elapsed (seconds)= >>> 0.002687 >>> order= 95 measure ofimprecision= 5.745 Time elapsed (seconds)= >>> 0.013510 >>> order= 191 measure ofimprecision= 5.866 Time elapsed (seconds)= >>> 0.061560 >>> order= 383 measure ofimprecision= 7.129 Time elapsed (seconds)= >>> 0.418490 >>> order= 767 measure ofimprecision= 8.240 Time elapsed (seconds)= >>> 3.815713 >>> order= 1535 measure ofimprecision= 8.735 Time elapsed (seconds)= >>> 27.877270 >>> order= 3071 measure ofimprecision= 
9.996 Time elapsed >>> (seconds)=212.545610 >>> order= 6143 Process terminated by a MemoryError >>> >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From jenshnielsen at gmail.com Wed Mar 20 11:06:47 2013 From: jenshnielsen at gmail.com (Jens Nielsen) Date: Wed, 20 Mar 2013 15:06:47 +0000 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: <5149CF4D.6090906@gmail.com> References: <5149BDBB.6060509@ncf.ca> <5149CF4D.6090906@gmail.com> Message-ID: The python3 version is compiled without any optimised library and is falling back on a slow version. Where did you get this installation from? Jens On Wed, Mar 20, 2013 at 3:01 PM, Colin J. Williams wrote: > On 20/03/2013 10:30 AM, Fr?d?ric Bastien wrote: > > Hi, > > > > win32 do not mean it is a 32 bits windows. sys.platform always return > > win32 on 32bits and 64 bits windows even for python 64 bits. > > > > But that is a good question, is your python 32 or 64 bits? > 32 bits. > > Colin W. > > > > Fred > > > > On Wed, Mar 20, 2013 at 10:14 AM, Da?id wrote: > >> Without much detailed knowledge of the topic, I would expect both > >> versions to give very similar timing, as it is essentially a call to > >> ATLAS function, not much is done in Python. > >> > >> Given this, maybe the difference is in ATLAS itself. How have you > >> installed it? 
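As a quick diagnostic for both questions raised in this thread (which BLAS each numpy build is linked against, and whether the interpreter is 32- or 64-bit), everything can be checked from within Python; a minimal sketch:

```python
import sys
import numpy as np

print(sys.version)                      # which interpreter is running
print('64-bit:', sys.maxsize > 2**32)   # True only on a 64-bit Python
np.show_config()                        # BLAS/LAPACK libraries numpy was built against
```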
When you compile ATLAS, it will do some machine-specific > >> optimisation, but if you have installed a binary chances are that your > >> version is optimised for a machine quite different from yours. So, two > >> different installations could have been compiled in different machines > >> and so one is more suited for your machine. If you want to be sure, I > >> would try to compile ATLAS (this may be difficult) or check the same > >> on a very different machine (like an AMD processor, different > >> architecture...). > >> > >> > >> > >> Just for reference, on Linux Python 2.7 64 bits can deal with these > >> matrices easily. > >> > >> %timeit mat=np.random.random((6143,6143)); matinv= np.linalg.inv(mat); > >> res = np.dot(mat, matinv); diff= res-np.eye(6143); print > >> np.sum(np.abs(diff)) > >> 2.41799631031e-05 > >> 1.13955868701e-05 > >> 3.64338191541e-05 > >> 1.13484781021e-05 > >> 1 loops, best of 3: 156 s per loop > >> > >> Intel i5, 4 GB of RAM and SSD. ATLAS installed from Fedora repository > >> (I don't run heavy stuff on this computer). > >> > >> On 20 March 2013 14:46, Colin J. Williams wrote: > >>> I have a small program which builds random matrices for increasing > matrix > >>> orders, inverts the matrix and checks the precision of the product. > At some > >>> point, one would expect operations to fail, when the memory capacity is > >>> exceeded. In both Python 2.7 and 3.2 matrices of order 3,071 area > handled, > >>> but not 6,143. > >>> > >>> Using wall-clock times, with win32, Python 3.2 is slower than Python > 2.7. > >>> The profiler indicates a problem in the solver. > >>> > >>> Done on a Pentium, with 2.7 GHz processor, 2 GB of RAM and 221 GB of > free > >>> disk space. Both Python 3.2.3 and Python 2.7.3 use numpy 1.6.2. > >>> > >>> The results are show below. > >>> > >>> Colin W. 
> >>> > >>> aaaa_ssss > >>> 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] > >>> order= 2 measure ofimprecision= 0.097 Time elapsed (seconds)= > >>> 0.004143 > >>> order= 5 measure ofimprecision= 2.207 Time elapsed (seconds)= > >>> 0.001514 > >>> order= 11 measure ofimprecision= 2.372 Time elapsed (seconds)= > >>> 0.001455 > >>> order= 23 measure ofimprecision= 3.318 Time elapsed (seconds)= > >>> 0.001608 > >>> order= 47 measure ofimprecision= 4.257 Time elapsed (seconds)= > >>> 0.002339 > >>> order= 95 measure ofimprecision= 4.986 Time elapsed (seconds)= > >>> 0.005747 > >>> order= 191 measure ofimprecision= 5.788 Time elapsed (seconds)= > >>> 0.029974 > >>> order= 383 measure ofimprecision= 6.765 Time elapsed (seconds)= > >>> 0.145339 > >>> order= 767 measure ofimprecision= 7.909 Time elapsed (seconds)= > >>> 0.841142 > >>> order= 1535 measure ofimprecision= 8.532 Time elapsed (seconds)= > >>> 5.793630 > >>> order= 3071 measure ofimprecision= 9.774 Time elapsed (seconds)= > >>> 39.559540 > >>> order= 6143 Process terminated by a MemoryError > >>> > >>> Above: 2.7.3 Below: Python 3.2.3 > >>> > >>> bbb_bbb > >>> 3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)] > >>> order= 2 measure ofimprecision= 0.000 Time elapsed (seconds)= > >>> 0.113930 > >>> order= 5 measure ofimprecision= 1.807 Time elapsed (seconds)= > >>> 0.001373 > >>> order= 11 measure ofimprecision= 2.395 Time elapsed (seconds)= > >>> 0.001468 > >>> order= 23 measure ofimprecision= 3.073 Time elapsed (seconds)= > >>> 0.001609 > >>> order= 47 measure ofimprecision= 5.642 Time elapsed (seconds)= > >>> 0.002687 > >>> order= 95 measure ofimprecision= 5.745 Time elapsed (seconds)= > >>> 0.013510 > >>> order= 191 measure ofimprecision= 5.866 Time elapsed (seconds)= > >>> 0.061560 > >>> order= 383 measure ofimprecision= 7.129 Time elapsed (seconds)= > >>> 0.418490 > >>> order= 767 measure ofimprecision= 8.240 Time elapsed (seconds)= > >>> 3.815713 > >>> order= 1535 measure 
ofimprecision= 8.735 Time elapsed (seconds)= > >>> 27.877270 > >>> order= 3071 measure ofimprecision= 9.996 Time elapsed > >>> (seconds)=212.545610 > >>> order= 6143 Process terminated by a MemoryError > >>> > >>> > >>> > >>> _______________________________________________ > >>> NumPy-Discussion mailing list > >>> NumPy-Discussion at scipy.org > >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >>> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaakko.luttinen at aalto.fi Wed Mar 20 11:10:02 2013 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Wed, 20 Mar 2013 17:10:02 +0200 Subject: [Numpy-discussion] Dot/inner products with broadcasting? In-Reply-To: <5149BABE.1080306@aalto.fi> References: <51408B59.8090504@aalto.fi> <5141BA5E.2020704@aalto.fi> <5149BABE.1080306@aalto.fi> Message-ID: <5149D14A.4020402@aalto.fi> Well, thanks to seberg, I finally noticed that there is a dot product function in this new module numpy.core.gufuncs_linalg, it was just named differently (matrix_multiply instead of dot). 
However, I may have found a bug in it: import numpy.core.gufuncs_linalg as gula A = np.arange(2*2).reshape((2,2)) B = np.arange(2*1).reshape((2,1)) gula.matrix_multiply(A, B) ---- ValueError: On entry to DGEMM parameter number 10 had an illegal value -Jaakko On 03/20/2013 03:33 PM, Jaakko Luttinen wrote: > I tried using this inner1d as an alternative to dot because it uses > broadcasting. However, I found something surprising: Not only is inner1d > much much slower than dot, it is also slower than einsum which is much > more general: > > In [68]: import numpy as np > > In [69]: import numpy.core.gufuncs_linalg as gula > > In [70]: K = np.random.randn(1000,1000) > > In [71]: %timeit gula.inner1d(K[:,np.newaxis,:], > np.swapaxes(K,-1,-2)[np.newaxis,:,:]) > 1 loops, best of 3: 6.05 s per loop > > In [72]: %timeit np.dot(K,K) > 1 loops, best of 3: 392 ms per loop > > In [73]: %timeit np.einsum('ik,kj->ij', K, K) > 1 loops, best of 3: 1.24 s per loop > > Why is it so? I thought that the performance of inner1d would be > somewhere in between dot and einsum, probably closer to dot. Now I don't > see any reason to use inner1d instead of einsum.. > > -Jaakko > > On 03/15/2013 04:22 PM, Oscar Villellas wrote: >> In fact, there is already an inner1d implemented in >> numpy.core.umath_tests.inner1d >> >> from numpy.core.umath_tests import inner1d >> >> It should do the trick :) >> >> On Thu, Mar 14, 2013 at 12:54 PM, Jaakko Luttinen >> wrote: >>> Answering to myself, this pull request seems to implement an inner >>> product with broadcasting (inner1d) and many other useful functions: >>> https://github.com/numpy/numpy/pull/2954/ >>> -J >>> >>> On 03/13/2013 04:21 PM, Jaakko Luttinen wrote: >>>> Hi! >>>> >>>> How can I compute dot product (or similar multiply&sum operations) >>>> efficiently so that broadcasting is utilized? 
>>>> For multi-dimensional arrays, NumPy's inner and dot functions do not >>>> match the leading axes and use broadcasting, but instead the result has >>>> first the leading axes of the first input array and then the leading >>>> axes of the second input array. >>>> >>>> For instance, I would like to compute the following inner-product: >>>> np.sum(A*B, axis=-1) >>>> >>>> But numpy.inner gives: >>>> A = np.random.randn(2,3,4) >>>> B = np.random.randn(3,4) >>>> np.inner(A,B).shape >>>> # -> (2, 3, 3) instead of (2, 3) >>>> >>>> Similarly for dot product, I would like to compute for instance: >>>> np.sum(A[...,:,:,np.newaxis]*B[...,np.newaxis,:,:], axis=-2) >>>> >>>> But numpy.dot gives: >>>> In [12]: A = np.random.randn(2,3,4); B = np.random.randn(2,4,5) >>>> In [13]: np.dot(A,B).shape >>>> # -> (2, 3, 2, 5) instead of (2, 3, 5) >>>> >>>> I could use einsum for these operations, but I'm not sure whether that's >>>> as efficient as using some BLAS-supported(?) dot products. >>>> >>>> I couldn't find any function which could perform this kind of >>>> operations. NumPy's functions seem to either flatten the input arrays >>>> (vdot, outer) or just use the axes of the input arrays separately (dot, >>>> inner, tensordot). >>>> >>>> Any help? 
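The broadcasting behaviour asked for above can be written with einsum's ellipsis notation — a sketch (whether it matches the speed of BLAS-backed dot is exactly the open question in this thread; later NumPy releases also added np.matmul for the stacked matrix case):

```python
import numpy as np

A = np.random.randn(2, 3, 4)
B = np.random.randn(3, 4)

# Broadcasting inner product over the last axis:
# equivalent to np.sum(A * B, axis=-1), result shape (2, 3)
inner = np.einsum('...i,...i->...', A, B)

A2 = np.random.randn(2, 3, 4)
B2 = np.random.randn(2, 4, 5)

# Matrix product broadcast over the leading axis:
# one dot per leading index, result shape (2, 3, 5)
prod = np.einsum('...ik,...kj->...ij', A2, B2)
```

The `...` in the subscripts stands for any leading axes, which einsum broadcasts against each other like an ordinary elementwise operation.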
>>>> >>>> Best regards, >>>> Jaakko >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion From nouiz at nouiz.org Wed Mar 20 11:12:17 2013 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Wed, 20 Mar 2013 11:12:17 -0400 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: <5149CF4D.6090906@gmail.com> References: <5149BDBB.6060509@ncf.ca> <5149CF4D.6090906@gmail.com> Message-ID: On Wed, Mar 20, 2013 at 11:01 AM, Colin J. Williams wrote: > On 20/03/2013 10:30 AM, Frédéric Bastien wrote: >> >> Hi, >> >> win32 does not mean it is a 32-bit Windows. sys.platform always returns >> win32 on 32-bit and 64-bit Windows, even for 64-bit Python. >> >> But that is a good question: is your Python 32 or 64 bits? > > 32 bits. That explains why you run into memory problems while other people with 64-bit versions do not. So if you want to work with bigger inputs, switch to a 64-bit Python. Fred From cjwilliams43 at gmail.com Wed Mar 20 11:16:05 2013 From: cjwilliams43 at gmail.com (Colin J. 
Williams) Date: Wed, 20 Mar 2013 11:16:05 -0400 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: References: <5149BDBB.6060509@ncf.ca> <5149CF4D.6090906@gmail.com> Message-ID: <5149D2B5.8000907@gmail.com> An HTML attachment was scrubbed... URL: From cjwilliams43 at gmail.com Wed Mar 20 11:18:23 2013 From: cjwilliams43 at gmail.com (Colin J. Williams) Date: Wed, 20 Mar 2013 11:18:23 -0400 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: References: <5149BDBB.6060509@ncf.ca> <5149CF4D.6090906@gmail.com> Message-ID: <5149D33F.9080907@gmail.com> An HTML attachment was scrubbed... URL: From lists at hilboll.de Wed Mar 20 11:31:04 2013 From: lists at hilboll.de (Andreas Hilboll) Date: Wed, 20 Mar 2013 16:31:04 +0100 Subject: [Numpy-discussion] how to efficiently select multiple slices from an array? Message-ID: <5149D638.9050000@hilboll.de> Cross-posting a question I asked on SO (http://stackoverflow.com/q/15527666/152439): Given an array d = np.random.randn(100) and an index array i = np.random.random_integers(low=3, high=d.size - 5, size=20) how can I efficiently create a 2d array r with r.shape = (20, 8) such that for all j=0..19, r[j] = d[i[j]-3:i[j]+5] In my case, the arrays are quite large (~200000 instead of 100 and 20), so something quick would be useful. Cheers, Andreas. From sebastian at sipsolutions.net Wed Mar 20 11:43:17 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 20 Mar 2013 16:43:17 +0100 Subject: [Numpy-discussion] how to efficiently select multiple slices from an array? 
In-Reply-To: <5149D638.9050000@hilboll.de> References: <5149D638.9050000@hilboll.de> Message-ID: <1363794197.22391.9.camel@sebastian-laptop> Hey, On Wed, 2013-03-20 at 16:31 +0100, Andreas Hilboll wrote: > Cross-posting a question I asked on SO > (http://stackoverflow.com/q/15527666/152439): > > > Given an array > > d = np.random.randn(100) > > and an index array > > i = np.random.random_integers(low=3, high=d.size - 5, size=20) > > how can I efficiently create a 2d array r with > > r.shape = (20, 8) > > such that for all j=0..19, > > r[j] = d[i[j]-3:i[j]+5] > > In my case, the arrays are quite large (~200000 instead of 100 and 20), > so something quick would be useful. You can use stride tricks, it's simple to do by hand, but since I got it, maybe just use this: https://gist.github.com/seberg/3866040 d = np.random.randn(100) windowed_d = rolling_window(d, 8) i = np.random.randint(0, len(windowed_d), 20) r = windowed_d[i,:] Or use stride_tricks by hand, with: windowed_d = np.lib.stride_tricks.as_strided(d, (d.shape[0]-7, 8), (d.strides[0],)*2) The fancy indexing will create a copy, so while windowed_d views the same data as the original array, that is of course not the case for the end result. Regards, Sebastian > > Cheers, Andreas. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From robert.kern at gmail.com Wed Mar 20 12:03:36 2013 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 20 Mar 2013 16:03:36 +0000 Subject: [Numpy-discussion] numpy array to C API In-Reply-To: <5149C0C5.6050801@syntonetic.com> References: <5149C0C5.6050801@syntonetic.com> Message-ID: On Wed, Mar 20, 2013 at 1:59 PM, Søren wrote: > Greetings > > I'm extending our existing C/C++ software with Python/Numpy in order to do extra number crunching. > It already works like a charm calling python with the C API. 
> > But what is the proper way of passing double arrays returned from Python/Numpy routines back to C? > > I came across PyArray but I can see in the compiler warnings, it is deprecated and I don't wanna start from scratch on legacy facilities. What is this `PyArray` that you are referring to? There is nothing named just `PyArray` to my knowledge. Do you mean direct access to the `data` member of the PyArrayObject struct? Yes, that is deprecated. Use the PyArray_DATA() macro to get a `void*` pointer to the start of the data. http://docs.scipy.org/doc/numpy/reference/c-api.array.html#PyArray_DATA -- Robert Kern From warren.weckesser at gmail.com Wed Mar 20 13:11:10 2013 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Wed, 20 Mar 2013 13:11:10 -0400 Subject: [Numpy-discussion] Add ability to disable the autogeneration of the function signature in a ufunc docstring. In-Reply-To: References: Message-ID: On Fri, Mar 15, 2013 at 4:39 PM, Nathaniel Smith wrote: > On Fri, Mar 15, 2013 at 6:47 PM, Warren Weckesser > wrote: > > Hi all, > > > > In a recent scipy pull request (https://github.com/scipy/scipy/pull/459), > I > > ran into the problem of ufuncs automatically generating a signature in > the > > docstring using arguments such as 'x' or 'x1, x2'. scipy.special has a > lot > > of ufuncs, and for most of them, there are much more descriptive or > > conventional argument names than 'x'. For now, we will include a nicer > > signature in the added docstring, and grudgingly put up with the one > > generated by the ufunc. In the long term, it would be nice to be able to > > disable the automatic generation of the signature. I submitted a pull > > request to numpy to allow that: https://github.com/numpy/numpy/pull/3149 > > > > Comments on the pull request would be appreciated. > > The functionality seems obviously useful, but adding a magic public > attribute to all ufuncs seems like a somewhat clumsy way to expose it? > Esp. 
since ufuncs are always created through the C API, including > docstring specification, but this can only be set at the Python level? > Maybe it's the best option but it seems worth taking a few minutes to > consider alternatives. > Agreed; exposing the flag as part of the public Python ufunc API is unnecessary, since this is something that would rarely, if ever, be changed during the life of the ufunc. > Brainstorming: > > - If the first line of the docstring starts with "(" and > ends with ")", then that's a signature and we skip adding one (I think > sphinx does something like this?) Kinda magic and implicit, but highly > backwards compatible. > > - Declare that henceforth, the signature generation will be disabled > by default, and go through and add a special marker like > "__SIGNATURE__" to all the existing ufunc docstrings, which gets > replaced (if present) by the automagically generated signature. > > - Give ufunc arguments actual names in general, that work for things > like kwargs, and then use those in the automagically generated > signature. This is the most work, but it would mean that people don't > have to remember to update their non-magic signatures whenever numpy > adds a new feature like out= or where=, and would make the docstrings > actually accurate, which right now they aren't: > > I'm leaning towards this option. I don't know if there would still be a need to disable the automatic generation of the docstring if it was good enough. In [7]: np.add.__doc__.split("\n")[0] > Out[7]: 'add(x1, x2[, out])' > > In [8]: np.add(x1=1, x2=2) > ValueError: invalid number of arguments > > - Allow some special syntax to describe the argument names in the > docstring: "__ARGNAMES__: a b\n" -> "add(a, b[, out])" > > - Something else... 
> > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oscar.villellas at continuum.io Wed Mar 20 13:14:41 2013 From: oscar.villellas at continuum.io (Oscar Villellas) Date: Wed, 20 Mar 2013 18:14:41 +0100 Subject: [Numpy-discussion] Dot/inner products with broadcasting? In-Reply-To: <5149D14A.4020402@aalto.fi> References: <51408B59.8090504@aalto.fi> <5141BA5E.2020704@aalto.fi> <5149BABE.1080306@aalto.fi> <5149D14A.4020402@aalto.fi> Message-ID: Reproduced it. I will take a look at it. That error comes direct from BLAS and shouldn't be happening. I will also look why inner1d is not performing well. Note: inner1d is implemented with calls to BLAS (dot). I will get back to you later :) On Wed, Mar 20, 2013 at 4:10 PM, Jaakko Luttinen wrote: > Well, thanks to seberg, I finally noticed that there is a dot product > function in this new module numpy.core.gufuncs_linalg, it was just named > differently (matrix_multiply instead of dot). > > However, I may have found a bug in it: > > import numpy.core.gufuncs_linalg as gula > A = np.arange(2*2).reshape((2,2)) > B = np.arange(2*1).reshape((2,1)) > gula.matrix_multiply(A, B) > ---- > ValueError: On entry to DGEMM parameter number 10 had an illegal value > > -Jaakko > > On 03/20/2013 03:33 PM, Jaakko Luttinen wrote: >> I tried using this inner1d as an alternative to dot because it uses >> broadcasting. 
However, I found something surprising: Not only is inner1d >> much much slower than dot, it is also slower than einsum which is much >> more general: >> >> In [68]: import numpy as np >> >> In [69]: import numpy.core.gufuncs_linalg as gula >> >> In [70]: K = np.random.randn(1000,1000) >> >> In [71]: %timeit gula.inner1d(K[:,np.newaxis,:], >> np.swapaxes(K,-1,-2)[np.newaxis,:,:]) >> 1 loops, best of 3: 6.05 s per loop >> >> In [72]: %timeit np.dot(K,K) >> 1 loops, best of 3: 392 ms per loop >> >> In [73]: %timeit np.einsum('ik,kj->ij', K, K) >> 1 loops, best of 3: 1.24 s per loop >> >> Why is it so? I thought that the performance of inner1d would be >> somewhere in between dot and einsum, probably closer to dot. Now I don't >> see any reason to use inner1d instead of einsum.. >> >> -Jaakko >> >> On 03/15/2013 04:22 PM, Oscar Villellas wrote: >>> In fact, there is already an inner1d implemented in >>> numpy.core.umath_tests.inner1d >>> >>> from numpy.core.umath_tests import inner1d >>> >>> It should do the trick :) >>> >>> On Thu, Mar 14, 2013 at 12:54 PM, Jaakko Luttinen >>> wrote: >>>> Answering to myself, this pull request seems to implement an inner >>>> product with broadcasting (inner1d) and many other useful functions: >>>> https://github.com/numpy/numpy/pull/2954/ >>>> -J >>>> >>>> On 03/13/2013 04:21 PM, Jaakko Luttinen wrote: >>>>> Hi! >>>>> >>>>> How can I compute dot product (or similar multiply&sum operations) >>>>> efficiently so that broadcasting is utilized? >>>>> For multi-dimensional arrays, NumPy's inner and dot functions do not >>>>> match the leading axes and use broadcasting, but instead the result has >>>>> first the leading axes of the first input array and then the leading >>>>> axes of the second input array. 
>>>>> >>>>> For instance, I would like to compute the following inner-product: >>>>> np.sum(A*B, axis=-1) >>>>> >>>>> But numpy.inner gives: >>>>> A = np.random.randn(2,3,4) >>>>> B = np.random.randn(3,4) >>>>> np.inner(A,B).shape >>>>> # -> (2, 3, 3) instead of (2, 3) >>>>> >>>>> Similarly for dot product, I would like to compute for instance: >>>>> np.sum(A[...,:,:,np.newaxis]*B[...,np.newaxis,:,:], axis=-2) >>>>> >>>>> But numpy.dot gives: >>>>> In [12]: A = np.random.randn(2,3,4); B = np.random.randn(2,4,5) >>>>> In [13]: np.dot(A,B).shape >>>>> # -> (2, 3, 2, 5) instead of (2, 3, 5) >>>>> >>>>> I could use einsum for these operations, but I'm not sure whether that's >>>>> as efficient as using some BLAS-supported(?) dot products. >>>>> >>>>> I couldn't find any function which could perform this kind of >>>>> operations. NumPy's functions seem to either flatten the input arrays >>>>> (vdot, outer) or just use the axes of the input arrays separately (dot, >>>>> inner, tensordot). >>>>> >>>>> Any help? 
>>>>> >>>>> Best regards, >>>>> Jaakko >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From njs at pobox.com Wed Mar 20 13:16:30 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 20 Mar 2013 17:16:30 +0000 Subject: [Numpy-discussion] Add ability to disable the autogeneration of the function signature in a ufunc docstring. In-Reply-To: References: Message-ID: On 20 Mar 2013 17:11, "Warren Weckesser" wrote: > > > > On Fri, Mar 15, 2013 at 4:39 PM, Nathaniel Smith wrote: >> >> On Fri, Mar 15, 2013 at 6:47 PM, Warren Weckesser >> wrote: >> > Hi all, >> > >> > In a recent scipy pull request (https://github.com/scipy/scipy/pull/459), I >> > ran into the problem of ufuncs automatically generating a signature in the >> > docstring using arguments such as 'x' or 'x1, x2'. scipy.special has a lot >> > of ufuncs, and for most of them, there are much more descriptive or >> > conventional argument names than 'x'. For now, we will include a nicer >> > signature in the added docstring, and grudgingly put up with the one >> > generated by the ufunc. 
In the long term, it would be nice to be able to >> > disable the automatic generation of the signature. I submitted a pull >> > request to numpy to allow that: https://github.com/numpy/numpy/pull/3149 >> > >> > Comments on the pull request would be appreciated. >> >> The functionality seems obviously useful, but adding a magic public >> attribute to all ufuncs seems like a somewhat clumsy way to expose it? >> Esp. since ufuncs are always created through the C API, including >> docstring specification, but this can only be set at the Python level? >> Maybe it's the best option but it seems worth taking a few minutes to >> consider alternatives. > > > > Agreed; exposing the flag as part of the public Python ufunc API is unnecessary, since this is something that would rarely, if ever, be changed during the life of the ufunc. > > >> >> Brainstorming: >> >> - If the first line of the docstring starts with "(" and >> ends with ")", then that's a signature and we skip adding one (I think >> sphinx does something like this?) Kinda magic and implicit, but highly >> backwards compatible. >> >> - Declare that henceforth, the signature generation will be disabled >> by default, and go through and add a special marker like >> "__SIGNATURE__" to all the existing ufunc docstrings, which gets >> replaced (if present) by the automagically generated signature. >> >> - Give ufunc arguments actual names in general, that work for things >> like kwargs, and then use those in the automagically generated >> signature. This is the most work, but it would mean that people don't >> have to remember to update their non-magic signatures whenever numpy >> adds a new feature like out= or where=, and would make the docstrings >> actually accurate, which right now they aren't: >> > > I'm leaning towards this option. I don't know if there would still be a need to disable the automatic generation of the docstring if it was good enough. 
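The docstring-heuristic option from the brainstorm above — skip the auto-generated signature when the first docstring line already reads like `name(...)` — can be sketched in a few lines (the helper name and exact rule here are illustrative, not what the pull request implements):

```python
def looks_like_signature(doc, name):
    """Heuristic sketch: does the first docstring line read as 'name(...)'?"""
    if not doc:
        return False
    first = doc.lstrip().split('\n', 1)[0].strip()
    return first.startswith(name + '(') and first.endswith(')')

print(looks_like_signature('add(x1, x2[, out])\n\nAdd arguments elementwise.', 'add'))  # True
print(looks_like_signature('Add arguments elementwise.', 'add'))                        # False
```

As noted in the thread, this kind of implicit detection is backwards compatible but a little magic: any docstring that happens to open with a parenthesised line would suppress the generated signature.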
Certainly it would be nice for ufunc argument handling to better match python argument handling! Just needs someone willing to do the work... *cough* ;-) -n >> In [7]: np.add.__doc__.split("\n")[0] >> Out[7]: 'add(x1, x2[, out])' >> >> In [8]: np.add(x1=1, x2=2) >> ValueError: invalid number of arguments >> >> - Allow some special syntax to describe the argument names in the >> docstring: "__ARGNAMES__: a b\n" -> "add(a, b[, out])" >> >> - Something else... >> >> -n >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at hilboll.de Wed Mar 20 13:59:22 2013 From: lists at hilboll.de (Andreas Hilboll) Date: Wed, 20 Mar 2013 18:59:22 +0100 Subject: [Numpy-discussion] how to efficiently select multiple slices from an array? In-Reply-To: <1363794197.22391.9.camel@sebastian-laptop> References: <5149D638.9050000@hilboll.de> <1363794197.22391.9.camel@sebastian-laptop> Message-ID: <5149F8FA.6040707@hilboll.de> > Hey, > > On Wed, 2013-03-20 at 16:31 +0100, Andreas Hilboll wrote: >> Cross-posting a question I asked on SO >> (http://stackoverflow.com/q/15527666/152439): >> >> >> Given an array >> >> d = np.random.randn(100) >> >> and an index array >> >> i = np.random.random_integers(low=3, high=d.size - 5, size=20) >> >> how can I efficiently create a 2d array r with >> >> r.shape = (20, 8) >> >> such that for all j=0..19, >> >> r[j] = d[i[j]-3:i[j]+5] >> >> In my case, the arrays are quite large (~200000 instead of 100 and 20), >> so something quick would be useful. 
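The stride-trick approach suggested earlier in the thread, written out end to end (a sketch: `randint` stands in for `random_integers`, and the row for index `i[j]` is the window starting at `i[j] - 3`):

```python
import numpy as np

d = np.random.randn(100)
i = np.random.randint(3, d.size - 4, size=20)  # stand-in for random_integers(low=3, high=d.size - 5)

# Zero-copy view of every length-8 window of d: row k is d[k:k+8]
windowed_d = np.lib.stride_tricks.as_strided(
    d, shape=(d.size - 7, 8), strides=(d.strides[0],) * 2)

# One fancy index picks (and copies) the 20 requested windows
r = windowed_d[i - 3]

# Now r[j] == d[i[j]-3 : i[j]+5] for every j, and r.shape == (20, 8)
```

Only the final fancy-indexing step copies data, so the cost is proportional to the output size rather than to building each slice separately.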
> > > You can use stride tricks, it's simple to do by hand, but since I got it, > maybe just use this: https://gist.github.com/seberg/3866040 > > d = np.random.randn(100) > windowed_d = rolling_window(d, 8) > i = np.random.randint(0, len(windowed_d), 20) > r = windowed_d[i,:] > > Or use stride_tricks by hand, with: > windowed_d = np.lib.stride_tricks.as_strided(d, (d.shape[0]-7, 8), > (d.strides[0],)*2) > > The fancy indexing will create a copy, so while windowed_d views the > same data as the original array, that is of course not the case for the > end result. > > Regards, > > Sebastian > >> >> Cheers, Andreas. >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion cool, thanks! From chris.barker at noaa.gov Wed Mar 20 14:25:20 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Wed, 20 Mar 2013 11:25:20 -0700 Subject: [Numpy-discussion] numpy array to C API In-Reply-To: References: <5149C0C5.6050801@syntonetic.com> Message-ID: On Wed, Mar 20, 2013 at 9:03 AM, Robert Kern wrote: I highly recommend using an existing tool to write this interface, to take care of the reference counting, etc. for you. Cython is particularly nice. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From sd at syntonetic.com Thu Mar 21 04:11:35 2013 From: sd at syntonetic.com (=?UTF-8?B?U8O4cmVu?=) Date: Thu, 21 Mar 2013 09:11:35 +0100 Subject: [Numpy-discussion] numpy array to C API In-Reply-To: References: <5149C0C5.6050801@syntonetic.com> Message-ID: <514AC0B7.6020500@syntonetic.com> Thanks Robert, for making that clear. I got a deprecated warning the second I added #include and I got scared off too fast in my exploring phase. Cheers Søren On 20/03/2013 17:03, Robert Kern wrote: > On Wed, Mar 20, 2013 at 1:59 PM, Søren wrote: >> Greetings >> >> I'm extending our existing C/C++ software with Python/Numpy in order to do extra number crunching. >> It already works like a charm calling python with the C API. >> >> But what is the proper way of passing double arrays returned from Python/Numpy routines back to C? >> >> I came across PyArray but I can see in the compiler warnings, it is deprecated and I don't wanna start from scratch on legacy facilities. > What is this `PyArray` that you are referring to? There is nothing > named just `PyArray` to my knowledge. Do you mean direct access to the > `data` member of the PyArrayObject struct? Yes, that is deprecated. > Use the PyArray_DATA() macro to get a `void*` pointer to the start of > the data. 
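For getting a feel from the Python side for what PyArray_DATA() hands back, the same raw data pointer is reachable through the array's ctypes interface — a rough sketch for exploration, not a replacement for the C-API call:

```python
import ctypes
import numpy as np

a = np.arange(4, dtype=np.float64)

# a.ctypes.data_as(...) wraps the same address PyArray_DATA(arr) returns in C
ptr = a.ctypes.data_as(ctypes.POINTER(ctypes.c_double))

ptr[0] = 42.0        # a write through the raw pointer...
print(a[0])          # ...is visible in the array itself: 42.0
```

This only works as shown for a contiguous array of the matching dtype; in C, PyArray_DATA() comes with the same caveat, which is why conversion helpers like PyArray_FROM_OTF are usually used first.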
> > http://docs.scipy.org/doc/numpy/reference/c-api.array.html#PyArray_DATA > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From valentin at haenel.co Thu Mar 21 04:45:21 2013 From: valentin at haenel.co (Valentin Haenel) Date: Thu, 21 Mar 2013 09:45:21 +0100 Subject: [Numpy-discussion] numpy array to C API In-Reply-To: <514AC0B7.6020500@syntonetic.com> References: <5149C0C5.6050801@syntonetic.com> <514AC0B7.6020500@syntonetic.com> Message-ID: <20130321084521.GG7842@kudu.in-berlin.de> Dear S?ren, if you are new to interfacing python/numpy with C/C++, you may want to check out: http://scipy-lectures.github.com/advanced/interfacing_with_c/interfacing_with_c.html Disclaimer: I am the author of this chapter, so this response is a bit of a shameless plug :D Hope it helps none the less. V- * S?ren [2013-03-21]: > Thanks Robert, for making that clear. > > I got a deprecated warning the second I added > #include > and I got scared off too fast in my exploring phase. > > Cheers > S?ren > > On 20/03/2013 17:03, Robert Kern wrote: > > On Wed, Mar 20, 2013 at 1:59 PM, S?ren wrote: > >> Greetings > >> > >> I'm extending our existing C/C++ software with Python/Numpy in order to do extra number crunching. > >> It already works like a charm calling python with the C API . > >> > >> But what is the proper way of passing double arrays returned from Python/Numpy routines back to C? > >> > >> I came across PyArray but I can see in the compiler warnings, it is deprecated and I don't wanna start from scratch on legacy facilities. > > What is this `PyArray` that you are referring to? There is nothing > > named just `PyArray` to my knowledge. Do you mean direct access to the > > `data` member of the PyArrayObject struct? Yes, that is deprecated. > > Use the PyArray_DATA() macro to get a `void*` pointer to the start of > > the data. 
> > > > http://docs.scipy.org/doc/numpy/reference/c-api.array.html#PyArray_DATA > > > > -- > > Robert Kern > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From daniele at grinta.net Thu Mar 21 05:04:39 2013 From: daniele at grinta.net (Daniele Nicolodi) Date: Thu, 21 Mar 2013 10:04:39 +0100 Subject: [Numpy-discussion] numpy array to C API In-Reply-To: <20130321084521.GG7842@kudu.in-berlin.de> References: <5149C0C5.6050801@syntonetic.com> <514AC0B7.6020500@syntonetic.com> <20130321084521.GG7842@kudu.in-berlin.de> Message-ID: <514ACD27.3060504@grinta.net> On 21/03/2013 09:45, Valentin Haenel wrote: > if you are new to interfacing python/numpy with C/C++, you may want to > check out: > > http://scipy-lectures.github.com/advanced/interfacing_with_c/interfacing_with_c.html > > Disclaimer: I am the author of this chapter, so this response is a bit > of a shameless plug :D Hello Valentin, I had a quick look at the chapter. It looks good! Thanks for sharing it. However I have a small comment on the way you implement the Cython-Numpy solution. I would have written the loop over the array element in Cython itself rather than in a separately compiled C function. This would have the advantage of presenting more capabilities of Cython and would slightly decrease the complexity of the solution (one source file instead of two). 
Cheers, Daniele From valentin at haenel.co Thu Mar 21 05:16:50 2013 From: valentin at haenel.co (Valentin Haenel) Date: Thu, 21 Mar 2013 10:16:50 +0100 Subject: [Numpy-discussion] numpy array to C API In-Reply-To: <514ACD27.3060504@grinta.net> References: <5149C0C5.6050801@syntonetic.com> <514AC0B7.6020500@syntonetic.com> <20130321084521.GG7842@kudu.in-berlin.de> <514ACD27.3060504@grinta.net> Message-ID: <20130321091649.GA12061@kudu.in-berlin.de> Dear Daniele * Daniele Nicolodi [2013-03-21]: > On 21/03/2013 09:45, Valentin Haenel wrote: > > if you are new to interfacing python/numpy with C/C++, you may want to > > check out: > > > > http://scipy-lectures.github.com/advanced/interfacing_with_c/interfacing_with_c.html > > > > Disclaimer: I am the author of this chapter, so this response is a bit > > of a shameless plug :D > > Hello Valentin, > > I had a quick look at the chapter. It looks good! Thanks for sharing it. > > However I have a small comment on the way you implement the Cython-Numpy > solution. I would have written the loop over the array element in Cython > itself rather than in a separately compiled C function. This would have > the advantage of presenting more capabilities of Cython and would > slightly decrease the complexity of the solution (one source file > instead of two). Thanks very much for your feedback! Since the chapter in under a CC licence, you are welcome to submit your proposal as a Pull-Request. :D The reason why I wrote the loop in C is so that the cython example synergieses with the others. The ideas is, that you have an already existing code-base that has a function which has such a signature. Ideally, your proposal would be an improvement, where the original example stays in place and you develop the improvement including reasons as to why it is better. 
;) best V- From daniele at grinta.net Thu Mar 21 06:14:16 2013 From: daniele at grinta.net (Daniele Nicolodi) Date: Thu, 21 Mar 2013 11:14:16 +0100 Subject: [Numpy-discussion] numpy array to C API In-Reply-To: <20130321091649.GA12061@kudu.in-berlin.de> References: <5149C0C5.6050801@syntonetic.com> <514AC0B7.6020500@syntonetic.com> <20130321084521.GG7842@kudu.in-berlin.de> <514ACD27.3060504@grinta.net> <20130321091649.GA12061@kudu.in-berlin.de> Message-ID: <514ADD78.4010604@grinta.net> On 21/03/2013 10:16, Valentin Haenel wrote: > Dear Daniele > > * Daniele Nicolodi [2013-03-21]: >> On 21/03/2013 09:45, Valentin Haenel wrote: >>> if you are new to interfacing python/numpy with C/C++, you may want to >>> check out: >>> >>> http://scipy-lectures.github.com/advanced/interfacing_with_c/interfacing_with_c.html >>> >>> Disclaimer: I am the author of this chapter, so this response is a bit >>> of a shameless plug :D >> >> Hello Valentin, >> >> I had a quick look at the chapter. It looks good! Thanks for sharing it. >> >> However I have a small comment on the way you implement the Cython-Numpy >> solution. I would have written the loop over the array element in Cython >> itself rather than in a separately compiled C function. This would have >> the advantage of presenting more capabilities of Cython and would >> slightly decrease the complexity of the solution (one source file >> instead of two). > > Thanks very much for your feedback! Since the chapter in under a CC > licence, you are welcome to submit your proposal as a Pull-Request. :D > > The reason why I wrote the loop in C is so that the cython example > synergieses with the others. The ideas is, that you have an already > existing code-base that has a function which has such a signature. > Ideally, your proposal would be an improvement, where the original > example stays in place and you develop the improvement including reasons > as to why it is better. ;) I understand the reasoning behind your choice. 
I'm adding sending you a patch with this addition to my todo list, but I don't really know when I will have time to work on it... Cheers, Daniele From valentin at haenel.co Thu Mar 21 06:19:12 2013 From: valentin at haenel.co (Valentin Haenel) Date: Thu, 21 Mar 2013 11:19:12 +0100 Subject: [Numpy-discussion] numpy array to C API In-Reply-To: <514ADD78.4010604@grinta.net> References: <5149C0C5.6050801@syntonetic.com> <514AC0B7.6020500@syntonetic.com> <20130321084521.GG7842@kudu.in-berlin.de> <514ACD27.3060504@grinta.net> <20130321091649.GA12061@kudu.in-berlin.de> <514ADD78.4010604@grinta.net> Message-ID: <20130321101912.GB12061@kudu.in-berlin.de> * Daniele Nicolodi [2013-03-21]: > On 21/03/2013 10:16, Valentin Haenel wrote: > > Dear Daniele > > > > * Daniele Nicolodi [2013-03-21]: > >> On 21/03/2013 09:45, Valentin Haenel wrote: > >>> if you are new to interfacing python/numpy with C/C++, you may want to > >>> check out: > >>> > >>> http://scipy-lectures.github.com/advanced/interfacing_with_c/interfacing_with_c.html > >>> > >>> Disclaimer: I am the author of this chapter, so this response is a bit > >>> of a shameless plug :D > >> > >> Hello Valentin, > >> > >> I had a quick look at the chapter. It looks good! Thanks for sharing it. > >> > >> However I have a small comment on the way you implement the Cython-Numpy > >> solution. I would have written the loop over the array element in Cython > >> itself rather than in a separately compiled C function. This would have > >> the advantage of presenting more capabilities of Cython and would > >> slightly decrease the complexity of the solution (one source file > >> instead of two). > > > > Thanks very much for your feedback! Since the chapter in under a CC > > licence, you are welcome to submit your proposal as a Pull-Request. :D > > > > The reason why I wrote the loop in C is so that the cython example > > synergieses with the others. 
The ideas is, that you have an already > > existing code-base that has a function which has such a signature. > > Ideally, your proposal would be an improvement, where the original > > example stays in place and you develop the improvement including reasons > > as to why it is better. ;) > > I understand the reasoning behind your choice. I'm adding sending you a > patch with this addition to my todo list, but I don't really know when I > will have time to work on it... Aye, that would be great! No need to rush -- you can also throw a feature request into the project issue tracker, maybe someone else will grab it. V- From sd at syntonetic.com Thu Mar 21 12:17:15 2013 From: sd at syntonetic.com (=?windows-1252?Q?S=F8ren?=) Date: Thu, 21 Mar 2013 17:17:15 +0100 Subject: [Numpy-discussion] numpy array to C API In-Reply-To: <20130321084147.GF7842@kudu.in-berlin.de> References: <5149C0C5.6050801@syntonetic.com> <514AC0B7.6020500@syntonetic.com> <20130321084147.GF7842@kudu.in-berlin.de> Message-ID: <514B328B.40107@syntonetic.com> Thanks Valentin Your article fell in dry spot when a newbie in C/Python interfacing. Python-C-API fits perfectly with my current use-case. I got currious about the Ctypes approach as well as "Ga?l Varoquaux?s blog post about avoiding data copies", but the link in the article didn't seem to work. (Under "Further Reading and References") cheers S?ren On 21/03/2013 09:41, Valentin Haenel wrote: > Dear S?ren, > > if you are new to interfacing python/numpy with C/C++, you may want to > check out: > > http://scipy-lectures.github.com/advanced/interfacing_with_c/interfacing_with_c.html > > Disclaimer: I am the author of this chapter, so this response is a bit > of a shameless plug :D > > Hope it helps. > > V- > > * S?ren [2013-03-21]: >> Thanks Robert, for making that clear. >> >> I got a deprecated warning the second I added >> #include >> and I got scared off too fast in my exploring phase. 
>> >> Cheers >> S?ren >> >> On 20/03/2013 17:03, Robert Kern wrote: >>> On Wed, Mar 20, 2013 at 1:59 PM, S?ren wrote: >>>> Greetings >>>> >>>> I'm extending our existing C/C++ software with Python/Numpy in order to do extra number crunching. >>>> It already works like a charm calling python with the C API . >>>> >>>> But what is the proper way of passing double arrays returned from Python/Numpy routines back to C? >>>> >>>> I came across PyArray but I can see in the compiler warnings, it is deprecated and I don't wanna start from scratch on legacy facilities. >>> What is this `PyArray` that you are referring to? There is nothing >>> named just `PyArray` to my knowledge. Do you mean direct access to the >>> `data` member of the PyArrayObject struct? Yes, that is deprecated. >>> Use the PyArray_DATA() macro to get a `void*` pointer to the start of >>> the data. >>> >>> http://docs.scipy.org/doc/numpy/reference/c-api.array.html#PyArray_DATA >>> >>> -- >>> Robert Kern >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion From valentin at haenel.co Thu Mar 21 12:34:51 2013 From: valentin at haenel.co (Valentin Haenel) Date: Thu, 21 Mar 2013 17:34:51 +0100 Subject: [Numpy-discussion] numpy array to C API In-Reply-To: <514B328B.40107@syntonetic.com> Message-ID: <20130321163451.GE12061@kudu.in-berlin.de> Dear S?ren, * S?ren [2013-03-21]: > Your article fell in dry spot when a newbie in C/Python interfacing. > Python-C-API fits perfectly with my current use-case. > > I got currious about the Ctypes approach as well as "Ga?l Varoquaux?s > blog post about avoiding data copies", but the link in the article > didn't seem to work. 
(Under "Further Reading and References") There seems to be something wrong with Gaël's website. I have CC'd him, maybe he can fix it. best V- > > cheers > Søren > > On 21/03/2013 09:41, Valentin Haenel wrote: > > Dear Søren, > > > > if you are new to interfacing python/numpy with C/C++, you may want to > > check out: > > > > http://scipy-lectures.github.com/advanced/interfacing_with_c/interfacing_with_c.html > > > > Disclaimer: I am the author of this chapter, so this response is a bit > > of a shameless plug :D > > > > Hope it helps. > > > > V- > > > > * Søren [2013-03-21]: > >> Thanks Robert, for making that clear. > >> > >> I got a deprecated warning the second I added > >> #include > >> and I got scared off too fast in my exploring phase. > >> > >> Cheers > >> Søren > >> > >> On 20/03/2013 17:03, Robert Kern wrote: > >>> On Wed, Mar 20, 2013 at 1:59 PM, Søren wrote: > >>>> Greetings > >>>> > >>>> I'm extending our existing C/C++ software with Python/Numpy in order to do extra number crunching. > >>>> It already works like a charm calling python with the C API . > >>>> > >>>> But what is the proper way of passing double arrays returned from Python/Numpy routines back to C? > >>>> > >>>> I came across PyArray but I can see in the compiler warnings, it is deprecated and I don't wanna start from scratch on legacy facilities. > >>> What is this `PyArray` that you are referring to? There is nothing > >>> named just `PyArray` to my knowledge. Do you mean direct access to the > >>> `data` member of the PyArrayObject struct? Yes, that is deprecated. > >>> Use the PyArray_DATA() macro to get a `void*` pointer to the start of > >>> the data. 
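The `PyArray_DATA()` macro Robert mentions lives on the C side; from pure Python, the same raw buffer address can be previewed through the array's `ctypes` attribute, which is a handy sanity check on what a C extension will actually receive. A minimal sketch (no compiled code involved):

```python
import numpy as np

a = np.arange(6, dtype=np.float64).reshape(2, 3)
# C code usually expects one contiguous block; this is a no-op if the
# array is already C-contiguous, and makes a copy otherwise.
a = np.ascontiguousarray(a)

ptr = a.ctypes.data             # integer address of the first element
print(hex(ptr))
print(a.flags['C_CONTIGUOUS'])  # True
```

On the C side, `PyArray_DATA()` on the same object would yield a `void*` to that address.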
> >>> > >>> http://docs.scipy.org/doc/numpy/reference/c-api.array.html#PyArray_DATA > >>> > >>> -- > >>> Robert Kern > >>> _______________________________________________ > >>> NumPy-Discussion mailing list > >>> NumPy-Discussion at scipy.org > >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ralf.gommers at gmail.com Thu Mar 21 17:20:31 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 21 Mar 2013 22:20:31 +0100 Subject: [Numpy-discussion] NumPy/SciPy participation in GSoC 2013 Message-ID: Hi all, It is the time of the year for Google Summer of Code applications. If we want to participate with Numpy and/or Scipy, we need two things: enough mentors and ideas for projects. If we get those, we'll apply under the PSF umbrella. They've outlined the timeline they're working by and guidelines at http://pyfound.blogspot.nl/2013/03/get-ready-for-google-summer-of-code.html. We should be able to come up with some interesting project ideas I'd think, let's put those at http://projects.scipy.org/scipy/wiki/SummerofCodeIdeas. Preferably with enough detail to be understandable for people new to the projects and a proposed mentor. We need at least 3 people willing to mentor a student. Ideally we'd have enough mentors this week, so we can apply to the PSF on time. If you're willing to be a mentor, please send me the following: name, email address, phone nr, and what you're interested in mentoring. If you have time constaints and have doubts about being able to be a primary mentor, being a backup mentor would also be helpful. Cheers, Ralf P.S. 
as you can probably tell from the above, I'm happy to coordinate the GSoC applications for Numpy and Scipy -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Mar 21 20:00:37 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 21 Mar 2013 18:00:37 -0600 Subject: [Numpy-discussion] Videos of PyCon talks. Message-ID: Here . Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Mar 21 20:02:49 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 21 Mar 2013 18:02:49 -0600 Subject: [Numpy-discussion] Numpy 1.7.1 Message-ID: The Numpy 1.7.1 release process seems to have stalled. What do we need to finish up to get it going again? I think it would be nice to shoot for a release maybe the weekend after next. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ake.sandgren at hpc2n.umu.se Fri Mar 22 03:47:13 2013 From: ake.sandgren at hpc2n.umu.se (Ake Sandgren) Date: Fri, 22 Mar 2013 08:47:13 +0100 Subject: [Numpy-discussion] Numpy 1.7.1 In-Reply-To: References: Message-ID: <1363938433.3343.43.camel@lurvas.hpc2n.umu.se> On Thu, 2013-03-21 at 18:02 -0600, Charles R Harris wrote: > The Numpy 1.7.1 release process seems to have stalled. What do we need > to finish up to get it going again? I think it would be nice to shoot > for a release maybe the weekend after next. Talking about 1.7.1, I have a couple of bug fixes for 1.7.0 at git://github.com/akesandgren/numpy.git in the v1.7.0-hpc2n branch They are quite small. 
-- Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden Internet: ake at hpc2n.umu.se Phone: +46 90 7866134 Fax: +46 90 7866126 Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se From pierre.barbierdereuille at gmail.com Fri Mar 22 06:41:13 2013 From: pierre.barbierdereuille at gmail.com (Pierre Barbier de Reuille) Date: Fri, 22 Mar 2013 11:41:13 +0100 Subject: [Numpy-discussion] Problem with numpy.records module Message-ID: Hello, I am trying to use titles for record arrays. In the documentation, it is specified that any column can set to "None". However, using 'None' fails on numpy 1.6.2 because in np.core.records, on line 195, the "strip" method is called on the title object. First, I hope I understood the documentation correctly. If so, is it possible to replace the line 195 with: self._titles = [n.strip() if n is not None else None for n in titles[:self._nfields]] so 'None' elements are handled properly? Thanks, -- Barbier de Reuille Pierre -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Mar 22 07:42:51 2013 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 22 Mar 2013 11:42:51 +0000 Subject: [Numpy-discussion] Numpy 1.7.1 In-Reply-To: <1363938433.3343.43.camel@lurvas.hpc2n.umu.se> References: <1363938433.3343.43.camel@lurvas.hpc2n.umu.se> Message-ID: On Fri, Mar 22, 2013 at 7:47 AM, Ake Sandgren wrote: > On Thu, 2013-03-21 at 18:02 -0600, Charles R Harris wrote: >> The Numpy 1.7.1 release process seems to have stalled. What do we need >> to finish up to get it going again? I think it would be nice to shoot >> for a release maybe the weekend after next. > > Talking about 1.7.1 i have a couple of bug fixes for 1.7.0 at > git://github.com/akesandgren/numpy.git in the v1.7.0-hpc2n branch > > They are quite small. Please send as PRs against master, so we can review and merge them? 
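For the `numpy.records` titles question above: titles attach a second, alternative key to each field, which is why `np.core.records` calls `.strip()` on them. A short sketch of how titles behave when every column has one (the reported bug is specifically about passing `None` for a column, which this sketch avoids; names and behavior follow the documented dtype dict form):

```python
import numpy as np

# A dtype where each field carries both a name and a title.
dt = np.dtype({'names':   ['value', 'count'],
               'formats': ['f4', 'i4'],
               'titles':  ['Value (float)', 'Count (int)']})

arr = np.zeros(3, dtype=dt)
arr['value'] = [1.0, 2.0, 3.0]

# Fields are reachable by name or by title.
print(arr['value'])
print(arr['Value (float)'])
```

Pierre's proposed one-line fix simply skips the `.strip()` call for entries that are `None`.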
-n From ake.sandgren at hpc2n.umu.se Fri Mar 22 08:16:50 2013 From: ake.sandgren at hpc2n.umu.se (Ake Sandgren) Date: Fri, 22 Mar 2013 13:16:50 +0100 Subject: [Numpy-discussion] Numpy 1.7.1 In-Reply-To: References: <1363938433.3343.43.camel@lurvas.hpc2n.umu.se> Message-ID: <1363954610.3343.66.camel@lurvas.hpc2n.umu.se> On Fri, 2013-03-22 at 11:42 +0000, Nathaniel Smith wrote: > On Fri, Mar 22, 2013 at 7:47 AM, Ake Sandgren wrote: > > On Thu, 2013-03-21 at 18:02 -0600, Charles R Harris wrote: > >> The Numpy 1.7.1 release process seems to have stalled. What do we need > >> to finish up to get it going again? I think it would be nice to shoot > >> for a release maybe the weekend after next. > > > > Talking about 1.7.1 i have a couple of bug fixes for 1.7.0 at > > git://github.com/akesandgren/numpy.git in the v1.7.0-hpc2n branch > > > > They are quite small. > > Please send as PRs against master, so we can review and merge them? Done. From ndbecker2 at gmail.com Fri Mar 22 09:59:52 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 22 Mar 2013 09:59:52 -0400 Subject: [Numpy-discussion] howto apply-along-axis? Message-ID: I frequently find I have my 1d function that performs some reduction that I'd like to apply-along some axis of an n-d array. As a trivial example, def sum(u): return np.sum (u) In this case the function is probably C/C++ code, but that is irrelevant (I think). Is there a reasonably efficient way to do this within numpy? From njs at pobox.com Fri Mar 22 10:21:03 2013 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 22 Mar 2013 14:21:03 +0000 Subject: [Numpy-discussion] howto apply-along-axis? In-Reply-To: References: Message-ID: On 22 Mar 2013 14:09, "Neal Becker" wrote: > > I frequently find I have my 1d function that performs some reduction that I'd > like to apply-along some axis of an n-d array. 
> > As a trivial example, > > def sum(u): > return np.sum (u) > > In this case the function is probably C/C++ code, but that is irrelevant (I > think). > > Is there a reasonably efficient way to do this within numpy? The core infrastructure for this sort of thing is there - search on "generalized ufuncs". There's no python-level api as far as I know, though, yet. You could write a reasonable facsimile of np.vectorize for such functions using nditer. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjwilliams43 at gmail.com Fri Mar 22 17:39:35 2013 From: cjwilliams43 at gmail.com (Colin J. Williams) Date: Fri, 22 Mar 2013 17:39:35 -0400 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: References: <5149BDBB.6060509@ncf.ca> <5149CF4D.6090906@gmail.com> Message-ID: <514CCF97.7030902@gmail.com> An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Sat Mar 23 00:05:11 2013 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Fri, 22 Mar 2013 21:05:11 -0700 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: <514CCF97.7030902@gmail.com> References: <5149BDBB.6060509@ncf.ca> <5149CF4D.6090906@gmail.com> <514CCF97.7030902@gmail.com> Message-ID: On Fri, Mar 22, 2013 at 2:39 PM, Colin J. Williams wrote: > I have updated to numpy 1.7.0 for each of the Pythons 2.7.3, 3.2.3 and > 3.3.0. ... > The tests, which are available > here(http://web.ncf.ca/cjw/FP%20Summary%20over%20273-323-330.txt), show that > 3.2 is slower, but not to the same degree reported before. Have posted your test code anywhere? Anyway, depending on how you did your timings, that looks to me like 3.* is a bit faster with small data, and pretty much within measurement error for the large datasets. 
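For the apply-along-axis question above: numpy does ship `np.apply_along_axis`, which wraps exactly this pattern, though it loops in Python rather than at C speed (the generalized-ufunc machinery Nathaniel mentions is the fast path). A short sketch:

```python
import numpy as np

def my_reduce(u):
    # stand-in for an arbitrary 1-d reduction (possibly C/C++ under the hood)
    return np.sum(u)

a = np.arange(12).reshape(3, 4)

by_col = np.apply_along_axis(my_reduce, 0, a)   # reduce down each column
by_row = np.apply_along_axis(my_reduce, 1, a)   # reduce across each row

print(by_col)   # [12 15 18 21]
print(by_row)   # [ 6 22 38]
```

For a reduction that is just a sum, `a.sum(axis=...)` is of course faster; `apply_along_axis` earns its keep when the 1-d function is a black box.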
And if the large ones are doing things with really big arrays (I'm assuming pretty big, as you're getting close to 32 bit memory limits...), then it's really hard to imagine how python version could make a noticeable difference -- the real work would be in the numpy code, and that's exactly the same on all python versions. If you are using BLAS or LAPACK stuff, then there might be some differences with the different builds, though I wouldn't expect so if you ar getting them from the same source. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From ralf.gommers at gmail.com Sat Mar 23 07:21:21 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 23 Mar 2013 12:21:21 +0100 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: <514CCF97.7030902@gmail.com> References: <5149BDBB.6060509@ncf.ca> <5149CF4D.6090906@gmail.com> <514CCF97.7030902@gmail.com> Message-ID: On Fri, Mar 22, 2013 at 10:39 PM, Colin J. Williams wrote: > On 20/03/2013 11:12 AM, Fr?d?ric Bastien wrote: > > On Wed, Mar 20, 2013 at 11:01 AM, Colin J. Williams wrote: > > On 20/03/2013 10:30 AM, Fr?d?ric Bastien wrote: > > Hi, > > win32 do not mean it is a 32 bits windows. sys.platform always return > win32 on 32bits and 64 bits windows even for python 64 bits. > > But that is a good question, is your python 32 or 64 bits? > > 32 bits. > > That explain why you have memory problem but not other people with 64 > bits version. So if you want to work with bigger input, change to a > python 64 bits. > > Fred > > > Thanks to the people who responded to my report that numpy, with Python > 3.2 was significantly slower than with Python 2.7. > > I have updated to numpy 1.7.0 for each of the Pythons 2.7.3, 3.2.3 and > 3.3.0. > > The Pythons came from python.org and the Numpys from PyPi. 
The SciPy > site still points to Source Forge, I gathered from the responses that > Source Forge is no longer recommended for downloads. > That's not the case. The official binaries for NumPy and SciPy are on SourceForge. The Windows installers on PyPI are there to make easy_install work, but they're likely slower than the SF installers (no SSE2/SSE3 instructions). Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Sat Mar 23 07:23:10 2013 From: toddrjen at gmail.com (Todd) Date: Sat, 23 Mar 2013 12:23:10 +0100 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: References: <5149BDBB.6060509@ncf.ca> <5149CF4D.6090906@gmail.com> <514CCF97.7030902@gmail.com> Message-ID: On Sat, Mar 23, 2013 at 12:21 PM, Ralf Gommers wrote: > > That's not the case. The official binaries for NumPy and SciPy are on > SourceForge. The Windows installers on PyPI are there to make easy_install > work, but they're likely slower than the SF installers (no SSE2/SSE3 > instructions). > > Ralf > > Is there a reason why the same binaries can't be used for both? -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Mar 23 08:17:28 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 23 Mar 2013 13:17:28 +0100 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: References: <5149BDBB.6060509@ncf.ca> <5149CF4D.6090906@gmail.com> <514CCF97.7030902@gmail.com> Message-ID: On Sat, Mar 23, 2013 at 12:23 PM, Todd wrote: > On Sat, Mar 23, 2013 at 12:21 PM, Ralf Gommers wrote: > >> >> That's not the case. The official binaries for NumPy and SciPy are on >> SourceForge. The Windows installers on PyPI are there to make easy_install >> work, but they're likely slower than the SF installers (no SSE2/SSE3 >> instructions). 
>> >> Ralf >> >> > Is there a reason why the same binaries can't be used for both? > The SF .exe superpack installers contains three installers: plain, SSE2 and SSE3 support. easy_install doesn't know what to do with such an installer. See http://thread.gmane.org/gmane.comp.python.numeric.general/29395/focus=29582for the discussion on why things are as they are now. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjwilliams43 at gmail.com Sat Mar 23 10:39:34 2013 From: cjwilliams43 at gmail.com (Colin J. Williams) Date: Sat, 23 Mar 2013 10:39:34 -0400 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: References: <5149BDBB.6060509@ncf.ca> <5149CF4D.6090906@gmail.com> <514CCF97.7030902@gmail.com> Message-ID: <514DBEA6.6080504@gmail.com> An HTML attachment was scrubbed... URL: From davidmenhur at gmail.com Sat Mar 23 11:17:57 2013 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Sat, 23 Mar 2013 16:17:57 +0100 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: <514DBEA6.6080504@gmail.com> References: <5149BDBB.6060509@ncf.ca> <5149CF4D.6090906@gmail.com> <514CCF97.7030902@gmail.com> <514DBEA6.6080504@gmail.com> Message-ID: I am a bit worried about the differences in results. Just to be sure you are comparing apples with apples, it may be a good idea to set the seed at the beginning: np.random.seed( SEED ) where SEED is an int. This way, you will be inverting always the same matrix, regardless of the Python version. I think, even if the timing is different, the results should be the same. http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.seed.html#numpy.random.seed David. On 23 March 2013 15:39, Colin J. Williams wrote: > On 23/03/2013 7:21 AM, Ralf Gommers wrote: > > > > > On Fri, Mar 22, 2013 at 10:39 PM, Colin J. 
Williams > wrote: >> >> On 20/03/2013 11:12 AM, Fr?d?ric Bastien wrote: >> >> On Wed, Mar 20, 2013 at 11:01 AM, Colin J. Williams >> wrote: >> >> On 20/03/2013 10:30 AM, Fr?d?ric Bastien wrote: >> >> Hi, >> >> win32 do not mean it is a 32 bits windows. sys.platform always return >> win32 on 32bits and 64 bits windows even for python 64 bits. >> >> But that is a good question, is your python 32 or 64 bits? >> >> 32 bits. >> >> That explain why you have memory problem but not other people with 64 >> bits version. So if you want to work with bigger input, change to a >> python 64 bits. >> >> Fred >> >> Thanks to the people who responded to my report that numpy, with Python >> 3.2 was significantly slower than with Python 2.7. >> >> I have updated to numpy 1.7.0 for each of the Pythons 2.7.3, 3.2.3 and >> 3.3.0. >> >> The Pythons came from python.org and the Numpys from PyPi. The SciPy site >> still points to Source Forge, I gathered from the responses that Source >> Forge is no longer recommended for downloads. > > > That's not the case. The official binaries for NumPy and SciPy are on > SourceForge. The Windows installers on PyPI are there to make easy_install > work, but they're likely slower than the SF installers (no SSE2/SSE3 > instructions). > > Ralf > > Thanks, I'll read over Robert Kern's comments. PyPi is the simpler process, > but, if the result is unoptimized code, then easy_install is not the way to > go. > > The code is available here(http://web.ncf.ca/cjw/testFPSpeed.py) > and the most recent test results are > here(http://web.ncf.ca/cjw/FP%2023-Mar-13%20Test%20Summary.txt). These are > using PyPi, I'll look into SourceForge. > > Colin W. 
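David's seeding advice above makes the timing runs comparable across interpreters: each seeded run then generates, and inverts, the same random matrix. A minimal sketch of the reproducibility it buys (the seed value here is arbitrary):

```python
import numpy as np

SEED = 12345

np.random.seed(SEED)
a = np.random.random((4, 4))

np.random.seed(SEED)   # re-seed: the next draw repeats exactly
b = np.random.random((4, 4))

print(np.array_equal(a, b))   # True
```

With identical inputs, any remaining difference between the Python 2.7 and 3.x runs can only come from timing, not from the data.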
> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From jsseabold at gmail.com Sat Mar 23 14:19:42 2013 From: jsseabold at gmail.com (Skipper Seabold) Date: Sat, 23 Mar 2013 14:19:42 -0400 Subject: [Numpy-discussion] Unable to building numpy with openblas using bento or distutils Message-ID: Some help on this would be greatly appreciated. It's been recommended to use OpenBlas over ATLAS, so I've been trying to build numpy with openblas and have run into a few problems. 1) Build fails using bento master and waf 1.7.9, see below. 2) Distutils doesn't seem to be able to find lapack as part of atlas. I tried to skip a site.cfg and define environmental variables. No idea what I missed. I followed instructions found scattered over the internet and only understand vaguely the issues. Maybe someone can help. I'd be happy to update the wiki with any answers. To truly support OpenBlas, is it maybe necessary to make some additions to numpy/distutils/system_info.py? Thanks for having a look, Skipper Install OpenBlas ----------------------------- git clone git://github.com/xianyi/OpenBLAS cd OpenBlas Edit c_check to look for libpthreads in the right place (Kubuntu 12.10) |4 $ git diff c_check ``` diff --git a/c_check b/c_check index 4d82237..de0fd33 100644 --- a/c_check +++ b/c_check @@ -241,7 +241,7 @@ print CONFFILE "#define FUNDERSCORE\t$need_fu\n" if $need_fu if ($os eq "LINUX") { - @pthread = split(/\s+/, `nm /lib/libpthread.so* | grep _pthread_create`); + @pthread = split(/\s+/, `nm /lib/x86_64-linux-gnu/libpthread.so* | grep _pthread_create`); if ($pthread[2] ne "") { print CONFFILE "#define PTHREAD_CREATE_FUNC $pthread[2]\n"; ``` make fc=gfortran make PREFIX=~/.local install Everything looks ok, so far. 
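A quick post-build check, once numpy itself is installed, is to ask it which BLAS/LAPACK it recorded at build time; if the OpenBLAS paths above were picked up, they should appear in this output. A sketch:

```python
import numpy as np

# Dumps the BLAS/LAPACK libraries and search paths recorded at build time.
np.__config__.show()
```

If OpenBLAS is absent here, numpy fell back to its unoptimized internal routines even though the build "succeeded".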
Install NumPy --------------------------- Using numpy master I tried to use bento master and waf 1.7.9, following instructions from David's blog bentomaker configure --prefix=/home/skipper/.local --with-blas-lapack-libdir=/home/skipper/.local/lib --blas-lapack-type=openblas .. bentomaker build -j4 ``` [101/104] cshlib: build/numpy/core/src/umath/umath_tests.c.11.o -> build/numpy/core/umath_tests.so /usr/bin/ld: numpy/core/lib/libnpymath.a(halffloat.c.12.o): relocation R_X86_64_PC32 against symbol `npy_halfbits_to_floatbits' can not be used when making a shared object; recompile with -fPIC /usr/bin/ld: final link failed: Bad value collect2: error: ld returned 1 exit status /usr/bin/ld: numpy/core/lib/libnpymath.a(halffloat.c.12.o): relocation R_X86_64_PC32 against symbol `npy_halfbits_to_floatbits' can not be used when making a shared object; recompile with -fPIC /usr/bin/ld: final link failed: Bad value collect2: error: ld returned 1 exit status ``` No idea, so, let's try distutils export LAPACK=~/.local/lib/libopenblas.a export BLAS=~/.local/lib/libopenblas.a export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/.local/lib/ echo $LD_LIBRARY_PATH ``` :/usr/local/lib64/R/bin:/home/skipper/.local/lib/ ``` This step seems to be necessary? python setup.py config ``` Running from numpy source directory. non-existing path in 'numpy/distutils': 'site.cfg' F2PY Version 2 numpy/core/setup_common.py:88: MismatchCAPIWarning: API mismatch detected, the C API version numbers have to be updated. Current C api version is 8, with checksum f4362353e2d72f889fda0128aa015037, but recorded checksum for C API version 8 in codegen_dir/cversions.txt is 17321775fc884de0b1eda478cd61c74b. If functions were added in the C API, you have to update C_API_VERSION in numpy/core/setup_common.py. 
MismatchCAPIWarning) blas_opt_info: blas_mkl_info: libraries mkl,vml,guide not found in ['/usr/local/lib64', '/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu'] NOT AVAILABLE atlas_blas_threads_info: Setting PTATLAS=ATLAS libraries ptf77blas,ptcblas,atlas not found in ['/usr/local/lib64', '/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu'] NOT AVAILABLE atlas_blas_info: libraries f77blas,cblas,atlas not found in ['/usr/local/lib64', '/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu'] NOT AVAILABLE /home/skipper/src/numpy-skipper/numpy/distutils/system_info.py:1501: UserWarning: Atlas (http://math-atlas.sourceforge.net/) libraries not found. Directories to search for the libraries can be specified in the numpy/distutils/site.cfg file (section [atlas]) or by setting the ATLAS environment variable. warnings.warn(AtlasNotFoundError.__doc__) blas_info: Replacing _lib_names[0]=='blas' with 'openblas' Replacing _lib_names[0]=='openblas' with 'openblas' FOUND: libraries = ['openblas'] library_dirs = ['/home/skipper/.local/lib'] language = f77 FOUND: libraries = ['openblas'] library_dirs = ['/home/skipper/.local/lib'] define_macros = [('NO_ATLAS_INFO', 1)] language = f77 non-existing path in 'numpy/lib': 'benchmarks' lapack_opt_info: lapack_mkl_info: mkl_info: libraries mkl,vml,guide not found in ['/usr/local/lib64', '/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu'] NOT AVAILABLE NOT AVAILABLE atlas_threads_info: Setting PTATLAS=ATLAS libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib64 libraries lapack_atlas not found in /usr/local/lib64 libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib libraries lapack_atlas not found in /usr/local/lib libraries ptf77blas,ptcblas,atlas not found in /usr/lib64 libraries lapack_atlas not found in /usr/lib64 libraries ptf77blas,ptcblas,atlas not found in /usr/lib libraries lapack_atlas not found in /usr/lib libraries 
ptf77blas,ptcblas,atlas not found in /usr/lib/x86_64-linux-gnu libraries lapack_atlas not found in /usr/lib/x86_64-linux-gnu numpy.distutils.system_info.atlas_threads_info NOT AVAILABLE atlas_info: libraries f77blas,cblas,atlas not found in /usr/local/lib64 libraries lapack_atlas not found in /usr/local/lib64 libraries f77blas,cblas,atlas not found in /usr/local/lib libraries lapack_atlas not found in /usr/local/lib libraries f77blas,cblas,atlas not found in /usr/lib64 libraries lapack_atlas not found in /usr/lib64 libraries f77blas,cblas,atlas not found in /usr/lib libraries lapack_atlas not found in /usr/lib libraries f77blas,cblas,atlas not found in /usr/lib/x86_64-linux-gnu libraries lapack_atlas not found in /usr/lib/x86_64-linux-gnu numpy.distutils.system_info.atlas_info NOT AVAILABLE /home/skipper/src/numpy-skipper/numpy/distutils/system_info.py:1415: UserWarning: Atlas (http://math-atlas.sourceforge.net/) libraries not found. Directories to search for the libraries can be specified in the numpy/distutils/site.cfg file (section [atlas]) or by setting the ATLAS environment variable. warnings.warn(AtlasNotFoundError.__doc__) lapack_info: Replacing _lib_names[0]=='lapack' with 'openblas' Replacing _lib_names[0]=='openblas' with 'openblas' FOUND: libraries = ['openblas'] library_dirs = ['/home/skipper/.local/lib'] language = f77 FOUND: libraries = ['openblas', 'openblas'] library_dirs = ['/home/skipper/.local/lib'] define_macros = [('NO_ATLAS_INFO', 1)] language = f77 running config ``` python setup.py build &> build.log Build log is here. Obviously it didn't go well, but I don't see anything to indicate problems. Sometimes I am able to get _dotblas.so built, though I don't know what causes it. This time I wasn't. 
https://gist.github.com/jseabold/7054ba9d85eae09eb402#file-numpy_build-log sudo python setup.py install &> install.log https://gist.github.com/jseabold/a0f5638b65d44aeff598#file-numpy_install-log >>> import numpy as np Traceback (most recent call last): File "", line 1, in File "/usr/local/lib/python2.7/dist-packages/numpy/__init__.py", line 138, in import add_newdocs File "/usr/local/lib/python2.7/dist-packages/numpy/add_newdocs.py", line 13, in from numpy.lib import add_newdoc File "/usr/local/lib/python2.7/dist-packages/numpy/lib/__init__.py", line 15, in from polynomial import * File "/usr/local/lib/python2.7/dist-packages/numpy/lib/polynomial.py", line 19, in from numpy.linalg import eigvals, lstsq, inv File "/usr/local/lib/python2.7/dist-packages/numpy/linalg/__init__.py", line 50, in from linalg import * File "/usr/local/lib/python2.7/dist-packages/numpy/linalg/linalg.py", line 25, in from numpy.linalg import lapack_lite ImportError: libopenblas.so.0: cannot open shared object file: No such file or directory -------------- next part -------------- An HTML attachment was scrubbed... URL: From klemm at phys.ethz.ch Sat Mar 23 14:32:10 2013 From: klemm at phys.ethz.ch (Hanno Klemm) Date: Sat, 23 Mar 2013 19:32:10 +0100 Subject: [Numpy-discussion] Unable to building numpy with openblas using bento or distutils In-Reply-To: References: Message-ID: <128E07FF-9B6C-4F47-AEFD-43752010E8BB@phys.ethz.ch> Skipper, this looks like a problem that I had in the bad old days with ATLAS, as well. Try compiling openblas with the -fPIC flag that used to help. Best of luck, Hanno hanno.klemm at me.com Sent from my mobile device, please excuse my brevity. On 23.03.2013, at 19:19, Skipper Seabold wrote: > Some help on this would be greatly appreciated. It's been recommended to use OpenBlas over ATLAS, so I've been trying to build numpy with openblas and have run into a few problems. > > 1) Build fails using bento master and waf 1.7.9, see below. 
> 2) Distutils doesn't seem to be able to find lapack as part of atlas. I tried to skip a site.cfg and define environmental variables. No idea what I missed. > > I followed instructions found scattered over the internet and only understand vaguely the issues. Maybe someone can help. I'd be happy to update the wiki with any answers. > > To truly support OpenBlas, is it maybe necessary to make some additions to numpy/distutils/system_info.py? > > Thanks for having a look, > > Skipper > > Install OpenBlas > ----------------------------- > git clone git://github.com/xianyi/OpenBLAS > cd OpenBlas > > Edit c_check to look for libpthreads in the right place (Kubuntu 12.10) > > |4 $ git diff c_check > ``` > diff --git a/c_check b/c_check > index 4d82237..de0fd33 100644 > --- a/c_check > +++ b/c_check > @@ -241,7 +241,7 @@ print CONFFILE "#define FUNDERSCORE\t$need_fu\n" if $need_fu > > if ($os eq "LINUX") { > > - @pthread = split(/\s+/, `nm /lib/libpthread.so* | grep _pthread_create`); > + @pthread = split(/\s+/, `nm /lib/x86_64-linux-gnu/libpthread.so* | grep _pthread_create`); > > if ($pthread[2] ne "") { > print CONFFILE "#define PTHREAD_CREATE_FUNC $pthread[2]\n"; > ``` > > make fc=gfortran > make PREFIX=~/.local install > > Everything looks ok, so far. > > Install NumPy > --------------------------- > Using numpy master > > I tried to use bento master and waf 1.7.9, following instructions from David's blog > > bentomaker configure --prefix=/home/skipper/.local --with-blas-lapack-libdir=/home/skipper/.local/lib --blas-lapack-type=openblas .. 
> bentomaker build -j4 > > ``` > > [101/104] cshlib: build/numpy/core/src/umath/umath_tests.c.11.o -> build/numpy/core/umath_tests.so > /usr/bin/ld: numpy/core/lib/libnpymath.a(halffloat.c.12.o): relocation R_X86_64_PC32 against symbol `npy_halfbits_to_floatbits' can not be used when making a shared object; recompile with -fPIC > /usr/bin/ld: final link failed: Bad value > collect2: error: ld returned 1 exit status > /usr/bin/ld: numpy/core/lib/libnpymath.a(halffloat.c.12.o): relocation R_X86_64_PC32 against symbol `npy_halfbits_to_floatbits' can not be used when making a shared object; recompile with -fPIC > /usr/bin/ld: final link failed: Bad value > collect2: error: ld returned 1 exit status > ``` > > No idea, so, let's try distutils > > export LAPACK=~/.local/lib/libopenblas.a > export BLAS=~/.local/lib/libopenblas.a > export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/.local/lib/ > echo $LD_LIBRARY_PATH > ``` > :/usr/local/lib64/R/bin:/home/skipper/.local/lib/ > ``` > > This step seems to be necessary? > > python setup.py config > ``` > Running from numpy source directory. > non-existing path in 'numpy/distutils': 'site.cfg' > F2PY Version 2 > numpy/core/setup_common.py:88: MismatchCAPIWarning: API mismatch detected, the C API version numbers have to be updated. Current C api version is 8, with checksum f4362353e2d72f889fda0128aa015037, but recorded checksum for C API version 8 in codegen_dir/cversions.txt is 17321775fc884de0b1eda478cd61c74b. If functions were added in the C API, you have to update C_API_VERSION in numpy/core/setup_common.py. 
> MismatchCAPIWarning) > blas_opt_info: > blas_mkl_info: > libraries mkl,vml,guide not found in ['/usr/local/lib64', '/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu'] > NOT AVAILABLE > > atlas_blas_threads_info: > Setting PTATLAS=ATLAS > libraries ptf77blas,ptcblas,atlas not found in ['/usr/local/lib64', '/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu'] > NOT AVAILABLE > > atlas_blas_info: > libraries f77blas,cblas,atlas not found in ['/usr/local/lib64', '/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu'] > NOT AVAILABLE > > /home/skipper/src/numpy-skipper/numpy/distutils/system_info.py:1501: UserWarning: > Atlas (http://math-atlas.sourceforge.net/) libraries not found. > Directories to search for the libraries can be specified in the > numpy/distutils/site.cfg file (section [atlas]) or by setting > the ATLAS environment variable. > warnings.warn(AtlasNotFoundError.__doc__) > blas_info: > Replacing _lib_names[0]=='blas' with 'openblas' > Replacing _lib_names[0]=='openblas' with 'openblas' > FOUND: > libraries = ['openblas'] > library_dirs = ['/home/skipper/.local/lib'] > language = f77 > > FOUND: > libraries = ['openblas'] > library_dirs = ['/home/skipper/.local/lib'] > define_macros = [('NO_ATLAS_INFO', 1)] > language = f77 > > non-existing path in 'numpy/lib': 'benchmarks' > lapack_opt_info: > lapack_mkl_info: > mkl_info: > libraries mkl,vml,guide not found in ['/usr/local/lib64', '/usr/local/lib', '/usr/lib64', '/usr/lib', '/usr/lib/x86_64-linux-gnu'] > NOT AVAILABLE > > NOT AVAILABLE > > atlas_threads_info: > Setting PTATLAS=ATLAS > libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib64 > libraries lapack_atlas not found in /usr/local/lib64 > libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib > libraries lapack_atlas not found in /usr/local/lib > libraries ptf77blas,ptcblas,atlas not found in /usr/lib64 > libraries lapack_atlas not found in /usr/lib64 > libraries 
ptf77blas,ptcblas,atlas not found in /usr/lib > libraries lapack_atlas not found in /usr/lib > libraries ptf77blas,ptcblas,atlas not found in /usr/lib/x86_64-linux-gnu > libraries lapack_atlas not found in /usr/lib/x86_64-linux-gnu > numpy.distutils.system_info.atlas_threads_info > NOT AVAILABLE > > atlas_info: > libraries f77blas,cblas,atlas not found in /usr/local/lib64 > libraries lapack_atlas not found in /usr/local/lib64 > libraries f77blas,cblas,atlas not found in /usr/local/lib > libraries lapack_atlas not found in /usr/local/lib > libraries f77blas,cblas,atlas not found in /usr/lib64 > libraries lapack_atlas not found in /usr/lib64 > libraries f77blas,cblas,atlas not found in /usr/lib > libraries lapack_atlas not found in /usr/lib > libraries f77blas,cblas,atlas not found in /usr/lib/x86_64-linux-gnu > libraries lapack_atlas not found in /usr/lib/x86_64-linux-gnu > numpy.distutils.system_info.atlas_info > NOT AVAILABLE > > /home/skipper/src/numpy-skipper/numpy/distutils/system_info.py:1415: UserWarning: > Atlas (http://math-atlas.sourceforge.net/) libraries not found. > Directories to search for the libraries can be specified in the > numpy/distutils/site.cfg file (section [atlas]) or by setting > the ATLAS environment variable. > warnings.warn(AtlasNotFoundError.__doc__) > lapack_info: > Replacing _lib_names[0]=='lapack' with 'openblas' > Replacing _lib_names[0]=='openblas' with 'openblas' > FOUND: > libraries = ['openblas'] > library_dirs = ['/home/skipper/.local/lib'] > language = f77 > > FOUND: > libraries = ['openblas', 'openblas'] > library_dirs = ['/home/skipper/.local/lib'] > define_macros = [('NO_ATLAS_INFO', 1)] > language = f77 > > running config > ``` > > python setup.py build &> build.log > > Build log is here. Obviously it didn't go well, but I don't see anything to indicate problems. Sometimes I am able to get _dotblas.so built, though I don't know what causes it. This time I wasn't. 
> > https://gist.github.com/jseabold/7054ba9d85eae09eb402#file-numpy_build-log > > sudo python setup.py install &> install.log > > https://gist.github.com/jseabold/a0f5638b65d44aeff598#file-numpy_install-log > > >>> import numpy as np > Traceback (most recent call last): > File "", line 1, in > File "/usr/local/lib/python2.7/dist-packages/numpy/__init__.py", line 138, in > import add_newdocs > File "/usr/local/lib/python2.7/dist-packages/numpy/add_newdocs.py", line 13, in > from numpy.lib import add_newdoc > File "/usr/local/lib/python2.7/dist-packages/numpy/lib/__init__.py", line 15, in > from polynomial import * > File "/usr/local/lib/python2.7/dist-packages/numpy/lib/polynomial.py", line 19, in > from numpy.linalg import eigvals, lstsq, inv > File "/usr/local/lib/python2.7/dist-packages/numpy/linalg/__init__.py", line 50, in > from linalg import * > File "/usr/local/lib/python2.7/dist-packages/numpy/linalg/linalg.py", line 25, in > from numpy.linalg import lapack_lite > ImportError: libopenblas.so.0: cannot open shared object file: No such file or directory > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Sat Mar 23 15:19:43 2013 From: jsseabold at gmail.com (Skipper Seabold) Date: Sat, 23 Mar 2013 15:19:43 -0400 Subject: [Numpy-discussion] Unable to building numpy with openblas using bento or distutils In-Reply-To: <128E07FF-9B6C-4F47-AEFD-43752010E8BB@phys.ethz.ch> References: <128E07FF-9B6C-4F47-AEFD-43752010E8BB@phys.ethz.ch> Message-ID: On Sat, Mar 23, 2013 at 2:32 PM, Hanno Klemm wrote: > Skipper, > this looks like a problem that I had in the bad old days with ATLAS, as > well. Try compiling openblas with the -fPIC flag that used to help. > > Thanks for having a look. 
I checked after seeing that odd bento failure (see here [1]), and it looks to me like OpenBlas uses the -fPIC flag in all of the gcc and gfortran calls. Possible related? [2] Skipper [1] https://github.com/cournape/Bento/issues/116 [2] https://github.com/cournape/Bento/issues/128 > Best of luck, > Hanno > > hanno.klemm at me.com > > Sent from my mobile device, please excuse my brevity. > > On 23.03.2013, at 19:19, Skipper Seabold wrote: > > Some help on this would be greatly appreciated. It's been recommended to > use OpenBlas over ATLAS, so I've been trying to build numpy with openblas > and have run into a few problems. > > 1) Build fails using bento master and waf 1.7.9, see below. > 2) Distutils doesn't seem to be able to find lapack as part of atlas. I > tried to skip a site.cfg and define environmental variables. No idea what I > missed. > > I followed instructions found scattered over the internet and only > understand vaguely the issues. Maybe someone can help. I'd be happy to > update the wiki with any answers. > > To truly support OpenBlas, is it maybe necessary to make some additions to > numpy/distutils/system_info.py? > > Thanks for having a look, > > Skipper > > Install OpenBlas > ----------------------------- > git clone git://github.com/xianyi/OpenBLAS > cd OpenBlas > > Edit c_check to look for libpthreads in the right place (Kubuntu 12.10) > > |4 $ git diff c_check > ``` > diff --git a/c_check b/c_check > index 4d82237..de0fd33 100644 > --- a/c_check > +++ b/c_check > @@ -241,7 +241,7 @@ print CONFFILE "#define FUNDERSCORE\t$need_fu\n" if > $need_fu > > if ($os eq "LINUX") { > > - @pthread = split(/\s+/, `nm /lib/libpthread.so* | grep > _pthread_create`); > + @pthread = split(/\s+/, `nm /lib/x86_64-linux-gnu/libpthread.so* | > grep _pthread_create`); > > if ($pthread[2] ne "") { > print CONFFILE "#define PTHREAD_CREATE_FUNC $pthread[2]\n"; > ``` > > make fc=gfortran > make PREFIX=~/.local install > > Everything looks ok, so far. 
> [...]
> > https://gist.github.com/jseabold/7054ba9d85eae09eb402#file-numpy_build-log > > sudo python setup.py install &> install.log > > > https://gist.github.com/jseabold/a0f5638b65d44aeff598#file-numpy_install-log > > >>> import numpy as np > Traceback (most recent call last): > File "", line 1, in > File "/usr/local/lib/python2.7/dist-packages/numpy/__init__.py", line > 138, in > import add_newdocs > File "/usr/local/lib/python2.7/dist-packages/numpy/add_newdocs.py", line > 13, in > from numpy.lib import add_newdoc > File "/usr/local/lib/python2.7/dist-packages/numpy/lib/__init__.py", > line 15, in > from polynomial import * > File "/usr/local/lib/python2.7/dist-packages/numpy/lib/polynomial.py", > line 19, in > from numpy.linalg import eigvals, lstsq, inv > File "/usr/local/lib/python2.7/dist-packages/numpy/linalg/__init__.py", > line 50, in > from linalg import * > File "/usr/local/lib/python2.7/dist-packages/numpy/linalg/linalg.py", > line 25, in > from numpy.linalg import lapack_lite > ImportError: libopenblas.so.0: cannot open shared object file: No such > file or directory > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjwilliams43 at gmail.com Sat Mar 23 15:36:32 2013 From: cjwilliams43 at gmail.com (Colin J. Williams) Date: Sat, 23 Mar 2013 15:36:32 -0400 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: References: <5149BDBB.6060509@ncf.ca> <5149CF4D.6090906@gmail.com> <514CCF97.7030902@gmail.com> <514DBEA6.6080504@gmail.com> Message-ID: <514E0440.5010404@gmail.com> An HTML attachment was scrubbed... 
URL: From cjwilliams43 at gmail.com Sat Mar 23 15:47:53 2013 From: cjwilliams43 at gmail.com (Colin J. Williams) Date: Sat, 23 Mar 2013 15:47:53 -0400 Subject: [Numpy-discussion] Execution time difference between 2.7 and 3.2 using numpy In-Reply-To: References: <5149BDBB.6060509@ncf.ca> <5149CF4D.6090906@gmail.com> <514CCF97.7030902@gmail.com> Message-ID: <514E06E9.3050001@gmail.com> An HTML attachment was scrubbed... URL: From ake.sandgren at hpc2n.umu.se Sat Mar 23 19:26:46 2013 From: ake.sandgren at hpc2n.umu.se (Ake Sandgren) Date: Sun, 24 Mar 2013 00:26:46 +0100 Subject: [Numpy-discussion] Unable to building numpy with openblas using bento or distutils In-Reply-To: References: Message-ID: <1364081206.2948.75.camel@skalman.ydc.se> On Sat, 2013-03-23 at 14:19 -0400, Skipper Seabold wrote: > Some help on this would be greatly appreciated. It's been recommended > to use OpenBlas over ATLAS, so I've been trying to build numpy with > openblas and have run into a few problems. > > To truly support OpenBlas, is it maybe necessary to make some > additions to numpy/distutils/system_info.py? Here is how. https://github.com/akesandgren/numpy/commit/363339dd3a9826f3e3e7dc4248c258d3c4dfcd7c From jsseabold at gmail.com Sat Mar 23 20:44:24 2013 From: jsseabold at gmail.com (Skipper Seabold) Date: Sat, 23 Mar 2013 20:44:24 -0400 Subject: [Numpy-discussion] Unable to building numpy with openblas using bento or distutils In-Reply-To: <1364081206.2948.75.camel@skalman.ydc.se> References: <1364081206.2948.75.camel@skalman.ydc.se> Message-ID: On Sat, Mar 23, 2013 at 7:26 PM, Ake Sandgren wrote: > On Sat, 2013-03-23 at 14:19 -0400, Skipper Seabold wrote: >> Some help on this would be greatly appreciated. It's been recommended >> to use OpenBlas over ATLAS, so I've been trying to build numpy with >> openblas and have run into a few problems. > >> >> To truly support OpenBlas, is it maybe necessary to make some >> additions to numpy/distutils/system_info.py? > > Here is how. 
> > https://github.com/akesandgren/numpy/commit/363339dd3a9826f3e3e7dc4248c258d3c4dfcd7c

Thanks, that works well for numpy. Tests pass. I hope that makes it into a pull request. My site.cfg looks like this (I don't know about the lapack_opt section; it doesn't seem to work):

[DEFAULT]
library_dirs = /home/skipper/.local/lib
include_dirs = /home/skipper/.local/include

[openblas]
libraries = openblas

[lapack_opt]
libraries = openblas

Do you have any idea how to get scipy working too? I have a similar site.cfg, but it does not find lapack, which is rolled into libopenblas from what I understand. I can do

export LAPACK=~/.local/lib/libopenblas.a
python setup.py build &> build.log
sudo -E python setup.py install

There are no obvious failures in the build.log, but scipy is still broken because it needs lapack from numpy I guess.
import blas File "/usr/local/lib/python2.7/dist-packages/scipy/linalg/blas.py", line 113, in from scipy.linalg import _fblas ImportError: libopenblas.so.0: cannot open shared object file: No such file or directory Skipper From jsseabold at gmail.com Sat Mar 23 21:06:48 2013 From: jsseabold at gmail.com (Skipper Seabold) Date: Sat, 23 Mar 2013 21:06:48 -0400 Subject: [Numpy-discussion] Unable to building numpy with openblas using bento or distutils In-Reply-To: References: <1364081206.2948.75.camel@skalman.ydc.se> Message-ID: On Sat, Mar 23, 2013 at 8:44 PM, Skipper Seabold wrote: > On Sat, Mar 23, 2013 at 7:26 PM, Ake Sandgren wrote: >> On Sat, 2013-03-23 at 14:19 -0400, Skipper Seabold wrote: >>> Some help on this would be greatly appreciated. It's been recommended >>> to use OpenBlas over ATLAS, so I've been trying to build numpy with >>> openblas and have run into a few problems. >> >>> >>> To truly support OpenBlas, is it maybe necessary to make some >>> additions to numpy/distutils/system_info.py? >> >> Here is how. >> >> https://github.com/akesandgren/numpy/commit/363339dd3a9826f3e3e7dc4248c258d3c4dfcd7c >> > > > Thanks that works well for numpy. Test pass. I hope that makes it into > a pull request. My site.cfg looks like this. I don't know about the > lapack_opt section. It doesn't seem to work. > > [DEFAULT] > library_dirs = /home/skipper/.local/lib > include_dirs = /home/skipper/.local/include > > [openblas] > libraries = openblas > > [lapack_opt] > libraries = openblas > > Do you have any idea how to get scipy working too. I have a similar > site.cfg, but it does not find lapack, which is rolled into > libopenblas from what I understand. I can do > > export LAPACK=~/.local/lib/libopenblas.a > python setup.py build &> build.log > sudo -E python setup.py install > > There are no obvious failures in the build.log, but scipy is still > broken because it needs lapack from numpy I guess. 
The answer is to

export BLAS=~/.local/lib/libopenblas.a
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/.local/lib/

before building and installing. Now everything works. Whew. Thanks a lot for the help.

> [...]
>
> Skipper

From sebastian at sipsolutions.net Sun Mar 24 08:21:30 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sun, 24 Mar 2013 13:21:30 +0100 Subject: [Numpy-discussion] NumPy/SciPy participation in GSoC 2013 In-Reply-To: References: Message-ID: <1364127690.12566.39.camel@sebastian-laptop> On Thu, 2013-03-21 at 22:20 +0100, Ralf Gommers wrote:
> Hi all,
>
> It is the time of the year for Google Summer of Code applications.
> If we want to participate with Numpy and/or Scipy, we need two things:
> enough mentors and ideas for projects. If we get those, we'll apply
> under the PSF umbrella. They've outlined the timeline they're working
> by and guidelines at
> http://pyfound.blogspot.nl/2013/03/get-ready-for-google-summer-of-code.html.
>
> We should be able to come up with some interesting project ideas I'd
> think, let's put those at
> http://projects.scipy.org/scipy/wiki/SummerofCodeIdeas. Preferably
> with enough detail to be understandable for people new to the projects
> and a proposed mentor.

Just some more ideas for numpy. I did not think much about whether they fit the GSoC format well, but maybe a possible mentor likes one:

1. Speed improvements for scalars/small arrays. This would start with ideas along the lines of two current pull requests for numpy that try to improve array + python scalar speed by circumventing costly scalar -> array conversions, etc., and would continue with improving the speed of finding the correct ufunc (which I believe Nathaniel timed to be a pretty big factor). But it would probably touch a lot of numpy internals, so the learning curve may be pretty steep.

2. Implement stable summation. Basically it would be about creating generalized ufuncs (if that is possible) implementing different kinds of stable summation algorithms for the inexact types, and then adding that as an option to np.sum.

3. This has been suggested before in some way or another: improving the subclassing of arrays. Though I am unsure whether user code might dislike changes, even if they are improvements... It would start off with checking which Python-side functions should explicitly call __array_wrap__ (possibly writing more helpers to do it) and calling it more consistently, plus adding the context information where it is currently not added (only simple ufunc calls add it, not even reductions I think).
I am sure you can dig a lot deeper into it all, but it would require some serious thinking and is not straightforward.

4. Partial sorting. This would be about implementing partial sorting and O(N) median calculation in numpy, plus maybe new functions that can make use of it (though I don't know exactly what those would be, and they could also have a home in scipy rather than numpy).

> We need at least 3 people willing to mentor a student. Ideally we'd
> have enough mentors this week, so we can apply to the PSF on time. If
> you're willing to be a mentor, please send me the following: name,
> email address, phone nr, and what you're interested in mentoring. If
> you have time constraints and have doubts about being able to be a
> primary mentor, being a backup mentor would also be helpful.
>
> Cheers,
> Ralf
>
> P.S. as you can probably tell from the above, I'm happy to coordinate
> the GSoC applications for Numpy and Scipy
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

From marc.gronle at ito.uni-stuttgart.de Sun Mar 24 10:53:40 2013 From: marc.gronle at ito.uni-stuttgart.de (Marc Gronle) Date: Sun, 24 Mar 2013 15:53:40 +0100 Subject: [Numpy-discussion] C-API: Subclassing array for numpy 1.7 (with #define NPY_NO_DEPRECATED_API 0x00000007) Message-ID:

Hello everyone,

we embedded Python 3 in a C++ environment. In this application I created a new class that is a subclass of numpy's array type. Until now (numpy 1.6, or numpy 1.7 without the deprecation define NPY_NO_DEPRECATED_API), the typedef describing my class was something like

typedef struct {
    PyArrayObject numpyArray;
    int myMember1;
    int myMember2;
    ...
} PySubclassObject;

This always worked for me, and a call to PyArray_NDIM(obj) returned the number of dimensions, where obj is of type PySubclassObject*.
With numpy 1.7 this also works, as long as the line #define NPY_NO_DEPRECATED_API 0x00000007 is not set. However, once I removed all deprecated content, the first runtime error occurs when creating an instance of PySubclassObject. This is because PyArrayObject is now only a tiny typedef of the following form:

typedef struct tagPyArrayObject {
    PyObject_HEAD
} PyArrayObject;

Previously it was something like:

typedef struct tagPyArrayObject {
    PyObject_HEAD
    char *data;
    int nd;
    npy_intp *dimensions;
    ...
} PyArrayObject;

Usually, when creating a plain np.array, extra space is allocated depending on the size of PyArrayObject_fields. In my subclass, however, I don't know how to add that extra space between the members numpyArray and myMember1. Finally, when calling PyArray_NDIM(obj) as above, the obj pointer is cast inside the macro to PyArrayObject_fields*. This yields an access conflict between myMember1, myMember2, ... and the members of PyArrayObject_fields. I hope this description is clear enough. Does anybody have an idea what I need to change so that the subclassing also works with the new numpy structure? Thanks for any answer.

Cheers,
Marc

-------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Sun Mar 24 12:50:59 2013 From: jaime.frio at gmail.com (=?ISO-8859-1?Q?Jaime_Fern=E1ndez_del_R=EDo?=) Date: Sun, 24 Mar 2013 09:50:59 -0700 Subject: [Numpy-discussion] Generalized inner?
Message-ID:

The other day I found myself finding trailing edges in binary images doing something like this:

import numpy as np
from numpy.lib.stride_tricks import as_strided

arr = np.random.randint(2, size=1000).astype(np.int8)
pattern = np.array([1, 1, 1, 1, 0, 0])
# Map {0, 1} to {-1, +1} so an inner product counts matching positions.
arr_match = 2*arr - 1
pat_match = 2*pattern - 1
# Sliding windows over the last axis, one window per possible offset.
arr_win = as_strided(arr_match,
                     shape=arr.shape[:-1] + (arr.shape[-1]-len(pattern)+1, len(pattern)),
                     strides=arr.strides+arr.strides[-1:])
# A window matches exactly when every position contributes +1.
matches = np.einsum('...i, i', arr_win, pat_match) == len(pattern)

While this works fine, it led me to thinking that all these functions (inner, dot, einsum, tensordot...) could be generalized to ufuncs other than a pointwise np.multiply followed by an np.add reduction. It would be great if there were an np.gen_inner that allowed something like:

np.gen_inner(arr_win, pattern, pointwise=np.equal, reduce=np.logical_and)

I would like to think that such a generalization would be useful in other settings (although I can't think of any right now), and that it could find its place in numpy, rather than in scipy.ndimage or the like. Does this make any sense? Is there an already existing way of doing this that I'm overlooking?

Jaime

-- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.

-------------- next part -------------- An HTML attachment was scrubbed... URL: From ondrej.certik at gmail.com Sun Mar 24 13:36:23 2013 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Sun, 24 Mar 2013 18:36:23 +0100 Subject: [Numpy-discussion] Numpy 1.7.1 In-Reply-To: References: Message-ID: On Fri, Mar 22, 2013 at 1:02 AM, Charles R Harris wrote:
> The Numpy 1.7.1 release process seems to have stalled.

My apologies for that.

> What do we need to finish up to get it going again? I think it would be
> nice to shoot for a release maybe the weekend after next.

I think just the release notes need to be written, which I am doing right now. Then I release 1.7.1rc1.
If more things need to be merged, I can do 1.7.1rc2. Or we can later release 1.7.2, depending on the result of the discussion of the plan here: https://github.com/numpy/numpy/issues/3158 Ondrej From nadavh at visionsense.com Sun Mar 24 14:32:03 2013 From: nadavh at visionsense.com (Nadav Horesh) Date: Sun, 24 Mar 2013 18:32:03 +0000 Subject: [Numpy-discussion] Generalized inner? In-Reply-To: References: Message-ID: <3vuqt5ng51j5gts026ebpk53.1364149918865@email.android.com> This is what APL's . operator does, and I found it useful from time to time (but I was much younger then). Nadav Jaime Fernández del Río wrote: The other day I found myself finding trailing edges in binary images doing something like this: arr = np.random.randint(2, size=1000).astype(np.int8) pattern = np.array([1, 1, 1, 1, 0, 0]) arr_match = 2*arr - 1 pat_match = 2*pattern - 1 from numpy.lib.stride_tricks import as_strided arr_win = as_strided(arr_match, shape=arr.shape[:-1] + (arr.shape[-1]-len(pattern)+1, len(pattern)), strides=arr.strides+arr.strides[-1:]) matches = np.einsum('...i, i', arr_win, pat_match) == len(pattern) While this works fine, this led me to thinking that all these functions (inner, dot, einsum, tensordot...) could be generalized to any other ufuncs apart from a pointwise np.multiply followed by an np.add reduction. It would be great if there were an np.gen_inner that allowed something like: np.gen_inner(arr_win, pattern, pointwise=np.equal, reduce=np.logical_and) I would like to think that such a generalization would be useful in other settings (although I can't think of any right now), and that it could find its place in numpy, rather than in scipy.ndimage or the like. Does this make any sense? Is there any already existing way of doing this that I'm overlooking? Jaime -- (\__/) ( O.o) ( > <) This is Conejo. Copy Conejo into your signature and help him with his plans for world domination. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ondrej.certik at gmail.com Sun Mar 24 17:02:53 2013 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Sun, 24 Mar 2013 22:02:53 +0100 Subject: [Numpy-discussion] ANN: NumPy 1.7.1rc1 release Message-ID: Hi, I'm pleased to announce the availability of the first release candidate of NumPy 1.7.1rc1. Sources and binary installers can be found at https://sourceforge.net/projects/numpy/files/NumPy/1.7.1rc1/ Please test it and report any bugs. It fixes a few bugs, listed below. I would like to thank everybody who contributed patches to this release: Nathaniel J. Smith, Sebastian Berg, Charles Harris, Bradley M. Froehle, Ralf Gommers, Christoph Gohlke, Mark Wiebe and Maximilian Albert. Cheers, Ondrej ========================= NumPy 1.7.1 Release Notes ========================= This is a bugfix only release in the 1.7.x series. Issues fixed ------------ gh-2973 Fix `1` is printed during numpy.test() gh-2983 BUG: gh-2969: Backport memory leak fix 80b3a34. gh-3007 Backport gh-3006 gh-2984 Backport fix complex polynomial fit gh-2982 BUG: Make nansum work with booleans. gh-2985 Backport large sort fixes gh-3039 Backport object take gh-3105 Backport nditer fix op axes initialization gh-3108 BUG: npy-pkg-config ini files were missing after Bento build. gh-3124 BUG: PyArray_LexSort allocates too much temporary memory. gh-3131 BUG: Exported f2py_size symbol prevents linking multiple f2py modules. gh-3117 Backport gh-2992 gh-3135 DOC: Add mention of PyArray_SetBaseObject stealing a reference gh-3134 DOC: Fix typo in fft docs (the indexing variable is 'm', not 'n'). 
gh-3136 Backport #3128 Checksums ========= 28c3f3e71b5eaa6bfab6e8340dbd35e7 release/installers/numpy-1.7.1rc1.tar.gz 436f416dee10d157314bd9da7ab95c9c release/installers/numpy-1.7.1rc1-win32-superpack-python2.7.exe a543c8cf69f66ff2b4c9565646105863 release/installers/numpy-1.7.1rc1-win32-superpack-python2.6.exe 6dfcbbd449b7fe4e841c5fd1bfa7af7c release/installers/numpy-1.7.1rc1-win32-superpack-python2.5.exe 22912792a1b6155ae2bdbc30bee8fadc release/installers/numpy-1.7.1rc1-win32-superpack-python3.2.exe 95bc5a5fcce9fcbc2717a774dccae31b release/installers/numpy-1.7.1rc1-win32-superpack-python3.3.exe 33cf283765a148846b49b89fb96d67d5 release/installers/numpy-1.7.1rc1-win32-superpack-python3.1.exe 9761de4b35493fed38c5d177da9c3b37 release/installers/numpy-1.7.1rc1.zip From sergio.pasra at gmail.com Sun Mar 24 17:46:56 2013 From: sergio.pasra at gmail.com (Sergio Pascual) Date: Sun, 24 Mar 2013 22:46:56 +0100 Subject: [Numpy-discussion] howto apply-along-axis? In-Reply-To: References: Message-ID: This is the closest I got to what you describe: http://numpy-discussion.10968.n7.nabble.com/Reductions-with-nditer-working-only-with-the-last-axis-td8157.html It converts a 3D array to 2D, but only works on the last axis. Any improvement would be welcome. 2013/3/22 Neal Becker > I frequently find I have a 1d function that performs some reduction that > I'd > like to apply along some axis of an n-d array. > > As a trivial example, > > def sum(u): > return np.sum (u) > > In this case the function is probably C/C++ code, but that is irrelevant (I > think). > > Is there a reasonably efficient way to do this within numpy? > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From charlesr.harris at gmail.com Sun Mar 24 23:00:47 2013 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 24 Mar 2013 21:00:47 -0600 Subject: [Numpy-discussion] ANN: NumPy 1.7.1rc1 release In-Reply-To: References: Message-ID: On Sun, Mar 24, 2013 at 3:02 PM, Ondřej Čertík wrote: > Hi, > > I'm pleased to announce the availability of the first release candidate of > NumPy 1.7.1rc1. > > Sources and binary installers can be found at > https://sourceforge.net/projects/numpy/files/NumPy/1.7.1rc1/ > > Please test it and report any bugs. It fixes a few bugs, listed below. > > I would like to thank everybody who contributed patches to this release: > Nathaniel J. Smith, Sebastian Berg, Charles Harris, Bradley M. Froehle, > Ralf Gommers, Christoph Gohlke, Mark Wiebe and Maximilian Albert. > > Cheers, > Ondrej > > > > ========================= > NumPy 1.7.1 Release Notes > ========================= > > This is a bugfix only release in the 1.7.x series. > > > Issues fixed > ------------ > > gh-2973 Fix `1` is printed during numpy.test() > gh-2983 BUG: gh-2969: Backport memory leak fix 80b3a34. > gh-3007 Backport gh-3006 > gh-2984 Backport fix complex polynomial fit > gh-2982 BUG: Make nansum work with booleans. > gh-2985 Backport large sort fixes > gh-3039 Backport object take > gh-3105 Backport nditer fix op axes initialization > gh-3108 BUG: npy-pkg-config ini files were missing after Bento build. > gh-3124 BUG: PyArray_LexSort allocates too much temporary memory. > gh-3131 BUG: Exported f2py_size symbol prevents linking multiple f2py > modules. > gh-3117 Backport gh-2992 > gh-3135 DOC: Add mention of PyArray_SetBaseObject stealing a reference > gh-3134 DOC: Fix typo in fft docs (the indexing variable is 'm', not > 'n').
> gh-3136 Backport #3128 > > Checksums > ========= > > 28c3f3e71b5eaa6bfab6e8340dbd35e7 release/installers/numpy-1.7.1rc1.tar.gz > 436f416dee10d157314bd9da7ab95c9c > release/installers/numpy-1.7.1rc1-win32-superpack-python2.7.exe > a543c8cf69f66ff2b4c9565646105863 > release/installers/numpy-1.7.1rc1-win32-superpack-python2.6.exe > 6dfcbbd449b7fe4e841c5fd1bfa7af7c > release/installers/numpy-1.7.1rc1-win32-superpack-python2.5.exe > 22912792a1b6155ae2bdbc30bee8fadc > release/installers/numpy-1.7.1rc1-win32-superpack-python3.2.exe > 95bc5a5fcce9fcbc2717a774dccae31b > release/installers/numpy-1.7.1rc1-win32-superpack-python3.3.exe > 33cf283765a148846b49b89fb96d67d5 > release/installers/numpy-1.7.1rc1-win32-superpack-python3.1.exe > 9761de4b35493fed38c5d177da9c3b37 release/installers/numpy-1.7.1rc1.zip Great. The fix for the memory leak should make some folks happy. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgohlke at uci.edu Sun Mar 24 23:40:58 2013 From: cgohlke at uci.edu (Christoph Gohlke) Date: Sun, 24 Mar 2013 20:40:58 -0700 Subject: [Numpy-discussion] ANN: NumPy 1.7.1rc1 release In-Reply-To: References: Message-ID: <514FC74A.90705@uci.edu> On 3/24/2013 2:02 PM, Ondřej Čertík wrote: > Hi, > > I'm pleased to announce the availability of the first release candidate of > NumPy 1.7.1rc1. > > Sources and binary installers can be found at > https://sourceforge.net/projects/numpy/files/NumPy/1.7.1rc1/ > > Please test it and report any bugs. It fixes a few bugs, listed below. > > I would like to thank everybody who contributed patches to this release: > Nathaniel J. Smith, Sebastian Berg, Charles Harris, Bradley M. Froehle, > Ralf Gommers, Christoph Gohlke, Mark Wiebe and Maximilian Albert. > > Cheers, > Ondrej > > > > ========================= > NumPy 1.7.1 Release Notes > ========================= > > This is a bugfix only release in the 1.7.x series.
> > > Issues fixed > ------------ > > gh-2973 Fix `1` is printed during numpy.test() > gh-2983 BUG: gh-2969: Backport memory leak fix 80b3a34. > gh-3007 Backport gh-3006 > gh-2984 Backport fix complex polynomial fit > gh-2982 BUG: Make nansum work with booleans. > gh-2985 Backport large sort fixes > gh-3039 Backport object take > gh-3105 Backport nditer fix op axes initialization > gh-3108 BUG: npy-pkg-config ini files were missing after Bento build. > gh-3124 BUG: PyArray_LexSort allocates too much temporary memory. > gh-3131 BUG: Exported f2py_size symbol prevents linking multiple f2py > modules. > gh-3117 Backport gh-2992 > gh-3135 DOC: Add mention of PyArray_SetBaseObject stealing a reference > gh-3134 DOC: Fix typo in fft docs (the indexing variable is 'm', not 'n'). > gh-3136 Backport #3128 > > Checksums > ========= > > 28c3f3e71b5eaa6bfab6e8340dbd35e7 release/installers/numpy-1.7.1rc1.tar.gz > 436f416dee10d157314bd9da7ab95c9c > release/installers/numpy-1.7.1rc1-win32-superpack-python2.7.exe > a543c8cf69f66ff2b4c9565646105863 > release/installers/numpy-1.7.1rc1-win32-superpack-python2.6.exe > 6dfcbbd449b7fe4e841c5fd1bfa7af7c > release/installers/numpy-1.7.1rc1-win32-superpack-python2.5.exe > 22912792a1b6155ae2bdbc30bee8fadc > release/installers/numpy-1.7.1rc1-win32-superpack-python3.2.exe > 95bc5a5fcce9fcbc2717a774dccae31b > release/installers/numpy-1.7.1rc1-win32-superpack-python3.3.exe > 33cf283765a148846b49b89fb96d67d5 > release/installers/numpy-1.7.1rc1-win32-superpack-python3.1.exe > 9761de4b35493fed38c5d177da9c3b37 release/installers/numpy-1.7.1rc1.zip Hello, test_exec_command_stderr fails on Python 3.x for Windows (msvc/MKL builds): https://github.com/numpy/numpy/issues/3165 -- Christoph From ndbecker2 at gmail.com Mon Mar 25 08:24:23 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Mon, 25 Mar 2013 08:24:23 -0400 Subject: [Numpy-discussion] picking elements with boolean masks Message-ID: starting with a NxM array, I want to select elements of the 
array using a set of boolean masks. The masks are simply where the indexes have a 0 or 1 in the corresponding bit position. For example, consider the case where M = 4. all_syms = np.arange (4) all_bits = np.arange (2) bit_mask = (all_syms[:,np.newaxis] >> all_bits) & 1 mask0 = bit_mask == 0 mask1 = bit_mask == 1 Maybe there's a more straightforward way to generate these masks. That's not my question. In [331]: mask1 Out[331]: array([[False, False], [ True, False], [False, True], [ True, True]], dtype=bool) OK, now I want to use this mask on D In [333]: D.shape Out[333]: (32400, 4) Just to simplify, let's just try the first row of D In [336]: D[0] Out[336]: array([ 0., 2., 2., 4.]) In [335]: D[0][mask1[...,0]] Out[335]: array([ 2., 4.]) that worked fine. But I want not just to apply one of the masks in the set (mask1 is [4,2], it has 2 masks), I want the results of applying all the masks (2 in this case) In [334]: D[0][mask1] --------------------------------------------------------------------------- ValueError Traceback (most recent call last) in () ----> 1 D[0][mask1] ValueError: boolean index array should have 1 dimension Any ideas what's the best approach here? From ndbecker2 at gmail.com Mon Mar 25 08:50:42 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Mon, 25 Mar 2013 08:50:42 -0400 Subject: [Numpy-discussion] picking elements with boolean masks References: Message-ID: Neal Becker wrote: > starting with a NxM array, I want to select elements of the array using a set > of > boolean masks. The masks are simply where the indexes have a 0 or 1 in the > corresponding bit position. For example, consider the case where M = 4. > > all_syms = np.arange (4) > all_bits = np.arange (2) > bit_mask = (all_syms[:,np.newaxis] >> all_bits) & 1 > mask0 = bit_mask == 0 > mask1 = bit_mask == 1 > > Maybe there's a more straightforward way to generate these masks. That's not > my question. 
> > In [331]: mask1 > Out[331]: > array([[False, False], > [ True, False], > [False, True], > [ True, True]], dtype=bool) > > OK, now I want to use this mask on D > In [333]: D.shape > Out[333]: (32400, 4) > > Just to simplify, let's just try the first row of D > > In [336]: D[0] > Out[336]: array([ 0., 2., 2., 4.]) > > In [335]: D[0][mask1[...,0]] > Out[335]: array([ 2., 4.]) > > that worked fine. But I want not just to apply one of the masks in the set > (mask1 is [4,2], it has 2 masks), I want the results of applying all the masks > (2 in this case) > > > In [334]: D[0][mask1] > --------------------------------------------------------------------------- > ValueError Traceback (most recent call last) > in () > ----> 1 D[0][mask1] > > ValueError: boolean index array should have 1 dimension > > Any ideas what's the best approach here? Perhaps what I need is to use integer indexing, rather than boolean. all_syms = np.arange (const.size) all_bits = np.arange (BITS_PER_SYM) bit_mask = (all_syms[:,np.newaxis] >> all_bits) & 1 ind = np.array ([np.nonzero (bit_mask[...,i])[0] for i in range (BITS_PER_SYM)]) In [366]: ind Out[366]: array([[1, 3], [2, 3]]) So now we have the 1-d indexes of the elements we want to select from D. D = np.arange (4)+1 In [376]: D Out[376]: array([1, 2, 3, 4]) In [377]: D[ind] Out[377]: array([[2, 4], [3, 4]]) Looks like that does the job From dineshbvadhia at hotmail.com Mon Mar 25 11:23:52 2013 From: dineshbvadhia at hotmail.com (Dinesh B Vadhia) Date: Mon, 25 Mar 2013 08:23:52 -0700 Subject: [Numpy-discussion] variables not defined in numpy.random __init.py__ ? 
Message-ID: Using PyInstaller, the following error occurs: Traceback (most recent call last): File "", line 9, in <module> File "//usr/lib/python2.7/dist-packages/PIL/Image.py", line 355, in init __import__(f, globals(), locals(), []) File "//usr/lib/python2.7/dist-packages/PIL/IptcImagePlugin.py", line 23, in <module> import os, tempfile File "/usr/lib/python2.7/tempfile.py", line 34, in <module> from random import Random as _Random File "//usr/lib/python2.7/dist-packages/numpy/random/__init__.py", line 90, in <module> ranf = random = sample = random_sample NameError: name 'random_sample' is not defined Is line 90 in __init.py__ valid? -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Mon Mar 25 13:04:01 2013 From: gael.varoquaux at normalesup.org (=?iso-8859-1?Q?Ga=EBl?= Varoquaux) Date: Mon, 25 Mar 2013 18:04:01 +0100 Subject: [Numpy-discussion] numpy array to C API In-Reply-To: <20130321163451.GE12061@kudu.in-berlin.de> References: <514B328B.40107@syntonetic.com> <20130321163451.GE12061@kudu.in-berlin.de> Message-ID: <20130325170401.GF20550@phare.normalesup.org> On Thu, Mar 21, 2013 at 05:34:51PM +0100, Valentin Haenel wrote: > > I got curious about the Ctypes approach as well as "Gaël Varoquaux's > > blog post about avoiding data copies", but the link in the article > > didn't seem to work. (Under "Further Reading and References") > There seems to be something wrong with Gaël's website. I have CC'd him, > maybe he can fix it. Thanks Valentin! I believe that I have fixed the problem. Soren, if you still have difficulties accessing the material, please complain.
Cheers, Gaël From dineshbvadhia at hotmail.com Mon Mar 25 14:40:58 2013 From: dineshbvadhia at hotmail.com (Dinesh B Vadhia) Date: Mon, 25 Mar 2013 11:40:58 -0700 Subject: [Numpy-discussion] Unable to build numpy with openblas using bento or distutils In-Reply-To: <128E07FF-9B6C-4F47-AEFD-43752010E8BB@phys.ethz.ch> References: <128E07FF-9B6C-4F47-AEFD-43752010E8BB@phys.ethz.ch> Message-ID: Caveat: Not tested but it did look interesting: http://osdf.github.com/blog/numpyscipy-with-openblas-for-ubuntu-1204-second-try.html. Would be interested to know if it worked out, as I want to try out OpenBLAS in the future. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Mon Mar 25 14:52:41 2013 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 25 Mar 2013 14:52:41 -0400 Subject: [Numpy-discussion] Unable to build numpy with openblas using bento or distutils In-Reply-To: References: <128E07FF-9B6C-4F47-AEFD-43752010E8BB@phys.ethz.ch> Message-ID: On Mon, Mar 25, 2013 at 2:40 PM, Dinesh B Vadhia wrote: > ** > Caveat: Not tested but it did look interesting: > http://osdf.github.com/blog/numpyscipy-with-openblas-for-ubuntu-1204-second-try.html > . > Would be interested to know if it worked out, as I want to try out OpenBLAS > in the future. > > Yes, this is one of the sources I used. I needed to change the c_check file in openblas as described up thread, and I didn't like the half-distutils/half-bento hack, but with Ake's patch to numpy's distutils, and my site.cfg, this works as described for me (Kubuntu 12.10) using just the usual setup.py. Skipper -------------- next part -------------- An HTML attachment was scrubbed...
URL: From jrocher at enthought.com Mon Mar 25 14:56:32 2013 From: jrocher at enthought.com (Jonathan Rocher) Date: Mon, 25 Mar 2013 13:56:32 -0500 Subject: [Numpy-discussion] Growing the contributor base of Numpy Message-ID: Dear all, One recurring question is how to *grow the contributor base* of NumPy and provide help and relief to core developers and maintainers. One way to do this would be to *leverage the upcoming SciPy conference* in 2 ways: 1. Provide an intermediate or advanced level tutorial on NumPy focusing on teaching the C-API and the architecture of the package, to help people navigate the source code and find answers to precise deep questions. I think that many users would be interested in being better able to understand the underlayers, to become powerful users (and contributors if they want to). 2. Organize a Numpy sprint to let all these freshly graduated students apply what they learned to tackle some of the work under the guidance of core developers. This would be a great occasion to share and grow knowledge that is fundamental to our community. And the fact that the underlayers are in C is fine IMHO: SciPy is about scientific programming in Python and that is done with a lot of C. *Thoughts? Anyone interested in leading a tutorial (can be a team of people)? Anyone willing to coordinate the sprint? Who would be willing to be present and help during the sprint? * Note that there is less than 1 week left until the tutorial submission deadline. I am happy to help brainstorm on this to make it happen. Thanks, Jonathan and Andy, for the SciPy2013 organizers -- Jonathan Rocher, PhD Scientific software developer SciPy2013 conference co-chair Enthought, Inc. jrocher at enthought.com 1-512-536-1057 http://www.enthought.com -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ralf.gommers at gmail.com Mon Mar 25 15:51:13 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 25 Mar 2013 20:51:13 +0100 Subject: [Numpy-discussion] variables not defined in numpy.random __init.py__ ? In-Reply-To: References: Message-ID: On Mon, Mar 25, 2013 at 4:23 PM, Dinesh B Vadhia wrote: > ** > Using PyInstaller, the following error occurs: > > Traceback (most recent call last): > File "", line 9, in <module> > File "//usr/lib/python2.7/dist-packages/PIL/Image.py", line 355, in init > __import__(f, globals(), locals(), []) > File "//usr/lib/python2.7/dist-packages/PIL/IptcImagePlugin.py", line > 23, in <module> > import os, tempfile > File "/usr/lib/python2.7/tempfile.py", line 34, in <module> > from random import Random as _Random > File "//usr/lib/python2.7/dist-packages/numpy/random/__init__.py", line > 90, in <module> > ranf = random = sample = random_sample > NameError: name 'random_sample' is not defined > > Is line 90 in __init.py__ valid? > It is. Above the failing line you see "from info import __all__", and in random/info.py you'll see that `random_sample` is in the __all__ list. Somehow it disappeared for you, you'll need to do some debugging to find out why. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From brad.froehle at gmail.com Mon Mar 25 16:26:03 2013 From: brad.froehle at gmail.com (Bradley M. Froehle) Date: Mon, 25 Mar 2013 13:26:03 -0700 Subject: [Numpy-discussion] variables not defined in numpy.random __init.py__ ?
In-Reply-To: References: Message-ID: On Mon, Mar 25, 2013 at 12:51 PM, Ralf Gommers wrote: > On Mon, Mar 25, 2013 at 4:23 PM, Dinesh B Vadhia < > dineshbvadhia at hotmail.com> wrote: > >> ** >> Using PyInstaller, the following error occurs: >> >> Traceback (most recent call last): >> File "", line 9, in >> File "//usr/lib/python2.7/dist-packages/PIL/Image.py", line 355, in init >> __import__(f, globals(), locals(), []) >> File "//usr/lib/python2.7/dist-packages/PIL/IptcImagePlugin.py", line >> 23, in >> import os, tempfile >> File "/usr/lib/python2.7/tempfile.py", line 34, in >> from random import Random as _Random >> File "//usr/lib/python2.7/dist-packages/numpy/random/__init__.py", line >> 90, in >> ranf = random = sample = random_sample >> NameError: name 'random_sample' is not defined >> >> Is line 90 in __init.py__ valid? >> > > It is. > In my reading of this the main problem is that `tempfile` is trying to import `random` from the Python standard library but instead is importing the one from within NumPy (i.e., `numpy.random`). I suspect that somehow `sys.path` is being set incorrectly --- perhaps because of the `PYTHONPATH` environment variable. -Brad -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Mar 25 19:27:35 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 26 Mar 2013 00:27:35 +0100 Subject: [Numpy-discussion] NumPy/SciPy participation in GSoC 2013 In-Reply-To: References: Message-ID: On Thu, Mar 21, 2013 at 10:20 PM, Ralf Gommers wrote: > Hi all, > > It is the time of the year for Google Summer of Code applications. If we > want to participate with Numpy and/or Scipy, we need two things: enough > mentors and ideas for projects. If we get those, we'll apply under the PSF > umbrella. They've outlined the timeline they're working by and guidelines > at > http://pyfound.blogspot.nl/2013/03/get-ready-for-google-summer-of-code.html. 
> > > We should be able to come up with some interesting project ideas I'd > think, let's put those at > http://projects.scipy.org/scipy/wiki/SummerofCodeIdeas. Preferably with > enough detail to be understandable for people new to the projects and a > proposed mentor. > > We need at least 3 people willing to mentor a student. Ideally we'd have > enough mentors this week, so we can apply to the PSF on time. If you're > willing to be a mentor, please send me the following: name, email address, > phone nr, and what you're interested in mentoring. If you have time > constraints and have doubts about being able to be a primary mentor, being a > backup mentor would also be helpful. > So far we've only got one primary mentor (thanks Chuck!); most core devs do not seem to have the bandwidth this year. If there are other people interested in mentoring please let me know. If not, then it looks like we're not participating this year. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Tue Mar 26 03:16:35 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 26 Mar 2013 08:16:35 +0100 Subject: [Numpy-discussion] [numfocus] Growing the contributor base of Numpy In-Reply-To: References: Message-ID: On Mon, Mar 25, 2013 at 7:56 PM, Jonathan Rocher wrote: > Dear all, > > One recurring question is how to *grow the contributor base* of NumPy and > provide help and relief to core developers and maintainers. > > One way to do this would be to *leverage the upcoming SciPy conference* in 2 ways: > > 1. Provide an intermediate or advanced level tutorial on NumPy > focusing on teaching the C-API and the architecture of the package, to help > people navigate the source code and find answers to precise deep > questions. I think that many users would be interested in being better able > to understand the underlayers, to become powerful users (and contributors if > they want to). > > 2.
Organize a Numpy sprint to let all these freshly graduated > students apply what they learned to tackle some of the work under the > guidance of core developers. > > This would be a great occasion to share and grow knowledge that is > fundamental to our community. And the fact that the underlayers are in C is > fine IMHO: SciPy is about scientific programming in Python and that is done > with a lot of C. > > *Thoughts? Anyone interested in leading a tutorial (can be a team of > people)? Anyone willing to coordinate the sprint? Who would be willing to > be present and help during the sprint? * > First thought: excellent initiative. I'm not going to be at SciPy, but I'm happy to coordinate a numpy/scipy sprint at EuroScipy. Going to email the organizers right now. Ralf > Note that there is less than 1 week left until the tutorial submission > deadline. I am happy to help brainstorm on this to make it happen. > > Thanks, > Jonathan and Andy, for the SciPy2013 organizers > > -- > Jonathan Rocher, PhD > Scientific software developer > SciPy2013 conference co-chair > Enthought, Inc. > jrocher at enthought.com > 1-512-536-1057 > http://www.enthought.com > > -- > You received this message because you are subscribed to the Google Groups > "NumFOCUS" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to numfocus+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/groups/opt_out. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dineshbvadhia at hotmail.com Tue Mar 26 04:46:30 2013 From: dineshbvadhia at hotmail.com (Dinesh B Vadhia) Date: Tue, 26 Mar 2013 01:46:30 -0700 Subject: [Numpy-discussion] variables not defined in numpy.random __init.py__ ? In-Reply-To: References: Message-ID: @ Ralf. I missed info.py at the top and it is a valid statement. @ Brad. My project is using Numpy and Scipy and falls over at this point when using PyInstaller.
One of the project source files has an "import random" from the Standard Library. As you say, at this point in tempfile.py, it is attempting to "import random" from the Standard Library but instead is importing the one from Numpy (numpy.random). How can this be fixed? Or, is it something for PyInstaller to fix? Thx. From: Bradley M. Froehle Sent: Monday, March 25, 2013 1:26 PM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] variables not defined in numpy.random__init.py__ ? On Mon, Mar 25, 2013 at 12:51 PM, Ralf Gommers wrote: On Mon, Mar 25, 2013 at 4:23 PM, Dinesh B Vadhia wrote: Using PyInstaller, the following error occurs: Traceback (most recent call last): File "", line 9, in File "//usr/lib/python2.7/dist-packages/PIL/Image.py", line 355, in init __import__(f, globals(), locals(), []) File "//usr/lib/python2.7/dist-packages/PIL/IptcImagePlugin.py", line 23, in import os, tempfile File "/usr/lib/python2.7/tempfile.py", line 34, in from random import Random as _Random File "//usr/lib/python2.7/dist-packages/numpy/random/__init__.py", line 90, in ranf = random = sample = random_sample NameError: name 'random_sample' is not defined Is line 90 in __init.py__ valid? It is. In my reading of this the main problem is that `tempfile` is trying to import `random` from the Python standard library but instead is importing the one from within NumPy (i.e., `numpy.random`). I suspect that somehow `sys.path` is being set incorrectly --- perhaps because of the `PYTHONPATH` environment variable. -Brad -------------- next part -------------- An HTML attachment was scrubbed... URL: From pelson.pub at gmail.com Tue Mar 26 05:20:34 2013 From: pelson.pub at gmail.com (Phil Elson) Date: Tue, 26 Mar 2013 09:20:34 +0000 Subject: [Numpy-discussion] Implementing a "find first" style function In-Reply-To: References: Message-ID: Bump. I'd be interested to know if this is a desirable feature for numpy? 
(specifically the 1D "find" functionality rather than the "any"/"all" also discussed) If so, I'd be more than happy to submit a PR, but I don't want to put in the effort if the principle isn't desirable in the core of numpy. Cheers, On 8 March 2013 17:38, Phil Elson wrote: > Interesting. I hadn't thought of those. I've implemented (very roughly > without a sound logic check) and benchmarked: > > def my_any(a, predicate, chunk_size=2048): > try: > next(find(a, predicate, chunk_size)) > return True > except StopIteration: > return False > > def my_all(a, predicate, chunk_size=2048): > return not my_any(a, lambda a: ~predicate(a), chunk_size) > > > With the following setup: > > import numpy as np > import numpy.random > > np.random.seed(1) > a = np.random.randn(1e8) > > > For a low frequency *any*: > > In [12]: %timeit (np.abs(a) > 6).any() > 1 loops, best of 3: 1.29 s per loop > > In [13]: %timeit my_any(a, lambda a: np.abs(a) > 6) > > 1 loops, best of 3: 792 ms per loop > > In [14]: %timeit my_any(a, lambda a: np.abs(a) > 6, chunk_size=10000) > 1 loops, best of 3: 654 ms per loop > > For a False *any*: > > In [16]: %timeit (np.abs(a) > 7).any() > 1 loops, best of 3: 1.22 s per loop > > In [17]: %timeit my_any(a, lambda a: np.abs(a) > 7) > 1 loops, best of 3: 2.4 s per loop > > For a high probability *any*: > > In [28]: %timeit (np.abs(a) > 1).any() > 1 loops, best of 3: 972 ms per loop > > In [27]: %timeit my_any(a, lambda a: np.abs(a) > 1) > 10000 loops, best of 3: 67 us per loop > > --------------- > > For a low probability *all*: > > In [18]: %timeit (np.abs(a) < 6).all() > 1 loops, best of 3: 1.16 s per loop > > In [19]: %timeit my_all(a, lambda a: np.abs(a) < 6) > 1 loops, best of 3: 880 ms per loop > > In [20]: %timeit my_all(a, lambda a: np.abs(a) < 6, chunk_size=10000) > 1 loops, best of 3: 706 ms per loop > > For a True *all*: > > In [22]: %timeit (np.abs(a) < 7).all() > 1 loops, best of 3: 1.47 s per loop > > In [23]: %timeit my_all(a, lambda a: np.abs(a) 
< 7) > 1 loops, best of 3: 2.65 s per loop > > For a high probability *all*: > > In [25]: %timeit (np.abs(a) < 1).all() > 1 loops, best of 3: 978 ms per loop > > In [26]: %timeit my_all(a, lambda a: np.abs(a) < 1) > 10000 loops, best of 3: 73.6 us per loop > > > > > > > > On 6 March 2013 21:16, Benjamin Root wrote: > >> >> >> On Tue, Mar 5, 2013 at 9:15 AM, Phil Elson wrote: >> >>> The ticket https://github.com/numpy/numpy/issues/2269 discusses the >>> possibility of implementing a "find first" style function which can >>> optimise the process of finding the first value(s) which match a predicate >>> in a given 1D array. For example: >>> >>> >>> >>> a = np.sin(np.linspace(0, np.pi, 200)) >>> >>> print find_first(a, lambda a: a > 0.9) >>> ((71, ), 0.900479032457) >>> >>> >>> This has been discussed in several locations: >>> >>> https://github.com/numpy/numpy/issues/2269 >>> https://github.com/numpy/numpy/issues/2333 >>> >>> http://stackoverflow.com/questions/7632963/numpy-array-how-to-find-index-of-first-occurrence-of-item >>> >>> >>> *Rationale* >>> >>> For small arrays there is no real reason to avoid doing: >>> >>> >>> a = np.sin(np.linspace(0, np.pi, 200)) >>> >>> ind = (a > 0.9).nonzero()[0][0] >>> >>> print (ind, ), a[ind] >>> (71,) 0.900479032457 >>> >>> >>> But for larger arrays, this can lead to massive amounts of work even if >>> the result is one of the first to be computed. Example: >>> >>> >>> a = np.arange(1e8) >>> >>> print (a == 5).nonzero()[0][0] >>> 5 >>> >>> >>> So a function which terminates when the first matching value is found is >>> desirable. >>> >>> As mentioned in #2269, it is possible to define a consistent ordering >>> which allows this functionality for >1D arrays, but IMHO it overcomplicates >>> the problem and was not a case that I personally needed, so I've limited >>> the scope to 1D arrays only. 
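The chunked scan described in this thread can be sketched in a few self-contained lines. This is a rough illustration only, not the actual code attached to gh-2269; the helper name `find` and its exact signature here are assumptions:

```python
import numpy as np

def find(a, predicate, chunk_size=2048):
    """Walk a 1-D array in chunks, yielding (index, value) pairs for
    elements where `predicate` (a vectorized function) is True.
    Because it is a generator, the scan stops as soon as the caller
    has seen enough matches."""
    for start in range(0, len(a), chunk_size):
        chunk = a[start:start + chunk_size]
        # flatnonzero gives the in-chunk offsets of matching elements
        for offset in np.flatnonzero(predicate(chunk)):
            yield start + offset, chunk[offset]

a = np.arange(1_000_000)
idx, val = next(find(a, lambda x: x == 5))  # stops inside the first chunk
```

Only the first chunk's predicate is ever evaluated here, which is the whole point of the early-exit design being discussed.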
>>> >>> >>> *Implementation* >>> >>> My initial assumption was that to get any kind of performance I would >>> need to write the *find* function in C, however after prototyping with >>> some array chunking it became apparent that a trivial python function would >>> be quick enough for my needs. >>> >>> The approach I've implemented in the code found in #2269 simply breaks >>> the array into sub-arrays of maximum length *chunk_size* (2048 by >>> default, though there is no real science to this number), applies the given >>> predicating function, and yields the results from *nonzero()*. The >>> given function should be a python function which operates on the whole of >>> the sub-array element-wise (i.e. the function should be vectorized). >>> Returning a generator also has the benefit of allowing users to get the >>> first *n* matching values/indices. >>> >>> >>> *Results* >>> >>> >>> I timed the implementation of *find* found in my comment at >>> https://github.com/numpy/numpy/issues/2269#issuecomment-14436725 with >>> an obvious test: >>> >>> >>> In [1]: from np_utils import find >>> >>> In [2]: import numpy as np >>> >>> In [3]: import numpy.random >>> >>> In [4]: np.random.seed(1) >>> >>> In [5]: a = np.random.randn(1e8) >>> >>> In [6]: a.min(), a.max() >>> Out[6]: (-6.1194900990552776, 5.9632246301166321) >>> >>> In [7]: next(find(a, lambda a: np.abs(a) > 6)) >>> Out[7]: ((33105441,), -6.1194900990552776) >>> >>> In [8]: (np.abs(a) > 6).nonzero() >>> Out[8]: (array([33105441]),) >>> >>> In [9]: %timeit (np.abs(a) > 6).nonzero() >>> 1 loops, best of 3: 1.51 s per loop >>> >>> In [10]: %timeit next(find(a, lambda a: np.abs(a) > 6)) >>> 1 loops, best of 3: 912 ms per loop >>> >>> In [11]: %timeit next(find(a, lambda a: np.abs(a) > 6, >>> chunk_size=100000)) >>> 1 loops, best of 3: 470 ms per loop >>> >>> In [12]: %timeit next(find(a, lambda a: np.abs(a) > 6, >>> chunk_size=1000000)) >>> 1 loops, best of 3: 483 ms per loop >>> >>> >>> This shows that picking a 
sensible *chunk_size* can yield massive >>> speed-ups (nonzero is x3 slower in one case). A similar example with a much >>> smaller 1D array shows similar promise: >>> >>> In [41]: a = np.random.randn(1e4) >>> >>> In [42]: %timeit next(find(a, lambda a: np.abs(a) > 3)) >>> 10000 loops, best of 3: 35.8 us per loop >>> >>> In [43]: %timeit (np.abs(a) > 3).nonzero() >>> 10000 loops, best of 3: 148 us per loop >>> >>> >>> As I commented on the issue tracker, if you think this function is worth >>> taking forward, I'd be happy to open up a pull request. >>> >>> Feedback gratefully received. >>> >>> Cheers, >>> >>> Phil >>> >>> >>> >> In the interest of generalizing code and such, could such approaches be >> used for functions like np.any() and np.all() for short-circuiting if True >> or False (respectively) are found? I wonder what other sort of functions >> in NumPy might benefit from this? >> >> Ben Root >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Tue Mar 26 09:07:07 2013 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 26 Mar 2013 09:07:07 -0400 Subject: [Numpy-discussion] howto reduce along arbitrary axis Message-ID: In the following code, the function maxstar is applied along the last axis. Can anyone suggest how to modify this to apply reduction along a user-specified axis?
def maxstar2 (a, b):
    return max (a, b) + log1p (exp (-abs (a - b)))

def maxstar (u):
    s = u.shape[-1]
    if s == 1:
        return u[...,0]
    elif s == 2:
        return maxstar2 (u[...,0], u[...,1])
    else:
        return maxstar2 (maxstar (u[...,:s/2]), maxstar (u[...,s/2:]))

From chaoyuejoy at gmail.com Tue Mar 26 10:23:16 2013 From: chaoyuejoy at gmail.com (Chao YUE) Date: Tue, 26 Mar 2013 15:23:16 +0100 Subject: [Numpy-discussion] howto reduce along arbitrary axis Message-ID: Hi Neal, I forward you this mail which I think might be of help to your question. Chao ---------- Forwarded message ---------- From: Chao YUE Date: Sat, Mar 16, 2013 at 5:40 PM Subject: indexing of arbitrary axis and arbitrary slice? To: Discussion of Numerical Python Dear all, Is there some way to index the numpy array by specifying arbitrary axis and arbitrary slice, while not knowing the actual shape of the data? For example, I have a 3-dim data, data.shape = (3,4,5) Is there a way to retrieve data[:,0,:] by using something like np.retrieve_data(data,axis=2,slice=0), by this way you don't have to know the actual shape of the array.
for 4-dim data, np.retrieve_data(data,axis=2,slice=0) will actually be data[:,0,:,:] thanks in advance, Chao -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From chaoyuejoy at gmail.com Tue Mar 26 10:59:58 2013 From: chaoyuejoy at gmail.com (Chao YUE) Date: Tue, 26 Mar 2013 15:59:58 +0100 Subject: [Numpy-discussion] howto reduce along arbitrary axis In-Reply-To: References: Message-ID: Oh sorry, my fault... here is the answer by Nathaniel Smith:

def retrieve_data(a, ax, idx):
    full_idx = [slice(None)] * a.ndim
    full_idx[ax] = idx
    return a[tuple(full_idx)]

Or for the specific case where you do know the axis in advance, you just don't know how many trailing axes there are, use a[:, :, 0, ...] and the ... will expand to represent the appropriate number of :'s. probably you can use something similar. Chao On Tue, Mar 26, 2013 at 3:33 PM, Neal Becker wrote: > Thank you, but do you also have an answer to this question? I only see > the question. > > > On Tue, Mar 26, 2013 at 10:23 AM, Chao YUE wrote: > >> Hi Neal, >> >> I forward you this mail which I think might be of help to your question.
>> >> Chao >> >> ---------- Forwarded message ---------- >> From: Chao YUE >> Date: Sat, Mar 16, 2013 at 5:40 PM >> Subject: indexing of arbitrary axis and arbitrary slice? >> To: Discussion of Numerical Python >> >> >> Dear all, >> >> Is there some way to index the numpy array by specifying arbitrary axis >> and arbitrary slice, while >> not knowing the actual shape of the data? >> For example, I have a 3-dim data, data.shape = (3,4,5) >> Is there a way to retrieve data[:,0,:] by using something like >> np.retrieve_data(data,axis=2,slice=0), >> by this way you don't have to know the actual shape of the array. >> for for 4-dim data, np.retrieve_data(data,axis=2,slice=0) will actually >> be data[:,0,:,:] >> >> thanks in advance, >> >> Chao >> >> -- >> >> *********************************************************************************** >> Chao YUE >> Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) >> UMR 1572 CEA-CNRS-UVSQ >> Batiment 712 - Pe 119 >> 91191 GIF Sur YVETTE Cedex >> Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 >> >> ************************************************************************************ >> >> >> >> -- >> >> *********************************************************************************** >> Chao YUE >> Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) >> UMR 1572 CEA-CNRS-UVSQ >> Batiment 712 - Pe 119 >> 91191 GIF Sur YVETTE Cedex >> Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 >> >> ************************************************************************************ >> > > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was 
scrubbed... URL: From cournape at gmail.com Tue Mar 26 15:06:11 2013 From: cournape at gmail.com (David Cournapeau) Date: Tue, 26 Mar 2013 19:06:11 +0000 Subject: [Numpy-discussion] [numfocus] Growing the contributor base of Numpy In-Reply-To: References: Message-ID: On Mon, Mar 25, 2013 at 6:56 PM, Jonathan Rocher wrote: > Dear all, > > One recurring question is how to grow the contributor base to NumPy and > provide help and relief to core developers and maintainers. > > One way to do this would be to leverage the upcoming SciPy conference in 2 > ways: > > Provide an intermediate or advanced level tutorial on NumPy focusing on > teaching the C-API and the architecture of the package to help people > navigate the source code, and find answers to precise deep questions. I > think that many users would be interested in being better able to understand > the underlayers to become powerful users (and contributors if they want to). > > Organize a Numpy sprint to leverage all this freshly graduated students > apply what they learned to tackle some of the work under the guidance of > core developers. > > This would be a great occasion to share and grow knowledge that is > fundamental to our community. And the fact that the underlayers are in C is > fine IMHO: SciPy is about scientific programming in Python and that is done > with a lot of C. > > Thoughts? Anyone interested in leading a tutorial (can be a team of people)? > Anyone willing to coordinate the sprint? Who would be willing to be present > and help during the sprint? 
I would be happy to be part of the team doing it, David From ondrej.certik at gmail.com Tue Mar 26 20:32:06 2013 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Tue, 26 Mar 2013 17:32:06 -0700 Subject: [Numpy-discussion] ANN: NumPy 1.7.1rc1 release In-Reply-To: References: Message-ID: On Sun, Mar 24, 2013 at 8:00 PM, Charles R Harris wrote: > > > On Sun, Mar 24, 2013 at 3:02 PM, Ond?ej ?ert?k > wrote: >> >> Hi, >> >> I'm pleased to announce the availability of the first release candidate of >> NumPy 1.7.1rc1. >> >> Sources and binary installers can be found at >> https://sourceforge.net/projects/numpy/files/NumPy/1.7.1rc1/ >> >> Please test it and report any bugs. It fixes a few bugs, listed below. >> >> I would like to thank everybody who contributed patches to this release: >> Nathaniel J. Smith, Sebastian Berg, Charles Harris, Bradley M. Froehle, >> Ralf Gommers, Christoph Gohlke, Mark Wiebe and Maximilian Albert. >> >> Cheers, >> Ondrej >> >> >> >> ========================= >> NumPy 1.7.1 Release Notes >> ========================= >> >> This is a bugfix only release in the 1.7.x series. >> >> >> Issues fixed >> ------------ >> >> gh-2973 Fix `1` is printed during numpy.test() >> gh-2983 BUG: gh-2969: Backport memory leak fix 80b3a34. >> gh-3007 Backport gh-3006 >> gh-2984 Backport fix complex polynomial fit >> gh-2982 BUG: Make nansum work with booleans. >> gh-2985 Backport large sort fixes >> gh-3039 Backport object take >> gh-3105 Backport nditer fix op axes initialization >> gh-3108 BUG: npy-pkg-config ini files were missing after Bento build. >> gh-3124 BUG: PyArray_LexSort allocates too much temporary memory. >> gh-3131 BUG: Exported f2py_size symbol prevents linking multiple f2py >> modules. >> gh-3117 Backport gh-2992 >> gh-3135 DOC: Add mention of PyArray_SetBaseObject stealing a reference >> gh-3134 DOC: Fix typo in fft docs (the indexing variable is 'm', not >> 'n'). 
>> gh-3136 Backport #3128 >> >> Checksums >> ========= >> >> 28c3f3e71b5eaa6bfab6e8340dbd35e7 release/installers/numpy-1.7.1rc1.tar.gz >> 436f416dee10d157314bd9da7ab95c9c >> release/installers/numpy-1.7.1rc1-win32-superpack-python2.7.exe >> a543c8cf69f66ff2b4c9565646105863 >> release/installers/numpy-1.7.1rc1-win32-superpack-python2.6.exe >> 6dfcbbd449b7fe4e841c5fd1bfa7af7c >> release/installers/numpy-1.7.1rc1-win32-superpack-python2.5.exe >> 22912792a1b6155ae2bdbc30bee8fadc >> release/installers/numpy-1.7.1rc1-win32-superpack-python3.2.exe >> 95bc5a5fcce9fcbc2717a774dccae31b >> release/installers/numpy-1.7.1rc1-win32-superpack-python3.3.exe >> 33cf283765a148846b49b89fb96d67d5 >> release/installers/numpy-1.7.1rc1-win32-superpack-python3.1.exe >> 9761de4b35493fed38c5d177da9c3b37 release/installers/numpy-1.7.1rc1.zip >> __ > > > Great. The fix for the memory leak should make some folks happy. Yes. I created an issue here for them to test it: https://github.com/scikit-learn/scikit-learn/issues/1809 Just to make sure. Ondrej From matthew.brett at gmail.com Tue Mar 26 20:48:00 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 26 Mar 2013 17:48:00 -0700 Subject: [Numpy-discussion] Any plans for windows 64-bit installer for 1.7? In-Reply-To: References: <51119007.6090806@uci.edu> <5113399F.3090803@astro.uio.no> Message-ID: Hi Ondrej, On Thu, Feb 7, 2013 at 3:18 PM, Ond?ej ?ert?k wrote: > On Thu, Feb 7, 2013 at 12:29 PM, Chris Barker - NOAA Federal > wrote: >> On Thu, Feb 7, 2013 at 11:38 AM, Matthew Brett wrote: >>> a) If we cannot build Scipy now, it may or may not be acceptable to >>> release numpy now. I think it is, you (Ralf) think it isn't, we >>> haven't discussed that. It may not come up. >> >> Is anyone suggesting we hold the whole release for this? I fnot, then > > Just to make it clear, I do not plan to hold the whole release because of this. > Previous releases also didn't have this official 64bit Windows binary, > so there is > no regression. 
> > Once we figure out how to create 64bit binaries, then we'll start > uploading them. Did you make any progress with this? Worth making some notes? Anything we can do to help? Cheers, Matthew From njs at pobox.com Wed Mar 27 08:19:02 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 27 Mar 2013 12:19:02 +0000 Subject: [Numpy-discussion] Implementing a "find first" style function In-Reply-To: References: Message-ID: On Tue, Mar 26, 2013 at 9:20 AM, Phil Elson wrote: > Bump. > > I'd be interested to know if this is a desirable feature for numpy? > (specifically the 1D "find" functionality rather than the "any"/"all" also > discussed) > If so, I'd be more than happy to submit a PR, but I don't want to put in the > effort if the principle isn't desirable in the core of numpy. I don't think anyone has a strong opinion either way :-). It seems like a fairly general interface that people might find useful, so I don't see an immediate objection to including it in principle. It would help to see the actual numbers from a tuned version though to know how much benefit there is to get... -n From mdroe at stsci.edu Wed Mar 27 08:51:11 2013 From: mdroe at stsci.edu (Michael Droettboom) Date: Wed, 27 Mar 2013 08:51:11 -0400 Subject: [Numpy-discussion] ANN: matplotlib 1.2.1 release Message-ID: <5152EB3F.5090902@stsci.edu> I'm pleased to announce the release of matplotlib 1.2.1. This is a bug release and improves stability and quality over the 1.2.0 release from four months ago. All users on 1.2.0 are encouraged to upgrade. 
Since github no longer provides download hosting, our tarballs and binaries are back on SourceForge, and we have a master index of downloads here: http://matplotlib.org/downloads Highlights include: - Usage of deprecated APIs in matplotlib are now displayed by default on all Python versions - Agg backend: Cleaner rendering of rectilinear lines when snapping to pixel boundaries, and fixes rendering bugs when using clip paths - Python 3: Fixes a number of missed Python 3 compatibility problems - Histograms and stacked histograms have a number of important bugfixes - Compatibility with more 3rd-party TrueType fonts - SVG backend: Image support in SVG output is consistent with other backends - Qt backend: Fixes leaking of window objects in Qt backend - hexbin with a log scale now works correctly - autoscaling works better on 3D plots - ...and numerous others. Enjoy! As always, there are number of good ways to get help with matplotlib listed on the homepage at http://matplotlib.org/ and I thank everyone for their continued support of this project. Mike Droettboom -------------- next part -------------- An HTML attachment was scrubbed... URL: From Andrea.Cimatoribus at nioz.nl Wed Mar 27 10:41:59 2013 From: Andrea.Cimatoribus at nioz.nl (Andrea Cimatoribus) Date: Wed, 27 Mar 2013 15:41:59 +0100 Subject: [Numpy-discussion] Growing the contributor base of Numpy Message-ID: Not sure if this is really relevant to the original message, but here is my opinion. I think that the numpy/scipy community would greatly benefit from a platform enabling easy sharing of code written by users. This would provide a database of solved problems, where people could dig without having to ask. I think that something like this exists for matlab, but I have no experience with it. If it exists for python, then it must be seriously under-advertised. 
The web provides many answers, but they are scattered in all sorts of places, and it is often impossible to contribute improvements to code found online. If such a database could enable some sort of collaborative development it would be a great added value for numpy, and would provide a natural source of new features or improvements for scipy and numpy. From njs at pobox.com Wed Mar 27 10:59:09 2013 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 27 Mar 2013 14:59:09 +0000 Subject: [Numpy-discussion] Growing the contributor base of Numpy In-Reply-To: References: Message-ID: On Wed, Mar 27, 2013 at 2:41 PM, Andrea Cimatoribus wrote: > > Not sure if this is really relevant to the original message, but here is my opinion. I think that the numpy/scipy community would greatly benefit from a platform enabling easy sharing of code written by users. This would provide a database of solved problems, where people could dig without having to ask. I think that something like this exists for matlab, but I have no experience with it. If it exists for python, then it must be seriously under-advertised. The web provides many answers, but they are scattered in all sorts of places, and it is often impossible to contribute improvements to code found online. If such a database could enable some sort of collaborative development it would be a great added value for numpy, and would provide a natural source of new features or improvements for scipy and numpy. Supposedly that's what scipy-central is for, but it's somehow not yet reached critical mass and become a household name; I haven't looked hard enough to have any hypotheses about why not. Surya Kasturi is working on spiffing it up (see discussion on scipy-dev); I bet they could use some help if you want to scratch this itch. 
-n From Andrea.Cimatoribus at nioz.nl Wed Mar 27 13:12:02 2013 From: Andrea.Cimatoribus at nioz.nl (Andrea Cimatoribus) Date: Wed, 27 Mar 2013 18:12:02 +0100 Subject: [Numpy-discussion] Growing the contributor base of Numpy Message-ID: Oh, I didn't even know it existed! > > Not sure if this is really relevant to the original message, but here is my opinion. I think that the numpy/scipy community would greatly benefit from a platform enabling easy sharing of code written by users. This would provide a database of solved problems, where people could dig without having to ask. I think that something like this exists for matlab, but I have no experience with it. If it exists for python, then it must be seriously under-advertised. The web provides many answers, but they are scattered in all sorts of places, and it is often impossible to contribute improvements to code found online. If such a database could enable some sort of collaborative development it would be a great added value for numpy, and would provide a natural source of new features or improvements for scipy and numpy. Supposedly that's what scipy-central is for, but it's somehow not yet reached critical mass and become a household name; I haven't looked hard enough to have any hypotheses about why not. Surya Kasturi is working on spiffing it up (see discussion on scipy-dev); I bet they could use some help if you want to scratch this itch. From ralf.gommers at gmail.com Wed Mar 27 16:09:21 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 27 Mar 2013 21:09:21 +0100 Subject: [Numpy-discussion] [numfocus] Growing the contributor base of Numpy In-Reply-To: References: Message-ID: On Tue, Mar 26, 2013 at 8:16 AM, Ralf Gommers wrote: > > > > On Mon, Mar 25, 2013 at 7:56 PM, Jonathan Rocher wrote: > >> Dear all, >> >> One recurring question is how to *grow the contributor base* to NumPy >> and provide help and relief to core developers and maintainers. 
>> >> One way to do this would be to *leverage the upcoming SciPy conference*in 2 ways: >> >> 1. Provide an intermediate or advanced level tutorial on NumPy >> focusing on teaching the C-API and the architecture of the package to help >> people navigate the source code, and find answers to precise deep >> questions. I think that many users would be interested in being better able >> to understand the underlayers to become powerful users (and contributors if >> they want to). >> >> 2. Organize a Numpy sprint to leverage all this freshly graduated >> students apply what they learned to tackle some of the work under the >> guidance of core developers. >> >> This would be a great occasion to share and grow knowledge that is >> fundamental to our community. And the fact that the underlayers are in C is >> fine IMHO: SciPy is about scientific programming in Python and that is done >> with a lot of C. >> >> *Thoughts? Anyone interested in leading a tutorial (can be a team of >> people)? Anyone willing to coordinate the sprint? Who would be willing to >> be present and help during the sprint? * >> > > First thought: excellent initiative. I'm not going to be at SciPy, but I'm > happy to coordinate a numpy/scipy sprint at EuroScipy. Going to email the > organizers right now. > The EuroScipy organizers have accepted our sprint, so we'll have a room available. If you're going to the conference, think about reserving Sun 25 Aug to attend this sprint. I've put up a page where people can add topics and more details: http://projects.scipy.org/scipy/wiki/EuroSciPy2013Sprint Ralf > > Ralf > > > > >> Note that there is less than 1 week left until the tutorial submission >> deadline. I am happy to help brainstorm on this to make it happen. >> >> Thanks, >> Jonathan and Andy, for the SciPy2013 organizers >> >> -- >> Jonathan Rocher, PhD >> Scientific software developer >> SciPy2013 conference co-chair >> Enthought, Inc. 
>> jrocher at enthought.com >> 1-512-536-1057 >> http://www.enthought.com >> >> -- >> You received this message because you are subscribed to the Google Groups >> "NumFOCUS" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to numfocus+unsubscribe at googlegroups.com. >> For more options, visit https://groups.google.com/groups/opt_out. >> >> >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jrocher at enthought.com Wed Mar 27 16:45:56 2013 From: jrocher at enthought.com (Jonathan Rocher) Date: Wed, 27 Mar 2013 15:45:56 -0500 Subject: [Numpy-discussion] [numfocus] Growing the contributor base of Numpy In-Reply-To: References: Message-ID: Awesome Ralf! And thanks David C. for being available for the US one. When you say you would like to be part of it, did you mean an advanced tutorial or a sprint? Other people available to contribute to this or coordinate this? Thanks, Jonathan On Wed, Mar 27, 2013 at 3:09 PM, Ralf Gommers wrote: > > > > On Tue, Mar 26, 2013 at 8:16 AM, Ralf Gommers wrote: > >> >> >> >> On Mon, Mar 25, 2013 at 7:56 PM, Jonathan Rocher wrote: >> >>> Dear all, >>> >>> One recurring question is how to *grow the contributor base* to NumPy >>> and provide help and relief to core developers and maintainers. >>> >>> One way to do this would be to *leverage the upcoming SciPy conference*in 2 ways: >>> >>> 1. Provide an intermediate or advanced level tutorial on NumPy >>> focusing on teaching the C-API and the architecture of the package to help >>> people navigate the source code, and find answers to precise deep >>> questions. I think that many users would be interested in being better able >>> to understand the underlayers to become powerful users (and contributors if >>> they want to). >>> >>> 2. Organize a Numpy sprint to leverage all this freshly graduated >>> students apply what they learned to tackle some of the work under the >>> guidance of core developers. 
>>> >>> This would be a great occasion to share and grow knowledge that is >>> fundamental to our community. And the fact that the underlayers are in C is >>> fine IMHO: SciPy is about scientific programming in Python and that is done >>> with a lot of C. >>> >>> *Thoughts? Anyone interested in leading a tutorial (can be a team of >>> people)? Anyone willing to coordinate the sprint? Who would be willing to >>> be present and help during the sprint? * >>> >> >> First thought: excellent initiative. I'm not going to be at SciPy, but >> I'm happy to coordinate a numpy/scipy sprint at EuroScipy. Going to email >> the organizers right now. >> > > The EuroScipy organizers have accepted our sprint, so we'll have a room > available. If you're going to the conference, think about reserving Sun 25 > Aug to attend this sprint. I've put up a page where people can add topics > and more details: http://projects.scipy.org/scipy/wiki/EuroSciPy2013Sprint > > Ralf > > > >> >> Ralf >> >> >> >> >>> Note that there is less than 1 week left until the tutorial submission >>> deadline. I am happy to help brainstorm on this to make it happen. >>> >>> Thanks, >>> Jonathan and Andy, for the SciPy2013 organizers >>> >>> -- >>> Jonathan Rocher, PhD >>> Scientific software developer >>> SciPy2013 conference co-chair >>> Enthought, Inc. >>> jrocher at enthought.com >>> 1-512-536-1057 >>> http://www.enthought.com >>> >>> -- >>> You received this message because you are subscribed to the Google >>> Groups "NumFOCUS" group. >>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to numfocus+unsubscribe at googlegroups.com. >>> For more options, visit https://groups.google.com/groups/opt_out. >>> >>> >>> >> >> > -- Jonathan Rocher, PhD Scientific software developer SciPy2013 conference co-chair Enthought, Inc. jrocher at enthought.com 1-512-536-1057 http://www.enthought.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at gmail.com Wed Mar 27 18:11:12 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 27 Mar 2013 23:11:12 +0100 Subject: [Numpy-discussion] variables not defined in numpy.random__init.py__ ? In-Reply-To: References: Message-ID: On Tue, Mar 26, 2013 at 9:46 AM, Dinesh B Vadhia wrote: > ** > @ Ralf. I missed info.py at the top and it is a valid statement. > > @ Brad. My project is using Numpy and Scipy and falls over at this point > when using PyInstaller. One of the project source files has an "import > random" from the Standard Library. As you say, at this point in > tempfile.py, it is attempting to "import random" from the Standard Library > but instead is importing the one from Numpy (numpy.random). How can this > be fixed? Or, is it something for PyInstaller to fix? Thx. > Probably the latter. Check your PYTHONPATH is not set and you're not doing anything to sys.path somehow. Then probably best to ask on the PyInstaller mailing list. Ralf > > > *From:* Bradley M. Froehle > *Sent:* Monday, March 25, 2013 1:26 PM > *To:* Discussion of Numerical Python > *Subject:* Re: [Numpy-discussion] variables not defined in > numpy.random__init.py__ ? 
> > On Mon, Mar 25, 2013 at 12:51 PM, Ralf Gommers wrote: > >> On Mon, Mar 25, 2013 at 4:23 PM, Dinesh B Vadhia < >> dineshbvadhia at hotmail.com> wrote: >> >>> ** >>> Using PyInstaller, the following error occurs: >>> >>> Traceback (most recent call last): >>> File "", line 9, in >>> File "//usr/lib/python2.7/dist-packages/PIL/Image.py", line 355, in >>> init >>> __import__(f, globals(), locals(), []) >>> File "//usr/lib/python2.7/dist-packages/PIL/IptcImagePlugin.py", line >>> 23, in >>> import os, tempfile >>> File "/usr/lib/python2.7/tempfile.py", line 34, in >>> from random import Random as _Random >>> File "//usr/lib/python2.7/dist-packages/numpy/random/__init__.py", >>> line 90, in >>> ranf = random = sample = random_sample >>> NameError: name 'random_sample' is not defined >>> >>> Is line 90 in __init.py__ valid? >>> >> >> It is. >> > > In my reading of this the main problem is that `tempfile` is trying to > import `random` from the Python standard library but instead is importing > the one from within NumPy (i.e., `numpy.random`). I suspect that somehow > `sys.path` is being set incorrectly --- perhaps because of the `PYTHONPATH` > environment variable. > > -Brad > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pelson.pub at gmail.com Thu Mar 28 07:04:15 2013 From: pelson.pub at gmail.com (Phil Elson) Date: Thu, 28 Mar 2013 11:04:15 +0000 Subject: [Numpy-discussion] Implementing a "find first" style function In-Reply-To: References: Message-ID: I've specifically not tuned it, primarily because to get the best tuning you need to know the likelihood of finding a match.
One option would be to allow users to specify a "probability" parameter which would chunk the array into size*probability chunks - an additional parameter could then be exposed to limit the maximum chunk size to give the user control of the maximum memory overhead that the routine could use. I'll submit a PR and we can discuss inline. Thanks for the response Nathaniel. On 27 March 2013 12:19, Nathaniel Smith wrote: > On Tue, Mar 26, 2013 at 9:20 AM, Phil Elson wrote: > > Bump. > > > > I'd be interested to know if this is a desirable feature for numpy? > > (specifically the 1D "find" functionality rather than the "any"/"all" > also > > discussed) > > If so, I'd be more than happy to submit a PR, but I don't want to put in > the > > effort if the principle isn't desirable in the core of numpy. > > I don't think anyone has a strong opinion either way :-). It seems > like a fairly general interface that people might find useful, so I > don't see an immediate objection to including it in principle. It > would help to see the actual numbers from a tuned version though to > know how much benefit there is to get... > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Thu Mar 28 12:47:05 2013 From: cournape at gmail.com (David Cournapeau) Date: Thu, 28 Mar 2013 16:47:05 +0000 Subject: [Numpy-discussion] [numfocus] Growing the contributor base of Numpy In-Reply-To: References: Message-ID: On Wed, Mar 27, 2013 at 8:45 PM, Jonathan Rocher wrote: > Awesome Ralf! > > And thanks David C. for being available for the US one. When you say you > would like to be part of it, did you mean an advanced tutorial or a sprint? I meant I would be happy to contribute to a tutorial in the spirit of "dive into numpy code". 
I would prefer if we were two doing it, though. David From irving at naml.us Thu Mar 28 14:16:48 2013 From: irving at naml.us (Geoffrey Irving) Date: Thu, 28 Mar 2013 11:16:48 -0700 Subject: [Numpy-discussion] inheriting from recarray with nested dtypes Message-ID: I have the following two structured dtypes: rotation (quaternion) = dtype([('s','f8'),('v','3f8')]) frame = dtype([('t','3f8'),('r',rotation)]) For various reasons, I usually store rotation arrays in a class Rotations deriving from ndarray, and frames in a class Frames deriving from ndarray. Currently I am defining .s, .v, .t, and .r properties manually, and I'd like to switch to inheriting from recarray. However, the .r property should return an array of class Rotations. I.e., f = Frames(...) # f.dtype = frame r = f.r # r.dtype = rotation, type(r) = Rotations Is there a clean way to tell recarray to adjust the type returned? It already has a bit of intelligence there, since it returns ndarray vs. recarray based on whether the returned dtype has fields. The full code is here in case anyone is curious: https://github.com/otherlab/core/blob/master/vector/Frame.py https://github.com/otherlab/core/blob/master/vector/Rotation.py Thanks, Geoffrey From ondrej.certik at gmail.com Thu Mar 28 21:31:07 2013 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Thu, 28 Mar 2013 18:31:07 -0700 Subject: [Numpy-discussion] Any plans for windows 64-bit installer for 1.7? In-Reply-To: References: <51119007.6090806@uci.edu> <5113399F.3090803@astro.uio.no> Message-ID: Hi Matthew, On Tue, Mar 26, 2013 at 5:48 PM, Matthew Brett wrote: > Hi Ondrej, > > On Thu, Feb 7, 2013 at 3:18 PM, Ond?ej ?ert?k wrote: >> On Thu, Feb 7, 2013 at 12:29 PM, Chris Barker - NOAA Federal >> wrote: >>> On Thu, Feb 7, 2013 at 11:38 AM, Matthew Brett wrote: >>>> a) If we cannot build Scipy now, it may or may not be acceptable to >>>> release numpy now. I think it is, you (Ralf) think it isn't, we >>>> haven't discussed that. 
It may not come up. >>> >>> Is anyone suggesting we hold the whole release for this? I fnot, then >> >> Just to make it clear, I do not plan to hold the whole release because of this. >> Previous releases also didn't have this official 64bit Windows binary, >> so there is >> no regression. >> >> Once we figure out how to create 64bit binaries, then we'll start >> uploading them. > > Did you make any progress with this? Worth making some notes? > Anything we can do to help? Unfortunately I've been too busy the last month to push this through, so right now I am just concentrating on getting 1.7.1 out of the door, as that is higher priority. I am starting a new job on Monday, so once things settle down, I should be able to get back to this. I will post notes once I get to this again. Ondrej From toddrjen at gmail.com Fri Mar 29 11:15:07 2013 From: toddrjen at gmail.com (Todd) Date: Fri, 29 Mar 2013 16:15:07 +0100 Subject: [Numpy-discussion] Polar/spherical coordinates handling Message-ID: >From what I can see, numpy doesn't have any functions for handling polar or spherical coordinate to/from cartesian coordinate conversion. I think such methods would be pretty useful. I am looking now and it doesn't look that hard to create functions to convert between n-dimensional cartesian and n-spherical coordinates. Would anyone be interested in me adding methods for this? -------------- next part -------------- An HTML attachment was scrubbed... URL: From amcmorl at gmail.com Fri Mar 29 11:33:13 2013 From: amcmorl at gmail.com (Angus McMorland) Date: Fri, 29 Mar 2013 11:33:13 -0400 Subject: [Numpy-discussion] Polar/spherical coordinates handling In-Reply-To: References: Message-ID: On 29 March 2013 11:15, Todd wrote: > From what I can see, numpy doesn't have any functions for handling polar or > spherical coordinate to/from cartesian coordinate conversion. I think such > methods would be pretty useful. 
I am looking now and it doesn't look that > hard to create functions to convert between n-dimensional cartesian and > n-spherical coordinates. Would anyone be interested in me adding methods > for this? I use these co-ordinate transforms often. I wonder if it wouldn't be preferable to create a scikit focused on spherical or, more generally, geometric operations rather than adding to the already hefty number of functions in numpy. I'd be interested to contribute to such a scikit. Angus -- AJC McMorland Research Associate Neurobiology, University of Pittsburgh From toddrjen at gmail.com Fri Mar 29 13:27:13 2013 From: toddrjen at gmail.com (Todd) Date: Fri, 29 Mar 2013 18:27:13 +0100 Subject: [Numpy-discussion] Polar/spherical coordinates handling In-Reply-To: References: Message-ID: On Fri, Mar 29, 2013 at 4:33 PM, Angus McMorland wrote: > On 29 March 2013 11:15, Todd wrote: > > From what I can see, numpy doesn't have any functions for handling polar > or > > spherical coordinate to/from cartesian coordinate conversion. I think > such > > methods would be pretty useful. I am looking now and it doesn't look > that > > hard to create functions to convert between n-dimensional cartesian and > > n-spherical coordinates. Would anyone be interested in me adding methods > > for this? > > I use these co-ordinate transforms often. I wonder if it wouldn't be > preferable to create a scikit focused on spherical or, more generally, > geometric operations rather than adding to the already hefty number of > functions in numpy. I'd be interested to contribute to such a scikit. > The reason I think these particular functions belong in numpy is that they are closely tied to signal processing and linear algebra, far more than any other coordinate systems. It is really just a generalization of the complex number processing that is already available from numpy. 
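For the 2-D case the conversions under discussion are only a few lines on top of existing ufuncs. A rough sketch (the names `cart2pol` and `pol2cart` are invented here for illustration, not a proposed API):

```python
import numpy as np

def cart2pol(x, y):
    """Cartesian (x, y) -> polar (r, theta), theta in radians."""
    return np.hypot(x, y), np.arctan2(y, x)

def pol2cart(r, theta):
    """Polar (r, theta) -> Cartesian (x, y)."""
    return r * np.cos(theta), r * np.sin(theta)

x = np.array([1.0, 0.0, -1.0])
y = np.array([0.0, 2.0, 1.0])
r, theta = cart2pol(x, y)
x2, y2 = pol2cart(r, theta)
print(np.allclose(x, x2) and np.allclose(y, y2))  # True
```

The n-spherical generalization follows the same pattern, with cumulative products of sines supplying the higher angular coordinates.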
Also, although numpy has methods to convert complex values to magnitude and angle, it doesn't have any methods to go the other way. Again, such a function would just be a special 2-D case of the more general n-dimensional functions. -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Fri Mar 29 22:08:23 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 29 Mar 2013 19:08:23 -0700 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: Hi, We were teaching today, and found ourselves getting very confused about ravel and shape in numpy. Summary -------------- There are two separate ideas needed to understand ordering in ravel and reshape: Idea 1): ravel / reshape can proceed from the last axis to the first, or the first to the last. This is "ravel index ordering" Idea 2) The physical layout of the array (on disk or in memory) can be "C" or "F" contiguous or neither. This is "memory ordering" The index ordering is usually (but see below) orthogonal to the memory ordering. The 'ravel' and 'reshape' commands use "C" and "F" in the sense of index ordering, and this mixes the two ideas and is confusing. What the current situation looks like ---------------------------------------------------- Specifically, we've been rolling this around 4 experienced numpy users and we all predicted at least one of the results below wrongly. This was what we knew, or should have known: In [2]: import numpy as np In [3]: arr = np.arange(10).reshape((2, 5)) In [5]: arr.ravel() Out[5]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) So, the 'ravel' operation unravels over the last axis (1) first, followed by axis 0. So far so good (even if the opposite to MATLAB, Octave). 
Then we found the 'order' flag to ravel: In [10]: arr.flags Out[10]: C_CONTIGUOUS : True F_CONTIGUOUS : False OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False In [11]: arr.ravel('C') Out[11]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) But we soon got confused. How about this? In [12]: arr_F = np.array(arr, order='F') In [13]: arr_F.flags Out[13]: C_CONTIGUOUS : False F_CONTIGUOUS : True OWNDATA : True WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False In [16]: arr_F Out[16]: array([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]) In [17]: arr_F.ravel('C') Out[17]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) Right - so the flag 'C' to ravel, has got nothing to do with *memory* ordering, but is to do with *index* ordering. And in fact, we can ask for memory ordering specifically: In [22]: arr.ravel('K') Out[22]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) In [23]: arr_F.ravel('K') Out[23]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9]) In [24]: arr.ravel('A') Out[24]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) In [25]: arr_F.ravel('A') Out[25]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9]) There are some confusions to get into with the 'order' flag to reshape as well, of the same type. Ravel and reshape use the terms 'C' and 'F' in the sense of index ordering. This is very confusing. We think the index ordering and memory ordering ideas need to be separated, and specifically, we should avoid using "C" and "F" to refer to index ordering. Proposal ------------- * Deprecate the use of "C" and "F" meaning backwards and forwards index ordering for ravel, reshape * Prefer "Z" and "N", being graphical representations of unraveling in 2 dimensions, axis1 first and axis0 first respectively (excellent naming idea by Paul Ivanov) What do y'all think?
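One quick check that these flags name index order rather than memory order (a sketch using the same `arr` and `arr_F` as above): for a 2-D array, raveling with 'F' walks the first axis fastest, which is the same as transposing and then raveling in the default 'C' order, whatever the memory layout of the input:

```python
import numpy as np

arr = np.arange(10).reshape((2, 5))   # C-contiguous
arr_F = np.array(arr, order='F')      # same values, Fortran memory layout

print(arr.ravel('F'))    # [0 5 1 6 2 7 3 8 4 9]
print(arr.T.ravel())     # [0 5 1 6 2 7 3 8 4 9], i.e. transpose then 'C' ravel
print(arr_F.ravel('F'))  # [0 5 1 6 2 7 3 8 4 9], memory layout is irrelevant
```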
Cheers, Matthew Paul Ivanov JB Poline From josef.pktd at gmail.com Sat Mar 30 07:14:51 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 30 Mar 2013 07:14:51 -0400 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett wrote: > > Hi, > > We were teaching today, and found ourselves getting very confused > about ravel and shape in numpy. > > Summary > -------------- > > There are two separate ideas needed to understand ordering in ravel and reshape: > > Idea 1): ravel / reshape can proceed from the last axis to the first, > or the first to the last. This is "ravel index ordering" > Idea 2) The physical layout of the array (on disk or in memory) can be > "C" or "F" contiguous or neither. > This is "memory ordering" > > The index ordering is usually (but see below) orthogonal to the memory ordering. > > The 'ravel' and 'reshape' commands use "C" and "F" in the sense of > index ordering, and this mixes the two ideas and is confusing. > > What the current situation looks like > ---------------------------------------------------- > > Specifically, we've been rolling this around 4 experienced numpy users > and we all predicted at least one of the results below wrongly. > > This was what we knew, or should have known: > > In [2]: import numpy as np > > In [3]: arr = np.arange(10).reshape((2, 5)) > > In [5]: arr.ravel() > Out[5]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > So, the 'ravel' operation unravels over the last axis (1) first, > followed by axis 0. > > So far so good (even if the opposite to MATLAB, Octave). 
> > Then we found the 'order' flag to ravel: > > In [10]: arr.flags > Out[10]: > C_CONTIGUOUS : True > F_CONTIGUOUS : False > OWNDATA : False > WRITEABLE : True > ALIGNED : True > UPDATEIFCOPY : False > > In [11]: arr.ravel('C') > Out[11]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > But we soon got confused. How about this? > > In [12]: arr_F = np.array(arr, order='F') > > In [13]: arr_F.flags > Out[13]: > C_CONTIGUOUS : False > F_CONTIGUOUS : True > OWNDATA : True > WRITEABLE : True > ALIGNED : True > UPDATEIFCOPY : False > > In [16]: arr_F > Out[16]: > array([[0, 1, 2, 3, 4], > [5, 6, 7, 8, 9]]) > > In [17]: arr_F.ravel('C') > Out[17]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > Right - so the flag 'C' to ravel, has got nothing to do with *memory* > ordering, but is to do with *index* ordering. > > And in fact, we can ask for memory ordering specifically: > > In [22]: arr.ravel('K') > Out[22]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > In [23]: arr_F.ravel('K') > Out[23]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9]) > > In [24]: arr.ravel('A') > Out[24]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > In [25]: arr_F.ravel('A') > Out[25]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9]) > > There are some confusions to get into with the 'order' flag to reshape > as well, of the same type. > > Ravel and reshape use the tems 'C' and 'F" in the sense of index ordering. > > This is very confusing. We think the index ordering and memory > ordering ideas need to be separated, and specifically, we should avoid > using "C" and "F" to refer to index ordering. > > Proposal > ------------- > > * Deprecate the use of "C" and "F" meaning backwards and forwards > index ordering for ravel, reshape > * Prefer "Z" and "N", being graphical representations of unraveling in > 2 dimensions, axis1 first and axis0 first respectively (excellent > naming idea by Paul Ivanov) > > What do y'all think? 
> > Cheers, > > Matthew > Paul Ivanov > JB Poline > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion I always thought "F" and "C" are easy to understand, I always thought about the content and never about the memory when using it. In my numpy htmlhelp for version 1.5, I don't have a K or A option >>> np.__version__ '1.5.1' >>> np.arange(5).ravel("K") Traceback (most recent call last): File "", line 1, in TypeError: order not understood >>> np.arange(5).ravel("A") array([0, 1, 2, 3, 4]) >>> the C, F in ravel have their twins in reshape >>> arr = np.arange(10).reshape(2,5, order="C").copy() >>> arr array([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]) >>> arr.ravel() array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >>> arr = np.arange(10).reshape(2,5, order="F").copy() >>> arr array([[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]]) >>> arr.ravel("F") array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) For example we use it when we get raveled arrays from R, and F for column order and C for row order indexing are pretty obvious names when coming from another package (Matlab, R, Gauss) Josef From josef.pktd at gmail.com Sat Mar 30 07:48:19 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 30 Mar 2013 07:48:19 -0400 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: On Sat, Mar 30, 2013 at 7:14 AM, wrote: > On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett wrote: >> >> Hi, >> >> We were teaching today, and found ourselves getting very confused >> about ravel and shape in numpy. >> >> Summary >> -------------- >> >> There are two separate ideas needed to understand ordering in ravel and reshape: >> >> Idea 1): ravel / reshape can proceed from the last axis to the first, >> or the first to the last.
This is "ravel index ordering" >> Idea 2) The physical layout of the array (on disk or in memory) can be >> "C" or "F" contiguous or neither. >> This is "memory ordering" >> >> The index ordering is usually (but see below) orthogonal to the memory ordering. >> >> The 'ravel' and 'reshape' commands use "C" and "F" in the sense of >> index ordering, and this mixes the two ideas and is confusing. >> >> What the current situation looks like >> ---------------------------------------------------- >> >> Specifically, we've been rolling this around 4 experienced numpy users >> and we all predicted at least one of the results below wrongly. >> >> This was what we knew, or should have known: >> >> In [2]: import numpy as np >> >> In [3]: arr = np.arange(10).reshape((2, 5)) >> >> In [5]: arr.ravel() >> Out[5]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> So, the 'ravel' operation unravels over the last axis (1) first, >> followed by axis 0. >> >> So far so good (even if the opposite to MATLAB, Octave). >> >> Then we found the 'order' flag to ravel: >> >> In [10]: arr.flags >> Out[10]: >> C_CONTIGUOUS : True >> F_CONTIGUOUS : False >> OWNDATA : False >> WRITEABLE : True >> ALIGNED : True >> UPDATEIFCOPY : False >> >> In [11]: arr.ravel('C') >> Out[11]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> But we soon got confused. How about this? >> >> In [12]: arr_F = np.array(arr, order='F') >> >> In [13]: arr_F.flags >> Out[13]: >> C_CONTIGUOUS : False >> F_CONTIGUOUS : True >> OWNDATA : True >> WRITEABLE : True >> ALIGNED : True >> UPDATEIFCOPY : False >> >> In [16]: arr_F >> Out[16]: >> array([[0, 1, 2, 3, 4], >> [5, 6, 7, 8, 9]]) >> >> In [17]: arr_F.ravel('C') >> Out[17]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> Right - so the flag 'C' to ravel, has got nothing to do with *memory* >> ordering, but is to do with *index* ordering. 
>> >> And in fact, we can ask for memory ordering specifically: >> >> In [22]: arr.ravel('K') >> Out[22]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> In [23]: arr_F.ravel('K') >> Out[23]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9]) >> >> In [24]: arr.ravel('A') >> Out[24]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> In [25]: arr_F.ravel('A') >> Out[25]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9]) >> >> There are some confusions to get into with the 'order' flag to reshape >> as well, of the same type. >> >> Ravel and reshape use the tems 'C' and 'F" in the sense of index ordering. >> >> This is very confusing. We think the index ordering and memory >> ordering ideas need to be separated, and specifically, we should avoid >> using "C" and "F" to refer to index ordering. >> >> Proposal >> ------------- >> >> * Deprecate the use of "C" and "F" meaning backwards and forwards >> index ordering for ravel, reshape >> * Prefer "Z" and "N", being graphical representations of unraveling in >> 2 dimensions, axis1 first and axis0 first respectively (excellent >> naming idea by Paul Ivanov) >> >> What do y'all think? >> >> Cheers, >> >> Matthew >> Paul Ivanov >> JB Poline >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > I always thought "F" and "C" are easy to understand, I always thought about > the content and never about the memory when using it. 
> > In my numpy htmlhelp for version 1.5, I don't have a K or A option > >>>> np.__version__ > '1.5.1' > >>>> np.arange(5).ravel("K") > Traceback (most recent call last): > File "", line 1, in > TypeError: order not understood > >>>> np.arange(5).ravel("A") > array([0, 1, 2, 3, 4]) >>>> > > the C, F in ravel have their twins in reshape > >>>> arr = np.arange(10).reshape(2,5, order="C").copy() >>>> arr > array([[0, 1, 2, 3, 4], > [5, 6, 7, 8, 9]]) >>>> arr.ravel() > array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >>>> arr = np.arange(10).reshape(2,5, order="F").copy() >>>> arr > array([[0, 2, 4, 6, 8], > [1, 3, 5, 7, 9]]) >>>> arrarr.ravel("F") > array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > For example we use it when we get raveled arrays from R, > and F for column order and C for row order indexing are pretty > obvious names when coming from another package (Matlab, R, Gauss) just a quick search to get an idea: in statsmodels, 19 out of 135 ravels are ravel('F'), and 50 out of 270 reshapes specify reshape.*order='F' (regular expression) Josef > > Josef From ivan.oseledets at gmail.com Sat Mar 30 14:01:38 2013 From: ivan.oseledets at gmail.com (Ivan Oseledets) Date: Sat, 30 Mar 2013 22:01:38 +0400 Subject: [Numpy-discussion] Indexing bug? Message-ID: I am using numpy 1.6.1, and encountered a weird fancy indexing bug: import numpy as np c = np.random.randn(10,200,10); In [29]: print c[[0,1],:200,:2].shape (2, 200, 2) In [30]: print c[[0,1],:200,[0,1]].shape (2, 200) It means that here fancy indexing is not working right for a 3d array. Is this bug fixed with higher versions of numpy? I have not checked, since mine is from EPD and is compiled with MKL (and I can consider recompiling myself only under strong circumstances) Ivan From jaime.frio at gmail.com Sat Mar 30 14:13:35 2013 From: jaime.frio at gmail.com (=?ISO-8859-1?Q?Jaime_Fern=E1ndez_del_R=EDo?=) Date: Sat, 30 Mar 2013 11:13:35 -0700 Subject: [Numpy-discussion] Indexing bug?
In-Reply-To: References: Message-ID: On Sat, Mar 30, 2013 at 11:01 AM, Ivan Oseledets wrote: > I am using numpy 1.6.1, > and encountered a wierd fancy indexing bug: > > import numpy as np > c = np.random.randn(10,200,10); > > In [29]: print c[[0,1],:200,:2].shape > (2, 200, 2) > > In [30]: print c[[0,1],:200,[0,1]].shape > (2, 200) > > It means, that here fancy indexing is not working right for a 3d array. > It is working fine, review the docs: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing In your return, item[0, :] is c[0, :, 0] and item[1, :] is c[1, :, 1]. If you want a return of shape (2, 200, 2) where item[i, :, j] is c[i, :, j] you could use slicing: c[:2, :200, :2] or something more elaborate like: c[np.arange(2)[:, None, None], np.arange(200)[:, None], np.arange(2)] Jaime > > Is this bug fixed with higher versions of numpy? > I do not check, since mine is from EPD and is compiled with MKL (and I > can consider recompiling myself only under strong circumstances) > > Ivan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Sat Mar 30 14:55:23 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sat, 30 Mar 2013 19:55:23 +0100 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: <1364669723.2556.19.camel@sebastian-laptop> On Fri, 2013-03-29 at 19:08 -0700, Matthew Brett wrote: > Hi, > > We were teaching today, and found ourselves getting very confused > about ravel and shape in numpy.
> > Summary > -------------- > > There are two separate ideas needed to understand ordering in ravel and reshape: > > Idea 1): ravel / reshape can proceed from the last axis to the first, > or the first to the last. This is "ravel index ordering" > Idea 2) The physical layout of the array (on disk or in memory) can be > "C" or "F" contiguous or neither. > This is "memory ordering" > > The index ordering is usually (but see below) orthogonal to the memory ordering. > > The 'ravel' and 'reshape' commands use "C" and "F" in the sense of > index ordering, and this mixes the two ideas and is confusing. > > What the current situation looks like > ---------------------------------------------------- > > Specifically, we've been rolling this around 4 experienced numpy users > and we all predicted at least one of the results below wrongly. > > This was what we knew, or should have known: > > In [2]: import numpy as np > > In [3]: arr = np.arange(10).reshape((2, 5)) > > In [5]: arr.ravel() > Out[5]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > So, the 'ravel' operation unravels over the last axis (1) first, > followed by axis 0. > > So far so good (even if the opposite to MATLAB, Octave). > > Then we found the 'order' flag to ravel: > > In [10]: arr.flags > Out[10]: > C_CONTIGUOUS : True > F_CONTIGUOUS : False > OWNDATA : False > WRITEABLE : True > ALIGNED : True > UPDATEIFCOPY : False > > In [11]: arr.ravel('C') > Out[11]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > But we soon got confused. How about this? > > In [12]: arr_F = np.array(arr, order='F') > > In [13]: arr_F.flags > Out[13]: > C_CONTIGUOUS : False > F_CONTIGUOUS : True > OWNDATA : True > WRITEABLE : True > ALIGNED : True > UPDATEIFCOPY : False > > In [16]: arr_F > Out[16]: > array([[0, 1, 2, 3, 4], > [5, 6, 7, 8, 9]]) > > In [17]: arr_F.ravel('C') > Out[17]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > Right - so the flag 'C' to ravel, has got nothing to do with *memory* > ordering, but is to do with *index* ordering. 
> > And in fact, we can ask for memory ordering specifically: > > In [22]: arr.ravel('K') > Out[22]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > In [23]: arr_F.ravel('K') > Out[23]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9]) > > In [24]: arr.ravel('A') > Out[24]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > In [25]: arr_F.ravel('A') > Out[25]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9]) > > There are some confusions to get into with the 'order' flag to reshape > as well, of the same type. > > Ravel and reshape use the tems 'C' and 'F" in the sense of index ordering. > > This is very confusing. We think the index ordering and memory > ordering ideas need to be separated, and specifically, we should avoid > using "C" and "F" to refer to index ordering. > > Proposal > ------------- > > * Deprecate the use of "C" and "F" meaning backwards and forwards > index ordering for ravel, reshape > * Prefer "Z" and "N", being graphical representations of unraveling in > 2 dimensions, axis1 first and axis0 first respectively (excellent > naming idea by Paul Ivanov) > > What do y'all think? > Personally I think it is clear enough and that "Z" and "N" would confuse me just as much (though I am used to the other names). Also "Z" and "N" would seem more like aliases, which would also make sense in the memory order context. If anything, I would prefer renaming the arguments iteration_order and memory_order, but it seems overdoing it... Maybe the documentation could just be checked if it is always clear though. I.e. maybe it does not use "iteration" or "memory" order consistently (though I somewhat feel it is usually clear that it must be iteration order, since no numpy function cares about the input memory order as they will just do a copy if necessary). 
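Sebastian's point can be made concrete: order='C' and order='F' depend only on iteration order, while 'A' and 'K' are the variants that consult the memory layout. A short sketch reproducing the outputs quoted above:

```python
import numpy as np

arr = np.arange(10).reshape((2, 5))   # C-contiguous
arr_F = np.array(arr, order='F')      # F-contiguous copy, same values

# Iteration-order flags ignore how the input is laid out in memory:
assert (arr.ravel('C') == arr_F.ravel('C')).all()
assert (arr.ravel('F') == arr_F.ravel('F')).all()

# 'A' (and 'K') consult the memory layout, so the results differ:
print(arr.ravel('A'))    # [0 1 2 3 4 5 6 7 8 9]
print(arr_F.ravel('A'))  # [0 5 1 6 2 7 3 8 4 9]
```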
Regards, Sebastian > Cheers, > > Matthew > Paul Ivanov > JB Poline > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From matthew.brett at gmail.com Sat Mar 30 15:45:52 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 30 Mar 2013 12:45:52 -0700 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: <1364669723.2556.19.camel@sebastian-laptop> References: <1364669723.2556.19.camel@sebastian-laptop> Message-ID: Hi, On Sat, Mar 30, 2013 at 11:55 AM, Sebastian Berg wrote: > On Fri, 2013-03-29 at 19:08 -0700, Matthew Brett wrote: >> Hi, >> >> We were teaching today, and found ourselves getting very confused >> about ravel and shape in numpy. >> >> Summary >> -------------- >> >> There are two separate ideas needed to understand ordering in ravel and reshape: >> >> Idea 1): ravel / reshape can proceed from the last axis to the first, >> or the first to the last. This is "ravel index ordering" >> Idea 2) The physical layout of the array (on disk or in memory) can be >> "C" or "F" contiguous or neither. >> This is "memory ordering" >> >> The index ordering is usually (but see below) orthogonal to the memory ordering. >> >> The 'ravel' and 'reshape' commands use "C" and "F" in the sense of >> index ordering, and this mixes the two ideas and is confusing. >> >> What the current situation looks like >> ---------------------------------------------------- >> >> Specifically, we've been rolling this around 4 experienced numpy users >> and we all predicted at least one of the results below wrongly. 
>> >> This was what we knew, or should have known: >> >> In [2]: import numpy as np >> >> In [3]: arr = np.arange(10).reshape((2, 5)) >> >> In [5]: arr.ravel() >> Out[5]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> So, the 'ravel' operation unravels over the last axis (1) first, >> followed by axis 0. >> >> So far so good (even if the opposite to MATLAB, Octave). >> >> Then we found the 'order' flag to ravel: >> >> In [10]: arr.flags >> Out[10]: >> C_CONTIGUOUS : True >> F_CONTIGUOUS : False >> OWNDATA : False >> WRITEABLE : True >> ALIGNED : True >> UPDATEIFCOPY : False >> >> In [11]: arr.ravel('C') >> Out[11]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> But we soon got confused. How about this? >> >> In [12]: arr_F = np.array(arr, order='F') >> >> In [13]: arr_F.flags >> Out[13]: >> C_CONTIGUOUS : False >> F_CONTIGUOUS : True >> OWNDATA : True >> WRITEABLE : True >> ALIGNED : True >> UPDATEIFCOPY : False >> >> In [16]: arr_F >> Out[16]: >> array([[0, 1, 2, 3, 4], >> [5, 6, 7, 8, 9]]) >> >> In [17]: arr_F.ravel('C') >> Out[17]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> Right - so the flag 'C' to ravel, has got nothing to do with *memory* >> ordering, but is to do with *index* ordering. >> >> And in fact, we can ask for memory ordering specifically: >> >> In [22]: arr.ravel('K') >> Out[22]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> In [23]: arr_F.ravel('K') >> Out[23]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9]) >> >> In [24]: arr.ravel('A') >> Out[24]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> In [25]: arr_F.ravel('A') >> Out[25]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9]) >> >> There are some confusions to get into with the 'order' flag to reshape >> as well, of the same type. >> >> Ravel and reshape use the tems 'C' and 'F" in the sense of index ordering. >> >> This is very confusing. We think the index ordering and memory >> ordering ideas need to be separated, and specifically, we should avoid >> using "C" and "F" to refer to index ordering. 
>> >> Proposal >> ------------- >> >> * Deprecate the use of "C" and "F" meaning backwards and forwards >> index ordering for ravel, reshape >> * Prefer "Z" and "N", being graphical representations of unraveling in >> 2 dimensions, axis1 first and axis0 first respectively (excellent >> naming idea by Paul Ivanov) >> >> What do y'all think? >> > > Personally I think it is clear enough and that "Z" and "N" would confuse > me just as much (though I am used to the other names). Also "Z" and "N" > would seem more like aliases, which would also make sense in the memory > order context. > If anything, I would prefer renaming the arguments iteration_order and > memory_order, but it seems overdoing it... I am not sure what you mean - at the moment there is one argument called 'order' that can refer to iteration order or memory order. Are you proposing two arguments? > Maybe the documentation could just be checked if it is always clear > though. I.e. maybe it does not use "iteration" or "memory" order > consistently (though I somewhat feel it is usually clear that it must be > iteration order, since no numpy function cares about the input memory > order as they will just do a copy if necessary). Do you really mean this? Numpy is full of 'order=' flags that refer to memory. Cheers, Matthew From matthew.brett at gmail.com Sat Mar 30 15:51:17 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 30 Mar 2013 12:51:17 -0700 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: Hi, On Sat, Mar 30, 2013 at 4:14 AM, wrote: > On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett wrote: >> >> Hi, >> >> We were teaching today, and found ourselves getting very confused >> about ravel and shape in numpy. 
>> >> Summary >> -------------- >> >> There are two separate ideas needed to understand ordering in ravel and reshape: >> >> Idea 1): ravel / reshape can proceed from the last axis to the first, >> or the first to the last. This is "ravel index ordering" >> Idea 2) The physical layout of the array (on disk or in memory) can be >> "C" or "F" contiguous or neither. >> This is "memory ordering" >> >> The index ordering is usually (but see below) orthogonal to the memory ordering. >> >> The 'ravel' and 'reshape' commands use "C" and "F" in the sense of >> index ordering, and this mixes the two ideas and is confusing. >> >> What the current situation looks like >> ---------------------------------------------------- >> >> Specifically, we've been rolling this around 4 experienced numpy users >> and we all predicted at least one of the results below wrongly. >> >> This was what we knew, or should have known: >> >> In [2]: import numpy as np >> >> In [3]: arr = np.arange(10).reshape((2, 5)) >> >> In [5]: arr.ravel() >> Out[5]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> So, the 'ravel' operation unravels over the last axis (1) first, >> followed by axis 0. >> >> So far so good (even if the opposite to MATLAB, Octave). >> >> Then we found the 'order' flag to ravel: >> >> In [10]: arr.flags >> Out[10]: >> C_CONTIGUOUS : True >> F_CONTIGUOUS : False >> OWNDATA : False >> WRITEABLE : True >> ALIGNED : True >> UPDATEIFCOPY : False >> >> In [11]: arr.ravel('C') >> Out[11]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> But we soon got confused. How about this? 
>> >> In [12]: arr_F = np.array(arr, order='F') >> >> In [13]: arr_F.flags >> Out[13]: >> C_CONTIGUOUS : False >> F_CONTIGUOUS : True >> OWNDATA : True >> WRITEABLE : True >> ALIGNED : True >> UPDATEIFCOPY : False >> >> In [16]: arr_F >> Out[16]: >> array([[0, 1, 2, 3, 4], >> [5, 6, 7, 8, 9]]) >> >> In [17]: arr_F.ravel('C') >> Out[17]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> Right - so the flag 'C' to ravel, has got nothing to do with *memory* >> ordering, but is to do with *index* ordering. >> >> And in fact, we can ask for memory ordering specifically: >> >> In [22]: arr.ravel('K') >> Out[22]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> In [23]: arr_F.ravel('K') >> Out[23]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9]) >> >> In [24]: arr.ravel('A') >> Out[24]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> In [25]: arr_F.ravel('A') >> Out[25]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9]) >> >> There are some confusions to get into with the 'order' flag to reshape >> as well, of the same type. >> >> Ravel and reshape use the tems 'C' and 'F" in the sense of index ordering. >> >> This is very confusing. We think the index ordering and memory >> ordering ideas need to be separated, and specifically, we should avoid >> using "C" and "F" to refer to index ordering. >> >> Proposal >> ------------- >> >> * Deprecate the use of "C" and "F" meaning backwards and forwards >> index ordering for ravel, reshape >> * Prefer "Z" and "N", being graphical representations of unraveling in >> 2 dimensions, axis1 first and axis0 first respectively (excellent >> naming idea by Paul Ivanov) >> >> What do y'all think? >> >> Cheers, >> >> Matthew >> Paul Ivanov >> JB Poline >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > I always thought "F" and "C" are easy to understand, I always thought about > the content and never about the memory when using it. 
I can only say that 4 out of 4 experienced numpy developers found themselves unable to predict the behavior of these functions before they saw the output. The problem is always that explaining something makes it clearer for a moment, but, for those who do not have the explanation or who have forgotten it, at least among us here, the outputs were generating groans and / or high fives as we incorrectly or correctly guessed what was going to happen. I think the only way to find out whether this really is confusing or not, is to put someone in front of these functions without any explanation and ask them to predict what is going to come out of the various inputs and flags. Or to try and teach it, which was the problem we were having. Cheers, Matthew From josef.pktd at gmail.com Sat Mar 30 16:57:36 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 30 Mar 2013 16:57:36 -0400 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett wrote: > Hi, > > On Sat, Mar 30, 2013 at 4:14 AM, wrote: >> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett wrote: >>> >>> Hi, >>> >>> We were teaching today, and found ourselves getting very confused >>> about ravel and shape in numpy. >>> >>> Summary >>> -------------- >>> >>> There are two separate ideas needed to understand ordering in ravel and reshape: >>> >>> Idea 1): ravel / reshape can proceed from the last axis to the first, >>> or the first to the last. This is "ravel index ordering" >>> Idea 2) The physical layout of the array (on disk or in memory) can be >>> "C" or "F" contiguous or neither. >>> This is "memory ordering" >>> >>> The index ordering is usually (but see below) orthogonal to the memory ordering. >>> >>> The 'ravel' and 'reshape' commands use "C" and "F" in the sense of >>> index ordering, and this mixes the two ideas and is confusing. 
> > The problem is always that explaining something makes it clearer for a > moment, but, for those who do not have the explanation or who have > forgotten it, at least among us here, the outputs were generating > groans and / or high fives as we incorrectly or correctly guessed what > was going to happen. > > I think the only way to find out whether this really is confusing or > not, is to put someone in front of these functions without any > explanation and ask them to predict what is going to come out of the > various inputs and flags. Or to try and teach it, which was the > problem we were having. changing the names doesn't make it easier to understand. I think the confusion is because the new A and K refer to existing memory ``ravel`` is just stacking columns ('F') or stacking rows ('C'), I don't remember having seen any weird cases. ------------ I always thought of "order" in array creation is the way we want to have the memory layout of the *target* array and has nothing to do with existing memory layout (creating view or copy as needed). reshape, and ravel are *views* if possible, memory might just be some weird strides (and can be ignored unless you want to do some memory optimization, keeping track of the memory is difficult. I don't think I will start to use A and K after upgrading numpy.) 
>>> a1 = np.ones((10,4)) not contiguous >>> arr2 = a1[:, 2:4] >>> arr2.flags C_CONTIGUOUS : False F_CONTIGUOUS : False OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False stack columns (needs to make a copy) >>> arr3 = arr2.ravel('F') >>> arr3.flags C_CONTIGUOUS : True F_CONTIGUOUS : True OWNDATA : True WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False stack columns or rows with reshape (I have no idea what it did with the memory) >>> arr2.reshape(-1,1).flags C_CONTIGUOUS : True F_CONTIGUOUS : False OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False >>> arr2.reshape(-1,1, order='F').flags C_CONTIGUOUS : False F_CONTIGUOUS : True OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False >>> arr2.reshape(-1, order='F').flags C_CONTIGUOUS : True F_CONTIGUOUS : True OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False ------------------- one case where I do pay attention to memory layout is column slicing >>> arr = np.ones((10, 5), order='F') >>> for i in range(1, 5): print arr[:, :i+2].ravel('C').flags['OWNDATA'] ??? >>> for i in range(1,5): print arr[:, :i+2].ravel('F').flags['OWNDATA'] ??? 
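A minimal sketch of what that last loop is probing, assuming modern NumPy (the variable names here are illustrative, not from the original session): when the requested ravel order matches the slice's memory layout, ravel can hand back a view (OWNDATA False); when it conflicts, a copy is forced (OWNDATA True).

```python
import numpy as np

arr = np.ones((10, 5), order='F')
cols = arr[:, :3]                  # column slice of an F-ordered array
assert cols.flags['F_CONTIGUOUS']  # the slice stays Fortran-contiguous

flat_f = cols.ravel('F')  # order matches the memory layout -> a view suffices
flat_c = cols.ravel('C')  # order conflicts with the layout -> forces a copy

print(flat_f.flags['OWNDATA'])  # False: a view onto arr's buffer
print(flat_c.flags['OWNDATA'])  # True: freshly copied data
```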
Josef > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From josef.pktd at gmail.com Sat Mar 30 17:20:19 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 30 Mar 2013 17:20:19 -0400 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: On Sat, Mar 30, 2013 at 4:57 PM, wrote: > On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett wrote: >> Hi, >> >> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett wrote: >>>> >>>> Hi, >>>> >>>> We were teaching today, and found ourselves getting very confused >>>> about ravel and shape in numpy. >>>> >>>> Summary >>>> -------------- >>>> >>>> There are two separate ideas needed to understand ordering in ravel and reshape: >>>> >>>> Idea 1): ravel / reshape can proceed from the last axis to the first, >>>> or the first to the last. This is "ravel index ordering" >>>> Idea 2) The physical layout of the array (on disk or in memory) can be >>>> "C" or "F" contiguous or neither. >>>> This is "memory ordering" >>>> >>>> The index ordering is usually (but see below) orthogonal to the memory ordering. >>>> >>>> The 'ravel' and 'reshape' commands use "C" and "F" in the sense of >>>> index ordering, and this mixes the two ideas and is confusing. >>>> >>>> What the current situation looks like >>>> ---------------------------------------------------- >>>> >>>> Specifically, we've been rolling this around 4 experienced numpy users >>>> and we all predicted at least one of the results below wrongly. 
>>>> >>>> This was what we knew, or should have known: >>>> >>>> In [2]: import numpy as np >>>> >>>> In [3]: arr = np.arange(10).reshape((2, 5)) >>>> >>>> In [5]: arr.ravel() >>>> Out[5]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >>>> >>>> So, the 'ravel' operation unravels over the last axis (1) first, >>>> followed by axis 0. >>>> >>>> So far so good (even if the opposite to MATLAB, Octave). >>>> >>>> Then we found the 'order' flag to ravel: >>>> >>>> In [10]: arr.flags >>>> Out[10]: >>>> C_CONTIGUOUS : True >>>> F_CONTIGUOUS : False >>>> OWNDATA : False >>>> WRITEABLE : True >>>> ALIGNED : True >>>> UPDATEIFCOPY : False >>>> >>>> In [11]: arr.ravel('C') >>>> Out[11]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >>>> >>>> But we soon got confused. How about this? >>>> >>>> In [12]: arr_F = np.array(arr, order='F') >>>> >>>> In [13]: arr_F.flags >>>> Out[13]: >>>> C_CONTIGUOUS : False >>>> F_CONTIGUOUS : True >>>> OWNDATA : True >>>> WRITEABLE : True >>>> ALIGNED : True >>>> UPDATEIFCOPY : False >>>> >>>> In [16]: arr_F >>>> Out[16]: >>>> array([[0, 1, 2, 3, 4], >>>> [5, 6, 7, 8, 9]]) >>>> >>>> In [17]: arr_F.ravel('C') >>>> Out[17]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >>>> >>>> Right - so the flag 'C' to ravel, has got nothing to do with *memory* >>>> ordering, but is to do with *index* ordering. >>>> >>>> And in fact, we can ask for memory ordering specifically: >>>> >>>> In [22]: arr.ravel('K') >>>> Out[22]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >>>> >>>> In [23]: arr_F.ravel('K') >>>> Out[23]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9]) >>>> >>>> In [24]: arr.ravel('A') >>>> Out[24]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >>>> >>>> In [25]: arr_F.ravel('A') >>>> Out[25]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9]) >>>> >>>> There are some confusions to get into with the 'order' flag to reshape >>>> as well, of the same type. >>>> >>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index ordering. >>>> >>>> This is very confusing. 
We think the index ordering and memory >>>> ordering ideas need to be separated, and specifically, we should avoid >>>> using "C" and "F" to refer to index ordering. >>>> >>>> Proposal >>>> ------------- >>>> >>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >>>> index ordering for ravel, reshape >>>> * Prefer "Z" and "N", being graphical representations of unraveling in >>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >>>> naming idea by Paul Ivanov) >>>> >>>> What do y'all think? >>>> >>>> Cheers, >>>> >>>> Matthew >>>> Paul Ivanov >>>> JB Poline >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >>> >>> I always thought "F" and "C" are easy to understand, I always thought about >>> the content and never about the memory when using it. >> >> I can only say that 4 out of 4 experienced numpy developers found >> themselves unable to predict the behavior of these functions before >> they saw the output. >> >> The problem is always that explaining something makes it clearer for a >> moment, but, for those who do not have the explanation or who have >> forgotten it, at least among us here, the outputs were generating >> groans and / or high fives as we incorrectly or correctly guessed what >> was going to happen. >> >> I think the only way to find out whether this really is confusing or >> not, is to put someone in front of these functions without any >> explanation and ask them to predict what is going to come out of the >> various inputs and flags. Or to try and teach it, which was the >> problem we were having. > > changing the names doesn't make it easier to understand. > I think the confusion is because the new A and K refer to existing memory > > > ``ravel`` is just stacking columns ('F') or stacking rows ('C'), I > don't remember having seen any weird cases. 
example from our statistics use: rows are observations/time periods, columns are variables/individuals using "F" or "C", we can stack either by time-periods (observations) or individuals (cross-section units) that's easy to understand. "A" and "K" are pretty useless for us, because we don't know which stacking we would get (we don't try to control the memory layout) The only reason to use "A" or "K", in my opinion, is to use the existing memory efficiently. Since the order in the array is unpredictable, it only makes sense if we don't care about it, for example when we only have elementwise operations. Josef From matthew.brett at gmail.com Sat Mar 30 18:19:33 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 30 Mar 2013 15:19:33 -0700 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: Hi, On Sat, Mar 30, 2013 at 1:57 PM, wrote: > On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett wrote: >> Hi, >> >> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett wrote: >>>> >>>> Hi, >>>> >>>> We were teaching today, and found ourselves getting very confused >>>> about ravel and shape in numpy. >>>> >>>> Summary >>>> -------------- >>>> >>>> There are two separate ideas needed to understand ordering in ravel and reshape: >>>> >>>> Idea 1): ravel / reshape can proceed from the last axis to the first, >>>> or the first to the last. This is "ravel index ordering" >>>> Idea 2) The physical layout of the array (on disk or in memory) can be >>>> "C" or "F" contiguous or neither. >>>> This is "memory ordering" >>>> >>>> The index ordering is usually (but see below) orthogonal to the memory ordering. >>>> >>>> The 'ravel' and 'reshape' commands use "C" and "F" in the sense of >>>> index ordering, and this mixes the two ideas and is confusing. 
>> >> The problem is always that explaining something makes it clearer for a
>> moment, but, for those who do not have the explanation or who have
>> forgotten it, at least among us here, the outputs were generating
>> groans and / or high fives as we incorrectly or correctly guessed what
>> was going to happen.
>>
>> I think the only way to find out whether this really is confusing or
>> not, is to put someone in front of these functions without any
>> explanation and ask them to predict what is going to come out of the
>> various inputs and flags. Or to try and teach it, which was the
>> problem we were having.
>
> changing the names doesn't make it easier to understand.
> I think the confusion is because the new A and K refer to existing memory
>
> ``ravel`` is just stacking columns ('F') or stacking rows ('C'), I
> don't remember having seen any weird cases.
> ------------
>
> I always thought of "order" in array creation is the way we want to
> have the memory layout of the *target* array and has nothing to do
> with existing memory layout (creating view or copy as needed).

In the case of ravel of course F and C in memory aren't relevant.

'F' and 'C' don't refer to target memory layout at all in 'reshape':

In [26]: a = np.arange(10).reshape((2, 5))

In [28]: a.reshape((2, 5), order='F').flags
Out[28]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

So I think that distinction is actively confusing in this case, and
more evidence that this is not the right name for what we mean.
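To spell out the contrast shown in that reshape session (a sketch in current Python/NumPy, not part of the original message): the order flag to reshape only controls the index order in which elements are read off, while a separate call such as np.asfortranarray is what actually requests a Fortran memory layout.

```python
import numpy as np

a = np.arange(10).reshape((2, 5))  # C-contiguous

# order='F' here only changes how indices are traversed; for a
# same-shape reshape the result is still a C-contiguous view.
b = a.reshape((2, 5), order='F')
print(b.flags['C_CONTIGUOUS'])  # True
print(b.flags['F_CONTIGUOUS'])  # False

# Requesting an actual Fortran memory layout is a different operation.
c = np.asfortranarray(a)
print(c.flags['F_CONTIGUOUS'])  # True
```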
Cheers, Matthew From matthew.brett at gmail.com Sat Mar 30 18:21:45 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 30 Mar 2013 15:21:45 -0700 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: Hi, On Sat, Mar 30, 2013 at 2:20 PM, wrote: > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett wrote: >>> Hi, >>> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett wrote: >>>>> >>>>> Hi, >>>>> >>>>> We were teaching today, and found ourselves getting very confused >>>>> about ravel and shape in numpy. >>>>> >>>>> Summary >>>>> -------------- >>>>> >>>>> There are two separate ideas needed to understand ordering in ravel and reshape: >>>>> >>>>> Idea 1): ravel / reshape can proceed from the last axis to the first, >>>>> or the first to the last. This is "ravel index ordering" >>>>> Idea 2) The physical layout of the array (on disk or in memory) can be >>>>> "C" or "F" contiguous or neither. >>>>> This is "memory ordering" >>>>> >>>>> The index ordering is usually (but see below) orthogonal to the memory ordering. >>>>> >>>>> The 'ravel' and 'reshape' commands use "C" and "F" in the sense of >>>>> index ordering, and this mixes the two ideas and is confusing. >>>>> >>>>> What the current situation looks like >>>>> ---------------------------------------------------- >>>>> >>>>> Specifically, we've been rolling this around 4 experienced numpy users >>>>> and we all predicted at least one of the results below wrongly. >>>>> >>>>> This was what we knew, or should have known: >>>>> >>>>> In [2]: import numpy as np >>>>> >>>>> In [3]: arr = np.arange(10).reshape((2, 5)) >>>>> >>>>> In [5]: arr.ravel() >>>>> Out[5]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >>>>> >>>>> So, the 'ravel' operation unravels over the last axis (1) first, >>>>> followed by axis 0. 
> > example from our statistics use: > rows are observations/time periods, columns are variables/individuals > > using "F" or "C", we can stack either by time-periods (observations) > or individuals (cross-section units) > that's easy to understand. I disagree, I think it's confusing, but I have evidence, and that is that four out of four of us tested ourselves and got it wrong. Perhaps we are particularly dumb or poorly informed, but I think it's rash to assert there is no problem here. Cheers, Matthew From sebastian at sipsolutions.net Sat Mar 30 19:28:49 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sun, 31 Mar 2013 00:28:49 +0100 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: <1364669723.2556.19.camel@sebastian-laptop> Message-ID: <1364686129.2556.65.camel@sebastian-laptop> On Sat, 2013-03-30 at 12:45 -0700, Matthew Brett wrote: > Hi, > > On Sat, Mar 30, 2013 at 11:55 AM, Sebastian Berg > wrote: > > On Fri, 2013-03-29 at 19:08 -0700, Matthew Brett wrote: > >> Hi, > >> > >> We were teaching today, and found ourselves getting very confused > >> about ravel and shape in numpy. > >> > >> > >> What do y'all think? > >> > > > > Personally I think it is clear enough and that "Z" and "N" would confuse > > me just as much (though I am used to the other names). Also "Z" and "N" > > would seem more like aliases, which would also make sense in the memory > > order context. > > If anything, I would prefer renaming the arguments iteration_order and > > memory_order, but it seems overdoing it... > > I am not sure what you mean - at the moment there is one argument > called 'order' that can refer to iteration order or memory order. Are > you proposing two arguments? > Yes that is what I meant. The reason that it is not convincing to me is that if I write `np.reshape(arr, ..., order='Z')`, I may be tempted to also write `np.copy(arr, order='Z')`. 
I don't see anything against allowing 'Z' as a more memorable 'C' (I also used to forget which was which), but I don't really see enforcing a different _value_ on the same named argument making it clearer. Renaming the argument itself would seem more sensible to me right now, but I cannot think of a decent name, so I would prefer trying to clarify the documentation if necessary. > > Maybe the documentation could just be checked if it is always clear > > though. I.e. maybe it does not use "iteration" or "memory" order > > consistently (though I somewhat feel it is usually clear that it must be > > iteration order, since no numpy function cares about the input memory > > order as they will just do a copy if necessary). > > Do you really mean this? Numpy is full of 'order=' flags that refer to memory. > I somewhat imagined there were more iteration order flags and I basically count empty/ones/.../copy as basically one "array creation" monster... > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From brad.froehle at gmail.com Sat Mar 30 19:31:53 2013 From: brad.froehle at gmail.com (Bradley M. Froehle) Date: Sat, 30 Mar 2013 16:31:53 -0700 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett wrote: > On Sat, Mar 30, 2013 at 2:20 PM, wrote: > > On Sat, Mar 30, 2013 at 4:57 PM, wrote: > >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett > wrote: > >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: > >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett < > matthew.brett at gmail.com> wrote: > >>>>> > >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index > ordering. > >>>>> > >>>>> This is very confusing. 
We think the index ordering and memory > >>>>> ordering ideas need to be separated, and specifically, we should > avoid > >>>>> using "C" and "F" to refer to index ordering. > >>>>> > >>>>> Proposal > >>>>> ------------- > >>>>> > >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards > >>>>> index ordering for ravel, reshape > >>>>> * Prefer "Z" and "N", being graphical representations of unraveling > in > >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent > >>>>> naming idea by Paul Ivanov) > >>>>> > >>>>> What do y'all think? > >>>> > >>>> I always thought "F" and "C" are easy to understand, I always thought > about > >>>> the content and never about the memory when using it. > >> > >> changing the names doesn't make it easier to understand. > >> I think the confusion is because the new A and K refer to existing > memory > >> > > I disagree, I think it's confusing, but I have evidence, and that is > that four out of four of us tested ourselves and got it wrong. > > Perhaps we are particularly dumb or poorly informed, but I think it's > rash to assert there is no problem here. > I got all four correct. I think the concept --- at least for ravel --- is pretty simple: would you like to read the data off in C ordering or Fortran ordering. Since the output array is one-dimensional, its ordering is irrelevant. I don't understand the 'Z' / 'N' suggestion at all. Are they part of some pneumonic? I'd STRONGLY advise against deprecating the 'F' and 'C' options. NumPy already suffers from too much bikeshedding with names --- I rarely am able to pull out a script I wrote using NumPy even a few years ago and have it immediately work. Cheers, Brad -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matthew.brett at gmail.com Sat Mar 30 19:42:38 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 30 Mar 2013 16:42:38 -0700 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: Hi, On Sat, Mar 30, 2013 at 4:31 PM, Bradley M. Froehle wrote: > On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett > wrote: >> >> On Sat, Mar 30, 2013 at 2:20 PM, wrote: >> > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett >> >> wrote: >> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett >> >>>> wrote: >> >>>>> >> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index >> >>>>> ordering. >> >>>>> >> >>>>> This is very confusing. We think the index ordering and memory >> >>>>> ordering ideas need to be separated, and specifically, we should >> >>>>> avoid >> >>>>> using "C" and "F" to refer to index ordering. >> >>>>> >> >>>>> Proposal >> >>>>> ------------- >> >>>>> >> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >> >>>>> index ordering for ravel, reshape >> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >> >>>>> in >> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >> >>>>> naming idea by Paul Ivanov) >> >>>>> >> >>>>> What do y'all think? >> >>>> >> >>>> I always thought "F" and "C" are easy to understand, I always thought >> >>>> about >> >>>> the content and never about the memory when using it. >> >> >> >> changing the names doesn't make it easier to understand. >> >> I think the confusion is because the new A and K refer to existing >> >> memory >> >> >> >> I disagree, I think it's confusing, but I have evidence, and that is >> that four out of four of us tested ourselves and got it wrong. 
>>
>> Perhaps we are particularly dumb or poorly informed, but I think it's
>> rash to assert there is no problem here.
>
>
> I got all four correct.

Then you are smarter and/or better informed than we were. I hope you
didn't read my explanation before you tested yourself. Of course if you
did read my email first I'd expect you and I to get the answer right
first time.

If you didn't read my email first, and didn't think too hard about it,
and still got all the examples right, and you'd get other more confusing
examples right that use reshape, then I'd add you as a data point on the
other side to the four data points we got yesterday.

> I think the concept --- at least for ravel --- is
> pretty simple: would you like to read the data off in C ordering or Fortran
> ordering. Since the output array is one-dimensional, its ordering is
> irrelevant.

Right - hence my confidence that Josef's way of thinking of 'C' and 'F'
as describing the target array was not a good way to think of it in this
case. It is in the case of arr.tostring() though.

> I don't understand the 'Z' / 'N' suggestion at all. Are they part of some
> pneumonic?

Think of the way you'd read off the elements using reverse (last-first)
index order for a 2D array - you might imagine something like a Z.

> I'd STRONGLY advise against deprecating the 'F' and 'C' options. NumPy
> already suffers from too much bikeshedding with names --- I rarely am able
> to pull out a script I wrote using NumPy even a few years ago and have it
> immediately work.

I wish we could drop the word bike-shedding - it's a useless word,
because one person's bike-shedding is another person's necessary
clarification. You think this clarification isn't necessary, and so you
call the discussion bike-shedding.

I'm not suggesting dropping the 'F' and 'C', obviously - can I call that
a 'straw man'?
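To make the mnemonic concrete - a minimal sketch, my own toy example:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])

# Last-index-fastest ('C') reads 1, 2, 3, 4 - across the top row, then
# across the bottom row; the path your eye takes traces a 'Z'.
z_read = a.ravel('C')

# First-index-fastest ('F') reads 1, 3, 2, 4 - down the first column,
# then down the second; the path traces an 'N'.
n_read = a.ravel('F')

print(z_read)  # [1 2 3 4]
print(n_read)  # [1 3 2 4]
```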
I am suggesting changing the name to something much clearer, leaving that name clearly explained in the docs, and leaving 'C' and 'F" as functional synonyms for a very long time. Cheers, Matthew From josef.pktd at gmail.com Sat Mar 30 19:50:53 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 30 Mar 2013 19:50:53 -0400 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle wrote: > On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett > wrote: >> >> On Sat, Mar 30, 2013 at 2:20 PM, wrote: >> > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett >> >> wrote: >> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett >> >>>> wrote: >> >>>>> >> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index >> >>>>> ordering. >> >>>>> >> >>>>> This is very confusing. We think the index ordering and memory >> >>>>> ordering ideas need to be separated, and specifically, we should >> >>>>> avoid >> >>>>> using "C" and "F" to refer to index ordering. >> >>>>> >> >>>>> Proposal >> >>>>> ------------- >> >>>>> >> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >> >>>>> index ordering for ravel, reshape >> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >> >>>>> in >> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >> >>>>> naming idea by Paul Ivanov) >> >>>>> >> >>>>> What do y'all think? >> >>>> >> >>>> I always thought "F" and "C" are easy to understand, I always thought >> >>>> about >> >>>> the content and never about the memory when using it. >> >> >> >> changing the names doesn't make it easier to understand. 
>> >> I think the confusion is because the new A and K refer to existing >> >> memory >> >> >> >> I disagree, I think it's confusing, but I have evidence, and that is >> that four out of four of us tested ourselves and got it wrong. >> >> Perhaps we are particularly dumb or poorly informed, but I think it's >> rash to assert there is no problem here. I think you are overcomplicating things or phrased it as a "trick question" ravel F and C have *nothing* to do with memory layout. I think it's not confusing for beginners that have no idea and never think about memory layout. I've never seen any problems with it in statsmodels and I have seen many developers (GSOC) that are pretty new to python and numpy. (I didn't check the repo history to verify, so IIRC) Even if N, Z were clearer in this case (which I don't think it is and which I have no idea what it should stand for), you would have to go for every use of ``order`` in numpy to check whether it should be N or F or Z or C, and then users would have to check which order name convention is used in a specific function. Josef > > > I got all four correct. I think the concept --- at least for ravel --- is > pretty simple: would you like to read the data off in C ordering or Fortran > ordering. Since the output array is one-dimensional, its ordering is > irrelevant. > > I don't understand the 'Z' / 'N' suggestion at all. Are they part of some > pneumonic? > > I'd STRONGLY advise against deprecating the 'F' and 'C' options. NumPy > already suffers from too much bikeshedding with names --- I rarely am able > to pull out a script I wrote using NumPy even a few years ago and have it > immediately work. 
> > Cheers, > Brad > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From matthew.brett at gmail.com Sat Mar 30 20:29:53 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 30 Mar 2013 20:29:53 -0400 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: Hi, On Sat, Mar 30, 2013 at 7:50 PM, wrote: > On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle > wrote: >> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett >> wrote: >>> >>> On Sat, Mar 30, 2013 at 2:20 PM, wrote: >>> > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett >>> >> wrote: >>> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett >>> >>>> wrote: >>> >>>>> >>> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index >>> >>>>> ordering. >>> >>>>> >>> >>>>> This is very confusing. We think the index ordering and memory >>> >>>>> ordering ideas need to be separated, and specifically, we should >>> >>>>> avoid >>> >>>>> using "C" and "F" to refer to index ordering. >>> >>>>> >>> >>>>> Proposal >>> >>>>> ------------- >>> >>>>> >>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >>> >>>>> index ordering for ravel, reshape >>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >>> >>>>> in >>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >>> >>>>> naming idea by Paul Ivanov) >>> >>>>> >>> >>>>> What do y'all think? >>> >>>> >>> >>>> I always thought "F" and "C" are easy to understand, I always thought >>> >>>> about >>> >>>> the content and never about the memory when using it. >>> >> >>> >> changing the names doesn't make it easier to understand. 
>>> >> I think the confusion is because the new A and K refer to existing
>>> >> memory
>>> >>
>>>
>>> I disagree, I think it's confusing, but I have evidence, and that is
>>> that four out of four of us tested ourselves and got it wrong.
>>>
>>> Perhaps we are particularly dumb or poorly informed, but I think it's
>>> rash to assert there is no problem here.
>
> I think you are overcomplicating things or phrased it as a "trick question"

I don't know what you mean by trick question - was there something
over-complicated in the example? I deliberately didn't include various
much more confusing examples in "reshape".

> ravel F and C have *nothing* to do with memory layout.

We do agree on this of course - but you said in an earlier mail that you
thought of 'C' and 'F' as referring to target memory layout (which they
don't in this case), so I think we also agree that 'C' and 'F' do often
refer to memory layout elsewhere in numpy.

> I think it's not confusing for beginners that have no idea and never think
> about memory layout.
> I've never seen any problems with it in statsmodels and I have seen
> many developers (GSOC) that are pretty new to python and numpy.
> (I didn't check the repo history to verify, so IIRC)

Usually you don't need to know what reshape or ravel did, because you are
likely to reshape again and that will use the same algorithm.

For example, I didn't know that ravel worked in reverse index order,
started explaining it wrong, and had to check. I use ravel and reshape a
lot, and have not run into this problem because either a) I didn't test
my code properly or b) I did a reshape after ravel / reshape and it
reversed what I did the first time. So I don't think "we haven't noticed
any problems" is a good argument in the face of "several experienced
developers got it wrong when trying to guess what it did".
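A short sketch (my example) of why the round trip hides the choice:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Ravel and reshape with the *same* order always round-trip, so code
# that unravels and re-ravels consistently never reveals which order
# it used.
for order in ('C', 'F'):
    restored = a.ravel(order).reshape(a.shape, order=order)
    assert np.array_equal(restored, a)

# The difference only surfaces when the two orders get mixed.
mixed = a.ravel('F').reshape(a.shape, order='C')
print(np.array_equal(mixed, a))  # False
```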
> Even if N, Z were clearer in this case (which I don't think it is and which > I have no idea what it should stand for), you would have to go for every > use of ``order`` in numpy to check whether it should be N or F or Z or C, > and then users would have to check which order name convention is > used in a specific function. Right - and this would be silly if and only if it made sense to conflate memory layout and index ordering. Cheers, Matthew From josef.pktd at gmail.com Sat Mar 30 22:02:42 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 30 Mar 2013 22:02:42 -0400 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett wrote: > Hi, > > On Sat, Mar 30, 2013 at 7:50 PM, wrote: >> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle >> wrote: >>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett >>> wrote: >>>> >>>> On Sat, Mar 30, 2013 at 2:20 PM, wrote: >>>> > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett >>>> >> wrote: >>>> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett >>>> >>>> wrote: >>>> >>>>> >>>> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index >>>> >>>>> ordering. >>>> >>>>> >>>> >>>>> This is very confusing. We think the index ordering and memory >>>> >>>>> ordering ideas need to be separated, and specifically, we should >>>> >>>>> avoid >>>> >>>>> using "C" and "F" to refer to index ordering. 
>>>> >>>>> >>>> >>>>> Proposal >>>> >>>>> ------------- >>>> >>>>> >>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >>>> >>>>> index ordering for ravel, reshape >>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >>>> >>>>> in >>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >>>> >>>>> naming idea by Paul Ivanov) >>>> >>>>> >>>> >>>>> What do y'all think? >>>> >>>> >>>> >>>> I always thought "F" and "C" are easy to understand, I always thought >>>> >>>> about >>>> >>>> the content and never about the memory when using it. >>>> >> >>>> >> changing the names doesn't make it easier to understand. >>>> >> I think the confusion is because the new A and K refer to existing >>>> >> memory >>>> >> >>>> >>>> I disagree, I think it's confusing, but I have evidence, and that is >>>> that four out of four of us tested ourselves and got it wrong. >>>> >>>> Perhaps we are particularly dumb or poorly informed, but I think it's >>>> rash to assert there is no problem here. >> >> I think you are overcomplicating things or phrased it as a "trick question" > > I don't know what you mean by trick question - was there something > over-complicated in the example? I deliberately didn't include > various much more confusing examples in "reshape". I meant making the "candidates" think about memory instead of just column versus row stacking. I don't think I ever get confused about reshape "F" in 2d. But when I work with 3d or larger ndim nd-arrays, I always have to try an example to check my intuition (in general not just reshape). > >> ravel F and C have *nothing* to do with memory layout. > > We do agree on this of course - but you said in an earlier mail that > you thought of 'C" and 'F' as referring to target memory layout (which > they don't in this case) so I think we also agree that "C" and "F" do > often refer to memory layout elsewhere in numpy. I guess that wasn't so helpful. 
(emphasis on *target*, There are very few places where an order keyword refers to *existing* memory layout. So I'm not tempted to think about existing memory layout when I see ``order``. Also my examples might have confused the issue: ravel and reshape, with C and F are easy to understand without ever looking at memory issues. memory only comes into play when we want to know whether we get a view or copy. The examples were only for the cases when I do care about this. ) > >> I think it's not confusing for beginners that have no idea and never think >> about memory layout. >> I've never seen any problems with it in statsmodels and I have seen >> many developers (GSOC) that are pretty new to python and numpy. >> (I didn't check the repo history to verify, so IIRC) > > Usually you don't need to know what reshape or ravel did because you > are likely to reshape again and that will use the same algorithm. > > For example, I didn't know that that ravel worked in reverse index > order, started explaining it wrong, and had to check. I use ravel and > reshape a lot, and have not run into this problem because either a) I > didn't test my code properly or b) I did reshape after ravel / reshape > and it reversed what I did first time. So, I don't think it's "we > haven't noticed any problems" is a good argument in the face of > "several experienced developers got it wrong when trying to guess what > it did". What's reverse index order? In the case of statsmodels, we do care about the stacking order. When we use reshape(..., order='F') or ravel('F'), it's only because we want to have a specific array (not memory) layout (and/or because the raveled array came from R) (aside: 2 cases - for 2d parameter vectors, we ravel and reshape often, and we changed our convention to Fortran order, (parameter in rows, equations in columns, IIRC) The interpretation of the results depends on which way we ravel or reshape. 
- for panel data (time versus individuals), we need to build matching kronecker product arrays which are block-diagonal if the stacking/``order`` is the right way. None of the cases cares about memory layout, it's just: Do we stack by columns or by rows, i.e. fortran- or c-order? Do we want this in rows or in columns? ) > >> Even if N, Z were clearer in this case (which I don't think it is and which >> I have no idea what it should stand for), you would have to go for every >> use of ``order`` in numpy to check whether it should be N or F or Z or C, >> and then users would have to check which order name convention is >> used in a specific function. > > Right - and this would be silly if and only if it made sense to > conflate memory layout and index ordering. I see the two things, but never saw it as a problem arr2 = np.asarray(arr1, order='F') give me an array with Fortran memory layout, I need it (never used in statsmodels, there might be a few places where we used other ways to control the memory layout, but not much.) arr2 = arr1.reshape(-1, 5, order='F') unstack this array by columns, I want 5 of them arr1 = arr2.ravel('F') go back, stack them again by columns (used quite a bit as described before) Cheers, Josef > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From matthew.brett at gmail.com Sat Mar 30 23:43:17 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 30 Mar 2013 20:43:17 -0700 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: Hi, On Sat, Mar 30, 2013 at 7:02 PM, wrote: > On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett wrote: >> Hi, >> >> On Sat, Mar 30, 2013 at 7:50 PM, wrote: >>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. 
Froehle >>> wrote: >>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett >>>> wrote: >>>>> >>>>> On Sat, Mar 30, 2013 at 2:20 PM, wrote: >>>>> > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett >>>>> >> wrote: >>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett >>>>> >>>> wrote: >>>>> >>>>> >>>>> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index >>>>> >>>>> ordering. >>>>> >>>>> >>>>> >>>>> This is very confusing. We think the index ordering and memory >>>>> >>>>> ordering ideas need to be separated, and specifically, we should >>>>> >>>>> avoid >>>>> >>>>> using "C" and "F" to refer to index ordering. >>>>> >>>>> >>>>> >>>>> Proposal >>>>> >>>>> ------------- >>>>> >>>>> >>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >>>>> >>>>> index ordering for ravel, reshape >>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >>>>> >>>>> in >>>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >>>>> >>>>> naming idea by Paul Ivanov) >>>>> >>>>> >>>>> >>>>> What do y'all think? >>>>> >>>> >>>>> >>>> I always thought "F" and "C" are easy to understand, I always thought >>>>> >>>> about >>>>> >>>> the content and never about the memory when using it. >>>>> >> >>>>> >> changing the names doesn't make it easier to understand. >>>>> >> I think the confusion is because the new A and K refer to existing >>>>> >> memory >>>>> >> >>>>> >>>>> I disagree, I think it's confusing, but I have evidence, and that is >>>>> that four out of four of us tested ourselves and got it wrong. >>>>> >>>>> Perhaps we are particularly dumb or poorly informed, but I think it's >>>>> rash to assert there is no problem here. 
>>>
>>> I think you are overcomplicating things or phrased it as a "trick question"
>>
>> I don't know what you mean by trick question - was there something
>> over-complicated in the example? I deliberately didn't include
>> various much more confusing examples in "reshape".
>
> I meant making the "candidates" think about memory instead of just
> column versus row stacking.
> I don't think I ever get confused about reshape "F" in 2d.
> But when I work with 3d or larger ndim nd-arrays, I always have to
> try an example to check my intuition (in general not just reshape).
>
>>
>>> ravel F and C have *nothing* to do with memory layout.
>>
>> We do agree on this of course - but you said in an earlier mail that
>> you thought of 'C" and 'F' as referring to target memory layout (which
>> they don't in this case) so I think we also agree that "C" and "F" do
>> often refer to memory layout elsewhere in numpy.
>
> I guess that wasn't so helpful.
> (emphasis on *target*, There are very few places where an order
> keyword refers to *existing* memory layout.

It is helpful because it shows how easy it is to get confused between
memory order and index order.

> What's reverse index order?

I am not being clear, sorry about that:

import numpy as np

def ravel_iter_last_fastest(arr):
    res = []
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            for k in range(arr.shape[2]):
                # Iterating over last dimension fastest
                res.append(arr[i, j, k])
    return np.array(res)


def ravel_iter_first_fastest(arr):
    res = []
    for k in range(arr.shape[2]):
        for j in range(arr.shape[1]):
            for i in range(arr.shape[0]):
                # Iterating over first dimension fastest
                res.append(arr[i, j, k])
    return np.array(res)


a = np.arange(24).reshape((2, 3, 4))

print np.all(a.ravel('C') == ravel_iter_last_fastest(a))
print np.all(a.ravel('F') == ravel_iter_first_fastest(a))

By 'reverse index ordering' I mean 'ravel_iter_last_fastest' above.
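A compact restatement of the two loops above, if it helps (my
formulation): first-index-fastest over an array is the same as
last-index-fastest over its transpose.

```python
import numpy as np

a = np.arange(24).reshape((2, 3, 4))

# 'F' (first-index-fastest) over `a` is 'C' (last-index-fastest) over
# the axis-reversed view a.T, and vice versa.
assert np.array_equal(a.ravel('F'), a.T.ravel('C'))
assert np.array_equal(a.ravel('C'), a.T.ravel('F'))
```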
I guess one could argue that this was not 'reverse' but 'forward' index ordering, but I am not arguing about which is better, or those names, only that it's the order of indices that differs, not the memory layout, and that these ideas need to be kept separate. Cheers, Matthew From matthew.brett at gmail.com Sun Mar 31 00:04:49 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 30 Mar 2013 21:04:49 -0700 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: Hi, On Sat, Mar 30, 2013 at 7:02 PM, wrote: > On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett wrote: >> Hi, >> >> On Sat, Mar 30, 2013 at 7:50 PM, wrote: >>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle >>> wrote: >>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett >>>> wrote: >>>>> >>>>> On Sat, Mar 30, 2013 at 2:20 PM, wrote: >>>>> > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett >>>>> >> wrote: >>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett >>>>> >>>> wrote: >>>>> >>>>> >>>>> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index >>>>> >>>>> ordering. >>>>> >>>>> >>>>> >>>>> This is very confusing. We think the index ordering and memory >>>>> >>>>> ordering ideas need to be separated, and specifically, we should >>>>> >>>>> avoid >>>>> >>>>> using "C" and "F" to refer to index ordering. >>>>> >>>>> >>>>> >>>>> Proposal >>>>> >>>>> ------------- >>>>> >>>>> >>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >>>>> >>>>> index ordering for ravel, reshape >>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >>>>> >>>>> in >>>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >>>>> >>>>> naming idea by Paul Ivanov) >>>>> >>>>> >>>>> >>>>> What do y'all think? 
>>>>> >>>> >>>>> >>>> I always thought "F" and "C" are easy to understand, I always thought >>>>> >>>> about >>>>> >>>> the content and never about the memory when using it. >>>>> >> >>>>> >> changing the names doesn't make it easier to understand. >>>>> >> I think the confusion is because the new A and K refer to existing >>>>> >> memory >>>>> >> >>>>> >>>>> I disagree, I think it's confusing, but I have evidence, and that is >>>>> that four out of four of us tested ourselves and got it wrong. >>>>> >>>>> Perhaps we are particularly dumb or poorly informed, but I think it's >>>>> rash to assert there is no problem here. >>> >>> I think you are overcomplicating things or phrased it as a "trick question" >> >> I don't know what you mean by trick question - was there something >> over-complicated in the example? I deliberately didn't include >> various much more confusing examples in "reshape". > > I meant making the "candidates" think about memory instead of just > column versus row stacking. To be specific, we were teaching about reshaping a (I, J, K, N) 4D array, it was an image, with time as the 4th dimension (N time points). Raveling and reshaping 3D and 4D arrays is a common thing to do in neuroimaging, as you can imagine. A student asked what he would get back from raveling this array, a concatenated time series, or something spatial? We showed (I'd worked it out by this time) that the first N values were the time series given by [0, 0, 0, :]. He said - "Oh - I see - so the data is stored as a whole lot of time series one by one, I thought it would be stored as a series of images'. Ironically, this was a Fortran-ordered array in memory, and he was wrong. So, I think the idea of memory ordering and index ordering is very easy to confuse, and comes up naturally. 
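The classroom example can be checked directly - a sketch with made-up
dimensions (2x3x4 voxels, N=5 time points):

```python
import numpy as np

# A hypothetical (I, J, K, N) image: 2x3x4 voxels, N=5 time points,
# Fortran-contiguous in memory, as in the classroom example.
arr = np.asfortranarray(np.random.randn(2, 3, 4, 5))

# Raveling in 'C' (last-index-fastest) index order starts with the full
# time series of the first voxel - regardless of the memory layout.
flat_c = arr.ravel('C')
assert np.array_equal(flat_c[:5], arr[0, 0, 0, :])

# Raveling in 'F' (first-index-fastest) index order starts with a
# spatial run instead: the first column of voxels at time 0.
flat_f = arr.ravel('F')
assert np.array_equal(flat_f[:2], arr[:, 0, 0, 0])
```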
I would like, as a teacher, to be able to say something like:

This is what C memory layout is (it's the memory layout that gives
arr.flags.C_CONTIGUOUS=True)

This is what F memory layout is (it's the memory layout that gives
arr.flags.F_CONTIGUOUS=True)

It's rather easy to get something that is neither C nor F memory
layout. Numpy supports many memory layouts.

Ravel and reshape, and numpy in general, do not care (normally) about C
or F layouts, they only care about index ordering.

My point, that I'm repeating, is that my job is made harder by
'arr.ravel('F')'.

Cheers,

Matthew

From josef.pktd at gmail.com Sun Mar 31 00:05:20 2013
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sun, 31 Mar 2013 00:05:20 -0400
Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering
In-Reply-To: References: Message-ID: 

On Sat, Mar 30, 2013 at 11:43 PM, Matthew Brett wrote:
> Hi,
>
> On Sat, Mar 30, 2013 at 7:02 PM, wrote:
>> On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett wrote:
>>> Hi,
>>>
>>> On Sat, Mar 30, 2013 at 7:50 PM, wrote:
>>>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle
>>>> wrote:
>>>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett
>>>>> wrote:
>>>>>>
>>>>>> On Sat, Mar 30, 2013 at 2:20 PM, wrote:
>>>>>> > On Sat, Mar 30, 2013 at 4:57 PM, wrote:
>>>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett
>>>>>> >> wrote:
>>>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote:
>>>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett
>>>>>> >>>> wrote:
>>>>>> >>>>>
>>>>>> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index
>>>>>> >>>>> ordering.
>>>>>> >>>>>
>>>>>> >>>>> This is very confusing. We think the index ordering and memory
>>>>>> >>>>> ordering ideas need to be separated, and specifically, we should
>>>>>> >>>>> avoid
>>>>>> >>>>> using "C" and "F" to refer to index ordering.
>>>>>> >>>>> >>>>>> >>>>> Proposal >>>>>> >>>>> ------------- >>>>>> >>>>> >>>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >>>>>> >>>>> index ordering for ravel, reshape >>>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >>>>>> >>>>> in >>>>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >>>>>> >>>>> naming idea by Paul Ivanov) >>>>>> >>>>> >>>>>> >>>>> What do y'all think? >>>>>> >>>> >>>>>> >>>> I always thought "F" and "C" are easy to understand, I always thought >>>>>> >>>> about >>>>>> >>>> the content and never about the memory when using it. >>>>>> >> >>>>>> >> changing the names doesn't make it easier to understand. >>>>>> >> I think the confusion is because the new A and K refer to existing >>>>>> >> memory >>>>>> >> >>>>>> >>>>>> I disagree, I think it's confusing, but I have evidence, and that is >>>>>> that four out of four of us tested ourselves and got it wrong. >>>>>> >>>>>> Perhaps we are particularly dumb or poorly informed, but I think it's >>>>>> rash to assert there is no problem here. >>>> >>>> I think you are overcomplicating things or phrased it as a "trick question" >>> >>> I don't know what you mean by trick question - was there something >>> over-complicated in the example? I deliberately didn't include >>> various much more confusing examples in "reshape". >> >> I meant making the "candidates" think about memory instead of just >> column versus row stacking. >> I don't think I ever get confused about reshape "F" in 2d. >> But when I work with 3d or larger ndim nd-arrays, I always have to >> try an example to check my intuition (in general not just reshape). >> >>> >>>> ravel F and C have *nothing* to do with memory layout. 
>>> >>> We do agree on this of course - but you said in an earlier mail that >>> you thought of 'C" and 'F' as referring to target memory layout (which >>> they don't in this case) so I think we also agree that "C" and "F" do >>> often refer to memory layout elsewhere in numpy. >> >> I guess that wasn't so helpful. >> (emphasis on *target*, There are very few places where an order >> keyword refers to *existing* memory layout. > > It is helpful because it shows how easy it is to get confused between > memory order and index order. > >> What's reverse index order? > > I am not being clear, sorry about that: > > import numpy as np > > def ravel_iter_last_fastest(arr): > res = [] > for i in range(arr.shape[0]): > for j in range(arr.shape[1]): > for k in range(arr.shape[2]): > # Iterating over last dimension fastest > res.append(arr[i, j, k]) > return np.array(res) > > > def ravel_iter_first_fastest(arr): > res = [] > for k in range(arr.shape[2]): > for j in range(arr.shape[1]): > for i in range(arr.shape[0]): > # Iterating over first dimension fastest > res.append(arr[i, j, k]) > return np.array(res) good example that's just C and F order in the terminology of numpy http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html#controlling-iteration-order (independent of memory) http://docs.scipy.org/doc/numpy/reference/generated/numpy.flatiter.html#numpy.flatiter I don't think we want to rename a large part of the basic terminology of numpy Josef > > > a = np.arange(24).reshape((2, 3, 4)) > > print np.all(a.ravel('C') == ravel_iter_last_fastest(a)) > print np.all(a.ravel('F') == ravel_iter_first_fastest(a)) > > By 'reverse index ordering' I mean 'ravel_iter_last_fastest' above. I > guess one could argue that this was not 'reverse' but 'forward' index > ordering, but I am not arguing about which is better, or those names, > only that it's the order of indices that differs, not the memory > layout, and that these ideas need to be kept separate. 
> > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From matthew.brett at gmail.com Sun Mar 31 00:12:51 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 30 Mar 2013 21:12:51 -0700 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: Hi, On Sat, Mar 30, 2013 at 9:05 PM, wrote: > On Sat, Mar 30, 2013 at 11:43 PM, Matthew Brett wrote: >> Hi, >> >> On Sat, Mar 30, 2013 at 7:02 PM, wrote: >>> On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett wrote: >>>> Hi, >>>> >>>> On Sat, Mar 30, 2013 at 7:50 PM, wrote: >>>>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle >>>>> wrote: >>>>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett >>>>>> wrote: >>>>>>> >>>>>>> On Sat, Mar 30, 2013 at 2:20 PM, wrote: >>>>>>> > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >>>>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett >>>>>>> >> wrote: >>>>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>>>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett >>>>>>> >>>> wrote: >>>>>>> >>>>> >>>>>>> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index >>>>>>> >>>>> ordering. >>>>>>> >>>>> >>>>>>> >>>>> This is very confusing. We think the index ordering and memory >>>>>>> >>>>> ordering ideas need to be separated, and specifically, we should >>>>>>> >>>>> avoid >>>>>>> >>>>> using "C" and "F" to refer to index ordering. 
>>>>>>> >>>>> >>>>>>> >>>>> Proposal >>>>>>> >>>>> ------------- >>>>>>> >>>>> >>>>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >>>>>>> >>>>> index ordering for ravel, reshape >>>>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >>>>>>> >>>>> in >>>>>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >>>>>>> >>>>> naming idea by Paul Ivanov) >>>>>>> >>>>> >>>>>>> >>>>> What do y'all think? >>>>>>> >>>> >>>>>>> >>>> I always thought "F" and "C" are easy to understand, I always thought >>>>>>> >>>> about >>>>>>> >>>> the content and never about the memory when using it. >>>>>>> >> >>>>>>> >> changing the names doesn't make it easier to understand. >>>>>>> >> I think the confusion is because the new A and K refer to existing >>>>>>> >> memory >>>>>>> >> >>>>>>> >>>>>>> I disagree, I think it's confusing, but I have evidence, and that is >>>>>>> that four out of four of us tested ourselves and got it wrong. >>>>>>> >>>>>>> Perhaps we are particularly dumb or poorly informed, but I think it's >>>>>>> rash to assert there is no problem here. >>>>> >>>>> I think you are overcomplicating things or phrased it as a "trick question" >>>> >>>> I don't know what you mean by trick question - was there something >>>> over-complicated in the example? I deliberately didn't include >>>> various much more confusing examples in "reshape". >>> >>> I meant making the "candidates" think about memory instead of just >>> column versus row stacking. >>> I don't think I ever get confused about reshape "F" in 2d. >>> But when I work with 3d or larger ndim nd-arrays, I always have to >>> try an example to check my intuition (in general not just reshape). >>> >>>> >>>>> ravel F and C have *nothing* to do with memory layout. 
>>>> >>>> We do agree on this of course - but you said in an earlier mail that >>>> you thought of 'C" and 'F' as referring to target memory layout (which >>>> they don't in this case) so I think we also agree that "C" and "F" do >>>> often refer to memory layout elsewhere in numpy. >>> >>> I guess that wasn't so helpful. >>> (emphasis on *target*, There are very few places where an order >>> keyword refers to *existing* memory layout. >> >> It is helpful because it shows how easy it is to get confused between >> memory order and index order. >> >>> What's reverse index order? >> >> I am not being clear, sorry about that: >> >> import numpy as np >> >> def ravel_iter_last_fastest(arr): >> res = [] >> for i in range(arr.shape[0]): >> for j in range(arr.shape[1]): >> for k in range(arr.shape[2]): >> # Iterating over last dimension fastest >> res.append(arr[i, j, k]) >> return np.array(res) >> >> >> def ravel_iter_first_fastest(arr): >> res = [] >> for k in range(arr.shape[2]): >> for j in range(arr.shape[1]): >> for i in range(arr.shape[0]): >> # Iterating over first dimension fastest >> res.append(arr[i, j, k]) >> return np.array(res) > > good example > > that's just C and F order in the terminology of numpy > http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html#controlling-iteration-order > (independent of memory) > http://docs.scipy.org/doc/numpy/reference/generated/numpy.flatiter.html#numpy.flatiter > > I don't think we want to rename a large part of the basic terminology of numpy Sometimes two ideas get conflated together, and it seems natural to keep together, until people get confused, and you realize that there are two separate ideas. For example here's a quote from the 'flatiter' doc : Iteration is done in C-contiguous style Now - that seems really ugly to me. For example, 'contiguous' should not be in that sentence, although it's easy to see why it is, and it seems to me to be a sign of the confusion between the ideas. 
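[Editor's note: the conflation Matthew objects to in the flatiter docstring is easy to demonstrate — `.flat` iterates in C *index* order even when the array is not C-contiguous in memory:]

```python
import numpy as np

f = np.asfortranarray(np.arange(6).reshape(2, 3))
assert f.flags.f_contiguous and not f.flags.c_contiguous

# .flat follows C index order (last axis fastest),
# regardless of the underlying memory layout.
assert list(f.flat) == [0, 1, 2, 3, 4, 5]
```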
Cheers, Matthew From josef.pktd at gmail.com Sun Mar 31 00:37:50 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 31 Mar 2013 00:37:50 -0400 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett wrote: > Hi, > > On Sat, Mar 30, 2013 at 7:02 PM, wrote: >> On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett wrote: >>> Hi, >>> >>> On Sat, Mar 30, 2013 at 7:50 PM, wrote: >>>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle >>>> wrote: >>>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett >>>>> wrote: >>>>>> >>>>>> On Sat, Mar 30, 2013 at 2:20 PM, wrote: >>>>>> > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >>>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett >>>>>> >> wrote: >>>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett >>>>>> >>>> wrote: >>>>>> >>>>> >>>>>> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index >>>>>> >>>>> ordering. >>>>>> >>>>> >>>>>> >>>>> This is very confusing. We think the index ordering and memory >>>>>> >>>>> ordering ideas need to be separated, and specifically, we should >>>>>> >>>>> avoid >>>>>> >>>>> using "C" and "F" to refer to index ordering. >>>>>> >>>>> >>>>>> >>>>> Proposal >>>>>> >>>>> ------------- >>>>>> >>>>> >>>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >>>>>> >>>>> index ordering for ravel, reshape >>>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >>>>>> >>>>> in >>>>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >>>>>> >>>>> naming idea by Paul Ivanov) >>>>>> >>>>> >>>>>> >>>>> What do y'all think? >>>>>> >>>> >>>>>> >>>> I always thought "F" and "C" are easy to understand, I always thought >>>>>> >>>> about >>>>>> >>>> the content and never about the memory when using it. 
>>>>>> >> >>>>>> >> changing the names doesn't make it easier to understand. >>>>>> >> I think the confusion is because the new A and K refer to existing >>>>>> >> memory >>>>>> >> >>>>>> >>>>>> I disagree, I think it's confusing, but I have evidence, and that is >>>>>> that four out of four of us tested ourselves and got it wrong. >>>>>> >>>>>> Perhaps we are particularly dumb or poorly informed, but I think it's >>>>>> rash to assert there is no problem here. >>>> >>>> I think you are overcomplicating things or phrased it as a "trick question" >>> >>> I don't know what you mean by trick question - was there something >>> over-complicated in the example? I deliberately didn't include >>> various much more confusing examples in "reshape". >> >> I meant making the "candidates" think about memory instead of just >> column versus row stacking. > > To be specific, we were teaching about reshaping a (I, J, K, N) 4D > array, it was an image, with time as the 4th dimension (N time > points). Raveling and reshaping 3D and 4D arrays is a common thing > to do in neuroimaging, as you can imagine. > > A student asked what he would get back from raveling this array, a > concatenated time series, or something spatial? > > We showed (I'd worked it out by this time) that the first N values > were the time series given by [0, 0, 0, :]. > > He said - "Oh - I see - so the data is stored as a whole lot of time > series one by one, I thought it would be stored as a series of > images'. > > Ironically, this was a Fortran-ordered array in memory, and he was wrong. > > So, I think the idea of memory ordering and index ordering is very > easy to confuse, and comes up naturally. 
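[Editor's note: the classroom example can be reproduced with a small stand-in shape (the sizes here are hypothetical, chosen only for illustration). The first N raveled values are the time series at [0, 0, 0, :] because the default ravel order is C *index* order — even though this array is Fortran-ordered in memory:]

```python
import numpy as np

I, J, K, N = 2, 3, 4, 5
arr = np.zeros((I, J, K, N), order='F')  # Fortran memory layout
arr[0, 0, 0, :] = np.arange(N)           # a recognizable "time series"

# Default ravel is 'C' index order: last axis (time) varies fastest,
# so the first N values are exactly arr[0, 0, 0, :].
assert np.array_equal(arr.ravel()[:N], arr[0, 0, 0, :])
```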
> > I would like, as a teacher, to be able to say something like: > > This is what C memory layout is (it's the memory layout that gives > arr.flags.C_CONTIGUOUS=True) > This is what F memory layout is (it's the memory layout that gives > arr.flags.F_CONTIGUOUS=True) > It's rather easy to get something that is neither C or F memory layout > Numpy does many memory layouts. > Ravel and reshape and numpy in general do not care (normally) about C > or F layouts, they only care about index ordering. > > My point, that I'm repeating, is that my job is made harder by > 'arr.ravel('F')'. But once you know that ravel and reshape don't care about memory, the ravel is easy to predict (maybe not easy to visualize in 4-D): order=C: stack the last dimension, N, time series of one 3d pixels, then stack the time series of the next pixel... process pixels by depth and the row by row (like old TVs) I assume you did this because your underlying array is C contiguous. so your ravel('C') is a c-contiguous view (instead of some weird strides or a copy) I usually prefer time in the first dimension, and stack order=F, then I can start at the front, stack all time periods of the first pixel, keep going and work pixels down the columns, first page, next page, ... (and I hope I have a F-contiguous array, so my raveled array is also F-contiguous.) 
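[Editor's note: the optimization Josef hopes for here — getting a contiguous view rather than a copy — happens exactly when the requested index order matches the memory layout. A quick check using np.may_share_memory:]

```python
import numpy as np

a_c = np.zeros((3, 4))              # C-contiguous
a_f = np.asfortranarray(a_c)        # F-contiguous copy

# ravel returns a view only when the requested index order
# matches the memory layout; otherwise it must copy.
assert np.may_share_memory(a_c.ravel('C'), a_c)       # view
assert not np.may_share_memory(a_c.ravel('F'), a_c)   # copy
assert np.may_share_memory(a_f.ravel('F'), a_f)       # view
```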
(note: I'm bringing memory back in as optimization, but not to predict the stacking) Josef (I think brains are designed for Fortran order and C-ordering in numpy is a accident, except, reading a Western language book is neither) > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From matthew.brett at gmail.com Sun Mar 31 00:50:00 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 30 Mar 2013 21:50:00 -0700 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: Hi, On Sat, Mar 30, 2013 at 9:37 PM, wrote: > On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett wrote: >> Hi, >> >> On Sat, Mar 30, 2013 at 7:02 PM, wrote: >>> On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett wrote: >>>> Hi, >>>> >>>> On Sat, Mar 30, 2013 at 7:50 PM, wrote: >>>>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle >>>>> wrote: >>>>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett >>>>>> wrote: >>>>>>> >>>>>>> On Sat, Mar 30, 2013 at 2:20 PM, wrote: >>>>>>> > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >>>>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett >>>>>>> >> wrote: >>>>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>>>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett >>>>>>> >>>> wrote: >>>>>>> >>>>> >>>>>>> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index >>>>>>> >>>>> ordering. >>>>>>> >>>>> >>>>>>> >>>>> This is very confusing. We think the index ordering and memory >>>>>>> >>>>> ordering ideas need to be separated, and specifically, we should >>>>>>> >>>>> avoid >>>>>>> >>>>> using "C" and "F" to refer to index ordering. 
>>>>>>> >>>>> >>>>>>> >>>>> Proposal >>>>>>> >>>>> ------------- >>>>>>> >>>>> >>>>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >>>>>>> >>>>> index ordering for ravel, reshape >>>>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >>>>>>> >>>>> in >>>>>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >>>>>>> >>>>> naming idea by Paul Ivanov) >>>>>>> >>>>> >>>>>>> >>>>> What do y'all think? >>>>>>> >>>> >>>>>>> >>>> I always thought "F" and "C" are easy to understand, I always thought >>>>>>> >>>> about >>>>>>> >>>> the content and never about the memory when using it. >>>>>>> >> >>>>>>> >> changing the names doesn't make it easier to understand. >>>>>>> >> I think the confusion is because the new A and K refer to existing >>>>>>> >> memory >>>>>>> >> >>>>>>> >>>>>>> I disagree, I think it's confusing, but I have evidence, and that is >>>>>>> that four out of four of us tested ourselves and got it wrong. >>>>>>> >>>>>>> Perhaps we are particularly dumb or poorly informed, but I think it's >>>>>>> rash to assert there is no problem here. >>>>> >>>>> I think you are overcomplicating things or phrased it as a "trick question" >>>> >>>> I don't know what you mean by trick question - was there something >>>> over-complicated in the example? I deliberately didn't include >>>> various much more confusing examples in "reshape". >>> >>> I meant making the "candidates" think about memory instead of just >>> column versus row stacking. >> >> To be specific, we were teaching about reshaping a (I, J, K, N) 4D >> array, it was an image, with time as the 4th dimension (N time >> points). Raveling and reshaping 3D and 4D arrays is a common thing >> to do in neuroimaging, as you can imagine. >> >> A student asked what he would get back from raveling this array, a >> concatenated time series, or something spatial? 
>> >> We showed (I'd worked it out by this time) that the first N values >> were the time series given by [0, 0, 0, :]. >> >> He said - "Oh - I see - so the data is stored as a whole lot of time >> series one by one, I thought it would be stored as a series of >> images'. >> >> Ironically, this was a Fortran-ordered array in memory, and he was wrong. >> >> So, I think the idea of memory ordering and index ordering is very >> easy to confuse, and comes up naturally. >> >> I would like, as a teacher, to be able to say something like: >> >> This is what C memory layout is (it's the memory layout that gives >> arr.flags.C_CONTIGUOUS=True) >> This is what F memory layout is (it's the memory layout that gives >> arr.flags.F_CONTIGUOUS=True) >> It's rather easy to get something that is neither C or F memory layout >> Numpy does many memory layouts. >> Ravel and reshape and numpy in general do not care (normally) about C >> or F layouts, they only care about index ordering. >> >> My point, that I'm repeating, is that my job is made harder by >> 'arr.ravel('F')'. > > But once you know that ravel and reshape don't care about memory, the > ravel is easy to predict (maybe not easy to visualize in 4-D): But this assumes that you already know that there's such a thing as memory layout, and there's such a thing as index ordering, and that 'C' and 'F' in ravel refer to index ordering. Once you have that, you're golden. I'm arguing it's markedly harder to get this distinction, and keep it in mind, and teach it, if we are using the 'C' and 'F" names for both things. > order=C: stack the last dimension, N, time series of one 3d pixels, > then stack the time series of the next pixel... > process pixels by depth and the row by row (like old TVs) > > I assume you did this because your underlying array is C contiguous. > so your ravel('C') is a c-contiguous view (instead of some weird > strides or a copy) Sorry - what do you mean by 'this' in 'did this'? Reshape? 
Why would it matter what my underlying array memory layout was? > I usually prefer time in the first dimension, and stack order=F, then > I can start at the front, stack all time periods of the first pixel, > keep going and work pixels down the columns, first page, next page, > ... > (and I hope I have a F-contiguous array, so my raveled array is also > F-contiguous.) > > (note: I'm bringing memory back in as optimization, but not to predict > the stacking) > > Josef > (I think brains are designed for Fortran order and C-ordering in numpy > is a accident, > except, reading a Western language book is neither) Yes, I find first axis fastest changing easier to think about, and I came from MATLAB (about 8 years ago mind), so that also made it more natural. I had (until yesterday) simply assumed that numpy unraveled that way, because it seemed more obvious to me, and knew that the unravel index order need have nothing to do with the memory order, or the fact that arrays are C contiguous by default. Not so of course. That's not my complaint as you know - it's just a convention, I guessed the convention wrong. Cheers, Matthew From ivan.oseledets at gmail.com Sun Mar 31 01:14:37 2013 From: ivan.oseledets at gmail.com (Ivan Oseledets) Date: Sun, 31 Mar 2013 09:14:37 +0400 Subject: [Numpy-discussion] Indexing bug Message-ID: Message: 2 Date: Sat, 30 Mar 2013 11:13:35 -0700 From: Jaime Fern?ndez del R?o Subject: Re: [Numpy-discussion] Indexing bug? To: Discussion of Numerical Python Message-ID: Content-Type: text/plain; charset="iso-8859-1" On Sat, Mar 30, 2013 at 11:01 AM, Ivan Oseledets wrote: > I am using numpy 1.6.1, > and encountered a wierd fancy indexing bug: > > import numpy as np > c = np.random.randn(10,200,10); > > In [29]: print c[[0,1],:200,:2].shape > (2, 200, 2) > > In [30]: print c[[0,1],:200,[0,1]].shape > (2, 200) > > It means, that here fancy indexing is not working right for a 3d array. 
> On Sat, Mar 30, 2013 at 11:01 AM, Ivan Oseledets wrote: > I am using numpy 1.6.1, > and encountered a wierd fancy indexing bug: > > import numpy as np > c = np.random.randn(10,200,10); > > In [29]: print c[[0,1],:200,:2].shape > (2, 200, 2) > > In [30]: print c[[0,1],:200,[0,1]].shape > (2, 200) > > It means, that here fancy indexing is not working right for a 3d array. > --> It is working fine, review the docs: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing In your return, item [0, :] is c[0, :, 0] and item[1, :]is c[1, :, 1]. If you want a return of shape (2, 200, 2) where item [i, :, j] is c[i, :, j] you could use slicing: c[:2, :200, :2] or something more elaborate like: c[np.arange(2)[:, None, None], np.arange(200)[:, None], np.arange(2)] Jaime ---> Oh! So it is not a bug, it is a feature, which is completely incompatible with other array based languages (MATLAB and Fortran). To me, I can not find a single explanation why it is so in numpy. Taking submatrices from a matrix is a common operation and the syntax above is very natural to take submatrices, not a weird diagonal stuff. i.e., c = np.random.randn(100,100) d = c[[0,3],[2,3]] should NOT produce two numbers! (and you can not do it using slices!) In MATLAB and Fortran c(indi,indj) will produce a 2 x 2 matrix. How it can be done in numpy (and why the complications?) So, please consider this message as a feature request. 
Ivan From josef.pktd at gmail.com Sun Mar 31 01:38:09 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 31 Mar 2013 01:38:09 -0400 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: On Sun, Mar 31, 2013 at 12:50 AM, Matthew Brett wrote: > Hi, > > On Sat, Mar 30, 2013 at 9:37 PM, wrote: >> On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett wrote: >>> Hi, >>> >>> On Sat, Mar 30, 2013 at 7:02 PM, wrote: >>>> On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett wrote: >>>>> Hi, >>>>> >>>>> On Sat, Mar 30, 2013 at 7:50 PM, wrote: >>>>>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle >>>>>> wrote: >>>>>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett >>>>>>> wrote: >>>>>>>> >>>>>>>> On Sat, Mar 30, 2013 at 2:20 PM, wrote: >>>>>>>> > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >>>>>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett >>>>>>>> >> wrote: >>>>>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>>>>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett >>>>>>>> >>>> wrote: >>>>>>>> >>>>> >>>>>>>> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index >>>>>>>> >>>>> ordering. >>>>>>>> >>>>> >>>>>>>> >>>>> This is very confusing. We think the index ordering and memory >>>>>>>> >>>>> ordering ideas need to be separated, and specifically, we should >>>>>>>> >>>>> avoid >>>>>>>> >>>>> using "C" and "F" to refer to index ordering. >>>>>>>> >>>>> >>>>>>>> >>>>> Proposal >>>>>>>> >>>>> ------------- >>>>>>>> >>>>> >>>>>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >>>>>>>> >>>>> index ordering for ravel, reshape >>>>>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >>>>>>>> >>>>> in >>>>>>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >>>>>>>> >>>>> naming idea by Paul Ivanov) >>>>>>>> >>>>> >>>>>>>> >>>>> What do y'all think? 
>>>>>>>> >>>> >>>>>>>> >>>> I always thought "F" and "C" are easy to understand, I always thought >>>>>>>> >>>> about >>>>>>>> >>>> the content and never about the memory when using it. >>>>>>>> >> >>>>>>>> >> changing the names doesn't make it easier to understand. >>>>>>>> >> I think the confusion is because the new A and K refer to existing >>>>>>>> >> memory >>>>>>>> >> >>>>>>>> >>>>>>>> I disagree, I think it's confusing, but I have evidence, and that is >>>>>>>> that four out of four of us tested ourselves and got it wrong. >>>>>>>> >>>>>>>> Perhaps we are particularly dumb or poorly informed, but I think it's >>>>>>>> rash to assert there is no problem here. >>>>>> >>>>>> I think you are overcomplicating things or phrased it as a "trick question" >>>>> >>>>> I don't know what you mean by trick question - was there something >>>>> over-complicated in the example? I deliberately didn't include >>>>> various much more confusing examples in "reshape". >>>> >>>> I meant making the "candidates" think about memory instead of just >>>> column versus row stacking. >>> >>> To be specific, we were teaching about reshaping a (I, J, K, N) 4D >>> array, it was an image, with time as the 4th dimension (N time >>> points). Raveling and reshaping 3D and 4D arrays is a common thing >>> to do in neuroimaging, as you can imagine. >>> >>> A student asked what he would get back from raveling this array, a >>> concatenated time series, or something spatial? >>> >>> We showed (I'd worked it out by this time) that the first N values >>> were the time series given by [0, 0, 0, :]. >>> >>> He said - "Oh - I see - so the data is stored as a whole lot of time >>> series one by one, I thought it would be stored as a series of >>> images'. >>> >>> Ironically, this was a Fortran-ordered array in memory, and he was wrong. >>> >>> So, I think the idea of memory ordering and index ordering is very >>> easy to confuse, and comes up naturally. 
>>> >>> I would like, as a teacher, to be able to say something like: >>> >>> This is what C memory layout is (it's the memory layout that gives >>> arr.flags.C_CONTIGUOUS=True) >>> This is what F memory layout is (it's the memory layout that gives >>> arr.flags.F_CONTIGUOUS=True) >>> It's rather easy to get something that is neither C or F memory layout >>> Numpy does many memory layouts. >>> Ravel and reshape and numpy in general do not care (normally) about C >>> or F layouts, they only care about index ordering. >>> >>> My point, that I'm repeating, is that my job is made harder by >>> 'arr.ravel('F')'. >> >> But once you know that ravel and reshape don't care about memory, the >> ravel is easy to predict (maybe not easy to visualize in 4-D): > > But this assumes that you already know that there's such a thing as > memory layout, and there's such a thing as index ordering, and that > 'C' and 'F' in ravel refer to index ordering. Once you have that, > you're golden. I'm arguing it's markedly harder to get this > distinction, and keep it in mind, and teach it, if we are using the > 'C' and 'F" names for both things. No, I think you are still missing my point. I think explaining ravel and reshape F and C is easy (kind of) because the students don't need to know at that stage about memory layouts. All they need to know is that we look at n-dimensional objects in C-order or in F-order (whichever index runs fastest) > >> order=C: stack the last dimension, N, time series of one 3d pixels, >> then stack the time series of the next pixel... >> process pixels by depth and the row by row (like old TVs) >> >> I assume you did this because your underlying array is C contiguous. >> so your ravel('C') is a c-contiguous view (instead of some weird >> strides or a copy) > > Sorry - what do you mean by 'this' in 'did this'? Reshape? Why > would it matter what my underlying array memory layout was? `this` was use ravel('C') and have time series as last index. 
Because if we have a few gigabytes of video recordings, we better match the ravel order with the memory order. I thought you picked time N in the last axis, so you can have fast access to time series (assuming you didn't specify F-contiguous). (it's not confusing: we have two orders, index/iterator and memory, and to get a nice view, the two should match) rereading: since you had F-ordered memory, ravel('F') gives the nice view (a picture at a time instead of a timeseries at a time) > >> I usually prefer time in the first dimension, and stack order=F, then >> I can start at the front, stack all time periods of the first pixel, >> keep going and work pixels down the columns, first page, next page, >> ... >> (and I hope I have a F-contiguous array, so my raveled array is also >> F-contiguous.) >> >> (note: I'm bringing memory back in as optimization, but not to predict >> the stacking) >> >> Josef >> (I think brains are designed for Fortran order and C-ordering in numpy >> is a accident, >> except, reading a Western language book is neither) > > Yes, I find first axis fastest changing easier to think about, and I > came from MATLAB (about 8 years ago mind), so that also made it more > natural. > > I had (until yesterday) simply assumed that numpy unraveled that way, > because it seemed more obvious to me, and knew that the unravel index > order need have nothing to do with the memory order, or the fact that > arrays are C contiguous by default. Not so of course. That's not my > complaint as you know - it's just a convention, I guessed the > convention wrong. 
Numpy was written by C developers, and one of the first things I learned about numpy is the ``order``: Default is always C (except for linalg) and axis=None (except in scipy.stats), and dimensions disappear in reduce Cheers, Josef > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From cournape at gmail.com Sun Mar 31 05:30:35 2013 From: cournape at gmail.com (David Cournapeau) Date: Sun, 31 Mar 2013 10:30:35 +0100 Subject: [Numpy-discussion] Indexing bug In-Reply-To: References: Message-ID: On Sun, Mar 31, 2013 at 6:14 AM, Ivan Oseledets wrote: > Message: 2 > Date: Sat, 30 Mar 2013 11:13:35 -0700 > From: Jaime Fern?ndez del R?o > Subject: Re: [Numpy-discussion] Indexing bug? > To: Discussion of Numerical Python > Message-ID: > > Content-Type: text/plain; charset="iso-8859-1" > > On Sat, Mar 30, 2013 at 11:01 AM, Ivan Oseledets > wrote: > >> I am using numpy 1.6.1, >> and encountered a wierd fancy indexing bug: >> >> import numpy as np >> c = np.random.randn(10,200,10); >> >> In [29]: print c[[0,1],:200,:2].shape >> (2, 200, 2) >> >> In [30]: print c[[0,1],:200,[0,1]].shape >> (2, 200) >> >> It means, that here fancy indexing is not working right for a 3d array. >> > > On Sat, Mar 30, 2013 at 11:01 AM, Ivan Oseledets > wrote: > >> I am using numpy 1.6.1, >> and encountered a wierd fancy indexing bug: >> >> import numpy as np >> c = np.random.randn(10,200,10); >> >> In [29]: print c[[0,1],:200,:2].shape >> (2, 200, 2) >> >> In [30]: print c[[0,1],:200,[0,1]].shape >> (2, 200) >> >> It means, that here fancy indexing is not working right for a 3d array. >> > --> > It is working fine, review the docs: > > http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing > > In your return, item [0, :] is c[0, :, 0] and item[1, :]is c[1, :, 1]. 
> > If you want a return of shape (2, 200, 2) where item [i, :, j] is c[i, :, > j] you could use slicing: > > c[:2, :200, :2] > > or something more elaborate like: > > c[np.arange(2)[:, None, None], np.arange(200)[:, None], np.arange(2)] > > Jaime > ---> > > > Oh! So it is not a bug, it is a feature, which is completely > incompatible with other array based languages (MATLAB and Fortran). To > me, I can not find a single explanation why it is so in numpy. > Taking submatrices from a matrix is a common operation and the syntax > above is very natural to take submatrices, not a weird diagonal stuff. It is not a weird diagonal stuff, but a well define operation: when you use fancy indexing, the indexing numbers become coordinate ( > i.e., > > c = np.random.randn(100,100) > d = c[[0,3],[2,3]] > > should NOT produce two numbers! (and you can not do it using slices!) > > In MATLAB and Fortran > c(indi,indj) > will produce a 2 x 2 matrix. > How it can be done in numpy (and why the complications?) in your example, it is simple enough: c[[0, 3], 2:4] (return the first row limited to columns 3, 4, and the 4th row limiter to columns 3, 4). Numpy's syntax is' biased' toward fancy indexing, and you need more typing if you want to extract 'irregular' submatrices. Matlab has a different tradeoff (extracting irregular sub-matrices is sligthly easier, but selecting a few points is harder as you need sub2index to use linear indexing). 
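[Editor's note: not mentioned in the thread, but numpy already ships the MATLAB-style behavior Ivan asks for — np.ix_ builds the open-mesh "outer product" of the index lists, while plain fancy indexing pairs the index arrays up as coordinates:]

```python
import numpy as np

c = np.arange(100 * 100).reshape(100, 100)

# Fancy indexing pairs the index arrays elementwise: this picks the
# two points c[0, 2] and c[3, 3], not a 2x2 submatrix.
d = c[[0, 3], [2, 3]]
assert d.shape == (2,)

# np.ix_ gives the MATLAB/Fortran-style c(indi, indj) submatrix.
sub = c[np.ix_([0, 3], [2, 3])]
assert sub.shape == (2, 2)
assert sub[1, 0] == c[3, 2]
```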
David From matthew.brett at gmail.com Sun Mar 31 15:54:29 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 31 Mar 2013 12:54:29 -0700 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: Hi, On Sat, Mar 30, 2013 at 10:38 PM, wrote: > On Sun, Mar 31, 2013 at 12:50 AM, Matthew Brett wrote: >> Hi, >> >> On Sat, Mar 30, 2013 at 9:37 PM, wrote: >>> On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett wrote: >>>> Hi, >>>> >>>> On Sat, Mar 30, 2013 at 7:02 PM, wrote: >>>>> On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett wrote: >>>>>> Hi, >>>>>> >>>>>> On Sat, Mar 30, 2013 at 7:50 PM, wrote: >>>>>>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle >>>>>>> wrote: >>>>>>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> On Sat, Mar 30, 2013 at 2:20 PM, wrote: >>>>>>>>> > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >>>>>>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett >>>>>>>>> >> wrote: >>>>>>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>>>>>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett >>>>>>>>> >>>> wrote: >>>>>>>>> >>>>> >>>>>>>>> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index >>>>>>>>> >>>>> ordering. >>>>>>>>> >>>>> >>>>>>>>> >>>>> This is very confusing. We think the index ordering and memory >>>>>>>>> >>>>> ordering ideas need to be separated, and specifically, we should >>>>>>>>> >>>>> avoid >>>>>>>>> >>>>> using "C" and "F" to refer to index ordering. 
>>>>>>>>> >>>>> >>>>>>>>> >>>>> Proposal >>>>>>>>> >>>>> ------------- >>>>>>>>> >>>>> >>>>>>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >>>>>>>>> >>>>> index ordering for ravel, reshape >>>>>>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >>>>>>>>> >>>>> in >>>>>>>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >>>>>>>>> >>>>> naming idea by Paul Ivanov) >>>>>>>>> >>>>> >>>>>>>>> >>>>> What do y'all think? >>>>>>>>> >>>> >>>>>>>>> >>>> I always thought "F" and "C" are easy to understand, I always thought >>>>>>>>> >>>> about >>>>>>>>> >>>> the content and never about the memory when using it. >>>>>>>>> >> >>>>>>>>> >> changing the names doesn't make it easier to understand. >>>>>>>>> >> I think the confusion is because the new A and K refer to existing >>>>>>>>> >> memory >>>>>>>>> >> >>>>>>>>> >>>>>>>>> I disagree, I think it's confusing, but I have evidence, and that is >>>>>>>>> that four out of four of us tested ourselves and got it wrong. >>>>>>>>> >>>>>>>>> Perhaps we are particularly dumb or poorly informed, but I think it's >>>>>>>>> rash to assert there is no problem here. >>>>>>> >>>>>>> I think you are overcomplicating things or phrased it as a "trick question" >>>>>> >>>>>> I don't know what you mean by trick question - was there something >>>>>> over-complicated in the example? I deliberately didn't include >>>>>> various much more confusing examples in "reshape". >>>>> >>>>> I meant making the "candidates" think about memory instead of just >>>>> column versus row stacking. >>>> >>>> To be specific, we were teaching about reshaping a (I, J, K, N) 4D >>>> array, it was an image, with time as the 4th dimension (N time >>>> points). Raveling and reshaping 3D and 4D arrays is a common thing >>>> to do in neuroimaging, as you can imagine. >>>> >>>> A student asked what he would get back from raveling this array, a >>>> concatenated time series, or something spatial? 
>>>> >>>> We showed (I'd worked it out by this time) that the first N values >>>> were the time series given by [0, 0, 0, :]. >>>> >>>> He said - "Oh - I see - so the data is stored as a whole lot of time >>>> series one by one, I thought it would be stored as a series of >>>> images'. >>>> >>>> Ironically, this was a Fortran-ordered array in memory, and he was wrong. >>>> >>>> So, I think the idea of memory ordering and index ordering is very >>>> easy to confuse, and comes up naturally. >>>> >>>> I would like, as a teacher, to be able to say something like: >>>> >>>> This is what C memory layout is (it's the memory layout that gives >>>> arr.flags.C_CONTIGUOUS=True) >>>> This is what F memory layout is (it's the memory layout that gives >>>> arr.flags.F_CONTIGUOUS=True) >>>> It's rather easy to get something that is neither C or F memory layout >>>> Numpy does many memory layouts. >>>> Ravel and reshape and numpy in general do not care (normally) about C >>>> or F layouts, they only care about index ordering. >>>> >>>> My point, that I'm repeating, is that my job is made harder by >>>> 'arr.ravel('F')'. >>> >>> But once you know that ravel and reshape don't care about memory, the >>> ravel is easy to predict (maybe not easy to visualize in 4-D): >> >> But this assumes that you already know that there's such a thing as >> memory layout, and there's such a thing as index ordering, and that >> 'C' and 'F' in ravel refer to index ordering. Once you have that, >> you're golden. I'm arguing it's markedly harder to get this >> distinction, and keep it in mind, and teach it, if we are using the >> 'C' and 'F" names for both things. > > No, I think you are still missing my point. > I think explaining ravel and reshape F and C is easy (kind of) because the > students don't need to know at that stage about memory layouts. 
> > All they need to know is that we look at n-dimensional objects in > C-order or in F-order > (whichever index runs fastest) Would you accept that it may or may not be true that it is desirable or practical not to mention memory layouts when teaching numpy? You believe it is desirable, I believe that it is not - that teaching numpy naturally involves some discussion of memory layout. As evidence: * My student, without any prompting about memory layouts, is asking about it * Travis' numpy book has a very early section on this (section 2.3 - memory layout) * I often think about memory layouts, and from your discussion, you do too. It's uncommon that you don't have to teach something that experienced users think about often. * The most common use of 'order' only refers to memory layout. For example np.array "order" doesn't refer to index ordering but to memory layout. * The current docstring of 'reshape' cannot be explained without referring to memory order. Cheers, Matthew From josef.pktd at gmail.com Sun Mar 31 16:43:36 2013 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 31 Mar 2013 16:43:36 -0400 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: On Sun, Mar 31, 2013 at 3:54 PM, Matthew Brett wrote: > Hi, > > On Sat, Mar 30, 2013 at 10:38 PM, wrote: >> On Sun, Mar 31, 2013 at 12:50 AM, Matthew Brett wrote: >>> Hi, >>> >>> On Sat, Mar 30, 2013 at 9:37 PM, wrote: >>>> On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett wrote: >>>>> Hi, >>>>> >>>>> On Sat, Mar 30, 2013 at 7:02 PM, wrote: >>>>>> On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett wrote: >>>>>>> Hi, >>>>>>> >>>>>>> On Sat, Mar 30, 2013 at 7:50 PM, wrote: >>>>>>>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. 
Froehle >>>>>>>> wrote: >>>>>>>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> On Sat, Mar 30, 2013 at 2:20 PM, wrote: >>>>>>>>>> > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >>>>>>>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett >>>>>>>>>> >> wrote: >>>>>>>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>>>>>>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett >>>>>>>>>> >>>> wrote: >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index >>>>>>>>>> >>>>> ordering. >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> This is very confusing. We think the index ordering and memory >>>>>>>>>> >>>>> ordering ideas need to be separated, and specifically, we should >>>>>>>>>> >>>>> avoid >>>>>>>>>> >>>>> using "C" and "F" to refer to index ordering. >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> Proposal >>>>>>>>>> >>>>> ------------- >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >>>>>>>>>> >>>>> index ordering for ravel, reshape >>>>>>>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >>>>>>>>>> >>>>> in >>>>>>>>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >>>>>>>>>> >>>>> naming idea by Paul Ivanov) >>>>>>>>>> >>>>> >>>>>>>>>> >>>>> What do y'all think? >>>>>>>>>> >>>> >>>>>>>>>> >>>> I always thought "F" and "C" are easy to understand, I always thought >>>>>>>>>> >>>> about >>>>>>>>>> >>>> the content and never about the memory when using it. >>>>>>>>>> >> >>>>>>>>>> >> changing the names doesn't make it easier to understand. >>>>>>>>>> >> I think the confusion is because the new A and K refer to existing >>>>>>>>>> >> memory >>>>>>>>>> >> >>>>>>>>>> >>>>>>>>>> I disagree, I think it's confusing, but I have evidence, and that is >>>>>>>>>> that four out of four of us tested ourselves and got it wrong. 
>>>>>>>>>> >>>>>>>>>> Perhaps we are particularly dumb or poorly informed, but I think it's >>>>>>>>>> rash to assert there is no problem here. >>>>>>>> >>>>>>>> I think you are overcomplicating things or phrased it as a "trick question" >>>>>>> >>>>>>> I don't know what you mean by trick question - was there something >>>>>>> over-complicated in the example? I deliberately didn't include >>>>>>> various much more confusing examples in "reshape". >>>>>> >>>>>> I meant making the "candidates" think about memory instead of just >>>>>> column versus row stacking. >>>>> >>>>> To be specific, we were teaching about reshaping a (I, J, K, N) 4D >>>>> array, it was an image, with time as the 4th dimension (N time >>>>> points). Raveling and reshaping 3D and 4D arrays is a common thing >>>>> to do in neuroimaging, as you can imagine. >>>>> >>>>> A student asked what he would get back from raveling this array, a >>>>> concatenated time series, or something spatial? >>>>> >>>>> We showed (I'd worked it out by this time) that the first N values >>>>> were the time series given by [0, 0, 0, :]. >>>>> >>>>> He said - "Oh - I see - so the data is stored as a whole lot of time >>>>> series one by one, I thought it would be stored as a series of >>>>> images'. >>>>> >>>>> Ironically, this was a Fortran-ordered array in memory, and he was wrong. >>>>> >>>>> So, I think the idea of memory ordering and index ordering is very >>>>> easy to confuse, and comes up naturally. >>>>> >>>>> I would like, as a teacher, to be able to say something like: >>>>> >>>>> This is what C memory layout is (it's the memory layout that gives >>>>> arr.flags.C_CONTIGUOUS=True) >>>>> This is what F memory layout is (it's the memory layout that gives >>>>> arr.flags.F_CONTIGUOUS=True) >>>>> It's rather easy to get something that is neither C or F memory layout >>>>> Numpy does many memory layouts. 
>>>>> Ravel and reshape and numpy in general do not care (normally) about C >>>>> or F layouts, they only care about index ordering. >>>>> >>>>> My point, that I'm repeating, is that my job is made harder by >>>>> 'arr.ravel('F')'. >>>> >>>> But once you know that ravel and reshape don't care about memory, the >>>> ravel is easy to predict (maybe not easy to visualize in 4-D): >>> >>> But this assumes that you already know that there's such a thing as >>> memory layout, and there's such a thing as index ordering, and that >>> 'C' and 'F' in ravel refer to index ordering. Once you have that, >>> you're golden. I'm arguing it's markedly harder to get this >>> distinction, and keep it in mind, and teach it, if we are using the >>> 'C' and 'F" names for both things. >> >> No, I think you are still missing my point. >> I think explaining ravel and reshape F and C is easy (kind of) because the >> students don't need to know at that stage about memory layouts. >> >> All they need to know is that we look at n-dimensional objects in >> C-order or in F-order >> (whichever index runs fastest) > > Would you accept that it may or may not be true that it is desirable > or practical not to mention memory layouts when teaching numpy? I think they should be in two different sections. basic usage: ravel, reshape in pure index order, and indexing, broadcasting, ... advanced usage: memory layout and some ability to predict when you get a view and when you get a copy. And I still think words can mean different things in different context (with a qualifier maybe) indexing in fortran order memory in fortran order Disclaimer: I never tried to teach numpy and with GSOC students my explanations only went a little bit beyond what they needed to know for the purpose at hand (I hope) > > You believe it is desirable, I believe that it is not - that teaching > numpy naturally involves some discussion of memory layout. 
> > As evidence: > > * My student, without any prompting about memory layouts, is asking about it > * Travis' numpy book has a very early section on this (section 2.3 - > memory layout) > * I often think about memory layouts, and from your discussion, you do > too. It's uncommon that you don't have to teach something that > experienced users think about often. I'm mentioning memory layout because I'm talking to you. I wouldn't talk about memory layout if I would try to explain ravel, reshape and indexing for the first time to a student. > * The most common use of 'order' only refers to memory layout. For > example np.array "order" doesn't refer to index ordering but to memory > layout. No, as I tried to show with the statsmodels example. I don't require GSOC students (that are relatively new to numpy) to understand much about memory layout. The only use of ``order`` in statsmodels refers to *index* order in ravel and reshape. > * The current docstring of 'reshape' cannot be explained without > referring to memory order. really ? I thought reshape only refers to *index* order for "F" and "C" I don't think I can express my preference for reshape order="F" any better than I did, so maybe it's time for some additional users/developers to chime in. 
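Josef's claim here — that "F" and "C" in ravel and reshape are pure *index* order, independent of memory layout — can be checked with a small sketch (a toy 2x3 array; only standard numpy calls):

```python
import numpy as np

# Same values, two memory layouts.
a_c = np.arange(6).reshape(2, 3)   # C-contiguous: [[0, 1, 2], [3, 4, 5]]
a_f = np.asfortranarray(a_c)       # F-contiguous copy, identical values

# ravel's 'C'/'F' keyword is pure *index* order: the result does not
# depend on how the array is laid out in memory.
assert list(a_c.ravel('C')) == [0, 1, 2, 3, 4, 5]      # last index fastest
assert list(a_c.ravel('F')) == [0, 3, 1, 4, 2, 5]      # first index fastest
assert np.array_equal(a_c.ravel('F'), a_f.ravel('F'))  # layout is irrelevant

# The same holds for reshape with order='F'.
assert np.array_equal(a_c.reshape(6, order='F'), a_f.reshape(6, order='F'))
```

Memory layout only shows up at the "advanced" level Josef describes — for instance in whether the reshape above returns a view or a copy — not in the values produced.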
Josef > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ralf.gommers at gmail.com Sun Mar 31 17:03:05 2013 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 31 Mar 2013 23:03:05 +0200 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: On Sun, Mar 31, 2013 at 10:43 PM, wrote: > On Sun, Mar 31, 2013 at 3:54 PM, Matthew Brett > wrote: > > Hi, > > > > On Sat, Mar 30, 2013 at 10:38 PM, wrote: > >> On Sun, Mar 31, 2013 at 12:50 AM, Matthew Brett < > matthew.brett at gmail.com> wrote: > >>> Hi, > >>> > >>> On Sat, Mar 30, 2013 at 9:37 PM, wrote: > >>>> On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett < > matthew.brett at gmail.com> wrote: > >>>>> Hi, > >>>>> > >>>>> On Sat, Mar 30, 2013 at 7:02 PM, wrote: > >>>>>> On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett < > matthew.brett at gmail.com> wrote: > >>>>>>> Hi, > >>>>>>> > >>>>>>> On Sat, Mar 30, 2013 at 7:50 PM, wrote: > >>>>>>>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle > >>>>>>>> wrote: > >>>>>>>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett < > matthew.brett at gmail.com> > >>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>> On Sat, Mar 30, 2013 at 2:20 PM, wrote: > >>>>>>>>>> > On Sat, Mar 30, 2013 at 4:57 PM, > wrote: > >>>>>>>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett > >>>>>>>>>> >> wrote: > >>>>>>>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM, > wrote: > >>>>>>>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett > >>>>>>>>>> >>>> wrote: > >>>>>>>>>> >>>>> > >>>>>>>>>> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense > of index > >>>>>>>>>> >>>>> ordering. > >>>>>>>>>> >>>>> > >>>>>>>>>> >>>>> This is very confusing. 
We think the index ordering and > memory > >>>>>>>>>> >>>>> ordering ideas need to be separated, and specifically, we > should > >>>>>>>>>> >>>>> avoid > >>>>>>>>>> >>>>> using "C" and "F" to refer to index ordering. > >>>>>>>>>> >>>>> > >>>>>>>>>> >>>>> Proposal > >>>>>>>>>> >>>>> ------------- > >>>>>>>>>> >>>>> > >>>>>>>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and > forwards > >>>>>>>>>> >>>>> index ordering for ravel, reshape > >>>>>>>>>> >>>>> * Prefer "Z" and "N", being graphical representations of > unraveling > >>>>>>>>>> >>>>> in > >>>>>>>>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively > (excellent > >>>>>>>>>> >>>>> naming idea by Paul Ivanov) > >>>>>>>>>> >>>>> > >>>>>>>>>> >>>>> What do y'all think? > >>>>>>>>>> >>>> > >>>>>>>>>> >>>> I always thought "F" and "C" are easy to understand, I > always thought > >>>>>>>>>> >>>> about > >>>>>>>>>> >>>> the content and never about the memory when using it. > >>>>>>>>>> >> > >>>>>>>>>> >> changing the names doesn't make it easier to understand. > >>>>>>>>>> >> I think the confusion is because the new A and K refer to > existing > >>>>>>>>>> >> memory > >>>>>>>>>> >> > >>>>>>>>>> > >>>>>>>>>> I disagree, I think it's confusing, but I have evidence, and > that is > >>>>>>>>>> that four out of four of us tested ourselves and got it wrong. > >>>>>>>>>> > >>>>>>>>>> Perhaps we are particularly dumb or poorly informed, but I > think it's > >>>>>>>>>> rash to assert there is no problem here. > >>>>>>>> > >>>>>>>> I think you are overcomplicating things or phrased it as a "trick > question" > >>>>>>> > >>>>>>> I don't know what you mean by trick question - was there something > >>>>>>> over-complicated in the example? I deliberately didn't include > >>>>>>> various much more confusing examples in "reshape". > >>>>>> > >>>>>> I meant making the "candidates" think about memory instead of just > >>>>>> column versus row stacking. 
> >>>>> > >>>>> To be specific, we were teaching about reshaping a (I, J, K, N) 4D > >>>>> array, it was an image, with time as the 4th dimension (N time > >>>>> points). Raveling and reshaping 3D and 4D arrays is a common thing > >>>>> to do in neuroimaging, as you can imagine. > >>>>> > >>>>> A student asked what he would get back from raveling this array, a > >>>>> concatenated time series, or something spatial? > >>>>> > >>>>> We showed (I'd worked it out by this time) that the first N values > >>>>> were the time series given by [0, 0, 0, :]. > >>>>> > >>>>> He said - "Oh - I see - so the data is stored as a whole lot of time > >>>>> series one by one, I thought it would be stored as a series of > >>>>> images'. > >>>>> > >>>>> Ironically, this was a Fortran-ordered array in memory, and he was > wrong. > >>>>> > >>>>> So, I think the idea of memory ordering and index ordering is very > >>>>> easy to confuse, and comes up naturally. > >>>>> > >>>>> I would like, as a teacher, to be able to say something like: > >>>>> > >>>>> This is what C memory layout is (it's the memory layout that gives > >>>>> arr.flags.C_CONTIGUOUS=True) > >>>>> This is what F memory layout is (it's the memory layout that gives > >>>>> arr.flags.F_CONTIGUOUS=True) > >>>>> It's rather easy to get something that is neither C or F memory > layout > >>>>> Numpy does many memory layouts. > >>>>> Ravel and reshape and numpy in general do not care (normally) about C > >>>>> or F layouts, they only care about index ordering. > >>>>> > >>>>> My point, that I'm repeating, is that my job is made harder by > >>>>> 'arr.ravel('F')'. > >>>> > >>>> But once you know that ravel and reshape don't care about memory, the > >>>> ravel is easy to predict (maybe not easy to visualize in 4-D): > >>> > >>> But this assumes that you already know that there's such a thing as > >>> memory layout, and there's such a thing as index ordering, and that > >>> 'C' and 'F' in ravel refer to index ordering. 
Once you have that, > >>> you're golden. I'm arguing it's markedly harder to get this > >>> distinction, and keep it in mind, and teach it, if we are using the > >>> 'C' and 'F" names for both things. > >> > >> No, I think you are still missing my point. > >> I think explaining ravel and reshape F and C is easy (kind of) because > the > >> students don't need to know at that stage about memory layouts. > >> > >> All they need to know is that we look at n-dimensional objects in > >> C-order or in F-order > >> (whichever index runs fastest) > > > > Would you accept that it may or may not be true that it is desirable > > or practical not to mention memory layouts when teaching numpy? > > I think they should be in two different sections. > > basic usage: > ravel, reshape in pure index order, and indexing, broadcasting, ... > > advanced usage: > memory layout and some ability to predict when you get a view and > when you get a copy. > > And I still think words can mean different things in different context > (with a qualifier maybe) > indexing in fortran order > memory in fortran order > > Disclaimer: I never tried to teach numpy > and with GSOC students my explanations only went a little bit > beyond what they needed to know for the purpose at hand (I hope) > > > > > You believe it is desirable, I believe that it is not - that teaching > > numpy naturally involves some discussion of memory layout. > > > > As evidence: > > > > * My student, without any prompting about memory layouts, is asking > about it > > * Travis' numpy book has a very early section on this (section 2.3 - > > memory layout) > > * I often think about memory layouts, and from your discussion, you do > > too. It's uncommon that you don't have to teach something that > > experienced users think about often. > > I'm mentioning memory layout because I'm talking to you. > I wouldn't talk about memory layout if I would try to explain ravel, > reshape and indexing for the first time to a student. 
> > > * The most common use of 'order' only refers to memory layout. For > > example np.array "order" doesn't refer to index ordering but to memory > > layout. > > No, as I tried to show with the statsmodels example. > I don't require GSOC students (that are relatively new to numpy) to > understand > much about memory layout. > The only use of ``order`` in statsmodels refers to *index* order in > ravel and reshape. > > > * The current docstring of 'reshape' cannot be explained without > > referring to memory order. > > really ? > I thought reshape only refers to *index* order for "F" and "C" > > I don't think I can express my preference for reshape order="F" any > better than I did, so maybe it's time for some additional users/developers > to chime in. My 2cents: while I can't go back and un-read earlier emails in this thread, I don't see what's ambiguous in the case of ravel. For reshape I can see though that it's possible to interpret it in two ways. In such cases I open up IPython and play with a 2x3 array to check my understanding. That's OK, and certainly better than adding duplicate names now for C/F even if that would solve the issue (which it probably wouldn't). Therefore I'm -1 on the initial proposal. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matthew.brett at gmail.com Sun Mar 31 17:04:46 2013 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 31 Mar 2013 14:04:46 -0700 Subject: [Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering In-Reply-To: References: Message-ID: Hi, On Sun, Mar 31, 2013 at 1:43 PM, wrote: > On Sun, Mar 31, 2013 at 3:54 PM, Matthew Brett wrote: >> Hi, >> >> On Sat, Mar 30, 2013 at 10:38 PM, wrote: >>> On Sun, Mar 31, 2013 at 12:50 AM, Matthew Brett wrote: >>>> Hi, >>>> >>>> On Sat, Mar 30, 2013 at 9:37 PM, wrote: >>>>> On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett wrote: >>>>>> Hi, >>>>>> >>>>>> On Sat, Mar 30, 2013 at 7:02 PM, wrote: >>>>>>> On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> On Sat, Mar 30, 2013 at 7:50 PM, wrote: >>>>>>>>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle >>>>>>>>> wrote: >>>>>>>>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> On Sat, Mar 30, 2013 at 2:20 PM, wrote: >>>>>>>>>>> > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >>>>>>>>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett >>>>>>>>>>> >> wrote: >>>>>>>>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>>>>>>>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett >>>>>>>>>>> >>>> wrote: >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index >>>>>>>>>>> >>>>> ordering. >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> This is very confusing. We think the index ordering and memory >>>>>>>>>>> >>>>> ordering ideas need to be separated, and specifically, we should >>>>>>>>>>> >>>>> avoid >>>>>>>>>>> >>>>> using "C" and "F" to refer to index ordering. 
>>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> Proposal >>>>>>>>>>> >>>>> ------------- >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >>>>>>>>>>> >>>>> index ordering for ravel, reshape >>>>>>>>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >>>>>>>>>>> >>>>> in >>>>>>>>>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >>>>>>>>>>> >>>>> naming idea by Paul Ivanov) >>>>>>>>>>> >>>>> >>>>>>>>>>> >>>>> What do y'all think? >>>>>>>>>>> >>>> >>>>>>>>>>> >>>> I always thought "F" and "C" are easy to understand, I always thought >>>>>>>>>>> >>>> about >>>>>>>>>>> >>>> the content and never about the memory when using it. >>>>>>>>>>> >> >>>>>>>>>>> >> changing the names doesn't make it easier to understand. >>>>>>>>>>> >> I think the confusion is because the new A and K refer to existing >>>>>>>>>>> >> memory >>>>>>>>>>> >> >>>>>>>>>>> >>>>>>>>>>> I disagree, I think it's confusing, but I have evidence, and that is >>>>>>>>>>> that four out of four of us tested ourselves and got it wrong. >>>>>>>>>>> >>>>>>>>>>> Perhaps we are particularly dumb or poorly informed, but I think it's >>>>>>>>>>> rash to assert there is no problem here. >>>>>>>>> >>>>>>>>> I think you are overcomplicating things or phrased it as a "trick question" >>>>>>>> >>>>>>>> I don't know what you mean by trick question - was there something >>>>>>>> over-complicated in the example? I deliberately didn't include >>>>>>>> various much more confusing examples in "reshape". >>>>>>> >>>>>>> I meant making the "candidates" think about memory instead of just >>>>>>> column versus row stacking. >>>>>> >>>>>> To be specific, we were teaching about reshaping a (I, J, K, N) 4D >>>>>> array, it was an image, with time as the 4th dimension (N time >>>>>> points). Raveling and reshaping 3D and 4D arrays is a common thing >>>>>> to do in neuroimaging, as you can imagine. 
>>>>>> >>>>>> A student asked what he would get back from raveling this array, a >>>>>> concatenated time series, or something spatial? >>>>>> >>>>>> We showed (I'd worked it out by this time) that the first N values >>>>>> were the time series given by [0, 0, 0, :]. >>>>>> >>>>>> He said - "Oh - I see - so the data is stored as a whole lot of time >>>>>> series one by one, I thought it would be stored as a series of >>>>>> images'. >>>>>> >>>>>> Ironically, this was a Fortran-ordered array in memory, and he was wrong. >>>>>> >>>>>> So, I think the idea of memory ordering and index ordering is very >>>>>> easy to confuse, and comes up naturally. >>>>>> >>>>>> I would like, as a teacher, to be able to say something like: >>>>>> >>>>>> This is what C memory layout is (it's the memory layout that gives >>>>>> arr.flags.C_CONTIGUOUS=True) >>>>>> This is what F memory layout is (it's the memory layout that gives >>>>>> arr.flags.F_CONTIGUOUS=True) >>>>>> It's rather easy to get something that is neither C or F memory layout >>>>>> Numpy does many memory layouts. >>>>>> Ravel and reshape and numpy in general do not care (normally) about C >>>>>> or F layouts, they only care about index ordering. >>>>>> >>>>>> My point, that I'm repeating, is that my job is made harder by >>>>>> 'arr.ravel('F')'. >>>>> >>>>> But once you know that ravel and reshape don't care about memory, the >>>>> ravel is easy to predict (maybe not easy to visualize in 4-D): >>>> >>>> But this assumes that you already know that there's such a thing as >>>> memory layout, and there's such a thing as index ordering, and that >>>> 'C' and 'F' in ravel refer to index ordering. Once you have that, >>>> you're golden. I'm arguing it's markedly harder to get this >>>> distinction, and keep it in mind, and teach it, if we are using the >>>> 'C' and 'F" names for both things. >>> >>> No, I think you are still missing my point. 
>>> I think explaining ravel and reshape F and C is easy (kind of) because the >>> students don't need to know at that stage about memory layouts. >>> >>> All they need to know is that we look at n-dimensional objects in >>> C-order or in F-order >>> (whichever index runs fastest) >> >> Would you accept that it may or may not be true that it is desirable >> or practical not to mention memory layouts when teaching numpy? > > I think they should be in two different sections. > > basic usage: > ravel, reshape in pure index order, and indexing, broadcasting, ... > > advanced usage: > memory layout and some ability to predict when you get a view and > when you get a copy. Right - that is what you think - but I was asking - do you agree that it's possible that that is not best way to teach it? What evidence would you give that it was the best way to teach it? > And I still think words can mean different things in different context > (with a qualifier maybe) > indexing in fortran order > memory in fortran order Right - but you'd probably also accept that using the same word for different and related things is likely to cause confusion? I'm sure we could come up with some experimental evidence for that if you do doubt it. > Disclaimer: I never tried to teach numpy > and with GSOC students my explanations only went a little bit > beyond what they needed to know for the purpose at hand (I hope) > >> >> You believe it is desirable, I believe that it is not - that teaching >> numpy naturally involves some discussion of memory layout. >> >> As evidence: >> >> * My student, without any prompting about memory layouts, is asking about it >> * Travis' numpy book has a very early section on this (section 2.3 - >> memory layout) >> * I often think about memory layouts, and from your discussion, you do >> too. It's uncommon that you don't have to teach something that >> experienced users think about often. > > I'm mentioning memory layout because I'm talking to you. 
> I wouldn't talk about memory layout if I would try to explain ravel,
> reshape and indexing for the first time to a student.
>
>> * The most common use of 'order' only refers to memory layout. For
>> example np.array "order" doesn't refer to index ordering but to memory
>> layout.
>
> No, as I tried to show with the statsmodels example.
> I don't require GSOC students (that are relatively new to numpy) to understand
> much about memory layout.
> The only use of ``order`` in statsmodels refers to *index* order in
> ravel and reshape.
>
>> * The current docstring of 'reshape' cannot be explained without
>> referring to memory order.
>
> really ?
> I thought reshape only refers to *index* order for "F" and "C"

Here's the docstring for 'reshape':

    order : {'C', 'F', 'A'}, optional
        Determines whether the array data should be viewed as in C
        (row-major) order, FORTRAN (column-major) order, or the
        C/FORTRAN order should be preserved.

The 'A' option cannot be explained without reference to 'C' or 'F'
*memory* layout - i.e. a different meaning of the 'C' and 'F' in the
indexing interpretation.

Actually, as a matter of interest - how would you explain the behavior
of 'A' when the array is neither 'C' nor 'F' memory layout? Maybe that
could be a good test case?

Here's the docstring for 'ravel':

    order : {'C','F', 'A', 'K'}, optional
        The elements of ``a`` are read in this order. 'C' means to view
        the elements in C (row-major) order. 'F' means to view the
        elements in Fortran (column-major) order. 'A' means to view the
        elements in 'F' order if a is Fortran contiguous, 'C' order
        otherwise. 'K' means to view the elements in the order they
        occur in memory, except for reversing the data when strides are
        negative. By default, 'C' order is used.

Cheers,

Matthew
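The "good test case" Matthew suggests — an array that is neither C- nor F-contiguous — can be worked through directly against the quoted docstrings; a minimal sketch (standard numpy only, values chosen for illustration):

```python
import numpy as np

# An array that is neither C- nor F-contiguous: a strided column slice.
base = np.arange(12).reshape(3, 4)
a = base[:, ::2]                   # [[0, 2], [4, 6], [8, 10]]
assert not a.flags.c_contiguous and not a.flags.f_contiguous

# Per the ravel docstring, 'A' means F index order only when the array
# *is* F-contiguous; anything else falls back to C index order:
assert np.array_equal(a.ravel('A'), a.ravel('C'))
assert list(a.ravel('A')) == [0, 2, 4, 6, 8, 10]

# 'K' reads elements in the order they occur in memory; on the
# transpose that differs from C index order:
assert list(a.T.ravel('C')) == [0, 4, 8, 2, 6, 10]
assert list(a.T.ravel('K')) == [0, 2, 4, 6, 8, 10]
```

So for non-contiguous input, 'A' is just 'C' — which is exactly the kind of memory-dependent special case the thread is arguing about: 'A' and 'K' cannot be described in index-order terms alone.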